Prefixes for wikipedia and wikidata keys

The list of prefixes for wikipedia is:

  • architect:wikipedia=*

  • artist:wikipedia=*

  • brand:wikipedia=*

  • buried:wikipedia=*

  • name:etymology:wikipedia=*

  • network:wikipedia=*

  • operator:wikipedia=*

  • species:wikipedia=*

  • subject:wikipedia=*

  • taxon:wikipedia=*

and for wikidata is:

  • architect:wikidata=*

  • artist:wikidata=*

  • building:wikidata=*

  • buried:wikidata=*

  • brand:wikidata=*

  • flag:wikidata=*

  • genus:wikidata=*

  • manufacturer:wikidata=*

  • model:wikidata=*

  • name:etymology:wikidata=*

  • network:wikidata=*

  • notable_tenant:wikidata=*

  • operator:wikidata=*

  • royal_cypher:wikidata=*

  • species:wikidata=*

  • subject:wikidata=*

  • taxon:wikidata=*

As you can see, there are more prefixes for wikidata than for wikipedia, but I think both keys should have the same prefixes. So the following prefixes should either be removed from the wikidata list or added to the wikipedia list:

  • building

  • brand

  • flag

  • genus

  • manufacturer

  • model

  • notable_tenant

  • royal_cypher
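As a quick cross-check of the two lists above, here is a minimal Python sketch that derives the discrepancy as a set difference. The key lists are transcribed by hand from this post, not fetched from the OSM wiki, and the helper name `prefixes` is only illustrative.

```python
# Keys as listed in this post (transcribed by hand, not fetched from the wiki).
WIKIPEDIA_KEYS = [
    "architect:wikipedia", "artist:wikipedia", "brand:wikipedia",
    "buried:wikipedia", "name:etymology:wikipedia", "network:wikipedia",
    "operator:wikipedia", "species:wikipedia", "subject:wikipedia",
    "taxon:wikipedia",
]
WIKIDATA_KEYS = [
    "architect:wikidata", "artist:wikidata", "building:wikidata",
    "buried:wikidata", "brand:wikidata", "flag:wikidata", "genus:wikidata",
    "manufacturer:wikidata", "model:wikidata", "name:etymology:wikidata",
    "network:wikidata", "notable_tenant:wikidata", "operator:wikidata",
    "royal_cypher:wikidata", "species:wikidata", "subject:wikidata",
    "taxon:wikidata",
]


def prefixes(keys, base):
    """Strip the trailing ':<base>' from each key and return the set of prefixes."""
    suffix = ":" + base
    return {key[: -len(suffix)] for key in keys if key.endswith(suffix)}


wikipedia_prefixes = prefixes(WIKIPEDIA_KEYS, "wikipedia")
wikidata_prefixes = prefixes(WIKIDATA_KEYS, "wikidata")

# Prefixes documented for one key but not the other.
only_wikidata = sorted(wikidata_prefixes - wikipedia_prefixes)
only_wikipedia = sorted(wikipedia_prefixes - wikidata_prefixes)

print("only for wikidata: ", only_wikidata)
print("only for wikipedia:", only_wikipedia)
```

With the lists as transcribed here, brand appears under both keys, so it does not show up in the computed difference.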

<svg width="800" height="800" xmlns="http://www.w3.org/2000/svg">
  <style>
    text { font-family: Arial, sans-serif; }
  </style>

  <!-- Central text -->
  <text x="400" y="360" text-anchor="middle" font-size="52" font-weight="bold" fill="#0057B8">wikidata=*</text>
  <text x="400" y="420" text-anchor="middle" font-size="52" font-weight="bold" fill="#0057B8">wikipedia=*</text>

  <!-- Outer orbit -->
  <g font-size="24" fill="#333">
    <text x="400" y="100" text-anchor="middle">flag</text>
    <text x="600" y="160" text-anchor="middle">notable_tenant</text>
    <text x="680" y="300" text-anchor="middle">name:etymology</text>
    <text x="680" y="500" text-anchor="middle">manufacturer</text>
    <text x="600" y="640" text-anchor="middle">royal_cypher</text>
    <text x="400" y="700" text-anchor="middle">model</text>
    <text x="200" y="640" text-anchor="middle">operator</text>
    <text x="120" y="500" text-anchor="middle">species</text>
    <text x="120" y="300" text-anchor="middle">subject</text>
    <text x="200" y="160" text-anchor="middle">taxon</text>
  </g>

  <!-- Inner orbit -->
  <g font-size="22" fill="#333">
    <text x="400" y="180" text-anchor="middle">buried</text>
    <text x="520" y="220" text-anchor="middle">artist</text>
    <text x="580" y="400" text-anchor="middle">building</text>
    <text x="520" y="580" text-anchor="middle">brand</text>
    <text x="400" y="620" text-anchor="middle">genus</text>
    <text x="280" y="580" text-anchor="middle">network</text>
    <text x="220" y="400" text-anchor="middle">architect</text>
  </g>
</svg>

Why do you think they should be the same?

Are you aware that some things are in Wikidata but not in Wikipedia?

building:wikidata / building:wikipedia indicates bad tagging: someone merged a building object with something else, in violation of “One feature, one OSM element” (OpenStreetMap Wiki)

similarly building:name indicates bad tagging that should be fixed

Do you mean “removed or added in the wiki”, rather than in the actual OSM data?

I’m not sure what you are suggesting for “brand” - both forms are in use and documented.

I kind of agree, but I’d describe it as a likely case of dual tagging instead. We have historically tolerated dual tagging to some extent, especially buildings with their sole occupants. Personally, I draw the line at when the building has a separate identity somehow, such as a different name or Wikidata item.

From a purist standpoint, the occupant should always be extracted into an area coextensive with the building, but many mappers (especially JOSM users) really don’t like overlapping features. But there are other precedents following the same syntax, such as bridge:name=*. Rather, the problem with building:name=* or building:wikipedia=* is more that the feature is probably tagged building=yes but all its other usual tags refer to the occupant. One would think the name=* and wikipedia=* would be for the building while shop:name=* or whatever would be for the occupant. But at that point it would be easier to just separate out a second feature for the tenant.

It is quite widely accepted (though I still personally consider it bad tagging) where you have a single shop in a single building without a name or strong identity.

But when you have multiple POIs in one building, or the building has its own name or its own Wikipedia page…

Then stuff like building:name poi:name building:wikidata shop:wikidata are just more troublesome in mapping than proper separation. To say nothing about support among data consumers.

right, it is an accepted shortcut for simple (read: rough) mapping, but when you start requiring prefixes for disambiguation the time has come to divide objects and features properly.

The OSM wiki articles specify prefixes for both the wikipedia and wikidata keys, and in principle those prefixes should be consistent across both.

For example, with flag, why should an OSM element only allow flag:wikidata=Q### but not the corresponding flag:wikipedia=XXXX? My point is simply that there are more prefixes defined for wikidata than for wikipedia, and that inconsistency is confusing.

Those extra wikidata prefixes are not very clear, and what I am asking here is whether we could consider removing them. As you already showed with your examples, they don’t seem well justified.

Screenshot of the wikipedia article:

Screenshot of the wikidata article

Has anyone said the latter should not be allowed?

Maybe all that is required is to be clearer in the wiki that the lists are common examples and other values are possible? In the case of flag: the wikipedia version is far less commonly used so maybe not necessary to list it.

Which ones specifically? I agree there is a good case for not listing building:wikidata (only 500 uses) but I’m not sure about the others.

flag:wikidata=* is used to avoid any ambiguity as to what flag is flying for data consumers who care, and flag:wikipedia=* is, quite simply, unnecessary

I don’t think flag:wikipedia=* is disallowed (if nothing else, due to Any Tags You Like, and there hasn’t been a proposal to deprecate it), it’s simply not featured in the table because it’s usually considered unnecessary and no one has bothered to write an OSM wiki page for it and add it to the OSM wiki table

Similarly, brand:wikidata=* is used to avoid ambiguity among similarly named brands, and brand:wikipedia=* is unnecessary because it either duplicates brand=* or, in cases where Wikipedia’s article title isn’t the same as OSM’s brand text, it duplicates data from brand:wikidata=* plus Wikidata lookup

Generally, :wikipedia=* tags have been seeing reduced use as :wikidata=* tags became more popular. This is for a couple of reasons:

  • some things have Wikidata items but not Wikipedia articles (e.g. Williams Fresh Cafe or 190 St. George Street, each of which has a Wikidata item), and that will always be the case due to differing inclusion criteria between Wikidata and Wikipedia. So in some cases in OSM we’ll be able to tag :wikidata=* but not :wikipedia=*; some OSM data consumers will want to consume Wikidata information anyway, which means :wikipedia=* is not strictly necessary
  • easier support for multilingual items, e.g. stores might have brand=Staples or brand=Bureau en Gros, and brand:wikipedia=* would then also be different (=en:Staples Canada and there’s no French Wikipedia article), but brand:wikidata=Q17149420 is the same in both cases, allowing a simpler search for all Bureau en Gros/Staples locations. Similar deal for Postes Canada/Canada Post and many other examples
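
The multilingual point can be sketched in a few lines of Python. The sample elements below are hypothetical (including the ids and the third, unrelated brand with its placeholder QID); only the Staples / Bureau en Gros tag values come from the example above.

```python
# Hypothetical sample elements; only the Staples / Bureau en Gros tag values
# are taken from the post. The third brand and its QID are placeholders.
elements = [
    {"id": 1, "tags": {"brand": "Staples",
                       "brand:wikipedia": "en:Staples Canada",
                       "brand:wikidata": "Q17149420"}},
    {"id": 2, "tags": {"brand": "Bureau en Gros",
                       "brand:wikidata": "Q17149420"}},
    {"id": 3, "tags": {"brand": "Some Other Brand",
                       "brand:wikidata": "Q00000000"}},
]


def by_brand_name(items, name):
    """Select elements whose brand=* text matches exactly."""
    return [e for e in items if e["tags"].get("brand") == name]


def by_brand_wikidata(items, qid):
    """Select elements whose brand:wikidata=* matches, whatever the brand text."""
    return [e for e in items if e["tags"].get("brand:wikidata") == qid]


# A name search only finds one language variant...
print([e["id"] for e in by_brand_name(elements, "Staples")])
# ...while the Wikidata ID finds both variants with a single query.
print([e["id"] for e in by_brand_wikidata(elements, "Q17149420")])
```

A consumer matching on brand=* or brand:wikipedia=* would need to know every language variant in advance, whereas the QID is the same for all of them.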

As a first attempt, but not as an ideal solution. I think it is generally fine until you start adding more information, at which point it is highly advisable to divide the entities properly.

I don’t see the issue with wikipedia for Staples Canada: couldn’t you add brand:wikipedia=en:Staples Canada and have the same information?
And isn’t Staples Canada the same brand as Staples Inc. (which has its own Wikidata item), but with a different operator?

| Key (prefix) | Wikidata | Wikipedia |
|---|---|---|
| building | building:wikidata – 123 values, 514 objects | building:wikipedia – 42 values, 61 objects |
| brand | brand:wikidata – 19 269 values, 2 672 354 objects | brand:wikipedia – 11 006 values, 903 974 objects |
| flag | flag:wikidata – 1 492 values, 38 406 objects | flag:wikipedia – 688 values, 1 780 objects |
| genus | genus:wikidata – 236 values, 64 084 objects | genus:wikipedia – 112 values, 5 004 objects |
| manufacturer | manufacturer:wikidata – 393 values, 42 043 objects | manufacturer:wikipedia – 68 values, 268 objects |
| model | model:wikidata – 864 values, 5 101 objects | model:wikipedia – 557 values, 1 215 objects |
| notable_tenant | notable_tenant:wikidata – 83 values, 108 objects | notable_tenant:wikipedia – 2 values, 2 objects |
| royal_cypher | royal_cypher:wikidata – 11 values, 48 325 objects | royal_cypher:wikipedia – 0 values, 0 objects |
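
To compare the two columns at a glance, here is a small Python sketch that computes, per prefix, how many objects carry the wikipedia form relative to the wikidata form. The object counts are copied from the table above; the `USAGE` dictionary and the `wikipedia_share` helper are illustrative names, not anything provided by Taginfo.

```python
# Object counts copied from the table above: prefix -> (wikidata, wikipedia).
USAGE = {
    "building": (514, 61),
    "brand": (2_672_354, 903_974),
    "flag": (38_406, 1_780),
    "genus": (64_084, 5_004),
    "manufacturer": (42_043, 268),
    "model": (5_101, 1_215),
    "notable_tenant": (108, 2),
    "royal_cypher": (48_325, 0),
}


def wikipedia_share(prefix):
    """Objects tagged <prefix>:wikipedia per object tagged <prefix>:wikidata."""
    wikidata_objects, wikipedia_objects = USAGE[prefix]
    return wikipedia_objects / wikidata_objects


# List prefixes from most to least used wikidata form.
ranked = sorted(USAGE, key=lambda p: USAGE[p][0], reverse=True)
for prefix in ranked:
    wikidata_objects, wikipedia_objects = USAGE[prefix]
    print(f"{prefix:15} {wikidata_objects:>9} {wikipedia_objects:>7} "
          f"{wikipedia_share(prefix):7.1%}")
```

This only summarises usage; as discussed in this thread, popularity alone does not decide whether a prefix is good tagging.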

The prefixes that the wiki lists only for wikidata are in practice also being used with wikipedia. In that case, should we extend the OSM wiki table for the wikipedia key to include them? Or should those wikidata prefixes in the OSM wiki be considered invalid?

(I would like to give you some context: I am going to present at a Wikimedia conference about the points of intersection between Wikimedia and OpenStreetMap. That is why I want to make clear how a Wikimedian can contribute to OSM, by providing a complete and valid list of OSM tags.)

I don’t follow how the numbers in that table relate to the links.

E.g. the table says only 2 values on 2 objects for genus, but following the link I see 236 values on 64084 objects. Are we talking about different things?

Cardinality vs unique values.

BTW, I was also confused.

I’m sorry, I don’t understand.

Which 2 values and which 2 objects are you referring to in the table?

I’m asking because you are talking about considering prefixes invalid. Maybe some of the low-usage ones could be considered “inadvisable” if not exactly invalid, and your table seems to imply that many of these are low-usage. But this and several of the others are in fact quite popular. I can’t see any grounds for calling them invalid.

validity is not directly correlated to popularity - for example royal_cypher:wikipedia makes far more sense than building:wikipedia despite being used far less

where did you get these numbers? Taginfo shows two usages of notable_tenant:wikipedia worldwide

I would rather show popular keys without issues, it is very rare that you truly want a complete list of OSM tags

almost always you have some nonsense debris

see, say, the surface key on Taginfo, with gems such as surface=school or surface=naturbelassene_oberflächemit_auto_tei

My initial question is about the OSM Wiki articles, which are the guide for mappers.

What should we do about the discrepancy between the wikipedia and wikidata prefixes? Include them in the wikipedia list, or delete them from the wikidata list?

When explaining this to another community, having to show these differences is a bad starting point for us.

Of course, I know about that kind of gem in the OSM data, but I am talking about the OSM wiki.

This is the list of prefix discrepancies:

  • building - bad tagging (Mateusz), dual tagging (Minh), so remove it from wikidata?
  • brand - avoid any ambiguity (Jarek), include in wikipedia?
  • flag - wikipedia not necessary (Jarek), keep the discrepancy between articles?
  • genus - ?
  • manufacturer - ?
  • model - ?
  • notable_tenant - ?
  • royal_cypher - makes sense (Mateusz), include in Wikipedia?