I am against this. Wikidata items sometimes represent a different (but strongly related) object than the one in OpenStreetMap. Naming conventions in Wikidata and OpenStreetMap are different.
Many data consumers don’t support the wikidata tag, and it’s not trivial to add support for it. The name tag is one of the basic tags and is used in the majority of applications.
Definitely no. Wikidata names are often descriptive. In Wikidata terms this is necessary to differentiate between objects which in OSM and the real world have the same name.
I came across an example of this last week. A mapper had added names in multiple languages to Baker Street Tube Station, and those names include “Tube Station”. The Wikidata/Wikipedia name obviously has to be different from the page for Baker Street.
If we did this, one effect would be that every station would gain the word “station” in its name.
While combining multiple open data sources is great, I don’t think that such an important tag should be outsourced. The name of an element is arguably its most important attribute, and it fits very well with being stored in a map database.
Relying on Wikidata to resolve names would mean that almost every use of OSM data would also have to query the Wikidata API, making things much more complex.
As valuable as Wikidata linking can be, names are so central to important OSM workflows that it would be very disruptive for OSM to get out of the naming business. Most communities haven’t even felt comfortable enough to delete ref tags from roadways in favor of route relations. On the other hand, some ancillary keys like brand:wikidata and flag:wikidata are less entrenched in OSM workflows, so there could be somewhat less attachment to their non-Wikidata variants.
This technical limitation is specific to Wikipedia, but it does not apply to Wikidata. Wikidata only requires any two items’ descriptions to differ if their labels match. Anyways, technically labels are just that – convenient labels – whereas the most proper representation of an item’s name is a name (P2561), native name (P1705), or similar statement.
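To make the label-versus-statement distinction concrete, here is a minimal sketch of how a data consumer might prefer a native name (P1705) or name (P2561) statement and only fall back to the label. It assumes the entity is a dict shaped like the Wikibase JSON export; the function names and the order of preference are my own assumptions, not an established convention.

```python
from typing import Optional

# Assumed order of preference: native name (P1705), then name (P2561).
NAME_PROPERTIES = ("P1705", "P2561")


def statement_name(entity: dict, lang: str) -> Optional[str]:
    """Return the first monolingual-text name statement in `lang`, if any."""
    claims = entity.get("claims", {})
    for prop in NAME_PROPERTIES:
        for statement in claims.get(prop, []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "unknown value" / "no value" snaks
            value = snak.get("datavalue", {}).get("value")
            if isinstance(value, dict) and value.get("language") == lang:
                return value.get("text")
    return None


def best_name(entity: dict, lang: str) -> Optional[str]:
    """Prefer a name statement; fall back to the label in `lang`."""
    name = statement_name(entity, lang)
    if name is not None:
        return name
    label = entity.get("labels", {}).get(lang)
    return label["value"] if label else None
```

With a hand-written entity whose English label carries a disambiguating word but whose P1705 statement does not, `best_name` returns the statement text. Rank, qualifiers and language fallback chains are ignored here for brevity.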
Last week, I spent some time mechanically stripping out Wikipedia-style disambiguating suffixes from Vietnamese labels of tens of thousands of Wikidata items about cities and towns, mostly in North America and Czechia, intentionally causing thousands of items’ labels to become identical to other items’ labels. For example, this Minnesota township’s label shortened from “Xã Arthur, Quận Kanabec” to just “Xã Arthur”, like another township elsewhere in the state, with aliases and descriptions to help users differentiate. This should significantly clean up the map for Vietnamese users of OpenMapTiles, Mapbox, and Google Maps, each of which relies on Wikidata for place labels in various languages.
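For illustration only, the kind of mechanical cleanup described above could look roughly like the sketch below. It is not the script that was actually used: the comma rule, the function names and the item identifiers in the usage note are assumptions. The second function checks the one constraint that matters afterwards, namely that no two items end up with the same label and the same description.

```python
from collections import defaultdict


def strip_comma_suffix(label):
    """Drop a Wikipedia-style ", <disambiguator>" suffix, if there is one."""
    head, sep, _tail = label.partition(",")
    return head.strip() if sep else label


def label_description_clashes(items):
    """Group item ids whose shortened label AND description are identical.

    `items` maps an item id to a (label, description) pair. Identical labels
    alone are fine; only an identical label plus description is a problem.
    """
    groups = defaultdict(list)
    for item_id, (label, description) in items.items():
        groups[(strip_comma_suffix(label), description)].append(item_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

For example, `strip_comma_suffix("Xã Arthur, Quận Kanabec")` gives `"Xã Arthur"`. A naive comma rule like this would also mangle labels where the comma is part of the name, so a real pass would need to be restricted to known patterns and item types.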
While I appreciate linking OSM objects to Wikidata objects, I think that OSM should not depend on an external source. One can use the wikidata tag to compare the names in OSM and Wikidata and then perform manual corrections. But we should surely not remove the name from the OSM database and say that “if the renderer wants to display the name it can look it up from Wikidata.”
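The comparison suggested above could be sketched as follows. This assumes you already have an element's tags and the labels of the linked Wikidata item as plain dicts keyed by language code; the function names are hypothetical and no particular API is implied. The output is only a list of candidates for manual review, not a list of corrections.

```python
def normalise(name):
    """Collapse whitespace and ignore case so trivial differences don't count."""
    return " ".join(name.split()).casefold()


def name_mismatches(osm_tags, wikidata_labels):
    """Return (lang, osm_name, wikidata_label) triples that disagree.

    Only name:<lang> keys are compared, and only where Wikidata has a label
    in the same language. Nothing is changed on either side.
    """
    mismatches = []
    for key, osm_name in sorted(osm_tags.items()):
        if not key.startswith("name:"):
            continue
        lang = key[len("name:"):]
        label = wikidata_labels.get(lang)
        if label is not None and normalise(label) != normalise(osm_name):
            mismatches.append((lang, osm_name, label))
    return mismatches
```

A mismatch is not necessarily an error on either side, since the two projects follow different naming conventions. That is exactly why the result should go to a human rather than to an automated edit.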
Note that you omitted, at the very least, the problems with licensing and with basing fundamental data on a source that has its own community, rules, moderators and so on.
Identifying problems is easy, solving them is harder.
See User:Mateusz Konieczny/failing testcases on Wikidata. I was singlehandedly able to find more obvious ontological issues than the entire Wikidata community can fix. I posted this listing to the Ontology WikiProject and some reports were solved, but what I reported has now been waiting for weeks. I also posted it to the central project chat page in 2022 and again in 2023. Many issues were fixed, but far more remain unfixed, and what I posted is just a small sample of what directly affects one specific tool.
The constraint violation listings are massive.
Another fundamental issue with OSM relying on Wikidata is that very often one data item refers to multiple physical concepts - Wikidata has one item for the “village and civil parish” (because it’s just come straight from Wikipedia**, presumably) but OSM has two, because we care about the difference between a “village” and an “admin boundary”.
** which due to the licence there may make using the data complicated
Compared to OSM, resolving dual tagging in Wikidata is uncontroversial, even when automated. It’s often easier than the ontological issues @Mateusz_Konieczny has been flagging and would fix many constraint validation warnings at the same time. The only catch is that you may need to link the two concepts together using a property such as different from (P1889) to prevent someone unaware of the distinction from merging the items back together. This is especially a risk with items that speakers of other languages may come across, since concepts such as “village” and “civil parish” may be translated into their language as very similar terms.
Rather than taking for granted these temporary data modeling issues as “fundamental issues”, I think it’s helpful to frame it in terms of prerequisites. Before we can even consider making OSM rely more heavily on Wikidata for names, we would need a reliable way to detect dual tagging, Cebuano duplicate, and mistaken wikidata issues, and we’d need some percentage of them to be fixed. It’s premature to consider licensing and other logistics before the data is even suitable as a full replacement for OSM names.
But I would reiterate that it isn’t necessary to formally replace OSM names with Wikidata labels, because data consumers are free to use both as complementary sources. Most of the major consumer-oriented data consumers already do. We can take all the pride we want in our own name tags, but nothing in our license obliges data consumers to use OSM exclusively for names.
I would suggest that cleaning up Wikidata is worthwhile regardless of any policy change. Correcting issues there automatically improves the end user experience of OSM-based maps. A better user experience casts a better light on the attribution that we require but Wikidata does not.
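As a rough idea of what detecting dual tagging from the OSM side might look like, the sketch below flags any Wikidata ID that is referenced both by a place feature and by an administrative boundary. It uses only the place, boundary and wikidata keys; the function name, the input shape and the identifiers in the example are made up for illustration.

```python
from collections import defaultdict


def possible_dual_tagging(elements):
    """Return Wikidata IDs used for both a place and an admin boundary.

    `elements` is an iterable of (element_id, tags) pairs. A hit suggests
    that the Wikidata item conflates two concepts that OSM keeps separate.
    """
    kinds_by_qid = defaultdict(set)
    for _element_id, tags in elements:
        qid = tags.get("wikidata")
        if not qid:
            continue
        if "place" in tags:
            kinds_by_qid[qid].add("place")
        if tags.get("boundary") == "administrative":
            kinds_by_qid[qid].add("boundary")
    return sorted(
        qid for qid, kinds in kinds_by_qid.items()
        if kinds == {"place", "boundary"}
    )
```

Each hit is a candidate for splitting the Wikidata item and linking the halves with different from (P1889), after which the OSM tags can point at the right one. It says nothing about Cebuano duplicates or plainly wrong wikidata tags, which would need their own checks.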
This is very, very complex, and you may be underestimating the effort required.
I have some practical experience, as I am involved in the Natural-Earth-Vector “~Adding Wikidata concordances and names” project (just search for “Wikidata” in the CHANGELOG).
Natural Earth imports Wikidata labels as name_<language_code> and has some complex business logic for converting geo-names, for example:
remove the last “市” (“City”) character
remove the last “주” (“State”) character
But regardless, you should always review the results manually and incorporate the experience into the code. It will never be perfect.
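The two rules listed above can be sketched like this. It is not the actual Natural Earth code: the per-language table, the language codes and the function name are assumptions made for the example.

```python
# Hypothetical rule table: trailing place-type characters to drop, per language.
TRAILING_SUFFIXES = {
    "zh": ("市",),  # "City"
    "ko": ("주",),  # "State"
}


def strip_place_suffix(label, lang):
    """Remove one trailing place-type suffix, never leaving an empty label."""
    for suffix in TRAILING_SUFFIXES.get(lang, ()):
        if label.endswith(suffix) and len(label) > len(suffix):
            return label[: -len(suffix)]
    return label
```

So “纽约市” becomes “纽约” and “캘리포니아주” becomes “캘리포니아”. The same rule also turns “제주” (Jeju) into “제”, because the final character there is part of the name itself. A blind character rule cannot tell the difference, which is where the manual review comes in.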
Wikidata geo problems: there is no strict import policy, and the impact of the duplicated Cebuano geodata is still felt today. Where there was an active local Wikipedia/Wikidata community it has largely been fixed, but everywhere else it is still problematic.
My own experience is that OpenStreetMap geodata is of much better quality than Wikidata.
2017: “Nonsense imported from Geonames” “Thanks to the bot filling the Cebuano Wikipedia (Q837615) with all the items in geonames, and the bot importing all the pages from ceb to here, we now have a lot of nonsense items here. Just my latest picks from looking around Thailand items …”
2017: “Dealing with our second planet” “The bot on the Cebuano Wikipedia is quickly building a second planet on Wikidata with all the geographical items that it is currently duplicating at high speed.”
The Natural-Earth-Vector data <-> Wikidata matching/concordance is still not 100% complete.
While I understand that this might be important for many, it would be better to focus on OSM <-> wikidata_id pairing/matching as a first step and to develop some related data quality tools.
Some of this postprocessing is just a workaround for common Wikidata label cruft, mostly left over from Wikipedia, where technical limitations and naming conventions require disambiguators. It’s better to clean up issues like parentheticals at the source than apply quick fixes.
On the other hand, some postprocessing is presentational and specific to map labeling. For example, “New York City”, “Washington, D.C.”, and “Santiago de Chile” are preferred in many written and spoken contexts. But on maps, where space is limited and you already have spatial context, “New York”, “Washington”, and “Santiago” are preferred and the longer forms can be jarring. I’m confident Wikidata would be able to model these map-specific distinctions elegantly if necessary, but it’s quite reasonable for a data consumer to apply these shortenings systematically.
I was unaware that CJK maps omit the place type suffix that’s ubiquitous in East Asian place names. In many Western languages, maps conventionally omit “boring” street type prefixes or suffixes, such as “Road” in English or “Calle” in Spanish. Most OSM communities are including these words in name anyways (with the notable exception of Vietnam). Unfortunately, OSM’s road names are largely unstructured, so any postprocessing heuristics will have some false positives.
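To show how quickly such a heuristic produces false positives, here is a deliberately naive sketch. The word lists are tiny hypothetical samples and the function name is made up; nothing here reflects what any particular renderer does.

```python
# Tiny illustrative word lists, not a complete inventory of street types.
SUFFIXES = ("Road", "Street", "Avenue")  # English: type comes last
PREFIXES = ("Calle", "Avenida")          # Spanish: type comes first


def abbreviate_street(name):
    """Drop a leading or trailing street-type word from an unstructured name."""
    words = name.split()
    if len(words) > 1 and words[-1] in SUFFIXES:
        return " ".join(words[:-1])
    if len(words) > 1 and words[0] in PREFIXES:
        return " ".join(words[1:])
    return name
```

“Baker Street” becomes “Baker” and “Calle Mayor” becomes “Mayor”, as intended, but “Avenue Road” becomes “Avenue”, which is no longer a usable label. Without structured name parts in the data, the heuristic has no way of knowing which word is the type and which is the name.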