Is there support for removing all name tags on objects with a Wikidata QID present?

Hi, is there support in here for such a move at this time?

The advantage is that we outsource the huge burden of keeping names updated to Wikidata whenever possible and concentrate on curating other kinds of data that we frankly do better than names.

Nominatim would have to be enhanced to support names from Wikidata.

WDYT?

1 Like

I am against. Wikidata items sometimes represent a different (but strongly related) object than the one in OpenStreetMap. Naming conventions in Wikidata and OpenStreetMap are different.
Many data consumers don’t support the wikidata tag and it’s not trivial to add support for it. The name tag is one of the basic tags and is used in the majority of applications.

19 Likes

Ahhhh, just for starters: WD names are, generally speaking, not up to any reasonable quality standard and are very often simply invented.

7 Likes

Absolutely not. For a start, Wikidata is a licensing quagmire as far as database rights are concerned.

The community there basically pretends that the problem does not exist, which makes their dataset problematic to use in the UK and EU, where sui generis database rights exist. They do this while presenting the dataset as CC0 and not mentioning the issue.

(They are importing, directly and indirectly, databases covered by database rights. That is entirely legal in the USA, but the resulting work is unfree in countries where database rights exist.)

Also, I am not sure about their data quality in general, but I tried to use part of the Wikidata data and it was of really low quality (User:Mateusz Konieczny/failing testcases - Wikidata).

Also, the editing experience in Wikidata is quite miserable, and expecting people to have separate accounts in a separate service to maintain basic data is a terrible idea.

Also, OSM has many objects which would not be allowed to have an item in Wikidata.

15 Likes

Definitely no. Wikidata names are often descriptive. In Wikidata terms this is necessary to differentiate between objects which in OSM and the real world have the same name.

I came across an example of this last week. A mapper had added names in multiple languages to Baker Street Tube Station which included “Tube Station”. The Wikidata/Wikipedia name obviously has to be different from the page for Baker Street.

If we did this, one effect would be that every station would gain the word “station” in its name.

Phil (trigpoint)

10 Likes

Thanks for all the good arguments. :grinning:

3 Likes

While combining multiple open data sources is great, I don’t think that such an important tag should be outsourced. The name of elements is arguably their most important attribute, and it fits very well with being stored on a map database.

Relying on Wikidata to resolve names would mean that almost every use of OSM data would also have to query the Wikidata API, making things much more complex.
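To make that extra step concrete, here is a minimal sketch of what a data consumer would have to add if names lived only in Wikidata. It assumes the public `wbgetentities` action of the Wikidata API; the function names are made up for illustration, and the sketch only builds the request URL and parses a response rather than doing any networking, batching, caching or error handling.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def label_lookup_url(qids, languages):
    """Build a wbgetentities request URL for the labels of the given items."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_labels(response, language):
    """Map each QID in a wbgetentities response to its label in one language."""
    labels = {}
    for qid, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(language)
        if label is not None:
            labels[qid] = label["value"]
    return labels


def display_name(tags, labels):
    """Name to show for an OSM object that carries no name tag of its own.

    Returns None when the object has no wikidata tag or no label was found.
    """
    return labels.get(tags.get("wikidata"))
```

Every renderer, geocoder and router would need some equivalent of this, plus a plan for objects that have no wikidata tag at all.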

6 Likes

As valuable as Wikidata linking can be, names are so central to important OSM workflows that it would be very disruptive for OSM to get out of the naming business. Most communities haven’t even felt comfortable enough to delete ref tags from roadways in favor of route relations. On the other hand, some ancillary keys like brand:wikidata and flag:wikidata are less entrenched in OSM workflows, so there could be somewhat less attachment to their non-Wikidata variants.

This technical limitation is specific to Wikipedia; it does not apply to Wikidata. Wikidata only requires any two items’ descriptions to differ if their labels match. Anyways, technically labels are just that – convenient labels – whereas the most proper representation of an item’s name is a name (P2561), native label (P1705), or similar statement.
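As a rough illustration of the label-versus-statement distinction, a consumer could prefer a name statement and fall back to the label only when none exists. This is a sketch, assuming the standard Wikidata entity JSON layout (`claims`, `mainsnak`, monolingual text values); the helper names are hypothetical, statement ranks and qualifiers are ignored, and the sample item in the usage note is invented data, not a real item.

```python
def monolingual_values(entity, prop, language):
    """Collect the monolingual-text values of one property in one language."""
    values = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "no value" and "unknown value" statements
        value = snak["datavalue"]["value"]
        if isinstance(value, dict) and value.get("language") == language:
            values.append(value["text"])
    return values


def best_name(entity, language):
    """Prefer P2561 or P1705 statements; fall back to the item's label."""
    for prop in ("P2561", "P1705"):
        values = monolingual_values(entity, prop, language)
        if values:
            return values[0]
    label = entity.get("labels", {}).get(language)
    return label["value"] if label else None
```

With an invented item whose English label is "Baker Street tube station" but which carries a P2561 statement "Baker Street", `best_name(item, "en")` would pick the statement rather than the disambiguated label.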

Last week, I spent some time mechanically stripping out Wikipedia-style disambiguating suffixes from Vietnamese labels of tens of thousands of Wikidata items about cities and towns, mostly in North America and Czechia, intentionally causing thousands of items’ labels to become identical to other items’ labels. For example, this Minnesota township’s label shortened from “Xã Arthur, Quận Kanabec” to just “Xã Arthur”, like another township elsewhere in the state, with aliases and descriptions to help users differentiate. This should significantly clean up the map for Vietnamese users of OpenMapTiles, Mapbox, and Google Maps, each of which relies on Wikidata for place labels in various languages.

6 Likes

While I appreciate linking OSM objects to Wikidata items, I think that OSM should not depend on an external source. One can use the wikidata tag to compare the names in OSM and Wikidata and then perform manual corrections. But we should surely not remove the name from the OSM database and say, “if the renderer wants to display the name, it can look it up from Wikidata.”
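That comparison could look something like the sketch below. It is only an illustration: the input shapes (a list of objects with `id` and `tags`, and a mapping from QID to labels per language code) and the function name are assumptions, and it compares only `name:<lang>` tags because a plain `name` tag carries no language. It assumes OSM language suffixes match Wikidata language codes, which is not always true. The output is a list for human review, not something to apply automatically.

```python
def name_mismatches(osm_objects, wikidata_labels):
    """Yield (osm_id, tag_key, osm_value, wikidata_value) where names differ.

    osm_objects:      iterable of {"id": ..., "tags": {...}}
    wikidata_labels:  {"Q123": {"en": "...", "de": "..."}, ...}
    """
    for obj in osm_objects:
        tags = obj["tags"]
        labels = wikidata_labels.get(tags.get("wikidata"))
        if labels is None:
            continue  # no wikidata tag, or no labels fetched for that item
        for key, value in tags.items():
            if not key.startswith("name:"):
                continue
            language = key[len("name:"):]
            wikidata_value = labels.get(language)
            if wikidata_value is not None and wikidata_value != value:
                yield (obj["id"], key, value, wikidata_value)
```

A mismatch does not say which side is wrong; given the differing naming conventions discussed in this thread, many hits would be intentional differences.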

5 Likes

Searching for the name of object X in language Y in a list of 300+ tags is more miserable. Wikidata shows a few names at most and they are in languages that you’ve listed beforehand.

A better variant would be to use Wikidata only for names of countries, states, regions, counties, municipalities, cities, towns and villages. These objects have Wikidata items for most countries.

That’s a problem that should be brought up with the respective Wikidata community. They have a ton of tools at their disposal which can solve the problem in a few minutes.

We can have a bot that handles name changes. Names will still be in the OSM database, but they will come from Wikidata.

1 Like

Note that you omitted, at the very least, the problems with licensing and with basing fundamental data on a source that has its own community, rules, moderators and so on.

Identifying problems is easy, solving them is harder.

See User:Mateusz Konieczny/failing testcases - Wikidata: I was single-handedly able to find more obvious ontological issues than the entire Wikidata community can fix. I posted this listing to the Ontology WikiProject and some reports were solved, but what I reported now waits for weeks. I posted it to the central project chat page in 2022 and again in 2023. Many were fixed, but far more remain unfixed, and what I posted is just a small sample of what directly affects one specific tool.

The constraint violation listings are massive.

Cebuano duplicates are still not cleaned up.

That can also be done for OSM.

3 Likes

Another fundamental issue with OSM relying on wikidata is that very often one data item refers to multiple physical concepts - wikidata has one item for the “village and civil parish” (because it’s just come straight from wikipedia**, presumably) but OSM has two, because we care about the difference between a “village” and an “admin boundary”.

** which due to the licence there may make using the data complicated

3 Likes

Compared to OSM, resolving dual tagging in Wikidata is uncontroversial, even when automated. It’s often easier than the ontological issues @Mateusz_Konieczny has been flagging and would fix many constraint validation warnings at the same time. The only catch is that you may need to link the two concepts together using a property such as different from (P1889) to prevent someone unaware of the distinction from merging the items back together. This is especially a risk with items that speakers of other languages may come across, since concepts such as “village” and “civil parish” may be translated into their language as very similar terms.

Rather than taking these temporary data modeling issues for granted as “fundamental issues”, I think it’s helpful to frame them in terms of prerequisites. Before we can even consider making OSM rely more heavily on Wikidata for names, we would need a reliable way to detect dual tagging, Cebuano duplicates, and mistaken wikidata tags, and we’d need some percentage of them to be fixed. It’s premature to consider licensing and other logistics before the data is even suitable as a full replacement for OSM names.

But I would reiterate that it isn’t necessary to formally replace OSM names with Wikidata labels, because data consumers are free to use both as complementary sources. Most of the major consumer-oriented data consumers already do. We can take all the pride we want in our own name tags, but nothing in our license obliges data consumers to use OSM exclusively for names.

I would suggest that cleaning up Wikidata is worthwhile regardless of any policy change. Correcting issues there automatically improves the end user experience of OSM-based maps. A better user experience casts a better light on the attribution that we require but Wikidata does not.

3 Likes

Though it is better to mention it early rather than spring it on people who would solve the other issues and then be surprised by a new one :slight_smile:

And for such data consumers, all these data improvements on the Wikidata side would already be useful.

2 Likes

This is very, very complex, and the proposal may underestimate the effort required.

I have some practical experience, as I am involved in the Natural-Earth-Vector “~Adding Wikidata concordances and names” project; just search for “Wikidata” in the CHANGELOG.

some comments:

  • Natural Earth imports Wikidata labels as name_<language_code> and has some complex business logic for converting geo-names:

    • removing commas
    • removing the last “市” (“City”) character
    • removing the last “주” (“State”) character

    • But regardless, you should always review the results manually and incorporate the experience into the code. It will never be perfect.
  • Wikidata geo problems: there is no strict import policy, and the impact of the duplicated Cebuano geo-data is still felt today. Where there was an active local Wikipedia/Wikidata community it has been largely fixed, but everywhere else it is still problematic.
    My own experience is that OpenStreetMap geodata is of much better quality than Wikidata.

    • 2017: “Nonsense imported from Geonames” “Thanks to the bot filling the Cebuano Wikipedia (Q837615) with all the items in geonames, and the bot importing all the pages from ceb to here, we now have a lot of nonsense items here. Just my latest picks from looking around Thailand items …”

    • 2017: “Dealing with our second planet” “The bot on the cebuano Wikipedia is quickly building a second planet on Wikidata with all the geographical items that it is currently duplicating at high speed.”

  • The Natural-Earth-Vector data ↔ Wikidata matching/concordance is still not 100% complete.
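For readers wondering what the label clean-up steps listed above might look like, here is a small sketch. It is not the actual Natural Earth code. It assumes that “removing commas” means dropping a comma-separated qualifier, and that the “市” suffix applies to Chinese and Japanese labels and “주” to Korean ones; the function name and the language mapping are made up for illustration.

```python
# Assumed mapping of language code to the place-type suffix to strip.
TRAILING_SUFFIX = {"zh": "市", "ja": "市", "ko": "주"}


def clean_label(label, language):
    """Shorten a Wikidata label for use as a map name (heuristic only)."""
    # Assumption: keep only the part before the first comma.
    cleaned = label.split(",")[0].strip()
    suffix = TRAILING_SUFFIX.get(language)
    # Never strip the suffix if it is the whole name.
    if suffix and cleaned.endswith(suffix) and len(cleaned) > len(suffix):
        cleaned = cleaned[: -len(suffix)]
    return cleaned


print(clean_label("Xã Arthur, Quận Kanabec", "vi"))
print(clean_label("北京市", "zh"))
```

Heuristics like this produce false positives (a name that merely happens to end in “주”, for example), which is why the results still need the manual review mentioned above.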

My Summary:

While I understand that it might be important for many, it would be better to focus on OSM ↔ wikidata_id pairing/matching as a first step and to develop some related data quality tools.

3 Likes

Some of this postprocessing is just a workaround for common Wikidata label cruft, mostly left over from Wikipedia, where technical limitations and naming conventions require disambiguators. It’s better to clean up issues like parentheticals at the source than apply quick fixes.

On the other hand, some postprocessing is presentational and specific to map labeling. For example, “New York City”, “Washington, D.C.”, and “Santiago de Chile” are preferred in many written and spoken contexts. But on maps, where space is limited and you already have spatial context, “New York”, “Washington”, and “Santiago” are preferred and the longer forms can be jarring. I’m confident Wikidata would be able to model these map-specific distinctions elegantly if necessary, but it’s quite reasonable for a data consumer to apply these shortenings systematically.

I was unaware that CJK maps omit the place type suffix that’s ubiquitous in East Asian place names. In many Western languages, maps conventionally omit “boring” street type prefixes or suffixes, such as “Road” in English or “Calle” in Spanish. Most OSM communities are including these words in name anyways (with the notable exception of Vietnam). Unfortunately, OSM’s road names are largely unstructured, so any postprocessing heuristics will have some false positives.

2 Likes

Maybe it would be a fitting moment to advertise the https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/ listings that I am maintaining. If someone is interested in wikidata/wikipedia linking, they have a decent chance of being useful.

2 Likes