TLDR: researching ways to validate the wikipedia and wikidata tags, wrote a
script to cross-check OSM and Wikidata, found many incorrect disambig
references, would love to start a community discussion on best guidelines.
I have been analyzing the quality of OSM's wikipedia and wikidata tags by
cross-checking the OSM tags against Wikidata. My first goal is to fix
"disambiguation" references, where an OSM object links to a Wikipedia
disambiguation page instead of the page about the actual location. I have
already fixed about 200 objects, but there are 800+ relations left, and I
could really use some help. I don't think it's possible to add them to
MapRoulette just yet. https://www.mediawiki.org/wiki/User:Yurik/OSM_disambigs
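One way to spot such references is to check the Wikidata item's "instance of" (P31) statements. A minimal sketch, assuming the item has already been downloaded in the standard Wikidata JSON format (for example from Special:EntityData/<QID>.json); the function names are hypothetical, and this is not necessarily how the script above works:

```python
# Q4167410 is the Wikidata item "Wikimedia disambiguation page".
DISAMBIGUATION = "Q4167410"


def instance_of(entity: dict) -> set[str]:
    """Collect the item IDs of all P31 ("instance of") statements of one entity.

    `entity` is a single entity object from the Wikidata JSON format,
    i.e. one value of the "entities" map in a Special:EntityData response.
    """
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        # "novalue"/"somevalue" snaks carry no datavalue, so skip them
        if snak.get("snaktype") != "value":
            continue
        ids.add(snak["datavalue"]["value"]["id"])
    return ids


def is_disambiguation(entity: dict) -> bool:
    """True if the item is an instance of a Wikimedia disambiguation page."""
    return DISAMBIGUATION in instance_of(entity)
```

An OSM object whose wikidata item passes `is_disambiguation` is a candidate for the fix-up list; its wikipedia tag most likely points at the disambiguation page too.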
Lastly, if you have any suggestions for other ways to validate data
using a mix of Wikidata and OSM, let me know. At the moment I have a
list of all the "instance of" types of the Wikidata items that OSM objects
link to, and I mark the bad types with a value. If the "instance of" of an
OSM object's Wikidata item is one of the bad types, my script puts that
OSM object into a separate list that I can analyze.
The list of types is here (sort by the second column): https://commons.wikimedia.org/wiki/Data:Sandbox/Yurik/OSM_object_instanceofs.tab
Feel free to modify the second value of any row to indicate that objects
of that type should be fixed.
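The filtering step described above boils down to a set intersection. A rough sketch, assuming the type table has been loaded into a dict from type ID to the second-column value, and (as an assumption) that any non-empty value means "bad"; all names here are hypothetical:

```python
def flag_bad_objects(osm_wikidata, item_types, type_marks):
    """Return OSM objects whose Wikidata item is an instance of a marked type.

    osm_wikidata: {osm_id: QID}, e.g. {"r123": "Q64"}
    item_types:   {QID: set of that item's P31 ("instance of") values}
    type_marks:   {type QID: second-column value from the type table};
                  a non-empty value is treated as "bad" (assumption)
    """
    bad_types = {t for t, mark in type_marks.items() if mark}
    flagged = {}
    for osm_id, qid in osm_wikidata.items():
        # items missing from item_types simply have no known types
        hits = item_types.get(qid, set()) & bad_types
        if hits:
            flagged[osm_id] = sorted(hits)
    return flagged


# Usage: r1 links to a disambiguation item, r2 to a city (Q515, unmarked).
flagged = flag_bad_objects(
    {"r1": "Q1", "r2": "Q2"},
    {"Q1": {"Q4167410"}, "Q2": {"Q515"}},
    {"Q4167410": "1", "Q515": ""},
)
# flagged == {"r1": ["Q4167410"]}
```

Keeping the marks in an editable table means anyone can add a new bad type without touching the code.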
In Wikipedia, in the left-hand sidebar, click "Wikidata item" (labeled "Element Wikidanych" in the Polish interface). Also, if you use JOSM, there is a Wikipedia plugin that will fetch the Wikidata IDs for the current elements if their wikipedia tag is set. And in the iD editor, if you fill in the wikipedia field (not the tag!), it will automatically add the wikidata field.
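The same lookup can be scripted. The sketch below is not what JOSM or iD do internally; it assumes the MediaWiki Action API's `prop=pageprops` with `ppprop=wikibase_item`, which returns the Wikidata ID linked to an article, and that the OSM tag has the usual `lang:Title` form. The User-Agent string is a placeholder:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder; Wikimedia asks API clients to send a descriptive User-Agent.
USER_AGENT = "osm-wikidata-check-example/0.1 (replace with your contact info)"


def pageprops_url(wikipedia_tag: str) -> str:
    """Build a MediaWiki API URL from an OSM tag value like "en:Berlin"."""
    lang, title = wikipedia_tag.split(":", 1)
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # only ask for the linked Wikidata ID
        "redirects": 1,             # follow redirects to the target article
        "titles": title,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def wikidata_id(response: dict):
    """Return the wikibase_item of the first page that has one, else None."""
    for page in response.get("query", {}).get("pages", {}).values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None


def lookup(wikipedia_tag: str):
    """Fetch the Wikidata ID for one wikipedia tag (needs network access)."""
    req = Request(pageprops_url(wikipedia_tag), headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return wikidata_id(json.load(resp))
```

For example, `lookup("en:Berlin")` should return the ID of the Berlin item, and a missing article yields `None`.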
Just thinking… Couldn't it be automated the other way round?
When you have a place and a disambiguation page, a bot could browse the Wikidata items of the articles listed on that page. They contain coordinates that can be compared to the coordinates of the place, and if only one Wikidata item lies within a reasonable radius of the place in OSM, you've got your wikidata link for that place.
There would still be places where no corresponding Wikidata item is found, or where more than one is, but less than 5% of the work would be left to do by hand.
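The proposed matching could look roughly like this. A sketch under stated assumptions: the coordinates of each candidate item have already been fetched, great-circle distance is computed with the haversine formula, and the 10 km default radius is an arbitrary guess, not a value from this thread:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_candidate(place, candidates, radius_km=10.0):
    """Pick the single candidate item within radius_km of an OSM place.

    place:      (lat, lon) of the OSM object
    candidates: {QID: (lat, lon) or None} for the items behind the
                disambiguation page; None means the item has no coordinates
    Returns (qid, None) on a unique match, else (None, reason) so the
    object can go to the manual-review list.
    """
    near = [
        qid
        for qid, coord in candidates.items()
        if coord is not None and haversine_km(*place, *coord) <= radius_km
    ]
    if len(near) == 1:
        return near[0], None
    return None, ("no match" if not near else "ambiguous")
```

Returning a reason instead of raising keeps the bot running and sorts the leftovers into the "none found" and "more than one" buckets mentioned above.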
Not really worth it for the disambig case: there are now fewer than 100 disambig links left (out of the initial 1500+ that had accumulated ever since the wikipedia tag was introduced), and with the wikidata tag present, they won't happen very often. On the other hand, there are thousands of links to "lists", and those will most likely need to be fixed by hand, either by replacing them with something like a "wikipedia:partof:…" tag or by finding better Wikipedia articles. Also, coordinates are frequently missing on Wikidata, which adds to the confusion. But yes, I agree that this process should be automated more.