On OpenStreetMap, it is possible to link Wikimedia Commons files using this tag: wikimedia_commons=File:xyz.jpg
Is there a way/tool to check for files whose coordinates differ significantly between Wikimedia Commons and OpenStreetMap? Such a tool could be used for quality assurance, as it may indicate either 1. that someone has linked the wrong image on OSM or 2. that the coordinates on Wikimedia Commons need to be corrected.
Those coordinates are not the same: in OSM we record the place where something is, while for photographs you usually add the position from which the picture was taken (but I agree they also shouldn’t be thousands of kilometers apart).
That’s why I specified “differ significantly.” You would need a really good camera to capture Narbonne from Paris! Of course, such a tool would need a tolerance threshold.
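To make the idea of a tolerance threshold concrete, here is a rough Python sketch. It assumes you already have both coordinate pairs as (latitude, longitude) in decimal degrees. The function names and the 5 km default are made up for illustration, and a sensible threshold would probably depend on the kind of object.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_suspicious(osm_latlon, commons_latlon, tolerance_km=5.0):
    """Flag a pair whose OSM and Commons positions are farther apart than the tolerance.

    tolerance_km is an arbitrary illustrative default, not a recommended value.
    """
    return haversine_km(*osm_latlon, *commons_latlon) > tolerance_km


paris = (48.8566, 2.3522)
narbonne = (43.1839, 3.0044)
print(is_suspicious(paris, narbonne))  # several hundred km apart, so this gets flagged
```

Because the camera position and the object position legitimately differ, a generous tolerance avoids flagging ordinary photos taken from a distance.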
Adding wikimedia_commons=* to an object that has wikidata=* can be considered redundant because Wikidata itself often links to Wikimedia Commons. However, the vast majority of less experienced users (newbies and occasional contributors) are far more likely to understand the concept and importance of Wikimedia Commons than of Wikidata. Even for experienced users, the human-readable value of wikimedia_commons=* is easier to interpret.
I don’t think it’s useful to invest effort in maintaining and curating yet another redundant tag, but your mileage may vary.
The examples I gave have a wikidata=* tag by chance. wikimedia_commons=* is used by a lot of users for elements that are not notable enough for a Wikidata entry (e.g. a guidepost). I also know most people wouldn’t be happy to mass-delete wikimedia_commons=* from elements that have wikidata=*, so it would be cool to at least detect and fix the wrong ones.
It’s only redundant if the object has a wikidata entry though? Is it likely that most wikimedia commons images used in OSM are linked from wikidata? E.g. how would an individual hiking guidepost get linked from wikidata?
Wikidata entries almost always have a link to the corresponding Commons category, and often a selected photo from it. Those entries exist only for, let’s say, “notable” features; an ordinary hiking trail or a guidepost seldom has one. So ok, it makes sense to link an individual Commons image from an OSM object (similar to a Mapillary photo).
Taginfo, however, reveals that out of 211,000 objects tagged with wikimedia_commons, 118,500 also have wikidata, so the Commons tag is presumably redundant there. But fair enough, that still leaves some 92,500 which are missing wikidata, for one reason or another.
In theory, Sophox and QLever would be the perfect tools for the job. But it’s a bit more difficult than it’s supposed to be. I haven’t figured it out yet, but maybe this is enough information for you to get to the finish line.
Like Wikidata, Wikimedia Commons has its own SPARQL endpoint. For example, you can query for geotagged images based on the location of either the camera or the depicted object. This relies on each file’s description page being tagged with structured data. In theory, you could write a federated query in Sophox that nests a Commons query inside an OSM query. Unfortunately, it’s currently difficult to federate the Commons endpoint with another SPARQL endpoint, because it requires an OAuth token, presumably to prevent abuse by image scrapers. If not for that limitation, this Sophox query might stand a chance of returning OSM elements joined with Commons coordinates.
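One practical detail if you end up post-processing SPARQL results yourself: the Wikimedia query services return coordinates as WKT literals of the form `Point(longitude latitude)`, with longitude first, which is easy to mix up with OSM’s usual latitude/longitude order. Below is a small Python sketch of a parser. The helper name `parse_wkt_point` is hypothetical, and it only handles plain two-dimensional points.

```python
import re

# A signed decimal number, optionally with an exponent.
_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(" + _NUM + r")\s+(" + _NUM + r")\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(literal):
    """Parse 'Point(lon lat)' and return (lat, lon) in decimal degrees.

    WKT puts longitude first; the result is swapped into lat/lon order.
    """
    match = _WKT_POINT.match(literal)
    if match is None:
        raise ValueError(f"not a simple WKT point: {literal!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


print(parse_wkt_point("Point(2.3522 48.8566)"))
```

Literals that carry a non-Earth globe prefix or extra dimensions are deliberately rejected here rather than guessed at.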
QLever indexes both OSM and Wikimedia Commons in the same triplestore, which in theory would make it possible to join the two datasets more performantly without leaving QLever. This already works quite well for OSM–Wikidata and OpenHistoricalMap–Wikidata queries, because OSM and OHM data have been postprocessed to resolve wikidata=* and wikipedia=* tags to the referenced entities. Unfortunately, wikimedia_commons=* isn’t being resolved yet: