Wikipedia tag validator - listing problems in various areas (also cases not detected by any other QA)

I created https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/, which lists some problematic wikipedia / wikidata / subject:wikidata / etc. tags.

For example, wikipedia tags linking to nonexistent articles, or to entries about humans (these are typically invalid, though sometimes retaggable to subject:wikipedia), and so on.

If you are interested in a specific area, let me know by sending a message or posting here and I will add it (right now the report for Belarus is being generated).

If any report is bogus, false positive, invalid or otherwise problematic: also please let me know!

how is transliteration supposed to be handled?

E.g. https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/Crna%20Gora%20-%20Црна%20Гора%20(Montenegro,%20Czarnogóra).html lists for example this (and others) entry:

wikidata and wikipedia tags link to a different objects (Q200733 vs (missing) wikidata id assigned to linked Wikipedia article)
Nikšić - an affected OSM element that may be improved

Nikšić - Wikidata links wikipedia pages in both Cyrillic and Latin alphabets.

Linked node has wikipedia=sr:Nikšić which translates to https://sr.wikipedia.org/wiki/Nikšić which actually works in web browser (but redirects automatically to preferred Serbian wiki page in Cyrillic transliteration https://sr.wikipedia.org/wiki/%D0%9D%D0%B8%D0%BA%D1%88%D0%B8%D1%9B, e.g. Никшић — Википедија)
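To make the tag-to-URL step concrete, here is a minimal sketch of how a wikipedia tag value could be turned into an article URL. The function name `wikipedia_tag_to_url` is made up for illustration and is not taken from the validator; it only assumes the usual `language:Title` form of the tag.

```python
from urllib.parse import quote


def wikipedia_tag_to_url(tag_value):
    """Turn a tag value such as 'sr:Nikšić' into an article URL.

    Hypothetical helper for illustration, not the validator's own code.
    """
    # Split on the first colon only: article titles may contain colons too.
    language, separator, title = tag_value.partition(":")
    if not separator or not language or not title:
        raise ValueError(f"expected 'language:Title', got {tag_value!r}")
    # Wikipedia URLs use underscores where the title has spaces.
    path = quote(title.replace(" ", "_"), safe="/:")
    return f"https://{language}.wikipedia.org/wiki/{path}"


print(wikipedia_tag_to_url("sr:Никшић"))
```

With the Cyrillic title this produces the percent-encoded URL quoted above; with `sr:Nikšić` it produces the Latin-script URL that the browser then redirects.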

hmmmm

https://sr.wikipedia.org/wiki/sr:Nikšić?uselang=en is broken and Nikšić redirects to Никшић — Википедија

So I would say that correct would be wikipedia=sr:Никшић given that it is the redirect target?

But I am not entirely sure what is going on here, maybe Wikipedia is being buggy. And I am not familiar with this specific language situation.

It would be best if someone from those communities could jump in, but as far as I recall, they actually use both Cyrillic and Latin scripts, with the former being preferred (but I could be wrong).

I have no idea how Wikipedia handles the issue, though, apart from the fact that the web page loads, and that after the Page and Discussion links there is a dropdown selector to choose the preferred transliteration (which changes &variant=sr to variant=sr-el or variant=sr-ec in the URL).

But if the human-readable web interface automatically redirects to the transliterated page, shouldn’t it be possible for your script to detect that redirection too and handle it?

Yes, but apparently neither https://sr.wikipedia.org/wiki/sr:Nikšić?uselang=en nor the API redirects. I can skip this specific report in this specific area until I figure out what is going on (I will likely wait until other issues are fixed there before investigating).

Really curious.

Indeed https://sr.wikipedia.org/wiki/sr:Nikšić?uselang=en does not autoredirect. However, when you remove that ?uselang=en then it redirects.

yes, exactly

the bigger trouble is that machine-speaking API is apparently also not redirecting
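One thing that might be worth trying on the API side: the MediaWiki `action=query` module has a `redirects` parameter and a `converttitles` parameter, the latter meant for wikis whose language supports script variants. Below is a rough sketch of how a script could use them. I have not checked that this resolves the Nikšić case, the helper names are made up, and the sample response in the usage part is hand-written for illustration, not real API output.

```python
from urllib.parse import urlencode


def build_query_params(title):
    """Parameters asking the API to follow redirects and convert variants."""
    return {
        "action": "query",
        "format": "json",
        "titles": title,
        "redirects": 1,      # follow redirects
        "converttitles": 1,  # convert between script variants where supported
    }


def api_url(language, title):
    """Full request URL for the given language edition (hypothetical helper)."""
    params = urlencode(build_query_params(title))
    return f"https://{language}.wikipedia.org/w/api.php?{params}"


def resolve_title(title, response):
    """Apply the title mappings reported in a parsed JSON response.

    Assumes the mappings appear under 'normalized', 'converted' and
    'redirects' as lists of {'from': ..., 'to': ...} entries.
    """
    query = response.get("query", {})
    for key in ("normalized", "converted", "redirects"):
        for mapping in query.get(key, []):
            if mapping.get("from") == title:
                title = mapping["to"]
    return title


# Hand-written sample response, only to show the shape being assumed.
sample = {"query": {"converted": [{"from": "Nikšić", "to": "Никшић"}]}}
print(resolve_title("Nikšić", sample))
```

If the real response carries no such mapping, `resolve_title` just returns the title unchanged, so the report could fall back to its current behaviour.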

EDIT: asked on Telegram: Contact @wmhack

How come the report for Germany only contains problems in Mecklenburg-Vorpommern?

Only this region and Hamburg are enabled. I will enable more once the report for this one is empty or someone expresses interest in another specific region. Are you interested?

It is done this way because the processing architecture is still not very smart (though it got significantly improved recently) and I am running this on my laptop.

I’d be interested in fixing these for Schleswig-Holstein.

@Discostu36

https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/ now also has reports for Schleswig-Holstein

212 problems :scream:

Maybe you could tweak the wikipedia/wikidata mismatch validator. See this item for example. It links to a specific section of a Wikipedia page. Wikidata does not allow linking items to anchors, but I don’t see a problem from the OSM perspective. I think you should exclude links with a # from the list.
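As a rough illustration of what excluding links with a # could look like, here is a minimal sketch. The function names and the example tag value are hypothetical and not taken from the validator.

```python
def split_section_anchor(tag_value):
    """Split 'lang:Title#Section' into ('lang:Title', 'Section').

    The section is None when the value has no '#'.
    """
    article, separator, section = tag_value.partition("#")
    return article, (section if separator else None)


def should_skip_mismatch_check(tag_value):
    """True when the wikipedia tag points at a section of an article."""
    _, section = split_section_anchor(tag_value)
    return section is not None


# Hypothetical tag value, for illustration only.
print(should_skip_mismatch_check("de:Example article#Some section"))
```

Keeping the article part separate also leaves room for a softer rule, such as still checking the article part against the wikidata tag instead of skipping the element entirely.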

But it allows linking redirects pointing to anchors, see this.

Still, this kind of tagging is tricky. I will look into it again; I am already handling wikipedia tags with # specially in a few places.

But in this specific case, why is this Wikipedia page not linked?

OK, this could be a solution, but I am not sure it is possible or follows Wikipedia redirect rules in all possible cases.

Oh, I missed that, you are right, this is not the best example.

Maybe you could tweak the wikipedia/wikidata mismatch validator. See this item for example. It links to a specific section of a Wikipedia page. Wikidata does not allow linking items to anchors, but I don’t see a problem from the OSM perspective. I think you should exclude links with a # from the list.

Agreed. In general we should try to avoid the perception that the wikidata tag has to correspond to the Wikidata item of the article in the wikipedia tag; this was true only when Wikidata was starting out.

The way you put it, it sounds like you are suggesting that the validator should be disabled completely. I wouldn’t agree with that.