Bulk edit announcement - addition of wikipedia:en/he values

I’m planning on adding wikipedia:en and wikipedia:he values to existing OSM entities.
The following spreadsheet (it has two tabs, one for Hebrew and one for English) contains the expected output of this bulk edit.
I’ve opened it for review before I do the bulk edit:

These additions are the result of a merge algorithm I wrote to find places in OSM that have the same name as a wikipedia article and are relatively close to it.
This applies only to wikipedia entities that have a location.
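
For readers who want a concrete picture, matching by name and proximity could be sketched roughly as below. This is not the actual merge algorithm; the `Place` type, the exact-name comparison, and the 500 m threshold are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    # Hypothetical record used for both OSM elements and Wikipedia articles.
    name: str
    lat: float
    lon: float


def haversine_m(a: Place, b: Place) -> float:
    """Great-circle distance between two places, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def match(osm, wiki, max_distance_m=500.0):
    """Pair each OSM place with the nearest same-named article within range.

    The threshold is an assumed value, not one taken from the bulk edit.
    """
    pairs = []
    for o in osm:
        candidates = [
            w for w in wiki
            if w.name.strip() == o.name.strip()
            and haversine_m(o, w) <= max_distance_m
        ]
        if candidates:
            nearest = min(candidates, key=lambda w: haversine_m(o, w))
            pairs.append((o, nearest))
    return pairs
```

Exact name equality is the simplest possible choice here; it will miss spelling variants and can still pair unrelated features that share a common name, which is why a review of the output is useful.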

Please use this thread and the linked sheets to post your comments.
I’ll also reference this topic in our Telegram channel.

Hi, sorry to jump in as a non-local, but have you seen this discussion? It seems like it’s going in the other direction.

Yes, it’s complementary to this bulk edit.

Have you considered using OSM ↔ Wikidata matcher - OpenStreetMap Wiki instead? It seems that it will allow much more efficient review of matches and has some tools to skip spurious matches.

Also, are you sure about adding old style wikipedia tags? I would expect wikipedia=en:??? or wikipedia=he:??? tags.

Have you considered adding also wikidata tags?

Also, if wikipedia=he: tags are desirable to be added to say Way: ‪גן זואולוגי באר שבע‬ (‪117396580‬) | OpenStreetMap - then it would be much easier to add them based on existing wikidata tags.

In fact, I run such a bot edit in Poland, and if desired I can easily run it in Israel.

I was not aware of the tool, but the docs state that it doesn’t support nodes, which are part of this bulk edit.
Also, I’d need to work hard to get to the results I currently have, which feels counterproductive.
Currently, the main goal, from my point of view, is to bring closer the description and images of wikipedia to osm elements, which is my main interest.
wikidata is out of scope of this bulk edit, as it creates an extra layer of indirect reference, much like adding a site containing an image and not the link to the image itself, and it’s a lot less helpful to my needs.

If anyone is interested in improving the wikidata situation, feel free to do so, I certainly won’t object.
If someone would like to validate that the wikidata tag is equal to the wikipedia tag, that’s a valid effort too, and if there’s a bot that checks it I won’t mind going over its results and fixing the relevant osm elements in Israel. But again, out of scope for this bulk edit.

Technically that’s what a Wikipedia tag is. :grin: Most images on Wikipedia come from Wikimedia Commons. There’s also a wikimedia_commons key for linking directly to an image. Wikidata is half a step closer than Wikipedia to the Commons images, since it’s machine-readable.

It supports nodes when matching. The old version did not support matching around nodes (you selected an area to work in).

New version - Wikidata items linked to OSM also definitely supports node matching.

If you already fetch images from wikipedia, then also supporting wikidata tags adds one extra API call, but that is minimal overhead (and much smaller than adding wikipedia tags semi-manually!)

And you will likely want to support wikidata anyway - if image property is set there then it is much more reliable than trying to extract it from Wikipedia article

Also, if you want to add wikipedia tags - have you considered adding them based on wikidata tags? That will be much more reliable than name matching.
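
As a rough sketch of what that derivation could look like: the Wikidata API's `wbgetentities` action (for example `https://www.wikidata.org/w/api.php?action=wbgetentities&ids=...&props=sitelinks&format=json`) returns each item's sitelinks keyed by site id. The helper name below is hypothetical, and building the site id as `{lang}wiki` is an assumption that holds for `en` and `he` but not for every language code.

```python
def wikipedia_tag_from_entity(entity: dict, lang: str):
    """Build a wikipedia=lang:Title value from a Wikidata entity's sitelinks.

    `entity` is one item from the "entities" map of a wbgetentities
    response requested with props=sitelinks. Returns None when the item
    has no article in the requested language.
    """
    # Assumption: the site id is "<lang>wiki" (true for enwiki and hewiki).
    sitelink = entity.get("sitelinks", {}).get(f"{lang}wiki")
    if sitelink is None:
        return None
    return f"{lang}:{sitelink['title']}"


# Illustrative data shaped like a wbgetentities entity, not a real response.
sample_entity = {
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Masada"},
    }
}
print(wikipedia_tag_from_entity(sample_entity, "en"))
print(wikipedia_tag_from_entity(sample_entity, "he"))
```

Because the lookup goes through the item's identifier rather than its name, it does not depend on spelling or on two features happening to share a name.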

The description is the interesting part, images are a bonus.
I already know how to use wikipedia tags. The alternative is reverse engineering the very, very poor wikimedia API documentation, finding a library I can use in the language I’m writing in, and making it work.
As I said, people who are interested in wikidata tags are free to improve OSM with it. This is out of scope for this bulk edit and I don’t think it will cause any damage.
The review process here is optional and I’m planning to bulk edit unless someone specifies that there is a mistake within a limited time frame (end of month probably).

To elaborate, wikipedia:xy=* subkeys were the norm in the very early days but were soon replaced by wikipedia=xy:* tags, out of a recognition that Wikipedia already has a mechanism for linking translated articles to each other (and that mechanism is Wikidata rather than OSM).
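
For data consumers, the difference between the two styles is small in code. A minimal sketch, assuming an element's tags are available as a plain dictionary and using a hypothetical helper name, that reads both styles into one language-to-title mapping:

```python
def normalize_wikipedia_tags(tags: dict) -> dict:
    """Collect {language: article title} from both tagging styles.

    Handles wikipedia=xy:Title as well as wikipedia:xy=Title. When both
    styles name the same language, the main wikipedia=* tag wins.
    """
    articles = {}
    for key, value in tags.items():
        if key == "wikipedia" and ":" in value:
            # Main tag: the language is the prefix of the value.
            lang, title = value.split(":", 1)
            articles[lang] = title
        elif key.startswith("wikipedia:"):
            # Subkey style: the language is the suffix of the key.
            lang = key.split(":", 1)[1]
            articles.setdefault(lang, value)
    return articles


# Illustrative tags, not taken from a real element.
example = {"name": "Masada", "wikipedia": "he:מצדה", "wikipedia:en": "Masada"}
print(normalize_wikipedia_tags(example))
```

Letting the main tag take precedence is a design choice for this sketch, not something the documentation prescribes.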

wikipedia:xy=* subkeys have effectively been deprecated for years. The wikipedia documentation discourages them, except in a very particular situation, and some validators warn about them. If the local community prefers to pair wikipedia=* with a redundant wikipedia:en=* and wikipedia:he=*, so that a given element links to Wikimedia four times over, then this decision and the reasons should be documented to avoid surprising mappers and software developers other than yourself.

I’ve read this article a number of times and never understood that using wikipedia:xy=* is discouraged. Nor do I see any mention of deprecation. Also, the discussion presents both opinions. Therefore the only conclusion I can derive is that it’s your opinion.
I see the value in the simplicity of linking from the relevant tag to the relevant language, without the need to change a language in wikipedia page itself, for example if I open the following article, I won’t be able to switch the language as I have no clue where to look for the language change button:

I don’t see a good reason to repeat the arguments in the discussion over there, and this bulk edit discussion is for our community to see and comment.
I also posted it on our telegram channel.
If someone in the future would like to remove these tags I would expect a discussion in here to notify and explain why.

I’m sorry you came to this conclusion. The boldfacing is verbatim from the documentation:

In almost all cases, a single wikipedia tag using the primary language for the subject, as described above, is sufficient.

If this guidance is inaccurate or varies by region, it should be adjusted. But clearly this is not just an opinion held by me of all people.[1] Just look at the prevalence of the most popular wikipedia:xy=* keys over time.

Dramatic falls like this don’t normally happen even with the most fervently deprecated of keys.

The outlier is wikipedia:he=*, which spiked a few days ago when this mechanical edit duplicated every wikipedia=he:* tag as a wikipedia:he=* tag:

I think there might be a misunderstanding. I’m certainly not advocating for anyone to tag an element with its Cebuano Wikipedia article. The wikipedia=xy:* key is normally set according to the local language, with the prefix of the value indicating the language. In the past, there have been some disagreements about how to tag elements in multilingual countries. That may account for some of the vestigial usage of wikipedia:xy=* subkeys, but I don’t have any information about that specifically.

As an outsider, it isn’t for me to insist that the local community adopt a novel tagging scheme, but hopefully you can see that this is not a new discussion. :man_shrugging:

  1. A decade ago, I used to be a proponent of the style you’re proposing.

@Harel_M, for the table, do you need a full review, or is sporadic QA of only a percentage of the rows enough? If a full review, may I propose ordering the rows by importance so that at least the more important nodes will be updated sooner?

Sporadic is fine from my point of view, I don’t think anyone would like to go over 3k records manually.

“the primary language for the subject” - what is the primary language for a subject in a place fraught with what I will gently call extreme societal division?

Yes, it’s a good question, especially since these aren’t the only two languages spoken in the region. I suppose the idea is that the language of wikipedia would match the language of name on each feature, but it would be tricky for features that have mixed-language name tags. This is probably why some wikipedia:xy usage remains, apart from the recent mechanical edit in Israel. My point is that the documentation could say so explicitly, if that is indeed the reason.

In any case, the potential for linguistic conflicts is one of the original arguments in favor of Wikidata, which is language-neutral. I would encourage any data consumer to consider processing wikidata or at least use Wikidata to transform wikipedia into the end user’s expected language, as applications like OsmAnd do, rather than relying on the language to be tagged explicitly in wikipedia:xy and using that verbatim.

I’ve looked into shifting the focus of this effort from wikipedia links to wikidata links.
So far this looks fine. I still hate wikimedia documentation…
I’ll update the original google docs and title of this thread accordingly.

If you are working with wikidata tags, consider installing the wikidata plugin that I wrote for Chrome and Firefox. It will make your life a lot easier when looking at the data on osm.org.