Merging multiple wikidata tags?

How should we deal with a feature that was double-mapped and has subsequently been issued two different wikidata tags?

Red Hook Park - Wikidata (currently on a multipolygon)
Red Hook Recreational Area - Wikidata (currently on a node)

Thanks, J

When you see a Wikidata item like Q34855962 that has a sitelink to the Cebuano Wikipedia (ceb) and no other Wikipedia, there’s a very good chance that it’s import cruft. The Cebuano Wikipedia created a stub article on every park in GNIS, replete with an almanac at that very location. Several years ago, every Wikipedia article got imported into Wikidata, to serve that project’s original purpose as a clearinghouse for interwiki links. Unfortunately, the tiny Cebuano community hadn’t gotten very far in cleaning up the GNIS import by that point. If you’re logged into Wikidata, there should be a “Merge” option near the “History” tab that lets you easily merge this item with another. It’s very common to do this on sight when encountering a Cebuano Wikipedia–based item.
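
If you want to spot these items in bulk rather than by eye, here is a rough Python sketch. It assumes the item's JSON comes from Wikidata's Special:EntityData endpoint, where sitelinks are keyed by site ID ("cebwiki", "enwiki", …). The list of non-Wikipedia "*wiki" site IDs, the User-Agent string, and the function names are my own assumptions, not any official tool.

```python
import json
import urllib.request

# Site IDs that end in "wiki" but are not Wikipedia language editions.
# ASSUMPTION: hand-written for illustration and possibly incomplete.
NON_WIKIPEDIA_SITES = {
    "commonswiki", "wikidatawiki", "specieswiki", "metawiki",
    "mediawikiwiki", "sourceswiki", "wikimaniawiki", "incubatorwiki",
}


def wikipedia_sites(sitelinks):
    """Return the Wikipedia site IDs (e.g. "enwiki") found in a sitelinks dict."""
    return {
        site for site in sitelinks
        if site.endswith("wiki") and site not in NON_WIKIPEDIA_SITES
    }


def is_cebuano_only(sitelinks):
    """True if the item's only Wikipedia sitelink is the Cebuano one."""
    return wikipedia_sites(sitelinks) == {"cebwiki"}


def fetch_sitelinks(qid):
    """Download an item's sitelinks from Special:EntityData (needs network access)."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "ceb-check-sketch/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # A merged item redirects, so the response may be keyed by the target QID;
    # take the single entity returned instead of looking up `qid` directly.
    entity = next(iter(data["entities"].values()))
    return entity.get("sitelinks", {})
```

Usage would be something like `is_cebuano_only(fetch_sitelinks("Q34855962"))`. Keep in mind that a Cebuano-only item is only a hint of import cruft, not proof, and that an item which has already been merged will report the sitelinks of the item it was merged into.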

The Swedish Wikipedia had the same bot-import issue, though thankfully it was cleaned up recently. That bot created stubs for all sorts of things all over the world.

You won’t be able to merge them as long as each of them has its own article in the same Wikipedia language version.

Ah, good point. So there’s another step: you’d need to edit the Cebuano Wikipedia article to turn it into a redirect to the correct article, then delete the ceb sitelink from the item you want to merge into the main item.

If you’re familiar with the area and know that one wikidata link is correct and the other is not, then delete the incorrect link from OSM.

If you’re not familiar with the area but it looks like a garbage import of wikidata into OSM without discussion, report it to the DWG.

If you’re not familiar with the area but it looks like an honest mistake from a human mapper, leave both so that someone who is familiar with the area can have a look. Maybe add a fixme tag if you think that that is necessary?

Thanks for the background, Minh!

I’m indeed familiar with the area. The original node, Red Hook Recreational Area (node 357581770, now tagged with Q34855962), was part of the United States GNIS import in 2009. (Whether that was a garbage import is up to interpretation! :wink:) The park was already mapped as a way at that time (& eventually remapped as a multipolygon), but nobody ever merged in the imported GNIS node. In 2017 a mapper dropped by and tagged the way with Q7304317, and then three years later the very same mapper came back and tagged the old GNIS node with Q34855962.

I’ll stick with Q7304317, which is the older & better-tagged wikidata item. Unfortunately the required wiki fiddling to correctly merge Q34855962 into Q7304317 is outside my skillset. I’ve left a changeset comment for the wikidata-tagger in question; we’ll see what comes of it.

Cheers, J

My general advice is to treat wikidata entries with just a ceb entry as useless bot spam.

If you care about Wikidata quality you can merge them, but ignoring them and just replacing it with real wikidata in OSM is also entirely fine.

The Cebuano Wikipedia is almost entirely bot-generated, with more than 6 million articles, and therefore also over 6 million duplicated entries, many of them still unmatched. See Cebuano Wikipedia - Wikipedia

In other words, letting people run bots without supervision, or with minimal supervision, also has bad consequences.

You may need to enable a gadget (in Preferences) to get the Merge option.

Would it make sense to remove every wikidata reference from OSM that only references a “ceb” Wikipedia article? Here, https://www.wikidata.org/wiki/Q7304317 looks like real data, but https://www.wikidata.org/wiki/Q34855962 just looks like rubbish. It doesn’t add any value, except perhaps to say “there is nothing useful on Wikipedia / Wikidata about this OSM object”.

A mass removal could be just as problematic, because sometimes there really is a need for multiple items, and the Cebuano Wikipedia just happened to reflect that fact because of its comprehensive GNIS import. But I think flagging cases where both items have been linked to similar OSM features would be very helpful to both projects. (The GNIS feature ID on the latter item is useful information.)
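
One way to do that flagging, sketched in Python under loose assumptions: “similar OSM features” is approximated here as “features close to each other”, each feature has already been extracted (for example from an Overpass query) into a dict with a representative point and its wikidata value, and the 500 m threshold is an arbitrary placeholder. The field names and function names are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_nearby_conflicts(features, max_distance_m=500):
    """Yield (osm_a, osm_b, distance_m) for nearby features with different wikidata values.

    Each feature is a dict with "osm" (e.g. "node/123"), "lat", "lon" and "wikidata".
    ASSUMPTION: for areas, "lat"/"lon" is a representative point such as a centroid.
    """
    # Pairwise comparison is O(n^2), which is fine for a neighbourhood-sized extract.
    for a, b in combinations(features, 2):
        if a["wikidata"] == b["wikidata"]:
            continue  # already consistent, nothing to flag
        d = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
        if d <= max_distance_m:
            yield a["osm"], b["osm"], round(d)
```

Each flagged pair is only a candidate for a human to review; as noted above, sometimes two items really are needed. Comparing tags (e.g. both `leisure=park`) or names as well would cut down on false positives.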

I found a few Croatian islets that have no other article, just the ceb bot one. I was happy to find them so I can fill the wikidata=* tag, but usefulness is questionable.

Wikipedia language editions besides Cebuano continue to add new articles, so it’s possible for the item to become more useful in the future. If nothing else, it can already serve as a space for transliterations of the names of these islets. In general, Wikidata’s inclusion of transliterations is better aligned with cartographic needs than OSM’s (understandably) stricter criteria.

I frequently come across these entries, and their associated Wikipedia pages, when engaged in a completely different area. Anything created by these bots is a pernicious nuisance, because of the high value that wikipedia and wikidata pages get in search queries.

Let me say that I have doubts about “it’s possible for the item to become more useful in the future”, when there are, I think, over a million entries, covering not just GNIS, but a large number of biological species. I won’t waste my time checking, but I suspect at least the biological data is already subject to substantial bitrot.

I don’t necessarily disagree with your assessment. It’s Thanksgiving Eve around here and I’m just trying to leave a glimmer of hope lit, that’s all. :wink:

The problems with the Cebuano Wikipedia are well-known, and it’s no accident that the Swedish and Ilocano Wikipedias have had the same problems in the past. I think we’re all in agreement that if there is any duplication in Wikidata, it would be perfectly reasonable to first make sure that OSM consistently links features to the same item, rather than making sure that each of the duplicates is linked.

Changeset 129308761 merges the stray “Red Hook Recreational Area” node into the main “Red Hook Park” multipolygon relation. Meanwhile, on Wikidata, Red Hook Recreational Area (Q34855962) has been merged into Red Hook Park (Q7304317). Apparently the duplication happened because the English Wikipedia calls it by its common name while GNIS calls it by its official name.

Incidentally, Q34855948 has been unmerged from Q7304317 and renamed from “Red Hook Park” to “Coffey Park”. Apparently the Cebuano Wikipedia also imported parks from GeoNames, which continues to call this park by the wrong name. The same changeset tags Coffey Park with the newly unconflated item. The Cebuano Wikipedia article has also been renamed.

What a mess! Thanks Minh! :turkey: