wikidata tag added to thousands of place nodes. Automated mass edit?

escada · November 25, 2016, 4:33pm

Thanks a lot for this tool @PlaneMad

DevonF · November 27, 2016, 10:12pm

@pigsonthewing it looked stale because it was still written as a proposal from years ago and I can’t seem to find any bot/tool associated with it on that page. Maybe that wiki page needs updating?

@PlaneMad cool map! Interesting to pick out some of the mistakes which have propagated. For example, check out the place node for the town of Chesterville ON. @DenisCarriere added the wikipedia page, which is legit. Then recently @LogicalViolinist added the wikidata ID, but clearly whatever tool he used didn’t notice that the Chesterville wikipedia page was a redirect to North Dundas and so added that wikidata ID instead. And so now it’s obvious that Mapbox prefers to use names based on the wikidata ID, not the OSM database, since now there are two North Dundas.

LogicalViolinist · November 28, 2016, 2:06pm

Didn’t use a tool. See the changeset for the reason I chose that wikidata ID: http://www.openstreetmap.org/changeset/43544541

rmikke · May 25, 2017, 10:55am

I’m doing it almost the same way (inspired by nyuriks), but I find iD much faster for manual updates.

Actually, I think the automatic updates (i.e. matching a wikidata ID to an existing wikipedia link) should be done regularly by some bot, so that only objects requiring manual action are left. In Poland a few new wikipedia links missing a wikidata ID appear every day. In other countries too, I could see that nyuriks had been there before me, but new wikipedia links have been added since.
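To make the "mechanical" part concrete, here is a minimal sketch of the first step such a bot might take: splitting a `wikipedia=lang:Title` value and building a lookup URL for the Wikidata `wbgetentities` API. The function names are hypothetical, and the site-ID rule (language code plus `wiki`, hyphens turned into underscores) is an assumption that holds for the common cases but not for every wiki. Nothing is fetched here, and a real run would still need redirect handling and a human check of the result.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def split_wikipedia_tag(value):
    """Split an OSM wikipedia=* value such as 'pl:Kraków' into (lang, title)."""
    # Only the first colon separates language and title; titles may contain colons.
    lang, sep, title = value.partition(":")
    lang, title = lang.strip(), title.strip()
    if not sep or not lang or not title:
        raise ValueError(f"not a 'lang:Title' value: {value!r}")
    return lang, title


def wikidata_lookup_url(value):
    """Build (but do not fetch) a wbgetentities URL for a wikipedia=* value."""
    lang, title = split_wikipedia_tag(value)
    # Assumption: site ID is '<lang>wiki' with '-' replaced by '_'.
    site = lang.replace("-", "_") + "wiki"
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "info",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


url = wikidata_lookup_url("en:North Dundas")
```

Links for which the lookup returns no item, or an item for a disambiguation page, would be the ones left for manual attention.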

SomeoneElse · May 25, 2017, 11:37am

Not exactly the same way I hope - I’m still trying to persuade them to tidy up some of the mismatches that they created.

rmikke · May 25, 2017, 2:48pm

But they didn’t. WE didn’t, should I say. Write. Whatever.

The mismatches were there before; nobody caught them until wikidata IDs were added. Don’t look at it as creating mismatches, but as noticing them. Mass adding wikidata IDs prepares the ground for cleaning up wikipedia issues, like nyuriks’ list of disambiguation pages. We would be better off if some bot did it on a daily basis; it’s a mechanical job really. Only what is left after this mechanical job requires mappers’ attention. This includes:

Correcting wikipedia titles that have changed since they were copied to OSM

Finding incorrect links to wikipedia

Creating relations for rivers, highways and others(*)

…

I think all these tasks are easier when we have wikidata IDs.

(*) I have filed a GitHub issue asking to show when an object belongs to a relation that has a wikipedia link and/or website defined, so that users won’t try to add a wikipedia link to every member of the relation. Please back me up there if you think it makes sense.

SomeoneElse · May 25, 2017, 3:48pm

In this case they weren’t. What tends to happen is something like:

o OSM has an object for a village and an admin entity

o An OSM user adds a wikipedia tag to the admin entity. The wikipedia entry describes itself as covering both the village and the admin entity, so that’s OK.

o A wikipedian writes a bot that creates a wikidata item from the wikipedia article. The bot creates wikidata entries for villages, not admin entities. That’s not entirely wrong, because the wikipedia article actually covers both.

o A different wikipedian spots that there is an OSM admin entity and a wikidata item with the same name in a similar location and links them via a wikidata tag. This results in the wrong OSM entity being linked to a wikidata item.

rmikke · May 26, 2017, 9:23am

That’s not exactly the case here. What happens now is: there are two OSM objects with the same Wikipedia link, so they get the same wikidata IDs. At least if we are still talking about the semiautomated adding of wikidata IDs that nyuriks and I do.

SomeoneElse · May 26, 2017, 11:35am

I think that the bottom line is that if you’re adding a wikidata link to OSM you have to check that the wikidata article actually applies to the OSM object - you can’t rely on what’s happened between wikipedia and wikidata to ensure that.

rmikke · May 26, 2017, 2:03pm

And that’s what we do for every link that does not get a wikidata ID in the batch run.
Still, since the majority of wikipedia links are correct, I think it’s better to do this batch run and then catch doubled wikidata IDs than to do the whole job manually, link by link.

One thing worries me, because I may not understand wikidata correctly: is it all right to have two wikidata entries pointing to the same wikipedia article?

escada · May 28, 2017, 12:25pm

I think that is possible. A Wikipedia article can describe a group of objects (e.g. a museum and the paintings in it), while a painting might already be a separate Wikidata item.

Gotegomadi · May 28, 2017, 2:53pm

Just look at this map: https://a.safe.moe/3cC0P.png
This map was created entirely from wikipedia data, with no OSM at all.
The pointer is at a POI with the text “Wikipedia: Messages about errors/Archive/2011/09”

rmikke · May 30, 2017, 9:12am

If so, we should definitely add wikidata=* to wikipedia=* as automatically as possible, then hunt for doubled wikidata values and create additional wikidata entries for separate entities.

SomeoneElse · May 30, 2017, 10:41am

That’s what appears to have happened so far, and it has devalued the work that people have done adding valid wikidata entries. Someone processing wikidata values in OSM would have to do some postprocessing:

“Was this tag added by someone who has added a lot of wikidata links? If so, I’d better ignore it”.

The link between wikipedia and wikidata is already present in wikidata - simply duplicating it in OSM without any checking adds no value. Adding wikidata links for which you have local knowledge does add value.

rmikke · May 30, 2017, 2:36pm

I think I don’t understand.

Let’s assume there is no mass adding. Then someone would have to:

Happen to find an object without wikidata

Get the idea “Hey, it would be nice to add a wikidata tag!”

Research which wikipedia article is correct (OK, let’s assume the wikipedia tag has a value for a start), whether it has a wikidata entry, and whether that entry is correct for the OSM object or a new wikidata entry should be created.

How fast do you think wikidata IDs would be added to OSM that way?

Now back to the current situation, with mass adding of wikidata IDs. Most OSM objects have correct wikidata IDs (wherever there is a 1:1 OSM-Wikipedia relation) and all that has to be done is:

Get duplicated values of wikidata, e.g. from taginfo

Research whether the wikipedia article covers both OSM objects and for which OSM object the existing wikidata entry is correct

Create a wikidata entry for the remaining OSM object(s)
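The first of these steps can be sketched in a few lines. This is only an illustration, assuming the objects have already been exported as (ID, tags) pairs by some means; the function name and the sample data (including the Q numbers) are made up, and it does not call taginfo or any other service.

```python
from collections import defaultdict


def find_duplicate_wikidata(objects):
    """Return {wikidata_id: [object ids]} for IDs used by more than one object."""
    by_qid = defaultdict(list)
    for obj_id, tags in objects:
        qid = tags.get("wikidata")
        if qid:
            by_qid[qid].append(obj_id)
    # Keep only the IDs that appear on two or more objects.
    return {qid: ids for qid, ids in by_qid.items() if len(ids) > 1}


# Made-up sample: a village node and an admin relation sharing one wikipedia link.
sample = [
    ("node/1", {"place": "village", "wikipedia": "en:Example", "wikidata": "Q100"}),
    ("relation/2", {"boundary": "administrative", "wikipedia": "en:Example", "wikidata": "Q100"}),
    ("node/3", {"place": "town", "wikidata": "Q200"}),
]
duplicates = find_duplicate_wikidata(sample)
```

Not every duplicate found this way is an error (several ways of one river may legitimately share an ID), so the output is a list of candidates for the research step, not a list of fixes.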

And there are at least two people (nyuriks and me) dedicated to this and trying to clean up the wikidata issues. Is some postprocessing needed? Yes, there is, but not by every user: we will have to do this additional hunt for doubled wikidata entries.

YES! Not exactly, but yes. I even think we should rely on wikidata more. But for that we need correct wikidata IDs. So I think it’s better to add the IDs and then resolve the issues than to not have them at all. (There are more issues than duplicate IDs; nyuriks has created a page that covers at least some of them and gets local help with them.)

And the knowledge required is not exactly local. All the issues so far require knowledge of local:

language

rules

Otherwise it's totally armchair mapping. I could even correct some issues in China with absolutely no understanding of what I'm editing (after some Chinese mapper had assigned correct wikipedia articles in place of disambiguation pages, but left out the wikidata IDs).

So thanks for noticing the problem of duplicate wikidata IDs, we will take care of that, among other issues.