Persistent and stable identifiers

Hi

I’m mostly active on Wikidata and I would really love to see this community fix the long-standing issue of persistent identifiers for every single entity in the database.

How do we best do that? Create relations for everything and link to Wikidata from them?

As a starter I would like all our POIs to be stable.

I want a big fat warning in the editors when a change would result in losing a persistent identifier.

WDYT?

5 Likes

From a technical perspective, relations are not more persistent or stable than any other object in OSM. The only reason why Wikidata links to relations but not to other types is that users tend to delete and re-create relations less often than nodes, areas or ways.

There are users here that know much more about the OSM data model than me, but as a normal mapper, it is hard for me to imagine how stable identifiers could work. At the moment, ways, nodes, and relations have their separate identifier namespace. So, when I want to replace a node by a way for a more detailed representation, I couldn’t keep the identifier because it might already be taken (for example I can’t convert node 380498211 into way 380498211).
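To illustrate the namespace point, here is a minimal sketch in Python. The `ElementRef` type is hypothetical and not part of any OSM library; it only shows that an element is identified by its type together with its number, so the same number can exist once per type.

```python
from typing import NamedTuple


class ElementRef(NamedTuple):
    """Hypothetical reference to an OSM element: the type plus the numeric ID."""
    kind: str  # "node", "way" or "relation"
    num: int


# The number from the example above, used in two different namespaces.
node = ElementRef("node", 380498211)
way = ElementRef("way", 380498211)

# Same number, but different elements, because the type is part of the identity.
same_number = node.num == way.num
same_element = node == way
print(same_number, same_element)
```

So a reference that is meant to survive a node-to-way conversion cannot simply reuse the number; it would have to be something separate from both.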

Adding a separate identifier for the object that then could be migrated into another object type would greatly increase data size. It would also not answer the question of how users could be “forced” to maintain the identifier, unless editors would add a “convert into x” feature.

To sum up, I agree that this would be a nice thing to have. But I don’t know if it will ever be possible.

3 Likes

Hello, I am currently editing identifiers for a third-party site, and I also want to have unique objects. It is very difficult to validate the data:

  • we often add wikipedia/wikidata tags for each segment for highways (I really want to move this tag to the associated street relation, I completely agree that there should be a unique identifier, but some cartographers are against it);
  • cartographers rarely care about the correctness of the data, sometimes I see silly mistakes;
  • sometimes memorials (statues, busts, …) are tagged with a famous person or event in the wikipedia/wikidata tag, although this must be specified in subject:wikipedia/subject:wikidata (manual verification is also required).

I fully support you in this idea, but it is advisable to make an official proposal so that it does not turn into an edit war.

2 Likes

You can link ways and nodes as well with P10689, I do it all the time.

7 Likes

Oh, interesting, that didn’t exist yet when I was more active at Wikidata.

I wrote some ideas about persistent identifiers on OSM a long time ago: There are a couple of relevant links in the blog too. I still hold that one really needs a coherent (and elaborate) data model to support implementing them. I’m still somewhat sceptical about Wikidata’s implementation because many entries refer to two or more divergent concepts, or very poorly defined ones.

6 Likes

The usual question: pls detail which changes to a restaurant make it no longer the restaurant that the stable id referred to.

15 Likes

This is a tricky case. The first thing that comes into mind is a name change. However, sometimes names change only slightly while everything else stays the same. So in this case it will still be the same restaurant afterwards and the ID should stay the same. In other cases the name could stay the same but the cuisine changes drastically. Still the same restaurant with the same ID then? Not sure.

5 Likes

Nodes, ways, and relations are OpenStreetMap elements, not OpenStreetMap objects.

I made a suggestion to change this point.

2 Likes

Actually they are commonly referred to as objects in the OpenStreetMap community.

1 Like

Wikidata accepts alternative names. If they are elements, they should be called “elements” in the main name, regardless of whether they can be called by other names in English.

Accuracy is also important for translations. Not in all languages “object” means the same thing in the OSM environment. For example, “object” is used to mean “feature” in Spanish, and “element” is translated as “elemento”.

2 Likes

Thanks for pointing that out. I was not aware of that property.

Other possible changes: moving down the road, moving to a different quarter, moving to a completely different city, changing the chef, changing the owner … and so on.

5 Likes

If I remember correctly, the discussion in Wikidata about having a property for node/way IDs has existed since the property for the relation ID was proposed. As in the discussions here, opinions there were divided, mainly because node and way IDs tend to change more often than relation IDs.

Not to mention the issue of which ID you would put in an item if the thing happens to be mapped in OpenStreetMap as several objects rather than a single one.
For example, a museum, especially one which has outdoor space as well. Would you put there the ID of the main building, the node with the museum tags, or create a relation to contain every object that is mapped within the museum’s space? Because as far as I’m aware, there’s no such specific tag (and usually I just tag it tourism=museum without adding the building=yes, since I’m used to the iD presets).

1 Like

Thanks for pointing out the issues with my suggestion. Maybe POI is a difficult thing to start with.
Maybe business slots/buildings/routes is a better start?

I was thinking that we create an entity and link the current osm element to that entity.

E.g. I’m sitting in a big mall called Avion. There are a number of businesses in slots here. Every slot has a geometry or at least a point.

Slot Q1 links to a node with tags for coffee shop. Then the coffee shop closes and the node is emptied but kept.

Looking up Q1 I’ll see the history of elements it links to.
Going to the current one I find an empty node or abandoned:whatever

Then suddenly a new restaurant opens in the same slot. Someone creates a new node for the restaurant, unaware of the current empty node.

A third mapper knows that there is an entity pointing to an empty node but a new node without an entity link very close, and investigates.

The third mapper merges the nodes and now the newer node is deleted and the entity did not change.
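
A minimal sketch of how such an entity could keep its link history, in Python. Everything here is an assumption for illustration: the `Entity` class, its field names, and the node IDs 1001 and 1002 are made up and do not correspond to anything that exists in OSM today.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ElementRef = Tuple[str, int]  # (element type, element ID); IDs below are made up


@dataclass
class Entity:
    """Hypothetical persistent entity, e.g. a business slot in a mall."""
    entity_id: str
    # Every OSM element the entity has pointed to, oldest first.
    history: List[ElementRef] = field(default_factory=list)

    @property
    def current(self) -> Optional[ElementRef]:
        """The element the entity points to right now, if any."""
        return self.history[-1] if self.history else None

    def relink(self, element: ElementRef) -> None:
        """Point the entity at another element while keeping the old links."""
        if element != self.current:
            self.history.append(element)


slot = Entity("Q1")
slot.relink(("node", 1001))  # coffee shop node; later emptied but kept

# A second mapper adds node 1002 for the new restaurant without linking it.
# The third mapper merges 1002 into 1001, so the entity needs no change.
after_merge = slot.current

# Variant (not in the scenario above): had node 1002 been kept instead,
# one relink would record the move while the entity ID stays "Q1".
variant = Entity("Q1", [("node", 1001)])
variant.relink(("node", 1002))
print(after_merge, variant.history)
```

Either way, anyone looking up Q1 gets the same identifier and can walk the list of elements it has pointed to.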
——
Another example
Someone maps paths in the forest.
The city council decides to publish a new hiking path Hike1.

A Wikidata volunteer imports the hiking path and any subsegments into Wikidata, but cannot find a matching entity in OSM.

A second mapper comes along and discovers that the open data in WD describes a route we don’t seem to have yet, although all the paths are already there. They create a new relation for the route, create a new entity, and link it to Wikidata.

A second Wikidata volunteer comes along, discovers that the official name of the path has changed in the council’s dataset, and updates Wikidata accordingly.

A bot operator on OSM looks for mismatches between linked entities in WD and OSM and proposes to a user to fix the name in OSM.
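
The mismatch check such a bot would run could be as simple as the sketch below. It assumes the names have already been fetched into two plain dictionaries keyed by a hypothetical entity ID; the function name, the ID "E1" and the sample names are all made up, and no real OSM or Wikidata API is called.

```python
from typing import Dict, List, Tuple


def find_name_mismatches(
    osm_names: Dict[str, str], wd_names: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (entity ID, OSM name, WD name) for linked entities whose names differ."""
    mismatches = []
    for entity_id, wd_name in sorted(wd_names.items()):
        osm_name = osm_names.get(entity_id)
        # Entities without a counterpart in OSM are skipped, not reported.
        if osm_name is not None and osm_name != wd_name:
            mismatches.append((entity_id, osm_name, wd_name))
    return mismatches


# Made-up data: the council renamed the path, and only WD was updated.
osm_names = {"E1": "Hike1", "E2": "Forest loop"}
wd_names = {"E1": "Hike1 (renamed)", "E2": "Forest loop", "E3": "Not in OSM yet"}

todo = find_name_mismatches(osm_names, wd_names)
print(todo)
```

The result is only a to-do list for a human; the sketch deliberately does not write anything back to OSM.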

Or perhaps when an entity has a WD link we don’t allow changes to the name on the element. :wink:

Does this sound like something we would want?

My suggestion is to have entity gatekeepers: someone who has a good track record and the trust of the local community can edit the entities in a specific area.

Deleting entities should be just as hard as it is in Wikidata.

Removing links between entity and osm element could be a 2 step process where two independent mappers must approve to make the change.
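
As a rough sketch of the two-step idea in Python (the `UnlinkRequest` class and the mapper names are hypothetical; nothing like this exists in OSM today):

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UnlinkRequest:
    """Hypothetical request to remove the link between an entity and an element."""
    entity_id: str
    element: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, mapper: str) -> None:
        # A set is used so that the same mapper approving twice counts once.
        self.approvals.add(mapper)

    @property
    def approved(self) -> bool:
        """True once two different mappers have approved."""
        return len(self.approvals) >= 2


request = UnlinkRequest("Q1", "node/1001")
request.approve("mapper_a")
request.approve("mapper_a")  # repeated approval by the same mapper
after_one_mapper = request.approved
request.approve("mapper_b")
after_two_mappers = request.approved
print(after_one_mapper, after_two_mappers)
```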

We could keep the entities in the current osm database or in a separate database.

A separate database like Wikibase could possibly offer some benefits because it has a more powerful search language, SPARQL. (I’m aware of Sophox, but it seems like a bolted-on extra that is not supported by the community and has no permanent and stable identifiers.)

This suggestion would provide:

  • gatekeeping which would prevent name hijacking like the New York incident
  • increased community involvement and cohesion
  • better alignment with the web of linked open data
  • possibility of tightening the integration between OSM and Wikidata. Together we are stronger than apart as we are now.
  • possibility of running bots and validating entities. If Wikibase is chosen we have a ton of tools already available to help curate them effectively
  • on the downside, this increases complexity by adding another layer on top of OSM that we have to keep updated.
3 Likes

We could run bots on the entities that structure information based on tags on the linked items. E.g. if the currently linked node of the business slot Q123 has amenity=restaurant we add a statement to the entity reflecting that and a reference to the osm id and timestamp.
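
A sketch of what such a bot step might produce, in Python. The function name, the shape of the statement dictionary, and the element reference "node/1001" are assumptions made for illustration; only the amenity=restaurant tag and the slot Q123 come from the example above.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def statement_from_tags(
    entity_id: str, element: str, tags: Dict[str, str], observed_at: datetime
) -> Optional[dict]:
    """Return a statement for the entity if the linked element is a restaurant."""
    if tags.get("amenity") != "restaurant":
        return None
    return {
        "entity": entity_id,
        "claim": ("amenity", "restaurant"),
        # Reference: which OSM element supported the claim, and when.
        "reference": {"osm_element": element, "retrieved": observed_at.isoformat()},
    }


stmt = statement_from_tags(
    "Q123",
    "node/1001",  # made-up element reference
    {"amenity": "restaurant", "name": "Example"},
    datetime(2022, 1, 1, tzinfo=timezone.utc),
)
no_stmt = statement_from_tags(
    "Q123", "node/1001", {"shop": "bakery"}, datetime(2022, 1, 1, tzinfo=timezone.utc)
)
print(stmt, no_stmt)
```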

We could validate against a government data set of all hospitals in a country that we have mapped every one of them. We can’t do that easily today. Not even if they are in Wikidata.

1 Like

If we keep these statements as the underlying osm data evolves we can answer questions like:

  • which restaurants could be found in Umeå in 2022?
  • which ones disappeared in the last year?
  • show me a map of all slots where restaurants disappeared between 2000-2022
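
A small sketch of how such questions could be answered once the statements carry a time span, in Python. The `Statement` class, its fields, and all the sample data are made up; a real setup would more likely query a graph database than filter a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """Hypothetical record: what kind of business a slot held, and when."""
    slot: str
    kind: str
    first_seen: int           # year the business first appeared
    last_seen: Optional[int]  # last year it was there; None = still there


def present_in(statements: List[Statement], kind: str, year: int) -> List[str]:
    """Slots that held a business of this kind in the given year."""
    return sorted(
        s.slot
        for s in statements
        if s.kind == kind
        and s.first_seen <= year
        and (s.last_seen is None or s.last_seen >= year)
    )


def disappeared_between(
    statements: List[Statement], kind: str, start: int, end: int
) -> List[str]:
    """Slots where a business of this kind was last seen within [start, end]."""
    return sorted(
        s.slot
        for s in statements
        if s.kind == kind and s.last_seen is not None and start <= s.last_seen <= end
    )


data = [
    Statement("Q1", "restaurant", 1998, 2010),
    Statement("Q2", "restaurant", 2015, None),
    Statement("Q3", "restaurant", 2005, 2021),
    Statement("Q4", "cafe", 2000, 2020),
]

in_2022 = present_in(data, "restaurant", 2022)
gone = disappeared_between(data, "restaurant", 2000, 2022)
print(in_2022, gone)
```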
1 Like

Note that any graph database could be used. I have professional experience with Wikibase so I might be biased :wink:
Neo4j and others exist.

2 Likes

Regarding licenses we could license the collection of entities as CC0 and that would make my job as a tool writer much easier because then I can update Wikidata based on OSM which is practically impossible today.

Also Wikidata has grown a lot. They don’t really want to have an entity for every business slot in the world. They are not really in scope there.

I predict that by leveraging the best of OSM and integrating, we will gain a lot of new users. We will be the go-to reference for geoentities for the whole world.

2 Likes

Sorry, but the data-model of osm is not like that…

Keeping your example: There is a slot S1 in your mall. S1 is occupied by a restaurant R1. S1+R1 are OSM-node N123 (to keep it very simple).

Now you want to define that N123 equals pID1. Even without any changes: does pID1 now refer to the restaurant, the slot, or the combination of both?

If it is one of the first two answers, how are you going to address the other one with a permanent ID?
If it is the third option: is this useful? It would mean that as soon as any tag of the OSM node changes, you get a new permanent ID, maybe even if the lat and lon change :wink:

EDIT: Sorry, but I hit the wrong reply button. Should have been a reply to @pangoSE

1 Like