Persistent and stable identifiers

We could run bots that structure information on the entities based on tags on the linked items. E.g. if the currently linked node of the business slot Q123 has amenity=restaurant, a bot adds a statement to the entity reflecting that, with a reference to the OSM id and timestamp.
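
A minimal sketch of what such a bot could look like, using the real OSM 0.6 API for reading; the property and item ids on the Wikibase side are made up, and the write step is left out:

import requests

OSM_API = "https://api.openstreetmap.org/api/0.6"

def fetch_node(node_id: int) -> dict:
    # The 0.6 API returns the node's tags, version and timestamp as JSON.
    r = requests.get(f"{OSM_API}/node/{node_id}.json", timeout=30)
    r.raise_for_status()
    return r.json()["elements"][0]

def occupant_statement(node: dict) -> dict | None:
    # Derive a hypothetical "occupied by a restaurant" statement for the
    # business-slot entity, referenced by OSM id, version and timestamp.
    if node.get("tags", {}).get("amenity") != "restaurant":
        return None
    return {
        "property": "P_OCCUPANT_TYPE",  # hypothetical property id
        "value": "Q_RESTAURANT",        # hypothetical item id
        "reference": {
            "osm_node_id": node["id"],
            "osm_version": node["version"],
            "osm_timestamp": node["timestamp"],
        },
    }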

We could validate, based on a government dataset of all hospitals in a country, that we have all of them mapped. We can't do that easily today, not even if they are in Wikidata.
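
If the government dataset and our entities shared some reference key, the check itself would be a simple set difference (a sketch; the key values are made up):

# ids of hospitals in the official registry vs. ids on entities that
# currently link a mapped OSM element
government_ids = {"H-001", "H-002", "H-003"}
mapped_ids = {"H-001", "H-003"}

missing = government_ids - mapped_ids
print(f"hospitals not yet mapped/linked: {sorted(missing)}")  # -> ['H-002']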

1 Like

If we keep these statements as the underlying OSM data evolves, we can answer questions like (see the sketch after this list):

  • which restaurants could be found in Umeå in 2022?
  • which ones disappeared in the last year?
  • show me a map of all slots where restaurants disappeared between 2000 and 2022
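
A sketch of how the first two questions could be answered, assuming each business-slot entity carries dated occupant statements (this data layout is entirely made up):

# hypothetical statement layout on business-slot entities
slots = [
    {"entity": "Q1", "occupant": "restaurant", "start": 2015, "end": None},
    {"entity": "Q7", "occupant": "restaurant", "start": 2010, "end": 2021},
]

def restaurants_in(year: int) -> list[str]:
    # a statement covers `year` if it started on or before that year
    # and has not ended, or ended in a later year
    return [s["entity"] for s in slots
            if s["occupant"] == "restaurant"
            and s["start"] <= year
            and (s["end"] is None or s["end"] >= year)]

print(restaurants_in(2022))                                   # -> ['Q1']
print(set(restaurants_in(2021)) - set(restaurants_in(2022)))  # disappeared -> {'Q7'}
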
1 Like

Note that any graph database could be used. I have professional experience with Wikibase, so I might be biased :wink:
Neo4j and others exist.

2 Likes

Regarding licenses, we could license the collection of entities as CC0. That would make my job as a tool writer much easier, because then I could update Wikidata based on OSM, which is practically impossible today.

Also, Wikidata has grown a lot. They don't really want an entity for every business slot in the world; such items are not really in scope there.

By leveraging the best of OSM and integrating with it, I predict we will gain a lot of new users. We will be the go-to reference for geo-entities for the whole world.

2 Likes

Sorry, but the data model of OSM is not like that…

Keeping your example: there is a slot S1 in your mall. S1 is occupied by a restaurant R1. S1 + R1 together are OSM node N123 (to keep it very simple).

Now you want to define that N123 equals pID1. Even without any changes: is pID1 now referring to the restaurant, the slot, or the combination of both?

If one of the first two answers, how are you going to address the other one with a permanent ID?
If the third option: is this useful? It would mean that as soon as any tag of the OSM node changes, you get a new permanent ID, maybe even if the lat and lon change :wink:

EDIT: Sorry, but I hit the wrong reply button. Should have been a reply to @pangoSE

1 Like

BTW in case somebody didn’t realize, object/element id + version (for ways and relations you need to combine that with at least one member node) are already persistent and somewhat stable identifiers. But they are literally frozen in time, so they are less useful for finding the current object that represents the same thing (though often they can be helpful).
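
Dereferencing such a frozen identifier is a single call against the main API; a small sketch:

import requests

def fetch_frozen(elem_type: str, elem_id: int, version: int) -> dict:
    # A (type, id, version) triple is already persistent: this exact
    # revision never changes, even if the live element does.
    url = f"https://api.openstreetmap.org/api/0.6/{elem_type}/{elem_id}/{version}.json"
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    return r.json()["elements"][0]

# e.g. fetch_frozen("node", 123, 2) always returns the same tags and coordinates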

3 Likes

Thanks for sharing.

My idea was:
Q1 is the entity: a node in a graph database.
It models a slot in a building for any type of business.
An editor can link it to the node of the business using a statement.
Other statements can be added if needed, e.g. sameAs a Wikidata QID.

We can also have an entity for the whole building, Q2.

We can infer statements if we want, e.g. that business slot Q1 is inside Q2, because we have a polygon element linked from Q2 and the coordinates of Q1's currently linked node fall inside that polygon.
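
That inference is a plain point-in-polygon test; a minimal sketch with shapely (coordinates are made up):

from shapely.geometry import Point, Polygon

# polygon element linked from Q2, as lon/lat pairs
building = Polygon([(20.262, 63.825), (20.264, 63.825),
                    (20.264, 63.826), (20.262, 63.826)])
# coordinates of the node currently linked from Q1
slot_node = Point(20.263, 63.8255)

if building.contains(slot_node):
    print("infer: Q1 is inside Q2")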

The idea is that the editors keep the links between the OSM element and the entity.

1 Like

My idea makes it dead simple to find the current element for any given entity, and it solves the problem with unstable identifiers as well as with frozen-in-time identifiers. :wink:

1 Like

A bonus of my suggestion is that both iD and JOSM support interacting with the Wikibase API, if I understood correctly.
Also, the Wikibase API is governed by a stable API policy, which is really nice for developers.
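
For example, reading an entity is a single API call (here against Wikidata's instance of that API):

import requests

r = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "wbgetentities", "ids": "Q42", "format": "json"},
    timeout=30,
)
entity = r.json()["entities"]["Q42"]
print(entity["labels"]["en"]["value"])  # -> Douglas Adams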

1 Like

I have some doubts that this is adding lots of value. Even breaking changes to stable parts of the API require only a four-week (!) announcement period.

2 Likes

Oh, I haven’t actually read the policy :sweat_smile:
At least they have one, so we know what to expect :grinning:
The API has seemed quite stable over the time I have been working with it.
I mostly use libraries to interface with the API, but looking at the issues and release history of e.g. GitHub - LeMyst/WikibaseIntegrator: A Python module to manipulate data on a Wikibase instance (like Wikidata) through the MediaWiki Wikibase API and the Wikibase SPARQL endpoint, I don’t see a lot of changes caused by API breakage.

I made this proposal a while ago looking to solve this issue: Proposed features/Relation:feature - OpenStreetMap Wiki. See the discussion page, though, for feedback on it.

2 Likes

Personally, I can see both the benefits and drawbacks of OpenStreetMap having persistent and stable identifiers. On the one hand, it would be nice to “carry over” POIs for objects that have moved, for instance businesses, without having to literally drag them for miles across the map window. On the other hand, I think it would just add an extra layer of complexity to something that is already hard enough for most people to parse as it is.

As a side to that, IMO persistent and stable identifiers are only as useful as the project they are being implemented in is mature. A lot of OpenStreetMap is rather half-baked, and the database clearly isn’t at a mature point yet. Really, it’s debatable if it ever will be. Whatever the case is with specific parts of it, like Europe being “mostly” complete and stable, persistent identifiers would only serve an extremely small percentage of map consumers in most cases at this point.

Expecting a worldwide implementation of them when most places are essentially empty is kind of unrealistic, though. That’s one of the things I like about how we currently use Wikidata IDs: they allow local communities to scale the unique identifiers up or down as they see fit, when they are personally ready to implement them, not just adopt them because people in a country where the map is complete think they are great and that everyone else should use them. Although, conversely, persistent and stable identifiers implemented across the board in OpenStreetMap itself, instead of being outsourced to a third party, are clearly the future. It’s just not quite there yet.

Many Wikidata items suffer from dual-tagging just as many OSM features do, though I think there’s a general awareness that dual-tagging should be reduced over time. Wikidata’s built-in validator encourages contributors to clean up constraint violations often caused by dual-tagging.

Splits, merges, and other geometry changes do complicate any kind of persistent referencing, and that’s before getting into the linear referencing required for routing. I’m reminded of how, years ago, my edits to smooth out curved roads by adding new nodes would cause Mapbox’s traffic maps to show a banded effect along the road because the traffic data had been associated with segments between nodes that were no longer adjacent.

I guess you weren’t thinking of OpenHistoricalMap when you called for a separate database. :grin: But you can perform these exact queries using OHM’s Overpass API endpoint, provided that the restaurants have been mapped and tagged with start_date and end_date. For example, this query shows the bank branches that closed in my hometown since 2000:

[out:json][timeout:25];
// match banks whose end_date starts with "20", i.e. they closed in 2000 or later
nwr["amenity"="bank"]["end_date"~"^20"]({{bbox}});
out geom;

Even so, a given OHM way ID isn’t strictly guaranteed to be permanent. The way might get revised or deleted as mappers learn more about the local history. This is not uncommon among datasets. For example, U.S. mappers like to think of a GNIS Feature ID as a permanent, unique ID, but these IDs can be fungible too under certain circumstances.

(Most of the data in OHM is CC0-licensed, by the way.)

A similar problem even arises when mapping things that aren’t changing. So far, just about every proposed solution for relating the elements of a street together has run up against the challenge of delimiting a street: Is “West Market Street” the same street as “East Market Street”? What if a street briefly becomes a pedestrian mall? What if the physical roadway curves to the left but the name continues onto a side street to the right?

I think this uncertainty is partly what has motivated a non-relation-based approach to relating sidewalks to streets, but at the cost of some indirection, duplicating information about the intended target on the related ways.

By analogy, in the absence of robust linking between OSM and other datasets such as Wikidata, there’s bound to be some less formal duplication. Population figures, translated names, etymologies, and owners can end up in both the OSM silo and the Wikidata silo.

Apart from losing a Michelin star because of the new chef, these are the cases where OpenHistoricalMap would duplicate the feature. When in doubt, it’s a distinct feature. However, this adds another headache on top of OSM’s version of the stable identifier problem. A chronology relation can tie together the time-separated copies of the feature, and this relation can serve as a relatively stable entity for external linking. If someone wants to map the ship of Theseus, well, that’s fine: there can be multiple chronology relations for different viewpoints about when it became a new ship.

2 Likes

I believe the way to make this feasible is a leveled approach. I’ve been doing some tests, and even if things could mostly be imported later, it is still necessary to have the concepts that explain how they interact with each other.

So it would mean starting with world/continents/countries (e.g. UN M49), where far more humans are willing to use it. An example of this logic, from world level down to a country:

  • 1 World
    • 2 Africa
      • 14 Eastern Africa
        • 508 Mozambique

If something looks for a rule for Mozambique but does not find one, it would go up the hierarchy until it defaults to whatever rule exists at the World level (see the sketch below). After this point, it becomes harder to decide how to prioritize levels (but in general, we would start with generic things, like “administrative boundary” vs “road” vs “river”, before more specialized types like a residential road).
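
A minimal sketch of that fallback resolution, with made-up rules keyed by UN M49 codes:

# UN M49 parent chain: Mozambique (508) -> Eastern Africa (14) -> Africa (2) -> World (1)
PARENT = {508: 14, 14: 2, 2: 1}
RULES = {1: "default world rule", 2: "Africa-wide rule"}  # nothing defined for 14 or 508

def resolve_rule(code: int) -> str:
    # Walk up the hierarchy until some level defines a rule; World (1)
    # always has one, so the loop terminates.
    while code not in RULES:
        code = PARENT[code]
    return RULES[code]

print(resolve_rule(508))  # -> Africa-wide rule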

This not only allows focus, even if edited by hand; it also means that massive amounts of existing information (such as validation rules that are either global or country-level, but that we need to attach to something) already have some default.

While we could use Wikidata for labels, for structural things that could break validation/reasoning we need to be realistic and expect to do it one by one. Even if importing data could eventually be automatic (as is the case for some tags), it is not viable to rely on another Authority control.

However, even if we focus initially on both very important concepts (like the countries) and structural concepts, we can still have a few examples just to know upfront what types of encoding we would need. I might explain examples for this later, but they’re likely to be represented even as SPARQL/Overpass queries describing how to return what they mean, like the “Footways in East Anglia” example on the wiki page “Relations are not categories”.

Edit: typo and link on Authority control

1 Like

Yeah, this is an unsolved philosophical problem in the general case. We first have to solve the philosophical problem for OSM, and only then the technical problems that come along.


Hmm, kind of? If I understood correctly, you just changed what is actually uniquely identified. Instead of uniquely identifying an element/object of the real world, you suggest creating a “proxy” element/object that will itself be used as the unique identifier. This doesn’t solve the problem, because said element can be modified/deleted/etc. for any reason (as you’ve already pointed out).

I’m specifically referring to

and


Interesting!

This wouldn’t be as good as what @pangoSE suggested, I think, because you can only have start_date + end_date of the currently mapped feature.

Also interesting! With this I guess the more advanced queries would be possible. But there’s still the problem of this proxy element/object going missing for some reason.

2 Likes

A potential approach for amenities like the initial example: use RFC 4122 UUIDs as identifiers.

Based on my inferences here, I suspect the approach we will see in the coming months from that other foundation, the hyped “Global Entity Reference System”, might be not much more than simply… long random UUIDs that could be generated by any tool (since they’re so random that they’re unlikely to clash unless the user wants them to).

I’m personally more interested in getting permanent place identifiers for the big, more reusable things (still drafting it), BUT, yes, UUIDs can be useful as keys for conflating data that comes from outside (this is why I’m already commenting on this here, so others might get interested in exploring this approach). The internal persistent ID on OpenStreetMap (as on Wikidata, where it is prefixed with Q) could still be numeric (and keep all the advantages of versioning), but that is a later discussion; something like UUIDs might be relevant especially for things like amenities and points of interest. This could even be implemented today with tag values, and the rest becomes syntactic sugar (e.g. something that just displays metadata, without geometries, based purely on a new kind of key, or a namespaced key for complex cases).
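
Minting such an identifier is trivial in any language, which is the point: no central registry is needed. A sketch of the tag-value approach (the ref:uuid key is hypothetical, not an established tag):

import uuid

# RFC 4122 version-4 UUIDs are random; collisions are vanishingly
# unlikely, so any editor or tool can mint one locally.
new_id = uuid.uuid4()
print(f"ref:uuid={new_id}")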

1 Like

There is also

They propose somewhat semantic IDs for POIs, which encode the type of POI and its geocoordinates.

It started in October 2020 with backing from Esri, SafeGraph, and Carto, to name the most prominent backers.

Has anyone heard about this getting adopted beyond the founding companies? E.g. the Twitter account has been inactive since October 2021.

1 Like

EDIT: I initially used the term “maintenance”/“maintaining” for different ideas. Placekey are the “maintainers” (i.e. gatekeepers) of the Placekey IDs in the sense that they decide which IDs exist and what object(s) those IDs identify. The other kind of maintenance is keeping the IDs in OSM updated. So I updated the rest of the post to use “gatekeeping” and “maintaining”, respectively, for these two ideas as I initially meant them.


From a quick read of their homepage, I think they serve as the gatekeepers of IDs. Is that something acceptable for OSM? I’m not sure… See for example this excerpt (emphasis mine):

If a specific place has a location name (like “Central Park”) and is already included in the Placekey reference datasets

In more practical terms, how would this be used? This question applies also to @fititnt’s previous post.

Other than that, Placekey doesn’t seem to be a suitable ID system, because someone outside the OSM infrastructure has to be the gatekeeper, and it’s not stable (see next).


If I understood correctly, it can be used for locations as well as amenities/businesses/etc.

So let’s assume there’s a feature X at place Y. Assume that X is the set of tags that represent the type of feature (e.g. restaurant, bakery, clothes shop, …), its name, and other things that may distinguish it from other features around it, and that Y is the set of addr:* tags (complete and correct) and/or the geospatial coordinates.

From X & Y we can get a Placekey ID (say, X@Y). Do we add placekey=X@Y? If yes, who’s gonna maintain it? How can we enforce it being added? How can we prevent it being removed?

What happens if said restaurant changes places? Now, instead of at Y, it’s at Z. The placekey changes because it has the place encoded in it: X@Z is the new ID. If we’re supposed to add placekey, we have to change it to placekey=X@Z, right? Hence it’s not stable.

And what happens if, for example, the name changes (by mistake, because the name really changed, or whatever else)? W is now the set of tags representing the type of feature. The Placekey ID is now W@Z. Not stable, again.
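
This instability is inherent to any scheme that derives the identifier from the feature’s attributes and location; a tiny illustration (the hashing scheme here is a made-up stand-in, not Placekey’s actual algorithm):

import hashlib

def derived_id(kind: str, name: str, lat: float, lon: float) -> str:
    # encode attributes + location into the identifier itself
    raw = f"{kind}|{name}|{lat:.5f}|{lon:.5f}"
    return hashlib.sha1(raw.encode()).hexdigest()[:12]

original = derived_id("restaurant", "Trattoria Y", 63.82547, 20.26305)
moved    = derived_id("restaurant", "Trattoria Y", 63.82611, 20.26010)
renamed  = derived_id("restaurant", "Trattoria Z", 63.82611, 20.26010)
assert original != moved  # the feature moved: new "identifier"
assert moved != renamed   # the name changed: new "identifier" again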


In short: whatever stable unique ID system we want, whether it already exists or not, it can’t depend on geospatial location; it can’t depend on small differences in tagging; at the same time, it mustn’t depend totally on the features not changing at all; and ideally it doesn’t need gatekeeping.

4 Likes

Thanks for pointing this out. Hm. My idea above both adds complexity and requires maintenance. But it also adds stability in the form of the entities.
A big tech player can easily check via the entities whether a feature is linked or not.
Say they want all POIs for current restaurants in Umeå:
They extract all business slots which are currently linked to an element in OSM and filter for the restaurants.
The advantage of that is that, in the best of worlds, an experienced local mapper has made sure that the entities are updated and reflect the current status on the ground.
How will we incentivize a local community to keep the entities updated?
I don’t know.
Maybe a bot can warn the local community once something gets changed and is not reflected in the entities?
Say a new user adds a new restaurant where there is no business slot?
During the holidays, I would like to vet all restaurants in Umeå that have not been edited in the last six months. How do I do that easily?
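
One way to do that today is to pull the restaurants with metadata from Overpass and filter by last-edit timestamp locally; an untested sketch:

import datetime
import requests

QUERY = """
[out:json][timeout:60];
area[name="Umeå"]->.a;
nwr["amenity"="restaurant"](area.a);
out meta center;
"""

resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": QUERY}, timeout=90)
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=180)

for el in resp.json()["elements"]:
    edited = datetime.datetime.fromisoformat(el["timestamp"].replace("Z", "+00:00"))
    if edited < cutoff:  # untouched for six months: candidate for a resurvey
        print(el["type"], el["id"], el.get("tags", {}).get("name"))
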
Say I want to collaborate on the task with others. Should I create a new MapRoulette task?

2 Likes