Persistent and stable identifiers

A bonus to my suggestion is that both iD and JOSM support interacting with the Wikibase API, if I understood correctly.
Also, the Wikibase API is governed by a stable API policy, which is really nice for developers.

1 Like

I have some doubts that this is adding lots of value. Even breaking changes to stable parts of the API only require a four-week (!) announcement period.


Oh, I haven’t actually read the policy :sweat_smile:
At least they have one so we know what to expect :grinning:
It also seems to have been quite stable over the time I have been working with it.
I mostly use libraries to interface with the API, but looking at the issues and release history of e.g. GitHub - LeMyst/WikibaseIntegrator (a Python module to manipulate data on a Wikibase instance, like Wikidata, through the MediaWiki Wikibase API and the Wikibase SPARQL endpoint), I don’t see a lot of changes caused by API breakage.

I made this proposal a while ago, looking to solve this issue: Proposed features/Relation:feature - OpenStreetMap Wiki. See the discussion page, though, for feedback on it.


Personally, I can see both the benefits and drawbacks of OpenStreetMap having persistent and stable identifiers. On the one hand, it would be nice to “carry over” POIs for objects that have moved, for instance businesses, without having to literally drag them for miles across the map window. On the other hand, I think it would just add an extra layer of complexity to something that is already hard enough for most people to parse as it is.

As a side to that, IMO persistent and stable identifiers are only as useful as the project implementing them is mature. A lot of OpenStreetMap is rather half-baked, and the database clearly isn’t at a mature point yet. Really, it’s debatable whether it ever will be. Whatever the case with specific parts of it, like Europe being “mostly” complete and stable, persistent identifiers would only serve an extremely small percentage of map consumers in most cases at this point.

Expecting a worldwide implementation of them when most places are essentially empty is kind of unrealistic, though. That’s one of the things I like about how we currently use Wikidata IDs: they allow local communities to scale the unique identifiers up or down as they see fit, when they are ready to implement them, rather than adopting them just because people in a country where the map is complete think they are great and that everyone else should use them. Although, conversely, persistent and stable identifiers across the board, implemented in OpenStreetMap itself instead of being outsourced to a third party, are clearly the future. It’s just not quite there yet.

Many Wikidata items suffer from dual-tagging just as many OSM features do, though I think there’s a general awareness that dual-tagging should be reduced over time. Wikidata’s built-in validator encourages contributors to clean up constraint violations often caused by dual-tagging.

Splits, merges, and other geometry changes do complicate any kind of persistent referencing, and that’s before getting into the linear referencing required for routing. I’m reminded of how, years ago, my edits to smooth out curved roads by adding new nodes would cause Mapbox’s traffic maps to show a banded effect along the road because the traffic data had been associated with segments between nodes that were no longer adjacent.

I guess you weren’t thinking of OpenHistoricalMap when you called for a separate database. :grin: But you can perform these exact queries using OHM’s Overpass API endpoint, provided that the restaurants have been mapped and tagged with start_date and end_date. For example, this query shows the bank branches that closed in my hometown since 2000:

out geom;

Even so, a given OHM way ID isn’t strictly guaranteed to be permanent. The way might get revised or deleted as mappers learn more about the local history. This is not uncommon among datasets. For example, U.S. mappers like to think of a GNIS Feature ID as a permanent, unique ID, but these IDs can be fungible too under certain circumstances.

(Most of the data in OHM is CC0-licensed, by the way.)

A similar problem even arises when mapping things that aren’t changing. So far, just about every proposed solution for relating the elements of a street together has run up against the challenge of delimiting a street: Is “West Market Street” the same street as “East Market Street”? What if a street briefly becomes a pedestrian mall? What if the physical roadway curves to the left but the name continues onto a side street to the right?

I think this uncertainty is partly what has motivated a non-relation-based approach to relating sidewalks to streets, but at the cost of some indirection, duplicating information about the intended target on the related ways:

By analogy, in the absence of robust linking between OSM and other datasets such as Wikidata, there’s bound to be some less formal duplication. Population figures, translated names, etymologies, and owners can end up in both the OSM silo and the Wikidata silo.

Apart from losing a Michelin star because of the new chef, these are the cases where OpenHistoricalMap would duplicate the feature. When in doubt, it’s a distinct feature. However, this adds another headache on top of OSM’s version of the stable identifier problem. A chronology relation can tie together the time-separated copies of the feature, and this relation can serve as a relatively stable entity for external linking. If someone wants to map the ship of Theseus, well, that’s fine: there can be multiple chronology relations for different viewpoints about when it became a new ship.


I believe the way to make this feasible is a leveled approach. I’ve been doing some tests, and even if most things could later be imported automatically, it is still necessary to have the concepts that explain how they interact with each other.

So, it would mean starting with the world/continents/countries (e.g. the UN M49 codes), where far more humans are willing to use it. An example of this logic, from world level down to a country:

  • 1 World
    • 2 Africa
      • 14 Eastern Africa
        • 508 Mozambique

If something looks up a rule for Mozambique but none exists, it would go up the hierarchy until it defaults to whatever rule is defined at world level. Beyond this point, it becomes harder to decide how to prioritize levels (but in general, we would start with generic feature types, like “administrative boundary” vs. “road” vs. “river”, before more specialized types like a residential road).
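This fallback logic can be sketched in a few lines of Python (the rule names and data model here are hypothetical, just to illustrate the lookup, not an existing OSM API):

```python
# Walk up the UN M49 hierarchy until some level defines the rule.
M49_PARENT = {
    "508": "14",  # Mozambique -> Eastern Africa
    "14": "2",    # Eastern Africa -> Africa
    "2": "1",     # Africa -> World
}

RULES = {
    "1": {"max_speed_unit": "km/h"},           # world-level default
    "14": {"address_format": "street-first"},  # hypothetical regional override
}

def resolve_rule(region: str, rule_name: str):
    """Return the nearest ancestor's value for rule_name, or None."""
    current = region
    while current is not None:
        value = RULES.get(current, {}).get(rule_name)
        if value is not None:
            return value
        current = M49_PARENT.get(current)  # one level up; None at the root
    return None

print(resolve_rule("508", "address_format"))  # inherited from Eastern Africa
print(resolve_rule("508", "max_speed_unit"))  # falls back to the world default
```

The point is that a rule only needs to be stated once, at the most general level where it applies, and everything below inherits it by default.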

This not only keeps the effort focused, even if edited by hand, but also means that massive amounts of existing information (such as validation rules that are either global or country-level, but need to be attached to something) already have a default.

While we could use Wikidata for labels, for structural things that could break validation or reasoning, we need to be realistic and expect to do it one by one. Even if importing data could eventually be automated (as is the case for some tags), it is not viable to rely on another authority control.

However, even if we focus initially on both very important concepts (like the countries) and structural concepts, we can still have a few examples just to know upfront what kinds of encoding we would need. I might explain examples for this later, but they’re likely to be represented even as SPARQL/Overpass queries that return what they mean, like the “Footways in East Anglia” example on the wiki page “Relations are not categories”.

Edit: typo and link on Authority control

1 Like

Yeah, this is an unsolved philosophical problem in the general case. We first have to solve the philosophical problem for OSM, and only then the technical problems that come along.

Hmm, kind of? If I understood correctly, you just changed what is actually being uniquely identified. Instead of uniquely identifying an element/object of the real world, you suggest creating a “proxy” element/object that will itself be used as the unique identifier. This doesn’t solve the problem, because said element can be modified/deleted/etc. for any reason (as you’ve already pointed out).

I’m specifically referring to



This wouldn’t be as good as what @pangoSE suggested, I think, because you can only have the start_date + end_date of the currently mapped feature.

Also interesting! With this I guess the more advanced queries would be possible. But there’s still the problem of this proxy element/object going missing for some reason.


Potential approach for amenities like the initial example: use of RFC 4122 UUIDs as identifiers

Based on my inferences here, I suspect the approach we will see in the next months from that other foundation, the hyped “Global Entity Reference System”, might be not much more than simply… long random UUIDs that can be generated by any tool (since they’re so random they’re unlikely to clash unless the user wants them to).

I’m personally more interested in getting permanent place identifiers for the big, more reusable things (still drafting it), BUT, yes, UUIDs can be useful as keys for conflation of data that comes from outside (which is why I’m already commenting on this here, so others might get interested in exploring this approach). The internal persistent ID on OpenStreetMap (as on Wikidata, where it is prefixed with Q) could still be numeric (and keep all the advantages of versioning), but that’s a later discussion; something like UUIDs might be relevant especially for things like amenities and points of interest. This could even be implemented today with tag values, and the rest becomes syntactic sugar (e.g. something that just displays metadata, without geometries, based purely on a new kind of key, or a namespaced key for complex cases).
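As a quick illustration of why no coordination is needed: any tool can mint an RFC 4122 version-4 UUID locally (the tag name ref:uuid below is hypothetical, not an established OSM key):

```python
import uuid

# An RFC 4122 version-4 UUID needs no central authority to mint;
# 122 random bits make accidental collisions practically impossible.
poi_id = uuid.uuid4()
print(f"ref:uuid={poi_id}")

# The version/variant fields are fixed by RFC 4122:
assert poi_id.version == 4
```

Two mappers (or two companies) generating IDs independently will essentially never clash, which is the whole appeal for distributed conflation keys.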

1 Like

There is also

They propose to use somewhat semantic IDs for POIs which encode the type of POI and the geocoordinates.

It started in October 2020 with backing from Esri, SafeGraph, and CARTO, to name the most prominent backers.

Has anyone heard about this getting adopted beyond the founding companies? E.g. the Twitter account has been inactive since October 2021.

1 Like

EDIT: I initially used the term “maintenance”/“maintaining” for two different ideas. Placekey are the “maintainers” (i.e. gatekeepers) of the Placekey IDs, in the sense that they decide which IDs exist and what object(s) those IDs identify. The other kind of maintenance is keeping the IDs in OSM updated. So I updated the rest of the post to use “gatekeeping” and “maintaining”, respectively, for these two ideas as I originally meant them.

From a quick read of their homepage, I think they serve as the gatekeepers of IDs. Is that something acceptable for OSM? I’m not sure… See for example this excerpt (emphasis mine):

If a specific place has a location name (like “Central Park”) and is already included in the Placekey reference datasets

In more practical terms, how would this be used? This question applies also to @fititnt’s previous post.

Other than that, Placekey doesn’t seem to be a suitable ID system because someone other than the OSM infrastructure has to be the gatekeeper, and it’s not stable (see next).

If I understood correctly, it can be used for locations as well as amenities/businesses/etc.

So let’s assume there’s a feature X at place Y, where X is the set of tags that represent the type of feature (e.g. restaurant, bakery, clothes shop, …), its name, and other things that may distinguish it from other features around it, and Y is the set of addr:* tags (complete and correct) and/or the geospatial coordinates.

From X & Y we can get a placekey ID (say, X@Y). Do we add a placekey=X@Y? If yes, who’s gonna maintain it? How can we enforce it being added? How can we prevent it being removed?

What happens if said restaurant changes places? Now, instead of at Y, it’s at Z. The placekey changes because it has the place encoded in it: X@Z is the new ID. If we’re supposed to add placekey, we have to change it to placekey=X@Z, right? Hence it’s not stable.

And what happens if, for example, the name changes (by mistake, the name really changed, or whatever else)? W is now the set of tags representing the type of feature. The Placekey ID is now W@Z. Not stable, again.

In short: whatever stable unique ID system we want, whether it already exists or not, it can’t depend on geospatial location; it can’t depend on small differences in tagging; at the same time, it mustn’t depend totally on the features not changing at all; and ideally it doesn’t need gatekeeping.
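To make the instability concrete, here’s a toy sketch: any scheme that derives an ID from the feature’s tags plus its location (which is the spirit of Placekey-style IDs, though this is not Placekey’s actual algorithm) changes whenever either input changes:

```python
import hashlib

def derived_id(tags: dict, lat: float, lon: float) -> str:
    """Toy stand-in for any scheme that encodes type + location into the ID."""
    material = repr(sorted(tags.items())) + f"@{lat:.5f},{lon:.5f}"
    return hashlib.sha256(material.encode()).hexdigest()[:12]

x_at_y = derived_id({"amenity": "restaurant", "name": "X"}, 41.15, -8.61)
x_at_z = derived_id({"amenity": "restaurant", "name": "X"}, 41.16, -8.61)  # moved
w_at_z = derived_id({"amenity": "restaurant", "name": "W"}, 41.16, -8.61)  # renamed

# Same real-world business, three different "stable" IDs:
assert len({x_at_y, x_at_z, w_at_z}) == 3
```

The hash and coordinates here are invented for illustration; the only point is that an ID which is a function of mutable inputs is itself mutable.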


Thanks for pointing this out. Hm. My idea above both adds complexity and requires maintenance. But it also adds stability in the form of the entities.
A big tech player can easily check via the entities whether a feature is linked or not.
Say they want all POIs for current restaurants in Umeå:
They extract all business slots which are currently linked to an element in OSM and filter out all the restaurants.
The advantage of that is that, in the best of worlds, an experienced local mapper has made sure that the entities are updated and reflect the current status on the ground.
How will we incentivize a local community to keep the entities updated?
I don’t know.
Maybe a bot can warn the local community once something gets changed and is not reflected in the entities?
Say a new user adds a new restaurant where there is no business slot?
I would like to vet, during the holidays, all restaurants in Umeå that have not been edited in the last six months. How do I do that easily?
Say I want to collaborate on the task with others. Should I create a new MapRoulette task?


I conflated two different ideas under the single term “maintenance”/“maintaining” in that post. I updated it now to use distinct terms for the two ideas.

This is a really hard problem… The more I think about it the more I think it’s impossible to solve lol

I think it’s computationally impossible to identify a single feature based on its tags. There’s no fixed schema; the tags as described on the Wiki are very much open to interpretation; and what somebody perceives a thing to be when surveying is also open to interpretation – what might be a shop=bakery for some may be an amenity=cafe for others[1]. This is too arbitrary, and computers don’t deal with arbitrary, they deal with definitive. They would need a strict set of rules to decide whether a certain feature is still the same or not, and I think no such set of rules exists.

Note #1

In Portugal almost every bakery is a cafe too as described on the Wiki, so whether a certain bakery is mapped as a bakery or a cafe is basically random.

Alternative: We include official government IDs in the tags (a new ref:*). For example, in Portugal every restaurant/cafe/shop/etc. has a single business number. In that case, we have a pretty good stable ID. If the shop closes or changes owners or whatnot, the ID changes; OTOH if it only moves to a different place, it’ll keep the same ID, and that’s what we want.

The problem with this is that it’s already hard to find the opening hours[2], let alone an ID number that is usually none of the customer’s business.

Note #2

Seriously, almost seems like some businesses around here don’t show the opening hours on purpose… it’s really annoying.

Alternative: We collectively maintain, and serve as gatekeepers for, a new external database of IDs (something like Wikidata but without the graph part probably). The obvious problem: it’s hard enough to keep OSM up to date, we just don’t have enough hands. How do you sell this?

Possible pitch:

Person A: We need this new DB to get stable IDs. Stable IDs are cool because …
Person B: How are you gonna maintain that and OSM, if you can’t even maintain OSM alone?

And here we have a choice. Is it more important to have OSM (the map) as up to date as possible but without stable IDs, or is it more important to have OSM more out of date but with stable IDs? This may be different for different people, but for me the answer is as easy as it can get: OSM being more up to date is much more important. And for the OSM project it makes sense too, OSM must put itself first, not another external project. (To make it as clear as possible: note that this comment/opinion is about this alternative scenario only, not my opinion on stable IDs in general)

It’s plausible a big enough company with enough money can make some half-assed “solution” to the problem even on their own. But I couldn’t care less for companies. I’m much more interested in why you are interested in stable IDs! :slight_smile: This may be different from “why you think they are/can be useful”.

IIRC from an earlier post of yours, those “slots” would be areas that represented business places? For example the space of a shop in a mall. Is that it or did I misunderstand?

If that’s the case, then I don’t understand how it solves the problem. Let’s say there’s a Starbucks in a slot A, and that that same Starbucks changes places to slot B. The slot changed, so how do you know if they’re the same Starbucks or not?

And how would you deal with semi-persistent shops? For example, a trailer hot dog stand or something, that moves to different places every few months or weeks. There probably is no slot where they stop. Do you create a slot on each place? Do we create a slot and move it with the trailer stand itself?

Yeah, that’s the real problem of it all.[3] We could have a better/more complete/more up to date map with more mappers, but how do we get more mappers?

Note #3

And like I said above, I think it’s more important/useful to have two amenities mapped with no stable IDs, than only one of them with a stable ID.

You can use Overpass to get features that have been edited between two dates:

out body;

Then you can extract the results and load them in an editor. Every Door has a similar feature. To collaborate you can use one of the tasking managers, yeah.
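In case it helps, here’s a sketch of how such a query could be built and parameterized from Python (the bounding box around Umeå and the dates are invented for illustration; Overpass QL’s `(changed:...)` filter selects elements edited between two timestamps):

```python
def overpass_changed_query(bbox: str, start: str, end: str) -> str:
    """Build an Overpass QL query for restaurants edited between two dates."""
    return (
        f"[out:json][bbox:{bbox}];\n"
        f'nwr["amenity"="restaurant"](changed:"{start}","{end}");\n'
        "out body;"
    )

# Hypothetical bounding box and an arbitrary six-month window:
query = overpass_changed_query(
    "63.7,20.1,63.9,20.4",
    "2023-01-01T00:00:00Z",
    "2023-07-01T00:00:00Z",
)
print(query)
# The string can then be POSTed to an Overpass endpoint,
# e.g. with urllib.request or the `requests` library.
```

To find restaurants *not* edited recently (your original question), you would instead query everything and subtract the recently changed set, or compare the `timestamp` metadata in the results.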

Maybe I’m missing the point and you already know all this because you seem to be an experienced mapper. ^^’


But is this the intention of a stable ID?
If we stick with your restaurant: if they stop cooking Italian pasta and start doing Greek cuisine, is it still the “same” restaurant? For most end users I doubt it; for the government it might not change. The same goes the other way: the restaurant is passed on to the next generation. For the government it might be a new restaurant, but for most end users it’s still the same.

IMHO, something like ONE stable ID is not possible.
Of course, as an external gatekeeper it would be possible: you can define your own criteria and maintain an object ID based on them.

1 Like

This is, by definition, impossible. And a hint is that a popular name for those who maintain such IDs for others to use is authority control.

Even features that seem 100% procedural, say latitude and longitude in a popular reference system like WGS84, still need to be maintained: the algorithm needs to be published, and those who create hardware and software need to understand how to translate it to other forms. The “maintaining” might be a book in a library that’s still usable, but it needs to be maintained nonetheless. This is less clear with things we take for granted, like what “0 1 2 3 4 5 6 7 8 9” means, but even datetime formats have conventions.

By the same logic, IDs created by direct procedural translation of another thing also need to be maintained (even if that just means publishing the procedure and incentivizing people to use it over alternatives).

Then we have opaque identifiers, which are considered the best for the very long term, because new people have less incentive to change them without actual functional issues. The ARK documentation discusses why they are against even having the organization as part of the prefix (as happens with some old DOIs): organizations might eventually change names, and then try to change the entire prefix or… give up keeping records of the old ones :expressionless:

Note that even centuries ago, when very few people (often the elite, sponsored by kings) were able to write books, very few of those books survived, as people could decide from the cover that a book didn’t seem worthy. Much more content is produced today, and the role of authority control is very, very important. Most people still complain about OpenStreetMap’s internal representation without being aware that it is closer to a library catalog, with full history even for individual nodes, than to disposable geometries that would be mere layers. Note how upset not just the Data Working Group, but long-time contributors become when they see people deleting content: their reaction is similar to librarians seeing someone burning books.

However, unlike DOIs (which require more effort to deserve one, and a strict set of minimal metadata), new kinds of IDs representing concepts, such as the one proposed for OpenStreetMap, may from time to time be created by mistake and in massive quantities. If duplicated, the duplicates need to become aliases pointing to the recommended reference. And (as is done on Wikidata) things that are poorly defined might be worth deleting, even without a user request. But even this procedure needs to be documented, so people have less to fear.

My argument here is the following: the idea of identifiers for places necessarily means authority control. Think centuries into the future. However, the very nature of being geospatial could easily allow the equivalent of concept identifiers on OpenStreetMap to be kept up to date by bots and companies far more than happens on Wikidata, after their initial setup by humans (or by the first organization that starts the definition, using its own identifier as one of the properties), because inferences based on location are easier to resolve than on Wikidata (since most of the time OpenStreetMap only accepts things that exist in space and time anyway).

So, replying to @o_andras: while the idea of UUIDs, as used by Overture, is in fact perfect for a distributed concept, if we assume OpenStreetMap acts as the authority control (which is in effect how it is regarded, given how well documented it is, including the full history of every node), we would still have something like a serial incremental number, at least for the concepts that have some level of notability. But even for things that would be heavily automated (think of a few big collaborators adding data), my next point makes me think that no single big player could create an identifier alone without others agreeing it is relevant.

(Hypothesis) A strategy to deal with both notability and long-term survival of place concepts: require baseline standards on how interlinked the definition is

I had one idea about how to deal with the persistence of identifiers for things that may not have sufficient information and are not clearly likely to be notable (unlike, say, administrative boundaries): we enforce minimal metadata, even more strictly than Wikidata, maybe even with some delay time (like weeks). DOI, while allowing even those authorized to issue codes to have private uses, does this for example for what’s expected to be used in public:

So, while not necessarily as long as a UUID 4, generic amenities (whose relevance is unclear) could get some sort of identifier, but not the same kind as the shorter ones; otherwise we would run far more easily into the issues @SimonPoole pointed out. The types of places of interest likely to attract more spam (to the point of people being paid to add metadata to them, as happens on Wikipedia) would have stricter requirements, but always focused on what makes them well defined for interlinking (avoiding, for example, ephemeral data that most users would edit anyway, like what a shop sells).

A “DOI approach” would automatically mean that some human needs to decide whether a place deserves a code (even if this is somewhat algorithmic, as on Wikipedia today, where after some time a page may become a Wikidata Q item), and even then, much, much more metadata. This would mean that, when deciding whether something can have an identifier, we take into account whether users have added data such as the inception date, the etymology of the name, whether the shop is part of some brand, etc., etc.: things that are less likely to change. Sadly, we cannot add personal information to OpenStreetMap, but if the shop had a famous founder present in another authority control (like a Wikidata Q item), that person could be added as the founder.
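As a toy sketch of this gatekeeping idea (the set of “durable” tags and the threshold here are hypothetical, not an agreed-upon scheme), eligibility for a first-class identifier could be a simple checklist over metadata that is unlikely to change:

```python
# Hypothetical checklist: tags considered "durable" enough to anchor
# a first-class persistent identifier. Not an agreed OSM scheme.
DURABLE_KEYS = {"start_date", "name:etymology:wikidata", "brand:wikidata", "operator"}
REQUIRED_COUNT = 2  # arbitrary threshold, for illustration only

def eligible_for_persistent_id(tags: dict) -> bool:
    """A place qualifies only if enough durable metadata is present."""
    return len(DURABLE_KEYS & tags.keys()) >= REQUIRED_COUNT

thin = {"amenity": "cafe", "name": "Café X"}            # too little metadata
rich = {"amenity": "cafe", "start_date": "1927",
        "brand:wikidata": "Q37158"}                     # well-described place
assert not eligible_for_persistent_id(thin)
assert eligible_for_persistent_id(rich)
```

The design choice mirrors DOI: the identifier is only minted once the record is well-described enough to survive, rather than minted first and described later.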

The idea of enforcing even stricter metadata for the kind of shorter identifiers that administrative boundaries could have does not prevent these places from also having other identifiers, more algorithmic ones or ones issued by user/company request. We can do both. But the “first-class” persistent identifiers (which persist even if places cease to exist) should, in my opinion, only be allowed if at least in theory they could eventually be cared for.

Since several people commented on the challenge of knowing whether something has changed, a good, safe approach is to intentionally make it hard to give identifiers to the first wave of amenities; I mean not “Wikidata-level notability” but “Wikipedia-level notability” (i.e. already famous). This alone could allow sufficient time to think (while already seeing how the encoding works in practice), so we could start to define the minimum metadata that both humans and everyone else (like companies) would need to provide.

A built-in “notification system” might be for OpenStreetMap what interlinking equivalent pages between languages was for Wikipedia.

Starting with far fewer items also helps to perfect the tools that trigger notifications based on changes. Today, for example, some people already watch for changes to pages on the wiki, and since people on OpenStreetMap are more demanding than on Wikipedia/Wikidata, if we create such items, people will get interested in knowing what changed since their last visit.

Also, there is an obvious advantage in having another way to query the concepts, though today this would still be somewhat possible with advanced queries. But a way for people to learn about changes customized to their needs does not exist. And if someone cares about a geographic region (like their city), they could watch the concept for that region, and it would be easier for anyone to create apps that filter updates for that person.

Some people only log in to OpenStreetMap to fix some place where they found an error. I would say that if we reach the point of knowing how to trigger notifications, new users who changed a point of interest (not something like just a road) could be notified at least once when someone else also updates that point of interest. For power users this might not be great, but new users would likely get more engaged with the project.

Disclaimer: despite what I’m saying about triggering notifications, I do not have a full proof of concept for this, and I know people complain that Wikibase does not handle it very well. But if I had to build one feature to convince everyone that it is worth going ahead, change notifications (with an option to ignore some types of minor modifications) seem to be what would deliver the greatest impact. Note that, compared with Wikipedia, contributors on OpenStreetMap (apart from the wiki itself) are not notified about changes at all.

(it’s unclear to me whether you’re talking about something like “” here or some database that someone in OSM has control over)

(assuming you’re talking about “something that people in OSM have control over”):
If X and Y are just OSM tags, then storing something derived from other OSM tags inside OSM doesn’t make any sense. Of course, you can store it outside, and use it to check when something in OSM has changed.

Like with everything else in OSM, whoever volunteers to do that.

If it’s in a list outside OSM it’s not a problem - whoever maintains that list is in charge.

(if you’re talking about or a similar externally-maintained database)
That’s just another primary key in another third-party database which we have no control over, and there are lots of those in OSM already (Wikipedia, Wikidata, FHRS, etc.). They can be useful - for example should change hands, the FHRS ID for it will likely change too. Lots of tools are available to keep track of changes. Subject to licensing, you could do something similar with other third-party data.

Any non-open data from a third-party may be “here today, gone tomorrow” - there are lots of defunct startups in this area, so unlike open data I wouldn’t rely on it as a long-term solution.


Okay, I’ll throw out an unpopular opinion: this is all academic. The majority of nodes do not get deleted and replaced, and the ones that can be logically linked to Wikidata even less so. You all are creating clever new schemes to solve something that doesn’t need solving.

Shoot me.


My experience is that that is exactly correct. I do various QA on bits of OSM (making sure certain sorts of relations are still valid, filling in gaps in route relations where they have been introduced by accident, that sort of thing). I’ve been doing this for a couple of years now and all I’ve seen is that occasionally a cycle route will be split into a few pieces when it gets unwieldy, and perhaps a super-relation will get created. If you look here you can see that there’s maybe been 1 change per month of the currently 1600+ trails checked (UK+IE).

Elsewhere, perhaps nodes for POIs might get replaced by polygons, but that’s easy to spot too.


I think I share the view that what we’re able to model in Wikidata probably suffices for most of what we ourselves intend to do with stable identifiers. However, the subtext behind this thread is that, apparently, some OSM data consumers have found it so important to fashion an identifier-based conflation system for OSM data that Overture Maps is touting it as one of their selling points. I guess this is an exercise in seeing if we could bring some of that functionality in-house.

Stable identifiers aren’t really about tracking the history of a real-world feature over time. They’re more about being able to hang some extra metadata off of an OSM feature with some confidence that it refers to the same real-world feature. Change tracking only becomes relevant to the extent that either OSM or the external data source can become outdated.

I don’t have much experience with authority control on POIs, but I’ve seen that linear referencing schemes naturally arise from trying to tie static road data to historic or real-time traffic and incident data. Even so, at some point, the best stable identifier scheme compares poorly to obtaining fresh data and purging stale data. Users don’t care whether you managed to match the traffic jam to exactly the right spot as much as they want the colors on the map to be current. Likewise, restaurant reviews from a decade ago aren’t necessarily trustworthy anymore, even if the cuisine and owner stay the same.