It’s not exactly “squatting”. You should expect ref= on different features to mean different things.
It’s flawed naming. As you have mentioned 1 and ref:gkz=, country codes a la ref:AT:gkz= are used to distinguish between different databases and identification systems, both within a country and between different countries. Otherwise, nat_ref= and reg_ref= could simply be used.
Given the way OSM tags evolve, often quite informally, it may be more like “somebody had a reference they wanted to map for a particular set of objects, and chose a tag that seemed like a good idea at the time”.
E.g. for ref:DE-BY, it seems to mainly apply to individual trees and to protected areas. I haven’t looked into the history, but maybe somebody thought “it would be useful to tag these trees with the official reference catalogued by the relevant Bayern authority, which may be different from how they are labelled locally”. Then somebody wanted to do something similar with protected areas and used the same key. Or maybe trees and protected areas are referenced in the same catalogue.
The fact that there are only 2032 values on 2107 objects on that key seems to be mainly because LB-01066 is used on a number of small dispersed sites around Nürnberg - maybe these are treated as a single site in whatever catalogue is referred to.
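To make the values-vs-objects point concrete, here is a minimal Python sketch that groups objects by their ref:DE-BY value and lists the values shared by more than one object. It assumes the data has already been extracted as (osm_id, tags) pairs (for example from an Overpass or osmium export); the ids, tags and the XX-0001 value are invented, and only LB-01066 comes from the discussion.

```python
from collections import defaultdict

def value_usage(objects, key):
    """Map each value of `key` to the list of object ids carrying it.

    `objects` is an iterable of (osm_id, tags) pairs; objects without
    the key are skipped.
    """
    by_value = defaultdict(list)
    for osm_id, tags in objects:
        if key in tags:
            by_value[tags[key]].append(osm_id)
    return by_value

def summarize(objects, key):
    """Return (distinct values, objects with the key, values used more than once)."""
    by_value = value_usage(objects, key)
    n_objects = sum(len(ids) for ids in by_value.values())
    shared = {value: ids for value, ids in by_value.items() if len(ids) > 1}
    return len(by_value), n_objects, shared

# Hypothetical sample: ids, tags and XX-0001 are invented for illustration.
sample = [
    ("w1", {"boundary": "protected_area", "ref:DE-BY": "LB-01066"}),
    ("w2", {"boundary": "protected_area", "ref:DE-BY": "LB-01066"}),
    ("n3", {"natural": "tree", "ref:DE-BY": "XX-0001"}),
    ("n4", {"natural": "tree"}),
]

print(summarize(sample, "ref:DE-BY"))  # (2, 3, {'LB-01066': ['w1', 'w2']})
```

A distinct-value count below the object count is not an error by itself; it only means some values are shared, and whether that is legitimate depends on the catalogue behind the key.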
Perhaps it’s not intended, but “squatting” seems like a strange choice of word as there is nothing intentional here. I don’t remember ever seeing documentation about how exactly to choose ref:* tags. Also the use of country codes seems to have only become popular around 2011 - e.g. ref:ine is widely used in Spain to refer to the national statistics institute, and perhaps would be ref:ES:ine if it was invented today.
Finally, when looking at small numbers of cases, there are always likely to be some random input errors (perhaps by choosing the wrong tag from a drop down box) or misunderstandings. One of the most used of these tags is ref:bag which is specific to the Netherlands. If you look at the global map for that tag in taginfo you can see it pops up in other countries, e.g. I found one case in England where it was used to map the number of each hole on a golf course! Your ref:gkz example may be one of these issues with no wider significance.
Is there a particular problem you are trying to solve?
One part is UID storage cleanup: I would like the tags that store UIDs to be unique in OSM, and to achieve that, the key names should be less ambiguous.
The practice of using “*” suffixes in ref keys that are known to be ambiguous, especially ones as confusing as DE and DE-XX, goes against that goal.
“ref:DE-XX=”: I haven’t seen such usage of ISO 3166-2 codes outside “DE”, and from the current usage they seem to be mostly unique. So if that was intended, and should not be broken, it forces other ID systems to use “ref:DE-XX:morespecific=”.
“ref:DE=”: apart from “ref:FR=”, no other ISO 3166-1 alpha-2 code that I looked up (AR, AT, BO, CL, ES, HU, IT, TR, US) has more than 9 usages (most have one or zero).
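For what it’s worth, ref:CC and ref:CC-XX can at least be told apart by shape. A rough Python sketch, assuming the only conventions in play are ref:CC, ref:CC-XX and optional further :qualifier segments. It checks letter patterns only, not whether a code really exists in ISO 3166, and split_ref_key is a hypothetical helper, not an existing tool.

```python
import re

# ISO 3166-1 alpha-2 shape: two uppercase letters (shape only, not validity).
ALPHA2 = re.compile(r"[A-Z]{2}")
# ISO 3166-2 shape: alpha-2 country, hyphen, 1 to 3 alphanumeric characters.
SUBDIVISION = re.compile(r"([A-Z]{2})-([A-Z0-9]{1,3})")

def split_ref_key(key):
    """Split a ref:* key into (country, subdivision, qualifier) by shape alone."""
    parts = key.split(":")
    if parts[0] != "ref" or len(parts) < 2:
        raise ValueError(f"not a ref:* key: {key!r}")
    country = subdivision = None
    rest = parts[1:]
    match = SUBDIVISION.fullmatch(rest[0])
    if match:
        country, subdivision = match.groups()
        rest = rest[1:]
    elif ALPHA2.fullmatch(rest[0]):
        country = rest[0]
        rest = rest[1:]
    qualifier = ":".join(rest) or None  # whatever is left, e.g. "gkz"
    return country, subdivision, qualifier

for key in ("ref:DE", "ref:DE-BY", "ref:AT:gkz", "ref:bag", "ref:DE-BY:morespecific"):
    print(key, split_ref_key(key))
```

Note that lowercase keys such as ref:ine or ref:bag fall through to the qualifier, which matches their use as database names rather than country codes.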
It could be helpful to move UIDs from “ref” and unprefixed keys to “uid:*”. Currently UIDs are shown in “random” places: in the tag section of the iD-editor, in taginfo, in the wiki.
highway= + ref=1 means something different from waterway= + ref=1
The same logic extends to ref:*=1
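A sketch of what this means for a data consumer: a ref value only becomes a usable key together with the kind of feature it is attached to. In this Python example, PRIMARY_KEYS is a hypothetical, incomplete list and qualified_ref is an illustrative helper, not part of any OSM library.

```python
# Hypothetical, incomplete list of primary feature keys used for context.
PRIMARY_KEYS = ("highway", "waterway", "railway", "natural", "boundary")

def qualified_ref(tags, ref_key="ref"):
    """Return (primary_key, ref_key, value), or None if ref_key is absent.

    The first primary key found (in PRIMARY_KEYS order) supplies the
    context; if none is present, the context is None.
    """
    if ref_key not in tags:
        return None
    context = next((k for k in PRIMARY_KEYS if k in tags), None)
    return (context, ref_key, tags[ref_key])

road = {"highway": "primary", "ref": "1"}
river = {"waterway": "river", "ref": "1"}
print(qualified_ref(road))   # ('highway', 'ref', '1')
print(qualified_ref(river))  # ('waterway', 'ref', '1')
```

The same helper works for ref:* keys by passing e.g. ref_key="ref:DE-BY".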
This is not “squatting”. It’s a homonym; moreover, the former means Highway Route 1 and has nothing to do with waterways.
OSM doesn’t determine or judge whether a ref:*= value is “unique”; it is simply recorded. If you need this assessed, that is your own responsibility, in your application.
It looks like nobody has created wiki pages for these tags, so there is nothing to link to in the category page. If you are specifically interested in Germany, you could ask the German mapping community about these tags.
OSM’s ATYL (any tag you like) approach doesn’t always fit easily with theoretical principles. Of course it’s possible to get mappers to follow a more structured approach, but it’s probably better to do that by pointing to concrete advantages (for mappers, end users, or both) rather than generic ideas like “clean data”.
What kind of UIDs do you have in mind for the plain ref key? Often this holds a reference visible on the ground (like a hiking trail identifier or bus stop code) and for certain object types it makes sense for renderers to display it. Neither the mapper nor the renderer needs to know if it is a unique id in some system outside OSM, what matters is that it is useful in a local context. But I’m not sure if that is the kind of ref you have in mind.
It is not about me being “specifically interested in Germany”. Besides, I already showed that outside DE and DE-XX such usage is almost non-existent.
What are “theoretical principles”? Does ATYL allow squatting?
Findability, accessibility, interoperability, reusability: if you don’t understand any of these -abilities, please say which, after pairing them with my other statement: “Currently UIDs are shown in ‘random’ places: in the tag section of the iD-editor, in taginfo, in the wiki.”
It doesn’t matter. My full sentence was “It could be helpful to move UIDs from ‘ref’ and unprefixed keys to ‘uid:*’.” It was followed by “Currently UIDs are shown in ‘random’ places: in the tag section of the iD-editor, in taginfo, in the wiki.”
Bad for: “findability, accessibility, interoperability, reusability”.
Why would it matter what @Tobias_Conradi wants to use the data for? What if @Tobias_Conradi doesn’t want to use the data but wants to help others use it?
It doesn’t particularly forbid it. Generally it is encouraged to document and discuss tags, and there are specific requirements for data imports (which are often how new ref: tags are introduced). It’s possible that during such discussions somebody might point out potential conflicts with other tags, but there is no guarantee of that.
Many people post on this forum asking for help in understanding some aspect of OSM data. Often it is possible to help them more effectively by understanding what exactly they want to do with the data. So it’s a reasonable question.
This reminds me of the contact: schema, especially contact:phone. While some mappers value grouping all kinds of contact details in a namespace, the plain phone tag has remained more popular. Effectively data consumers always need to look at both. The same might happen for popular tags like wikipedia and wikidata that mappers are already familiar with, if the uid: versions were introduced.
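For illustration, this is roughly what “look at both” means on the consumer side. A minimal Python sketch, assuming tags are a plain dict and that multiple numbers are separated by “;” as is usual in OSM; get_phone is a hypothetical helper name.

```python
def get_phone(tags):
    """Collect phone numbers from both `phone` and `contact:phone`.

    Values may hold several numbers separated by ';', so split them,
    strip whitespace, and drop duplicates while keeping the order.
    """
    numbers = []
    for key in ("phone", "contact:phone"):
        for part in tags.get(key, "").split(";"):
            part = part.strip()
            if part and part not in numbers:
                numbers.append(part)
    return numbers

print(get_phone({"phone": "+1 555 0100", "contact:phone": "+1 555 0100;+1 555 0199"}))
# ['+1 555 0100', '+1 555 0199']
```

A parallel uid:wikidata key would presumably need the same two-key lookup alongside wikidata.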
In all seriousness, I suppose most people now think of UID as a user ID, or as one of its other technical meanings. So you should consider using something else.
Your uid:*= doesn’t work for all uses, as *:ref= is used for references that are attributes of another tag. For example, bridge:ref= is logically the ref of the bridge=. Furthermore, there can be multiple reference codes in different languages, resulting in ref:*= keys containing language codes.
Tags don’t need to “label” themselves; they are documented on the wiki. You could use a category for the unique ones, or data items to query them Wikidata-style.
As I said, ref= is simply for recording the reference code in use. It doesn’t assess or judge whether it’s unique. This also raises verifiability concerns: it would force a mapper to determine whether a code is “unique” before adding a uid:*=, which is an undue burden.
Another distinct case is *id= for artificial identifiers composed by users in OSM, which technically do not exist in reality, as in the misnamed operator:guid= and network:guid= for GTFS. This is basically how network= functions on roads.
I still don’t understand, what’s your definition of “squatting”?
On a side note for others, id is the Indonesian language code. Therefore *:id= should be avoided.
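A small Python illustration of the collision, assuming the usual name:<language> convention. LANGUAGE_CODES is a deliberately tiny, hypothetical subset of ISO 639-1, and suffix_is_language is an illustrative helper.

```python
# Deliberately tiny, hypothetical subset of ISO 639-1 codes; a real check
# would need the full list.
LANGUAGE_CODES = {"de", "en", "fr", "id", "it", "nl"}

def suffix_is_language(key):
    """True if the key has a ':'-separated last segment that is a known language code."""
    _, sep, last = key.rpartition(":")
    return bool(sep) and last in LANGUAGE_CODES

for key in ("name:id", "operator:id", "ref:bag", "id"):
    print(key, suffix_is_language(key))
```

Here operator:id is reported exactly like name:id (the Indonesian name), so by shape alone a consumer cannot tell an identifier suffix from a language suffix.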
Indeed, and that’s especially an issue if you want to apply this to wikidata, which is used more often in this attribute form than on its own. If you go for operator:uid:wikidata and so on, you lose the grouping of uid when presented alphabetically which I think is part of the idea. But if you go for uid:operator:wikidata you break expectations of operator and its attribute tags appearing together.
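The ordering trade-off can be checked mechanically. A Python sketch, assuming keys are simply listed in alphabetical order; neither uid:* nor operator:uid:* is an established tag, so both key lists are hypothetical, as are the helper names.

```python
def is_uid_key(key):
    """Hypothetical uid scheme: any key with a 'uid' segment."""
    return "uid" in key.split(":")

def is_operator_key(key):
    """operator itself plus anything describing it."""
    return "operator" in key.split(":")

def grouped(keys, pred):
    """True if all keys matching pred end up adjacent after alphabetical sorting."""
    ordered = sorted(keys)
    idx = [i for i, k in enumerate(ordered) if pred(k)]
    return not idx or idx == list(range(idx[0], idx[-1] + 1))

# Hypothetical tag sets for the two naming options.
scheme_a = ["name", "operator", "operator:wikidata", "operator:uid:wikidata", "phone", "uid:wikidata"]
scheme_b = ["name", "operator", "operator:wikidata", "uid:operator:wikidata", "phone", "uid:wikidata"]

print(grouped(scheme_a, is_uid_key), grouped(scheme_a, is_operator_key))  # False True
print(grouped(scheme_b, is_uid_key), grouped(scheme_b, is_operator_key))  # True False
```

Whichever prefix comes first wins the grouping; neither order keeps both groups together.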
First of all, we do not encourage the use of abbreviations because they can be ambiguous. For example, in English these are the possible meanings: https://en.m.wikipedia.org/wiki/UID
This is also demonstrated by your example of gkz, which in German-speaking countries is used for Gewässer (water bodies) as well as Gebäude (buildings).
The standard solution is to not abbreviate.
My other comment is about the idea that wikipedia or wikidata could be used as unique identifiers for OpenStreetMap objects or concepts: they should not be. Wikipedia and Wikidata follow their own systematics and do not relate 1:1 to OpenStreetMap concepts; rather, they are more or less similar, related objects or concepts, not identical ones. Many OpenStreetMap objects can get the same wikidata id, and an OpenStreetMap object can get several wikidata ids (or several wikipedia articles in a given language).