Key ref name squatting - example Key:ref:[ISO 3166 code], Key:gkz

An ISO 3166 code can be:

  1. an ISO 3166-1 alpha-2 code, e.g. “CH” for Switzerland, “FR” for France, “SE” for Sweden
  2. an ISO 3166-2 code, e.g. “CH-ZH” for Zurich, “DE-BY” for Bavaria, “US-TX” for Texas

Using a “Key:ref:[ISO 3166 code]” for a specific purpose either:

  1. prevents it from being used for another purpose, or
  2. doesn’t prevent it from not being unique if used for another purpose

E.g. using ref:XX-YY= on a protected area, when:

  1. the authority XX-YY uses more than one type of identifier for protected areas, e.g. ref:XX-YY=1423 and ref:XX-YY=7-487-F
  2. the authority XX-YY uses the same values on other types of items, e.g. a street identifier ref:XX-YY=1423
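The second case can be detected mechanically: group all elements carrying a given ref:* key by value and flag values that appear on more than one kind of feature. A minimal sketch in Python; the sample elements, the list of primary keys and the helper names are hypothetical, with ref:XX-YY standing in for any real key:

```python
from collections import defaultdict

# Hypothetical sample elements; "ref:XX-YY" is the placeholder key from the text above.
elements = [
    {"id": "way/1", "tags": {"boundary": "protected_area", "ref:XX-YY": "1423"}},
    {"id": "way/2", "tags": {"boundary": "protected_area", "ref:XX-YY": "7-487-F"}},
    {"id": "way/3", "tags": {"highway": "residential", "ref:XX-YY": "1423"}},
]

# Primary keys used to tell feature types apart (an assumption for this sketch).
PRIMARY_KEYS = ("boundary", "highway", "waterway", "natural", "building")


def feature_type(tags):
    """Return 'key=value' for the first primary key found, or 'unknown'."""
    for key in PRIMARY_KEYS:
        if key in tags:
            return f"{key}={tags[key]}"
    return "unknown"


def ambiguous_values(elements, ref_key):
    """Map each value of ref_key that occurs on more than one feature type to those types."""
    types_by_value = defaultdict(set)
    for element in elements:
        value = element["tags"].get(ref_key)
        if value is not None:
            types_by_value[value].add(feature_type(element["tags"]))
    return {value: sorted(types) for value, types in types_by_value.items() if len(types) > 1}


print(ambiguous_values(elements, "ref:XX-YY"))
```

The first case (several identifier types from one authority) is harder to detect this way, since all values sit on the same feature type.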

So “ref:FR” is rare (taginfo shows only six different values); instead there are several keys of the form “ref:FR:*” (685 in a taginfo search), the top three being ref:FR:FANTOIR, ref:FR:SIRET and ref:FR:SIREN.

On the other hand for protected areas in Germany squatting is widespread:

  1. ref:DE= on 223 objects with 172 values (taginfo)
  2. ref:DE-BW= on 736 objects with 723 values (taginfo)
  3. ref:DE-BY= on 2107 objects with 2032 values (taginfo)
  4. ref:DE-HE= on 898 objects with 840 values (taginfo)
  5. ref:DE-NW= on 917 objects with 916 values (taginfo); as of today, more than 100 protected areas only use “ref=” (overpass turbo query)
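Using the taginfo counts quoted above (which will have changed since), the difference between objects and distinct values gives the number of repeated uses per key. A repeated value is not necessarily a collision, since one catalogue entry may legitimately cover several OSM objects. This is only arithmetic on the quoted numbers, not a taginfo query:

```python
# Counts as quoted above: key -> (objects, distinct values)
counts = {
    "ref:DE": (223, 172),
    "ref:DE-BW": (736, 723),
    "ref:DE-BY": (2107, 2032),
    "ref:DE-HE": (898, 840),
    "ref:DE-NW": (917, 916),
}


def surplus(objects, values):
    """Number of uses beyond the first use of each distinct value."""
    return objects - values


for key, (objects, values) in counts.items():
    print(f"{key}: {surplus(objects, values)} repeated uses, "
          f"{values / objects:.1%} of uses carry a distinct value")
```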

ref:gkz (taginfo) is used mostly to store identifiers for water bodies in Germany (g for Gewässer = body of water), but it is also found on way/89087127 (“Rudolf”), storing the building identifier ATU28803809 (g for Gebäude = building), where common practice is to use ref:at:gkz (taginfo).

Questions:

  1. On which other keys does squatting exist?
  2. Do rules against the described practice exist?
It’s not exactly “squatting”. You should expect ref= on different features to mean different things.

It’s flawed naming. As you mentioned with ref:gkz=, country codes à la ref:AT:gkz= are used to distinguish between different databases and identifier systems, both within a country and between countries. Otherwise, nat_ref= and reg_ref= could simply be used.

AFAIK it doesn’t prevent it from not being unique if used for another purpose. So there is no squatting here at all.

No one promises anywhere that ref* keys will be globally unique.

The same is true, for example, of the ref and name keys.

(I am not sure why ref:FR exists at all)

What do you mean by that? How can X be different from X?

Anyway, the thread is about ref:*=, not about ref=. What is the purpose of adding “:*”?

What, in your interpretation, is the purpose of inserting :* between ref and =, to get ref:*= and use it on items that don’t use ref=?

My interpretation was, to get unique tags, i.e. each ref:*= is used with a different value.

Given the way OSM tags evolve, often quite informally, it may be more like “somebody had a reference they wanted to map for a particular set of objects, and chose a tag that seemed like a good idea at the time”.

E.g. for ref:DE-BY, it seems to mainly apply to individual trees and to protected areas. I haven’t looked into the history, but maybe somebody thought “it would be useful to tag these trees with the official reference catalogued by the relevant Bayern authority, which may be different from how they are labelled locally”. Then somebody wanted to do something similar with protected areas and used the same key. Or maybe trees and protected areas are referenced in the same catalogue.

The fact that there are only 2032 values on 2107 objects on that key seems to be mainly because LB-01066 is used on a number of small dispersed sites around Nürnberg - maybe these are treated as a single site in whatever catalogue is referred to.

Perhaps it’s not intended, but “squatting” seems like a strange choice of word as there is nothing intentional here. I don’t remember ever seeing documentation about how exactly to choose ref:* tags. Also the use of country codes seems to have only become popular around 2011 - e.g. ref:ine is widely used in Spain to refer to the national statistics institute, and perhaps would be ref:ES:ine if it was invented today.

Finally, when looking at small numbers of cases, there are always likely to be some random input errors (perhaps by choosing the wrong tag from a drop down box) or misunderstandings. One of the most used of these tags is ref:bag which is specific to the Netherlands. If you look at the global map for that tag in taginfo you can see it pops up in other countries, e.g. I found one case in England where it was used to map the number of each hole on a golf course! Your ref:gkz example may be one of these issues with no wider significance.

Is there a particular problem you are trying to solve?

https://en.wikipedia.org/wiki/FAIR_data: FAIR data is data which meets the FAIR principles of findability, accessibility, interoperability, and reusability (FAIR).

One part is UID storage cleanup: I would like the tags that store UIDs to be unique in OSM; to achieve that, the key names should be less ambiguous.

The practice of using a known-to-be-ambiguous “*” in ref keys, especially one as confusing as DE versus DE-XX, goes against that goal.

  1. “ref:DE-XX=”: I haven’t seen such usage of ISO 3166-2 codes outside “DE”. From the current usage the values seem mostly unique, so if that was intended and is to be preserved, other ID systems are forced to use “ref:DE-XX:morespecific=”
  2. “ref:DE=”: apart from “ref:FR=”, no other ISO 3166-1 alpha-2 code that I looked up (AR, AT, BO, CL, ES, HU, IT, TR, US) has more than 9 uses (most have one or zero)

At Category:Key descriptions for group "references" - OpenStreetMap Wiki documented usage of ISO 3166 codes in the form “ref:ISO 3166 code:*” is visible, the “DE” and “DE-XX” cases aren’t there.

It could be helpful to move UIDs from “ref” and unprefixed keys to “uid:*”. Currently UIDs are shown in “random” places: in the tag section of the iD-editor, in taginfo, in the wiki.

What do you want to use the data for?

OSM is usually more interested in real world use cases than theoretical advantages.

highway= + ref=1 means something different from waterway= + ref=1.
The same logic extends to ref:*=1.
This is not “squatting”. It’s a homonym; moreover, the former means Highway Route 1 and says nothing about waterways.
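The point about context can be sketched by treating a ref-style value as meaningful only together with the main feature key it is attached to. The list of main keys and the tuple shape are assumptions for illustration; OSM itself stores only the plain tags:

```python
# Main feature keys checked, in order; an assumed subset for illustration.
MAIN_KEYS = ("highway", "waterway", "railway", "boundary")


def qualified_ref(tags, ref_key="ref"):
    """Return (main key, ref key, value), or None if either part is missing."""
    if ref_key not in tags:
        return None
    for main_key in MAIN_KEYS:
        if main_key in tags:
            return (main_key, ref_key, tags[ref_key])
    return None


road = {"highway": "primary", "ref": "1"}
river = {"waterway": "river", "ref": "1"}

print(road["ref"] == river["ref"])                  # True: same raw value
print(qualified_ref(road) == qualified_ref(river))  # False: different feature context
```

The same function works unchanged for a namespaced key, e.g. `qualified_ref(tags, "ref:XX-YY")`.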

OSM doesn’t determine or judge whether a ref:*= is “unique”. It’s simply recorded. If you need this assessed, that is the job of your own application.

It looks like nobody has created wiki pages for these tags, so there is nothing to link to in the category page. If you are specifically interested in Germany, you could ask the German mapping community about these tags.

OSM’s ATYL (any tag you like) approach doesn’t always fit easily with theoretical principles. Of course it’s possible to get mappers to follow a more structured approach, but it’s probably better to do that by pointing to concrete advantages (for mappers, end users, or both) rather than generic ideas like “clean data”.

What kind of UIDs do you have in mind for the plain ref key? Often this holds a reference visible on the ground (like a hiking trail identifier or bus stop code) and for certain object types it makes sense for renderers to display it. Neither the mapper nor the renderer needs to know if it is a unique id in some system outside OSM, what matters is that it is useful in a local context. But I’m not sure if that is the kind of ref you have in mind.

No, the same logic doesn’t extend to that. To avoid repeating myself, I refer to my last question in post #4 of this thread (by Tobias_Conradi).

You misquoted, I wrote:

“At Category:Key descriptions for group “references” - OpenStreetMap Wiki documented usage of ISO 3166 codes in the form “ref:ISO 3166 code:*” is visible, the “DE” and “DE-XX” cases aren’t there.”

It is not about me being “specifically interested in Germany”. And before that, I showed that outside DE and DE-XX such usage is almost nonexistent.

What are “theoretical principles”? Does ATYL allow squatting?

findability, accessibility, interoperability, reusability - if you don’t understand any of these -abilities please say which, after pairing these with my other statement: “Currently UIDs are shown in “random” places: in the tag section of the iD-editor, in taginfo, in the wiki.”

It doesn’t matter. My full sentence was: “It could be helpful to move UIDs from “ref” and unprefixed keys to “uid:*”.” It was followed by: “Currently UIDs are shown in “random” places: in the tag section of the iD-editor, in taginfo, in the wiki.”

Bad for: “findability, accessibility, interoperability, reusability”.

Examples:
ref:at:gkz=*
de:gemeindeschluessel=*
ref:gkz=*
ref:ine=*
ref:nuts=*
ref:WDPA=*
wikidata=*
wikipedia=*

more FAIR:
uid:AT:gkz=*
uid:DE:gemeindeschluessel=*
uid:DE:gkz=*
uid:ES:ine=*
uid:EU:nuts=*
uid:WDPA=*
uid:wikidata=*
uid:wikipedia=*
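The renaming proposed in the two lists above can be written down as a plain key mapping. This is a minimal sketch of the proposal, not established OSM tagging; the helper name is hypothetical. Note that applying it blindly would relabel the Austrian building identifier on way/89087127 as uid:DE:gkz, since the mapping cannot tell the two uses of ref:gkz apart:

```python
# Key mapping from the proposal above (a proposal, not accepted OSM tagging).
PROPOSED_KEYS = {
    "ref:at:gkz": "uid:AT:gkz",
    "de:gemeindeschluessel": "uid:DE:gemeindeschluessel",
    "ref:gkz": "uid:DE:gkz",
    "ref:ine": "uid:ES:ine",
    "ref:nuts": "uid:EU:nuts",
    "ref:WDPA": "uid:WDPA",
    "wikidata": "uid:wikidata",
    "wikipedia": "uid:wikipedia",
}


def rename_keys(tags, mapping=PROPOSED_KEYS):
    """Return a copy of tags with mapped keys renamed; unmapped keys are kept as-is.

    Raises ValueError if an old and a new key both exist with different values.
    """
    renamed = {}
    for key, value in tags.items():
        new_key = mapping.get(key, key)
        if new_key in renamed and renamed[new_key] != value:
            raise ValueError(f"conflicting values for {new_key}")
        renamed[new_key] = value
    return renamed


# Hypothetical element for illustration.
print(rename_keys({"name": "Example", "ref:gkz": "123"}))
```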

Not FAIR.

Why would it matter what @Tobias_Conradi wants to use the data for? What if @Tobias_Conradi doesn’t want to use it but wants to help others use the data?

Transparency note: @Tobias_Conradi is an OSM contributor.

Who is OSM? Where can the statement be verified?

I wrote: “Currently UIDs are shown in “random” places: in the tag section of the iD-editor, in taginfo, in the wiki.” - Does using:

  1. the tag section of the iD-editor
  2. taginfo
  3. the wiki

fall under “real world use cases” or “theoretical advantages”?

Since the current data is not very FAIR, is any suggested, more FAIR version automatically theoretical?

On this forum, by the amount of pushback you’re getting. LOL

It doesn’t particularly forbid it. Generally it is encouraged to document and discuss tags, and there are specific requirements for data imports (which are often how new ref: tags are introduced). It’s possible that during such discussions somebody might point out potential conflicts with other tags, but there is no guarantee of that.

Many people post on this forum asking for help in understanding some aspect of OSM data. Often it is possible to help them more effectively by understanding what exactly they want to do with the data. So it’s a reasonable question.

This reminds me of the contact: schema, especially contact:phone. While some mappers value grouping all kinds of contact details in a namespace, the plain phone tag has remained more popular. Effectively data consumers always need to look at both. The same might happen for popular tags like wikipedia and wikidata that mappers are already familiar with, if the uid: versions were introduced.
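For a data consumer, the consequence described above is that every lookup has to try both spellings of a key. A minimal sketch, assuming tags are held in a plain dict; the helper names are hypothetical, uid:wikidata is the key proposed in this thread rather than an established tag, and the order in which keys are tried is an arbitrary choice:

```python
def first_present(tags, *keys):
    """Return the value of the first key in keys that exists in tags, else None."""
    for key in keys:
        if key in tags:
            return tags[key]
    return None


def distinct_values(tags, *keys):
    """Collect the distinct values stored under any of keys, to spot disagreements."""
    return {tags[key] for key in keys if key in tags}


# Hypothetical element with placeholder values.
cafe = {"amenity": "cafe", "contact:phone": "+41 44 000 00 00", "wikidata": "Q000000"}

phone = first_present(cafe, "phone", "contact:phone")
wikidata = first_present(cafe, "uid:wikidata", "wikidata")
print(phone, wikidata)
```

If `distinct_values` returns more than one value, the two spellings disagree and the consumer has to decide which one to trust.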

In all seriousness, I suppose most people now read UID as “user ID”, not one of the other possible technical meanings. So you should consider using something else.
Your uid:*= doesn’t work for all uses, as *:ref= is used for referencing attributes: e.g. bridge:ref= holds the ref of the bridge=. Furthermore, there can be multiple reference codes in different languages, resulting in ref:*= containing language codes.
Tags don’t need to “label” themselves. They are documented on wiki. You can find a category for what’s unique, or data items to query Wikidata style.
As I said, ref= simply records what is used as the reference code. It doesn’t assess or judge whether it’s unique. This also raises verifiability concerns: forcing a user to determine whether a code is “unique” before adding a uid:*= places an undue burden on them.
Another distinct case is *id= for artificial identifiers composed by users in OSM, which technically do not exist in reality, as in the misnamed operator:guid= and network:guid= for GTFS. This is basically how network= on roads functions.
I still don’t understand, what’s your definition of “squatting”?
On a side note for others, id is the Indonesian language code. Therefore *:id= should be avoided.

Indeed, and that’s especially an issue if you want to apply this to wikidata, which is used more often in this attribute form than on its own. If you go for operator:uid:wikidata and so on, you lose the grouping of uid when presented alphabetically which I think is part of the idea. But if you go for uid:operator:wikidata you break expectations of operator and its attribute tags appearing together.
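This trade-off can be checked directly by sorting the keys of two hypothetical tag sets, one per naming scheme, and measuring how far apart related keys end up. Both layouts and the `gap` helper are illustrative only:

```python
def gap(keys, a, b):
    """Distance between keys a and b once keys are sorted alphabetically."""
    ordered = sorted(keys)
    return abs(ordered.index(a) - ordered.index(b))


# Scheme A: attribute form, operator:uid:wikidata
layout_a = ["uid:wikidata", "ref", "operator:uid:wikidata", "operator", "name"]
# Scheme B: namespace first, uid:operator:wikidata
layout_b = ["uid:wikidata", "ref", "uid:operator:wikidata", "operator", "name"]

print(sorted(layout_a))
print(gap(layout_a, "operator", "operator:uid:wikidata"))      # operator and its attribute stay adjacent
print(gap(layout_a, "operator:uid:wikidata", "uid:wikidata"))  # but the uid keys are split
print(sorted(layout_b))
print(gap(layout_b, "uid:operator:wikidata", "uid:wikidata"))  # uid keys stay adjacent
print(gap(layout_b, "operator", "uid:operator:wikidata"))      # but operator is split from its attribute
```

Whichever prefix comes first wins the grouping; neither scheme keeps both relationships together.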

uid:wikidata=*
uid:wikipedia=*

First of all, we do not encourage the use of abbreviations because they can be ambiguous. For example, in English these are possible meanings: https://en.m.wikipedia.org/wiki/UID

This is also demonstrated by your example of gkz, which in German-speaking countries is used for Gewässer as well as Gebäude.

The standard solution is to not abbreviate.

My other comment is about the idea that wikipedia or wikidata could be used as unique identifiers for OpenStreetMap objects / concepts - they should not. Wikipedia and Wikidata follow their own systematics, they do not relate 1:1 to OpenStreetMap concepts, rather they are similar, more or less, they are related objects / concepts, not identical ones. There can be many OpenStreetMap objects which can get the same wikidata id, and there can be OpenStreetMap objects which can get several wikidata ids (or wikipedia articles in a given language).
