What are the "top-level tags"?

Many articles in the wiki refer to “top-level” tags, but there is no article explaining or listing them. Such an article could help new contributors who do not yet understand the whole tagging scheme.

What could be considered as top-level tags?

These are some mentions:

  • highway 1 2 3 5 6
  • leisure 2
  • landuse 2
  • natural=coastline 2
  • building 3
  • amenity 2 4
  • railway 5
  • waterway 5 6

What about the “shop” tag?

10 Likes

In my understanding, any tag that is not clarifying another tag is a top-level tag.

If a top-level tag is the only tag on the object, it makes some sense. If a non-top-level tag is the only tag on the object, it doesn’t really make sense.

So for example highway=service is top-level, and service=driveway is not top-level, and bridge=yes is also not top-level. highway=service can exist by itself. A way with service=driveway as its only tag is incomplete. A way with bridge=yes as only tag is also incomplete (but note that man_made=bridge as the only tag of a way would be fine, hence man_made=bridge is a top-level tag).

railway=rail is top-level, but embankment=yes is not top-level. A way with embankment=yes as its only tag wouldn’t make sense.

shop=beauty is top-level, beauty=nails is not top-level.

amenity=place_of_worship is top-level, and so is building=church, but religion=christian is not top-level.
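The "does it make sense on its own" test above can be sketched in a few lines of Python. This is only an illustration: the `TOP_LEVEL_KEYS` set and the `has_top_level_tag` name are my own assumptions, the set is deliberately incomplete, and the check works per key, so it cannot catch individual values that are exceptions.

```python
# Hypothetical, deliberately incomplete set of top-level keys (illustration only).
TOP_LEVEL_KEYS = {"highway", "railway", "shop", "amenity", "building", "man_made"}


def has_top_level_tag(tags):
    """Return True if at least one key of the object is in TOP_LEVEL_KEYS."""
    return any(key in TOP_LEVEL_KEYS for key in tags)


# A way tagged only with bridge=yes is incomplete; man_made=bridge stands alone.
print(has_top_level_tag({"bridge": "yes"}))
print(has_top_level_tag({"man_made": "bridge"}))
print(has_top_level_tag({"highway": "service", "service": "driveway"}))
```

An object for which this returns False is a candidate for being incomplete, not proof of it.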

12 Likes

“Top-level” is just one of the many terms that refers to this important concept. There was a good discussion on the tagging mailing list about it back in October 2022. Recalling one observation I made then:

“Primary feature” appears in the Collective Database Guideline Guideline [sic] and Geocoding community guideline, which clarify the terms of use under the ODbL. “Feature type” appears in the Horizontal Map Layers community guideline.

For the purpose of the Collective Database Guideline, which underpins a lot about how OSM data is used in practice:

“Primary feature” means data from a key value pair, or combination thereof, but not inclusive of properties (e.g. colour, brand, operator, or width). For purposes of illustration, you may reference Map features - OpenStreetMap Wiki.

So it’s kind of defined in terms of what it isn’t, rather than what it is. I vaguely recall someone asking me for help at the time to define this notion of a primary feature tag; I guess I wasn’t able to provide a convincing alternative.

2 Likes

It’s some distance from “primary features”, but a list of the tags that I would consider for display in one project, if they occurred on their own, can be found here.

Many of those are ones that I wouldn’t personally use on their own, but people do.

1 Like

I encourage you to figure out yourself and then write an article about it. How?

Look at the statistics! While in principle you can look at the whole world at once, I suggest picking a small region you are familiar with, because the final step is to manually screen a substantial number of objects. First, the high-level view for my home city:

[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area);
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 10000)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}

This yields a table in which highway, building, and several addr:-prefixed tags have the highest counts. Any list of top-level tags must therefore include highway and building, otherwise you are left with plenty of unexplained objects. By contrast, it is an act of human judgement to assume that addr: will largely overlap with building and to ignore it for now.

Filter out that first category and lower the threshold in line 12 of the query from 10000 to 1000.

[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
  [!building][!highway];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 1000)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}

Now in the table the keys amenity, landuse, and natural have prominent numbers. The keys name and operator have substantial numbers as well, but again are by policy decision not top-level tags.

We repeat this with a lower threshold again, after excluding these keys as well:

[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
  [!building][!highway]
  [!amenity][!natural][!landuse];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 300)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}

We see two other motivations to pick top-level tags now:

  • the tags railway, waterway, and shop have notable but not compelling numbers, but are promoted by human judgement about the importance of the features
  • the tags entrance for nodes and type for relations are so dominant within their object type that we should not skip them

Note that while almost all relations have a type tag, knowing it still tells you very little about the relations. This is more of a technicality.

Next and final spreadsheet round:

[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
  [!building][!highway]
  [!amenity][!natural][!landuse]
  [!railway][!waterway][!shop][!entrance][!type];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 100)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}

The number of tagged objects, which is written in line 3 of the output before _tagged, is still over 10k, too many for a decent map display. I therefore decided unilaterally to also pick leisure, barrier, man_made, and tourism as top-level tags, a somewhat semantically arbitrary choice.

This now allows us to bring the rest onto the map:

area[name="Wuppertal"];
nwr(area)
  [!building][!highway]
  [!amenity][!natural][!landuse]
  [!railway][!waterway][!shop][!entrance][!type]
  [!leisure][!barrier][!man_made][!tourism];
nwr._(if:count_tags()>0);
out center;

To get this very overloaded map display uncluttered, I have again picked a bunch of keys which are prominent in the map display:

area[name="Wuppertal"];
nwr(area)
  [!building][!highway]
  [!amenity][!natural][!landuse]
  [!railway][!waterway][!shop][!entrance][!type]
  [!leisure][!barrier][!man_made][!tourism];
nwr._(if:count_tags()>0);
out center;

Hence the final observation is that there are a lot of valid objects whose top-level tag is not numerous (think of traffic_sign, to pick a random example): the key fully explains the category of object, but the objects themselves are less numerous or less important.

To wrap this up:

  • Some keys must be amongst the top-level tags because of their sheer number: building, highway
  • Some keys are excluded by human judgement (addr:) or project policy (name, because we are not a database of pure names with coordinates). If there were enough people to maintain an alternative view of the data, that would most likely be entirely viable.
  • In the middle ranks, no sufficiently significant patterns exist, hence some human judgement must be involved
  • There remain a lot of small top-level tags where less numerous tags characterize less important objects, i.e. any functional list for a comprehensive map will be quite long
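For readers who prefer to follow the rounds offline, here is a minimal Python sketch of one round: exclude the objects already explained by the keys chosen so far, then count the keys on what remains. It assumes the objects are already loaded as plain tag dictionaries; the function name `remaining_key_counts` and the sample data are made up for illustration, and this is not a replacement for the Overpass queries above.

```python
from collections import Counter


def remaining_key_counts(objects, chosen_keys, threshold):
    """One round of the procedure on a list of tag dicts.

    Returns the number of tagged objects carrying none of chosen_keys,
    and the keys occurring on those objects more than threshold times.
    """
    unexplained = [
        tags for tags in objects
        if tags and chosen_keys.isdisjoint(tags)  # tagged, no chosen key
    ]
    counts = Counter(key for tags in unexplained for key in tags)
    frequent = {key: n for key, n in counts.items() if n > threshold}
    return len(unexplained), frequent


# Made-up sample data, not real statistics.
sample = [
    {"building": "yes", "addr:street": "X"},
    {"highway": "residential", "name": "A"},
    {"amenity": "bench"},
    {"amenity": "cafe", "name": "B"},
    {},
    {"shop": "bakery", "name": "C"},
]
print(remaining_key_counts(sample, {"building", "highway"}, 1))
```

Deciding which of the frequent keys to promote (amenity) and which to ignore (name) remains the human judgement step.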

I will admire you if you manage to put this into proper English and a polished presentation, so that it is documented in the wiki. I personally will always stick too closely to the numbers and structures to make it reader-friendly.

4 Likes

I kind of had a similar thought about obtaining some insights directly by mangling the tagging data. taginfo would seem like a great place for an automated extract of such top-level tags to live, with the appropriate filters in place. In particular, it seems like a good first step would be to remove all subkeys, whether they’re explicitly namespaced (in the foo:bar form) or identified by their name occurring as the value of another key (e.g. service=*, because it occurs as a value of highway=*).
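A rough Python sketch of that first filtering step, under the assumption that you already have a mapping from each key to the set of values seen for it (for example, extracted from a taginfo download). The function name `drop_subkeys` and the sample mapping are hypothetical.

```python
def drop_subkeys(values_by_key):
    """Return keys that are neither namespaced nor named after another key's value."""
    kept = []
    for key in values_by_key:
        if ":" in key:
            continue  # explicitly namespaced subkey, e.g. addr:street
        used_as_value = any(
            key in values
            for other, values in values_by_key.items()
            if other != key
        )
        if not used_as_value:
            kept.append(key)
    return kept


# Made-up sample mapping, not real taginfo data.
sample = {
    "highway": {"service", "residential"},
    "service": {"driveway"},
    "addr:street": {"Main Street"},
    "shop": {"beauty"},
    "beauty": {"nails"},
    "lanes": {"2"},
}
print(drop_subkeys(sample))
```

Note that a secondary key like lanes survives this filter, since it is neither namespaced nor the value of another key, so the result still needs further filtering.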

In fact, there’s already a list of popular keys and popular key-value combinations right in the taginfo homepage, which might serve as a good starting point for this investigation (or at least a good source of data for validating whatever results one obtains).

Just for fun (well, who am I kidding — I got totally nerd-sniped), I downloaded the taginfo-master.db database from taginfo’s download page, which contains aggregate statistics and is relatively small; I then extracted the top 100 rows from the popular_keys table, as well as the entries from top_tags table with an aggregate usage count >100k, and did a bit of impromptu (and totally non-scientific) analysis to see if the approach above would yield any results. Unfortunately what remains is still a broad mix of primary and secondary keys (e.g. highway vs. lanes) that requires smarter heuristics or manual semantic filtering to produce a proper candidate for the top-level keys.

So in the end I resorted to simply grabbing the most common values (among the tags with >100k uses) for each of the top keys, manually selecting the keys that “feel” to me like the most important top-level ones, along with a manual selection and ordering of their top values, and grouping them in broad categories according to my personal mental model of OSM tagging. I did this for personal reference only, but feel free to check it out here in case it contributes anything to this discussion.

1 Like

FWIW preset-utils/src/main/java/ch/poole/osm/presetutils/Tags.java at master · simonpoole/preset-utils · GitHub

These are keys, not tags. To make new contributors understand the whole tagging scheme, the terminology should be precise, I think!

There are a small number of exceptions, for example cycleway=asl, but outside of those exceptions, as a rule all values of the relevant keys will define top level objects.

3 Likes

I was asking for the top-level tags because I want to identify these tags. To prevent misunderstandings, I want to see a long list, not “examples,” as in the screenshot below. The list of drolbr is probably a good starting point for the list.

I am interested in identifying which ones can be used with a semicolon for multiple values, especially the shop= key. The Wiki does not mention shop as a top-level key, and it suggests the key can have several values. But from this discussion I would conclude that it is a top-level key and should not use semicolons.

Multi-purpose shops

(Option 2 should not be used as shop, as this could be considered a top-level key).

The guidance to avoid multiple values in shop=* is equal parts principle and pragmatism. The practical consideration is that a lot of software applications (editors, renderers, geocoders) distinguish between different shop types without first parsing the tag as a value list. Some data consumers do parse shop=* as a value list, such as Mapbox Streets, but there isn’t enough software in that category for many mappers to feel confident in using a value list with that key.

Some other apparent primary feature keys are also commonly set to value lists, especially traffic_sign=*. There isn’t as much of a practical downside for these keys, which don’t enjoy as much software support in the first place. Before editors added presets for various craft=*, office=*, and man_made=* tags, I recall that it was more common to set them to value lists too.

1 Like

My current list of top-level keys is

  • amenity
  • tourism
  • shop
  • leisure
  • office
  • craft
  • emergency
  • man_made
  • traffic_calming
  • barrier
  • advertising
  • highway
  • natural
  • power
  • historic
  • military
  • attraction
  • aeroway
  • railway
  • landuse
  • boundary
  • building
  • building:part
  • waterway
  • cemetery
  • aerialway
  • public_transport
  • telecom
  • landcover
  • healthcare

Each of them may appear without a prefix, or with one of the lifecycle prefixes, such as

  • construction:
  • disused:
  • abandoned:
  • ruins:
  • demolished:
  • removed:
  • razed:
  • destroyed:
  • was:
  • former:
  • closed:

So for example shop and was:shop
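In Python, reducing a prefixed key to its base key could look roughly like this. The prefix tuple copies the list above; the function name `base_key` is just an illustrative choice.

```python
# Lifecycle prefixes as listed above.
LIFECYCLE_PREFIXES = (
    "construction", "disused", "abandoned", "ruins", "demolished",
    "removed", "razed", "destroyed", "was", "former", "closed",
)


def base_key(key):
    """Strip one leading lifecycle prefix, e.g. was:shop -> shop."""
    prefix, sep, rest = key.partition(":")
    if sep and rest and prefix in LIFECYCLE_PREFIXES:
        return rest
    return key  # no lifecycle prefix, e.g. shop or building:part


print(base_key("was:shop"))
print(base_key("building:part"))
print(base_key("disused:building:part"))
```

The result can then be looked up in the list of top-level keys as usual.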

And yes, not all values of these keys would form top-level tags, but nearly all will, and manually listing all top-level tags was too much effort for my uses.

(I maintain my own listing in osm_bot_abstraction_layer/osm_bot_abstraction_layer/tag_knowledge.py at master · matkoniecz/osm_bot_abstraction_layer · GitHub. I am not claiming it is complete or superior to others, but I am using it for some data processing, with new entries appearing from time to time. I think the latest serious change was listing Key:telecom - OpenStreetMap Wiki in recognize telecom= as a main key · matkoniecz/osm_bot_abstraction_layer@60f4d48 · GitHub.)

6 Likes

Perhaps it is also worth noting that the top-level-ness of a key can change over time. For example there is currently an open proposal to allow information= to be a top level key, rather than a refinement of tourism=.

https://wiki.openstreetmap.org/wiki/Proposal:Top-level_information_tag

3 Likes

-was
-former
(and maybe closed too)

would seem to be wrong in this context, as by definition they can only be added to existing elements that are already objects ‘on their own’. They definitely stretch a useful definition of a top-level object.

maybe information could be added?

That’s simple: none.

2 Likes

+1, or also: all of them. It depends what you mean by “can” and what you expect to be the outcome. Some data consumers actually do split semicolon separated values into several objects, but it is safe to assume that for the current state of things, using multiple values for “feature tag keys” will likely result in your mapping being discarded by most data consumers.
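For illustration, this is roughly what a data consumer has to do before it can recognize the individual values. A minimal sketch with a made-up function name `split_values`, not the behaviour of any particular application:

```python
def split_values(value):
    """Split a semicolon-separated tag value into its non-empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


print(split_values("bakery;convenience"))
print(split_values("bakery"))
```

A consumer that skips this step compares the whole string "bakery;convenience" against its list of known shop types and finds no match.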

2 Likes

I would say that was:shop=hairdresser name=Foobar has the top-level tag was:shop=hairdresser and can validly be tagged on a free-floating node for a few months.

But in the end, what you want to consider as top-level depends on the exact use.

It depends. Many renderers, including OSM Carto, will symbolize a multi-value shop=* with the generic icon for any unrecognized shop type and a multi-value office=* with the generic icon as for almost any kind of office. However, most renderers would ignore a multi-value amenity=* or man_made=*, because they’d have no idea how to symbolize it with an intuitive icon. Geocoders such as Nominatim would still index the feature as a generic “amenity” or “man-made feature”.

Getting away with these multi-value feature tags is relatively complicated and subjective, so as you say, it’s a lot easier to just tell mappers to avoid the practice if at all possible. This sometimes entails bending other OSM rules like “One feature, one element” as a result.

2 Likes

That stretches the meaning of support a lot.

1 Like