I encourage you to figure it out yourself and then write an article about it. How?
Look at the statistics! While in principle you can look at the whole world at once, I suggest picking a small region you are familiar with, because the final step is to manually screen a substantial number of objects. First, the high-level view for my home city:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area);
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 10000)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}
This yields a table in which highway, building, and multiple addr:-prefixed keys have the highest counts. Any list of top-level tags must therefore include highway and building, otherwise you are left with plenty of unexplained objects. By contrast, it is an act of human judgement to assume that addr: will largely overlap with building and to ignore it for now.
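If you save the output of the query, ranking the keys does not have to be done by eye. The following is only a minimal sketch: it assumes the default tab-separated CSV output with a header line, the function name rank_keys is made up for this example, and the sample numbers are invented rather than real results for Wuppertal.

```python
import csv
import io


def rank_keys(csv_text):
    """Rank keys by their total object count (nodes + ways + relations)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    ranked = []
    for row in reader:
        key = row["key"]
        # Assumption: rows starting with "_" are the _all/_tagged summary rows.
        if key.startswith("_"):
            continue
        total = int(row["n"]) + int(row["w"]) + int(row["r"])
        ranked.append((key, total))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented numbers for illustration only.
sample = (
    "n\tw\tr\tkey\n"
    "900000\t150000\t3000\t_all\n"
    "120000\t140000\t3000\t_tagged\n"
    "60000\t500\t0\taddr:street\n"
    "2000\t90000\t300\tbuilding\n"
    "8000\t40000\t10\thighway\n"
)

for key, total in rank_keys(sample):
    print(key, total)
```

The ranking only tells you which keys are numerous. Whether a numerous key such as addr:street counts as top-level remains a human decision.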
Filter out that first category and lower the threshold in line 12 of the query from 10000 to 1000:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
[!building][!highway];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 1000)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}
In the resulting table, the keys amenity, landuse, and natural have prominent counts. The keys name and operator have substantial counts as well but, again by policy decision, are not top-level tags.
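Since every round only changes the list of excluded keys and the threshold, the query text can also be generated. This is a sketch under assumptions, not part of the Overpass API: build_query is a hypothetical helper, it puts all filters on one line (whitespace does not matter in Overpass QL), and it assumes plain keys, because keys with special characters such as a colon would need quoting.

```python
def build_query(area_name, excluded_keys, threshold):
    """Build the statistics query for one round of the procedure."""
    # Assumption: keys are plain identifiers that need no quoting.
    filters = "".join(f"[!{key}]" for key in excluded_keys)
    return "\n".join([
        "[out:csv(n,w,r,key)];",
        f'area[name="{area_name}"];',
        f"nwr(area){filters};",
        'make info key="_all",n=count(nodes),w=count(ways),'
        "r=count(relations)->.all;",
        ".all out;",
        "nwr._(if:count_tags()>0)->.tagged;",
        'make info key="_tagged",n=tagged.count(nodes),'
        "w=tagged.count(ways),r=tagged.count(relations)->.tagged;",
        ".tagged out;",
        "for (keys())",
        "{",
        f"  if (count(nwr) > {threshold})",
        "  {",
        "    make info key=_.val,n=count(nodes),w=count(ways),"
        "r=count(relations);",
        "    out;",
        "  }",
        "}",
    ])


# Second round from this post: building and highway excluded, threshold 1000.
print(build_query("Wuppertal", ["building", "highway"], 1000))
```

Each further round is then just another call with a longer key list and a lower threshold.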
We repeat with a lower threshold again, after excluding these keys as well:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 300)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}
We now see two other motivations for picking top-level tags:
- the tags railway, waterway, and shop have notable but not compelling numbers, but are promoted by human judgement about the importance of the features
- the tags entrance for nodes and type for relations are so dominant within their object type that we should not skip over them
Note that although almost all relations have a type tag, knowing this still tells you very little about the relations. This is more of a technicality.
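The dominance of a key within one object type can be read off by dividing each count by the matching column of the _tagged row. Again this is only a sketch: share_per_type is a made-up name, the tab-separated output format is assumed, and the numbers are invented for illustration.

```python
import csv
import io


def share_per_type(csv_text):
    """Return, per key, its share among tagged nodes, ways, and relations."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    rows = {row["key"]: row for row in reader}
    tagged = rows["_tagged"]  # totals of the remaining tagged objects
    shares = {}
    for key, row in rows.items():
        # Assumption: rows starting with "_" are summary rows, not real keys.
        if key.startswith("_"):
            continue
        shares[key] = {}
        for col in ("n", "w", "r"):
            total = int(tagged[col])
            # Guard against a type with no tagged objects left.
            shares[key][col] = int(row[col]) / total if total else 0.0
    return shares


# Invented numbers for illustration only.
sample = (
    "n\tw\tr\tkey\n"
    "50000\t8000\t400\t_all\n"
    "4000\t2000\t400\t_tagged\n"
    "1000\t0\t0\tentrance\n"
    "0\t0\t380\ttype\n"
    "100\t300\t20\trailway\n"
)

for key, share in share_per_type(sample).items():
    print(key, share)
```

A key like type stands out here through its share of relations even though its absolute count is small compared to the node and way counts.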
Next and final spreadsheet round:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse]
[!railway][!waterway][!shop][!entrance][!type];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
  if (count(nwr) > 100)
  {
    make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
    out;
  }
}
The number of tagged objects, which is written in line 3 of the output before _tagged, is at over 10k still too much for a decent map display. I therefore decided unilaterally to also pick leisure, barrier, man_made, and tourism as top-level tags, which is semantically somewhat arbitrary.
This now allows us to bring the rest onto the map:
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse]
[!railway][!waterway][!shop][!entrance][!type]
[!leisure][!barrier][!man_made][!tourism];
nwr._(if:count_tags()>0);
out center;
To declutter this very overloaded map display, I have again picked a bunch of keys that are prominent in the map display:
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse]
[!railway][!waterway][!shop][!entrance][!type]
[!leisure][!barrier][!man_made][!tourism];
nwr._(if:count_tags()>0);
out center;
Hence the final observation: there are a lot of valid objects without a numerous top-level tag. Think of traffic_sign, to pick a random example. The key fully explains the category of object, but the objects themselves are less numerous or less important.
To wrap this up:
- Some keys must be amongst the top-level tags because of their sheer number: building, highway
- Some keys are excluded by human judgement (addr:) or project policy (name, because we are not a database of pure names with coordinates). Were there enough people to maintain an alternative view on the data, it would most likely be absolutely viable.
- In the middle ranks, no sufficiently significant patterns exist, hence some human judgement must be involved
- There remain a lot of minor top-level tags where less numerous tags characterize less important objects, i.e. any functional list for a comprehensive map will be quite long
I will admire you if you manage to put this into proper English and a polished presentation, so that we have documentation of this in the wiki. I personally will always stick too much to the numbers and structures to make it that reader-friendly.