I encourage you to figure it out yourself and then write an article about it. How?
Look at the statistics! While in principle you can look at the whole world at once, I suggest picking a small region you are familiar with, because the final step is to manually screen a substantial number of objects. First, the high-level view for my home city:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area);
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
if (count(nwr) > 10000)
{
make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
out;
}
}
This yields a table in which highway, building, and multiple addr:-prefixed tags have the highest counts. Any list of top-level tags thus must include highway and building; otherwise you have plenty of unexplained objects. By contrast, it is an act of human judgement to assume that addr: will largely overlap with building and to ignore it for now.
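If you prefer not to read the table by eye, the ranking can be sketched in a few lines of Python. This is only a rough helper, assuming the output is tab-separated with a header row, which is what [out:csv(n,w,r,key)] produces with default settings. The function name rank_keys and the sample numbers are made up for illustration; they are not real results for Wuppertal.

```python
import csv
import io


def rank_keys(csv_text):
    """Return (key, total) pairs sorted by descending total count.

    Assumes tab-separated text with the header row n, w, r, key.
    Summary rows whose key starts with "_" (_all, _tagged) are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    totals = []
    for row in reader:
        if row["key"].startswith("_"):
            continue
        total = int(row["n"]) + int(row["w"]) + int(row["r"])
        totals.append((row["key"], total))
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Made-up numbers for illustration only, not real query results.
SAMPLE = (
    "n\tw\tr\tkey\n"
    "500\t400\t10\t_all\n"
    "120\t300\t10\t_tagged\n"
    "20\t90\t1\thighway\n"
    "5\t150\t2\tbuilding\n"
    "60\t40\t0\taddr:street\n"
)

ranking = rank_keys(SAMPLE)
for key, total in ranking:
    print(key, total)
```

The ranking only tells you which keys are numerous. Whether a numerous key such as addr:street counts as top-level remains a judgement call.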
Filter out that first category and lower the threshold, now in line 12 of the query, from 10000 to 1000:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
[!building][!highway];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
if (count(nwr) > 1000)
{
make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
out;
}
}
Now the keys amenity, landuse, and natural have prominent numbers in the table. The keys name and operator have substantial numbers as well, but again are, by policy decision, not top-level tags.
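Since every round uses the same query with only the excluded keys and the threshold changed, the rounds can be generated instead of edited by hand. The following is a minimal sketch; the function name build_stats_query is hypothetical, and it assumes that the area name contains no double quotes. It puts all key filters on one line, which the query language does not mind.

```python
def build_stats_query(area_name, excluded_keys, threshold):
    """Assemble one round of the statistics query as a string."""
    # One negated key filter per key already chosen or ignored, e.g. [!building]
    filters = "".join(f"[!{key}]" for key in excluded_keys)
    counts = "n=count(nodes),w=count(ways),r=count(relations)"
    tagged_counts = (
        "n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)"
    )
    lines = [
        "[out:csv(n,w,r,key)];",
        f'area[name="{area_name}"];',
        f"nwr(area){filters};",
        f'make info key="_all",{counts}->.all;',
        ".all out;",
        "nwr._(if:count_tags()>0)->.tagged;",
        f'make info key="_tagged",{tagged_counts}->.tagged;',
        ".tagged out;",
        "for (keys())",
        "{",
        f"  if (count(nwr) > {threshold})",
        "  {",
        f"    make info key=_.val,{counts};",
        "    out;",
        "  }",
        "}",
    ]
    return "\n".join(lines)


# Second round from above: building and highway excluded, threshold 1000.
query = build_stats_query("Wuppertal", ["building", "highway"], 1000)
print(query)
```

Paste the printed text into your usual query tool. For the next round, extend the list of excluded keys and lower the threshold.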
We repeat with a lower threshold again, after excluding these keys as well:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
if (count(nwr) > 300)
{
make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
out;
}
}
We now see two other motivations for picking top-level tags:
- the tags railway, waterway, and shop have notable but not compelling numbers, and are promoted by human judgement about the importance of the features
- the tags entrance for nodes and type for relations are so dominant within their object type that we should not pass over them
Note that although almost all relations have a type tag, knowing that still tells you very little about the relations. This is more of a technicality.
Next and final spreadsheet round:
[out:csv(n,w,r,key)];
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse]
[!railway][!waterway][!shop][!entrance][!type];
make info key="_all",n=count(nodes),w=count(ways),r=count(relations)->.all;
.all out;
nwr._(if:count_tags()>0)->.tagged;
make info key="_tagged",n=tagged.count(nodes),w=tagged.count(ways),r=tagged.count(relations)->.tagged;
.tagged out;
for (keys())
{
if (count(nwr) > 100)
{
make info key=_.val,n=count(nodes),w=count(ways),r=count(relations);
out;
}
}
The number of tagged objects, shown in line 3 of the output in front of _tagged, is at over 10k still too high for a decent map display. I have therefore decided unilaterally to also pick leisure, barrier, man_made, and tourism as top-level tags, which is semantically somewhat arbitrary.
This now allows us to bring the rest onto the map:
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse]
[!railway][!waterway][!shop][!entrance][!type]
[!leisure][!barrier][!man_made][!tourism];
nwr._(if:count_tags()>0);
out center;
To declutter this very overloaded map display, I have again picked a bunch of keys that are prominent in the map display:
area[name="Wuppertal"];
nwr(area)
[!building][!highway]
[!amenity][!natural][!landuse]
[!railway][!waterway][!shop][!entrance][!type]
[!leisure][!barrier][!man_made][!tourism];
nwr._(if:count_tags()>0);
out center;
Hence the final observation is that there are a lot of valid objects with no numerous top-level tag (think of traffic_sign, to pick a random example), where the key fully explains the category of object but where the objects themselves are less numerous or less important.
To wrap this up:
- Some keys must be amongst the top-level tags because of their sheer number: building, highway
- Some keys are excluded by human judgement (addr:) or project policy (name, because we are not a database of pure names with coordinates). Were there enough people to maintain an alternative view of the data, it would most likely be absolutely viable.
- In the middle ranks, no sufficiently significant patterns exist, hence some human judgement must be involved
- There remains a long tail of minor top-level tags, where less numerous tags characterize less important objects. That is, any functional list for a comprehensive map will be quite long
I will admire you if you manage to put this into proper English and a polished presentation, so that we have documentation of this in the wiki. I personally will always stick too much to the numbers and structures to make it that reader-friendly.