New in GeoDesk: Tag & Role Statistics

Who doesn’t love statistics?

Well, admittedly, this new type of GeoDesk query lacks the coolness of an immersive 3D renderer, but it does provide (sometimes surprising) insights into OpenStreetMap data.

GeoDesk stats reports are similar to Taginfo, except that they can be tailored to specific areas and types of features. They allow you to discover:

  • What are commonly used keys/tags for (Feature X)?
  • How popular are certain tags in different regions of the world?
  • How do different regions compare with regard to data completeness?

For example, these are some things I’ve learned comparing historic=castle in Germany vs. France:

  • In Germany, more than half of all castles are tagged with an associated wikipedia article; in France, less than one third.
  • However, in France the start_date tag is applied nearly twice as frequently and mostly lists the century (peaking in the 16th), and the heritage tag finds far more use.
  • The castle_type tag is significantly more popular in Germany (61% of all castles) vs. France (25%).
  • German castles are 4 times more likely to be ruins than French ones.
  • For German castles, a street address is listed in over a third; for French castles, an addr:street tag is a rarity.
  • For nearly one in five German castles, there’s an image; in France, that ratio is one in a hundred.

What you’ll need

Usage

gol query <gol> <query> -f=stats [<options>]
  • <gol>: The GOL file (extension may be omitted)
  • <query> must be in GOQL format (similar to MapCSS)

Common options

  • -f:tally: What to report (keys, tags, roles, count, length or area)

  • -t (--tags): One or more tag keys to use in the report (Used for -f:tally=count|length|area|roles)

  • -f:unit=<unit>: Units to use for -f:tally=length and -f:tally=area

  • -f:min-tally=<count>|<percentage>: Don’t include rows in the report if the subtotal
    is less than count or less than percentage of total (default: 1%)

  • -f:split-values: Treat ;-separated items in a tag value as distinct values
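
To make the -f:min-tally rule a bit more concrete, here is a minimal Python sketch of how I read it: a row is dropped if its subtotal falls below either an absolute count or a percentage of the total. This is not GeoDesk's actual code; the function name keep_row and the convention of marking percentages with a trailing "%" are assumptions made for illustration. The first two rows borrow their counts from the restaurant report further down; the third is made up.

```python
def keep_row(subtotal: float, total: float, min_tally: str = "1%") -> bool:
    """Return True if a row with this subtotal would stay in the report.

    Assumption: a trailing '%' marks a percentage threshold,
    anything else is an absolute count.
    """
    if min_tally.endswith("%"):
        # Percentage threshold: convert to an absolute cutoff first.
        threshold = float(min_tally[:-1]) / 100 * total
    else:
        # Absolute threshold: use the number as-is.
        threshold = float(min_tally)
    return subtotal >= threshold


total = 101_311  # grand total from the restaurant report
rows = {
    "Akropolis": 158,      # from the restaurant report
    "Zur Linde": 118,      # from the restaurant report
    "Some rare name": 7,   # hypothetical row, for illustration only
}

kept_by_count = [name for name, n in rows.items() if keep_row(n, total, "20")]
kept_by_default = [name for name, n in rows.items() if keep_row(n, total)]
print(kept_by_count)
print(kept_by_default)
```

With a threshold of 20, both real names survive and the rare one is dropped; with the default of 1% (about 1,013 features here), none of these rows would make the cut, which is why the restaurant example lowers the threshold.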

(See full documentation at docs.geodesk.com)

For example, for the castle analysis above, use:

gol query germany na[historic=castle] -f=stats
gol query france  na[historic=castle] -f=stats

(You can build germany.gol and france.gol from OSM-PBF files, such as the extracts provided by Geofabrik.)

Reports will look like this:

5,701 features                        /key  /count
==================================================
historic                     5,701          100.0%
  = castle                   5,701  100.0%  100.0%
name                         5,320           93.3%
wikidata                     3,626           63.6%
castle_type                  3,505           61.5%
  = stately                  1,523   43.5%   26.7%
  = defensive                1,185   33.8%   20.8%
  = manor                      527   15.0%    9.2%
  = fortress                   111    3.2%    1.9%
...  
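
The two percentage columns are easy to misread. Judging from the numbers, the /key column gives a value's share among features that carry that key, while the /count column gives its share among all features in the result. The short Python sketch below recomputes the castle_type rows from the counts in the report; the helper pct is a name I made up, and the interpretation of the columns is inferred from the arithmetic rather than taken from the documentation.

```python
total_features = 5_701      # "5,701 features" in the report header
castle_type_count = 3_505   # features that have a castle_type tag

# Counts per castle_type value, copied from the report above
values = {"stately": 1_523, "defensive": 1_185, "manor": 527, "fortress": 111}


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


# The key row itself only has a share of all features
print(f"castle_type  {castle_type_count:>6,}  {pct(castle_type_count, total_features):>14}")

for value, n in values.items():
    share_of_key = pct(n, castle_type_count)   # the /key column
    share_of_all = pct(n, total_features)      # the /count column
    print(f"  = {value:<9}{n:>6,}  {share_of_key:>6}  {share_of_all:>6}")
```

So "stately" accounts for 43.5% of castles that have a castle_type, but only 26.7% of all castles.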

You can run these queries on an entire GOL, or restrict them to a specific area (a state or city, or any arbitrary region) with -a=<polygon-file>.

To get a polygon file from a GOL, use a command like this:

gol query <gol> -f=poly a[boundary=administrative][admin_level=4][name=California] > ca.poly
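
If you'd rather script the two steps (export a polygon, then run a stats query restricted to it), a rough Python sketch could look like this. It assumes the gol executable is on your PATH and uses only the options shown in this post; the function names are made up for illustration. Since the arguments are passed as a list, no shell is involved, so the brackets in the queries need no quoting.

```python
import subprocess


def poly_command(gol: str, area_query: str) -> list[str]:
    """Build the command that writes an area's polygon to stdout."""
    return ["gol", "query", gol, "-f=poly", area_query]


def stats_command(gol: str, query: str, polygon_file: str) -> list[str]:
    """Build a stats query restricted to the given polygon file."""
    return ["gol", "query", gol, query, "-f=stats", f"-a={polygon_file}"]


def stats_for_area(gol: str, area_query: str, query: str, polygon_file: str) -> str:
    """Export the polygon, then return the stats report for that area."""
    # Step 1: redirect the polygon output into a file (like "> ca.poly")
    with open(polygon_file, "w", encoding="utf-8") as out:
        subprocess.run(poly_command(gol, area_query), stdout=out, check=True)
    # Step 2: run the stats query with -a pointing at that file
    result = subprocess.run(
        stats_command(gol, query, polygon_file),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling stats_for_area with your GOL name, the California area query from above, na[historic=castle] and ca.poly would then return the report text for castles in California.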

More examples

What are the most common types of restaurants (and typical names)?

gol query <gol-file> -f=stats
  na[amenity=restaurant] -t=cuisine,name -f:tally=count -f:split-values -f:min-tally=20

cuisine          name
======================================================
-                -                       1,365    1.3%
greek            Akropolis                 158    0.2%
italian          L’Osteria                 126    0.1%
-                Zur Linde                 118    0.1%
greek            Poseidon                  103    0.1%
...
------------------------------------------------------
Total                                  101,311  100.0%

Tip: Use -f:split-values to split tag values such as italian;pizza;seafood into individually tallied items.
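
To illustrate what splitting changes, here is a small Python sketch that tallies a handful of cuisine values with and without splitting on ";". Only the value italian;pizza;seafood comes from the tip above; the other sample values and the function name tally are made up. Whether GeoDesk also trims whitespace around the items is an assumption on my part.

```python
from collections import Counter


def tally(values, split_values=False):
    """Count tag values, optionally treating ';'-separated items separately."""
    counts = Counter()
    for value in values:
        if split_values:
            # Assumption: surrounding whitespace and empty items are ignored
            items = [item.strip() for item in value.split(";") if item.strip()]
        else:
            items = [value]
        counts.update(items)
    return counts


# Hypothetical sample of cuisine values from three restaurants
cuisines = ["italian;pizza;seafood", "italian", "pizza"]

print(tally(cuisines))                     # each raw value counted once
print(tally(cuisines, split_values=True))  # italian and pizza counted twice
```

Without splitting, italian;pizza;seafood is its own row in the report; with splitting, that restaurant contributes to the italian, pizza and seafood rows instead.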

Which hotel chains are most prominent?

gol query <gol-file> -f=stats
  na[tourism=hotel] -t=operator -f:tally=count

What are typical opening hours of shops?

gol query <gol-file> -f=stats
  na[shop] -t=opening_hours -f:tally=count

Which are the longest rivers?

gol query <gol-file> -f=stats
  w[waterway=river][name] -t=name -f:tally=length -f:unit=km

Tip: Length units are meters (‘m’) by default; use -f:unit to specify km, mi or others.

What is the distribution of surface quality among different road types?

gol query <gol-file> -f=stats
  wa[highway] -t=highway,surface -f:tally=length

What are the predominant forms of land use?

gol query <gol-file> -f=stats
  a[landuse] -t=landuse -f:tally=area -f:unit=ha

Tip: Area units are square meters (‘m’) by default; use -f:unit to specify square kilometers (km), hectares (ha) or others.
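
In case the relationship between these area units isn't obvious, this tiny Python sketch converts a figure in square meters into hectares and square kilometers. The lookup table and the function name convert_area are my own, and the sample area is invented; only the unit codes (m, ha, km) are taken from the tip above.

```python
# Square meters per unit, keyed by the unit codes mentioned in the tip
SQUARE_METERS_PER_UNIT = {
    "m": 1.0,            # square meters (the default)
    "ha": 10_000.0,      # 1 hectare = 100 m x 100 m
    "km": 1_000_000.0,   # 1 square kilometer = 1,000 m x 1,000 m
}


def convert_area(square_meters: float, unit: str) -> float:
    """Convert an area given in square meters to the requested unit."""
    return square_meters / SQUARE_METERS_PER_UNIT[unit]


area = 2_500_000  # hypothetical landuse area in square meters
print(convert_area(area, "ha"))  # 250.0
print(convert_area(area, "km"))  # 2.5
```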

What roof types are found on buildings with 6 levels or more?

gol query <gol-file> -f=stats
  "a[building][building:levels >= 6]" -t=roof:shape -f:tally=count

What are typical roles on various types of restriction relations?

gol query <gol-file> -f=stats
  r[type=restriction] -t=restriction -f:tally=roles

Looking Ahead

As part of our mission to make OpenStreetMap data more accessible, we continue to add capabilities to GeoDesk. As always, we appreciate your feedback! For questions or other issues, please open a ticket in our GitHub repository.
