Who doesn’t love statistics?
Well, admittedly, this new type of GeoDesk query lacks the coolness of an immersive 3D renderer, but it does provide (sometimes surprising) insights into OpenStreetMap data.
GeoDesk stats reports are similar to Taginfo, except that they can be tailored to specific areas and types of features. They allow you to discover:
- What are commonly used keys/tags for (Feature X)?
- How popular are certain tags in different regions of the world?
- How do different regions compare with regard to data completeness?
For example, these are some things I’ve learned comparing historic=castle in Germany vs. France:
- In Germany, more than half of all castles are tagged with an associated wikipedia article; in France, less than one third.
- However, in France the start_date tag is applied nearly twice as frequently and mostly lists the century (peaking in the 16th), and the heritage tag finds far more use.
- The castle_type tag is significantly more popular in Germany (61% of all castles) vs. France (25%).
- German castles are 4 times more likely to be ruins than French.
- For German castles, a street address is listed in over a third; for French castles, an addr:street tag is a rarity.
- For nearly one in five of German castles, there’s an image; in France, that ratio is one in a hundred.
What you’ll need
- The latest GeoDesk GOL Tool (Version 0.1.4)
- Java JDK Version 16 or above
- A GOL file (how to create one)
Usage
gol query <gol> <query> -f=stats [<options>]
- <gol>: The GOL file (extension may be omitted)
- <query> must be in GOQL format (similar to MapCSS)
Common options
- -f:tally: What to report (keys, tags, roles, count, length or area)
- -t (--tags): One or more tag keys to use in the report (used for -f:tally=count|length|area|roles)
- -f:unit=<unit>: Units to use for -f:tally=length and -f:tally=area
- -f:min-tally=<count>|<percentage>: Don’t include rows in the report if the subtotal is less than count or less than percentage of total (default: 1%)
- -f:split-values: Treat ;-separated items in a tag value as distinct values
(See full documentation at docs.geodesk.com)
For example, for the castle analysis above, use:
gol query germany na[historic=castle] -f=stats
gol query france na[historic=castle] -f=stats
(You can build germany.gol and france.gol from an OSM-PBF file, like the extracts provided by Geofabrik.)
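To compare regions side by side, the two commands above can be wrapped in a small shell loop. This is only a sketch: the function name castle_stats and the output file names are made up for illustration, and it assumes that gol is on your PATH and that the GOL files sit in the current directory. The query is quoted so the shell does not try to interpret the square brackets as a filename pattern.

```shell
# Hypothetical helper (not part of the GOL Tool): run the same stats
# query against several GOL files and save one report per region.
castle_stats() {
    for region in "$@"; do
        # Quote the query so the shell does not expand [...] as a glob
        gol query "$region" 'na[historic=castle]' -f=stats > "castles-$region.txt"
    done
}

# Usage: castle_stats germany france
```

Each report ends up in its own text file (castles-germany.txt, castles-france.txt), which makes the two easy to compare in a diff tool or editor.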
Reports will look like this:
5,701 features /key /count
==================================================
historic               5,701  100.0%
  = castle             5,701  100.0%  100.0%
name                   5,320   93.3%
wikidata               3,626   63.6%
castle_type            3,505   61.5%
  = stately            1,523   43.5%   26.7%
  = defensive          1,185   33.8%   20.8%
  = manor                527   15.0%    9.2%
  = fortress             111    3.2%    1.9%
...
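The two percentage columns on the value rows are not labelled in the report. The numbers in this sample are consistent with the first being the share among features that carry the key, and the second the share among all features in the report. That reading is an assumption drawn from the sample, not from the documentation. A quick check with awk (the pct helper exists only for this illustration):

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE, one decimal place
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

pct 1523 3505   # stately castles among castles with a castle_type -> 43.5
pct 1523 5701   # stately castles among all castles                -> 26.7
pct 3505 5701   # castles with a castle_type among all castles     -> 61.5
```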
You can run these queries on an entire GOL, or restrict them to a specific area (a state or city, or any arbitrary region) with -a=<polygon-file>.
To get a polygon file from a GOL, use a command like this:
gol query <gol> -f=poly a[boundary=administrative][admin_level=4][name=California] > ca.poly
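Putting the two steps together, a rough sketch of the workflow might look like this. The helper name stats_in_area is hypothetical, the admin_level=4 filter is simply copied from the California example, and the sketch assumes an area name without spaces or special characters.

```shell
# Hypothetical helper: export the boundary of a named admin_level=4 area
# as a polygon file, then run a stats query restricted to that area.
# Assumes the area name contains no spaces or special characters.
stats_in_area() {
    gol_file=$1
    area=$2
    query=$3
    poly="$area.poly"
    # Step 1: write the boundary polygon; step 2 runs only if this succeeds
    gol query "$gol_file" -f=poly "a[boundary=administrative][admin_level=4][name=$area]" > "$poly" &&
    # Step 2: run the stats report, limited to the polygon via -a=
    gol query "$gol_file" "$query" -f=stats -a="$poly"
}

# Usage: stats_in_area my-gol California 'na[historic=castle]'
```

The polygon file is kept after the run, so later queries for the same area can reuse it with -a= directly.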
More examples
What are the most common types of restaurants (and typical names)?
gol query <gol-file> -f=stats na[amenity=restaurant] -t=cuisine,name -f:tally=count -f:split-values -f:min-tally=20
cuisine   name
======================================================
-         -               1,365     1.3%
greek     Akropolis         158     0.2%
italian   L’Osteria         126     0.1%
-         Zur Linde         118     0.1%
greek     Poseidon          103     0.1%
...
------------------------------------------------------
Total                   101,311   100.0%
Tip: Use -f:split-values to split tag values such as italian;pizza;seafood into individually tallied items.
Which hotel chains are most prominent?
gol query <gol-file> -f=stats na[tourism=hotel] -t=operator -f:tally=count
What are typical opening hours of shops?
gol query <gol-file> -f=stats na[shop] -t=opening_hours -f:tally=count
Which are the longest rivers?
gol query <gol-file> -f=stats w[waterway=river][name] -t=name -f:tally=length -f:unit=km
Tip: Length units are meters (‘m’) by default; use -f:unit to specify km, mi or others.
What is the distribution of surface quality among different road types?
gol query <gol-file> -f=stats wa[highway] -t=highway,surface -f:tally=length
What are the predominant forms of land use?
gol query <gol-file> -f=stats a[landuse] -t=landuse -f:tally=area -f:unit=ha
Tip: Area units are square meters (‘m’) by default; use -f:unit to specify square kilometers (km), hectares (ha) or others.
What roof types are found on buildings with 6 levels or more?
gol query <gol-file> -f=stats "a[building][building:levels >= 6]" -t=roof:shape -f:tally=count
What are typical roles on various types of restriction relations?
gol query <gol-file> -f=stats r[type=restriction] -t=restriction -f:tally=roles
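A note on quoting: the building-levels example above wraps its query in double quotes because it contains spaces and a > character. Left unquoted, the shell would split the query at the spaces and treat > as output redirection. The snippet below is only an illustration, using a throwaway show_args helper (not part of GeoDesk) that prints each argument on its own line so you can see what a command would receive.

```shell
# show_args prints each argument it receives on its own line
show_args() {
    printf '%s\n' "$@"
}

# Quoted: the whole query arrives as a single argument
show_args "a[building][building:levels >= 6]" -t=roof:shape
```

With the quotes in place, this prints two lines: the complete query, then -t=roof:shape. Quoting the simpler queries as well does no harm, and it also stops the shell from trying to match the square brackets against filenames.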
Looking Ahead
As part of our mission to make OpenStreetMap data more accessible, we continue to add capabilities to GeoDesk. As always, we appreciate your feedback! For questions or other issues, please open a ticket in our GitHub repository.