Geodesk ... a new geospatial database for OSM data?

As always, the server is busy, and some web overhead / latency certainly plays a part too, though that is of minor importance.

And of course it also makes a difference whether I test a DB containing only Germany or a complete planet, where I would additionally have to filter for Germany (for the runs above I loaded Germany in each case and tested with that version).

Most of this time is actually burned up by the formatter, which for OSM-XML is relatively slow (it has to build an object graph and re-create the untagged way-nodes). Ironically, the “simple” formatters (e.g. GeoJSON) are currently slower still because of this bug (the GeoJSON formatter is due to be replaced by a parallelized version anyway).

The query itself should run in < 200ms on a quad-core machine (you can approximate this with -f=count).

Yes, the query engine is quite greedy and will use every core it can. This is probably overkill for a basic query like this, and will need some tuning. Parallelization pays off for spatial predicates (intersects, within, etc. – coming in 0.2, enabled as Preview now in 0.1.2) because the topological checks are CPU-heavy.

For reference purposes, I’m also posting a few runtimes for the “count” use case:

gol

Single CPU

/usr/bin/time -v taskset --cpu-list 1 bin/gol query germany "na[amenity=post_box]" -f=count --precision=7 
75408

Retrieved 75.408 features in 346ms
	Command being timed: "taskset --cpu-list 1 bin/gol query germany na[amenity=post_box] -f=count --precision=7"
	User time (seconds): 0.58
	System time (seconds): 0.07
	Percent of CPU this job got: 96%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.67

Many CPUs


/usr/bin/time -v  bin/gol query germany "na[amenity=post_box]" -f=count --precision=7 
75408

Retrieved 75.408 features in 105ms
	Command being timed: "bin/gol query germany na[amenity=post_box] -f=count --precision=7"
	User time (seconds): 1.46
	System time (seconds): 0.12
	Percent of CPU this job got: 491%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.32
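
To put the two gol runs above side by side, here is a minimal Python sketch that pulls the query time and the wall-clock time out of the pasted lines and computes the speedup. The helper names (`elapsed_seconds`, `query_millis`) are made up for illustration; only the log lines themselves come from the output above.

```python
import re


def elapsed_seconds(line: str) -> float:
    """Turn the h:mm:ss or m:ss value of an 'Elapsed (wall clock)' line into seconds."""
    value = line.strip().rsplit(" ", 1)[-1]
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def query_millis(line: str) -> int:
    """Extract the query time from gol's 'Retrieved ... in NNNms' line."""
    match = re.search(r"in (\d+)ms", line)
    if match is None:
        raise ValueError(f"no query time found in: {line!r}")
    return int(match.group(1))


# Lines copied from the single-CPU and many-CPU runs above.
single = {
    "query": "Retrieved 75.408 features in 346ms",
    "elapsed": "Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.67",
}
multi = {
    "query": "Retrieved 75.408 features in 105ms",
    "elapsed": "Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.32",
}

query_speedup = query_millis(single["query"]) / query_millis(multi["query"])
wall_speedup = elapsed_seconds(single["elapsed"]) / elapsed_seconds(multi["elapsed"])

print(f"query speedup:      {query_speedup:.2f}x")
print(f"wall-clock speedup: {wall_speedup:.2f}x")
```

With these figures the query itself gets roughly 3.3x faster, while the wall-clock time only improves by about 2.1x, since the elapsed time also covers everything outside the query (process startup, opening the database).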

Overpass

Query: nw[amenity=post_box];out count;

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.59.120 (mmd) 84edf1af">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base=""/>

  <count id="0">
    <tag k="nodes" v="75398"/>
    <tag k="ways" v="10"/>
    <tag k="relations" v="0"/>
    <tag k="total" v="75408"/>
  </count>

</osm>
	Command being timed: "src/osm3s_query --db-dir=db"
	User time (seconds): 0.34
	System time (seconds): 0.04
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.39

Thanks, this is very helpful.

I continue to be blown away by how far the JVM has come. Quarter-second startup time, and that even includes setting up the gol program itself and opening the database.

We’ve fretted about class loading, bytecode verification, bounds checks at every corner, and of course GC, but these things have essentially become non-issues (It does defer JIT compilation – as the name implies – which competes with query execution in the early phase).

Strong showing by Overpass as well. Is it using indexing for this type of query?

(By the way, are these 4 physical cores, or 2 cores hyper-threaded as 4?)

Feature request: Would it be possible to add a link to the OSM object (like in Overpass Turbo) in the map view (-f=map)?

Good idea.
I’m leaning towards keeping the tooltip (vs. a popup, which requires clicking instead of hovering), and making the feature itself clickable instead.

This is now supported in Version 0.1.3: Clicking on a feature on a generated Leaflet map takes the user to the OSM object.

Thanks, it works. :heart_eyes:

As a suggestion, here is my golm.bat (an adapted gol.bat), which automatically displays the generated map:

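@echo off
rem Path to the Java runtime used to launch the GOL tool
set javacmd=\apps\gol-tool\jdk17\bin\java

rem Run gol with map output (features drawn in blue) and write the HTML to map.html
%javacmd% -Xmx12g -cp %~dp0\..\lib\gol.jar -Dfile.encoding=UTF-8 com.geodesk.gol.GolTool %* -f=map -f:color=#0000FF > map.html

rem Open the generated map in the default browser
start map.html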

Example call:

golm query germany "na[tourism=aquarium]" :fish: :tropical_fish:
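
For non-Windows setups, the same idea could be sketched in Python. This is an untested sketch, not part of the GOL distribution: the `JAVA` and `GOL_JAR` locations are assumptions that have to be adapted to the local install, and the function names are made up. The Java options and the -f=map / -f:color flags are taken from the batch file above.

```python
import subprocess
import webbrowser
from pathlib import Path

JAVA = "java"                      # assumption: a suitable java is on the PATH
GOL_JAR = Path("lib") / "gol.jar"  # assumption: adjust to where gol.jar lives


def build_command(gol_args, color="#0000FF", java=JAVA, jar=GOL_JAR):
    """Build the java command line used by golm.bat, with map output appended."""
    return [
        java, "-Xmx12g", "-cp", str(jar), "-Dfile.encoding=UTF-8",
        "com.geodesk.gol.GolTool", *gol_args, "-f=map", f"-f:color={color}",
    ]


def run_and_show(gol_args, out_file="map.html"):
    """Run gol, write its output to out_file, then open it in the default browser."""
    out_path = Path(out_file)
    with out_path.open("wb") as out:
        subprocess.run(build_command(gol_args), stdout=out, check=True)
    webbrowser.open(out_path.resolve().as_uri())


# Show the command that would be run for the example call above.
print(" ".join(build_command(["query", "germany", "na[tourism=aquarium]"])))
```

Calling `run_and_show(["query", "germany", "na[tourism=aquarium]"])` would then correspond to the example call above.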

Yes, it’s using an index for the tag keys/values. I should note that my measurements were based on my heavily modified fork. With the latest official release, the runtimes for the count use case above look more like this:

<osm version="0.6" generator="Overpass API 0.7.59.1 2a9d9642">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base=""/>

  <count id="0">
    <tag k="nodes" v="75398"/>
    <tag k="ways" v="10"/>
    <tag k="relations" v="0"/>
    <tag k="total" v="75408"/>
  </count>

</osm>
	Command being timed: "./osm3s_query --db-dir=db"
	User time (seconds): 12.04
	System time (seconds): 0.33
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.37

I’m using an 8-core Intel i9 at 2.4 GHz.

Is it possible to query based on metadata such as user name or a timestamp date range? Or is metadata querying planned for a later version?

20x speedup, excellent job! When will this go live?

Currently, GOLs don’t contain any OSM metadata. We don’t have any immediate plans to add metadata support, though we’re considering an option to turn edit timestamps and user information into additional tags (which can then be queried).

Could you tell me more about your use case? Are you looking for features edited within a time span, in a specific area, or are you gathering planet-wide statistics (similar to HDYC)?

Thanks for your reply. At the moment I have two use cases in mind:
The first is about cleaning up untouched imported data, e.g. TIGER data in the US. The user name and time range of the import are known. Those queries are pretty heavy, so the public Overpass server only works for small areas. As GeoDesk is pretty easy to set up on my local computer, my idea was to do this locally.

The second one is similar to HDYC, but maybe more individual (with regard to areas), and also a bit similar to Taginfo, to see how popular (in terms of the number of users using it) a specific tag is in a specific region.

Thanks! This is a bit of a challenging topic. Upstream merging turned out to be too time-consuming and, in the end, not feasible at all. OTOH, I don’t have enough free time to operate my own instance. Currently, only testing on a dev server is available for a limited audience.

Can you share an example, maybe?

https://wiki.openstreetmap.org/wiki/TIGER_fixup/Overpass_queries

You can find some in the wiki above; the main issue is the amount of data you get as a result. :wink:

Thanks! I worked with those before, and yes, they’re triggering some known performance issues.

If the amount of data is the main concern, maybe don’t try to display the results in overpass turbo, but download them to a file instead. Overpass turbo really has a hard time displaying dozens of megabytes.

The follow-up for the TIGER queries is here now: TIGER data quality - #24 by mmd

Where can I find the Windows binaries for the GOL tool again?

@chris66
Well, I find them right at http://www.geodesk.com/download,

where the zip archive for version 0.1.8 is currently available … right?

Ah, thanks. I had been looking on GitHub.