Geodesk ... eine neue geospatiale Datenbank für OSM-Daten?

Gelöst. Man muss es in der Form -f=xml schreiben.

2 Likes

We’ve only tested on JDK 16 and above (We figured early adopters will use a recent JDK). It should work fine on JDK 13 (Some functionality may trigger a link exception). It definitely won’t work on pre-13 (at least not without modification). If you build from source, you will need at least 14.

Great idea! Is this an attribute of the osm parent element? (The Wiki page claims “No official .xsd Schema exists”)

I’ve created this GitHub issue.

1 Like

So:

<osm version="0.6" generator="geodesk gol/0.1.2" upload="false">

1 Like

Or even upload = never (see #12731 (Add an option to completely prevent upload of a layer : e.g. "never" to upload=true/false) – JOSM)

This is all very JOSM specific I believe. I’m not sure if other apps also support it.

1 Like

The conversion from WGS-84 to Mercator and back is lossless up to 100 nanodegrees (the resolution used by OSM). However, you will need to set --precision=7 (By default, output precision is rounded to 6 digits, accurate to 10cm).
Maybe default precision should be 7 instead?

1 Like

A possible solution would be to use negative IDs for synthetic node IDs, but I’m concerned that some downstream consumers may choke on those (But still, as far as I understand the upload process, negative IDs would result in new nodes being created, so the outcome is still unacceptable).

If we implement the option to keep all node IDs, the produced XML will be suitable for editing. Until then: Prevent users from uploading XML with synthetic node IDs · Issue #69 · clarisma/gol-tool · GitHub

Right, negative ids would cause new objects to be created during upload. Another option might be a positive offset value which is a bit larger than the largest ids used for osm objects, but not too large that some tools would fail due to excessive memory requirements. Any attempt to upload such data would fail b/c the object ids don’t exist on the main osm database.

As an example, maybe take a look at extract and check-refs use too much RAM with numerically high node IDs · Issue #234 · osmcode/osmium-tool · GitHub

1 Like

Das Dingens ist schon erschreckend schnell:

gol query germany "na[amenity='post_box']" -f=xml >box.osm
Retrieved 76.305 features in 960ms

Alle Briefkästen in Deutschland in weniger als einer Sekunde.
:sunglasses:

3 Likes

Jetzt müsste in box.osm nur noch etwas drin stehen… bei mir sind das 0 Bytes. (steht weiter oben im Faden auch schon mal sehe ich gerade).

Dann muss was falsch sein bei Dir.
22.042.474 box.osm

Ok, ich probier nochmal mit der Version 0.1.2 und einer neuen DB.

→ Ja, lag noch an der alten 0.1.0 Version, mit 0.1.2 passt das Ergebnis.

2 Likes

Ich schaue mir das gerade mal im Vergleich zu Overpass an. Gebe ich jeweils nur 1 CPU als Ressource frei (mit taskset --cpu-list 1), brauchen beide lustigerweise ziemlich genau 1,6 Sekunden.

Ohne Einschränkung läuft gol dann in 670ms durch, benötigt dafür allerdings 3,2s an User Time (im Vergleich zu 1,65s für Overpass), ganz einfach weil im Mittel 3,8 CPUs belegt werden. Es scheint, dass gol im Moment noch etwas zu stark parallelisiert und für einen relativ kleinen Laufzeitgewinn dann überproportional CPU-Ressourcen verbraucht.

2 Likes

Interessant. Warum ist OP dann im Browser so lahm? Web-Overhead oder eher weil 100 Queries im Schnitt parallel laufen auf dem Server?

Wie immer ist viel los auf dem Server, etwas Web-Overhead / Latenz spielt sicherlich auch noch rein, ist aber eher von untergeordneter Bedeutung.

Und klar, es macht auch einen Unterschied, ob ich nur eine DB mit Germany teste, oder einen kompletten Planet, wo ich dann zusätzlich auf Deutschland filtern muss (ich habe oben jeweils Germany geladen und mit dieser Version getestet).

1 Like

Most of this time is actually burned up by the formatter, which for OSM-XML is relatively slow (It has to build an object graph and re-create the untagged way-nodes). Ironically, the “simple” formatters (e.g. GeoJSON) are currently slower still because of this bug (The GeoJSON formatter is due to be replaced by a parallelized version anyway).

The query itself should run in < 200ms on a quadcore machine (You can approximate this with-f=count).

Yes, the query engine is quite greedy and will use every core it can. This is probably overkill for a basic query like this, and will need some tuning. Parallelization pays off for spatial predicates (intersects, within, etc. – coming in 0.2, enabled as Preview now in 0.1.2) because the topological checks are CPU-heavy.

3 Likes

For reference purposes, I’m also posting a few runtimes for the “count” use case:

gol

Single CPU

/usr/bin/time -v taskset --cpu-list 1 bin/gol query germany "na[amenity=post_box]" -f=count --precision=7 
75408

Retrieved 75.408 features in 346ms
	Command being timed: "taskset --cpu-list 1 bin/gol query germany na[amenity=post_box] -f=count --precision=7"
	User time (seconds): 0.58
	System time (seconds): 0.07
	Percent of CPU this job got: 96%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.67

Many CPUs


/usr/bin/time -v  bin/gol query germany "na[amenity=post_box]" -f=count --precision=7 
75408

Retrieved 75.408 features in 105ms
	Command being timed: "bin/gol query germany na[amenity=post_box] -f=count --precision=7"
	User time (seconds): 1.46
	System time (seconds): 0.12
	Percent of CPU this job got: 491%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.32

Overpass

Query: nw[amenity=post_box];out count;

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.59.120 (mmd) 84edf1af">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base=""/>

  <count id="0">
    <tag k="nodes" v="75398"/>
    <tag k="ways" v="10"/>
    <tag k="relations" v="0"/>
    <tag k="total" v="75408"/>
  </count>

</osm>
	Command being timed: "src/osm3s_query --db-dir=db"
	User time (seconds): 0.34
	System time (seconds): 0.04
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.39

1 Like

Thanks, this is very helpful.

I continue to be blown away by how far the JVM has come. Quarter-second startup time, and that even includes setting up the gol program itself and opening the database.

We’ve fretted about class loading, bytecode verification, bounds checks at every corner, and of course GC, but these things have essentially become non-issues (It does defer JIT compilation – as the name implies – which competes with query execution in the early phase).

Strong showing by Overpass as well. Is it using indexing for this type of query?

(By the way, are these 4 physical cores, or 2 cores hyper-threaded as 4?)

Feature Request : Would it be possible to add a link to the OSM object (like in Overpass Turbo) in the map view (-f=map)?

grafik

1 Like

Good idea.
I’m leaning towards keeping the tooltip (vs. popup, which requires clicking instead of hovering), and instead make the feature clickable.

1 Like