Geodesk ... a new geospatial database for OSM data?

chris66 · December 10, 2022, 2:26pm

Solved. You have to write it in the form -f=xml.

GeoDeskTeam · December 10, 2022, 4:39pm

We’ve only tested on JDK 16 and above (we figured early adopters would be using a recent JDK). It should work fine on JDK 13, although some functionality may trigger a link exception. It definitely won’t work on anything older than JDK 13, at least not without modification. If you build from source, you will need at least JDK 14.

GeoDeskTeam · December 10, 2022, 4:42pm

Great idea! Is this an attribute of the osm parent element? (The Wiki page claims “No official .xsd Schema exists”)

I’ve created this GitHub issue.

chris66 · December 10, 2022, 4:46pm

Like this:

<osm version="0.6" generator="geodesk gol/0.1.2" upload="false">

mmd · December 10, 2022, 4:48pm

Or even upload="never" (see JOSM ticket #12731, “Add an option to completely prevent upload of a layer: e.g. "never" to upload=true/false”).

This is all very JOSM-specific, I believe. I’m not sure whether other apps support it as well.
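A minimal sketch of emitting such a root element with Python's standard `xml.etree.ElementTree`. The helper name `osm_root` is hypothetical; the accepted values follow this thread (`true`/`false`, plus `never` as discussed in JOSM ticket #12731), and tools other than JOSM may simply ignore the attribute.

```python
import xml.etree.ElementTree as ET

ALLOWED_UPLOAD = ("true", "false", "never")

def osm_root(generator: str, upload: str = "never") -> ET.Element:
    """Build an OSM-XML <osm> root element carrying a JOSM-style upload hint.

    Hypothetical helper: "true"/"false" are the classic values, "never" is the
    stricter variant discussed in JOSM ticket #12731.
    """
    if upload not in ALLOWED_UPLOAD:
        raise ValueError(f"unsupported upload value: {upload!r}")
    return ET.Element("osm", {"version": "0.6", "generator": generator, "upload": upload})

root = osm_root("geodesk gol/0.1.2", upload="false")
print(ET.tostring(root, encoding="unicode"))
```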

GeoDeskTeam · December 10, 2022, 4:52pm

The conversion from WGS-84 to Mercator and back is lossless at a resolution of 100 nanodegrees (the resolution used by OSM). However, you will need to set --precision=7 (by default, output is rounded to 6 decimal places, which is accurate to about 10 cm).
Maybe default precision should be 7 instead?
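To see what the 6-vs-7 digit default means in practice, here is a small sketch that assumes coordinates arrive as integers in OSM's native unit of 1e-7 degrees (100 nanodegrees). The function names are hypothetical, and the metre conversion uses a spherical mean Earth radius, so it is an approximation (valid for latitude, and for longitude only near the equator).

```python
import math

UNITS_PER_DEGREE = 10_000_000   # 1e7 units = 1 degree (100-nanodegree resolution)
EARTH_RADIUS_M = 6_371_008.8    # mean Earth radius; spherical approximation

def format_coord(units: int, precision: int) -> str:
    """Format an integer coordinate (in 1e-7 degrees) with `precision` decimals."""
    return f"{units / UNITS_PER_DEGREE:.{precision}f}"

def degrees_to_metres(deg: float) -> float:
    """Approximate north-south distance covered by `deg` degrees of latitude."""
    return math.radians(deg) * EARTH_RADIUS_M

lat = 481739217  # hypothetical latitude of 48.1739217 degrees
print(format_coord(lat, 7))  # all 7 digits survive
print(format_coord(lat, 6))  # last digit is rounded away
print(f"{degrees_to_metres(1e-6):.3f} m per step at 6 decimals")
print(f"{degrees_to_metres(1e-7):.4f} m per step at 7 decimals")
```

At 6 decimals the worst-case rounding error is half a step, roughly 5.6 cm in latitude, which fits the "accurate to 10 cm" figure; only 7 decimals reproduce values stored at OSM's resolution exactly.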

GeoDeskTeam · December 10, 2022, 5:00pm

A possible solution would be to use negative IDs for synthetic nodes, but I’m concerned that some downstream consumers may choke on those. Even so, as far as I understand the upload process, negative IDs would result in new nodes being created, so the outcome would still be unacceptable.

If we implement the option to keep all node IDs, the produced XML will be suitable for editing. Until then, see issue #69 in clarisma/gol-tool on GitHub: “Prevent users from uploading XML with synthetic node IDs”.

mmd · December 10, 2022, 5:47pm

Right, negative IDs would cause new objects to be created during upload. Another option might be a positive offset that is a bit larger than the largest IDs used for OSM objects, but not so large that some tools fail due to excessive memory requirements. Any attempt to upload such data would then fail, because those object IDs don’t exist in the main OSM database.

For an example, take a look at osmium-tool issue #234 on GitHub (osmcode/osmium-tool): “extract and check-refs use too much RAM with numerically high node IDs”.
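The offset idea can be sketched as follows. Both helpers are illustrative assumptions: the offset is a parameter you would have to choose, and the 8 bytes per entry assumes a flat, ID-indexed array of node locations stored as two 32-bit integers, the kind of dense index that makes numerically high IDs expensive.

```python
from itertools import count
from typing import Callable

def synthetic_id_allocator(offset: int) -> Callable[[], int]:
    """Return a function that hands out consecutive IDs starting at `offset`.

    Hypothetical helper: `offset` should sit a little above the largest ID in
    real use, so synthetic IDs never refer to existing objects and an upload
    fails instead of modifying real data.
    """
    counter = count(offset)
    return lambda: next(counter)

def dense_index_bytes(max_id: int, bytes_per_entry: int = 8) -> int:
    """Memory a flat array indexed by ID needs to cover IDs 0..max_id."""
    return (max_id + 1) * bytes_per_entry

# Two hypothetical offsets: ten billion vs. one trillion.
for offset in (10_000_000_000, 1_000_000_000_000):
    gib = dense_index_bytes(offset) / 2**30
    print(f"offset {offset:,}: ~{gib:,.0f} GiB for a dense node-location index")
```

The memory cost grows with the highest ID, not with the number of objects, which is why the offset should stay close to the real ID range.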

chris66 · December 10, 2022, 8:14pm

The thing is frighteningly fast:

gol query germany "na[amenity='post_box']" -f=xml >box.osm
Retrieved 76.305 features in 960ms

All post boxes in Germany in less than a second.

mmd · December 10, 2022, 8:24pm

Now box.osm would just need to contain something… for me it’s 0 bytes. (I see now that this was already mentioned further up in the thread.)

chris66 · December 10, 2022, 8:30pm

Then something must be wrong on your side.
22.042.474 box.osm

mmd · December 10, 2022, 8:31pm

OK, I’ll try again with version 0.1.2 and a new DB.

→ Yes, it was still due to the old 0.1.0 version; with 0.1.2 the result is correct.

mmd · December 10, 2022, 9:05pm

I’m currently comparing this with Overpass. If I give each of them only 1 CPU (with taskset --cpu-list 1), funnily enough both take almost exactly 1.6 seconds.

Without that restriction, gol finishes in 670ms, but needs 3.2s of user time to do so (compared with 1.65s for Overpass), simply because it keeps 3.8 CPUs busy on average. It seems that gol currently parallelizes a bit too aggressively and consumes a disproportionate amount of CPU for a relatively small gain in runtime.
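The trade-off described here can be put into numbers. This small helper (hypothetical name) uses the figures from this post: speedup is the single-CPU wall time divided by the unrestricted wall time, and parallel efficiency divides that speedup by the average number of busy CPUs.

```python
def parallel_stats(single_cpu_wall_s: float, parallel_wall_s: float,
                   avg_busy_cpus: float) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) for a parallel run."""
    speedup = single_cpu_wall_s / parallel_wall_s
    efficiency = speedup / avg_busy_cpus  # 1.0 would mean every busy core pays off fully
    return speedup, efficiency

# Figures from the post: 1.6 s on one CPU, 670 ms unrestricted, ~3.8 CPUs busy.
speedup, efficiency = parallel_stats(1.6, 0.67, 3.8)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers the speedup is about 2.4x at roughly 63% efficiency, so a good part of the extra CPU time does not translate into shorter runtime.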

chris66 · December 10, 2022, 9:12pm

Interesting. So why is OP (Overpass) so sluggish in the browser? Web overhead, or rather because on average 100 queries are running in parallel on the server?

mmd · December 10, 2022, 9:19pm

As always, the server is very busy. Some web overhead / latency certainly plays a part as well, but it is of secondary importance.

And of course it also makes a difference whether I test a DB containing only Germany or a complete planet, where I additionally have to filter for Germany (above, I loaded Germany in each case and tested with that).

GeoDeskTeam · December 11, 2022, 4:44pm

Most of this time is actually burned up by the formatter, which for OSM-XML is relatively slow (It has to build an object graph and re-create the untagged way-nodes). Ironically, the “simple” formatters (e.g. GeoJSON) are currently slower still because of this bug (The GeoJSON formatter is due to be replaced by a parallelized version anyway).
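To illustrate why the OSM-XML formatter has more work to do than a plain geometry writer, here is a hypothetical sketch (not GeoDesk's actual code) that turns ways stored only as coordinate lists back into `<node>` and `<way>` elements. It assumes untagged way-nodes are not available as features, so it assigns synthetic node IDs, reuses one ID per distinct coordinate, and marks the file `upload="never"`.

```python
import xml.etree.ElementTree as ET
from itertools import count

Coord = tuple[float, float]  # (lon, lat) in degrees

def ways_to_osm_xml(ways: list[tuple[int, dict[str, str], list[Coord]]],
                    first_node_id: int) -> ET.Element:
    """Rebuild OSM-XML for ways whose untagged nodes exist only as coordinates.

    Each way is (way_id, tags, [(lon, lat), ...]). Nodes get synthetic IDs
    starting at `first_node_id`; a coordinate seen before reuses its ID, so
    closed rings and shared vertices stay connected.
    """
    root = ET.Element("osm", {"version": "0.6", "upload": "never"})
    next_id = count(first_node_id)
    node_ids: dict[Coord, int] = {}
    way_elems = []
    for way_id, tags, coords in ways:
        way = ET.Element("way", {"id": str(way_id)})
        for lon, lat in coords:
            if (lon, lat) not in node_ids:
                node_ids[(lon, lat)] = next(next_id)
                ET.SubElement(root, "node", {
                    "id": str(node_ids[(lon, lat)]),
                    "lat": f"{lat:.7f}", "lon": f"{lon:.7f}"})
            ET.SubElement(way, "nd", {"ref": str(node_ids[(lon, lat)])})
        for k, v in tags.items():
            ET.SubElement(way, "tag", {"k": k, "v": v})
        way_elems.append(way)
    root.extend(way_elems)  # OSM-XML convention: all nodes first, then ways
    return root

ways = [(1, {"highway": "path"}, [(11.0, 48.0), (11.1, 48.0)]),
        (2, {"highway": "path"}, [(11.1, 48.0), (11.2, 48.1)])]
print(ET.tostring(ways_to_osm_xml(ways, 1), encoding="unicode"))
```

Keying nodes by coordinate is a simplification: distinct OSM nodes that happen to share a location would be merged, which is one more reason the original node IDs are needed for editable output.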

The query itself should run in < 200ms on a quad-core machine (you can approximate this with -f=count).

Yes, the query engine is quite greedy and will use every core it can. This is probably overkill for a basic query like this, and will need some tuning. Parallelization pays off for spatial predicates (intersects, within, etc. – coming in 0.2, enabled as Preview now in 0.1.2) because the topological checks are CPU-heavy.
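For a sense of why topological checks are CPU-heavy, here is the classic even-odd ray-casting test for a point inside a polygon ring, a typical building block of predicates like `within`. It is a generic illustration, not GeoDesk's implementation: each test costs time proportional to the ring's vertex count, and tests for different candidate features are independent of each other, which is what makes them easy to spread across cores.

```python
def point_in_polygon(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting; `ring` lists the polygon's (x, y) vertices.

    An explicitly closed ring (last vertex == first) also works, because the
    closing edge is degenerate and never counts as a crossing.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles the state
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square), point_in_polygon(15, 5, square))
```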

mmd · December 11, 2022, 5:49pm

For reference purposes, I’m also posting a few runtimes for the “count” use case:

gol

Single CPU

/usr/bin/time -v taskset --cpu-list 1 bin/gol query germany "na[amenity=post_box]" -f=count --precision=7 
75408

Retrieved 75.408 features in 346ms
	Command being timed: "taskset --cpu-list 1 bin/gol query germany na[amenity=post_box] -f=count --precision=7"
	User time (seconds): 0.58
	System time (seconds): 0.07
	Percent of CPU this job got: 96%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.67

Many CPUs


/usr/bin/time -v  bin/gol query germany "na[amenity=post_box]" -f=count --precision=7 
75408

Retrieved 75.408 features in 105ms
	Command being timed: "bin/gol query germany na[amenity=post_box] -f=count --precision=7"
	User time (seconds): 1.46
	System time (seconds): 0.12
	Percent of CPU this job got: 491%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.32

Overpass

Query: nw[amenity=post_box];out count;

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.59.120 (mmd) 84edf1af">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base=""/>

  <count id="0">
    <tag k="nodes" v="75398"/>
    <tag k="ways" v="10"/>
    <tag k="relations" v="0"/>
    <tag k="total" v="75408"/>
  </count>

</osm>
	Command being timed: "src/osm3s_query --db-dir=db"
	User time (seconds): 0.34
	System time (seconds): 0.04
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.39
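For anyone repeating these measurements, here is a small sketch that pulls user, system and elapsed time out of `/usr/bin/time -v` output (GNU time's verbose format, as shown above) and derives the average number of busy CPUs as (user + system) / elapsed. The helper name is hypothetical. Because the printed times are rounded, the result can differ slightly from GNU time's own "Percent of CPU" line.

```python
import re

def parse_gnu_time(text: str) -> dict[str, float]:
    """Extract timings from `/usr/bin/time -v` output and derive average busy CPUs."""
    def field(label: str) -> str:
        m = re.search(re.escape(label) + r":\s*(\S+)", text)
        if m is None:
            raise ValueError(f"field not found: {label}")
        return m.group(1)

    user = float(field("User time (seconds)"))
    system = float(field("System time (seconds)"))
    # Elapsed is printed as "h:mm:ss" or "m:ss.ss"; fold the parts into seconds.
    elapsed = 0.0
    for part in field("Elapsed (wall clock) time (h:mm:ss or m:ss)").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {"user": user, "system": system, "elapsed": elapsed,
            "avg_cpus": (user + system) / elapsed}

sample = """\
\tUser time (seconds): 1.46
\tSystem time (seconds): 0.12
\tPercent of CPU this job got: 491%
\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:00.32
"""
print(f"{parse_gnu_time(sample)['avg_cpus']:.2f} CPUs busy on average")
```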

GeoDeskTeam · December 12, 2022, 5:00pm

Thanks, this is very helpful.

I continue to be blown away by how far the JVM has come. Quarter-second startup time, and that even includes setting up the gol program itself and opening the database.

We’ve fretted about class loading, bytecode verification, bounds checks at every corner, and of course GC, but these things have essentially become non-issues. (The JVM does defer compilation, just in time as the name implies, and that compilation competes with query execution in the early phase.)

Strong showing by Overpass as well. Is it using indexing for this type of query?

(By the way, are these 4 physical cores, or 2 cores hyper-threaded as 4?)

chris66 · December 12, 2022, 5:09pm

Feature request: Would it be possible to add a link to the OSM object (as in Overpass Turbo) in the map view (-f=map)?


GeoDeskTeam · December 13, 2022, 12:40pm

Good idea.
I’m leaning towards keeping the tooltip (vs. a popup, which requires clicking instead of hovering) and making the feature clickable instead.

GitHub issue in clarisma/gol-tool: `query`: Make map features clickable (opened by clarisma on 13 Dec 2022, 12:09 UTC; label: enhancement)

For `-f=map`:

- Clicking on a feature navigates to URL
- Use option `-f:link…` to specify URL (by default, links to the feature on the main OSM website)
- `-f:link=none` disables linking
- The following placeholders are valid in a URL:
  - `$type`: The OSM type of the feature (`node`, `way`, `relation`)
  - `$id`: The feature's OSM id
  - `$t`: Type as a letter (`n`, `w`, `r`)
  - `$T`: `$t` as uppercase letter
- Offer presets for common OSM tools?
  - iD editor:
    - https://www.openstreetmap.org/edit?$type=$id
    - must also specify map zoom and center (e.g. `#map=20/48.17392/11.55887`) in order to bring the feature into focus
    - Could calculate this from the feature's bbox
    - Would be nice for iD to do this by default, ask for enhancement?
  - JOSM:
    - http://127.0.0.1:8111/load_object?objects=w259346776
    - Looks like JOSM needs nodes of ways and members of relations
    - See clarisma/geodesk#57
    - Is there a way to get JOSM to automatically fetch nodes/members?
- Alternative display:
  - Use popup instead of tooltip
  - Requires user to click on feature (rather than just hover)
  - But popup is "sticky" (useful if there are lots of tags; but #73 could help)
  - Would then need to click on an explicit link in the popup

Requires: simple templates, #71
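The placeholder scheme in this issue can be prototyped in a few lines. This is a hypothetical sketch, not the gol implementation: `expand_link` substitutes `$type`, `$id`, `$t` and `$T`, and `map_fragment` derives the `#map=zoom/lat/lon` fragment for iD from the center of a bounding box, as the issue suggests. The bbox order (min lon, min lat, max lon, max lat), the default zoom of 20 and the example bbox values are assumptions.

```python
import re

TYPE_LETTERS = {"node": "n", "way": "w", "relation": "r"}

def expand_link(template: str, osm_type: str, osm_id: int) -> str:
    """Substitute $type, $id, $t and $T in a link template (hypothetical helper)."""
    letter = TYPE_LETTERS[osm_type]
    values = {"type": osm_type, "id": str(osm_id), "t": letter, "T": letter.upper()}
    # "type" is tried before "t", so "$type" is never misread as "$t" + "ype".
    return re.sub(r"\$(type|id|t|T)", lambda m: values[m.group(1)], template)

def map_fragment(bbox: tuple[float, float, float, float], zoom: int = 20) -> str:
    """Build an iD-style '#map=zoom/lat/lon' fragment centred on a bbox.

    bbox is assumed to be (min_lon, min_lat, max_lon, max_lat) in degrees.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    lat = (min_lat + max_lat) / 2
    lon = (min_lon + max_lon) / 2
    return f"#map={zoom}/{lat:.5f}/{lon:.5f}"

url = expand_link("https://www.openstreetmap.org/edit?$type=$id", "way", 259346776)
print(url + map_fragment((11.5585, 48.1735, 11.5592, 48.1743)))  # hypothetical bbox
print(expand_link("http://127.0.0.1:8111/load_object?objects=$t$id", "way", 259346776))
```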