GeoDesk: A fast and compact database engine for OSM features

It looks like @ZeLonewolf needs a PBF as input into the OpenMapTiles toolchain. There currently is no PBF output option for GeoDesk queries, so unfortunately the gol tool can’t help him there. For anything that requires writing PBFs, osmium is the best bet.

However, it would be very interesting to explore GeoDesk as a backend for map rendering. This would obviate the need to import OSM data into a PostGIS database (which takes a long time and requires beefy hardware). For comparison, gol build will create a GOL from a planet file in about 40 minutes on a fairly low-end workstation (10-core Xeon, 32 GB RAM, consumer-grade NVMe SSD). A renderer could then query the tile regions and receive the various features, with their geometries already assembled. Such a renderer wouldn’t use gol query, but would interact directly with the GeoDesk library for best performance. Querying a GOL is significantly faster than anything involving an SQL database (and a GOL from a recent planet is only about 80 GB).
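To give a rough idea, a query for a tile region might look something like this in Python. This is purely illustrative: the file name is a placeholder, and the class and attribute names (Features, Box, the GOQL string, .shape, .tags) are assumptions based on the GeoDesk Python bindings and may not match the actual API.

```python
# Rough sketch only – file name is a placeholder; Features, Box, the GOQL
# string, .shape and .tags are assumptions and may differ from the real API.
from geodesk import Features, Box

planet = Features("planet.gol")                      # a locally built GOL

# Bounding box of the map tile to render (example coordinates, WGS-84)
tile = Box(west=13.3, south=52.4, east=13.5, north=52.6)

# All highway ways intersecting the tile, geometries already assembled
for road in planet("w[highway]")(tile):
    geom = road.shape                                # Shapely geometry
    print(road.tags, geom.length)                    # a real renderer would draw it
```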

Right now, we’re working on support for incremental updating. (Currently, GOLs need to be rebuilt from a fresh planet file, which is fast, but not fast enough if someone wishes to update more frequently than once a day – a typical requirement for tile rendering.)

We’re looking to eventually integrate with the various map-rendering toolchains, so thanks for pointing out this post!

At present, I only need to use the openmaptiles-tools toolchain for development renders. This process is slow because openmaptiles-tools uses a postgres database as an intermediary rather than direct PBF-to-mbtiles processing.

For production renders, I use planetiler, which renders the planet PBF to an mbtiles file in the OpenMapTiles schema in about an hour on a 64-core machine with SSDs. Planetiler is a Java implementation that skips the intermediate steps in the openmaptiles-tools toolchain: planet.osm.pbf → planet.mbtiles, 1 hour. This is done with a Java profile for OpenMapTiles that essentially hard-codes the logic for generating the planet mbtiles, using planetiler as a core dependency.

THAT process is so fast that there is likely nothing to be gained from pre-filtering the planet. However, updates to planetiler lag OpenMapTiles, so it’s not useful in a “testing new features in OpenMapTiles” context.

The biggest performance gap I currently have is that it takes close to an hour to patch the (hopefully) weekly planet PBF by applying .osc hourly diffs to it. This process is single-threaded using pyosmium-up-to-date. If you had a solution to THAT, I would be very excited.

2 Likes

We don’t have anything in our arsenal for this use case, unfortunately. There’s a fast PBF reader in GeoDesk, but nothing for writing PBFs.

Have you tried using osmium apply-changes instead of pyosmium-up-to-date? libosmium (the underlying library of both) reads PBFs using multiple threads, and its PBF writer also has (at least limited) multi-thread support starting with version 2.17.0. However, the constraints of the CPython interpreter may force the Python version to run single-threaded.
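For reference, a minimal sketch of that step via the osmium CLI (wrapped in Python purely to keep the examples in one language; the file names are placeholders):

```python
# Sketch: apply .osc change files to a planet PBF via the osmium CLI.
# File names are placeholders; requires osmium-tool to be installed.
import subprocess

subprocess.run(
    [
        "osmium", "apply-changes",
        "planet.osm.pbf",              # input planet
        "hourly-1.osc.gz",             # one or more change files
        "hourly-2.osc.gz",
        "-o", "planet-updated.osm.pbf",
    ],
    check=True,
)
```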

The OSM objects in a PBF generally have to be sorted by ID, which means the blocks within the file have to be written in order — this makes parallelization more complicated. (@Jochen_Topf, the main author of Osmium, may be able to shed some light on this).

If the .osc processing could be spread across multiple cores (perhaps with the help of a bounded priority queue to enforce the block writing order), it should take about 5 minutes on your kind of system (with 64 cores, it would be mostly I/O-bound in this scenario).

How frequently do you want/need to update your planet file?

Osmium apply-changes and pyosmium-up-to-date do essentially the same thing internally. And yes, as @GeoDeskTeam mentions, there is some multithreading involved.

Multithreading works well for reading PBFs, but writing in multiple threads is not straightforward. The reason is less the ordering requirement than the way PBFs are encoded in blocks. Ideally you want blocks to contain a “reasonable” number of objects and/or bytes. But objects have widely different sizes, so you basically have to write them into a block before you know whether that block has reached a “good” size. And you can’t start on the next block until you know where the previous block ends, so you can’t really have a different thread start on the next block while you encode the current one. You could use a fixed number of objects per block to solve this; the blocks would then have different sizes, but that might not matter in every case. It might make reading somewhat less efficient, though, due to the extra per-block overhead. And you have to make sure that blocks stay below the maximum size defined by the PBF format, so you need to cover that case somehow.
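To illustrate the fixed-count idea (this is just the bookkeeping, not Osmium code): chunk by object count, but cut a block early if adding the next object would push it over the format’s size limit.

```python
# Toy illustration of fixed-count blocks with a size guard – not Osmium code.
# encode() stands in for whatever serializes an object for the block.

MAX_BLOCK_BYTES = 32 * 1024 * 1024     # maximum uncompressed block size
OBJECTS_PER_BLOCK = 8000               # arbitrary fixed object count per block

def blocks(objects, encode):
    block, size = [], 0
    for obj in objects:
        data = encode(obj)
        # Close the block when it is full, or when this object would push it over
        # the limit. (A single object larger than the limit still needs special handling.)
        if block and (len(block) >= OBJECTS_PER_BLOCK or size + len(data) > MAX_BLOCK_BYTES):
            yield block
            block, size = [], 0
        block.append(data)
        size += len(data)
    if block:
        yield block
```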

And the more stuff you are doing at the same time, the more memory you need for all the “in-flight” data that’s in the process of being assembled. This can amount to many GBs of data that you keep in memory while processing. That’s why the multithreading is limited on writing in Osmium. If anybody has an idea how to make this better, please implement it and tell me. :slight_smile:

4 Likes

I think the ultimate solution in my case is for planetiler to support ingesting change files directly. It’s already holding massive in-memory and on-disk representations of the planet at various phases of its processing that are therefore independent of PBF file limitations. It’s something that’s been discussed with regard to planetiler, and they’d welcome it, but that’s certainly not trivial work.

Though I was struck by the comment:

And the more stuff you are doing at the same time, the more memory you need for all the “in-flight” data that’s in the process of being assembled. This can amount to many GBs of data that you keep in memory while processing. That’s why the multithreading is limited on writing in Osmium.

If “reducing memory use” is a limitation, I would note that machines with hundreds of GB of RAM are readily available these days and there are certainly use cases that would have no problem trading RAM for speed.

2 Likes

NB: It seemed to me that one of the major selling points in the new data model discussion was to assume exactly the opposite. Maybe it’s time to make some reasonable assumptions about current (affordable) hardware, backed by actual data.

3 Likes

Just to put some numbers out there – when I’m rendering the planet, I normally rent a machine on AWS that has 128 GB of RAM, a 64-core processor, and >1 TB of SSD storage. For this kind of compute power, I pay something around $1.00 USD per hour. I spend the first hour downloading the planet and patching it with hourly diffs, and the second hour (closer to 40 minutes at this point) rendering my patched planet into an .mbtiles file. Then I copy the .mbtiles file out to a network share where my (way, way cheaper) tileserver ($30 USD / month) can access it to serve tiles.

I only render the planet occasionally (when we need an update for renderer development purposes, such as someone’s done a lot of mapping and we want to see the output), so the cost of $2 whenever I want an update is pretty reasonable.

Of course for a “production” capability, I’d want a permanent asset that can run builds continuously, but even that isn’t crazy-person pricing anymore and is very financially accessible in a business setting.

2 Likes

There are many different use cases for OSM data. And Osmium tries to be as useful as possible to as many people as possible. Not everybody has access to Amazon machines or the money to spend on them. In fact, one of the complaints I hear most often about Osmium is that it uses too much memory for this or that task. So this is a big concern for me. I want OSM to be accessible to the student with their old hand-me-down notebook. Also, Osmium is a library, so I have to be conscious of other uses the user might have for that memory, over which I have no control.

7 Likes

If the 32 MB maximum uncompressed block size still holds, and we assume 10x that for the working set (to account for all the temporary buffers), then even with 64 threads we’re only talking about 20 GB. That’s reasonable for a workstation or server; 8 threads on a notebook would be about 2.5 GB.

What do you think of this:

  • Main thread reads the PBF and hands compressed blocks to the worker threads
  • Each worker thread unzips/decodes one block, applies the changes, encodes/zips it and puts it into a priority queue.
    • Each worker would need to know the type/ID of the first element in the block immediately following its assigned block, in order to determine whether it should incorporate newly created objects with IDs above its block’s current maximum. (Workers would need to pass this info and signal their peers.)
    • If a block becomes too big, split it
    • The priority queue needs to be bounded (so we don’t run out of memory in case writing to disk is slow), but must always allow enqueuing of the next block to be written
  • An output thread grabs the encoded blocks from the priority queue and writes them to disk

(There are probably some subtleties I’m missing)
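A minimal sketch of the ordering part, assuming blocks are numbered in file order: workers finish out of order, and a bounded reorder buffer releases their results to the writer strictly in sequence. The actual unzip/apply-changes/encode work, block splitting and error handling are omitted.

```python
# Minimal sketch of ordered output from out-of-order workers – the real
# unzip/apply-changes/encode work, block splitting and error handling are omitted.
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor

class ReorderBuffer:
    """Hands (seq, item) pairs to the consumer strictly in seq order,
    holding at most max_pending out-of-order items."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.heap = []                      # min-heap keyed by block sequence number
        self.next_seq = 0                   # next block the writer is waiting for
        self.cond = threading.Condition()

    def put(self, seq, item):
        with self.cond:
            # Always accept the block the writer is waiting for (avoids deadlock);
            # otherwise wait while the buffer is full.
            while seq != self.next_seq and len(self.heap) >= self.max_pending:
                self.cond.wait()
            heapq.heappush(self.heap, (seq, item))
            self.cond.notify_all()

    def get(self):
        with self.cond:
            while not self.heap or self.heap[0][0] != self.next_seq:
                self.cond.wait()
            _, item = heapq.heappop(self.heap)
            self.next_seq += 1
            self.cond.notify_all()
            return item

def process_block(seq, raw_block):
    return raw_block                        # placeholder: unzip, apply .osc, re-encode, zip

def run(raw_blocks, workers=8):
    blocks = list(raw_blocks)
    buf = ReorderBuffer(max_pending=2 * workers)
    written = []

    def writer():
        for _ in blocks:
            written.append(buf.get())       # stands in for writing the block to disk

    out = threading.Thread(target=writer)
    out.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for seq, raw in enumerate(blocks):
            pool.submit(lambda s=seq, r=raw: buf.put(s, process_block(s, r)))
    out.join()
    return written

if __name__ == "__main__":
    print(run([f"block-{i}" for i in range(10)]))
```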

As of today, would you recommend going with Osmium directly over the Python version, or are they similar performance-wise?

Agree. I do like locations-on-ways (which is how GeoDesk stores ways internally), but getting OSM source data in this form would only cut the runtime of gol build by about 20% (and the source data is only read once in the lifecycle of a GOL). I wouldn’t trade a time saving of a few minutes (for the typical user) for the upheaval it would bring to the entire OSM ecosystem.

I’d rather see the time/energy/resources spent on launching your Overpass fork we’ve discussed above, or improving the UX of JOSM — or enhancing osmium apply-changes :slight_smile:

That’s certainly a worthy goal, and being resource-conscious is always a good thing (especially in a cloud environment, where CPU/storage consumption translates directly into billing).

Multi-threading support wouldn’t need to exclude users with low-end hardware, as fewer cores means fewer threads → less memory used.

Let me acknowledge (and thank!) the developers of the most widely used tools and libraries for their intentional design decisions that favor a low resource footprint over benchmark-style speed (which comes with far higher baseline requirements). This is not just about hardware, but about the entire ecosystem of users around OpenStreetMap.

Anyway, this approach is still easier to optimize for large machines (e.g. with RAM disks, or, for recurring tasks, by letting the Linux page cache make use of the “unused” RAM) than any attempt to hard-code a high minimum baseline for everyone. So it is somewhat unfair to conclude from poorly configured benchmarks that the tools that use fewer resources are slower.

I think one of the benchmark examples was Blazegraph (the Sophox implementation, with very low concurrent access) vs. Overpass (the same instance used in production). Setting aside the different query language as an additional option, one of the advertised advantages of Blazegraph was being able to run world-level queries without the timeouts of Overpass. However, this was unfair: not only did the Blazegraph instance in the benchmark lack the full data (not just the attic data; even the live data had no geometries), but the server specs for Blazegraph were higher than those for Overpass.

So in general, I think it is better to take benchmarks with a grain of salt. It makes no sense to assume that a tool whose default configuration avoids using more resources could not, with some fine-tuning, reach times similar to those of other tools that claim to be faster under very specific circumstances.

1 Like

The question here really is what you mean by “accessible”. What are the specific use cases you have in mind here?

Besides, I believe this topic needs to be looked at with a bit of a broader view:

Assuming I’m a student in a region with fairly slow internet connection, and/or a data plan that’s prohibitively expensive / limited to xyz MB/GB per month, no amount of data model changes would get me anywhere close to processing OSM data on a global scale. Downloading a planet file would take days, and the occasional power outage doesn’t help either.

There are some alternative options already available: some providers, like Geofabrik, offer country extracts, or we ask people to download specific data from Overpass or other online services. Also, OSMF provides free access to their development server; you only need to apply for an account via a GitHub issue. If you’re into vector tiles, people sometimes use osmium renumber before creating tiles, to reduce the overall memory consumption, etc.

What are the use cases that cannot be covered by these alternative options, and what could be done specifically to improve the situation?

1 Like

I quickly wanted to comment on this one as well. I added an option in Overpass to process PBF files with locations-on-ways, to skip the node lookup part during way processing. Since Overpass is very unhappy with missing nodes, I had to use “--keep-untagged-nodes”.

Tests on a 2012 planet file showed a processing time of 56 minutes instead of 66 minutes. Estimating the effect of --keep-untagged-nodes, I’d say, 40 minutes should be possible.

This sounds like an impressive 40% improvement. However, I did those tests on my fork. Upstream would take around 4-5 times longer. When you put 1.1 hours or 0.6 hours in relation to 5 hours, it doesn’t seem like a huge step forward anymore.

1 Like

(Not to sound presumptuous or like I know it all, especially since these are not my circumstances nor those of anyone I know personally, but) you’re making some assumptions you don’t know are true. Specifically, it seems you’re assuming that anyone who wants to work with OSM data will download everything themselves. One of the most common requests (e.g. to Organic Maps) was being able to share map files directly between users. So it’s plausible that in some places people have somewhere they can download files from the internet, but otherwise distribute them through sneakernets.

I didn’t know about this, thanks for sharing. Maybe this is a solution for some people. I guess the point I can make here is that this is not widely advertised.

About the dev server, it may still not be the solution. Someone with a consistent (even if slow) internet connection may be able to use it, while someone with an intermittent connection may not. But this is me talking out of my ass… Everything depends on specific cases.

Either way, pushing everything onto the internet or “the cloud” just because we can is not the way to go with anything – small is better than big, local is better than remote. Why is one gonna depend on some remote server if one can do everything on one’s own computer/phone? If one depends on said remote server, what’s one gonna do if it goes away? Everyone must have heard something like this in the past, I’m just rehashing a rehash after all…

1 Like

That’s a major accomplishment!

We’ve touched upon this in the other thread — I’m very surprised that there isn’t more of an effort on the part of the Overpass maintainers to pull your changes back into the mainline (Radical architecture change? Tradeoffs that clashed with other requirements?)

It seems silly to consider sweeping changes to the core of OSM that may result in performance gains of 20% to 40% (in select cases), while ignoring opportunities for order-of-magnitude improvements in existing OSM tools.

1 Like

That’s the design approach behind GeoDesk: users build a local database (or download a tileset), then run their queries directly. Overpass (as a hosted service) works great for querying specific features across a large geographical area, but isn’t suitable for high rates of repetitive queries or large data downloads (for example, discovering the surrounding features for several million points of interest).

Use cases of OSM data are very diverse (and hopefully will continue to grow) and will likely be met by a mix of cloud-based and local tools.

2 Likes

I think this would be a rather large undertaking. Currently, I have listed about 70-80 topics in the README file where the fork deviates from upstream. They include architectural and data persistence changes.

OSMF could probably fund someone for a few months to bring some of the changes back into mainline. Unfortunately, I don’t have the time to do that.

IIRC, I’ve managed >10’000 requests/s at one point, downloaded all motorways worldwide, and filtered some 1.7 million amenities by nearby phones in about 10 s (in overpass turbo). So in general, this isn’t completely impossible.

As a heavy user of overpass (on a private server), I would be in favor of simply adopting your fork, with all of its awesome performance improvements, and instead paying someone to build up the supporting infrastructure to make it usable - which I think is just planet data downloads in the correct format?

It took a bit of time, but GeoDesk now has full scripting support via Python. Python is easy to learn and opens the door to a universe of tools for cartography and data analysis. Check it out and let us know what you think!
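For anyone curious, a minimal example of what that looks like (the file name is a placeholder, and the query syntax is quoted from memory of the GeoDesk Python docs, so treat it as a sketch):

```python
# Minimal GeoDesk Python sketch – file name is a placeholder; the API details
# (Features, the GOQL string, .count) are from memory and may differ slightly.
from geodesk import Features

germany = Features("germany.gol")       # a GOL created with gol build
pubs = germany("na[amenity=pub]")       # GOQL: nodes and areas tagged amenity=pub
print(pubs.count)
```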

2 Likes