Imposm3 planet.osm.pbf on a small VPS

I have a small app that expects the output of imposm3. I’m only extracting a subset of the data, such as buildings, addresses and POIs. Right now I only use a country .pbf extract.

Now I want to start my “production” app with planet.osm.pbf on the smallest Hetzner server, with 4 GB RAM and currently 60 GB storage. The first bottleneck seems to be the initial import of planet.osm.pbf. The imposm3 docs mention the following:

An import in diff-mode on a Hetzner AX102 server (AMD Ryzen 9 7950X3D, 256GB RAM and NVMe storage) of a 78GB planet PBF (2024-01-29) with generalized tables and spatial indices, etc. takes around 7:30h. This is for an import that is ready for minutely updates. The non-diff mode is even faster.

Now my question is:
Do I really need all of that hardware if I don’t care too much about the duration?
What is the minimum hardware required if the initial import is allowed to take days?

I tried seeding the database from my development machine (48 GB RAM, plenty of storage and CPU) via SSH port forwarding to my production database, but ran into an obscure "no space left on disk in query COPY" error after 8 hours. Yet when I checked the storage usage on my server, it still had 25 GB available.

I can attach a bigger disk to my server, but I don’t even know how much storage I would realistically need. Maybe it’s a RAM issue? I really don’t want to over-provision too hard, since I plan to let it run for a long time without any return on investment.

As usual, once I give up and write something, I find a possible solution.
The -appendcache flag of imposm3 could be the magic. I need to see whether all of the country files from https://download.geofabrik.de work.
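A sketch of what that multi-extract seeding could look like, assuming imposm3’s `import` subcommand with its `-read`, `-mapping`, `-cachedir` and `-appendcache` flags (worth verifying against the imposm3 docs); the extract file names here are hypothetical examples:

```python
# Sketch: seed one imposm3 cache from several Geofabrik country extracts.
# The first read creates the cache; every later read passes -appendcache
# so the nodes/ways from earlier extracts are kept.
# File names and paths below are hypothetical examples.

def imposm_commands(extracts, mapping="mapping.yml", cachedir="./imposm_cache"):
    """Build one `imposm import` command (as an argv list) per extract."""
    commands = []
    for i, pbf in enumerate(extracts):
        cmd = [
            "imposm", "import",
            "-mapping", mapping,
            "-cachedir", cachedir,
            "-read", pbf,
        ]
        if i > 0:
            cmd.append("-appendcache")  # keep what earlier reads cached
        commands.append(cmd)
    return commands

if __name__ == "__main__":
    import subprocess
    for cmd in imposm_commands(["germany-latest.osm.pbf", "france-latest.osm.pbf"]):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run imposm
```

If I read the docs correctly, a single `imposm import -write` run afterwards would then load the cached data into PostGIS.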


The issue, even if you only maintain a small extract in the database, will be the cache size.

But the cache lives on the host that imposm3 is running on, right?

My appendcache setup seems to work, but the VPS is still too slow, and I have added 500 GB of storage. 2 shared vCPUs and 4 GB RAM also seem to be too little. Right now it will take approximately 5 days to import all of Europe, so roughly 20 days for the planet. So for whoever wants to use a small VPS: 2 CPU cores and 4 GB RAM are too little for the world, but 8 GB RAM and 4 cores seem reasonable, at about 10 days for the initial import.

Day 3. I’m giving up. I went through Europe from A to Z and only got as far as Germany. €3.50 per month is clearly too little to pay.

Short answer: Yes.

In the OpenStreetMap data model, ways contain only a list of node IDs. However, PostGIS (and almost all other GIS software) stores lines and polygons as lists of coordinates. Therefore, data consumers like Osm2pgsql or Imposm have to maintain a cache of all node locations (a mapping of uint64 → (int32, int32)) to build the geometry of each linestring/polygon.
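To put a rough number on that cache, here is a back-of-envelope sketch in Python; the planet node count is my assumption (roughly 9–10 billion nodes as of 2024), and a real cache adds index/structure overhead on top of the raw payload:

```python
import struct

# One cache entry per node: uint64 node ID -> (int32 lat, int32 lon).
NODE_COUNT = 9_500_000_000           # assumed planet node count (~2024)
BYTES_PER_ENTRY = 8 + 4 + 4          # id + lat + lon, ignoring overhead

raw_payload_gb = NODE_COUNT * BYTES_PER_ENTRY / 1e9
print(f"raw node-location payload: ~{raw_payload_gb:.0f} GB")  # ~152 GB

# Coordinates fit in int32 because lat/lon scaled by 1e7 stay below 2**31.
def pack_location(lat, lon):
    """Encode (lat, lon) as two little-endian int32 fixed-point values."""
    return struct.pack("<ii", round(lat * 1e7), round(lon * 1e7))

def unpack_location(blob):
    lat_i, lon_i = struct.unpack("<ii", blob)
    return lat_i / 1e7, lon_i / 1e7

# Round-trip keeps ~1e-7 degree precision (about 1 cm on the ground).
lat, lon = unpack_location(pack_location(52.5200066, 13.404954))
```

So even the raw payload, before any overhead, is far beyond what a 4 GB VPS can hold in RAM.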

You could store this cache in swap, but that would make things painfully slow, because cache access is purely random I/O. It would not take hours or days; it would take multiple weeks for the whole planet.

Consider switching to Osm2pgsql. It is just as fast but needs only 128 GB RAM (maybe even 96 GB). Compared to a couple of years ago, it has become much more configurable. Back then, Imposm was a better choice for some use cases, but nowadays Osm2pgsql has caught up.

I recommend one of the dedicated servers by Hetzner, not their (little) cloud machines.


Can you describe your use case?

Sadly, neither I nor the AI were intelligent enough to write the required Lua scripts. I also never really managed to get the data output clean enough to be consumed by a simple app. Likely my own fault, but I had none of these issues when going with imposm3.

Just extracting specific data into tables, such as restaurants, that can be consumed by a boring Ruby on Rails app. These might be shown on an off-the-shelf vector tiles map afterwards, but also in an HTML table next to it.
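For that kind of extraction, the imposm3 mapping file can stay quite small. A sketch (table name, tag values and columns here are my own example; check the imposm3 mapping documentation for the exact syntax):

```yaml
tables:
  restaurants:
    type: point
    mapping:
      amenity: [restaurant, fast_food, cafe]
    columns:
      - name: osm_id
        type: id
      - name: geometry
        type: geometry
      - name: name
        key: name
        type: string
```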

How often do you need to update this data?

On that point, did you try actually looking around yourself for real-life examples of osm2pgsql Lua scripts, starting with, for example, the defaults that ship with it, or the examples bundled with a commonly used map style such as OSM Carto?

What about looking for examples that fillet only the data that you want out of the planet file before even trying to load it?

As a completely different approach, why not generate the database (including removing all the information that you don’t want) on a large machine and then just copy it to a small one? I don’t mean “mess about with port forwarding” here; I mean complete the import (and the removal of what you don’t want) locally, and then copy and import it on the small one.

I dream of minutely updates. Maybe whatever Geofabrik offers. But for now I just wish to get the world in once.

I noticed that scaling a server up and down is a lot easier these days. I switched to a 32 GB RAM, 16-core Hetzner VPS. The country/area imports are way faster now and seem doable within 24 hours. It costs 50 Fr or so, but I will hopefully only need it for a few days.

I tried whatever was around and ran into dependency issues immediately. It was hard to get a bare-bones example working. Once I had that, I struggled with the data model. I think I even had duplicated OSM IDs somehow. Anyway, none of that happened with imposm3. Imposm3 just works and plays very well with Ruby on Rails. All I need to do after an import is re-add the indexes.

Dream no longer :slight_smile:

If you tell us what you did and in what order, people will be able to help you; but a statement like that doesn’t give anyone any information about what you actually did, so no one will be able to help.

I do know that, as of the last update to the switch2osm site when the three main guides were updated (because the bleeding edge of OSM Carto moved to “flex”), the new procedure was tested and it worked “soup to nuts”. The fourth main guide (Docker) is locked to an earlier OSM Carto release and therefore did not need updating.


Sorry, I would if I remembered. The last time I tried osm2pgsql was 9 months ago. A while after that, I tried imposm3 and it just worked as expected.

I will give it another shot if my imposm3 approach fails on my faster server.


GeoDesk may be a good fit for your needs. It is specifically designed to perform well even in resource-constrained environments. It uses a single-file database that is able to store the planet in less than 100 GB. On a 16-core / 32 GB RAM setup, importing the planet takes about 20 minutes.

To access the database, there are 3 APIs (Java, Python, and now C++ as well). For basic queries, you can also use a command-line utility that uses a syntax similar to Overpass and outputs results in various formats (e.g. GeoJSON or CSV). Everything is open-source.

Currently, you can’t update a GeoDesk database (you’ll need to re-build it from a fresh .osm.pbf file), but we’ll soon be adding the ability to ingest minutely updates.