Learnings from trying to create a page offering thematic extracts

I often find myself needing OSM data focused on a specific topic. I either use overpass turbo to get such a dataset, or download an extract from Geofabrik and filter it using ogr2ogr or osmium. Both methods involve some repetitive manual work before I get to a GPKG (my current favourite file format).
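For illustration, the Geofabrik-plus-filter route can be sketched as a small Python helper that builds the two command lines: `osmium tags-filter` to keep only matching objects, then `ogr2ogr -f GPKG` to convert the result. The file names and the tag filter here are made-up examples, and the helper name is hypothetical. This is a minimal sketch of the general approach, not the exact commands I run.

```python
import subprocess


def build_extract_commands(source_pbf, tag_filter, filtered_pbf, out_gpkg):
    """Build the osmium and ogr2ogr command lines for one thematic extract.

    All four arguments are plain strings; nothing is executed here.
    """
    # Step 1: keep only objects matching the tag filter, e.g. "n/amenity=drinking_water".
    osmium_cmd = [
        "osmium", "tags-filter", source_pbf, tag_filter,
        "-o", filtered_pbf, "--overwrite",
    ]
    # Step 2: convert the filtered PBF to a GeoPackage (destination comes before source).
    ogr_cmd = ["ogr2ogr", "-f", "GPKG", out_gpkg, filtered_pbf]
    return [osmium_cmd, ogr_cmd]


def run_extract(commands):
    """Run the commands in order, stopping at the first failure.

    Assumes osmium-tool and GDAL are installed and on PATH.
    """
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_extract(build_extract_commands("monaco-latest.osm.pbf", "n/amenity=drinking_water", "filtered.osm.pbf", "drinking_water.gpkg"))` would then produce the GPKG, provided both tools are available.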

So, I wanted to build a website which offers thematic extracts. This turned out to be more complicated than I initially thought. For context, I don’t own hardware or have hardware expertise, so I am reliant on cloud providers.

I use these 3 services:

  1. GitHub Pages to serve frontend code
  2. GitHub scheduled workflows to run extraction jobs daily
  3. Supabase object storage to store results of extraction jobs

GitHub scheduled workflows are limited in the size of job they can handle, so forget processing a planet file or similar (unless you are happy to pay GitHub for the workflows). The Supabase tier I can access without providing a bank card is limited to 1 GB of total storage and a maximum file size of 50 MB. I tried other providers, but I found there is always a catch:

  • Backblaze: you can have a free account, but you can’t create a public bucket unless you give payment info (and then you have to worry about egress charges)
  • Hetzner: cheap storage, but again, if I give payment info, then I can theoretically face runaway egress charges
  • AWS, Azure, GCP: I use these enough at work, and I wanted to try something new in my free time (plus the theoretical threat of runaway egress charges)
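To make the Supabase limits above concrete, here is a minimal sketch of a pre-upload check that an extraction job could run. It assumes the limits are binary units (50 MiB per file, 1 GiB in total), which I have not verified against Supabase's documentation. The function and constant names are hypothetical.

```python
# Assumed limits: 50 MB per file and 1 GB total, interpreted as binary units.
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_TOTAL_BYTES = 1024 * 1024 * 1024


def plan_uploads(file_sizes):
    """Split files into those that fit the storage limits and those that don't.

    file_sizes maps a file name to its size in bytes. Files are considered in
    the given order; a file is rejected if it exceeds the per-file limit or
    would push the running total past the overall quota.
    """
    accepted, rejected = [], []
    total = 0
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES or total + size > MAX_TOTAL_BYTES:
            rejected.append(name)
        else:
            accepted.append(name)
            total += size
    return accepted, rejected, total
```

With these assumed limits, at most 20 files of the maximum size fit into the bucket, which is why the number of regions and themes stays small.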

The end result is this pretty minuscule website: OSM extracts (source code).

I could probably include some more regions, albeit not large ones (due to the 50 MB file size limit). My main takeaway is that it is no accident that a truly robust website offering GPKG (or similar) extracts does not exist: creating one takes more effort than a Sunday afternoon. I hope next time I have a similar idea, I’ll just go outside instead.


Yes, storage costs something.
Maybe you could contact some local OSM associations that run infrastructure with some spare storage on their servers? After all, you don’t need that much space.

Good idea, thank you.
If someone says they need this service, I’ll improve it. Right now I’m not aware of any actual prospective users, so I am not planning on continuing in the short term.

With at least some of the options you (a) get a fairly large amount of egress traffic included, so that you’d have to be favourited on “the next Pokemon Go’s message board” to cause problems, and (b) have the ability to control spend invoice by invoice.

Maybe I’m paranoid, but leaving it open and uncontrolled worries me, as someone malicious could automate downloading lots of data. I don’t know why anyone would do it, but it would still stress me out.

Good to know! If I find someone who finds the website helpful (or someone who would find it helpful if it had more data), then I’ll investigate more.

If you know someone who needs something like this, please tell them to message me. I’d be glad to help them.