New PC Build - Making Osmfilter faster

Garry · November 11, 2022, 8:48pm

Hi, I am building a new PC because my old one is very old, and tools like Osmfilter and Osmosis take a long time. I do a lot of work extracting and dumping data, which takes hours in some cases. I am deciding on a 5600G with 32 GB of RAM and an M.2 PCIe 3.0 drive. Basically, my question is: is it worth going with another chip that supports M.2 PCIe 4.0, since it is twice as fast as 3.0? Will it noticeably increase the speed of Osmfilter? What will be the bottleneck?

Matija_Nalis · November 11, 2022, 11:58pm

Do you (need to) run on the full planet, or are you using much smaller and faster country extracts?

Also, what is your workflow? Do you run osmfilter multiple times or do complex queries? If so, you might benefit from a preprocessing step using osmfilter --out-o5m, as the generated file is almost 10 times smaller, and then running all further processing on it.

example:

% time osmfilter croatia-latest.osm --keep-nodes=lit=yes --drop-ways -o=light.osm
osmfilter croatia-latest.osm --keep-nodes=lit=yes --drop-ways -o=light.osm  22,95s user 2,23s system 99% cpu 25,184 total

% time osmfilter croatia-latest.o5m --keep-nodes=lit=yes --drop-ways -o=light2.osm
osmfilter croatia-latest.o5m --keep-nodes=lit=yes --drop-ways -o=light2.osm  4,57s user 0,80s system 99% cpu 5,368 total

You’ll notice that .o5m is almost 5 times faster (about 5.4 s versus 25.2 s), even when running from a ramdisk. If the file were on an HDD (and big enough that the working dataset doesn’t fit in RAM), the difference would be even bigger. Also note the 99% CPU usage. If your time output shows much lower CPU usage, that means the CPU is waiting for the disk to serve data, and the disk is your bottleneck.
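If your shell's time only prints user, system and total seconds without the percentage, the CPU share can be worked out by hand. A minimal sketch, assuming a POSIX shell with awk; the function name cpu_share is made up for illustration, and the numbers must use a decimal point (so 22.95, not 22,95 as my locale prints it):

```shell
# cpu_share USER_SECONDS SYSTEM_SECONDS WALL_SECONDS
# Prints the percentage of wall-clock time the process spent on the CPU,
# truncated to a whole number. Values near 100 suggest the CPU is the
# limit; much lower values suggest the process was waiting on the disk.
cpu_share() {
    awk -v u="$1" -v s="$2" -v r="$3" 'BEGIN {
        if (r <= 0) { print "wall time must be positive" > "/dev/stderr"; exit 1 }
        printf "%d\n", (u + s) / r * 100
    }'
}

# Example: 3 s user + 1 s system over 8 s of wall time.
cpu_share 3 1 8
```

A run that spends 4 s on the CPU out of 8 s total was idle half of the time, which in this context almost always means waiting for I/O.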

To know which bottleneck to plan for, you’d first need to know what the bottleneck is now: disk I/O speed? CPU? Memory?

What I would do to determine that is get some smaller country extract, convert it to .o5m, store it on a ramdisk (on GNU/Linux systems: mkdir /tmp/ramdisk ; mount -t tmpfs -o size=8G none /tmp/ramdisk), and run your osmfilter processing on it. (Make sure that you don’t dedicate more memory to the ramdisk than, say, 3/4 of system RAM, or things will go bad, but use as big a country extract as will fit to get more precise measurements.)
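To avoid guessing at the 3/4 limit, the ramdisk size can be derived from /proc/meminfo. This is only a sketch for GNU/Linux: the function name max_ramdisk_mib is hypothetical, and the mount step is left commented out because it needs root.

```shell
# max_ramdisk_mib [MEMINFO_FILE]
# Prints three quarters of total system RAM in MiB, read from a
# /proc/meminfo-style file (MemTotal is reported in kB there).
max_ramdisk_mib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 * 3 / 4 / 1024 }' "${1:-/proc/meminfo}"
}

# Show the suggested upper limit on this machine, if the file is readable.
if [ -r /proc/meminfo ]; then
    echo "ramdisk size limit: $(max_ramdisk_mib) MiB"
fi

# To actually create the ramdisk (as root), something like:
#   mkdir -p /tmp/ramdisk
#   mount -t tmpfs -o size="$(max_ramdisk_mib)M" none /tmp/ramdisk
```

Treat the printed value as a ceiling rather than a target; a ramdisk just big enough for the extract and its output leaves more memory for everything else.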

The speed you get in those tests is the maximum you’ll ever get on that CPU. If that is not fast enough for you, you’ll definitely need a faster CPU and/or a smarter workflow (preprocessing to smaller datasets and operating on them, getting extracts instead of the whole planet, or finally not doing a full reimport each time but learning how to work with diffs).

If it is plenty fast when running from the ramdisk, then try the same thing on your HDD. If it is slow there, then you need a faster disk (which usually means an SSD if your osmfilter queries need a lot of seeking in the file), or more RAM. More RAM is always better, as a bigger part of the file will get cached in RAM, which is always much faster than any disk (thus, if you can fit your whole dataset in RAM, you’ll get blazing speeds).

Anyway, if in doubt after those tests, with a reasonably recent CPU I’d go primarily with an SSD (the cheapest one will do, as long as it is big enough) and invest the difference in more RAM.
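A rough way to check whether a given file could be held in RAM is to compare its size with the MemAvailable figure from /proc/meminfo. This is a sketch under assumptions: GNU/Linux only, fits_in_ram is a made-up helper name, and MemAvailable is just the kernel's estimate. The file also only ends up cached after it has been read once.

```shell
# fits_in_ram FILE [MEMINFO_FILE]
# Succeeds (exit status 0) if FILE is smaller than the MemAvailable value
# in a /proc/meminfo-style file. Runs in a subshell so that the helper
# variables do not leak into the calling shell.
fits_in_ram() (
    file_kib=$(( $(wc -c < "$1") / 1024 ))
    avail_kib=$(awk '/^MemAvailable:/ { print $2 }' "${2:-/proc/meminfo}")
    [ "$file_kib" -lt "$avail_kib" ]
)

# Usage (the file name is only an example):
#   if fits_in_ram croatia-latest.o5m; then echo "should fit in RAM"; fi
```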

stephan75 · November 12, 2022, 7:57am

Reading this topic, I would like to ask (apart from the hardware question):

What is the most recent release of osmfilter?
Where is its current source code hosted?
Is osmfilter still under active maintenance, or is there an active fork?

SomeoneElse · November 12, 2022, 9:57am

Is osmfilter the best approach (for whatever you want to do) currently?

“osm-tags-transform” (using osmium internally) may be worth looking at.