Faster updates: Is the O5M format "dead"? Update speed of O5M vs PBF

I was reading up on and experimenting with the O5M format and its tools (osmconvert, osmupdate, read support in Osmium). Its documentation suggested that it was faster than PBF at the cost of some space, but more compact than a plain .osm (XML) file.

Right now it takes my virtual machine (20 GB RAM, virtual disk on a 4x NVMe SSD, 12 threads of an AMD R9 3900X) 20-30 minutes to update planet.osm.pbf with pyosmium-up-to-date, and I was looking at ways to speed that up. It does not seem to be disk bound: moving it to a faster SSD didn’t help, nor did splitting the input and output files between disks. It’s mostly CPU bound on one primary thread, possibly with the osmium reader and writer objects running multiple I/O threads alongside it. The bulk of the update time is spent in one function: std::set_union (complexity: at most 2*(N_1 + N_2) - 1 comparisons). I read up on that code and tried to replace it with libstdc++'s parallel version, but there isn’t much (IMHO) that could be parallelized and I didn’t see a notable performance increase. I can kind of understand that: breaking a big set into two sets and then recombining them may not help. EDIT: It would help if I enabled the parallel processing correctly, but it also appears that the I/O iterators are not random access.
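To make the set_union point concrete, here is a minimal Python sketch of the same two-pointer merge. It is an illustration only, not the osmium code: the names set_union and CountingLess are made up for this post, and the integer IDs stand in for OSM objects. It shows why the step is hard to parallelize when the inputs are streams. Each comparison decides which input to advance next, and the inputs are only ever read front to back.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
_END = object()  # sentinel marking an exhausted input stream


def set_union(a: Iterable[T], b: Iterable[T],
              less: Callable[[T, T], bool]) -> Iterator[T]:
    """Merge two sorted streams; equivalent items are emitted once (from a)."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, _END), next(it_b, _END)
    while x is not _END and y is not _END:
        if less(x, y):
            yield x
            x = next(it_a, _END)
        elif less(y, x):
            yield y
            y = next(it_b, _END)
        else:
            # Neither is smaller: treat as equivalent, keep a's copy only.
            yield x
            x, y = next(it_a, _END), next(it_b, _END)
    # At most one of the two inputs still has items; drain it.
    while x is not _END:
        yield x
        x = next(it_a, _END)
    while y is not _END:
        yield y
        y = next(it_b, _END)


class CountingLess:
    """Comparator wrapper that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x, y) -> bool:
        self.calls += 1
        return x < y


if __name__ == "__main__":
    less = CountingLess()
    merged = list(set_union([1, 3, 5], [2, 3, 6], less))
    print(merged, "comparisons:", less.calls)
```

The comparison count always stays within the 2*(N_1 + N_2) - 1 bound, because each loop iteration makes at most two comparisons and consumes at least one item.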

Anyway, I tried comparing formats with only 240 GB of disk to play with. As I mentioned, updating a .pbf takes 23 minutes in my most recent example, and the file is about 66 GB. I didn’t even bother to work with osm.bz2, as even a simple read/verify with osmium was ridiculously slow. I tried using the osmupdate tools to create and update a .o5m file from the .pbf, but it was even slower and resulted in a file that was more than double the size (134 GB). Was I supposed to specify .gz compression as well? That’s even slower. At first I wondered if it was because I was going from .pbf to .o5m, and maybe .o5m to .o5m would be faster. But even the other .o5m tools were slow, as were simple reads from osmium.

So I wonder whether, in the years since O5M was developed, enough improvements have been made to the PBF tools (maybe the multithreaded readers and writers in osmium?) that O5M’s performance advantage has been eclipsed.

I hate replying to my own posts, but my experiments have provided some useful information that may benefit someone else.

By changing the format options (running pyosmium-up-to-date with --format pbf,add_metadata=false,pbf_compression=lz4) I can strip out the metadata I don’t need or use, and get a file that is faster to work with.
Updated planet.osm.pbf: ~22.5 minutes to update, 67 GB disk space, 11:00 minutes to filter my desired tags (osmium tags-filter for OpenRailwayMap), 2:30 minutes to run error check (osmium fileinfo -e)
LZ4, stripped metadata osm.pbf: ~15-16 minutes to update, 64 GB disk space, 9:30 minutes to filter tags, 2:00 minutes to run error check
Stripped metadata osm.pbf: [did not test update], 52 GB disk space, [did not test tag filtering], 2:19 minutes to run error check
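For anyone who wants to repeat these runs, here is a rough sketch of how the update command could be assembled and timed from Python. It assumes pyosmium-up-to-date is on the PATH. build_update_command and timed_run are hypothetical helper names I made up for this post, and the only options used are the ones quoted above; check pyosmium-up-to-date --help for how output files are handled before pointing this at a planet file you care about.

```python
import subprocess
import time
from typing import Dict, List


def build_update_command(planet: str, format_options: Dict[str, str]) -> List[str]:
    """Build the argv list for pyosmium-up-to-date with PBF format options.

    The options are joined as 'pbf,key=value,...', matching the --format
    string used in the post.
    """
    fmt = ",".join(["pbf"] + [f"{key}={value}" for key, value in format_options.items()])
    return ["pyosmium-up-to-date", "--format", fmt, planet]


def timed_run(cmd: List[str]) -> float:
    """Run a command, raise if it fails, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    cmd = build_update_command(
        "planet.osm.pbf",
        {"add_metadata": "false", "pbf_compression": "lz4"},
    )
    print("would run:", " ".join(cmd))
    # Uncomment to actually update the file and time it:
    # print(f"update took {timed_run(cmd) / 60:.1f} minutes")
```

Keeping the command as a list (rather than one shell string) avoids quoting problems with the comma-separated format string.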