How do I merge HGT data into OSM dump with only osmium more efficiently?

Context: I’m trying to build maps for my Garmin bike computer with mkgmap, and I have compiled instructions from different sources into a script.

The step that is relevant to this thread is where I combine the elevation data created by pyhgtmap and the OSM dump downloaded from Geofabrik.

The command that I used for pyhgtmap:

pyhgtmap --polygon=/data/poly/country.poly --step=${HGT_STEPS:=20} --hgtdir=/data/pyhgtmap/hgt/ --earthexplorer-user="${EARTHEXPLORER_USER}" --earthexplorer-password="${EARTHEXPLORER_PASS}" --pbf --source=srtm3,view3,sonn3,alos1,srtm1,view1,sonn1 --alos-user="${ALOS_USER}" --alos-password="${ALOS_PASS}"

This outputs .osm.pbf files named lon*lat*.osm.pbf in /data/pyhgtmap/output. The next step is to merge these files with the OSM dump, and then the merged data goes to mkgmap’s splitter.

The commands that I have used for merging the data:

osmium merge lon*lat*.osm.pbf -o /data/pyhgtmap/output/country.merged.osm.pbf
osmconvert /data/pyhgtmap/output/country.merged.osm.pbf -o=/data/pyhgtmap/output/country.merged.o5m
osmconvert --drop-version /data/pyhgtmap/output/country.merged.o5m /data/osm-data/country.osm.pbf -o=/data/osmconvert-output/country.osm.pbf

However, since osmconvert is in maintenance mode, I wanted to use only osmium. I found that the following block also does the job without osmconvert:

osmium merge lon*lat*.osm.pbf -f pbf | osmium cat -F pbf - /data/osm-data/country.osm.pbf -o /data/osmium-output/country.merged.osm.pbf
osmium sort /data/osmium-output/country.merged.osm.pbf -o /data/osmium-output/country.sorted.osm.pbf --strategy multipass

The output files of both approaches work with splitter.

The problem is that the osmium sort command uses a huge amount of memory (as documented in the manual), around 29 GB for Austria, whereas osmconvert uses only 9 GB. But without the sorting step in the newer commands, splitter complains about the data not being sorted.

The fact that osmconvert can do the same job with so much less memory makes me think that osmium sort might be overkill: the data is probably already partly sorted, but the merging process has some kinks that I do not understand and that cause splitter to complain. But that’s just my uneducated guess.

Does anybody have a clue how I could efficiently merge the HGT data and the OSM dump with only osmium, without having to sort the data?

Not 100% sure what you want to accomplish here but it looks to me you want to merge:

  1. The merged content of all lon*lat*.osm.pbf files
  2. /data/osm-data/country.osm.pbf

And you use osmium cat for that. The man page says clearly:

The data is not sorted in any way but strictly copied from input to output.

It looks to me that instead of cat you should use another merge.

Out of interest, could you dump the first 10 lines of one of these lon*lat*.osm.pbf files using osmium cat <filename.pbf> -f osm | head -n 10 and share the output here?

Not 100% sure what you want to accomplish here but it looks to me you want to merge:

  1. The merged content of all lon*lat*.osm.pbf files
  2. /data/osm-data/country.osm.pbf

Yes, that’s exactly what I want to do. I tried osmium merge lon*lat*.osm.pbf /data/osm-data/country.osm.pbf -o /data/osmium-output/country.merged.osm.pbf, but splitter still complains that the output file is not sorted. So it does not matter whether the part of the command you quoted uses merge or cat, because I still need to sort afterwards.

The error that splitter gives when a single merge command is used for lon*lat*.osm.pbf and /data/osm-data/country.osm.pbf:

New way id 10057629 is not higher than last id 10057629
uk.me.parabola.splitter.SplitFailedException: Maybe the IDs are not sorted. This is not supported with keep-complete=true or --problem-list
	at uk.me.parabola.splitter.MultiTileProcessor.processWay(MultiTileProcessor.java:179)
	at uk.me.parabola.splitter.AbstractMapProcessor.consume(AbstractMapProcessor.java:84)
	at uk.me.parabola.splitter.OSMFileHandler.execute(OSMFileHandler.java:157)
	at uk.me.parabola.splitter.ProblemLists.calcMultiTileElements(ProblemLists.java:255)
	at uk.me.parabola.splitter.Main.useProblemLists(Main.java:503)
	at uk.me.parabola.splitter.Main.start(Main.java:127)
	at uk.me.parabola.splitter.Main.main(Main.java:81)

Output of osmium cat /data/pyhgtmap/output/europe/belgium/lon2.34_3.00lat50.00_51.00_srtm3v3.0.osm.pbf -f osm | head -n 10:

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.18.0">
  <bounds minlat="50" minlon="2.34" maxlat="51" maxlon="3"/>
  <node id="10000000" lat="51" lon="2.5858333"/>
  <node id="10000001" lat="50.9991667" lon="2.5841667"/>
  <node id="10000002" lat="50.9983333" lon="2.5858333"/>
  <node id="10000003" lat="50.9979167" lon="2.585"/>
  <node id="10000004" lat="50.9983333" lon="2.5833333"/>
  <node id="10000005" lat="50.9975" lon="2.5816667"/>
  <node id="10000006" lat="50.9983333" lon="2.5808333"/>

See OpenStreetMap Statistics: OSM node ids are currently in the range 0…10055801003, so merging nodes from another source whose ids fall in this range will cause problems, as the man page says:

If you have objects with the same type, id, and version but different other data, the result of this command is undefined. This situation can never happen in correct OSM files, but sometimes buggy programs can generate data like this. Osmium doesn’t make any promises on what the result of the command is if the input data is not correct.
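A quick way to confirm such an overlap before merging is to compare the id ranges of the two inputs. The sketch below assumes that osmium fileinfo -e -g accepts the data.minid.* and data.maxid.* variables as described in the osmium-fileinfo man page; the helper names ranges_overlap and id_range and the script name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether two OSM files have overlapping node or way id ranges.
# Usage (hypothetical script name): ./check-ids.sh contours.osm.pbf country.osm.pbf

# Pure arithmetic check: succeeds (exit 0) if [min1,max1] and [min2,max2] intersect.
ranges_overlap() {
  local min1=$1 max1=$2 min2=$3 max2=$4
  (( min1 <= max2 && min2 <= max1 ))
}

# Print "min max" for one object type (nodes or ways) in one file.
# -e reads the whole file, so this takes a while on a country extract.
id_range() {
  local file=$1 type=$2
  echo "$(osmium fileinfo -e -g "data.minid.${type}" "$file")" \
       "$(osmium fileinfo -e -g "data.maxid.${type}" "$file")"
}

# Only touch osmium when two file names are given on the command line.
if [[ $# -eq 2 ]]; then
  for type in nodes ways; do
    read -r a_min a_max < <(id_range "$1" "$type")
    read -r b_min b_max < <(id_range "$2" "$type")
    if ranges_overlap "$a_min" "$a_max" "$b_min" "$b_max"; then
      echo "${type}: id ranges overlap, duplicate ids are possible"
    else
      echo "${type}: id ranges are disjoint"
    fi
  done
fi
```

Overlapping ranges do not prove that a specific id is duplicated, but disjoint ranges rule it out, which is the property the merge relies on.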

So I think you should instruct pyhgtmap not to start numbering at 10000000 but at 20000000000. The same goes for way ids.
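As a rough sketch of that suggestion: the option names --start-node-id and --start-way-id are what I recall from the pyhgtmap help text and should be treated as an assumption until checked with pyhgtmap --help. The start values are the ones proposed above, and only a subset of the original options is repeated here.

```shell
#!/usr/bin/env bash
# Sketch: start contour ids above the OSM id range so the merged file has no duplicate ids.
# ASSUMPTION: --start-node-id / --start-way-id are the pyhgtmap option names; check `pyhgtmap --help`.

OSM_MAX_NODE_ID=10055801003   # highest node id quoted in this thread
START_NODE_ID=20000000000
START_WAY_ID=20000000000

# Refuse to continue if the chosen start id still lies inside the OSM node range.
if (( START_NODE_ID <= OSM_MAX_NODE_ID )); then
  echo "start node id ${START_NODE_ID} overlaps OSM ids" >&2
  exit 1
fi

# Only the two id options are new; the source and credential options
# from the original pyhgtmap command would be added unchanged.
pyhgtmap_args=(
  --polygon=/data/poly/country.poly
  --step="${HGT_STEPS:=20}"
  --hgtdir=/data/pyhgtmap/hgt/
  --pbf
  --start-node-id="${START_NODE_ID}"
  --start-way-id="${START_WAY_ID}"
)

# Printed rather than executed so the command can be reviewed first.
echo pyhgtmap "${pyhgtmap_args[@]}"
```

With disjoint id ranges, a single osmium merge over the contour files and the country extract should produce sorted output, since osmium merge keeps sorted inputs sorted; whether splitter then accepts the file without osmium sort is something to verify on a small extract.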

Thanks for the heads-up! I think this is exactly where the problem is.

Just out of curiosity, does osmconvert rewrite the id or have I just been lucky that there were no duplicate ids previously?

Also, is there an upper limit to the id number?

On the upper limit: 10055801003 is 2^33.2273…, which is more than 32 bits, and in computers the next limit is typically 2^64.

But for the actual answer, have a look at Openstreetmap-website/Database schema: the type for a node id is int8.

The range of values for INT8 is -9223372036854775808 to 9223372036854775807.
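The arithmetic can be checked directly in bash, which uses 64-bit signed integers, the same width as int8. This is only an illustration; the function name bit_length is made up, and the ids are the ones quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch: how many bits the ids from this thread need, using bash's 64-bit signed integers.

# Count the bits needed to represent a positive integer.
bit_length() {
  local n=$1 bits=0
  while (( n > 0 )); do
    n=$(( n >> 1 ))
    bits=$(( bits + 1 ))
  done
  echo "$bits"
}

INT8_MAX=9223372036854775807        # upper bound of a signed 64-bit integer
osm_max_node_id=10055801003         # highest node id quoted above
proposed_start_id=20000000000       # start id suggested for the contour data

echo "OSM max node id needs $(bit_length "$osm_max_node_id") bits"
echo "proposed start id needs $(bit_length "$proposed_start_id") bits"
echo "ids left above the proposed start: $(( INT8_MAX - proposed_start_id ))"
```

So the current node ids need 34 bits, the proposed start id needs 35, and both are far below the int8 limit.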