Why is osmfilter (Ubuntu 64-bit) giving wrong IDs? (first ~5k ok, later entries wrong)

Dear readers,
After a long time I started using osmfilter + osmconvert again. Sadly, the filtered planet output is only correct for roughly the first 6k entries; after that it contains only wrong (already deleted) entries.
In short: download the planet, convert it to o5m, filter on place=, check the output (.osm or .csv in an editor), and after some point you see only strange IDs.
Hint: if the output is CSV, the entries exist with the expected tags, but id + lat + lon are printed wrong!

wget https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-241104.osm.pbf
# apt install : osmfilter 1.4.4, osmconvert 0.8.10
# build .c : osmfilter 1.4.6, osmconvert 0.8.11  # only for comparison
osmconvert planet-241104.osm.pbf -o=planet.o5m
ls -lh planet* # pbf = 78GB, o5m = 168 GB
inxi # Ubuntu 24.04, Kernel: 6.8.0-48-generic x86_64
osmfilter planet.o5m --keep="place=state or place=city" --drop-ways --drop-relations -o=state_nd.osm
# same issue without --drop-ways/--drop-relations, and also with -o=file.o5m
osmconvert state_nd.osm --all-to-nodes --csv="@id @lon @lat place name" --csv-headline -o=eur_place.csv
@id	@lat	@lon	ISO3166-2	place	capital	population	name	name:en
313872884	46.9796562	9.1088120		state			Glarus	# ok
3289105152	0.2991701	212.5869485		city	5	20000	Gampaha	Gampaha	 # wrong, see deleted, https://www.openstreetmap.org/node/3289105152
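(Note: 212.5869485 is outside the valid range for either latitude (±90°) or longitude (±180°), so the printed coordinate itself is clearly corrupt, not just pointing at the wrong node.)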

Any ideas? (PS: the Ubuntu packages are some years behind.)

Thanks, Alex


Not sure what your issue is here, but have you considered using osmium-tool?
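For the filtering step, something like this should be the rough equivalent (untested here; I am just reusing the file names from your commands):

# keep only nodes tagged place=state or place=city, straight from the PBF
osmium tags-filter planet-241104.osm.pbf n/place=state n/place=city -o places.osm.pbf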


Not until now. I will try it later. Does osmium have a similar speed?

  1. If this turns out to be a real issue, these tools should be marked as dangerous (also under Linux 64-bit). Could someone try to confirm it?
  2. I’m a long-term C programmer, and I suspect some kind of overflow in the list/array handling of the node coordinates and of the internal numbers (array positions) versus the OSM IDs, many of which no longer fit in 32 bits.

Maybe your version is old enough not to handle 64-bit IDs.
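For what it’s worth, the “ok” ID above (313872884) fits in a signed 32-bit integer, while the “wrong” one (3289105152) does not: 2^31 − 1 = 2147483647 < 3289105152 < 2^32 = 4294967296. So an ID above the signed 32-bit limit would already overflow a plain int, although the diff below points at a different limit.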

This is the diff between 0.8.10 and 0.8.11:

1,2c1,2
< // osmconvert 2018-05-27 12:00
< #define VERSION "0.8.10"
---
> // osmconvert 2020-03-31 14:20
> #define VERSION "0.8.11"
7c7
< // (c) 2011..2018 Markus Weber, Nuernberg
---
> // (c) 2011..2020 Markus Weber, Nuernberg
390,396c390,396
< "hash table. By default, it uses 1200 MB for storing a flag for every\n"
< "possible node, 150 for the way flags, and 10 relation flags.\n"
< "Every byte holds the flags for 8 ID numbers, i.e., in 1200 MB the\n"
< "program can store 9600 million flags. As there are less than 5700\n"
< "million IDs for nodes at present (May 2018), 720 MB would suffice.\n"
< "So, for example, you can decrease the hash sizes to e.g. 720, 80 and\n"
< "2 MB using this option:\n"
---
> "hash table. By default, it uses 1800 MB for storing a flag for every\n"
> "possible node, 180 for the way flags, and 20 relation flags.\n"
> "Every byte holds the flags for 8 ID numbers, i.e., in 1800 MB the\n"
> "program can store 14400 million flags. As there are less than 7400\n"
> "million IDs for nodes at present (Mar 2020), 925 MB would suffice.\n"
> "So, for example, you can decrease the hash sizes to e.g. 1000, 120\n"
> "and 4 MB using this option:\n"
398c398
< "  --hash-memory=720-80-2\n"
---
> "  --hash-memory=1000-120-4\n"
407c407
< "  --hash-memory=1500\n"
---
> "  --hash-memory=3000\n"
409,410c409,410
< "These 1500 MB will be split in three parts: 1350 for nodes, 135 for\n"
< "ways, and 15 for relations.\n"
---
> "These 3000 MB will be split in three parts: 2700 for nodes, 270 for\n"
> "ways, and 30 for relations.\n"
13149c13149
<       h_n= 1200; h_w= 150; h_r= 10;
---
>       h_n= 1800; h_w= 180; h_r= 20;

Most notably, the update changed some default parameters, raising the node limit from 9600 million to 14400 million IDs. Node IDs are currently above 12200 million, so well over the old version’s stated limit of 9600 million and already quite close to the current one.
I haven’t looked at the code in detail, but the parameter sizes a hash table, and being over capacity there could mean you basically get a random result rather than a simple wrap-around.
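To make the idea concrete, here is a minimal sketch (not osmconvert’s actual code; I’m assuming simple modulo folding into a flag-per-ID bit array and a tiny toy capacity, the real code hashes differently):

/* toy_flags.c -- toy model of a flag-per-ID bit array, NOT osmconvert's real code.
 * With the old default of 1200 MB for node flags and 8 flags per byte, the table
 * covers 1200e6 * 8 = 9600 million IDs. Any ID at or above that capacity has no
 * bit of its own; if it gets folded back into the table (modulo here), it shares
 * a slot with a smaller ID, and flag lookups silently return wrong answers. */
#include <stdint.h>
#include <stdio.h>

#define CAPACITY 1000ULL          /* tiny capacity so the demo runs instantly */

static uint8_t flags[CAPACITY / 8];

static void set_flag(uint64_t id) {
    uint64_t slot = id % CAPACITY;            /* IDs >= CAPACITY wrap around */
    flags[slot / 8] |= (uint8_t)(1u << (slot % 8));
}

static int get_flag(uint64_t id) {
    uint64_t slot = id % CAPACITY;
    return (flags[slot / 8] >> (slot % 8)) & 1;
}

int main(void) {
    uint64_t big_id = 12200000000ULL;         /* an ID far above the capacity */
    set_flag(big_id);
    printf("flag for %llu: %d\n", (unsigned long long)big_id, get_flag(big_id));
    printf("flag for %llu: %d   <-- false positive, was never set\n",
           (unsigned long long)(big_id % CAPACITY), get_flag(big_id % CAPACITY));
    return 0;
}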

You can set the memory limits yourself via the --hash-memory option, e.g. --hash-memory=1800-180-20 (the current default). Please check whether adding that to both osmfilter and osmconvert solves the issue.
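That would be, roughly (untested; file names taken from your commands above, everything else unchanged):

osmfilter planet.o5m --hash-memory=1800-180-20 --keep="place=state or place=city" --drop-ways --drop-relations -o=state_nd.osm
osmconvert state_nd.osm --hash-memory=1800-180-20 --all-to-nodes --csv="@id @lon @lat place name" --csv-headline -o=eur_place.csv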