While parsing the latest OSM world data, my parser reported that over 1000 ways contain references to non-existent nodes. Most of the missing node IDs were higher than the highest node ID in the OSM world file. I manually verified with a hex editor that these nodes are not in the file. I downloaded the last two world files and both had the same problem.
Does this indicate some sort of corruption in the OSM database or a bug in the OSM world data exporter?
For now I am just skipping ways that contain non-existent nodes. Is this currently the best way to handle the issue?
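To make the "skip incomplete ways" approach concrete, here is a minimal sketch of that check in Python. It is not my actual parser: the function name `split_ways` is made up for illustration, and it assumes the file lists all nodes before any ways (as the planet file does) and that the set of node IDs fits in memory, which is unlikely for a full planet without a more compact structure.

```python
import io
import xml.etree.ElementTree as ET

def split_ways(osm_file):
    """Return (complete_way_ids, missing_refs_by_way) for an OSM XML stream.

    Hypothetical helper; assumes every node precedes every way in the file.
    """
    node_ids = set()
    complete = []
    incomplete = {}
    for _event, elem in ET.iterparse(osm_file, events=("end",)):
        if elem.tag == "node":
            node_ids.add(int(elem.get("id")))
            elem.clear()  # drop tags/attributes we no longer need
        elif elem.tag == "way":
            way_id = int(elem.get("id"))
            refs = [int(nd.get("ref")) for nd in elem.findall("nd")]
            missing = [ref for ref in refs if ref not in node_ids]
            if missing:
                incomplete[way_id] = missing  # this way would be skipped
            else:
                complete.append(way_id)
            elem.clear()
    return complete, incomplete

# Tiny made-up extract: way 11 references node 99, which is not in the file.
sample = io.BytesIO(b"""<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.1" lon="0.1"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <way id="11"><nd ref="2"/><nd ref="99"/></way>
</osm>""")
complete, incomplete = split_ways(sample)
print(complete, incomplete)
```

Recording the missing references rather than silently dropping the way makes it easy to compare them against the highest node ID in the file.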
Note that the timestamp and changeset of the last way are later than the timestamp and changeset of the last node. I have a feeling that the final batch of nodes is not getting saved to the world file.
Note: Not all the missing nodes had IDs past the highest in the file. Some had earlier IDs and were simply missing. I am going to assume that these nodes were re-allocated recently and, like the other recent nodes, did not get written when the planet file was created.
This is expected behaviour for the weekly planet dump. There is no guarantee of integrity for any data that was added to the OSM db after generation of the planet file started.
It takes a few hours to generate the planet file, and after the nodes have been dumped to the file, someone can still add a new way and its associated new nodes. The way can then show up later in the file while its nodes do not appear at all.
The only way to get a good file with referential integrity is to take the weekly planet dump, and add the next day’s daily diff file to that:
Fetch the weekly planet (e.g. planet-090708.osm.bz2) and the next day’s diff (e.g. 20090708-20090709.osc.gz).
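To illustrate what applying the diff to the dump does, here is a rough Python sketch on tiny in-memory documents. It assumes the diff is in osmChange format, with `create`, `modify` and `delete` blocks. The helper names `index_elements`, `apply_change` and `missing_refs` are invented for this example, and it simply overwrites elements without comparing versions or timestamps. For a real planet file you would use a dedicated tool rather than anything like this.

```python
import xml.etree.ElementTree as ET

def index_elements(osm_root):
    """Map (type, id) -> element for every node, way and relation."""
    return {
        (elem.tag, int(elem.get("id"))): elem
        for elem in osm_root
        if elem.tag in ("node", "way", "relation")
    }

def apply_change(elements, osc_root):
    """Apply an osmChange document to the index in place (hypothetical helper)."""
    for action in osc_root:
        for elem in action:
            key = (elem.tag, int(elem.get("id")))
            if action.tag in ("create", "modify"):
                elements[key] = elem  # naive: newest definition wins
            elif action.tag == "delete":
                elements.pop(key, None)
    return elements

def missing_refs(elements):
    """Node IDs referenced by some way but absent from the index."""
    known = {eid for (kind, eid) in elements if kind == "node"}
    wanted = {
        int(nd.get("ref"))
        for (kind, _), elem in elements.items()
        if kind == "way"
        for nd in elem.findall("nd")
    }
    return sorted(wanted - known)

# Made-up data: the dump has a way whose node 99 was created after the
# nodes had already been written; the next day's diff contains node 99.
planet = ET.fromstring(
    '<osm version="0.6"><node id="1" lat="0" lon="0"/>'
    '<way id="11"><nd ref="1"/><nd ref="99"/></way></osm>'
)
diff = ET.fromstring(
    '<osmChange version="0.6"><create>'
    '<node id="99" lat="0.2" lon="0.2"/></create></osmChange>'
)

elements = index_elements(planet)
before = missing_refs(elements)
apply_change(elements, diff)
after = missing_refs(elements)
print(before, after)
```

In this toy case the dangling reference disappears once the diff is applied, which is the effect you are after with the real files.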
It’s just a dump, and for a lot of needs quite useful already as-is.
That’s not to say what you propose couldn’t be done, either additionally or instead of the current dump. I suggest you send your proposal to the dev mailing list, where the admins are more likely to see it than on this forum.