Missing Nodes in Ways

While parsing the latest OSM world data, my parser reported that over 1000 ways contain references to nonexistent nodes. Most of the missing node IDs were higher than the highest node ID in the OSM world file. I manually verified with a hex editor that these nodes are not in the file. I downloaded the last two world files and both had the same problem.

Does this indicate some sort of corruption in the OSM database or a bug in the OSM world data exporter?

For now I am just skipping ways that contain nonexistent nodes. Is this currently the best way to handle the issue?
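A minimal sketch of that skipping approach, assuming the usual OSM XML layout in which every node element appears before the first way. The function name split_ways and the sample data are made up for illustration. A plain Python set is fine for a small extract, but a full planet file would need a far more compact structure for the node IDs.

```python
import io
import xml.etree.ElementTree as ET


def split_ways(osm_stream):
    """Return (complete_way_ids, broken) for an OSM XML stream.

    broken maps a way ID to the node IDs it references that never
    appeared as <node> elements earlier in the stream.
    Assumes nodes precede ways, as in the planet file.
    """
    seen_nodes = set()
    complete = []
    broken = {}
    for _event, elem in ET.iterparse(osm_stream, events=("end",)):
        if elem.tag == "node":
            seen_nodes.add(int(elem.get("id")))
            elem.clear()  # drop attributes and children we no longer need
        elif elem.tag == "way":
            way_id = int(elem.get("id"))
            refs = [int(nd.get("ref")) for nd in elem.findall("nd")]
            missing = [ref for ref in refs if ref not in seen_nodes]
            if missing:
                broken[way_id] = missing  # skip this way, but remember why
            else:
                complete.append(way_id)
            elem.clear()
    return complete, broken


# Hypothetical sample: way 11 references node 3, which is not in the file.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.0" lon="0.1"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <way id="11"><nd ref="2"/><nd ref="3"/></way>
</osm>"""

complete, broken = split_ways(io.BytesIO(SAMPLE))
print(complete)  # [10]
print(broken)    # {11: [3]}
```

Recording the missing node IDs per way, rather than silently dropping the way, makes it easy to report exactly which references are dangling.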

-Mark Granger

Can you give a few way IDs and the node IDs that are apparently missing?

Easy! Just look at the very last way in the file. All of its nodes, starting with the first one, are past the maximum node ID in the file.

Edit: To be more precise, the last way is as follows:

The last node is as follows:

This was found using Hex Edit (Version 3.0 - http://www.hexedit.com/) on the file planet-090701.osm.

Note that the timestamp and changeset of the last way are later than the timestamp and changeset of the last node. I suspect that the final batch of nodes is not getting saved to the world file.

Note: Not all the missing nodes had IDs past the highest in the file. Some were earlier IDs that were simply missing. I am going to assume that these nodes were re-allocated recently and, like the other recent nodes, did not get written when the planet file was created.

-Mark Granger

In the OSM data the nodes seem to be available:
http://www.openstreetmap.org/browse/node/431377449

So, I guess your planet file is corrupt.

chris

This is expected behaviour from the weekly planet dump. There is no guarantee of integrity for any data added to the OSM db after planet file generation has started.

It takes a few hours to generate the planet file, and after the nodes have been dumped to the file, someone can still add a new way with its associated new nodes. Those additions then show up only as a way later in the file, without their nodes.
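A toy model of that race, for illustration only. This is not the actual planet dump code; the function name and data are made up, and the only assumption taken from the description above is that nodes are written before ways while the database keeps accepting edits.

```python
def simulate_dump():
    """Return the ways in a simulated dump that reference undumped nodes."""
    # Hypothetical toy database: node IDs and ways (way ID -> node refs).
    nodes = {1, 2}
    ways = {10: [1, 2]}

    # Phase 1: the dump writes out all nodes that exist right now.
    dumped_nodes = set(nodes)

    # Meanwhile a mapper uploads a new way together with its new nodes.
    nodes.update({3, 4})
    ways[11] = [3, 4]

    # Phase 2: the dump writes out all ways, including the new one.
    dumped_ways = dict(ways)

    # Any dumped way whose nodes were added after phase 1 now dangles.
    dangling = {}
    for way_id, refs in dumped_ways.items():
        missing = [ref for ref in refs if ref not in dumped_nodes]
        if missing:
            dangling[way_id] = missing
    return dangling


print(simulate_dump())  # {11: [3, 4]}
```

The database itself is consistent the whole time; only the file, which captures nodes and ways at different moments, ends up with dangling references.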

The only way to get a good file with referential integrity is to take the weekly planet dump, and add the next day’s daily diff file to that:

  1. Fetch the weekly planet (e.g. planet-090708.osm.bz2) and next day’s diff (e.g. 20090708-20090709.osc.gz)

  2. bzcat planet-090708.osm.bz2 | osmosis --rxc 20090708-20090709.osc.gz --rx - --ac --wx planet-090709.osm.gz

I wish to propose a change to the way the weekly planet dumps are produced.

  1. Create the weekly planet dump file as usual but do not release it.
  2. The next day, apply the diff file as you describe to the dump file.
  3. Release the merged planet dump file.

I think that it would be worth waiting an extra day to download a non-corrupt world file.

-Mark Granger

It’s just a dump, and for many purposes it is already quite useful as-is.

That’s not to say what you propose couldn’t be done, either additionally or instead of the current dump. I suggest you send your proposal to the dev mailing list, where the admins are more likely to see it than on this forum.