PBF format - PrimitiveBlock independence question

Hello mappers!

I’m playing with a PBF parser and was quite surprised by what, as far as I can tell, is my misunderstanding of the following fragment from the PBF wiki page:

“Each PrimitiveBlock is independently decompressable, containing all of the information to decompress the entities it contains.”
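For reference, the PBF format description gives each PrimitiveBlock its own string table and its own coordinate parameters (granularity, lat_offset, lon_offset). DenseNodes ids and coordinates are delta-coded relative to the previous node in the same group. "Independently decompressable" is therefore about decoding, not about every referenced node being present in the block. Below is a minimal Python sketch, assuming the protobuf fields have already been parsed into plain integer lists (ZigZag already undone). `decode_dense_nodes` and its argument names are hypothetical.

```python
def decode_dense_nodes(string_table, ids, lats, lons, keys_vals=(),
                       granularity=100, lat_offset=0, lon_offset=0):
    """Decode one DenseNodes group using only data from its own PrimitiveBlock.

    ids/lats/lons are the delta-coded sequences; keys_vals holds string-table
    indexes as key,value pairs, with 0 terminating each node's tag list.
    Returns a list of (id, lat_deg, lon_deg, tags) tuples.
    """
    nodes = []
    node_id = lat = lon = 0
    kv = iter(keys_vals)
    for d_id, d_lat, d_lon in zip(ids, lats, lons):
        # Undo the delta coding: each value is relative to the previous node.
        node_id += d_id
        lat += d_lat
        lon += d_lon
        tags = {}
        if keys_vals:  # an empty keys_vals means no node in the group has tags
            for k in kv:
                if k == 0:  # end of this node's tags
                    break
                tags[string_table[k]] = string_table[next(kv)]
        # Coordinates use the block-level granularity and offsets (nanodegrees).
        nodes.append((node_id,
                      1e-9 * (lat_offset + granularity * lat),
                      1e-9 * (lon_offset + granularity * lon),
                      tags))
    return nodes
```

Nothing outside the block (no other block's string table or offsets) is needed to run this, which is the guarantee the wiki sentence gives.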

As I understood it, this means that for a given set of, let’s say, ways in a primitive block, I can expect all referenced nodes to be in the same primitive block as well. That would allow a nice and efficient decomposition of the map data into separate but still complete and functional blocks of bounded size, which in turn bounds the memory requirements (that’s important in an embedded environment). However, it’s not like that. It turns out that during reading I first receive PrimitiveBlocks containing only nodes; once the nodes are finished, the ways start to flow, with no nodes at all. There is only a single PrimitiveBlock that contains both ways and nodes.
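The property I was hoping for (referential self-containment) can be tested per block. This is a minimal sketch, assuming node ids and way refs have already been decoded from the block. Way refs are delta-coded in PBF, so a running sum restores the node ids. `decode_refs` and `missing_node_refs` are hypothetical helpers.

```python
from itertools import accumulate


def decode_refs(deltas):
    """Way.refs are delta-coded; a running sum restores absolute node ids."""
    return list(accumulate(deltas))


def missing_node_refs(block_node_ids, block_ways):
    """Node ids that ways in this block reference but the block does not define.

    block_node_ids: ids of all nodes decoded from the block
    block_ways: {way_id: [delta-coded refs]} for all ways in the block
    """
    present = set(block_node_ids)
    missing = set()
    for deltas in block_ways.values():
        missing.update(r for r in decode_refs(deltas) if r not in present)
    return missing
```

An empty result means the block is self-contained. In a file sorted by type, a ways-only block (like #125 to #136 in the log below) is expected to return every node it references.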

Is it a feature or a bug? Just curious.

P.S. I used corse.osm.pbf from Geofabrik as input data.

Below is an execution log for those interested.

Source PBFFIle is: /Downloads/corse.osm.pbf
File position is:0
Length is:13
Finished processing block #1
File position is:142
Length is:13
Trying to handle # nodes: 8000
Trying to handle # ways: 0
Finished processing block #2
File position is:63206
Length is:13
Trying to handle # nodes: 8000
Trying to handle # ways: 0
Finished processing block #3
File position is:111791
Length is:13

(…) INTERESTING PART, SOMEWHERE IN THE MIDDLE OF THE FILE

Finished processing block #123
File position is:5859954
Length is:13
Trying to handle # nodes: 8000
Trying to handle # ways: 0
Finished processing block #124
File position is:5910818
Length is:13
Trying to handle # nodes: 1469
Trying to handle # ways: 6531
Finished processing block #125
File position is:6221011
Length is:13
Trying to handle # nodes: 0
Trying to handle # ways: 8000
Finished processing block #126
File position is:6526596
Length is:13
Trying to handle # nodes: 0
Trying to handle # ways: 8000
Finished processing block #127
File position is:6755364
Length is:13
Trying to handle # nodes: 0
Trying to handle # ways: 8000
Finished processing block #128
File position is:6992315
Length is:13

(…) PROCESSING OF REST OF THE WAYS, NOTHING SPECIAL HAPPENS

Trying to handle # nodes: 0
Trying to handle # ways: 5982
Finished processing block #136
File position is:8828886

Finished.
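The "File position" and "Length is:13" lines appear to match the PBF file framing. Each fileblock starts with a 4-byte big-endian length, followed by a BlobHeader of that length (type plus datasize) and then datasize bytes of Blob. An "OSMData" header whose datasize needs a 3-byte varint is exactly 13 bytes long. Below is a minimal sketch that walks this framing without a protobuf library. `iter_fileblocks` and the hand-rolled varint reader are hypothetical helpers, and indexdata is parsed but ignored.

```python
import struct


def read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(buf):
    """Minimal BlobHeader decoder: type (field 1), indexdata (2), datasize (3)."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint (datasize)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (type, indexdata)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
        fields[field_no] = value
    return fields[1].decode("utf-8"), fields[3]


def iter_fileblocks(f):
    """Yield (offset, header_len, blob_type, datasize), skipping each Blob."""
    while True:
        offset = f.tell()
        prefix = f.read(4)
        if len(prefix) < 4:
            return  # end of file
        (header_len,) = struct.unpack(">I", prefix)
        blob_type, datasize = parse_blob_header(f.read(header_len))
        f.seek(datasize, 1)  # skip the (usually zlib-compressed) Blob
        yield offset, header_len, blob_type, datasize
```

With the first header block being 125 bytes of Blob, the next fileblock would start at 4 + 13 + 125 = 142, which agrees with the second "File position" in the log.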

Hi alurg,

the PBF format is very flexible. Originally, it was intended to store OSM objects (nodes, ways, relations) sorted by geographical region. This would have made it possible to get regional extracts very quickly. But in the end, it did not turn out that way…

That’s right: first ALL nodes, then ALL ways, and finally ALL relations. This is the usual sequence in several other formats too, e.g. .osm, .osc, .osh, .o5m, .o5c.
I think there are two reasons why PBF files adhere to this strict OSM object order:

  • a lot of programs expect OSM objects to be supplied in this order,

  • the PBF file size would increase because you would have to store duplicates of all ways and relations which have nodes in more than one geographical region.
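The conventional order described above is easy to check on a stream of decoded entities. This is a minimal sketch, assuming each entity has already been reduced to its type name. `follows_usual_order` is a hypothetical helper. Files written this way may also declare the "Sort.Type_then_ID" optional feature in their HeaderBlock, but this sketch only looks at the entity types.

```python
# Rank of each OSM object type in the usual nodes -> ways -> relations order.
ORDER = {"node": 0, "way": 1, "relation": 2}


def follows_usual_order(entity_types):
    """True if the stream is all nodes, then all ways, then all relations."""
    last = 0
    for t in entity_types:
        rank = ORDER[t]
        if rank < last:  # a type appeared after a "later" type: order broken
            return False
        last = rank
    return True
```

For example, a mixed block like #124 in the log (1469 nodes, 6531 ways) passes the check as long as its nodes come before its ways.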

Nevertheless, as mentioned above, the PBF format itself is flexible enough, so I think you can use it to store OSM objects in any order you like. Unfortunately, not every program will cope with anything other than the usual object order.
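A non-standard order, such as self-sufficient tiles, can be produced by post-processing. The thread does not say how it was done, so the following is only one possible approach, not necessarily the one used here. It is a sketch that assumes a simple fixed-size lat/lon grid, where `tile_deg`, `tile_of` and `build_tiles` are hypothetical names. It also assumes in-memory dicts with already decoded (absolute) way refs, and that every referenced node is present. Relations are ignored, and writing the tiles back out as PBF blocks is not shown.

```python
import math
from collections import defaultdict


def tile_of(lat, lon, tile_deg):
    """Grid cell of a coordinate on a fixed lat/lon grid (hypothetical scheme)."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))


def build_tiles(nodes, ways, tile_deg=0.1):
    """Group data into self-sufficient tiles.

    nodes: {node_id: (lat, lon)}; ways: {way_id: [node_id, ...]}.
    Returns {tile: {"nodes": {...}, "ways": {...}}}.
    """
    tiles = defaultdict(lambda: {"nodes": {}, "ways": {}})
    # Every node goes into the tile it lies in, so standalone nodes are kept.
    for nid, (lat, lon) in nodes.items():
        tiles[tile_of(lat, lon, tile_deg)]["nodes"][nid] = (lat, lon)
    # A way is copied into every tile it touches, together with ALL its nodes,
    # so each tile can be used without looking at any other tile.
    for wid, refs in ways.items():
        touched = {tile_of(*nodes[r], tile_deg) for r in refs}
        for t in touched:
            tile = tiles[t]
            tile["ways"][wid] = refs
            for r in refs:
                tile["nodes"][r] = nodes[r]
    return dict(tiles)
```

The copying of border-crossing ways and their nodes into several tiles is the file-size cost mentioned in the second bullet above. In exchange, each tile bounds how much has to be in memory at once.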

Marqqs, thanks a lot! Now it’s clear.

I finally ended up with the approach you described: I post-processed the PBF data into self-sufficient tile blocks, and now it suits my needs.

That’s great! How did you do this?