PBF file with history (internal.osh.pbf) to PostGIS

Currently I load PBF files from Geofabrik into PostGIS using osm2pgsql, which gives me the current OSM data. I now want to browse historic information in PostGIS as well. Geofabrik provides this information in internal.osh.pbf files, and that is what I want to load. Osm2pgsql is not suitable for this, so it seems I have to use osmium. I know I can also use osmium to retrieve information from the file directly, but I really want the data in a PostGIS DB. Any suggestions (a script?) on how I should proceed?
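For reference, that current (non-history) load is essentially a one-liner along these lines (the file and database names are just placeholders):

```bash
# Regular osm2pgsql load of a Geofabrik extract (no history).
# "gis" and region-latest.osm.pbf are placeholder names.
osm2pgsql --create -d gis region-latest.osm.pbf
```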

[edit: changed osmosis to osmium and typo]


If only the historical metadata is important and geometry is not needed at all, then it might be worth looking at this tool:

As far as I can see, it has a --with-history option.
I haven’t tested it yet, but it seems interesting!

Thanks @ImreSamu and @SK53 for your reactions. My main goal is to find objects that were deleted in, let’s say, the past 5 years. For each one I want to know the latest version before the deletion and the deleting version after it, so that I know where the object was (its last geometry), its tags and metadata (such as the last user before deletion), when it was deleted, and by whom.

It seems like @SK53’s approach is suitable for my use case. If I can get this done just for highways that would be great, and if nodes can also be done, even better. Could you provide an example (a script?) of how this can be done?

Perhaps the importer at https://github.com/MaZderMind/osm-history-renderer/tree/master/importer is what you are looking for. The catch is that the repository was archived about 4 years ago, so it may not be easy to build anymore.

In a similar situation I would try joto’s fork, https://github.com/joto/osm-history-renderer/tree/new-osmium (branch new-osmium), but I haven’t tried it myself yet to see how well it works.
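Fetching it at that branch is straightforward, should anyone want to experiment (the build steps themselves are in the importer’s README, which I haven’t verified):

```bash
# Clone joto's fork directly at the new-osmium branch.
git clone --branch new-osmium https://github.com/joto/osm-history-renderer.git
cd osm-history-renderer/importer
# From here, follow the build instructions in the repository's README.
```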

There are some hints in the “Extracting OSM History” section of this blog post.

I used OPL and did a bit of light preprocessing (in awk, though any tool or language one can quickly be productive in will do), mainly to ensure all rows have a start and end date. I then uploaded the result to what I call ‘image’ tables, i.e. with no additional processing of data types, so most columns get loaded as strings unless one can absolutely guarantee no odd values.
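A minimal sketch of that stage, with placeholder names throughout (history.osh.pbf, a database called osm, a staging table hist_raw); note that in this sketch the start/end-date pairing is deferred to the SQL step, and way node lists and relation members are skipped entirely:

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. Convert the history PBF to OPL: one object version per line.
osmium cat history.osh.pbf -f opl > history.opl

# 2. Light preprocessing: pull the OPL fields into tab-separated
#    columns. OPL %-escapes spaces inside values, so splitting on
#    whitespace is safe. Way node lists (N...) and relation
#    members (M...) are ignored here.
awk 'BEGIN { OFS = "\t" }
{
  otype = substr($1, 1, 1)              # n / w / r
  oid   = substr($1, 2)
  v = dv = cs = ts = uid = usr = tags = x = y = ""
  for (i = 2; i <= NF; i++) {
    k = substr($i, 1, 1); val = substr($i, 2)
    if      (k == "v") v    = val       # version
    else if (k == "d") dv   = val       # V = visible, D = deleted
    else if (k == "c") cs   = val       # changeset
    else if (k == "t") ts   = val       # timestamp
    else if (k == "i") uid  = val       # user id
    else if (k == "u") usr  = val       # user name (still %-escaped)
    else if (k == "T") tags = val       # key=val,key=val,...
    else if (k == "x") x    = val       # node longitude
    else if (k == "y") y    = val       # node latitude
  }
  print otype, oid, v, dv, cs, ts, uid, usr, tags, x, y
}' history.opl > history.tsv

# 3. Load into an "image" table: everything as text, typing comes later.
psql -d osm <<'SQL'
CREATE TABLE IF NOT EXISTS hist_raw (
  otype text, oid text, version text, visible text, changeset text,
  ts text, uid text, username text, tags text, lon text, lat text
);
\copy hist_raw FROM 'history.tsv'
SQL
```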

From there, post-processing uses PostgreSQL window functions and turns the tags into hstore. The most complex processing is for the geometry histories, because you need straightforward, but easy-to-get-wrong, joins between node and way dates, and then you have to reassemble the way geometries. For this latter step it’s a good idea to give the PostgreSQL process a lot of memory!
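And to sketch the window-function step against that hypothetical hist_raw table, aimed at the deleted-objects question above (the tag unescaping, the hstore conversion and the way-geometry reassembly are all left out):

```bash
psql -d osm <<'SQL'
-- For each object, look ahead to the next version. A row whose next
-- version is a delete (dD) is the last version before deletion: it
-- carries the final tags/coordinates, while the deleting row carries
-- the deletion timestamp and user.
WITH v AS (
  SELECT otype, oid, version::int AS version, visible,
         ts::timestamptz       AS valid_from,
         username, tags, lon, lat,
         LEAD(visible)         OVER w AS next_visible,
         LEAD(ts::timestamptz) OVER w AS deleted_at,
         LEAD(username)        OVER w AS deleted_by
  FROM hist_raw
  WINDOW w AS (PARTITION BY otype, oid ORDER BY version::int)
)
SELECT otype, oid, version, valid_from, username AS last_editor,
       tags, lon, lat, deleted_at, deleted_by
FROM v
WHERE visible = 'V'
  AND next_visible = 'D'
  -- for highways only, a rough extra filter: AND tags LIKE '%highway=%'
  AND deleted_at >= now() - interval '5 years';
SQL
```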
