Osmium time-filter

I’m scratching my head a bit with osmium and its ability to take an OSM snapshot at a certain time.

I simplified my case as much as possible and I still don’t get it. Here is what I do:

osmium tags-filter france-internal.osh.pbf nw/shop -O -o shop_filtered.osh.pbf
osmium time-filter -o old_shop_filtered.osm.pbf shop_filtered.osh.pbf 2010-01-01T00:00:00Z
osmium check-refs --show-ids old_shop_filtered.osm.pbf -v

And I get a list of ids and the summary:

There are 43648 nodes, 3294 ways, and 0 relations in this file.
Nodes in ways missing: 1030

Why are 1030 nodes referenced by ways missing? I thought this history file had everything.
I’m using the Geofabrik history file and have verified that its checksum is correct.

What am I doing wrong?

Have you tried time-slicing the file first and then filtering for shop tags? tags-filter only works correctly on a history file if you use --omit-referenced (see Osmium manual pages – osmium-tags-filter (1)).
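A minimal sketch of that order (snapshot first, then filter), written as a POSIX shell function. The function name shops_at and the intermediate file snapshot.osm.pbf are hypothetical; the osmium subcommands are the ones already used in this thread. Because the output of time-filter is an ordinary, non-history file, tags-filter can keep the nodes referenced by matching ways (its default behaviour) without the history-file caveat.

```shell
#!/bin/sh
# Sketch: take a snapshot of a history file, then filter the snapshot for shops.
# shops_at and snapshot.osm.pbf are hypothetical names chosen for illustration.
shops_at() {
    history_file=$1   # e.g. france-internal.osh.pbf
    timestamp=$2      # e.g. 2010-01-01T00:00:00Z
    out=$3            # e.g. old_shop_filtered.osm.pbf

    # 1. Snapshot: the data as it was at $timestamp (a normal, non-history file).
    osmium time-filter -O -o snapshot.osm.pbf "$history_file" "$timestamp" || return 1

    # 2. Filter the snapshot; nodes referenced by matching ways are kept by default.
    osmium tags-filter -O -o "$out" snapshot.osm.pbf nw/shop || return 1

    # 3. Report any references that are still missing.
    osmium check-refs --show-ids "$out"
}
```

Usage would be something like `shops_at france-internal.osh.pbf 2010-01-01T00:00:00Z old_shop_filtered.osm.pbf`.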

Thanks for the leads.

Somehow, if I time-filter first, there are even more nodes missing from ways:

osmium time-filter -o time_filtered.osm.pbf france-internal.osh.pbf 2010-01-01T00:00:00Z
osmium check-refs --show-ids time_filtered.osm.pbf -v

which outputs:

There are 27932872 nodes, 1586246 ways, and 32487 relations in this file.
Nodes in ways missing: 52578

Maybe I’m confused about how osmium works? Does it really show how the map looked at that specific point in time? It shouldn’t matter that some nodes were deleted later, right?

I tried the same thing as you did, with two differences (the date and the country):

$ osmium time-filter -o time_filtered.osm.pbf /data/openstreetmap/netherlands-internal.osh.pbf 2020-01-01T00:00:00Z
[======================================================================] 100% 
$ osmium check-refs --show-ids time_filtered.osm.pbf -v
[ 0:00] Started osmium check-refs
[ 0:00]   osmium version 1.13.2
[ 0:00]   libosmium version 2.20.0
[ 0:00] Command line options and default settings:
[ 0:00]   input options:
[ 0:00]     file name: time_filtered.osm.pbf
[ 0:00]     file format: 
[ 0:00]   other options:
[ 0:00]     show ids: yes
[ 0:00]     check relations: no
[ 0:00] Reading nodes...                                                      
[ 0:02] Reading ways...                                                       
[ 0:04] Reading relations...                                                  
[======================================================================] 100% 
There are 109650401 nodes, 14773639 ways, and 154900 relations in this file.
Nodes in ways missing: 0
[ 0:04] Memory used for indexes: 848 MBytes
[ 0:04] Peak memory used: 1377 MBytes
[ 0:04] Done.

If I change the date to 2010-01-01T00:00:00Z, I also get missing nodes:

Nodes in ways missing: 5784

After some more experiments, I see that the problem goes away in 2012.
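An experiment like this can be scripted. The sketch below takes a snapshot on January 1st of each year and prints the missing-node count. It is only a sketch: scan_years and snapshot.osm.pbf are hypothetical names, and it assumes check-refs prints its summary in the `Nodes in ways missing: N` form shown in the output above. Both stdout and stderr are captured, since the exact stream is not something this thread establishes.

```shell
#!/bin/sh
# Sketch: count missing node references in yearly snapshots of a history file.
# scan_years and snapshot.osm.pbf are hypothetical names.
scan_years() {
    history_file=$1; first=$2; last=$3
    year=$first
    while [ "$year" -le "$last" ]; do
        ts="${year}-01-01T00:00:00Z"
        osmium time-filter -O -o snapshot.osm.pbf "$history_file" "$ts" || return 1
        # Merge stdout and stderr, then keep only the number from the summary line.
        count=$(osmium check-refs snapshot.osm.pbf 2>&1 \
            | sed -n 's/^Nodes *in ways *missing: *//p')
        echo "$ts ${count:-unknown}"
        year=$((year + 1))
    done
}
```

For example, `scan_years netherlands-internal.osh.pbf 2008 2014` would print one line per year.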

The problem might be in france-internal.osh.pbf itself. If that file was created with a node missing, then that node will also be missing from every file generated from it.
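One way to test that idea is to pull the history of the reported node IDs out of the original file and see which versions, if any, exist. A sketch, assuming that --show-ids prints one line per missing reference starting with the node ID (such as `n123 in w456`) and that osmium getid reads IDs from a file given with -i; missing_node_history and missing-ids.txt are hypothetical names. An ID that yields no objects at all would support the idea that the node is missing from the history file itself.

```shell
#!/bin/sh
# Sketch: look up, in the history file, the nodes that check-refs reports missing.
# Assumes --show-ids lines start with the node ID, e.g. "n123 in w456".
missing_node_history() {
    snapshot=$1; history_file=$2; out=$3
    # Keep only the leading node ID of each line and drop duplicates
    # (the same node can be missing from several ways).
    osmium check-refs --show-ids "$snapshot" 2>/dev/null \
        | sed -n 's/^\(n[0-9][0-9]*\) .*/\1/p' \
        | sort -u > missing-ids.txt
    # Copy the matching objects (all versions present) out of the history file.
    osmium getid -O -i missing-ids.txt -o "$out" "$history_file"
}
```

The result can then be inspected as text, for example with `osmium cat -f opl found.osm.pbf`.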

Generally, working with history files is messy, and you’ll encounter all sorts of strange corner cases. Most of the time you can probably ignore them, because it will just be a few nodes that moved from inside to outside the area of interest, or something like that.

Of course, there might also be a bug somewhere: not many people use history files, so the software has not seen a lot of testing.

Amazing

Thanks @emvee! Yeah, from 2013 onwards I only have a very small number of nodes missing from ways, and they disappear when I filter for shops, so that solves my problem! :smiley:

@Jochen_Topf Thanks for your tips as well! It’s good to know for the future. I believe you are the main maintainer of osmium, so thanks a lot for your work; it is super useful :+1:

Also, maybe I’m stupid, but after hours of looking through the docs on docs.osmcode.org for how to use ‘add-locations-to-ways’ and get some of that data into a regular text file, I finally found the answer on your blog :sweat_smile:

osmium cat germany-with-loc.osm.pbf \
       -o germany-with-loc.osm.bz2 -f osm,locations_on_ways
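Starting from a plain extract, the two steps might be wrapped as follows. This is a sketch: export_ways_with_locations is a hypothetical helper and the file names are placeholders. As in the command above, the locations_on_ways format option on the final osmium cat is what keeps the node coordinates on the ways in the XML output; here the result is written as uncompressed .osm.

```shell
#!/bin/sh
# Sketch: store node locations in the ways, then write them out as OSM XML text.
# export_ways_with_locations and the derived file names are hypothetical.
export_ways_with_locations() {
    in=$1                    # e.g. germany.osm.pbf
    base=${in%.osm.pbf}      # e.g. germany
    # Step 1: add node locations to the ways (PBF output).
    osmium add-locations-to-ways -O -o "$base-with-loc.osm.pbf" "$in" || return 1
    # Step 2: convert to XML; locations_on_ways asks the writer to keep
    # the coordinates on each way node reference.
    osmium cat -O -o "$base-with-loc.osm" -f osm,locations_on_ways "$base-with-loc.osm.pbf"
}
```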

I think it would be super useful to add that line, or something similar, with a short explanation to Osmium manual pages – osmium-cat (1) or Osmium manual pages – osmium-add-locations-to-ways (1), because I never thought of adding ‘locations_on_ways’ to the output format.

At least for the Netherlands history file, the problem is still present at 2012-07-18T12:00:00Z, and one hour later, at 2012-07-18T13:00:00Z, there is no problem anymore.
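Narrowing the window down like this is a bisection over time. A sketch of automating it, assuming POSIX sh plus GNU date (for `date -u -d @SECONDS`); iso, has_missing, bisect_time, HISTORY_FILE and snapshot.osm.pbf are hypothetical names, and the summary parsing assumes the `Nodes in ways missing: N` format shown earlier in the thread.

```shell
#!/bin/sh
# Sketch: bisect between a time with missing refs and a time without them.
# Assumes GNU date; HISTORY_FILE is a hypothetical placeholder.
HISTORY_FILE=${HISTORY_FILE:-netherlands-internal.osh.pbf}

# Epoch seconds -> ISO 8601 UTC timestamp.
iso() { date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ; }

# Exit 0 if the snapshot at ISO time $1 has missing nodes in ways,
# 1 if it has none, 2 on error.
has_missing() {
    osmium time-filter -O -o snapshot.osm.pbf "$HISTORY_FILE" "$1" || return 2
    n=$(osmium check-refs snapshot.osm.pbf 2>&1 | sed -n 's/^Nodes *in ways *missing: *//p')
    [ "${n:-0}" -gt 0 ]
}

# $1: epoch seconds known to have missing refs ("bad")
# $2: epoch seconds known to be clean ("good")
# $3: stop when the window is at most this many seconds wide
bisect_time() {
    bad=$1; good=$2; step=$3
    while [ $((good - bad)) -gt "$step" ]; do
        mid=$(( (bad + good) / 2 ))
        has_missing "$(iso "$mid")"; rc=$?
        case $rc in
            0) bad=$mid ;;
            1) good=$mid ;;
            *) return 1 ;;
        esac
    done
    echo "last bad: $(iso "$bad")  first good: $(iso "$good")"
}
```

For example: `bisect_time "$(date -u -d 2012-07-18T12:00:00Z +%s)" "$(date -u -d 2012-07-18T13:00:00Z +%s)" 60`. Bisection assumes the count drops to zero once and stays there, which is what the experiments above suggest.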

Querying the changesets for this time range shows that the majority of the changes were made by the OSMF Redaction Account; see, for example, changesets 12282421 and 12282504.

So that more or less proves that the problem is in the data itself.
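To see how a way can point at a node that does not exist in a snapshot, here is a toy model of the selection rule in shell and awk. It is a conceptual sketch, not osmium’s implementation: it assumes a snapshot keeps, for each object, the newest version with a timestamp at or before the requested time (dropping objects whose current version is deleted), and it assumes redacted versions are simply absent from the history file. The data, line format and function names are made up for illustration, loosely mirroring a redaction edit at 2012-07-18T12:30:00Z.

```shell
#!/bin/sh
# Conceptual toy model only, not how osmium is implemented.
# Hypothetical line format: TYPE ID VERSION TIMESTAMP VISIBLE [NODE_REFS]
toy_history() {
    cat <<'EOF'
n 1 1 2009-05-01T00:00:00Z true
w 10 1 2009-05-02T00:00:00Z true 1,2
w 10 2 2012-07-18T12:30:00Z true 1
EOF
}
# Node 2 has no version at all: it stands in for a redacted node.

# Print how many node references in ways point to nodes that are not
# visible in the snapshot at ISO 8601 time $1.
missing_at() {
    toy_history | awk -v t="$1" '
        # Per object, keep the newest version whose timestamp is <= t.
        # Fixed-width UTC timestamps like these compare correctly as strings.
        $4 <= t {
            key = $1 " " $2
            if (!(key in ver) || $3 + 0 > ver[key] + 0) {
                ver[key] = $3; vis[key] = $5; refs[key] = $6
            }
        }
        END {
            missing = 0
            for (key in ver) {
                # Only visible ways contribute node references.
                if (substr(key, 1, 1) != "w" || vis[key] != "true") continue
                n = split(refs[key], r, ",")
                for (i = 1; i <= n; i++) {
                    nk = "n " r[i]
                    if (!(nk in ver) || vis[nk] != "true") missing++
                }
            }
            print missing
        }'
}

missing_at 2010-01-01T00:00:00Z   # way 10 is still version 1 (refs 1,2): prints 1
missing_at 2012-07-18T13:00:00Z   # version 2 (refs 1 only) is current: prints 0
```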

If you start with the pre-redaction history file https://planet.openstreetmap.org/cc-by-sa/full-experimental/full-planet-120401-final.osm.bz2 you get no missing nodes.

As a general rule, if you want to look at data from before April 2012, you’ll need this older file.
