Get an OSC file between two dates with osmupdate

Hello everyone,

I’m currently using osmupdate to keep regional PBF files up to date, probably like many here.

I was wondering if this software is able to produce an OSC file containing changes between two dates?
I’m currently able to produce an OSC file containing changes between a past date and current time but not between two past dates.

Is it actually possible or should I make a feature request somewhere?

Have a look at:

Hello

Unfortunately, the derive-changes command expects two OSM data files, which I don’t have.

osmupdate handles downloading and processing diffs on its own, which is more convenient than processing two whole datasets. That’s why I would rather process the diffs between those two dates and leave unchanged data out.

osmconvert should be able to do this, though you will need to do the download part manually and then merge the files.

Is it that you want to merge two change files?

If so, you can do that using osmium merge-changes.

I did use osmupdate in the past but switched to pyosmium-up-to-date, which uses multiple CPU cores. For me it is 4 times faster.

Check out the getdiff program. Used together with the extract2planet.sh script, it will do what you want quite easily.

Let me elaborate: the getdiff program downloads change files by sequence number, not by date. It has a range function that lets you download all change files between two sequence numbers.

The script I mentioned uses a function that merges a large number of change files by calling osmium. You might need to modify this script a little.

Thank you guys for useful pieces of the puzzle

That’s clearly a big advantage in favour of osmium

That’s why I liked osmupdate: we could give it a date and it dealt with sequence numbers on its own.

How can I get the sequence numbers, and be sure that a change in the sequence won’t break my process?

I have direct access to the file system and the osmium binary, so merging won’t be a challenge once the diff files are in a local directory.

I am wondering why you are working with sequence number and dates at all.

Isn’t your goal just to keep your regional pbf file up-to-date?

As an experiment I downloaded flevoland-latest.osm.pbf from Geofabrik and osmium fileinfo gives:

  Options:
    generator=osmium/1.14.0
    osmosis_replication_base_url=https://download.geofabrik.de/europe/netherlands/flevoland-updates
    osmosis_replication_sequence_number=2482
    osmosis_replication_timestamp=2025-09-12T20:21:01Z

Then running pyosmium-up-to-date:

$ pyosmium-up-to-date -v -v -v  --outfile flevoland-new.pbf flevoland-latest.osm.pbf
2025-09-13 20:15:51 DEBUG: Replication information found in OSM file header.
2025-09-13 20:15:51 DEBUG: Replication URL: https://download.geofabrik.de/europe/netherlands/flevoland-updates
2025-09-13 20:15:51 DEBUG: Replication sequence: 2482
2025-09-13 20:15:51 DEBUG: Replication timestamp: 2025-09-12T20:21:01Z
2025-09-13 20:15:51 INFO: Using replication service at https://download.geofabrik.de/europe/netherlands/flevoland-updates
2025-09-13 20:15:51 DEBUG: Starting new HTTPS connection (1): download.geofabrik.de:443
2025-09-13 20:15:52 DEBUG: https://download.geofabrik.de:443 "GET /europe/netherlands/flevoland-updates/state.txt HTTP/1.1" 200 123
2025-09-13 20:15:52 DEBUG: Server is at sequence 2482 (2025-09-12 20:21:01+00:00).
2025-09-13 20:15:52 INFO: File is already up to date.

So pyosmium-up-to-date knows exactly how to update without providing any date or sequence number.

No, my goal is to accumulate a log of changes between two dates for the ProjetDuMois platform.
It’s the only way to properly count contributions and users involved.

Usually changes are logged daily, but problems, migrations or even tagging changes may happen and require rebuilding the complete log, even for past projects, so both dates may lie in the past.

Okay, good to know your goal.

I think OSC files are a poor source for accumulating a log of changes. An OSC file does not tell you what changed, only the new state:

  • To see what changed you also need to know the old state
  • You miss changes if an object was updated several times during the period the OSC file covers.

Instead I would go for a history file:

  • With a history file you can create your own extract for any date/time you specify.
  • Doing that for two dates/times, you can create the diff between them.
  • But even better, you can filter the history file and get the statistics directly from that.
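
To make the first two points concrete, here is a minimal sketch in plain Python. It assumes the history data has already been read into tuples of (type, id, version, timestamp, visible); that tuple layout and the function names are made up for illustration, not taken from any tool.

```python
from datetime import datetime, timezone

def state_at(versions, when):
    """Return {(type, id): version} for the latest visible version at `when`."""
    latest = {}
    for otype, oid, version, ts, visible in versions:
        if ts <= when:
            key = (otype, oid)
            # Keep only the highest version number seen up to `when`
            if key not in latest or version > latest[key][0]:
                latest[key] = (version, visible)
    # Objects whose latest version is a deletion are not part of the state
    return {key: ver for key, (ver, visible) in latest.items() if visible}

def diff_between(versions, start, end):
    """Compare the state at `start` with the state at `end`."""
    versions = list(versions)  # iterated twice below
    before = state_at(versions, start)
    after = state_at(versions, end)
    created = sorted(k for k in after if k not in before)
    deleted = sorted(k for k in before if k not in after)
    modified = sorted(k for k in after if k in before and after[k] != before[k])
    return created, modified, deleted
```

Note that this only compares two snapshots, so intermediate versions inside the window are not counted; for contribution statistics you would count the versions themselves.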

A history file has all the versions of the nodes, ways and relations. It is ordered like a normal .pbf file: first one node with all its versions, then the remaining nodes with all their versions, then all ways, and then all relations.

Here is what I do, where PBF_FN is the name of the history file:

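from subprocess import PIPE, Popen

# Stream the history file as OPL text, one object version per line
with Popen(['osmium', 'show', '--no-pager', '--format-opl', PBF_FN], stdout=PIPE, encoding='utf-8') as proc:
    for line in proc.stdout:
        # Parse the OPL line, see https://osmcode.org/opl-file-format/
        fields = line.rstrip('\n').split(' ')

        # Further process the data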

It will take some programming, but that way you can catch every change very flexibly.
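
For the parsing step, a rough sketch could look like the following. It only picks out the metadata fields (version, visibility, changeset, timestamp, uid, user) and ignores tags, coordinates, way nodes and relation members. The function name is hypothetical, and string values are left in their %-escaped OPL form.

```python
from datetime import datetime

def parse_opl_line(line):
    """Split one OPL line into a dict of its common metadata attributes."""
    fields = line.rstrip("\n").split(" ")
    head = fields[0]  # e.g. "n123", "w45" or "r6"
    obj = {"type": head[0], "id": int(head[1:])}
    for field in fields[1:]:
        if not field:
            continue
        code, value = field[0], field[1:]
        if code == "v" and value:
            obj["version"] = int(value)
        elif code == "d":
            obj["visible"] = (value == "V")  # "dD" marks a deleted version
        elif code == "c" and value:
            obj["changeset"] = int(value)
        elif code == "t" and value:
            obj["timestamp"] = datetime.fromisoformat(value.replace("Z", "+00:00"))
        elif code == "i" and value:
            obj["uid"] = int(value)
        elif code == "u":
            obj["user"] = value  # still %-escaped as in the OPL file
    return obj
```

Splitting on spaces is safe here because OPL escapes spaces inside values.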

If you end up doing object-by-object processing in Python, you might as well use pyosmium directly. The Popen loop above then becomes:

import osmium

for obj in osmium.FileProcessor(PBF_FN):
    # Further process the data
    ...

See the pyosmium manual for more information. pyosmium has a concept of prefiltering objects of interest to speed up processing. There is no time filter right now but I can see how that would be interesting for processing history data.
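
Until a built-in time filter exists, the filtering can be done by hand inside the loop. The sketch below assumes each object exposes a timezone-aware `timestamp` and a `user` attribute, as pyosmium's OSM objects do; the two function names are made up for this example.

```python
from collections import Counter

def versions_in_window(objects, start, end):
    """Yield the object versions whose timestamp falls in [start, end)."""
    for obj in objects:
        if start <= obj.timestamp < end:
            yield obj

def contributions_by_user(objects, start, end):
    """Count how many object versions each user created in the window."""
    counts = Counter()
    for obj in versions_in_window(objects, start, end):
        counts[obj.user] += 1
    return counts
```

With a history file this would be called as `contributions_by_user(osmium.FileProcessor(PBF_FN), start, end)`, with `start` and `end` given as UTC datetimes.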

That’s an interesting point: I’m only considering the official (daily) diffs, which must contain all versions so that distributed OSH/OSM copies can be kept in sync with the main database.

I agree that other OSC files may be optimized to expose only the last state in their time span, and we should not use them.

ProjetDuMois is currently country-focused and uses a national history file, from which we derive changes. I intend to get rid of this because it prevents us from managing projects at world scale: maintaining a worldwide OSH is too heavy, and 99% of its data is useless to us.

Regarding the original state, it’s still possible to init the changes log from a planet file at the appropriate date and then feed it with changes filtered appropriately.

So I’m currently investigating filtering the daily diffs to keep only the changes that cover the projects defined in the platform, which saves the hassle of keeping the OSH in sync.

Unless I get stuck retrieving daily diffs for a specific time span :slight_smile:

Once a change file is calculated, it is assigned a sequence number and a timestamp. Both values are written to a corresponding “state.txt” file, and you can retrieve them from that file. Those values never change. The chance of one value being compromised is the same for both.
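
As an illustration, reading both values from the text of a state.txt file could look like this. It assumes the usual `key=value` layout with `sequenceNumber` and `timestamp` keys and backslash-escaped colons in the timestamp; the function name is hypothetical.

```python
from datetime import datetime

def parse_state(text):
    """Return (sequence_number, timestamp) from the content of a state.txt file."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not key=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key] = value.replace("\\", "")  # "20\:21\:01Z" -> "20:21:01Z"
    sequence = int(values["sequenceNumber"])
    timestamp = datetime.fromisoformat(values["timestamp"].replace("Z", "+00:00"))
    return sequence, timestamp
```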

The getdiff program downloads ALL change files and their corresponding state.txt files between the specified sequence numbers, and writes a sorted list of the downloaded files to “rangeList.txt”. The program does not merge change files. If your process requires a different range of change files, then you need to run getdiff again with different sequence numbers.

Osmium can take a list of change files in its merge function; but if you have a large number of files to merge, then it is better to break down that list. I mentioned the script because it uses a bash function called “mergeListOSC()” that does that for you using the produced sorted list from getdiff program in “rangeList.txt”.
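
The same batching idea can be sketched in Python. This is not the mergeListOSC() function from the script, just an illustration of splitting a long list into batches and building one `osmium merge-changes` call per batch; the function names, the batch size and the output naming are assumptions.

```python
import subprocess

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def merge_commands(osc_files, out_prefix, batch_size=100):
    """Build one `osmium merge-changes` command per batch of change files."""
    commands = []
    for n, batch in enumerate(chunked(osc_files, batch_size)):
        out = f"{out_prefix}-{n:04d}.osc.gz"
        commands.append(["osmium", "merge-changes", "--output", out, *batch])
    return commands

def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The per-batch outputs can then be merged once more in a final pass. For a contribution log, leave out the `--simplify` option so that intermediate versions are kept.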

Indeed, but I can’t figure out how to find the sequence numbers.

If I assume I need daily diffs between 2024-09-01 and 2025-02-01, which sequence numbers should I use and how am I supposed to find them on the fly?

The getdiff program requires change file sequence numbers; if you cannot provide them, it will not help you. Both the sequence number and the timestamp (date) are in the state.txt file that corresponds to each change file. Look them up there, and getdiff can then download a range of change files for you. I am not trying to convince you to use getdiff.

With pyosmium, you can easily convert sequence to and from timestamps.
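
For setups without pyosmium, the lookup itself is simple enough to sketch. The code below assumes the common replication layout where sequence 4567 lives under 000/004/567 (with `.state.txt` and `.osc.gz` suffixes), and that timestamps increase with the sequence number. `fetch_timestamp` is a hypothetical callable you would implement yourself, for example by downloading and parsing the matching state.txt.

```python
def sequence_path(seq):
    """Map a sequence number to its AAA/BBB/CCC path on a replication server."""
    digits = f"{seq:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"

def last_sequence_at_or_before(target, low, high, fetch_timestamp):
    """Binary search for the highest sequence in [low, high] whose state
    timestamp is <= target. Returns None if every timestamp is later."""
    result = None
    while low <= high:
        mid = (low + high) // 2
        if fetch_timestamp(mid) <= target:
            result = mid      # candidate found, look for a later one
            low = mid + 1
        else:
            high = mid - 1
    return result
```

Calling it once per date gives the two sequence numbers that bound the range of diffs to download. The same logic ports easily to nodejs or bash.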

Thank you guys for supplementary answers

So the best solution currently is to use pyosmium and possibly a script like getdiff to get necessary diffs between two dates.

I’m convinced it works, but I currently don’t have Python in the ProjetDuMois project, only nodejs + bash.
So I would be even happier to have it in osmconvert (osmium has already stated that it won’t implement downloading diffs, for relevant reasons).

Adding an end_id to pyosmium’s collect_diffs() function (or if you need downloading diffs in a separate step, to pyosmium-get-changes) would certainly be an option.

Indeed, whatever the software used, defining options to independently download diffs until today or between two dates in the past should be standard.

This discussion makes me realize I barely know the maintenance status of such key software projects.
Osmium looks actively maintained, while osm-c-tools doesn’t get many updates. That could mean it’s perfectly stable or, on the contrary, that it lacks some involvement (which isn’t mandatory, as it’s provided as is, without any warranty).

That’s now available on the master branch of pyosmium: Add support for end dates/IDs to diff processing functions by lonvia · Pull Request #304 · osmcode/pyosmium · GitHub

Please do give it a try and report back issues you encounter.

Also, please be careful when working with dates. They are always only an approximation towards the end of the next diff file. If you need strict continuity between the diff files you create with pyosmium-get-changes, then you can use --end-date to give you an approximate end of downloads. But then you must save the actual ID of the last diff and use that as the start for the next call of pyosmium-get-changes. You can use the --sequence-file option, which magically does the right thing for you here.
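
If you do the bookkeeping yourself instead, the idea is simply to persist the last ID and resume one past it. The sketch below uses a hypothetical plain-text file holding a single integer; it is not meant to describe the file that --sequence-file reads or writes.

```python
from pathlib import Path

def next_start_id(seq_file, default_start):
    """Return the ID to resume from: one past the last saved ID,
    or `default_start` if nothing has been saved yet."""
    path = Path(seq_file)
    if path.exists():
        return int(path.read_text().strip()) + 1
    return default_start

def record_last_id(seq_file, last_id):
    """Remember the ID of the last diff that was actually downloaded."""
    Path(seq_file).write_text(f"{last_id}\n")
```

Resuming from the stored ID rather than from a date is what avoids gaps or overlaps between consecutive runs.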
