I’m currently using osmupdate to keep regional PBF files up to date, probably like many here.
I was wondering whether this software can produce an OSC file containing the changes between two dates.
I’m currently able to produce an OSC file containing the changes between a past date and the current time, but not between two past dates.
Is it actually possible or should I make a feature request somewhere?
Unfortunately, the derive-changes command expects two OSM data files, which I don’t have.
osmupdate handles downloading and processing the diffs on its own, which is more convenient than processing two whole datasets. That’s why I would rather process only the diffs between those two dates and leave unchanged data out.
Let me elaborate: the getdiff program downloads change files by sequence number, not by date. It has a range function that lets you download all change files between two sequence numbers.
The script I mentioned uses a function that merges a large number of change files by calling osmium. You might need to modify this script a little.
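For readers unfamiliar with how sequence numbers map to files: on the OSM replication servers, the sequence number is zero-padded to nine digits and split into three directory levels. Below is a minimal sketch of that mapping. It assumes the planet.openstreetmap.org daily replication URL as the base (adjust for regional mirrors), and the helper names are illustrative, not part of getdiff.

```python
# Assumed base URL; regional extract providers use a different one.
BASE_URL = "https://planet.openstreetmap.org/replication/day"


def sequence_to_path(seq: int) -> str:
    """Return the relative path 'AAA/BBB/CCC' for a replication sequence number."""
    if not 0 <= seq <= 999_999_999:
        raise ValueError("sequence number out of range")
    padded = f"{seq:09d}"
    return f"{padded[0:3]}/{padded[3:6]}/{padded[6:9]}"


def diff_urls(first: int, last: int, base_url: str = BASE_URL):
    """Yield (change file URL, state file URL) for every sequence in [first, last]."""
    for seq in range(first, last + 1):
        path = sequence_to_path(seq)
        yield f"{base_url}/{path}.osc.gz", f"{base_url}/{path}.state.txt"
```

For example, sequence number 4123 maps to the relative path `000/004/123`.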
No, my goal is to accumulate a log of changes between two dates for the ProjetDuMois platform.
It’s the only way to properly count contributions and users involved.
Usually changes are logged daily, but problems, migrations or even tagging changes may happen and require rebuilding the complete log, even for past projects, so both dates may lie in the past.
I think OSC files are a poor source for accumulating a log of changes. An OSC file does not tell you what changed, only the new state:
To see what changed you also need to know the old state.
You are missing changes if multiple updates were made during the time span the OSC file covers.
Instead I would go for a history file:
With a history file you can create your own extract for every date/time you specify.
Doing that for two date/times, you can create the diff between them.
But even better, you can filter the history file and get the statistics directly from that.
A history file has all the versions of the nodes, ways and relations. It is ordered like a normal .pbf file: the first node with all its versions, then the remaining nodes with all their versions, then all ways and then all relations.
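The two-extract approach described above can be sketched with osmium's `time-filter` and `derive-changes` commands. The sketch only builds the command lines; all file names are placeholders, and the function names are made up for illustration. Pass each command to `subprocess.run(cmd, check=True)` to execute it, assuming the osmium command line tool is installed.

```python
def snapshot_cmd(history_file: str, when: str, out_file: str) -> list[str]:
    """Command extracting the data as it was at `when` (e.g. '2024-01-01T00:00:00Z')."""
    return ["osmium", "time-filter", history_file, when, "-o", out_file]


def diff_cmd(old_file: str, new_file: str, out_file: str) -> list[str]:
    """Command deriving a change file that turns `old_file` into `new_file`."""
    return ["osmium", "derive-changes", old_file, new_file, "-o", out_file]


def build_pipeline(history_file: str, start: str, end: str,
                   out_osc: str = "changes.osc.gz") -> list[list[str]]:
    """Two snapshots followed by one diff; intermediate file names are placeholders."""
    return [
        snapshot_cmd(history_file, start, "start.osm.pbf"),
        snapshot_cmd(history_file, end, "end.osm.pbf"),
        diff_cmd("start.osm.pbf", "end.osm.pbf", out_osc),
    ]
```

Note that a diff derived this way only compares the two snapshots, so intermediate versions made between the two dates do not appear in it.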
Here is what I do (PBF_FN is the name of the history file):
from subprocess import PIPE, Popen

with Popen(['osmium', 'show', '--no-pager', '--format-opl', PBF_FN], stdout=PIPE, encoding='utf-8') as proc:
    for line in proc.stdout:
        # Parse the .opl file, see https://osmcode.org/opl-file-format/
        # Further process the data
        pass
It will take some programming, but that way you can catch every change very flexibly.
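To give an idea of what parsing an OPL line involves, here is a minimal sketch. It only handles the common metadata fields and tags. Coordinates, way nodes and relation members are skipped, and the dictionary keys are my own choice rather than anything defined by the format. Special characters in OPL are escaped as `%<hex codepoint>%`, which the `unescape` helper decodes.

```python
import re

_ESCAPE = re.compile(r"%([0-9a-fA-F]+)%")

# Single-letter OPL field prefixes mapped to illustrative key names.
FIELD_NAMES = {"v": "version", "d": "visible", "c": "changeset",
               "t": "timestamp", "i": "uid", "u": "user", "T": "tags"}


def unescape(text: str) -> str:
    """Decode OPL escapes of the form %<hex codepoint>%."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_opl_line(line: str) -> dict:
    """Parse one OPL line into a dict (metadata and tags only)."""
    fields = line.rstrip("\n").split(" ")
    obj = {"type": fields[0][0], "id": int(fields[0][1:])}
    for field in fields[1:]:
        if not field:
            continue
        name = FIELD_NAMES.get(field[0])
        value = field[1:]
        if name is None:
            continue  # x, y, N and M fields are ignored in this sketch
        if name == "tags":
            tags = {}
            for pair in (value.split(",") if value else []):
                key, _, val = pair.partition("=")
                tags[unescape(key)] = unescape(val)
            obj[name] = tags
        elif name in ("version", "changeset", "uid"):
            obj[name] = int(value)
        elif name == "visible":
            obj[name] = (value == "V")
        elif name == "user":
            obj[name] = unescape(value)
        else:
            obj[name] = value  # timestamp is kept as an ISO 8601 string
    return obj
```

Inside the loop above, each `line` read from `proc.stdout` could be passed to `parse_opl_line(line)`.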
If you end up doing object-by-object processing in Python, you might as well use pyosmium directly. The above then becomes:
import osmium

for obj in osmium.FileProcessor(PBF_FN):
    # Further process the data
    pass
See the pyosmium manual for more information. pyosmium has a concept of prefiltering objects of interest to speed up processing. There is no time filter right now but I can see how that would be interesting for processing history data.
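Until such a time filter exists, the filtering can be done by hand inside the loop. The sketch below counts object versions per user inside a time window. It works on any iterable of records that have `timestamp` and `user` attributes. `Version` is a hypothetical stand-in record for trying it out without an OSM file.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for an OSM object version, for demonstration only.
Version = namedtuple("Version", "timestamp user")


def count_contributions(objects, start, end):
    """Count object versions per user with start <= timestamp < end."""
    per_user = Counter()
    for obj in objects:
        if start <= obj.timestamp < end:
            per_user[obj.user] += 1
    return per_user
```

With pyosmium this could be called as `count_contributions(osmium.FileProcessor(PBF_FN), start, end)`. As far as I know, pyosmium timestamps are timezone-aware, so `start` and `end` should then be timezone-aware UTC datetimes.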
That’s an interesting point: I’m only considering processing the official (daily) diffs, which must contain all versions in order to keep distributed OSH/OSM copies in sync with the main database.
I agree that other OSC files may be optimized to expose only the last state in their time span, and we should not use them.
ProjetDuMois is currently country-focused and makes use of a national history file. We are deriving changes from this OSH. I intend to get rid of this because it prevents us from managing projects at world scale: maintaining a worldwide OSH is too heavy, and 99% of its data is useless to us.
Regarding the original state, it’s still possible to initialize the change log from a planet file at the appropriate date and then feed it with appropriately filtered changes.
So I’m currently investigating filtering the daily diffs to keep only the changes that cover the projects defined in the platform, to save the hassle of keeping the OSH in sync.
Unless I get stuck retrieving the daily diffs for a specific time span.
Once a change file is calculated, it is assigned a sequence number and a timestamp, and both values are written to a corresponding “state.txt” file; you retrieve both from that file. Those values never change. The chance of one value being compromised is the same for both.
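As an illustration, a state.txt file is a small key=value text file in which the colons of the timestamp are escaped with a backslash. A minimal sketch of reading both values, assuming the usual `sequenceNumber` and `timestamp` keys (the function name is illustrative):

```python
from datetime import datetime, timezone


def parse_state(text: str) -> tuple[int, datetime]:
    """Return (sequence number, UTC timestamp) from the content of a state.txt file."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading comment line
        key, _, value = line.partition("=")
        values[key] = value.replace("\\:", ":")  # colons are stored as '\:'
    seq = int(values["sequenceNumber"])
    ts = datetime.strptime(values["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return seq, ts.replace(tzinfo=timezone.utc)
```

Any additional keys in the file are read but ignored.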
The getdiff program downloads ALL change files and their corresponding state.txt files between the specified sequence numbers; it writes a sorted list of the downloaded files to the “rangeList.txt” file. The program does not merge change files. If your process requires a different range of change files, you need to run getdiff again with different sequence numbers.
Osmium can take a list of change files in its merge function, but if you have a large number of files to merge, it is better to break that list down. I mentioned the script because it uses a bash function called “mergeListOSC()” that does this for you, using the sorted list that getdiff writes to “rangeList.txt”.
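The idea of breaking the list down can be sketched as follows. This is not the actual mergeListOSC() implementation, only an illustration of the technique: merge the files in chunks with `osmium merge-changes`, then merge the partial results. The chunk size and output names are arbitrary assumptions. The `--simplify` option is deliberately left out so that, as far as I understand it, all object versions are kept.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def merge_commands(osc_files, chunk_size=50, prefix="merged"):
    """Build osmium commands: one merge per chunk, then a final merge of the partials."""
    if not osc_files:
        return []
    commands, partials = [], []
    for n, chunk in enumerate(chunked(osc_files, chunk_size)):
        out = f"{prefix}-{n:04d}.osc.gz"
        commands.append(["osmium", "merge-changes", *chunk, "-o", out])
        partials.append(out)
    # The final step combines the partial files in their original order.
    commands.append(["osmium", "merge-changes", *partials, "-o", f"{prefix}.osc.gz"])
    return commands
```

The input list would be the sorted file names read from “rangeList.txt”; each returned command can be run with `subprocess.run(cmd, check=True)`.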
getdiff requires change file sequence numbers; if you cannot provide them, it will not help you. Both the sequence number and the timestamp (date) are in the change file’s corresponding state.txt file; look them up there, and getdiff can then download a range of change files for you. I am not trying to convince you to use getdiff.
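Looking up the sequence number for a given date can be automated with a binary search over the state files, assuming timestamps increase with the sequence number. In this sketch `get_timestamp` is a hypothetical callable that you would have to provide, for example one that downloads and parses the state.txt file for a sequence number.

```python
def sequence_at(when, lowest, highest, get_timestamp):
    """Return the highest sequence number in [lowest, highest] whose
    timestamp is <= `when`, or None if even `lowest` is later than `when`.

    `get_timestamp(seq)` is a caller-supplied lookup (hypothetical here).
    """
    if get_timestamp(lowest) > when:
        return None
    lo, hi = lowest, highest
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if get_timestamp(mid) <= when:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Calling it once for each of the two dates gives a pair of sequence numbers to pass as a range. Whether to include the diffs at the boundaries depends on how strictly you need the time span to be respected.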
So the best solution currently is to use pyosmium and possibly a script like getdiff to get necessary diffs between two dates.
I’m convinced it works, but I currently don’t have Python in the ProjetDuMois project, only Node.js and bash.
So I would be even happier to have it in osmconvert (osmium has already stated that it won’t implement downloading diffs, for relevant reasons).
Indeed, whatever the software used, defining options to independently download diffs until today or between two dates in the past should be standard.
This discussion makes me think I barely know the maintenance status of such key software projects.
Osmium looks actively maintained, while osm-c-tools doesn’t get as many updates. That could mean it’s perfectly stable or, on the contrary, lacking some involvement (which is not mandatory; it’s provided as is, without any warranty).
Please do give it a try and report back issues you encounter.
Also, please be careful when working with dates. They are always only an approximation towards the end of the next diff file. If you need strict continuity between the diff files you create with pyosmium-get-changes, then you can use --end-date to give you an approximate end of downloads. But then you must save the actual ID of the last diff and use that as the start for the next call of pyosmium-get-changes. You can use the --sequence-file option, which magically does the right thing for you here.