How can I reliably get all changed/updated ways based on an OSC ChangeFile?

I’m working on a solution to acquire, and keep updated, OSM data in a Postgres (PostGIS) database.

I’m currently using osm2pgsql to import data from Geofabrik.

In order to keep this data up-to-date I am using Pyosmium (a Python wrapper around Osmium) to download ChangeFiles in an OSC file format (which is just XML).

Osm2pgsql can then read the change files and perform updates.

The challenge that I’m facing is that I also need to raise events based on what ways have changed. Basically, I need to figure out exactly what roads/tracks have just been altered in order to notify downstream systems. The downstream system needs to be notified about changed roads/tracks because it needs to reprocess the data for our internal business needs.

The problem is that I’m not sure how to determine exactly what has changed. From reading around, the OSC/XML OsmChange files are not actually a good choice for this purpose. The reason for this, as I understand, is that a node could change and that node is part of a way but the way itself has not changed. For example, if the shape of an off-road track has been changed because the lat/long of a node has changed, then the ChangeFile will include the node, but not the way. This means that I can’t raise an event about the way based off the OSC file.
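To make the gap concrete, here is a minimal sketch using only Python's standard library. The sample OsmChange document, the IDs, and the `WAY_NODES` lookup are all made up for illustration; in practice the way-to-node mapping would have to come from wherever the prior state is kept. It shows that a diff containing only a moved node lists no way at all, and that the affected way can only be found by consulting the prior state.

```python
import xml.etree.ElementTree as ET

# Hypothetical OsmChange document: node 101 was moved, no way is listed.
OSC_SAMPLE = """<osmChange version="0.6">
  <modify>
    <node id="101" version="2" lat="47.5001" lon="19.0402"/>
  </modify>
</osmChange>"""

# Hypothetical prior state: way ID -> node IDs (assumed to be loaded elsewhere).
WAY_NODES = {5001: [100, 101, 102], 5002: [200, 201]}


def changed_ids(osc_xml):
    """Return (node_ids, way_ids) found inside create/modify/delete blocks."""
    nodes, ways = set(), set()
    root = ET.fromstring(osc_xml)
    for action in root:          # <create>, <modify>, <delete>
        for obj in action:       # <node>, <way>, <relation>
            if obj.tag == "node":
                nodes.add(int(obj.get("id")))
            elif obj.tag == "way":
                ways.add(int(obj.get("id")))
    return nodes, ways


def affected_ways(osc_xml, way_nodes):
    """Ways listed in the diff plus ways that reference a changed node."""
    nodes, ways = changed_ids(osc_xml)
    indirect = {w for w, refs in way_nodes.items() if nodes.intersection(refs)}
    return ways | indirect
```

With the sample above, `changed_ids` reports node 101 and no ways, while `affected_ways` picks up way 5001 through the lookup.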

This answer and this answer both agree that the OSC files don’t include enough context about what has really changed.

Then I wondered if I can use osm2pgsql’s processing callbacks. But the problem I noticed is that, according to the documentation:

These functions are called for each new or modified OSM object in the input file.

Well we’ve already established that the input file (OSC OsmChange file) doesn’t have enough information and context.

So my question is: Based on an OSC OsmChange file that is about to be given to osm2pgsql, how can I determine exactly which roads/tracks are about to be updated, so that I can raise events/notifications for downstream systems?

The thing to note is that osm2pgsql -can- detect such changes, because it doesn’t just rely on the contents of the OSC file; it also has the full prior state (i.e. the OSM data before the diffs are applied). For example, it can regenerate a way’s or a multipolygon’s geometry when a node has moved.

A decade ago I did something similar to what you want, using database triggers. I’m sure it can be done more elegantly now with osm2pgsql’s built-in means.

crossposted the same problem:

I would try the osmium-getparents command (“to get parents of objects from an OSM file”).

“Get objects referencing the objects with the specified IDs from the input and write them to the output. So this will get ways referencing any of the specified node IDs and relations referencing any specified node, way, or relation IDs. Only one level of indirection is resolved, so no relations of relations are found and no relations referencing ways referencing the specified node IDs.” Osmium manual pages – osmium-getparents (1)

And since it only examines one level, I think it needs to be called twice.
Example code:

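# IDs of every object in the change file (first OPL field, e.g. n123 or w456)
osmium cat -f opl 004.osc.gz | cut -d' ' -f1 > changes00.id
# first level: ways and relations that reference those objects, plus the objects themselves
osmium getparents hungary-latest.osm.pbf --add-self --id-file changes00.id -f opl | cut -d' ' -f1 > changes01.id
# second level: relations that reference the ways found in the previous step
osmium getparents hungary-latest.osm.pbf --add-self --id-file changes01.id -f opl | cut -d' ' -f1 > changes02.id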

I think this can already be simply imported as CSV into PostgreSQL and merged with existing tables.
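As a rough sketch of that CSV step, the snippet below turns ID lines of the form produced by the commands above (a type letter followed by the numeric ID, such as `w5001`) into a two-column CSV. The function name `ids_to_csv`, the column names `osm_type` and `osm_id`, and the sample IDs are assumptions made for illustration, not part of any existing schema.

```python
import csv
import io

TYPE_NAMES = {"n": "node", "w": "way", "r": "relation"}


def ids_to_csv(lines, out):
    """Write unique (osm_type, osm_id) rows for lines such as 'w123'."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["osm_type", "osm_id"])  # hypothetical column names
    seen = set()
    for line in lines:
        token = line.strip()
        if not token or token[0] not in TYPE_NAMES:
            continue  # skip blank or unexpected lines
        key = (TYPE_NAMES[token[0]], int(token[1:]))
        if key not in seen:  # the same ID can show up more than once
            seen.add(key)
            writer.writerow(key)


# Example with made-up IDs; a real run would pass an open changes02.id file.
buffer = io.StringIO()
ids_to_csv(["n101", "w5001", "w5001", "r77"], buffer)
csv_text = buffer.getvalue()
```

Keeping the object type in its own column makes it easy to select only the ways once the file is loaded into the database.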

Of course, this is just a starting point;
the osmium-getparents can be combined with other ideas.

I also think it’s important to filter out false positive changes.
For example, if someone adds a fixme tag to a node and nothing else changes, I would not bother with it.
However, handling this is slightly more complex.
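One possible way to approach that filtering, sketched under the assumption that both the old and the new version of a node are available as plain dictionaries (which the change file alone does not provide): ignore a configurable set of tag keys and treat a change as relevant only if the position or any remaining tag differs. The `IGNORED_KEYS` set and the dictionary layout are illustrative choices, not an established convention.

```python
# Tag keys treated as irrelevant for downstream processing (an assumption).
IGNORED_KEYS = frozenset({"fixme", "FIXME", "note"})


def relevant_tags(tags, ignored=IGNORED_KEYS):
    """Return a copy of the tags without the ignored keys."""
    return {k: v for k, v in tags.items() if k not in ignored}


def node_change_matters(old, new, ignored=IGNORED_KEYS):
    """True if the node moved or a tag outside the ignored set changed."""
    moved = (old["lat"], old["lon"]) != (new["lat"], new["lon"])
    retagged = relevant_tags(old["tags"], ignored) != relevant_tags(new["tags"], ignored)
    return moved or retagged


# Made-up example: only a fixme tag was added, so the change is skipped.
before = {"lat": 47.5, "lon": 19.04, "tags": {"barrier": "gate"}}
after = {"lat": 47.5, "lon": 19.04, "tags": {"barrier": "gate", "fixme": "check"}}
skip_event = not node_change_matters(before, after)
```

The harder part, as noted, is obtaining the old version of each object, since that has to come from the prior state rather than from the diff.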