Accessing node data in OSM change files

Hi all,

When processing OSM change files, is there a smart way to access node data? Everything else is available (e.g. tags), but the ways in the files only reference existing nodes by ID, without their locations.

Therefore, to maintain a subset of OSM, it's necessary to maintain a full node database or perform API lookups to get the existing information. Unless I'm missing something? Does anyone have any bright ideas for getting all the node data referenced in an OSM change file?
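To see the gap concretely, here is a minimal Python sketch (standard library only) that scans an osmChange document and lists the node IDs that created or modified ways reference but that the file itself doesn't contain. The function name and the tiny sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def missing_node_refs(osc_xml: str) -> set[int]:
    """Node IDs referenced by created/modified ways but absent from the diff."""
    root = ET.fromstring(osc_xml)
    present: set[int] = set()
    referenced: set[int] = set()
    for action in root:  # <create>, <modify>, <delete>
        if action.tag == "delete":
            continue  # deleted objects carry no usable geometry
        for elem in action:
            if elem.tag == "node":
                present.add(int(elem.get("id")))
            elif elem.tag == "way":
                # Ways only list <nd ref="..."/>; coordinates live on the nodes.
                for nd in elem.iter("nd"):
                    referenced.add(int(nd.get("ref")))
    return referenced - present

SAMPLE = """<osmChange version="0.6">
  <modify>
    <way id="10" version="3">
      <nd ref="1"/><nd ref="2"/><nd ref="3"/>
      <tag k="climbing" v="crag"/>
    </way>
  </modify>
  <create>
    <node id="3" version="1" lat="51.5" lon="-0.1"/>
  </create>
</osmChange>"""

print(sorted(missing_node_refs(SAMPLE)))
```

Here nodes 1 and 2 would have to come from a local node store or an API lookup.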

Any suggestions appreciated!

Thanks, W

Yes, replication diffs contain only the new objects, not all the objects that those new objects reference. This means that if you want to build geometries from the changes, you need to persist the information from the planet file you started with. osm2pgsql does this with its slim tables.
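A rough sketch of what persisting that information can look like, assuming a simple SQLite table of node locations that is filled from the initial extract and then kept current from each diff. This is a hypothetical illustration of the idea, not osm2pgsql's actual slim-table schema.

```python
import sqlite3

class NodeStore:
    """Hypothetical local table of node locations (id -> lat/lon)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)"
        )

    def upsert(self, node_id: int, lat: float, lon: float) -> None:
        # Called for every node in the initial extract, then for create/modify in diffs.
        self.db.execute(
            "INSERT OR REPLACE INTO nodes (id, lat, lon) VALUES (?, ?, ?)",
            (node_id, lat, lon),
        )

    def delete(self, node_id: int) -> None:
        self.db.execute("DELETE FROM nodes WHERE id = ?", (node_id,))

    def way_coords(self, refs: list[int]) -> list[tuple[float, float]]:
        """Resolve a way's node refs to (lon, lat) pairs, in order."""
        coords = []
        for ref in refs:
            row = self.db.execute(
                "SELECT lon, lat FROM nodes WHERE id = ?", (ref,)
            ).fetchone()
            if row is None:
                raise KeyError(f"node {ref} not in local store")
            coords.append((row[0], row[1]))
        return coords

store = NodeStore()
store.upsert(1, 51.50, -0.12)
store.upsert(2, 51.51, -0.13)
print(store.way_coords([1, 2]))
```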


Thanks, I see. It seems there is no way to efficiently manage a subset of OSM based on a particular theme: it's either deal with the entire planet or nothing.

I can’t imagine I’m the only person who wants a global dataset on a particular subject - in my case climbing sectors.

If there were a way to efficiently get the objects referenced in a change file, it would be possible.

Have you looked into Overpass API already to download data for specific tags only?
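For example, a tag-filtered Overpass query could be built like this in Python. The climbing=crag tag and the public endpoint https://overpass-api.de/api/interpreter are assumptions for illustration; check the OSM wiki for the tags actually used for climbing sectors, and the usage policy of whichever Overpass instance you use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; any Overpass instance with a usage policy you respect will do.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def tag_query(key: str, value: str, timeout: int = 180) -> str:
    """Overpass QL: nodes/ways/relations with key=value, plus their
    members and way nodes via the recurse-down operator '>'."""
    return (
        f"[out:json][timeout:{timeout}];"
        f'(nwr["{key}"="{value}"];>;);'
        "out body;"
    )

def build_request(query: str) -> Request:
    # The query is sent as the form field "data" in a POST body.
    return Request(OVERPASS_URL, data=urlencode({"data": query}).encode())

query = tag_query("climbing", "crag")  # tag choice is an assumption
print(query)
# urllib.request.urlopen(build_request(query)) would actually send it.
```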


Have a look at SomeoneElseOSM/Boundary_Scripts on GitHub (which filters a download based on a particular filter) and https://switch2osm.org/serving-tiles/updating-as-people-edit-pyosmium/ (which updates a database after doing some filtering).

I’m sure if you combined those two you’d be able to do what you want.


Yes, I think that's going to be the solution in the end: watch the OSM change files and run a query like:
"[out:json];(way(id:173451992);>;);out body;"

The idea was really to fetch all relevant objects once per day, if you don’t need minutely updates.

In your approach, be sure to query a larger number of ways in one go: way(id:1,2,3,4);
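A minimal sketch of batching way IDs into one query of that form, assuming the IDs of changed ways have already been pulled from the change files. The batch size is a hypothetical parameter; tune it against the limits of the Overpass instance you use.

```python
from collections.abc import Iterable, Iterator

def chunked(ids: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield successive lists of at most `size` IDs."""
    batch: list[int] = []
    for i in ids:
        batch.append(i)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def way_batch_query(way_ids: list[int]) -> str:
    """One Overpass query for many ways plus their nodes ('>' recurses down)."""
    id_list = ",".join(str(i) for i in way_ids)
    return f"[out:json];(way(id:{id_list});>;);out body;"

queries = [way_batch_query(b) for b in chunked([1, 2, 3, 4, 5], size=2)]
for q in queries:
    print(q)
```

With a single ID this reproduces the query quoted earlier in the thread.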

Way geometries can also change when their nodes are moved, without the way itself appearing in the diff. It's important to store the node-to-way relationship locally to handle this case.
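One way to keep that relationship is a reverse index from node ID to the watched ways that use it. The sketch below is a hypothetical in-memory version (class and method names are made up); when a diff contains modified nodes, affected_ways returns the watched ways that need re-fetching or rebuilding.

```python
from collections import defaultdict
from collections.abc import Iterable

class WayWatchList:
    """Hypothetical index: watched way -> node refs, and node -> watched ways."""

    def __init__(self) -> None:
        self.way_nodes: dict[int, list[int]] = {}
        self.node_ways: defaultdict[int, set[int]] = defaultdict(set)

    def set_way(self, way_id: int, node_refs: list[int]) -> None:
        # Drop old memberships first so removed nodes stop pointing at this way.
        self.remove_way(way_id)
        self.way_nodes[way_id] = list(node_refs)
        for ref in node_refs:
            self.node_ways[ref].add(way_id)

    def remove_way(self, way_id: int) -> None:
        for ref in self.way_nodes.pop(way_id, []):
            ways = self.node_ways.get(ref)
            if ways is not None:  # closed ways repeat their first node
                ways.discard(way_id)
                if not ways:
                    del self.node_ways[ref]

    def affected_ways(self, changed_node_ids: Iterable[int]) -> set[int]:
        """Watched ways whose geometry may have changed because a node moved."""
        hit: set[int] = set()
        for node_id in changed_node_ids:
            hit |= self.node_ways.get(node_id, set())
        return hit

watch = WayWatchList()
watch.set_way(10, [1, 2, 3, 1])  # a closed way
watch.set_way(20, [3, 4])
print(watch.affected_ways([3]))
```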

That’s a good point re the nodes moving - I hadn’t considered that. Shouldn’t be too big a deal to keep a watch-list of nodes locally.

Yes, good call; better to fetch multiple ways at once.

I prefer the UX of minutely updates: seeing feedback almost immediately is more rewarding than waiting until tomorrow, by which time you've forgotten.
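For minutely updates, the replication directory (e.g. https://planet.openstreetmap.org/replication/minute/) exposes a state.txt with the latest sequence number, and each diff lives at a path derived from that number. Tools such as pyosmium, used in the switch2osm guide linked above, handle this bookkeeping for you; the helper names below are hypothetical and only illustrate the path scheme.

```python
def parse_state(text: str) -> dict[str, str]:
    """Parse a replication state.txt (Java-properties style key=value)."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Colons in timestamps are escaped as "\:" in this format.
        state[key] = value.replace("\\:", ":")
    return state

def diff_path(seq: int) -> str:
    """Sequence number -> AAA/BBB/CCC.osc.gz path under the replication base URL."""
    s = f"{seq:09d}"
    return f"{s[0:3]}/{s[3:6]}/{s[6:9]}.osc.gz"

BASE = "https://planet.openstreetmap.org/replication/minute/"
print(BASE + diff_path(5123456))
```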

I expect there to be >100k crags (climbing sectors) in the near future so daily updates would be slow and expensive to run.