Problems keeping a regional extract up to date

I ran into a problem using the --bounding-polygon option in osmosis.

When you load data into a fresh DB, this set of parameters does the job by clipping the data with a predefined *.poly (bounding polygon) file:

osmosis \
--read-pbf file=some.osm.pbf \
--bounding-polygon file=clip.poly \
--write-apidb …
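Clipping an entity stream like this works because every node carries its own coordinates, so each node can be tested against the polygon directly. Below is a minimal Python sketch of that test. It assumes the plain-text .poly layout that osmosis reads: a name line, then sections of "lon lat" lines each closed by END, hole sections whose name starts with "!", and a final END. The helper names parse_poly, in_ring and in_poly are hypothetical, and this is not how osmosis implements its filter internally:

```python
from typing import List, Tuple

Ring = List[Tuple[float, float]]  # list of (lon, lat) vertices


def parse_poly(text: str) -> Tuple[List[Ring], List[Ring]]:
    """Parse a .poly file into (outer_rings, hole_rings)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    outers: List[Ring] = []
    holes: List[Ring] = []
    i = 1  # line 0 is the polygon name
    while i < len(lines) and lines[i] != "END":  # the final END closes the file
        section = lines[i]
        i += 1
        ring: Ring = []
        while lines[i] != "END":  # coordinate lines until the section's END
            lon, lat = map(float, lines[i].split()[:2])
            ring.append((lon, lat))
            i += 1
        i += 1  # skip the section's END
        (holes if section.startswith("!") else outers).append(ring)
    return outers, holes


def in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def in_poly(lon: float, lat: float, outers: List[Ring], holes: List[Ring]) -> bool:
    """Inside some outer ring and not inside any hole."""
    return any(in_ring(lon, lat, r) for r in outers) and not any(
        in_ring(lon, lat, r) for r in holes
    )
```

A way or relation has no coordinates of its own, so a filter like this can only decide about it through the nodes it references.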

But when you try to do the same with a stream of changes (*.osc), osmosis throws an error:

Task 2-bounding-polygon does not support data provided by default pipe stored at level 1 in the default pipe stack.

osmosis \
--read-replication-interval workingDirectory=$DATA_DIR/replication_in \
--bounding-polygon file="$poly_file" \
--wxc $DATA_DIR/osm.osc
Log:
Feb 26, 2023 3:42:10 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.48.3
Feb 26, 2023 3:42:10 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Feb 26, 2023 3:42:10 PM org.openstreetmap.osmosis.core.Osmosis main
SEVERE: Execution aborted.
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Task 2-bounding-polygon does not support data provided by default pipe stored at level 1 in the default pipe stack.
        at org.openstreetmap.osmosis.core.pipeline.common.PipeTasks.retrieveTask(PipeTasks.java:159)
        at org.openstreetmap.osmosis.core.pipeline.common.TaskManager.getInputTask(TaskManager.java:165)
        at org.openstreetmap.osmosis.core.pipeline.v0_6.SinkSourceManager.connect(SinkSourceManager.java:51)
        at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.connectTasks(Pipeline.java:74)
        at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.prepare(Pipeline.java:116)
        at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:86)
        at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:321)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:234)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
        at org.codehaus.classworlds.Launcher.main(Launcher.java:47)

The same happens with a local osm.osc file:

osmosis \
--rxc $DATA_DIR/osm.osc \
--bounding-polygon file="$poly_file" \
--wxc $DATA_DIR/osmc.osc
Log:
Feb 26, 2023 3:45:59 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.48.3
Feb 26, 2023 3:45:59 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Feb 26, 2023 3:45:59 PM org.openstreetmap.osmosis.core.Osmosis main
SEVERE: Execution aborted.
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Task 2-bounding-polygon does not support data provided by default pipe stored at level 1 in the default pipe stack.
        at org.openstreetmap.osmosis.core.pipeline.common.PipeTasks.retrieveTask(PipeTasks.java:159)
        at org.openstreetmap.osmosis.core.pipeline.common.TaskManager.getInputTask(TaskManager.java:165)
        at org.openstreetmap.osmosis.core.pipeline.v0_6.SinkSourceManager.connect(SinkSourceManager.java:51)
        at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.connectTasks(Pipeline.java:74)
        at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.prepare(Pipeline.java:116)
        at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:86)
        at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:321)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:234)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
        at org.codehaus.classworlds.Launcher.main(Launcher.java:47)

Without --bounding-polygon there is no error, but you get changes from around the world, whereas the task is to store only the changes for a certain area:

osmosis \
--read-replication-interval workingDirectory=$DATA_DIR/replication_in \
--wxc $DATA_DIR/osm.osc
Log:
Feb 26, 2023 3:50:10 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.48.3
Feb 26, 2023 3:50:10 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Feb 26, 2023 3:50:10 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Feb 26, 2023 3:50:10 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Feb 26, 2023 3:50:11 PM org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader runImpl
INFO: Reading current server state. [ReplicationState(timestamp=Sun Feb 26 15:49:08 UTC 2023, sequenceNumber=5465506)]
Feb 26, 2023 3:50:12 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline complete.
Feb 26, 2023 3:50:12 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Total execution time: 1861 milliseconds.

:question: Is there anything not covered in the documentation about how to make osmosis clip a stream of OSM changes with --bounding-polygon?

P.S.
As an interim solution, I’m currently using osmconvert to trim the change stream, for example:

osmosis \
--read-replication-interval workingDirectory=$DATA_DIR/replication_in \
--wxc - | \
osmconvert - -B=$DATA_DIR/clip.poly --out-osc > $DATA_DIR/osct.osc && \
osmosis \
--rxc $DATA_DIR/osct.osc \
--wxc $DATA_DIR/osmz.osc

It is technically impossible to make a regional cut from an .osc file because ways and relations in the .osc file will not necessarily have geometries. If someone changes the opening hours of a restaurant mapped as a way, you will only get the information “way #1234 has a new version” but not the information where on the planet way #1234 is. Therefore you cannot decide whether this object needs to remain in the file or not. You can only successfully clip change files when you have access to a node database.
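The point can be made concrete with a made-up .osc fragment: a tag-only edit to a way arrives as a list of node references, and only nodes carry lat/lon. A minimal Python sketch (the sample data and the function name elements_without_location are hypothetical) that lists the changed elements that cannot be located from the file alone:

```python
import xml.etree.ElementTree as ET

# Made-up change file: a way whose opening_hours changed, plus one new node.
OSC = """<osmChange version="0.6">
  <modify>
    <way id="1234" version="5" changeset="1" timestamp="2023-02-26T15:00:00Z">
      <nd ref="10"/>
      <nd ref="11"/>
      <tag k="opening_hours" v="Mo-Fr 09:00-18:00"/>
    </way>
  </modify>
  <create>
    <node id="99" version="1" changeset="1" lat="48.1" lon="11.5"/>
  </create>
</osmChange>"""


def elements_without_location(osc_xml: str):
    """Return (type, id) of every changed element that has no coordinates of its own."""
    root = ET.fromstring(osc_xml)
    missing = []
    for action in root:  # <create>, <modify>, <delete>
        for elem in action:
            if elem.tag == "node" and "lat" in elem.attrib:
                continue  # a node with lat/lon can be tested against a polygon
            missing.append((elem.tag, int(elem.get("id"))))
    return missing


print(elements_without_location(OSC))  # [('way', 1234)]
```

To place way 1234, the coordinates of nodes 10 and 11 must come from somewhere outside the file, such as a node database.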

I don’t know what osmconvert does, but it certainly doesn’t do the right thing, because doing the right thing is impossible :wink:

There are several ways around this. One is to download world-wide updates, apply them to a data file you keep, clip the data file again to remove everything outside the polygon, then compare it with the previous version and import the resulting diff.
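The last two steps of that approach amount to a keyed comparison of two clipped snapshots. A simplified Python sketch, assuming each snapshot is reduced to a mapping from (element type, id) to version; osmosis's --derive-change works on sorted entity streams instead, and the name derive_change is hypothetical:

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]        # (element type, element id)
Snapshot = Dict[Key, int]    # key -> version present in the clipped file


def derive_change(old: Snapshot, new: Snapshot) -> Dict[str, List[Key]]:
    """Classify every element as created, modified or deleted between two snapshots."""
    change: Dict[str, List[Key]] = {"create": [], "modify": [], "delete": []}
    for key in sorted(new):
        if key not in old:
            change["create"].append(key)   # new in the area (or newly created)
        elif new[key] != old[key]:
            change["modify"].append(key)   # version changed
    for key in sorted(old):
        if key not in new:
            change["delete"].append(key)   # deleted, or no longer inside the polygon
    return change
```

In this model, objects that moved out of the polygon between two runs show up as deletions, which is what a regional database needs.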

I’ve always used this to do that :slight_smile:


Yes, and in this script’s particular case, that is an osm2pgsql-imported database with --slim and without --flat-nodes. The OP seems to have an API-style database, to which the tool could probably be adapted.
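The role of the node database can be sketched as follows: nodes in the change file are tested directly, and for ways the missing node coordinates are looked up in the store. This is a minimal Python illustration under stated assumptions, not a reproduction of trim_osc.py: a plain lookup function stands in for the database query, relations are ignored, and trim_change is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Location = Tuple[float, float]  # (lon, lat)


def trim_change(
    nodes: Dict[int, Location],                   # nodes in the .osc that carry coordinates
    ways: Dict[int, List[int]],                   # way id -> node refs from the .osc
    lookup: Callable[[int], Optional[Location]],  # stand-in for a node-location DB query
    inside: Callable[[float, float], bool],       # area test, e.g. a polygon check
) -> Tuple[Set[int], Set[int]]:
    """Return the ids of nodes and ways from the change file that touch the area."""
    kept_nodes = {nid for nid, (lon, lat) in nodes.items() if inside(lon, lat)}
    kept_ways: Set[int] = set()
    for way_id, refs in ways.items():
        for ref in refs:
            # Prefer the location from the change file, fall back to the store.
            loc = nodes[ref] if ref in nodes else lookup(ref)
            if loc is not None and inside(*loc):
                kept_ways.add(way_id)
                break
    return kept_nodes, kept_ways
```

Without the lookup, a tag-only change to a way could never be kept or dropped with any certainty, which is why this kind of trimming needs a node store.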

@woodpeck @SomeoneElse :pray: Thank you for the suggestions.

I think I need to explain the sequence of steps I’m going to perform.

The 1st stage is deploying a fresh API DB for the specified area. Here I used a premade extract from https://download.openstreetmap.fr/.

This part went more or less smoothly, despite the fragmentary information on deploying a standalone API DB. I have successfully deployed a Postgres instance and populated it with the somefile.osm.pbf extract.

I didn’t mention it in the original post, but I used clipIncompleteEntities=true together with the --bounding-polygon file=clip.poly option, both to clip the extract further while loading it into the DB and to avoid crashes (see ERROR: insert or update on table “current_way_nodes” violates foreign key constraint “current_way_nodes_node_id_fkey” #99).
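Why clipIncompleteEntities=true helps with that foreign key error can be illustrated with a small model: after clipping, each selected way must reference only nodes that were actually loaded, otherwise a row in current_way_nodes would point at a missing node. A minimal Python sketch of the idea; the function names and data layout are hypothetical, and the other options of the osmosis task are not modeled:

```python
from typing import Dict, List, Set, Tuple


def clip_ways(ways: Dict[int, List[int]], kept_nodes: Set[int]) -> Dict[int, List[int]]:
    """Keep ways touching the area and drop their references to missing nodes."""
    clipped: Dict[int, List[int]] = {}
    for way_id, refs in ways.items():
        remaining = [ref for ref in refs if ref in kept_nodes]
        if remaining:  # a way with no node inside the polygon is not selected at all
            clipped[way_id] = remaining
    return clipped


def dangling_refs(ways: Dict[int, List[int]], kept_nodes: Set[int]) -> List[Tuple[int, int]]:
    """(way id, node id) pairs that would violate a way-node -> node foreign key."""
    return [(w, n) for w, refs in ways.items() for n in refs if n not in kept_nodes]
```

Without the clipping step, every pair returned by dangling_refs corresponds to an insert the database would reject.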

So after the first stage I have an API DB with a regional extract, which I’m going to keep up to date by applying diffs from the planet.

The 2nd stage is to keep the local API DB deployed in the first stage up to date. I have neither the intention nor the resources to deploy the entire planet, so I want to receive updates only for the area I need.

The 3rd stage is to create diffs from the locally maintained API DB for other internal services, or to build a pipeline that moves data into them via DB-to-DB replication.

Now I’m stuck on the second stage. Unfortunately, the osmosis documentation does not clearly state that the --bounding-polygon (--bounding-box) option only works with extracts (*.osm.pbf/*.osm.gz, although the extracts are stored as *.osm.pbf/*.osm.bz2 :crazy_face:) and doesn’t work with change files (*.osc).

I find this :point_up: statement only partly true, because I already have a DB with nodes (geometry) to which I apply the resulting changes. If way #1234 refers to nodes that are not in the database, it should be discarded when the changes are written to the database (--write-apidb-change), after the nodes outside the bounding polygon have been discarded by --bounding-polygon file=clip.poly clipIncompleteEntities=true.

I also see a problem in that the sets of changes (diffs) we get from the planet do not contain any geospatial context per se. That is, we only have a list of changed objects and references to changesets, but no (multi)polygons describing where these changes took place (see for example this changeset: https://www.openstreetmap.org/api/0.6/changeset/129818176/download). Although changesets are one more element of the data schema, information about them is not included in diffs. At a minimum, this should be their ID and BBOX (area). A BBOX describes the area where the changes took place, although a (multi)polygon, say in WKT, would represent that area more accurately. This is another bottleneck in OSM that could be eliminated by adding a geospatial component to the description of changesets, partially turning the OSM database into a geospatial one (see for example https://www.openstreetmap.org/api/0.6/changeset/129818176).

If we had information about the area where the changes took place, it would be much easier to make appropriate extracts.
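As a rough illustration of what changeset-level geometry would enable, the bounding box the API already reports on a changeset element (the min_lat, min_lon, max_lat and max_lon attributes) can serve as a coarse prefilter for a region. A minimal Python sketch with made-up sample XML and hypothetical function names; a changeset whose BBOX spans the planet will always pass the test, so this only narrows things down:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def changeset_bbox(api_xml: str) -> Optional[BBox]:
    """Read the bbox attributes of the <changeset> element in an API response."""
    changeset = ET.fromstring(api_xml).find("changeset")
    if changeset is None:
        return None
    try:
        return (
            float(changeset.attrib["min_lon"]),
            float(changeset.attrib["min_lat"]),
            float(changeset.attrib["max_lon"]),
            float(changeset.attrib["max_lat"]),
        )
    except KeyError:
        return None  # e.g. an empty changeset has no bbox attributes


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap (touching edges count as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```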

Does my somewhat glib “use trim_osc.py” answer above not work because the resulting files cause errors when you’re updating an apidb? My experience with osmosis** and a rendering database was that it wasn’t an issue there, but I’ve never tried using the resulting files to update an apidb. I would expect that, behind the scenes, a whole bunch of updates without geometry would not be trimmed, and that updates for objects outside the area of interest would fail because those objects aren’t in the database.

** or osmium etc. FWIW

In all fairness, using trim_osc.py feels like using duct tape :adhesive_bandage: instead of addressing the root cause of the problem: limitations in osmosis and in the OSM data schema. The same goes for the osmosis -> osmctools/osmium -> osmosis chain.

It works as first aid, but it hurts operational excellence by increasing the overhead of keeping the system running as a whole. It can also lead to unexpected consequences by lengthening the chain of dependencies for the entire system.

:question: Is there any tool other than osmosis that can deploy an API database and keep it up to date using the diffs published for the OSM planet?

:upside_down_face:

It isn’t really a direct consequence of OSM’s data model, though the proposed changes for getting rid of way nodes would force a fix, at the expense of making diffs substantially larger. It would be quite possible to generate a diff format that contains the prior geometry information; it is just a choice not to do so.

And yes, that is exactly what Overpass augmented diffs (Overpass API/Augmented Diffs - OpenStreetMap Wiki) do (the main problem with augmented diffs is that they do a lot more, which is unnecessary for your kind of application).

Digging deeper into this issue, I came to the conclusion that there are no reliable mechanisms or tools in OSM to replicate changes from the API DB for regional extracts. :earth_africa: :twisted_rightwards_arrows::derelict_house:

All this stems from an oversimplified data schema.

  1. *.osc files do not provide data about changesets per se. That is, we lack the geospatial context of where the changes took place. Even if the BBOX of the changeset were included in the diff, it would clearly not be enough. Instead of a BBOX, the area where the changes took place should be described as a polygon, stored as one of the properties of the changeset. Such a polygon should cover objects that are spatially connected to each other. This would get rid of excessively large BBOXes that span the entire planet even though the objects inside them have no spatial connection to each other. For example, one changed object in Africa, one in Europe, and one in Australia would not form a single changeset, but three separate sets, each with its own geospatial context. We need a clear geospatial context for the changes, and it should be represented in diffs.
  2. Other elements inherit their geospatial characteristics from nodes: nodes make up ways, and ways in turn form multipolygons and other relations. However, in addition to a node's position in space, we need information about its version (timestamp, i.e. its position on the timeline) to know where and when the node was in time and space. This would greatly simplify, and reduce the cost of, both reconstructing the geometry of elements and tracking changes in geometry over time. We would not need to fetch the change history of all nodes of a way, which is currently very slow and puts excessive load on the DB server and API (see "Tracking geometry changes" in GitHub - osmlab/osm-data-model: For discussions about the OSM data model and how to improve it). This way, we would get a clear and reliable tool for tracking history in OSM, which was the goal of the OWL project.
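The grouping proposed in point 1 can be sketched as single-linkage clustering of change locations: locations closer than a chosen gap end up in the same area, and each area gets its own BBOX. A minimal Python sketch under simplifying assumptions (distances in plain degrees, no handling of the antimeridian, all pairs compared); split_into_areas and max_gap_deg are hypothetical names:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]              # (lon, lat) of a changed object
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def split_into_areas(points: List[Point], max_gap_deg: float) -> List[BBox]:
    """Group points that are within max_gap_deg of each other; one BBOX per group."""
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= max_gap_deg ** 2:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[Point]] = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)

    boxes = []
    for members in groups.values():
        lons = [p[0] for p in members]
        lats = [p[1] for p in members]
        boxes.append((min(lons), min(lats), max(lons), max(lats)))
    return sorted(boxes)
```

With edits in Africa, Europe and Australia, this yields three small boxes instead of one that covers most of the planet.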

By implementing these evolutionary changes in the API, we would significantly improve version control (change history) of objects, as well as the publishing of diffs and their processing by consumers, both internal (OWL) and external.


Thanks to your tips, I arrived at the following workflow:

1st, I fetch changes from the OSM planet, apply them to the local *.osm.pbf, clip the data by the bounding polygon to filter out unnecessary data, and save the updated extract to the "$data_file.new" file.

osmosis \
--read-replication-interval workingDirectory=$DATA_DIR/replication_in \
--simplify-change \
--read-pbf file="$data_file" --b \
--apply-change \
--bounding-polygon file="$poly_file" cascadingRelations=yes clipIncompleteEntities=true \
--write-pbf file="$data_file.new"

2nd, I compare the old and new files, derive the changes, and write them to the API DB:

osmosis \
--read-pbf file="$data_file.new" \
--read-pbf file="$data_file" \
--derive-change \
--write-apidb-change host="$POSTGRES_HOST" database="$POSTGRES_DB" user="$POSTGRES_USER" password="$POSTGRES_PASSWORD" validateSchemaVersion=no 

3rd, I replace the old pbf with the new one and repeat the cycle.
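The cycle can be scripted. Below is a minimal Python sketch that runs the two osmosis commands above with subprocess and only then rotates the files. The osmosis arguments are copied from those commands; the function names and the db dictionary are hypothetical:

```python
import os
import subprocess
from typing import Dict, List


def update_args(data_file: str, poly_file: str, data_dir: str) -> List[str]:
    """Step 1: fetch diffs, apply them to the extract, clip, write <data_file>.new."""
    return [
        "osmosis",
        "--read-replication-interval", f"workingDirectory={data_dir}/replication_in",
        "--simplify-change",
        "--read-pbf", f"file={data_file}", "--b",
        "--apply-change",
        "--bounding-polygon", f"file={poly_file}",
        "cascadingRelations=yes", "clipIncompleteEntities=true",
        "--write-pbf", f"file={data_file}.new",
    ]


def import_args(data_file: str, db: Dict[str, str]) -> List[str]:
    """Step 2: derive the change between new and old extract and write it to the API DB."""
    return [
        "osmosis",
        "--read-pbf", f"file={data_file}.new",
        "--read-pbf", f"file={data_file}",
        "--derive-change",
        "--write-apidb-change",
        f"host={db['host']}", f"database={db['database']}",
        f"user={db['user']}", f"password={db['password']}",
        "validateSchemaVersion=no",
    ]


def run_cycle(data_file: str, poly_file: str, data_dir: str, db: Dict[str, str]) -> None:
    """Steps 1 to 3; check=True stops the cycle at the first failing osmosis run."""
    subprocess.run(update_args(data_file, poly_file, data_dir), check=True)
    subprocess.run(import_args(data_file, db), check=True)
    os.replace(f"{data_file}.new", data_file)  # step 3: the new extract becomes the old one
```

Because check=True raises CalledProcessError on a non-zero exit status, the old pbf is replaced only after the import has succeeded. The arguments are passed as a list, so the shell-style quotes around $data_file are not needed.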

Is it possible to combine steps 1 and 2 into a single run?


Also, I ran into a few hiccups that I hope the community can help resolve.

  1. When I omit the --simplify-change option while fetching and applying diffs, osmosis throws an error stating that several versions of one feature appear in the change stream. However, I would like to keep all versions of the objects in the local DB, not just the latest version from the diff.
  2. This may be a side effect of the previous issue, but for deleted objects the visible=false attribute is written to the previous version of the object, and the latest version (the deletion itself) is skipped. However, I would like to have all versions without gaps.
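A simplified model of --simplify-change shows why intermediate versions cannot reach the database once that task is in the pipeline: per the osmosis documentation, it collapses a full-history change stream so that each entity appears at most once. A minimal Python sketch that keeps only the highest version per element; the real task may also adjust the action type, which this model ignores, and the names are hypothetical:

```python
from typing import Dict, List, Tuple

# One change: (action, element type, element id, version)
Change = Tuple[str, str, int, int]


def simplify(changes: List[Change]) -> List[Change]:
    """Keep only the highest-version change for every (type, id)."""
    latest: Dict[Tuple[str, int], Change] = {}
    for ch in changes:
        key = (ch[1], ch[2])
        if key not in latest or ch[3] > latest[key][3]:
            latest[key] = ch
    return sorted(latest.values(), key=lambda c: (c[1], c[2]))


history = [
    ("create", "node", 42, 1),
    ("modify", "node", 42, 2),
    ("delete", "node", 42, 3),
    ("modify", "node", 42, 4),
    ("delete", "node", 42, 5),
]
print(simplify(history))  # [('delete', 'node', 42, 5)]
```

Any intermediate version in the same interval (here versions 1 to 4) is gone before --write-apidb-change ever sees it.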

Here is an example.
Versions 1 and 2 are the creation and a modification of the feature.

Then version 3 is the deletion of the feature, but it was skipped; instead, version 2 got the visible=false value.

The feature was restored (version 4). You can see that version 3 is missing.

The final deletion of the feature (version 5) is missing as well, just like version 3.

I would appreciate any advice to help resolve these issues.