PTNA: news for Public Transport Network Analysis

ToniE · July 22, 2024, 4:41pm

PTNA is currently undergoing major renovations:

Overpass API queries are being replaced by queries towards ‘osmium’ based on planet extracts, step by step.

Q: Why all this?
A: PTNA reaches / exceeds the fair use limit for Overpass API queries

Q: Will my ‘network’ / analysis be affected?
A: Almost all ‘network’ will be changed. But only “almost all”.

Q: Will I see differences?
A: No, at least that is the goal. But: Queries to Overpass and queries to ‘osmium tags-filter’ are different. So there might be some difference. Please report any strange differences. Fine-Tuning will take place then.

Q: When will it start?
A: Tonight we start with parts of Germany in the UTC+01 time zone: DE-BY-CBB (already available), DE-BY-RVO and DE-NW-VRR. I hope it will run without any problems.

Q: Where can I see the current status of the switchover?
A: The statistics page has 3 new columns under “Filter Planet Extract” and a time zone specific output (see: UTC+01)

Thanks to @Jochen_Topf for the discussions on ‘osmium’ during the FOSSGIS OSM Community Meeting 2024 number 21

JesseFTW · July 22, 2024, 11:00pm

I assume you considered and rejected the idea of running a private Overpass server, just for PTNA – that would avoid needing to change the queries, at the cost of having to set up a full overpass server, and keep it populated with data.

ToniE · July 22, 2024, 11:35pm

That was my initial idea: Overpass instance in a docker container.
That would have been a big initial effort and would have required much resources (2 TB for the planet dump, postgis DB, …) on the FOSSGIS-owned server.

Jochen convinced me to work with planet dump, planet extracts.

I finally came up with the solution of working with country-specific extracts combined with their minute-updates from download.openstreetmap.fr

I will extend the list of country-specific extracts step-by-step, starting with Germany.
Those extracts are then split into smaller pieces: states, counties and/or area covered by the ‘network’ (as type=public_transport relation or so). Finally the smaller areas are filtered for relevant data and stored as .osm/.xml.

For the analysis part of PTNA, it is irrelevant where the data (as file) comes from, as long as it is XML. The better the filtering, the smaller the XML, the faster the parsing of the XML.

For DE-BY-CBB (Burghausen, a small city in Bavaria) the analysis results did not show any differences between Overpass and Planet Extracts. Let’s see whether this holds for the large analysis of DE-BY-RVO (Upper Bavaria, as relation based on admin boundary) and DE-NW-VRR (huge part of North Rhine-Westphalia, as relation based on public transport boundary).

BTW: minute-update of germany and split into pieces started 20 minutes ago.
I’ll keep an eye on it.

ToniE · July 22, 2024, 11:57pm

Oops! And there are tons of missing data:

missing member nodes and member ways outside the extract. Extracts should include all nodes and ways of all relations, even if they are outside the polygon defining the extract.
other missing data is related to ‘osimium’ not providing a filter like [~'route'~'(bus|tram|train|subway|light_rail|trolleybus|ferry|monorail|aerialway|share_taxi|funicular)'] as Overpass does - especially the left part ~'route' (a RegEx)

Either I can fix this or it will be a KO criterion.

ToniE · July 23, 2024, 6:29am

Yeah, after a very short night, things seem to be brightening up now. I will have to do some more research, especially on some of the options for ‘osmium extract’ that I had previously overlooked:

-S, --option=OPTION=VALUE which, if set well, might solve the first problem
- at least for the extracts which I trigger
- I have to find a solution for e.g. DE-BY-RVO bus routes or trains stretching into Austria (two different extracts, downloaded from the french server)
  - PTNA must create germany.osm.pbf and austria.osm.pbf from europe.osm.pbf and not download germany/austria
--clean=ATTR to get rid of user=‘ToniE’ and other values in the output
--set-bounds to see how far bus/train routes expand outside the polygon

The second problem/bullet can only be solved by adding more filters to ‘osmium tags-filter’. I currently use a positive and negative list

positive (include) : r/type=*route r/type=public_transport r/type=network r/route_master r/route public_transport highway=bus_stop,platform railway=stop,tram_stop,halt,station,platform
- but how to include disused:type=*, suspended:type=* → *:type=*
negative (exclude) : route=tracks,railway,bicycle,mtb,hiking,road,foot,inline_skates,canoe,detour,fitness_trail,horse,motorboat,nordic_walking,pipeline,piste,power,running,ski,snowmobile,cycling,historic,motorcycle,riding r/type=restriction landuse building natural shop office
- add r/type=boundary, add route=road,street,associated_street,building, …

Let’s see and test and test and test … I’m more optimistic now

Gertjan_Idema · July 23, 2024, 7:22am

For border-crossing routes, you might consider a combination of Osmium for country data and Overpass for border-crossing lines like Flixbus.

My workflow uses an Osmosis Pgsql extract of the Netherlands with daily updates from geofabrik.be. This is extended with a few dozen bus_stops in Belgium and Germany (currently a static dataset). Off course the restriction to one country helps a lot in reducing the size of the dataset.

ToniE · July 23, 2024, 2:16pm

Yeah, that could be an option. For ‘Flixbus’, I’d refrain from using planet extracts and rather stick with Overpass API.

That’s what I wanted to avoid: a huge initial effort (SQL Db, …)

ToniE · July 23, 2024, 2:38pm

… and still have no solution for:

‘osmium extract’ does not consider route-relations as member of route_master if the route-relation is completely outside the polygon area
- RB33 has two route members (Relation: ‪RB33: Heinsberg => Aachen‬ (‪1200277‬) | OpenStreetMap and Relation: ‪RB33: Aachen => Heinsberg‬ (‪4217837‬) | OpenStreetMap) which are not in the area of DE-NW-VRR
- translated error message: Route master has more routes than were actually found in the data set (4 vs. 2) Route was not found in the loaded data set: Relation 1200277 (iD, JOSM) Route was not found in the loaded data set: Relation 4217837 (iD, JOSM)
‘osmium extract’ does not consider MP-relations as member of route if the MP-relation is completely outside the polygon area
- same train: Error in the input data: not enough data for 'relations': relation 12338111 (iD, JOSM), relation 1772511 (iD, JOSM), relation 5151598 (iD, JOSM) Further analysis is skipped ...

This requires an extension on ‘osmium extract’: deep-dive into all relations and their members and their members …

@Lonvia and @Jochen_Topf : is there a chance to get this implemented in ‘osmium’? I assume, that’s not an easy task

would probably require several re-reading attemps of the data
and could end in an infinite loop if a member of a member points to the grand parent.

For the filter rules ‘osmium tags-filter’, positive and negative, I was more successful and could reduce the data significantly.

Jochen_Topf · July 24, 2024, 7:01am

@ToniE It is unlikely this will get implemented in Osmium. As you say it becomes more and more difficult and specialized to to extracts when looking at complex data, you have to read the file multiple times and make sure to avoid loops etc. At some point it makes more sense to just import all data into a database and do processing there. The streaming-processing model of Osmum is just not the right solution at that point.

You might be able to puzzle something together extracting Ids “manually” and then using osmium getid to get them out of the input file. But that’s annoying, too.

ToniE · July 24, 2024, 10:36am

@Jochen_Topf Thanks, I’m OK with that.

The majority of the “new” errors refer to MP platforms outside the polygon area.

PTNA shows an appropriate message “Missing data …” instead of “Wrong data …”.
For the missing data reported for train platforms (MPs) in DE-NW-VRR, we can easily refer to the DE-Bahnverkehr analysis, where we have a complete dataset for the same train(s).

The filtered data is still much bigger than before. But this is also due to MPs with type=multipolygon and name=* only! I consider this as incomplete tagging. PTNA will report (simply list) those, so mappers can fix them.

I’ll go ahead with my switchover towards Osmium.

If you don’t mind, I’ll get in touch with you on the Karlsruhe Hack Weekend in October.

Thanks.

ToniE · July 24, 2024, 9:11pm

I just added analysis of public-transport for

FR-ARA-SMTCAC
- Clermont Auvergne Métropole, le parc Vulcania et les villes de Sayat, Dallet, Mezel, St Bonnet et Pérignat-sur-Allier
- for Syndicat Mixte des Transports en commun de l’agglomération Clermontoise
- gtfs analysis is also available FR-ARA-SMTCAC
- based on request by OSM user Pmz

to ptna

ToniE · July 24, 2024, 9:16pm

I have a problem with the Europe extract from download.openstreetmap.fr. The available file europe.osm.pbf is very old and pyosmium-up-to-date fails: ERROR: Cannot download state information for ID 5910000. Is the URL correct?

Follow this topic on: Planet extracts from https://download.openstreetmap.fr/extracts/

emvee · July 25, 2024, 7:44am

As @Jochen_Topf indicated not something osmium extract can cover.

But … you know the osmid’s of these relations, why do you not just use osmium get-id for these osmid’s on a planet dump?

ToniE · July 25, 2024, 12:23pm

Yes, true, but: I don’t know beforehand how many missing data that’ll be.
I do not want to start asking and asking and asking … for an unknown number of IDs. It’s more a performance problem than a coding issue.

Jochen_Topf · July 26, 2024, 7:51am

@ToniE Heres is another idea: All relations for the whole planet are only about 800 MB as PBF file. If you remove everything that you are definitely not interested in, such as boundary or large multipolygon relations, you’ll end up with something that’s probably quite manageable, even without cutting out a specific bounding box or area.

ToniE · July 26, 2024, 11:09am

Thanks Jochen, that really sounds like a manageable solution.

Intitial task

download planet-latest.osm.pbf (76GB)
call pyosmium-up-to-date planet-latest.osm.pbf

Recurring tasks

nearly every hour using cron
- for the “Supported timezones (wintertime)” at their 2PM time
call pyosmium-up-to-date planet-latest.osm.pbf for hourly updates
- ~ 5MB each hour?
filter as mentioned above for relations and their node/way members
- positive filter: see at bottom
- negative filter: see at bottom
split into smaller pieces (related to the current time zone unter evaluation)
- continents, countries, states, counties, public transport boundaries - poly with few lat/lon
start ‘network’ based analysis
remove created *.osm.pbf (extratcs) files no longer needed

Filters, subject to change:
Positive Filters:
r/type=*route r/type=public_transport,network r/abandoned:type r/disused:type r/suspended:type r/razed:type r/removed:type r/route_master r/route r/network r/name r/ref r/from r/to r/via r/public_transport:version r/ref_trips public_transport highway=bus_stop,platform railway=stop,tram_stop,halt,station,platform route_ref gtfs:feed gtfs:route_id gtfs:stop_id gtfs:trip_id gtfs:trip_id:sample gtfs:shape_id

Negative Filters:
r/route_master=tracks,railway,bicycle,mtb,hiking,road,foot,inline_skates,canoe,detour,fitness_trail,horse,waterway,motorboat,boat,nordic_walking,pipeline,piste,power,running,ski,snowmobile,cycling,historic,motorcycle,riding,junction r/route=tracks,railway,bicycle,mtb,hiking,road,foot,inline_skates,canoe,detour,fitness_trail,horse,waterway,motorboat,boat,nordic_walking,pipeline,piste,power,running,ski,snowmobile,cycling,historic,motorcycle,riding,junction,canyoning,climbing,sled,TMC r/type=defaults,area,destination_sign,enforcement,person,treaty,cemetery,pipeline,election,level,restriction,boundary,building,waterway,building:part,organization,set,bridge,site,health,junction,right_of_way,dual_carriageway,street,associated_street,cluster,tunnel,tmc,TMC,tmc:point,tmc:area,traffic_signals,place_numbers,shop,group,collection r/type=*golf r/highway=pedestrian,service,living_street,footway r/network=lcn,rcn,ncn,icn,lwn,rwn,nwn,iwn,foot,bicycle,hiking indoor=room area:highway aeroway cemetery historic power amenity boundary admin_level place tourism junction parking landuse landcover building roof:shape room natural shop office craft man_made leisure playground golf piste:type

ToniE · July 27, 2024, 4:05pm

Looking good, working with planet file and applying positive and negative filters.

Updating planet file based on hourly diffs takes 40 minute, size: 85091323757 bytes
Applying positive filers takes 24 minutes, size then 8387796691 bytes ~10% of planet file
Applying negative filters takes 4:30 minutes, size then 7735181985 bytes - there is not much improvement but we can play around by adding more negative filters

But 40 minutes + ~30 minutes for filtering isn’t something we can start every hour, maybe every second hour - I don’t see a problem here. “The night is long”. Whether the analysed data’s timestamp is 1PM, 2PM, … 4PM local time - I don’t mind.

ToniE · July 30, 2024, 7:41am

Still looking good and some things seem to improve.

Question to the osmium and pyosmium-up-to-date experts @lonvia and @Jochen_Topf

Observations:

updating planet file takes 40 - 70 minutes (depending on competition on server, high disk load)
applying filters is in the range of the values mentions above, but can also be more

But what I noticed by chance/accident:

I can update an extract derived from filtered planet file (e.g. europe.osm.pbf) with pyosmium.up-to-date europe.osm.pbf with INFO: Using replication service at https://planet.osm.org/replication/hour/ and this one grows only marginally.
- 2 hours after planet update: size for filtered europe 4168749729 → 4175578013

I assume, when applying a planet update on a filtered extract:

handling deleted objects is quite easy
handling modified objects is also quite easy
handling new objects will add them to the filtered extract as is (i.e. w/o filters)

Would it be sufficient to apply

planet updates on the filtered planet file and filter again?
planet updates on filtered extracts and filter again?

This would have a significant positive performance impact:

applying planet update + filter on 8GB filtered planet file rather than 80GB planet file + filter (disk speed seems to be the limit here)
applying extract update + filter whenever an analysis has to be started

It would also change my strategy on updates, filters, extracts, …

download a complete planet.osm.pbf once a week, whenever a new one is available
- download during lazy hours of PTNA: 1:15PM - 6:30PM CEST - between “Alaska” UTC-09 and “Australia” UTC+9:30
work with updates+filters on filtered planet file for the rest of the week

There’s also some more news on “Missing data …” for MPs outside the poly area (mainly platforms in train stations)

add those stations to the poly-file
- some initial configuration effort, but amount known beforehand and then mostly “done and dusted”

Edit: add links to logs
Planet update, filter and extract Africa and Europe
Start analysis for UTC+01
Start analysis for DE-NW-VRR

lonvia · July 30, 2024, 8:20am

The problem with updates on filtered planets are usually with dependent objects.

Example: somebody adds an existing highway to a route relation. The modified route relation will show up in the diff. The highway will not be there because it has not changed. When you apply the diff to your filtered planet, some data is now missing.

You can mitigate the issue somewhat by making sure that all ways that could ever be in a route relation will be in your planet but there will be always corner cases because mappers will find and map the one use-case you haven’t considered.

There are similar problems when you need the geometries of ways. See my example in this issue.

ToniE · July 30, 2024, 8:26am

Thanks, understood. I haven’t considered these cases yet:

add existing node to way, relation
add existing way to relation
add existing relation to relation

A not so polite version from SW design: “You can’t make things fool-proof because fools are so ingenious!”