Generally, extracts and relations are a problem. You can’t just put all relations and their members that are somehow indirectly related to stuff inside the extract into the extract file, because you’d get half the planet’s data that way. So the extract only contains specific types of relations; this is configurable in osmium. route_master relations are probably not taken into account by default for usual extracts, so I would not expect them to work. This is just a limitation of the OSM data model. To fix this you probably have to create your own extracts, and even then it’s complicated. This is a rather unspecific answer, sorry, but I don’t know enough about what you are doing to be more specific. I’d have to know exactly what processing steps you are doing (or somebody else whose data you are using).
Yes, I know that those relations cause problems. E.g. using Overpass-API, you can’t find (empty?) route_masters within a bbox; you have to search globally. There are other limitations as well.
What I’m trying to achieve in this case is:
find public transport route relations
find their route_master relations
analyse their relationship regarding same ‘ref’, ‘network’, …
does the route_master have all (or less or more) those corresponding route relations as members (in that area)
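The comparison in the last two steps could be sketched roughly like this. This is only an illustration, not PTNA's actual code: the function name `check_route_masters` and the plain-dict input layout are assumptions made for the example, and it assumes the relations have already been read from the extract.

```python
from collections import defaultdict

def check_route_masters(routes, route_masters):
    """Compare each route_master's members with the routes sharing its ref and network.

    routes:        {relation_id: {"ref": str, "network": str}}
    route_masters: {relation_id: {"ref": str, "network": str, "members": set of route ids}}
    Returns {route_master_id: {"missing": set, "extra": set}}.
    (Hypothetical data layout, chosen for this sketch only.)
    """
    # Group the route relations found in the area by (ref, network).
    by_key = defaultdict(set)
    for rid, tags in routes.items():
        by_key[(tags.get("ref"), tags.get("network"))].add(rid)

    report = {}
    for mid, master in route_masters.items():
        expected = by_key.get((master.get("ref"), master.get("network")), set())
        members = set(master.get("members", ()))
        report[mid] = {
            "missing": expected - members,  # matching routes that are not members
            "extra": members - expected,    # members that do not match or are not in the area
        }
    return report
```

An "extra" member is not necessarily an error: it may simply be a route that lies outside the extract area.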
With Overpass-API, the second step is quite easy using the “<<” method based on the data of the first step.
With Planet extracts, this is different.
For performance reasons, I changed from creating my own extracts from the Planet file to prepared continent extracts, provided by the French server (on an update-per-minute basis). Reasons:
Planet file is large
OK, having extracts of all continents is in total even more?
‘pyosmium-up-to-date’ on Planet file takes a lot of time and additional disk space
in the end this is done 4 times: 2 AM, 7 AM, 11 AM and 6 PM Central Europe time
I’m interested only in
“europe” and “africa” data at 2 AM local time Central Europe
…
“australia” data at 2 AM local time Eastern Australia (6 PM Central Europe or so)
filtering (positive) data from the Planet and, in a second step, filtering (negative) takes some time
Would creating my own extracts from the Planet file help here? And how?
What would be the best set of options?
N.B.: PTNA makes heavy use of “osmium extract” in the workflow: currently from continent to timezone to country to set of regions/states to region/state to counties. Nothing gets lost here, hopefully, as far as I know.
For performance reasons, I changed from creating my own extracts from the Planet file to prepared continent extracts, provided by the French server (on an update-per-minute basis).
I have never figured out a way to offer minutely updated extracts due to the way the OSM data model works. Either the French are smarter or they are taking a shortcut somewhere, which might explain why your data is not complete. I am quite sure they are not using Osmium to create the minutely diffs.
PTNA makes heavy use of “osmium extract” in the workflow: currently from continent to timezone to country to set of regions/states to region/state to counties. Nothing gets lost here, hopefully, as far as I know.
Osmium extract will recursively include parent relations of relations, but not their children. So you are going up the tree, but not down again. This might or might not be what you need.
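To illustrate what "up the tree, but not down again" means, here is a small sketch. It is not how Osmium is implemented, just a model of the behaviour described above; the function name `with_parent_relations` and the `members_of` mapping are assumptions for the example.

```python
def with_parent_relations(selected, members_of):
    """Return `selected` plus all relations that (transitively) contain them.

    selected:   set of relation ids already in the extract
    members_of: {relation_id: set of member relation ids}
    (Hypothetical inputs; only relation-in-relation membership is modelled.)
    """
    # Invert the membership map: child id -> set of parent ids.
    parents_of = {}
    for parent, children in members_of.items():
        for child in children:
            parents_of.setdefault(child, set()).add(parent)

    result = set(selected)
    todo = list(selected)
    while todo:
        current = todo.pop()
        # Walk upwards only: parents are added, children and siblings are not.
        for parent in parents_of.get(current, ()):
            if parent not in result:
                result.add(parent)
                todo.append(parent)
    return result
```

Starting from a route relation, this picks up its route_master and any relation containing that, but neither the other routes of the route_master nor relations that are members of the route.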
Recursively adding parent relations is great. Working on the final extract data, you cannot determine whether parent relations are missing or not.
PTNA recognizes if child relations are missing (e.g. Public Transport Platform MP-relations outside the extract area). PTNA reports their IDs on STDERR; a script collects those reported IDs and merges them into a file (... | sort -u ...). The next time, their data is extracted from a bigger extract file (e.g. using osmium getid ... bavaria) and then merged into the original extract data (e.g. osmium merge ... upper-bavaria ids-from-bavaria). If IDs are still missing, I can configure using osmium getid ... southern-germany or germany or europe. That works pretty well, with a delay of 24 hours though (“The next time …” = ‘cron’ job every night).
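The escalation from smaller to bigger extracts could be modelled like this. It is only a sketch under assumptions: `locate_missing_ids` is a hypothetical helper, and the ID sets stand in for what a real run would learn by querying each extract file (e.g. with osmium getid).

```python
def locate_missing_ids(missing_ids, extracts):
    """Find, for each missing id, the smallest extract that contains it.

    missing_ids: iterable of relation ids reported as missing
    extracts:    list of (name, set_of_ids), ordered from smallest to largest area
    Returns ({name: sorted ids to fetch from that extract}, sorted ids still missing).
    (Hypothetical helper; the id sets are placeholders for real extract contents.)
    """
    remaining = set(missing_ids)  # de-duplicate, like `sort -u`
    plan = {}
    for name, available in extracts:
        found = remaining & available
        if found:
            plan[name] = sorted(found)
            remaining -= found
    return plan, sorted(remaining)
```

IDs that are still in the second return value after the largest extract would have to wait for the next run, or they may no longer exist at all.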
Yes, for 98% of cases it works to make a per-country extract and then see which bus route master relations (type=route_master) are in it, but for networks with cross-border lines that does not work, or does not work optimally. Making all the extracts also takes quite some resources.
Getting a list of all bus route master relations is cheap. What is more challenging is to see to which country they belong.
That can be done programmatically: get the bus routes, get the ways and nodes of the bus routes and see in which country they predominantly are.
That is not cheap and I think also not needed. You currently have already a list of relations per country. Just save that list and save all relations you are not interested in. Make a hash/dictionary route_master2country that maps each route master relation osmid to a country or None.
Use this dictionary to filter an updated incoming list of all bus route master relations and catch the ones for which there is no mapping; that should be only a few.
For these few new unknowns use some more expensive algorithm to determine in which country it is and add them to the dictionary. I would start by just opening the relation in the browser but I can well imagine you know a cheaper way of doing it.
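A minimal sketch of that dictionary approach, assuming the relation IDs are plain integers. The helper names `split_known_unknown` and `update_mapping`, and the `resolve` callback standing in for the "more expensive algorithm", are made up for the example.

```python
def split_known_unknown(incoming_ids, route_master2country):
    """Split an incoming id list into already-mapped and not-yet-seen ids.

    route_master2country maps relation id -> country code, or None for
    relations that are known but not of interest.
    """
    known = {}
    unknown = []
    for rid in incoming_ids:
        if rid in route_master2country:
            known[rid] = route_master2country[rid]
        else:
            unknown.append(rid)
    return known, unknown


def update_mapping(route_master2country, unknown, resolve):
    """Resolve the few unknown ids with the expensive lookup and cache the result.

    `resolve` is a hypothetical callback: relation id -> country code or None.
    """
    for rid in unknown:
        route_master2country[rid] = resolve(rid)
```

Because None is stored for uninteresting relations too, the membership test uses `in` rather than `.get()`, so those relations are not looked up again on every run.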
That works pretty well as long as there are no relations as members of a route relation (MP platform outside the extract). But that is handled now.
But, it’s a bit more complicated than that.
The original download was done via Overpass-API. I’m going to switch over to Planet extracts, 'cause PTNA exceeds the fair-use limits of Overpass-API.
Main focus is on route relations and not on route_master relations.
Some pre-PTv2 route relations do not even have a route_master, single PTv2 (round-trip) routes do not need a route_master.
Route_masters will be found via their ‘parent’ property, quite easy with the Overpass-API, done explicitly with Planet extracts using an appropriate set of filters for ‘osmium filter ...’
I do not want to collect all (public transport) route relations from the Planet file and select those which fit into a specific and small area (county or city, not country).
It’s the other way round:
I ‘osmium filter ...’ a big extract (Planet, europe, asia, …) for relevant data and
‘europe’ 16 minutes
I ‘osmium filter -i ...’ delete irrelevant data which is still in the file.
‘europe’ ~ 3 minutes
This shrinks the Planet or continent to about 10-15% of the original size
Then I ‘osmium extract ...’ in several steps to the size I need (max. 8 extracts per call for RAM reasons)
Europe into 4 parts: countries within UTC+03, UTC+02, UTC+01 and UTC+00
UTC+xx into areas of interest
UTC+01 into 3 parts takes 2 minutes
country into regions (France: PAC, NOR, NAQ, IDF, ARA, …) of interest
France, two steps in 2 minutes
regions into states (départements, Bundesländer, Canton, …) of interest
states into (set of) counties, arrondissements or cities of interest
note the ‘set of’ here: DE-BY-MVV covers 17 counties and the city of Munich
start the analysis
This is all quite straightforward once configured in this hierarchical way. It takes some (CPU) time, yes.
But I think this approach is still faster than using dedicated Overpass-API calls for every single analysis.
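The "max. 8 extracts per call" step could be prepared with something like the following sketch. It generates one config per batch in the JSON layout that `osmium extract --config` reads ("directory", "extracts", "output", "polygon" with "file_name" and "file_type"), as far as I know that format. The function name, the `(name, poly_file)` input tuples and the `.osm.pbf` naming are assumptions for the example.

```python
import json

MAX_EXTRACTS_PER_CALL = 8  # limit mentioned in the text (RAM reasons)


def build_extract_configs(extracts, directory, batch_size=MAX_EXTRACTS_PER_CALL):
    """Split a list of extracts into configs with at most `batch_size` entries each.

    extracts:  list of (name, poly_file) tuples (hypothetical input layout)
    directory: output directory written into each config
    Returns a list of dicts, one per osmium extract call.
    """
    configs = []
    for start in range(0, len(extracts), batch_size):
        batch = extracts[start:start + batch_size]
        configs.append({
            "directory": directory,
            "extracts": [
                {
                    "output": f"{name}.osm.pbf",
                    "polygon": {"file_name": poly_file, "file_type": "poly"},
                }
                for name, poly_file in batch
            ],
        })
    return configs


def to_json(config):
    """Serialize one config so it can be written to a file for one call."""
    return json.dumps(config, indent=2)
```

Each returned dict would be written to its own file and handed to one osmium extract call, so a level of the hierarchy with 19 areas results in three calls.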