Yeah! The tool chain has been created, the workflow is clear, and the switchover is gaining momentum.
I have solutions for the major blockers:
Relations outside the search area (defined by a polygon):
MP relations (platforms) as members of route relations, and
route relations as members of route_master relations.
This can be handled by extending the poly file to include those (tiny) areas.
Size of data to be parsed has been reduced a lot
Planet update and filtering is a single task, and configured as cron jobs (takes 1.5 to 2.5 hours for each cron job)
job for Africa and Europe starts at 1:15AM CET/CEST, so OSM data is as of 1AM CET/CEST
the analysis cron jobs will be configured to start 3 hours after the related planet update/filtering
the analysis cron job for Central Europe and Morocco (UTC+1) starts at 4AM CET/CEST and takes ~50 minutes for > 230 ‘network’
So all tasks for a specific area of the planet can start once most of the local mappers (the night owls) have stopped mapping (1AM local time), and they will have finished (5AM local time) before the other local mappers (the early birds) start mapping.
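To double-check that timeline, here is a small sketch of the arithmetic using only the times quoted above. One assumption on my side: the 3-hour offset is counted from the data snapshot time (1AM), which is what gives the 4AM analysis start. The calendar date is arbitrary.

```python
from datetime import datetime, timedelta

# Times quoted in the post for the Africa/Europe job (CET/CEST wall clock).
# The calendar date is arbitrary; only the time of day matters here.
data_as_of = datetime(2024, 1, 1, 1, 0)       # OSM data snapshot: 1:00
update_start = datetime(2024, 1, 1, 1, 15)    # planet update/filter cron job: 1:15
update_max = timedelta(hours=2, minutes=30)   # upper bound quoted: 2.5 hours
analysis_offset = timedelta(hours=3)          # assumption: counted from the snapshot
analysis_duration = timedelta(minutes=50)     # approximate figure from the post

update_end_latest = update_start + update_max
analysis_start = data_as_of + analysis_offset
analysis_end = analysis_start + analysis_duration

print(update_end_latest.strftime("%H:%M"))  # 03:45
print(analysis_start.strftime("%H:%M"))     # 04:00
print(analysis_end.strftime("%H:%M"))       # 04:50
```

So even a slow update run (2.5 hours) is done 15 minutes before the analysis starts, and the analysis itself ends before 5AM.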
Thanks to all for the discussions, questions and suggestions. The rest is configuration work …
For each GTFS feed, PTNA searches for a related post-process-ptna-sqlite.sh file (example). This can include sqlite3-SQL statements for DB-manipulation. Currently, this is used to remove “false positive error messages” inserted by the “analysis” task of GTFS feed import.
In this case here, I’d rather also add a pre-process-ptna-sqlite.sh mechanism which manipulates the DB before PTNA starts its aggregation and analysis tasks (which would then speed up those tasks significantly).
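To illustrate the kind of DB manipulation meant here, a rough sketch using Python's sqlite3 module instead of a shell script. Note that the table name `analysis_messages`, its columns and the message texts are purely hypothetical placeholders; PTNA's real schema is not described in this thread.

```python
import sqlite3

# Hypothetical schema: PTNA's real table and column names are not given in
# this thread, so "analysis_messages" and its columns are placeholders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE analysis_messages (route_id TEXT, message TEXT)")
con.executemany(
    "INSERT INTO analysis_messages VALUES (?, ?)",
    [
        ("100", "suspicious start of trip"),
        ("200", "suspicious start of trip"),
        ("200", "invalid stop sequence"),
    ],
)


def drop_false_positive(con, route_id, message):
    """Delete one known false-positive message for one route; return rows removed."""
    cur = con.execute(
        "DELETE FROM analysis_messages WHERE route_id = ? AND message = ?",
        (route_id, message),
    )
    con.commit()
    return cur.rowcount


removed = drop_false_positive(con, "200", "suspicious start of trip")
remaining = con.execute(
    "SELECT route_id, message FROM analysis_messages ORDER BY route_id, message"
).fetchall()
print(removed, remaining)
```

The delete is keyed on both route and message, so the same message on other routes stays in place.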
Thanks! I originally created the US-WA-* feeds list with the intention of getting them into PTNA but never got around to reaching out about it. This will help us very much!
One note: on this page, it indicates “‘operator’ can be taken from ‘agency_name’ of GTFS” as true, but that’s not the case here. Specifically, the two streetcar lines indicate “City of Seattle” as the agency in the GTFS data, but the routes are actually operated by KCM, so the operator can’t be determined from the agency in these cases. (TBH, whether “City of Seattle” or “King County Metro” should be operator is up for debate depending on the semantics of the tag, but that’s something I think we’ll need to settle in the King County/Washington local community. But for now they are “King County Metro”, and at least @Lumikeiju appears to agree there lol.)
Edit: Oh I just noticed “Metro Transit” is the GTFS-published agency name there so would be used as operator as well. We use “King County Metro” for operator to be less ambiguous.
From my point of view, the GTFS does not follow the GTFS recommendations:
each route (route_id) represents exactly one shape/path/route with several trips (trip_id) providing service at different hours:minutes but follow the same path
each of the routes can be seen as a route variant in OSM terms
PTNA will complain when each route variant in OSM has its own gtfs:route_id and the route_master does not have a gtfs:route_id. This can be solved by adding a new analysis option that skips this kind of test. But: GTFS is wrong/strange here. I’ve seen that also for PL-24-ZTM-Katowice.
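A quick way to see whether a feed behaves like this is to count how many route_ids share the same line number. The sketch below assumes a routes.txt with the standard GTFS columns route_id, agency_id and route_short_name; the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the shape of a GTFS routes.txt (real feeds have more columns).
ROUTES_TXT = """route_id,agency_id,route_short_name
11693,A,1
11694,A,1
20001,B,1
30001,A,7
"""


def route_ids_per_line(routes_txt):
    """Map (agency_id, route_short_name) to the list of route_ids sharing it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(routes_txt)):
        groups[(row["agency_id"], row["route_short_name"])].append(row["route_id"])
    return dict(groups)


groups = route_ids_per_line(ROUTES_TXT)
# Lines where one agency publishes several route_ids for the same number:
# these route_ids behave like OSM route variants, not like a route_master.
variants = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(variants)
```

Grouping by agency as well as by number keeps two unrelated lines that merely share a number apart.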
My plan is to start with the 15 sub-districts (admin_level=5) in the 6 districts (admin_level=4). This will be done step by step. Each of the 15 analysis reports will accept any ‘network’ value.
Local mappers can focus on their sub-district, the analysis report should not be too big for each.
Kindly have a look at the CSV list at “Israel/Public_Transport/PTNA/IL-HA-Haifa-Routes” in the OSM wiki. This is how we can proceed with the other 14 sub-districts.
There are two different bus routes with ref=1, so the ‘operator’ needs to be part of their CSV rows.
It’s getting late so I’m only giving it a quick glance. It looks good, I only have a couple of notes so far:
First, a minor point, can the unused headers (e.g. tram) be left out? I believe most regions don’t have any trams.
Secondly, can the Metronit network (which includes one of the ref=1 lines) be shown under a separate header? This is quite specific to the Haifa region but I feel is a useful distinction.
Other than that… It’s good, I think! But bear in mind I don’t really know what to look for, I don’t fully understand yet how PTNA is used. And I will be able to take a closer look tomorrow.
This is what I’m not sure is going to work, or I don’t understand yet. This basically requires mappers to populate this list with hundreds of routes, and keep updating it as routes change, right? But to my understanding, the routes in GTFS already contain all of this data and it is kept up to date by the authorities. If local mappers are expected to keep this CSV up to date, then it will definitely quickly go out of date, if it even becomes populated enough to begin with.
What I was thinking is to generate and update this CSV from the GTFS data. It would still require some manual definitions, like which routes belong to the same route_master, but to me it sounds like a much less daunting task and it would then keep the list up to date.
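As a rough idea of what such a generator could emit, here is a sketch that builds one CSV line per line number. The field positions simply copy the tram example quoted further down in this thread (`2;tram;;;;…;IL-MOT;;"…"`); the function name and its parameters are made up for illustration, and the empty fields are left empty because the thread does not name them.

```python
import csv
import io


def ptna_csv_line(ref, route_type, operator, feed, route_ids):
    """Build one semicolon-separated line laid out like the tram example in this thread."""
    # Positions 3-5 and 8 are left empty, exactly as in the quoted example.
    fields = [ref, route_type, "", "", "", operator, feed, "", ";".join(route_ids)]
    buf = io.StringIO()
    # The csv module quotes the last field automatically when it contains ';'.
    csv.writer(buf, delimiter=";", lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


line = ptna_csv_line("2", "tram", "תבל", "IL-MOT", ["34445", "34446"])
print(line)
```

With these inputs the output matches the tram line from the example; with a single route_id the last field is written without quotes.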
Other than that, for the sake of argument let’s say this list actually will be kept up to date by local mappers. I have some questions…
Browsing around PTNA I am starting to get a sense of what it can do. One of the most useful features in my opinion is the ability to compare GTFS trips and OSM routes on a map, like this: PTNA - Compare GTFS trip with OSM route
Understandably, this option doesn’t show up for the Haifa routes yet, but can that work? Specifically, as you said route_id is now associated with a route variant and not a “route master”. Will PTNA be able to work with that? Maybe in the CSV you could allow multiple values for gtfs-route-id like 11693|11694|11695|11696 so that we can specify all the route_id’s associated with a route master?
Right, yes, the initial effort can be significant. PTNA can create the initial list based on GTFS data though; layout changes can then be made. Maintenance depends on how often the GTFS data changes, how often buses disappear/get renumbered/pop up.
Maintaining the CSV data is not as hard as maintaining OSM Wiki tables like this (in German, but I guess you get the point). These tables include relation IDs and colours, and have to be maintained by anyone (do they know that these tables exist?) who’s creating, deleting or modifying an OSM route_master/route.
The CSV list can be maintained by a few mappers, triggered by the monthly update of GTFS @ PTNA. I assume that changes of a route due to short-term construction works are not mapped and are ignored when they pop up in GTFS. All this requires reading the news, reading the announcements of the operators, … as well.
I’ve seen so many flavours of GTFS data; only a few are structured well enough to follow this approach - IL-MOT tends to be awkward (size of area, route_id maps to OSM route variant, route_short_name=1 => OSM ‘ref=1’ appears multiple times in different regions, …).
If we want to follow this approach, we must first monitor/evaluate how stable route_id, trip_id and shape_id are. GTFS requires them to be unique inside the data but does not make any assumptions/requirements that they are the same in the next GTFS data version (yes, I’ve seen something like that also).
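Such monitoring could start as a plain set comparison between two feed versions. A minimal sketch; the function name is made up, and the sample IDs are only for illustration, not taken from real IL-MOT releases.

```python
def id_stability(old_ids, new_ids):
    """Report how many IDs of the old feed version survive in the new one."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    kept = old_ids & new_ids
    return {
        "kept_ratio": len(kept) / len(old_ids) if old_ids else 1.0,
        "dropped": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
    }


report = id_stability(
    ["11693", "11694", "11695", "11696"],   # route_ids in the previous version
    ["11693", "11694", "11695", "40001"],   # route_ids in the current version
)
print(report)
```

The same check works for trip_id and shape_id. A kept_ratio that stays close to 1.0 over several monthly releases would suggest the IDs are stable enough to be referenced from the CSV or from OSM tags.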
If we want to follow this approach, I’d suggest doing this as a post-processing step during the GTFS import @ PTNA (once a month, which allows manual intervention afterwards), rather than as pre-processing before the analysis.
This is quite new and has been introduced in early 2024.
There are two ways to achieve that:
in the CSV data, you can add gtfs-feed and gtfs-route-id information to each line, defining what is expected to be mapped (example for tram ‘2’ with 2 route_ids)
2;tram;;;;תבל;IL-MOT;;"34445;34446"
in the route-relation, you can add gtfs-feed and gtfs-trip-id information, defining what has been mapped (example for tram ‘2’)
for the route which represents the outward journey
gtfs:feed=IL-MOT
gtfs:trip_id:sample=585428624_130924
for the route which represents the return journey
gtfs:feed=IL-MOT
gtfs:trip_id:sample=585002763_130924
Again: this does not make much sense if route_ids or trip_ids are volatile and change with every GTFS feed version.
For both versions, PTNA will add links called “GTFS” followed by a “compare” icon to the analysis report.
example: DE-BY-MVV bus 210 top right corner in the header
example: DE-NW-VRR bus 331 in column 3 of each row
I might/will have to add some code handling multiple route_ids here, but that should be manageable.
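For what it’s worth, reading several route_ids out of a CSV line like the tram example above is not much code. This is only a sketch of the idea, not PTNA’s actual parser: the field positions are taken from that single example, and the function and key names are made up.

```python
import csv


def parse_csv_line(line):
    """Split one semicolon-separated line; the quoted last field may hold several route_ids."""
    fields = next(csv.reader([line], delimiter=";", quotechar='"'))
    fields += [""] * (9 - len(fields))  # pad short lines so the indexing below is safe
    return {
        "ref": fields[0],
        "type": fields[1],
        "operator": fields[5],
        "gtfs_feed": fields[6],
        "gtfs_route_ids": fields[8].split(";") if fields[8] else [],
    }


entry = parse_csv_line('2;tram;;;;תבל;IL-MOT;;"34445;34446"')
print(entry["gtfs_feed"], entry["gtfs_route_ids"])  # IL-MOT ['34445', '34446']
```

The quoting is what keeps the inner semicolon from being treated as a field separator; each route_id could then get its own “GTFS” compare link.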
All in all:
we replaced the effort of maintaining OSM wiki tables with the maintenance of a CSV list
we gained QA for route_master and route relations
I’m not sure but I guess, PTNA is the only tool which detects “bus: using a oneway road in wrong direction” (considering oneway:bus=no, … though)