I took a few days’ break, but I’m now back at trying to import GTFS into PTNA, so I thought I’d give you a little update. I’ve reached the point where I need to disambiguate routes that have the same ref. When I started writing this comment, I thought it was impossible, but now I’ve found a way; it’s just annoying and cumbersome.
There are 5 bus routes with ref ‘1’ in the Haifa sub-district, and 4 of them would need to be disambiguated with from/to labels. Unfortunately, the from/to you can get from routes.txt is not usable for this: it’s too specific, giving the full name of the first/last stop instead of the general name normally used. But it turns out trips.txt has usable labels for destinations. So I need to keep the destination of the forward trips and the destination of the backward trips, and use those to disambiguate. I’m kinda losing my patience with this… But if I manage to keep myself together, it shouldn’t be much longer until I have some form of useful output.
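A minimal sketch of that idea in Python. It assumes the usable destination label is the trip_headsign column of trips.txt and that direction_id "0"/"1" separates forward from backward trips; both are assumptions about this particular feed. Where a route has several headsigns per direction, the sketch takes the most frequent one instead of an arbitrary one. headsigns_by_direction() and from_to() are hypothetical helpers, not part of PTNA.

```python
import csv
import io
from collections import Counter, defaultdict


def headsigns_by_direction(trip_rows):
    """Map route_id -> {direction_id: most frequent trip_headsign}."""
    counts = defaultdict(Counter)
    for row in trip_rows:
        headsign = (row.get("trip_headsign") or "").strip()
        if headsign:
            key = (row["route_id"], row.get("direction_id") or "")
            counts[key][headsign] += 1

    result = defaultdict(dict)
    for (route_id, direction), counter in counts.items():
        # most_common() breaks ties by first appearance, so the pick is deterministic
        result[route_id][direction] = counter.most_common(1)[0][0]
    return dict(result)


def from_to(route_headsigns):
    """Hypothetical convention: "0" = forward, "1" = backward.

    The backward trips' destination is used as the route's 'from' label,
    the forward trips' destination as its 'to' label.
    """
    return route_headsigns.get("1", ""), route_headsigns.get("0", "")


# Usage with a tiny inline trips.txt (invented sample data)
TRIPS_TXT = """route_id,service_id,trip_id,trip_headsign,direction_id
r1,s,t1,Downtown,0
r1,s,t2,Downtown,0
r1,s,t3,Beach,1
r2,s,t4,Hospital,0
"""
labels = headsigns_by_direction(csv.DictReader(io.StringIO(TRIPS_TXT)))
print(from_to(labels["r1"]))  # ('Beach', 'Downtown')
```

Routes sharing a ref can then be told apart by their (from, to) pair; if two routes still end up with the same pair, the headsign data alone is not enough for them.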
Yeah, that’ll be the tricky part. As mentioned before: DE-SN-VMS has ref=‘A’ 11 times.
the CSV data has 11 lines with ‘A;bus;’
in such cases, PTNA evaluates the ‘operator’ data in the CSV and compares it with the ‘operator’ tag in OSM
‘A;bus;;;;OP1’
‘A;bus;;;;OP2’
‘A;bus;;;;OP3’
‘A;bus;;;;OP1’
if this is not sufficient, then PTNA also compares ‘from’/‘to’ in the CSV data with the ‘from’/‘to’ tags in OSM (and ‘from’ with ‘to’, …). If there is a match for a route relation, then the route_master and all sister/brother routes are taken into consideration as well
‘A;bus;;from1;to1;OP1’
‘A;bus;;from2;to2;OP1’
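To make the field positions explicit: the example lines above use the layout ref;type;comment;from;to;operator. The sketch below parses such lines and narrows the candidates for one OSM route relation by operator. It assumes an exact string comparison between the CSV ‘operator’ and the OSM operator tag, which is a simplification of whatever PTNA actually does; parse_csv_line() and candidates() are hypothetical names.

```python
from dataclasses import dataclass

FIELD_COUNT = 6  # ref;type;comment;from;to;operator


@dataclass(frozen=True)
class CsvRoute:
    ref: str
    type: str
    comment: str
    from_: str  # 'from' is a Python keyword
    to: str
    operator: str


def parse_csv_line(line):
    """Parse one PTNA CSV line; missing trailing fields become empty strings."""
    parts = line.strip().split(";")
    parts += [""] * (FIELD_COUNT - len(parts))
    return CsvRoute(*parts[:FIELD_COUNT])


def candidates(csv_routes, osm_tags):
    """CSV entries for one OSM route relation, narrowed by operator if needed."""
    same_ref = [r for r in csv_routes
                if r.ref == osm_tags.get("ref") and r.type == osm_tags.get("route")]
    if len(same_ref) <= 1:
        return same_ref
    # Assumption: exact string comparison of CSV 'operator' and OSM operator=*
    by_operator = [r for r in same_ref
                   if r.operator and r.operator == osm_tags.get("operator")]
    # If the operator does not help, keep all entries for the from/to step
    return by_operator or same_ref


lines = ["A;bus;;;;OP1", "A;bus;;;;OP2", "A;bus;;;;OP3", "A;bus;;;;OP1"]
routes = [parse_csv_line(line) for line in lines]
hits = candidates(routes, {"type": "route", "route": "bus", "ref": "A", "operator": "OP1"})
print(len(hits))  # 2 -> operator alone is not sufficient here
```

With the OP1/OP2/OP3/OP1 lines, a relation tagged operator=OP1 still leaves two candidates, which is exactly the case where the from/to comparison has to take over.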
What does this mean for you?
the CSV ‘operator’ can be obtained from GTFS ‘agency’
the ‘agency’ must then also appear as the OSM ‘operator’
the CSV ‘from’ and ‘to’ can be obtained from GTFS trips
route variants must be grouped under a route_master
you said this can be achieved in GTFS by analysing route_desc?
at least one OSM relation’s ‘from’ and ‘to’ must match the CSV ‘from’ and ‘to’
CSV ‘from’ and ‘to’ should then be derived from a single GTFS trip, as the stop_name of its first and last stop
A match exists if the CSV ‘from’ is a substring of the OSM ‘from’ and vice versa, and likewise for CSV ‘from’ against OSM ‘to’, and so on
CSV ‘from’ and ‘to’ may include ‘|’ as the OR sign, so you can put the stop_name of the first stop of every relevant GTFS trip into the CSV ‘from’, and likewise the last stops into ‘to’
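Those rules could look roughly like this in Python. It is a sketch under stated assumptions: a CSV alternative counts as matching when it is a substring of the OSM tag value, both orientations (from/from with to/to, and from/to with to/from) are tried, and trips are given as stop_id lists already ordered by stop_sequence from stop_times.txt. The function names are hypothetical, and PTNA’s real matching may differ in detail.

```python
def _add_unique(names, name):
    if name not in names:
        names.append(name)


def build_from_to(trips, stop_names):
    """Return CSV 'from' and 'to' as '|'-joined first/last stop names.

    trips: stop_id lists, each already ordered by stop_sequence
    stop_names: stop_id -> stop_name from stops.txt
    """
    firsts, lasts = [], []
    for stop_ids in trips:
        if stop_ids:
            _add_unique(firsts, stop_names.get(stop_ids[0], stop_ids[0]))
            _add_unique(lasts, stop_names.get(stop_ids[-1], stop_ids[-1]))
    return "|".join(firsts), "|".join(lasts)


def _hits(csv_value, osm_value):
    # '|' is the OR sign; assumed rule: an alternative matches if it is a
    # substring of the OSM tag value
    return bool(osm_value) and any(
        alt and alt in osm_value for alt in csv_value.split("|"))


def from_to_matches(csv_from, csv_to, osm_from, osm_to):
    """True if CSV from/to match OSM from/to in either orientation."""
    forward = _hits(csv_from, osm_from) and _hits(csv_to, osm_to)
    reverse = _hits(csv_from, osm_to) and _hits(csv_to, osm_from)
    return forward or reverse


# Invented sample data
names = {"s1": "Central Station", "s2": "North Terminal", "s3": "East Terminal"}
csv_from, csv_to = build_from_to([["s1", "s9", "s2"], ["s1", "s3"]], names)
print(csv_from, "/", csv_to)  # Central Station / North Terminal|East Terminal
print(from_to_matches(csv_from, csv_to, "East Terminal", "Central Station"))  # True
```

The second call matches in the reverse orientation: the OSM relation runs from one of the CSV ‘to’ alternatives back to the CSV ‘from’.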
Let’s see whether this is still readable for both humans and PTNA, and how well that works.
Fortunately, our time zones are only 1 hour apart. So, once the code runs, I can check the result manually and using PTNA and give feedback, and you can fix and beautify and …
Regarding GTFS → PTNA CSV: progress has been slow since I lost some of my motivation amid the real-world hell happening right now, but I hope to get back to it now.
I’ve implemented the solution I wanted for disambiguating routes, but the results are not always ideal. Some of the destinations are in the form ‘city_destination’, e.g. ‘חיפה_מרכזית המפרץ’ (Haifa_Merkazit HaMifratz). Others are just not very clear, and in some cases there’s more than one option to choose from, so one is selected arbitrarily. That’s a lot of downsides, but it’s better than nothing.
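One hypothetical way to tidy the ‘city_destination’ labels is to split them on the first underscore and drop the city when it is the route’s own city. The ‘destination, city’ output format below is only an assumption about what reads well; it has not been checked against how these routes are tagged in OSM, and both helper names are made up for illustration.

```python
def split_city_label(label):
    """Split a 'city_destination' label into (city, destination).

    Labels without an underscore come back as ("", label).
    """
    city, sep, destination = label.partition("_")
    if not sep:
        return "", label.strip()
    return city.strip(), destination.strip()


def readable_label(label, home_city=""):
    """Hypothetical formatting: drop the route's own city, else 'destination, city'."""
    city, destination = split_city_label(label)
    if not city or city == home_city:
        return destination
    return f"{destination}, {city}"


print(readable_label("חיפה_מרכזית המפרץ", home_city="חיפה"))  # מרכזית המפרץ
```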
So far I’ve made no usable output (just debugging stuff), but the next step now is to produce some CSV output. I might let human mappers change the from/to labels of routes without the script overwriting their changes. That way we can fix any bad labels based on how the routes are actually mapped in OSM, while using the initial labels as a guide. The script could identify which route is which based on the GTFS route_ids associated with it.
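A sketch of that idea, assuming each generated PTNA line is identified by the set of GTFS route_ids it was built from and that mapper edits are stored separately from the generated labels. route_key() and merge_labels() are hypothetical helpers. Override keys that no longer occur in a new feed are returned as stale, so a human can re-check them instead of the edit being silently lost.

```python
def route_key(route_ids):
    """Order-independent identity of a PTNA line built from several GTFS routes."""
    return frozenset(route_ids)


def merge_labels(generated, overrides):
    """Prefer mapper-edited (from, to) labels over freshly generated ones.

    generated: route_key -> (from, to) produced by the script
    overrides: route_key -> (from, to) edited by mappers, stored separately
    Returns the merged labels and the override keys that no longer occur.
    """
    merged = {key: overrides.get(key, labels) for key, labels in generated.items()}
    stale = [key for key in overrides if key not in generated]
    return merged, stale


# Invented sample data
generated = {
    route_key(["10001", "10002"]): ("Generated From", "Generated To"),
    route_key(["20001"]): ("Other From", "Other To"),
}
overrides = {
    route_key(["10002", "10001"]): ("Mapper From", "Mapper To"),
    route_key(["99999"]): ("Old From", "Old To"),
}
merged, stale = merge_labels(generated, overrides)
print(merged[route_key(["10001", "10002"])])  # ('Mapper From', 'Mapper To')
print(stale)  # [frozenset({'99999'})]
```

Matching on the exact set is the simplest choice; if a feed update adds or removes a route_id from a line, the override shows up as stale rather than being applied to a possibly different route.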
On the other hand, that sounds like a hard challenge, so that cleverness might have to wait. Perhaps I should just sprint to producing output.
Thanks, that looks good. And: no need to code for SQL; raw GTFS files are welcome as well, and they still exist during/after the import.
I’d take a different approach for get_stop_codes(), though.
Do not search for OSM stops in the area.
Instead, select the GTFS stops whose lat/lon is in that area, and then select the GTFS trips and GTFS routes stopping there.
Overpass API or OSM API should be used only to get the polygon data of the area of interest.
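That approach could be sketched like this in Python, assuming the area’s boundary has already been fetched (e.g. via Overpass) and reduced to a single outer ring of (lon, lat) pairs; multipolygons with holes or several outer rings are not handled. The point-in-polygon test is a plain ray-casting check, and the function names are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices of one outer ring."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal through the point
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def routes_serving_area(ring, stops, stop_times, trips):
    """route_ids of trips with at least one stop inside the area.

    stops, stop_times, trips: rows as dicts, e.g. from csv.DictReader on
    stops.txt, stop_times.txt and trips.txt.
    """
    stop_ids = {s["stop_id"] for s in stops
                if point_in_polygon(float(s["stop_lon"]), float(s["stop_lat"]), ring)}
    trip_ids = {st["trip_id"] for st in stop_times if st["stop_id"] in stop_ids}
    return {t["route_id"] for t in trips if t["trip_id"] in trip_ids}


# Invented sample data: a unit square as the "area"
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
stops = [{"stop_id": "in", "stop_lat": "0.5", "stop_lon": "0.5"},
         {"stop_id": "out", "stop_lat": "2.0", "stop_lon": "2.0"}]
stop_times = [{"trip_id": "t1", "stop_id": "in"}, {"trip_id": "t2", "stop_id": "out"}]
trips = [{"trip_id": "t1", "route_id": "r1"}, {"trip_id": "t2", "route_id": "r2"}]
print(routes_serving_area(square, stops, stop_times, trips))  # {'r1'}
```

For a real feed, stop_times.txt is usually by far the largest file, so streaming it through csv.DictReader rather than loading it whole keeps memory use down.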
No, it’s because public transit routes just aren’t mapped enough in Israel. I believe Jerusalem will fare better, but I still expect significant missing routes.
This makes me realise: the RNN region is outdated, as Wackernheim and Zornheim (both in Mainz-Bingen) are part of the VMW tariff zone (i.e. located in RMV, not RNN). I already updated the relation a month ago, but that change has yet to be picked up by PTNA.
the stop data in this “GTFS Sweden 2” feed does not give the lat/lon of each pole but rather the position of the stop area
we’re working on using the “GTFS Sweden 3” data, which includes shape data and precise positions of the platforms. This requires creating an account and requesting an API key, which I want to avoid.