I see. In that case, I should update the import script. Right now it tries to make the smallest unique key, but it should probably bring in all the information it can get and let PTNA figure out what information to ignore. That way, that extra info can be a hint for mappers.
Should be a quick fix, but it’s sleepy time now… Or it was 4 hours ago.
Damn, I kinda wish I didn’t find that bug now. Sorry! If I had any experience with Perl I might have tried to figure it out with you, but alas, I don’t… The best I can do is wish you luck.
A few hours ago I released a bug fix for reading the CSV data from the OSM Wiki.
@NeatNit thanks for pointing to that and also thanks for pointing to Perl’s “Text::CSV_XS” module which now does the magic parsing.
I ran local tests over all existing analysis configurations before releasing it.
The tests uncovered many bugs in the OSM-Wiki CSV data which had been tolerated by the old code.
Example:
219;bus;This bus is called "South Side Shuttle";North;South;Operator
where the string >> This bus is called "South Side Shuttle" << should have been enclosed in double quotes and the double quotes in the text should have been “escaped” with double quotes.
Correct version:
219;bus;"This bus is called ""South Side Shuttle""";North;South;Operator
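For anyone generating this CSV from a script, here is a minimal sketch of how the corrected line can be produced with Python's standard `csv` module. The helper name `write_ptna_row` is made up for illustration; the only assumption about the format is what the example above shows (semicolon delimiter, double-quote quoting with doubled quotes inside).

```python
import csv
import io

def write_ptna_row(fields):
    """Serialize one row as semicolon-separated CSV with standard quote doubling."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=";",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerow(fields)
    return buf.getvalue().rstrip("\n")

row = ["219", "bus", 'This bus is called "South Side Shuttle"',
       "North", "South", "Operator"]
line = write_ptna_row(row)
print(line)

# Reading the line back recovers the original fields.
parsed = next(csv.reader([line], delimiter=";", quotechar='"'))
```

Letting the library do the quoting avoids hand-escaping mistakes like the one in the example.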
I fixed the CSV data of the affected 19 OSM-Wiki pages except for DE-BW-RVF which I’m going to fix now.
The module “Text::CSV_XS” reports parsing errors like these:
It was an uphill battle, but I’ve done it: the GTFS->CSV importer in Israel now also imports train routes. Please get the latest version when you get it automated: NeatNit/ptna-gtfs-import - Codeberg.org
CSV output is now more robust, to pair with the improved parsing on your end
I renamed the scripts: israelGtfsRoutesInShape.py to analyze the GTFS data and create routes.json, and ptnaFillCsvData.py to fill this data into CSV. Naming is hard. I think these script names are fine. The first script is very much tailor-made for Israel, the second one is designed to work for any future imports from different sources without having to modify it.
I’m pretty confident there’s nothing left to improve now in either one of those, so these are probably their final versions!
I guess the only thing left now is better documentation. But I want to see it working (automated) first.
Is there any other specific GTFS feed that you think should be imported in the same manner? I could take a look. Keep that momentum going.
You know, I’ve been tinkering with PTNA a lot but I don’t actually use it much. In fact, I never mapped a route!
So I tried today. The button in PTNA that makes JOSM download all the stops for a trip is amazing. I found an old, badly mapped route and tried to fix it. But when I try to compare the GTFS and OSM routes, I get this error:
Some cells may not show valid data, there was missing data.
israelGtfsRoutesInShape.py + create-venv.sh in the ‘bin’ folder of the gtfs repo
ptnaFillCsvData.py in the ‘bin’ folder of the ptna repo
There are similar GTFS feeds (single *.zip for the whole country) for NL, NO, LU, and CH: let’s see whether we can reuse it. LU is quite small; it would be the same area for all 5 analyses, so one catalogue only? Filters (@operator=TICE …) need to be applied.
Where is LU’s feed exactly? I couldn’t find it in either PTNA or the OSM wiki.
I might take a look this weekend or next week. I think if there is a “typical” presentation for GTFS data (like one route_id = one OSM route_master), then code for that would be reusable for a lot of different feeds.
Would be best if you just point out a GTFS feed that you use which is known to use the best practices. I’ll write code for that, which can be reused for lots of feeds, and maybe modified slightly as needed for specific quirks.
I think much of the code for the Israeli conversion is not reusable because of our feed’s many weirdnesses.
I’d love to! I am currently checking DE-BW-bodo and AT-VVV. I’ve updated DE-BW-bodo some days ago and saw difficulties in the GTFS generated file, because it is not “sorted” or only alphabetically sorted and then the individual sections for the different bus regions are missing later, right?
I would find that difficult to read, because these groupings of lines form regional clusters that you can focus on. Example:
Right: Sorted by number if the first character is a digit, all others sorted alphabetically.
If the geographical regions have distinct number ranges, the approach of @NeatNit can help a lot.
See the regional buses of DE-BY-MVV: 2xx, 3xx, … 9xx. They were more or less assigned to counties until, last year, new counties came into the game. We (I?) then arranged the report according to the counties, but we went back to arranging them by number ranges, because looking for bus 285 (e.g.) was tedious in the CSV data and the report.
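The sorting rule described above (by number if the first character is a digit, everything else alphabetically) could look roughly like this in Python. This is only a sketch: the function name `ref_sort_key` and the sample refs are made up, and the real importer may order things differently.

```python
import re

def ref_sort_key(ref):
    """Refs starting with a digit sort first, by their leading number;
    all other refs follow in alphabetical order."""
    match = re.match(r"\d+", ref)
    if match:
        # Tie-break on the full ref so "210" comes before "210A".
        return (0, int(match.group()), ref)
    return (1, 0, ref)

refs = ["N40", "10", "2", "210A", "1", "X30", "210"]
sorted_refs = sorted(refs, key=ref_sort_key)
print(sorted_refs)
```

With distinct number ranges per region, this keeps each range together in the output, so bus 285 ends up among the other 2xx lines.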
comment = ??? (you can put anything here that can be determined programmatically, for Israel I put a link to a website that shows statistics for the route)
Any other hidden values to add that will be useful for splitting the output into different headlines, and can’t be filtered with the existing properties using regex? For example for Israel I want to add a city field that will be a comma-delimited, alphabetized list of all cities the route stops in. That way, @city=חיפה will show all routes that stop only in Haifa, and @city~\bחיפה\b will show all routes that stop in Haifa even if they stop in other cities too. So any such useful fields that could be determined from the GTFS data are possible. Of course, this can all be added later too. I didn’t implement city for Israel yet. I think it will be very useful for splitting the analysis pages into manageable bits.
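To make the city idea concrete, here is a small sketch of how such a field could be built and how the two filters would behave. It only emulates the `=` and `~` filters with Python's `re` module; how PTNA evaluates them is an assumption here, and the helper `city_field` and the sample routes are invented for illustration.

```python
import re

HAIFA = "חיפה"
NESHER = "נשר"

def city_field(stop_cities):
    """Comma-delimited, alphabetized list of the distinct cities a route stops in."""
    return ",".join(sorted(set(stop_cities)))

# Hypothetical routes: ref -> city field derived from the stops' cities.
routes = {
    "1": city_field([HAIFA, HAIFA]),   # stops only in Haifa
    "2": city_field([NESHER, HAIFA]),  # stops in Haifa and Nesher
    "3": city_field([NESHER]),         # never stops in Haifa
}

# Emulates @city=<name>: exact match, i.e. routes that stop only in Haifa.
only_haifa = [ref for ref, city in routes.items() if city == HAIFA]

# Emulates @city~\b<name>\b: routes that stop in Haifa, possibly among others.
pattern = re.compile(rf"\b{re.escape(HAIFA)}\b")
via_haifa = [ref for ref, city in routes.items() if pattern.search(city)]
```

One caveat with `\b`: a multi-word city name that contains the searched name as a whole word would also match, so a stricter pattern anchored on the commas may be safer.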
And of course, where do I download the GTFS data from to work with?
Edit: I also recall seeing somewhere in your GitHub a file that lists all GTFS route types and the OSM route types they correspond to. Can you find that file for me?
For DE-BW-bodo it’s quite easy: from NVBW, the company (?) owned by the Ministry of Transportation in Baden-Württemberg, Germany (one of the 16 states in DE)
Yup, that’s what I saw! Together with the next function, they convert to an OSM route. I guess I should port these two functions wholesale to Python.
Maybe afterwards, when the code is running on the server, we can switch from processing raw GTFS files to fetching the data from SQL. Should be much faster, and benefit from any post-processing you already do to the data, including (presumably) conversion to OSM route type, making the ported function unneeded.
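As a starting point for such a port, a plain lookup table may be enough. The sketch below is not a translation of the PTNA Perl functions (they are not shown in this thread). It pairs the basic `route_type` codes from the GTFS specification with OSM `route` values that are my own assumption, and the name `gtfs_to_osm_route_type` is hypothetical.

```python
# Illustrative table only: basic GTFS route_type codes from the GTFS spec,
# paired with assumed OSM route=* values. Not PTNA's actual mapping.
GTFS_TO_OSM_ROUTE = {
    0: "tram",
    1: "subway",
    2: "train",
    3: "bus",
    4: "ferry",
    7: "funicular",
    11: "trolleybus",
    12: "monorail",
}

def gtfs_to_osm_route_type(route_type):
    """Return the assumed OSM route value for a GTFS route_type, or None if unknown."""
    try:
        return GTFS_TO_OSM_ROUTE.get(int(route_type))
    except (TypeError, ValueError):
        return None

print(gtfs_to_osm_route_type("3"))
```

Codes missing from the table return `None`, so the caller can decide whether to skip the route or flag it.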
No, but ‘operator’ should be different in most cases and that is then sufficient.
Argh! Bus 1 appears twice with same operator in different cities.
The first word in route_long_name translates to “citybus”/“localbus”, does not help as ‘from’ though.
Ideally, from and to can be derived from the 1st and last stop_name of the trips? But this could result in a long, long list of names in both strings from and to.
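One way to keep those lists short would be to count how often each stop name appears as the first or last stop and keep only the most common one. The sketch below assumes the trips of a route are already available as lists of stop names in `stop_sequence` order; the function name `from_to_candidates` and the sample data are made up.

```python
from collections import Counter

def from_to_candidates(trips):
    """Count first and last stop names over all trips of one route.

    trips: list of lists of stop names, each in stop_sequence order.
    Returns two Counters: (first stop names, last stop names).
    """
    firsts = Counter(stops[0] for stops in trips if stops)
    lasts = Counter(stops[-1] for stops in trips if stops)
    return firsts, lasts

# Hypothetical trips of one route, including a short-turn and a variant.
trips = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "B", "D"],
]
firsts, lasts = from_to_candidates(trips)
route_from = firsts.most_common(1)[0][0]
route_to = lasts.most_common(1)[0][0]
```

The full Counters still show every variant, so the less common endpoints could go into the comment field instead of 'from' and 'to'.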
I have no idea.