How is from/to mapped in OSM right now for these routes?
It seems that GTFS is wrong here w.r.t. these two operators. In reality (OSM) they are different.
For what it’s worth, for Israel I currently take the trip_headsign from trips.txt: the to value from trips with direction_id=0, and the from value from trips with direction_id=1. It’s not flawless, but it seems to match what’s already mapped in OSM in the cases I’ve checked. As I mentioned to you, PTv2 stipulates that from and to should be the name of the start and end stops of the route (at least, that’s my reading of Tag:route=bus - OpenStreetMap Wiki), but what’s mapped right now (and IMO more useful) is a more general thing like “downtown” or the name of the destination city when it differs from the origin city.
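A minimal sketch of the trip_headsign approach described above, in case it helps to see it spelled out. It assumes a standard trips.txt with route_id, direction_id and trip_headsign columns. Picking the most common headsign per direction is my own simplification, and the function names are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def headsigns_by_direction(trips_file):
    """Return {route_id: {direction_id: most common trip_headsign}}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for trip in csv.DictReader(trips_file):
        headsign = trip.get('trip_headsign', '')
        if headsign:
            counts[trip['route_id']][trip.get('direction_id', '')][headsign] += 1
    return {
        route_id: {d: c.most_common(1)[0][0] for d, c in by_dir.items()}
        for route_id, by_dir in counts.items()
    }


def from_and_to(headsigns):
    """Map direction_id=1 to 'from' and direction_id=0 to 'to', as in the post."""
    return headsigns.get('1', ''), headsigns.get('0', '')


# Made-up sample data, only to show the shape of the result.
sample = io.StringIO(
    "route_id,trip_id,direction_id,trip_headsign\n"
    "r1,t1,0,Downtown\n"
    "r1,t2,0,Downtown\n"
    "r1,t3,1,Airport\n"
)
result = headsigns_by_direction(sample)
print(from_and_to(result['r1']))
```

Routes whose trips carry no headsign for a direction simply get an empty string, so they can be reviewed by hand.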
Well, who do you trust more?
Edit: I’ll just not add from and to for now, and let the scripts and PTNA complain about it. We can always deal with edge cases later.
I’ll see if I can start work on this tomorrow or early next week.
Yep, those tiny pitfalls can also be fixed manually after an import. Updates are done once a month only.
One important thing to mention about your approach is that we automatically detect/import new routes and delete no longer existing ones. This remains as the added value even for those GTFS feeds where the route_id and route_short_name never change for a given bus, …
Time to go to bed for me, so long, good night!
I don’t think it’s reasonable to expect a human to go back every month and fix something, only for the importer to automatically undo their fix the next month.
At worst, a viable solution would be to pull the buses out of the filter and type them out manually:
1;bus;;from1;to1;...;rab-14-301-1
1;bus;;from2;to2;...;rab-14-810-1
@bus
@route_id!=rab-14-301-1
@route_id!=rab-14-810-1
# all other routes will go here...
@@
But then these routes will always appear at the top of the list, instead of their rightful place sorted among the others. Still, the lesser evil. Especially for route 1 which is supposed to be at the top anyway.
I should be doing that too - good night!
GTFS of LU is CC-BY-4.0 which is, without further notes, not compatible with OSM licenses.
Oops! I found a design issue/gap regarding the GTFS to CSV injection.
- Pro: this is triggered whenever a new GTFS version is published on the server
- Con: this is not triggered when the CSV data got enhanced/changed regarding ‘@…’ statements
Right, that is an annoyance. If you can save routes.json (aka catalog.json for Israel) and run the CSV injection script (+ wiki upload if there are any differences) as a pre-processing step for analysis, that would solve the issue.
Good point! Manageable!
- the CSV data is read from the Wiki beforehand anyway
- injecting data into this file is fine if e.g. IL-JM-Jerusalem.json is found in the working area
- upload to Wiki if there is a diff
- start analysis
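The steps above could be sketched roughly like this. It is only an outline under assumptions: load_catalog and preprocess are hypothetical names, and the inject and upload callables stand in for the real injection script and wiki upload, whose interfaces are not shown in this thread.

```python
import json
import os


def load_catalog(path):
    """Return the parsed catalog, or None when no file is in the working area."""
    if not os.path.exists(path):
        return None
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def preprocess(csv_text, catalog, inject, upload):
    """Inject catalog data into the CSV text and upload only if it changed.

    inject(csv_text, catalog) -> str and upload(new_text) are placeholders
    for the real tools. The returned text is what the analysis should use.
    """
    if catalog is None:
        return csv_text            # nothing to inject, analyse as-is
    injected = inject(csv_text, catalog)
    if injected != csv_text:
        upload(injected)           # only touch the wiki when there is a diff
    return injected
```

Because the injection runs on every analysis, a change to the ‘@…’ statements is picked up even when the GTFS feed itself has not changed.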
This was so much easier than Israel.
#!/usr/bin/env python3
import os.path
import csv
import json
import argparse
import re


def main(gtfs_dir, out_file):
    gtfs_routes = get_gtfs_routes(gtfs_dir)
    ptna_routes = convert_to_ptna_routes(gtfs_routes)
    sort_routes(ptna_routes)
    output_routes(ptna_routes, out_file)


def get_gtfs_routes(gtfs_dir):
    agency_file = os.path.join(gtfs_dir, 'agency.txt')
    with open(agency_file, newline='', encoding='utf_8') as csvfile:
        reader = csv.DictReader(csvfile)
        agency_name_by_agency_id = {agency['agency_id']: agency['agency_name'] for agency in reader}
    routes_file = os.path.join(gtfs_dir, 'routes.txt')
    with open(routes_file, newline='', encoding='utf_8') as csvfile:
        reader = csv.DictReader(csvfile)
        routes = list(reader)
    for route in routes:
        route['agency_name'] = agency_name_by_agency_id[route['agency_id']]
    return routes


def convert_to_ptna_routes(gtfs_routes):
    return [gtfs_route_to_ptna_route(route) for route in gtfs_routes]


def gtfs_route_to_ptna_route(route):
    # route_id,agency_id,route_short_name,route_long_name,route_type
    return {
        'ref': route['route_short_name'],
        'route_type': gtfs_route_type_to_osm_route_type(route['route_type']),
        'comment': route['route_long_name'],
        'operator': route['agency_name'],
        'gtfs_feed': 'DE-BW-bodo',
        'route_id': route['route_id'],
    }


def sort_routes(ptna_routes):
    def sort_key(route):
        ref = route['ref']
        ref_num = re.search(r'\d+(\.\d+)?', ref)
        ref_num = float(ref_num.group()) if ref_num else float('-inf')
        return (route['route_type'], ref_num, ref, route['route_id'])
    ptna_routes.sort(key=sort_key)


def output_routes(ptna_routes, out_file):
    with open(out_file, 'w', encoding='utf-8') as f:
        json.dump(ptna_routes, f, ensure_ascii=False)


def gtfs_route_type_to_osm_route_type(gtfs_route_type):
    # cover just the route types present in bodo.zip for now
    return {
        '0': 'tram',
        '2': 'train',
        '3': 'bus',
        '4': 'ferry'
    }[gtfs_route_type]


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-g", "--gtfsdir", required=True, help="directory containing the unzipped GTFS files")
    parser.add_argument("-o", "--outfile", required=True, help="output file, json")
    args = parser.parse_args()
    main(args.gtfsdir, args.outfile)
Haven’t tried it with the injector yet, I will in a second.
There’s an issue though: these train routes have no route_short_name and thus no ref:
route_id | agency_id | route_short_name | route_long_name | route_type |
---|---|---|---|---|
obb-9-WB1W-1 | obb-02 | | Westbahnhof - Hauptbahnhof | 2 |
obb-10-CH3-1 | obb-85 | | Bahnhof - Lindau-Reutin | 2 |
obb-13-CH2-1 | obb-01 | | Bahnhof - Bahnhof | 2 |
obb-14-0A3-1 | obb-01 | | Hauptbahnhof - Lindau-Reutin | 2 |
ddb-90-751-1 | ddb-V7 | | Hauptbahnhof - Singen (Hohentwiel) | 2 |
ddb-90-O401-1 | Default | | Lindau-Insel - Feldkirch | 2 |
I’d say: if anything can’t be used, there will be no catalog entry.
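One way to apply that rule on the script side would be to drop routes without a usable ref or type before writing the json. This is only a sketch of the idea. usable_routes is a hypothetical helper, not something the script or PTNA currently has, and "usable" is assumed here to mean a non-empty ref and route_type.

```python
def usable_routes(ptna_routes):
    """Split routes into (kept, skipped) by whether ref and route_type are set."""
    kept, skipped = [], []
    for route in ptna_routes:
        if route.get('ref') and route.get('route_type'):
            kept.append(route)
        else:
            skipped.append(route)
    return kept, skipped


# Two entries shaped like the script's output, with values taken from this thread.
routes = [
    {'ref': '1', 'route_type': 'bus', 'route_id': 'rab-14-301-1'},
    {'ref': '', 'route_type': 'train', 'route_id': 'obb-13-CH2-1'},
]
kept, skipped = usable_routes(routes)
print([r['route_id'] for r in skipped])
```

Returning the skipped routes as well makes it easy to log them, so nobody has to wonder why a line is missing from the catalog.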
Sure. What does PTNA do if there are CSV lines with missing type or ref?
FYI, in addition to complaining about the missing ref, ptnaFillCsvData.py prints the following errors:
Error: the following routes can't be differentiated by PTNA:
{'ref': '1', 'route_type': 'bus', 'comment': 'stadtbus Ravensburg Weingarten Schmalegg - Hofgut - Huberesch - Ravensburg Bahnhof - Weingarten - Baienfurt - Baindt Marsweiler/Rathaus', 'operator': 'BW', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'rab-14-301-1'}
{'ref': '1', 'route_type': 'bus', 'comment': 'Ortsbus Meersburg BSB-Hafen-Töbele-Parkplatz Allmend-Daisendorf', 'operator': 'BW', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'rab-14-810-1'}
Error: the following routes can't be differentiated by PTNA:
{'ref': '626', 'route_type': 'bus', 'comment': 'BürgerMobil Meckenbeuren e. V.', 'operator': 'Strauss', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'bod-18-626a-1'}
{'ref': '626', 'route_type': 'bus', 'comment': 'BürgerMobil Meckenbeuren e. V.', 'operator': 'Strauss', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'bod-18-626b-1'}
{'ref': '626', 'route_type': 'bus', 'comment': 'BürgerMobil Meckenbeuren e. V.', 'operator': 'Strauss', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'bod-18-626c-1'}
Error: the following routes can't be differentiated by PTNA:
{'ref': '7382', 'route_type': 'bus', 'comment': 'Ahausen - Bermatingen - Markdorf (Schülerverkehr)', 'operator': 'BW', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'rab-14-189-1'}
{'ref': '7382', 'route_type': 'bus', 'comment': 'Meersburg Daisendorf Riedetsweiler Baitenhs. - Ahausen - Bermatingen - Markdorf', 'operator': 'BW', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'rab-14-382-1'}
Error: the following routes can't be differentiated by PTNA:
{'ref': 'BAT', 'route_type': 'ferry', 'comment': '', 'operator': 'Bodensee-Schiffsbetriebe GmbH', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'sbb-94-003Y-1'}
{'ref': 'BAT', 'route_type': 'ferry', 'comment': '', 'operator': 'Bodensee-Schiffsbetriebe GmbH', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'sbb-94-004Y-1'}
Error: the following routes can't be differentiated by PTNA:
{'ref': '', 'route_type': 'train', 'comment': 'Bahnhof - Bahnhof', 'operator': 'OEBB Personenverkehr AG Kundenservice', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'obb-13-CH2-1'}
{'ref': '', 'route_type': 'train', 'comment': 'Hauptbahnhof - Lindau-Reutin', 'operator': 'OEBB Personenverkehr AG Kundenservice', 'gtfs_feed': 'DE-BW-bodo', 'route_id': 'obb-14-0A3-1'}
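For anyone who wants to find such clashes before running the injector, here is a rough sketch. It assumes that routes clash when they share ref, route_type and operator, which is what the errors above suggest. I have not checked that this is exactly the rule ptnaFillCsvData.py uses, and find_ambiguous is a made-up name.

```python
from collections import defaultdict


def find_ambiguous(ptna_routes, key_fields=('ref', 'route_type', 'operator')):
    """Return {key: [route_id, ...]} for keys shared by more than one route."""
    groups = defaultdict(list)
    for route in ptna_routes:
        key = tuple(route.get(field, '') for field in key_fields)
        groups[key].append(route['route_id'])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Values taken from the error output above, trimmed to the relevant fields.
routes = [
    {'ref': '1', 'route_type': 'bus', 'operator': 'BW', 'route_id': 'rab-14-301-1'},
    {'ref': '1', 'route_type': 'bus', 'operator': 'BW', 'route_id': 'rab-14-810-1'},
    {'ref': '7382', 'route_type': 'bus', 'operator': 'BW', 'route_id': 'rab-14-189-1'},
]
clashes = find_ambiguous(routes)
print(clashes)
```

Each clashing group then needs its own handling in the CSV, for example by listing those route_id values explicitly.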
It shows an “Error” line in the report with "CSV data includes errors. Line %s of Routes-Data. Contents: '%s'"
Example output: Bodensee-Oberschwaben Verkehrsverbund/Analyse/DE-BW-bodo-Linien/Template Test - OpenStreetMap Wiki
Well, in those cases, the CSV needs to handle them separately by @filter!=… and so on
Yeah, even for me it is hard to sort them out by city, … local guys (heroes) may help here @mcliquid @skyper
Currently the sorting goes by the first number in the ref. Refs with no number go first.
Sorting is done when creating the json file, in this case by this code:
def sort_routes(ptna_routes):
    def sort_key(route):
        ref = route['ref']
        ref_num = re.search(r'\d+(\.\d+)?', ref)
        ref_num = float(ref_num.group()) if ref_num else float('-inf')
        return (route['route_type'], ref_num, ref, route['route_id'])
    ptna_routes.sort(key=sort_key)
Or in layman’s terms: sort by type (bus, ferry, train, tram), then by the number in the ref, then by ref (alphabetically), then by route_id.
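A small worked example may make the ordering easier to picture. The sort key is the same as in the script. The sample refs and route_ids are invented for illustration, apart from BAT, which appears in the feed.

```python
import re


def sort_key(route):
    ref = route['ref']
    ref_num = re.search(r'\d+(\.\d+)?', ref)
    # Refs without any digits get -inf, so they sort before all numbered refs.
    ref_num = float(ref_num.group()) if ref_num else float('-inf')
    return (route['route_type'], ref_num, ref, route['route_id'])


# Invented sample: (route_type, ref, route_id)
sample = [
    ('bus', '10', 'a'),
    ('bus', 'BAT', 'b'),
    ('bus', '2', 'c'),
    ('ferry', '1', 'd'),
    ('bus', 'N3', 'e'),
    ('bus', '2a', 'f'),
]
routes = [{'route_type': t, 'ref': r, 'route_id': i} for t, r, i in sample]
routes.sort(key=sort_key)
print([(r['route_type'], r['ref']) for r in routes])
# [('bus', 'BAT'), ('bus', '2'), ('bus', '2a'), ('bus', 'N3'), ('bus', '10'), ('ferry', '1')]
```

Note that 10 sorts after 2 because the number is compared as a number, not as text, and that N3 lands between 2a and 10 because only the first number in the ref counts.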
Sorting by type should usually not matter because filters wouldn’t normally allow routes of different types, but it does matter if you do what I did in the example output with:
== Everything else
@!=
...
@@
I really tried to read and understand the thread from above, but unfortunately it’s way too advanced for me in terms of technology and programming.
How can I help specifically?
In principle, it can of course be really mean and confusing, because in DE-BW-bodo (part of the four-country region around Lake Constance), a total of four countries, ten federal states and countless regional and local buses are mixed up.
Take a look here: Public Transport Network Analysis/Syntax of CSV data - OpenStreetMap Wiki under the title “CSV data import definition: @”. Try to read it and get an idea for what’s going on. Please give me feedback on how easy or hard it is to read and understand, and let me know if you have any ideas to make it clearer. Feel free to edit the page if you feel confident about it.
If you’re not familiar with “regular expressions” I expect this to be the hardest bit to understand, in which case I advise you to not go too deep into it at this stage, just know that regular expressions are a way to determine whether a string is of the right pattern. Technical details on how that works can come later.
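If you want to experiment, here is a tiny sketch using Python's re module and some route_id values from this thread. The pattern ^bod-18 is just an illustration. How PTNA applies a pattern internally is not shown here, so using re.search is my assumption.

```python
import re

route_ids = [
    'bod-18-0N3-1',
    'bod-18-626a-1',
    'rab-14-031-1',
    'rab-14-301-1',
]


def matching(pattern, ids):
    """Return the ids in which the pattern is found."""
    return [route_id for route_id in ids if re.search(pattern, route_id)]


# ^ anchors the pattern to the start of the string: "starts with bod-18".
print(matching(r'^bod-18', route_ids))
# ['bod-18-0N3-1', 'bod-18-626a-1']

# ( | ) means "either of these", and $ anchors to the end: exact matches only.
print(matching(r'^(bod-18-0N3-1|rab-14-031-1)$', route_ids))
# ['bod-18-0N3-1', 'rab-14-031-1']
```

So a prefix pattern picks up a whole family of routes, while the anchored alternation picks out individual routes by their full route_id.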
Anyway, once you hopefully understand the idea for the templates and filters (which may require a few rounds of you asking questions), we need to create import filters/templates in this page: DE-BW-bodo-Linien/Template Test
I’ve already made appropriate filters for the first few sections:
== Züge
@train
...
@@train
-
== Busse
=== Stadtverkehre
==== Friedrichshafen
@route_id~^bod-17
...
@@route_id~^bod-17
@route_id~^(bod-18-0N3-1|rab-14-031-1)$
...
@@route_id~^(bod-18-0N3-1|rab-14-031-1)$
-
==== Überlingen
@route_id~^bod-15
...
@@route_id~^bod-15
-
== Fähren
@ferry
...
@@ferry
But I don’t know how well I did. I can’t read German!
Edit: of course, use the actual DE-BW-bodo-Linien CSV page as a basis