PTNA: new feature - GTFS / OSM comparison

PTNA provides GTFS data for selected feeds.
I implemented a new feature allowing a comparison between a GTFS-trip and an OSM-route.
Check it out by clicking on the icon in column 3 of a PTNA report (see the example below).

More will follow, allowing a comparison between GTFS-route and OSM-route_master (proof-of-concept).

Once this is all done and dusted, a comparison between GTFS and GTFS (routes/trips) can be performed as well, revealing the differences between an old and a new GTFS feed.

All of this with a strict focus on OSM.

Long form:

The comparison is based on calculating the mismatch of several metrics applied to the stops:

  • number of stops
  • positions of stops differ
    • by more than 20 meters
    • by more than 100 meters
    • by more than 1000 meters
  • GTFS-‘stop_name’ and OSM-‘name’
  • GTFS-‘stop_name’ and OSM-‘ref_name’ (if tagged in OSM)
  • GTFS-‘stop_id’ and OSM-‘gtfs:stop_id’ (if tagged in OSM)
  • GTFS-‘stop_name’ and OSM-‘ref:IFOPT’ (if tagged in OSM)
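To make the list above more concrete, here is a minimal sketch of how a weighted mismatch score over these metrics could be computed. It is not PTNA's code: the `Stop` record, the weight values (only a weight of 2 for ‘name’ is mentioned later in this thread), the cumulative distance thresholds and the haversine distance are assumptions for illustration, and the ‘ref:IFOPT’ check is left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stop:
    """Hypothetical minimal stop record covering the metrics listed above."""
    name: str
    lat: float
    lon: float
    stop_id: Optional[str] = None   # GTFS 'stop_id' / OSM 'gtfs:stop_id'
    ref_name: Optional[str] = None  # OSM 'ref_name', if tagged


# Hypothetical weights; only the weight of 2 for 'name' appears in the thread.
WEIGHTS = {"count": 10, "dist_20": 1, "dist_100": 2, "dist_1000": 4,
           "name": 2, "ref_name": 2, "stop_id": 3}


def distance_m(a: Stop, b: Stop) -> float:
    """Great-circle (haversine) distance between two stops in meters."""
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def mismatch_score(gtfs: list, osm: list, w: dict = WEIGHTS) -> int:
    """Weighted sum of mismatches between two stop lists; 0 is a perfect match."""
    score = w["count"] if len(gtfs) != len(osm) else 0
    for g, o in zip(gtfs, osm):  # compare stops pairwise, in order
        d = distance_m(g, o)
        # Each exceeded threshold adds its own weight (an assumption).
        for limit, key in ((20, "dist_20"), (100, "dist_100"), (1000, "dist_1000")):
            if d > limit:
                score += w[key]
        if g.name != o.name:
            score += w["name"]
        if o.ref_name is not None and g.name != o.ref_name:
            score += w["ref_name"]
        if o.stop_id is not None and g.stop_id != o.stop_id:
            score += w["stop_id"]
    return score
```

Setting a weight to zero in such a scheme simply removes that metric from the total, which is the idea behind the per-feed ‘name’ weight discussed further down.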

The comparison page can be reached via the PTNA report of a ‘network’.
A new icon in column 3 of the report points to the GTFS/OSM comparison for the OSM route.

To get the icon displayed, OSM-route-relations need the following tags:

  • gtfs:feed
    • DE-BY-MVV in this case
  • gtfs:trip_id:sample or gtfs:trip_id
    • 215.T0.19-220-s24-1.3.H in this case
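The tag rule above can be sketched as a small check on a route relation's tags. The helper name is hypothetical, and preferring ‘gtfs:trip_id:sample’ over ‘gtfs:trip_id’ when both are present is an assumption, not documented PTNA behaviour.

```python
from typing import Optional, Tuple


def gtfs_comparison_key(tags: dict) -> Optional[Tuple[str, str]]:
    """Return (feed, trip_id) if a route relation qualifies for the GTFS icon.

    Hypothetical helper: 'gtfs:trip_id:sample' is checked before
    'gtfs:trip_id'; None means a required tag is missing.
    """
    feed = tags.get("gtfs:feed")
    trip_id = tags.get("gtfs:trip_id:sample") or tags.get("gtfs:trip_id")
    if not feed or not trip_id:
        return None
    return feed, trip_id
```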

See Proposal:GTFS Tagging Standard in the OSM Wiki.

This concept lets you check OSM against GTFS once you have found the right corresponding trip_id: does the OSM route still reflect the GTFS trip? But finding that trip_id can be tedious and take some time.
To speed up finding the right GTFS trip_id and OSM route relation, a comparison between GTFS route and OSM route_master will be implemented as well. See the proof-of-concept page.
Once this is done, the icon will pop up in the PTNA report at two additional places:

  • column 3 for the route_master
    • if gtfs:feed and gtfs:route_id are tagged
  • in the large image, top right corner in the table header
    • if feed and route_id are in the CSV data in the OSM-Wiki:
      • 12;bus;"Tewksbury via Rte. 38/Wilmington Train Station";;;Lowell Regional Transit Authority;US-MA-LRTA;3001;
    • this allows a comparison even before you start tagging the OSM-relations
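Reading feed and route_id from such a CSV line could look like the sketch below. The assumption, taken only from the single example line above, is that the feed is the 7th field and the route_id the 8th; the helper name is hypothetical.

```python
import csv
import io
from typing import Tuple


def feed_and_route_id(csv_line: str) -> Tuple[str, str]:
    """Extract (GTFS feed, route_id) from one ';'-separated CSV line.

    Assumption: as in the example above, the feed is field 7 and the
    route_id field 8 (0-based indexes 6 and 7); quoted fields are handled.
    """
    fields = next(csv.reader(io.StringIO(csv_line), delimiter=";"))
    return fields[6], fields[7]
```

Using the `csv` module rather than `str.split(";")` keeps a quoted comment containing a ‘;’ from shifting the columns.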

Later on, the concept will also allow a comparison between two GTFS feed versions, their routes and their trips (spoiler: new stop #43). But this is still work in progress.

What's next?

I already received some good comments:

  • add more links to edit OSM-relations in the tables
  • allow filtering in the proof-of-concept table
    • some GTFS-routes may have 30 valid trips for buses
    • some GTFS-routes may have > 200 trips for trains (same stations but different platforms)
    • hide rows where the score is above a specified value
    • can we apply this to columns as well?
  • allow skipping the mismatch calculation for GTFS-‘stop_name’ and OSM-‘name’ (example) and others
    • can be done by setting the corresponding “weight” to zero
      • per GTFS-feed
      • per PTNA report
      • as parameter in the URL (&w_name=0)
  • i18n - internationalization
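The weight suggestion in the list above can be sketched by layering the three proposed override levels. Only the `w_name=0` URL parameter comes from this post; the default values, the precedence (feed, then report, then URL) and the `example.org` URL in the usage are assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical default weights for illustration only.
DEFAULT_WEIGHTS = {"name": 2, "ref_name": 2, "stop_id": 3}


def effective_weights(url: str,
                      feed_weights: Optional[dict] = None,
                      report_weights: Optional[dict] = None) -> dict:
    """Merge weights: defaults < per GTFS feed < per PTNA report < URL (&w_name=0)."""
    weights = dict(DEFAULT_WEIGHTS)
    weights.update(feed_weights or {})
    weights.update(report_weights or {})
    for key, values in parse_qs(urlparse(url).query).items():
        # Only 'w_<metric>' parameters for known metrics are honoured.
        if key.startswith("w_") and key[2:] in weights:
            weights[key[2:]] = int(values[-1])
    return weights
```

For example, `effective_weights("https://example.org/compare?w_name=0")["name"]` yields 0, while the other weights keep their defaults.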

Thanks so far to (in random order):

@mcliquid @derloris @Kwatrecht @Patchi @JesseFTW @adamos @Mundilfari @XioNoX @nlehuby @adamos and many others (this list is limited to 10 entries though)


Edit: 2024-02-23 16:45 UTC - fixed the link to the comparison of two GTFS trips for US-MA-LRTA so that the “spoiler alarm” is correct.


Wow! Thanks a lot to @ToniE and all helpers. I am looking forward to using the new features.


Is there a possibility to flag false-positives (like in Osmose)?

And there seems to be an issue with special characters. See PTNA - Compare GTFS trip with OSM route: stop #9 “Grundlsee Rösslern” looks the same but is flagged as a different stop name.

Hmm, not at the moment. A feedback channel requires authentication, moderation, …? I’ll have a look at the Osmose solution.

Hard to detect, even in JOSM, but the OSM data actually differs (it is wrong).

“Grundlsee Rösslern”: the first ‘s’ in ‘Rösslern’ seems to be an unusual UTF-8 character. It looks like an ‘s’ but is an ‘s’ with two dots, like the ones on top of ‘ö’.

Copy and paste does not work here either.

wget -O - 'https://overpass-api.de/api/interpreter?data=[out:json];relation(8769612);(._;>>;);out;' | grep 'Grundlsee R'
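Another way to see what is going on is to list the code points of a name. The sketch below is a hypothesis, not a diagnosis of this particular relation: it assumes the look-alike is an ‘s’ followed by U+0308 COMBINING DIAERESIS. Unicode has no precomposed "s with diaeresis", so NFC normalization cannot fold it away and the names stay different. A decomposed ‘ö’ (o + U+0308), by contrast, does normalize to the precomposed form. The helper names are hypothetical.

```python
import unicodedata


def non_ascii_chars(text: str) -> list:
    """Return (index, code point, Unicode name) for every non-ASCII character."""
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
            for i, c in enumerate(text) if ord(c) > 127]


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two names after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Running `non_ascii_chars()` on the name returned by the Overpass query above would show any combining marks directly.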


I released the next version into the wild.

You can now see the icon at 3 additional places in the PTNA report (marked with the word ‘new’):

  • column 3 for the route_master
    • if gtfs:feed and gtfs:route_id are tagged
  • column 3 for routes
    • if gtfs:trip_id is not tagged but gtfs:route_id is (then shown as “GTFS(r)” in front of the icon)
  • in the large image, top right corner in the table header
    • if feed and route_id are in the CSV data in the OSM-Wiki
    • this allows a comparison even before you start tagging the OSM-relations

Comparison of GTFS with OSM is based on GTFS stops and OSM platforms:

  • members of route relations must have ‘role’ = ‘platform’; more precisely, ‘role’ must start with ‘platform’ (so ‘platform_entry_only’ and ‘platform_exit_only’ also count)
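As a sketch, the role rule can be applied to relation members in the shape Overpass JSON uses (‘type’, ‘ref’, ‘role’); the helper name is hypothetical.

```python
def platform_members(members: list) -> list:
    """Keep only route members whose role starts with 'platform'.

    Members are dicts like {"type": "node", "ref": 1, "role": "platform"};
    a missing role is treated as an empty string.
    """
    return [m for m in members if m.get("role", "").startswith("platform")]
```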

You can sort the columns of the table:

  • #Num initial sort order based on names of: 1st-stop + last-stop + 2nd-stop + 3rd-stop + … + last-stop
  • GTFS feeds based on the displayed names
  • OSM routes sorted based on score values in the cells
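A sketch of the initial #Num ordering, reading the rule above literally (the last stop appears both as the second key and again at the end of the sequence). Plain string comparison is an assumption; PTNA may collate names differently. The helper names are hypothetical.

```python
def num_sort_key(stop_names: list) -> tuple:
    """1st stop, last stop, then 2nd, 3rd, ... up to the last stop."""
    if not stop_names:
        return ()
    return (stop_names[0], stop_names[-1], *stop_names[1:])


def assign_num(trips: dict) -> dict:
    """Map trip_id -> #Num (1-based) following the initial sort order."""
    ordered = sorted(trips, key=lambda trip_id: num_sort_key(trips[trip_id]))
    return {trip_id: num for num, trip_id in enumerate(ordered, start=1)}
```

Putting the last stop right after the first one groups trips by their end points first, so variants of the same A-to-Z service end up next to each other.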

You can select and hide table rows manually or based on scores
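Score-based hiding can be sketched as a filter over table rows. The row layout (one score per OSM route column), the rule "hide a row if even its best (lowest) score is above the limit" and the set of manually hidden #Num values are assumptions for illustration.

```python
def visible_rows(rows: list, max_score: float, hidden=frozenset()) -> list:
    """Rows to show: not hidden manually and best score <= max_score.

    Each row is a dict like {"num": 1, "scores": [...]}, one score per OSM
    route column (hypothetical layout). Rows without any score are hidden too.
    """
    return [r for r in rows
            if r["num"] not in hidden
            and min(r["scores"], default=float("inf")) <= max_score]
```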

What else?

PTNA also provides a GTFS/GTFS comparison:

  • start with your country’s GTFS overview
  • click on the icon in the right-most column
  • select the versions you want to compare
  • select the routes / route-versions
  • you’ll see the same comparison table as with GTFS route vs OSM route_master


What's next?

Select and hide rows based on …

  • maybe add some more “Select rows where …”
    • icon “suspicious number of stops: 2”
    • icon “This trip is sub-route of …”
      • replace their “trip_id” with their “#Num”?
      • with “onmouseover” highlight the other trips, with “onmouseout” remove highlighting
    • icon “Trips have identical stop-names but different stop_ids/shape_ids”
      • use different icon?
      • replace their “trip_id” with their “#Num”?
      • with “onmouseover” highlight the other trips, with “onmouseout” remove highlighting
      • those trips stick together with the initial sort order (sorted by #Num)
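The three flags above could be detected roughly as sketched here. Assumptions: a sub-route is a contiguous run of the longer trip's stops (express variants that skip stops would not count), and shape_ids are ignored in the last check. All helper names are hypothetical.

```python
def has_two_stops(stops: list) -> bool:
    """'Suspicious number of stops: 2'."""
    return len(stops) == 2


def is_sub_route(short: list, long: list) -> bool:
    """True if `short` appears as a contiguous run of stops inside the longer `long`."""
    n = len(short)
    if n == 0 or n >= len(long):
        return False
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def same_names_different_ids(a: list, b: list) -> bool:
    """Trips as [(stop_id, stop_name), ...]: identical names, but some stop_id differs."""
    return ([name for _, name in a] == [name for _, name in b]
            and [sid for sid, _ in a] != [sid for sid, _ in b])
```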

Table layout

  • vertical scrolling: the first 3 rows of the table are fixed and remain in place when scrolling vertically
    • for horizontal scrolling: try to fix the first 4 columns of the table as well

Other

  • add some more information on GTFS route/OSM route-master
    • not only their “id” but also ‘name’, …
  • add more information when clicking on the icon

The place where I wait for the bus, in OSM terms the highway=bus_stop: is that part of GTFS? In my home town, the bus_stop is mostly mapped at the stop_position, a node of the PT route, which is not really useful for me: first, it is sometimes off by quite a few meters; second, it does not tell me the direction of the bus. The administrative PT router shows that very well.

Perhaps flagging such differences might make PT mappers aware of this. I mapped some bus_stops where they actually are and constantly fear that PT mappers will just move them back to the stop_positions…

Yes, and GTFS’s understanding of a “stop” is what we call ‘platform’ in PTv2: the place where the passengers wait. I map highway=bus_stop where passengers wait. And I see hw=bus_stop as legacy and use it only to place the bus icon (sorry: mapping for the renderer, Carto). So I don’t pay much attention to it.

Same for me.


Sorry for using a legacy app (not OSM-Carto). As for hw=platform: indeed, I see this as legacy too; it should be pt=platform instead, so that PT mappers would not get a QA issue with hw=footway. But please, PT mappers, put the hw=bus_stop somewhere off the carriageway, in the correct location!


Some news for the GTFS / OSM comparison

  • GTFS-Route / OSM-Route-Master comparison

    • reworked the layout of the page
      • icon meaning “sub-route of …”, with some explanation on mouse-over
      • icon meaning “route has only two stops”
      • icon meaning “other suspicious things”, with some explanation
      • icon meaning “trips have same stop_names but different stop_ids/shape_ids”
  • GTFS-trip / OSM-route comparison

    • ‘weight’ for ‘name’ comparison can be set to any value in PTNA’s GTFS data
      • for the following feeds, this has been set to zero, because ‘stop_name’ is always in capital letters
        • BG-22-Sofia
        • FR-PAC-Altigo
        • FR-PAC-Sillages-Scolaire
        • FR-PAC-Sillages-Urbain
        • FR-PAC-Trans-Agglo
      • any other GTFS feeds where the ‘weight’ for ‘name’ should be zero?
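Candidate feeds for the question above could be spotted by measuring how many stop_names are written entirely in capitals. The helper names and the 0.95 threshold are hypothetical; a different design would keep the weight and compare names case-insensitively with `str.casefold()` instead.

```python
def all_caps_ratio(stop_names: list) -> float:
    """Share of stop_names (with at least one letter) written entirely in capitals."""
    lettered = [n for n in stop_names if any(c.isalpha() for c in n)]
    if not lettered:
        return 0.0
    return sum(n.isupper() for n in lettered) / len(lettered)


def suggested_name_weight(stop_names: list, default: int = 2,
                          threshold: float = 0.95) -> int:
    """Suggest weight 0 for 'name' if (almost) all stop_names are upper case."""
    return 0 if all_caps_ratio(stop_names) >= threshold else default
```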

What's next?

  • Highlighting the other relevant trips after clicking on the corresponding icons

Very nice! Thank you.
I have two enhancement ideas:

  • highlight matches where OSM data links to a specific GTFS trip (draw a frame around the number or make it bold?)
  • add the number of trips (“Anzahl Fahrten”) of each GTFS trip and let the table be sorted by that number.

Just released that.

The data was already available, so just the evaluation needed to be coded

  • font-size doubled
  • font-weight increased
  • it’s the number of rides for this particular trip (not the sum of rides)
    • in this example, #Num 9 is sub-route of #Num 2, but the 306 rides of #2 are not included in the 3595 of #9
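A sketch of such a count, under a loudly stated assumption: here "rides" is taken to be the number of GTFS trips sharing exactly the same stop sequence, which is consistent with a sub-route not inheriting the rides of the longer route, but is not confirmed as PTNA's definition. The helper name is hypothetical.

```python
from collections import Counter


def rides_per_stop_sequence(trips: dict) -> list:
    """Count trips per exact stop sequence, most rides first.

    trips maps trip_id -> tuple of stop_ids in order. A sub-route only counts
    its own trips, never those of the longer route it is part of.
    """
    return Counter(trips.values()).most_common()
```

`Counter.most_common()` already returns the sequences in descending order of their count, with ties kept in first-seen order.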

BTW: in this example you can see that for the icon “Trips have identical stop-names but different stop-ids”, the 4 trips 6, 7, 8 and 9 were already sorted in descending order by “rides”.


With every release it gets easier to check GTFS feeds against OSM relations, thank you.

A small wish/enhancement from my side regarding the weight for name. An exact match should be the primary goal, but I have a lot of places (especially outside cities) where it is essential to put the city name next to the stop name. Why? Because otherwise you will have several ‘townhall’ or ‘cemetery’ stops on the same line, and from the stop names alone you cannot tell them apart. Most of the time, the city name is also visible on the pole itself.

See this example: no name matches, although the stops are the right ones. In some cases the name really is different and should be adapted, but most of the time it is correct. After the exact matching, it could be useful to check whether the GTFS name is included in the OSM name. Of course, this should not be weighted as much as an exact match, but it can be a good indication that the stop matches.


I know this problem, although it matters less when editing a route that passes through several villages with identical stop_names.
The former, well-accepted but now disabled oepnv-karte.de had the same problem with “Friedhof” when sending a query (“which routes stop here?”) to “bahn.de”. In such cases, “bahn.de” returned a long list of choices, and the one you were asking for was almost never in it.
To make a long story short: the tag “ref_name” is used here; it allows keeping name=“Friedhof” (on the map) while having ref_name=“Friedhof, Ottobrunn” for queries.

This is already the case, so

  • stop_name = “Gare Routière” and
  • name = “Digne-les-Bains - Gare Routière”

would be accepted (silently, w/o applying different weighting)

The problem with the given, buggy GTFS feed is:

  • stop_name = "Gare Routière " and
  • name = “Digne-les-Bains - Gare Routière”

do not match, because of the trailing blank in the stop_name.

PTNA already post-processes stop_names, … for the German language (“str.” → “straße”, …, “R.-Bosch-Str.” → “Robert-Bosch-Straße”, …).
Deleting blanks (’ ’) at the beginning and end of a name should be done for every language, though. I’ll add that before downloading the new GTFS data that is available for many of the FR-PAC-* feeds.
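The trimming step and the containment check described above can be sketched together. The German rule only covers a trailing “str.”/“Str.” (expanding first names such as “R.” is not attempted), and all names and rules here are hypothetical illustrations, not PTNA's implementation.

```python
import re

# Hypothetical German rules, modelled on "str." -> "straße" from the post above.
GERMAN_RULES = [
    (re.compile(r"str\.$"), "straße"),
    (re.compile(r"Str\.$"), "Straße"),
]


def normalize_stop_name(name: str, language: str = "de") -> str:
    """Trim blanks for every language; expand abbreviations only for German."""
    name = name.strip()
    if language == "de":
        for pattern, replacement in GERMAN_RULES:
            name = pattern.sub(replacement, name)
    return name


def names_match(gtfs_name: str, osm_name: str, language: str = "de") -> bool:
    """True if the normalized GTFS name equals or is contained in the OSM name."""
    g = normalize_stop_name(gtfs_name, language)
    return bool(g) and g in osm_name
```

Without the `strip()`, “Gare Routière ” (with the trailing blank) is not a substring of “Digne-les-Bains - Gare Routière”, which is exactly the failure described above.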

Could be done, I’ll think about that. But: the current weight of ‘2’ for name comparison is already small.


The example is now looking better.


… just because the GTFS feed has been fixed by “Zou!”


Just found and implemented a solution for

  • horizontal scrolling

It is based on JavaScript, not HTML/CSS, though. It scrolls cell by cell: not smoothly, but fast enough. This is a first approach; fine-tuning is required (scroll to end, scroll to start, better icons, …).

You might need to refresh your browser's cache: Ctrl-R / Ctrl-F5


Would it be possible to have an option to use the headsign for the trip instead of the PTv2 name?

In my case,

Parcours {route_short_name} vers {trip_headsign}

instead of

Bus {route_short_name}: {from} => {to}

Sometimes the headsign matches the last stop, and sometimes it doesn’t.
That way, the data would align closely with real-world expectations.

Hi Toni,
I have just corrected a bus route whose itinerary had changed, with the help of the GTFS/OSM comparison, and noticed that the route was missing two stops somewhere else, so I could correct that too. Now I think it would be interesting to see all routes in the network (Verkehrsverbund) whose number of stops does not match, so that I can check whether the OSM route needs to be improved (or, of course, whether the GTFS is incorrect). Is there a way to get a list of the mismatches for the best OSM/GTFS matches?

Hmm, not sure what I should/could do here

  • ‘route_short_name’ and ‘trip_headsign’ are GTFS terms, they do not exist in OSM

  • OSM’s ‘ref’ can be seen as the equivalent to GTFS’ ‘route_short_name’

What I can do for column 4 (the GTFS trips) is: show something similar to

Bus {route_short_name} towards {trip_headsign}
or
Bus {route_short_name} towards {trip_short_name}
or
Bus {route_short_name}: {trip_long_name}

where ‘Bus’ can be derived from {route_type}
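A minimal sketch of such a column-4 label, assuming the order of preference listed above. The mode names come from the basic GTFS route_type values in routes.txt; extended route types simply fall back to “Route”. Note that GTFS trips.txt defines trip_headsign and trip_short_name but has no trip_long_name field, so the sketch substitutes route_long_name (from routes.txt) for the third form; that substitution is an assumption.

```python
# Basic GTFS route_type values (routes.txt); extended types fall back to "Route".
ROUTE_TYPE_LABELS = {0: "Tram", 1: "Subway", 2: "Train", 3: "Bus", 4: "Ferry",
                     5: "Cable tram", 6: "Aerial lift", 7: "Funicular",
                     11: "Trolleybus", 12: "Monorail"}


def trip_label(route_type: int, route_short_name: str, trip_headsign: str = "",
               trip_short_name: str = "", route_long_name: str = "") -> str:
    """Build a label, preferring headsign, then trip_short_name, then long name."""
    mode = ROUTE_TYPE_LABELS.get(route_type, "Route")
    if trip_headsign:
        return f"{mode} {route_short_name} towards {trip_headsign}"
    if trip_short_name:
        return f"{mode} {route_short_name} towards {trip_short_name}"
    if route_long_name:
        return f"{mode} {route_short_name}: {route_long_name}"
    return f"{mode} {route_short_name}"
```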

In theory yes, in practice there is a huge performance issue for the most interesting part.

In the image you can see the ‘compare GTFS’ icon at three positions:

  • top, header line, right most position

    • this is the feed and route_id from the CSV data
      • the values can easily be changed in the OSM wiki w/o manipulating OSM data
      • this can be seen as: the OSM data should be mapped according to this
  • column 3, 2 times

    • for the route_master and for the routes
    • this is the feed, route_id and trip_id from the OSM relation
      • they reflect what the mappers have mapped for the relations

In theory,

  • column 3 comparison is a “1:1” comparison and this will definitely be implemented in the near future for OSM PTv2 routes

  • top, header line, … comparison is an “m:n” comparison - with the performance issue

    • comparing ‘m’ GTFS trips with ‘n’ OSM routes
      • for a GTFS trip, finding the best matching OSM route
      • for most entries, this is not a big deal, but
        • for many trains, like “S 1” in Munich, this is a “284:4” comparison
        • for some buses in rural areas, like 708, it can be an even more challenging “58:44” comparison
      • and sadly: such examples are the most interesting ones for such an implementation
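The m:n cost can be illustrated with a generic best-match loop: for each GTFS trip, every OSM route is scored and the lowest score wins. The `score` callable stands in for a stop-list comparison and is an assumption, as is the helper name; the point is the number of calls.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def best_matches(gtfs_trips: Sequence[T], osm_routes: Sequence[U],
                 score: Callable[[T, U], float]) -> List[Tuple[int, int, float]]:
    """For each GTFS trip, find the OSM route with the lowest mismatch score.

    Returns (trip index, route index, score) triples. score() is called
    len(gtfs_trips) * len(osm_routes) times: that is the m:n cost.
    """
    if not osm_routes:
        return []
    result = []
    for i, trip in enumerate(gtfs_trips):
        scores = [score(trip, route) for route in osm_routes]
        best = min(range(len(scores)), key=scores.__getitem__)  # first minimum wins
        result.append((i, best, scores[best]))
    return result
```

For “S 1” that means 284 × 4 = 1136 calls of `score()`, and for bus 708 it means 58 × 44 = 2552, each of which compares whole stop lists, compared with a single call for a column 3 “1:1” comparison.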

Let’s discuss this further next Tuesday, 2024-04-09, during the Pub Meeting in Munich.