PTNA: comparing two GTFS feeds

ToniE · September 5, 2023, 10:40am

I’m currently working on a comparison of two GTFS feeds with respect to the consequences for OSM.

I was thinking about what could be good indications for a change in a route:

- number of route variants has changed
  - but even if the same: variants could use other stops
- total number of stops has changed
  - but even if the same: stops could have been replaced by other stops
- number of different stop_id has changed
  - e.g. stop has same stop_name but other platform (stop_id)
  - also: stop_id has been fixed
  - but even if the same: sequence of stops could have changed
- number of different stop_name has changed
  - also: stop_name has been fixed (abbreviation expanded, …)
  - but even if the same: sequence of stops could have changed
- sequences of stops have changed - using md5 over the data
  - using stop_id
  - using stop_name
  - using stop_lat / stop_lon
- start and end date changed
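
The indicators listed above could be computed roughly as follows. This is only a minimal sketch, not PTNA's actual code: the `Stop` tuple, the function names and the dictionary keys are made up for illustration, "total number of stops" is assumed to be the sum over all variants, and start and end dates are left out.

```python
import hashlib
from collections import namedtuple

# Hypothetical in-memory record for one stop of a trip (not a PTNA structure).
Stop = namedtuple("Stop", "stop_id stop_name stop_lat stop_lon")

def sequence_md5(stops, fields):
    """MD5 over the chosen fields of an ordered list of stops."""
    joined = "|".join(",".join(str(getattr(s, f)) for f in fields) for s in stops)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def route_indicators(variants):
    """variants: list of ordered stop lists, one list per route variant."""
    all_stops = [s for v in variants for s in v]

    def digests(fields):
        # Sorted, so the order of the variants in the feed does not matter.
        return sorted(sequence_md5(v, fields) for v in variants)

    return {
        "variants": len(variants),
        "total_stops": len(all_stops),  # assumption: summed over all variants
        "distinct_stop_ids": len({s.stop_id for s in all_stops}),
        "distinct_stop_names": len({s.stop_name for s in all_stops}),
        "seq_md5_stop_id": digests(("stop_id",)),
        "seq_md5_stop_name": digests(("stop_name",)),
        "seq_md5_lat_lon": digests(("stop_lat", "stop_lon")),
    }

def changed_indicators(old, new):
    """Names of the indicators that differ between two feed versions."""
    return sorted(key for key in old if old[key] != new[key])
```

With this, a pure platform change (same stop_name and position, other stop_id) leaves all counts untouched and shows up only in `seq_md5_stop_id`, which is the "but even if the same" case from the list.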

I did not consider route_id and trip_id in the analysis: they may be stable from version to version in some GTFS feeds, but change with every version in others.

The first code produces an overview of all routes in two feeds - a single web page with no further interaction.
You may select two routes and click on “Compare selected Routes”, but that’s not implemented yet.

Examples:

I would be happy to receive comments, feedback, suggestions, ideas, …

@skyper @CjMalone @miche101 @Patchi @Nielkrokodil @JesseFTW @mcliquid @flo2154 @NorthCrab @mariotomo

JesseFTW · September 5, 2023, 11:42am

This looks neat, thank you for working on it. One minimal improvement is to add delimiters in the date fields, so rather than 20230814 it would be something like 2023 08 14 (or with dashes, to match ISO).
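
For illustration, the conversion from the GTFS date format (YYYYMMDD) to ISO could look like this in Python. It is just a sketch; the function name is made up and the page itself may well do this differently.

```python
from datetime import datetime

def gtfs_date_to_iso(value):
    """Convert a GTFS date string (YYYYMMDD) to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, "%Y%m%d").date().isoformat()

print(gtfs_date_to_iso("20230814"))  # 2023-08-14
```

Going through `strptime` rather than slicing the string also rejects invalid dates such as 20231340.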

Also maybe auto-collapse columns where none of the values changed?

miche101 · September 5, 2023, 4:30pm

Looks very complicated. And you can’t see which relation is affected, or which variant is actually missing now.

ToniE · September 5, 2023, 5:53pm

Format of dates is now ISO
hide / show unchanged routes

Nielkrokodil · September 5, 2023, 5:53pm

Good Work!

Another indicator could be route length. You could spot changes after new roads have been built.
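
A rough sketch of how such a length could be derived, assuming the feed has shapes.txt and the points of one shape are already sorted by shape_pt_sequence. The function names are made up, and the great-circle (haversine) distance on a spherical Earth is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def shape_length_m(points):
    """points: list of (shape_pt_lat, shape_pt_lon), ordered by shape_pt_sequence."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))
```

Comparing the rounded lengths of two feed versions would need some tolerance, since small shifts of shape points change the sum without any real change on the ground.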

I see this possible use case:
It will help you maintain already mapped routes, because you see where to update OSM Data and do not have to go through all routes and check manually.

If this gets extended to show differences between routes within one single GTFS feed (different variants), I would use it a lot. What I imagine is:
Render both routes (same style as now, with stop numbers you can click on) on top of each other in different colors. Color the stop numbers differently: Same stop name and same number = green, same stop name but other number = yellow, stop name only in one route = color_of_the_route.
This could be extended to compare even more than 2 routes, as long as you have enough different colors and enough space for the stop number popups.
It would be nice to compare a GTFS route with an OSM route too.
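
The coloring rule described above could be sketched like this. It is a simplified illustration: the function name and the default colors are made up, and each stop name is assumed to occur only once per route (no loops).

```python
def classify_stops(route_a, route_b, color_a="blue", color_b="red"):
    """route_a, route_b: ordered lists of stop names. Returns {stop name: color}."""
    pos_a = {name: i for i, name in enumerate(route_a, start=1)}
    pos_b = {name: i for i, name in enumerate(route_b, start=1)}
    colors = {}
    for name, number in pos_a.items():
        if name not in pos_b:
            colors[name] = color_a      # stop only in route A
        elif pos_b[name] == number:
            colors[name] = "green"      # same stop name, same number
        else:
            colors[name] = "yellow"     # same stop name, other number
    for name in pos_b:
        if name not in pos_a:
            colors[name] = color_b      # stop only in route B
    return colors
```

For example, `classify_stops(["A", "B", "C", "D"], ["A", "C", "B", "E"])` marks A green, B and C yellow, D in the color of the first route and E in the color of the second.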

ToniE · September 5, 2023, 5:54pm

For now, this is only an overview of what might have changed.

When two routes are compared directly, the details will be shown - once that is implemented.

ToniE · September 5, 2023, 7:32pm

Good point! Not all feeds include shape data. But for those which have, I could consider that.

This is possible, for instance for AT-VVT

Tram 1 has 6 route_id versions and their stop sequence indicators are all the same - so maybe changes only in timetable?
Tram 2 has 7 route_id versions, most of the indicators differ - would be interesting to compare their start and end dates. Short term changes caused by construction?
…

Simply put: GTFS routes correspond to OSM route_masters, GTFS trips correspond to OSM routes - rendering two GTFS routes in one map could fill the whole map with icons though.

Yep, this is on the to-do list. E.g. in the PTNA analysis, take the GTFS info from CSV list (feed, route_id) and compare it with the route_master.

ToniE · September 5, 2023, 7:41pm

Just a remark:

- the current solution is a proof of concept; the data is created on the fly on the server when the page is requested
  - this has some performance issues: CH-Alle needs 18 secs to be created
  - adding a comparison of shape data would increase that
- the data itself is static and will not change for a given feed
  - performing the analysis during import and aggregation would be the goal
  - it would be applied to new feeds only
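
Because the result is static per feed version, the idea boils down to "compute once, store, reuse". A minimal sketch of that pattern, with a hypothetical JSON cache file per feed and release; the file layout and all names here are assumptions, not how PTNA stores its data:

```python
import json
from pathlib import Path

def load_or_build(cache_dir, feed, release, build):
    """Return the stored analysis for (feed, release); call build() only if none exists."""
    path = Path(cache_dir) / f"{feed}-{release}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = build()  # the expensive analysis, e.g. run at import time
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

The page request then only reads the stored result, so the cost is paid once per new feed version instead of on every request.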

JesseFTW · September 6, 2023, 1:45am

The date formatting looks good. The hiding has a bug where the last row isn’t included (it always shows up). Hiding rows is a good feature, but the one I was actually suggesting was hiding columns where no row has changes. That would reduce the horizontal width of the table, which would be nice.

ToniE · September 6, 2023, 2:38am

True, but intentional: the “change” is that the Franklin Line Shuttle disappeared in the new GTFS feed.

That’s quite hard to implement. I would even have to mark those columns in the head of the table before actually starting the analysis. Alternatively code the “hiding” completely in JavaScript: hide specific columns if there is no cell colored in orange.
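
The selection logic itself is small once the whole table is known. Sketched here in Python rather than JavaScript, with a made-up function name; it assumes every row has the same number of cells and that "changed" corresponds to a cell being colored orange.

```python
def columns_to_hide(changed):
    """changed: list of rows, each a list of booleans (True = cell is colored orange).

    Returns the indexes of the columns in which no row has a change.
    """
    if not changed:
        return []
    width = len(changed[0])
    return [col for col in range(width) if not any(row[col] for row in changed)]
```

This needs all rows before the table head can be written, which is the ordering problem mentioned above; doing it in the browser after rendering avoids that.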

Patchi · September 7, 2023, 1:53pm

For me, too, it is a good start. Of course, such a tab may not be the ultimate tool for seeing where the differences from the previous GTFS feed are, but it can help a lot, as detailed information is shown. In my humble opinion, the sequence information (stop_id, stop_name and lat/lon) doesn’t tell me much, as the MD5 is not helpful (for me, a text such as identical or different would do the job and save some space in this tab).

Currently, when I’m using PTNA (the part with OSM relations, not the GTFS part), I look at the colour in the last modification column. If it is orange, then there are some differences, and I will have a look and try to correct the OSM relations if possible. I think such a tab would be helpful in the GTFS part of PTNA, with an orange column somewhere to show whether there are any differences, and not only in the validation date. It would give the GTFS part of PTNA the same kind of sensitivity to changes as the OSM part. Therefore, good work @ToniE.

I’m quite sure there will be some kind of extension to this functionality. As discussed before, the graphical comparison will probably be useful. And the same kind of functionality already exists in the OSM world. Here is an example of the comparison of a hiking trail using the Knooppuntnet site: it compares the OSM hiking relation with a trace, using a dedicated colour scheme to show the differences in the shape. OK, not all GTFS feeds have shapes, but it could be a useful functionality for what was discussed above about new roads (in case there are new roads). For the comparison of stops and/or platforms there are already some proposals.

ToniE · September 7, 2023, 2:13pm

‘identical’ and ‘different’ would be OK if you focus on the same row in the table only.

GTFS feeds based on MentzDV (those with route_ids like 19-210-s23-1,…) may be handled differently.
Often they include versioned route_ids for the same bus (19-210-s23-1,19-210-s23-2,…). In this case you might want to know whether two versions of the same bus route differ - different MD5 is a good and short indicator (having length of 32 characters instead of presenting all stop names).

Sure, the page under discussion provides an overview only on what has changed. For the details, you have to select two routes and click on “Compare selected Routes” - there is currently no code behind that.

That’s a good approach and should be applied to single OSM routes (== GTFS trips) only, so as not to fill the map with too many icons, …

Nielkrokodil · September 7, 2023, 4:14pm

Well, it is shorter than the list, but for a human reading/memorising it, it is still pretty (and unnecessarily) long. You could cut the MD5 after the first few (5?) characters with … to expand on click/hover to save space.
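
A sketch of what I mean, with a made-up function name. The full digest would still be kept for the actual comparison; only the display is shortened.

```python
def short_hash(digest, length=5):
    """Shorten a hex digest for display; the full value stays available elsewhere."""
    if len(digest) <= length:
        return digest
    return digest[:length] + "…"

print(short_hash("d41d8cd98f00b204e9800998ecf8427e"))  # d41d8…
```

Five hex characters are only 20 bits, so two different sequences can occasionally show the same prefix. That is fine for reading the table, but "same or different" should be decided on the full 32 characters.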

ToniE · September 7, 2023, 4:56pm

Yeah I was thinking about the same, that’s how Git and Docker and … do it.

ToniE · September 7, 2023, 5:43pm

Done that now

flo2154 · September 8, 2023, 7:04pm

Cool!

“versus” could be abbreviated with “vs.” or even “→” or “/” or “|”, so that the table doesn’t grow so large horizontally.