I’m currently working on a comparison of two GTFS feeds with respect to the consequences for OSM.
I was thinking about what could be good indicators of a change in a route:
- number of route variants has changed
  - but even if the same: variants could use other stops
- total number of stops has changed
  - but even if same: stops could have been replaced by other stops
- number of different stop_id has changed
  - e.g. stop has same stop_name but other platform (stop_id)
  - also: stop_id has been fixed
  - but even if the same: sequence of stops could have changed
- number of different stop_name has changed
  - also: stop_name has been fixed (abbreviation expanded, …)
  - but even if the same: sequence of stops could have changed
- sequences of stops have changed - using md5 over the data
  - using stop_id
  - using stop_name
  - using stop_lat / stop_lon
- start and end date changed
I did not consider route_id and trip_id in the analysis: they might be stable in one GTFS feed, but in other feeds they might change from version to version.
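The indicators listed above could be computed roughly as follows. This is only a minimal sketch, not PTNA's actual code: the `(stop_id, stop_name, stop_lat, stop_lon)` tuple layout and the function names `sequence_md5`, `variant_indicators` and `changed_indicators` are assumptions made for illustration.

```python
import hashlib


def sequence_md5(values):
    """MD5 hex digest over an ordered sequence of strings."""
    # The unit separator keeps ["ab", "c"] distinct from ["a", "bc"].
    joined = "\x1f".join(values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def variant_indicators(stops):
    """Indicators for one route variant.

    `stops` is the ordered stop list of the variant; each entry is a
    (stop_id, stop_name, stop_lat, stop_lon) tuple of strings
    (hypothetical layout, chosen for this sketch).
    """
    return {
        "stops": len(stops),
        "distinct_stop_ids": len({s[0] for s in stops}),
        "distinct_stop_names": len({s[1] for s in stops}),
        "md5_stop_id": sequence_md5([s[0] for s in stops]),
        "md5_stop_name": sequence_md5([s[1] for s in stops]),
        "md5_lat_lon": sequence_md5([f"{s[2]},{s[3]}" for s in stops]),
    }


def changed_indicators(old, new):
    """Names of the indicators whose values differ between two feeds."""
    return sorted(key for key in old if old[key] != new[key])


old_variant = [("1", "A", "48.1", "11.5"), ("2", "B", "48.2", "11.6")]
new_variant = list(reversed(old_variant))  # same stops, other sequence
print(changed_indicators(variant_indicators(old_variant),
                         variant_indicators(new_variant)))
```

In the example the counts stay the same and only the three sequence digests differ, which is the "but even if the same: sequence of stops could have changed" case.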
The first code produces an overview of all routes in two feeds: a single web page with no further actions.
You may select two routes and click on “Compare selected Routes”, but that’s not implemented yet.
This looks neat, thank you for working on it. One minimal improvement is to add delimiters in the date fields, so rather than 20230814 it would be something like 2023 08 14 (or with dashes, to match ISO).
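As a small sketch of the suggested date formatting (the helper name `format_gtfs_date` is made up here), a GTFS date in `YYYYMMDD` form can be parsed and written back in ISO form:

```python
from datetime import datetime


def format_gtfs_date(raw):
    """Turn a GTFS date such as '20230814' into ISO 8601 '2023-08-14'."""
    # strptime raises ValueError for strings that are not valid dates.
    return datetime.strptime(raw, "%Y%m%d").date().isoformat()


print(format_gtfs_date("20230814"))  # 2023-08-14
```

Parsing instead of slicing the string has the side effect that obviously broken dates are rejected rather than displayed.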
Also maybe auto-collapse columns where none of the values changed?
Another indicator could be route length. You could spot changes after new roads have been built.
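A route length indicator could be sketched like this, assuming the route is available as an ordered list of `(lat, lon)` points in decimal degrees, e.g. from `shapes.txt` or, as a coarser fallback, from the stop coordinates. The function names are hypothetical and the haversine formula on a spherical Earth is only an approximation.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def route_length_m(points):
    """Sum of the segment lengths along an ordered list of (lat, lon)."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


def length_changed(old_points, new_points, tolerance_m=50.0):
    """True if the two lengths differ by more than the tolerance."""
    return abs(route_length_m(old_points) - route_length_m(new_points)) > tolerance_m
```

The tolerance (50 m here, an arbitrary value) is there because small coordinate corrections between feed versions would otherwise flag almost every route.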
I see this possible use case:
It will help you maintain already mapped routes, because you see where to update OSM data and do not have to go through all routes and check them manually.
If this gets extended to show the differences between routes within a single GTFS feed (different variants), I would use it a lot. What I imagine is:
Render both routes (same style as now, with stop numbers you can click on) on top of each other in different colors. Color the stop numbers differently: Same stop name and same number = green, same stop name but other number = yellow, stop name only in one route = color_of_the_route.
This could be extended to compare even more than 2 routes, as long as you have enough different colors and enough space for the stop number popups.
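The colouring rule described above (same name and same number = green, same name but other number = yellow, name only in one route = colour of that route) could be sketched as below. The function names are invented for this example, and "number" is taken to mean the 1-based position of the stop in the route, which is an assumption.

```python
def positions_by_name(stop_names):
    """Map each stop name to the set of 1-based positions where it occurs."""
    positions = {}
    for number, name in enumerate(stop_names, start=1):
        positions.setdefault(name, set()).add(number)
    return positions


def color_stops(own, other, own_color):
    """Colour for every stop of `own` when compared with `other`.

    `own` and `other` are ordered lists of stop names.
    """
    other_positions = positions_by_name(other)
    colors = []
    for number, name in enumerate(own, start=1):
        if number in other_positions.get(name, ()):
            colors.append("green")    # same name, same number
        elif name in other_positions:
            colors.append("yellow")   # same name, other number
        else:
            colors.append(own_color)  # name only in this route
    return colors


route_a = ["A", "B", "C", "D"]
route_b = ["A", "C", "B", "E"]
print(color_stops(route_a, route_b, "blue"))  # ['green', 'yellow', 'yellow', 'blue']
print(color_stops(route_b, route_a, "red"))   # ['green', 'yellow', 'yellow', 'red']
```

Keeping a set of positions per name means a stop that is served twice (e.g. in a loop) is still handled.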
It would be nice to compare a GTFS route with an OSM route too.
Tram 1 has 6 route_id versions and their stop sequence indicators are all the same - so maybe changes only in timetable?
Tram 2 has 7 route_id versions, most of the indicators differ - would be interesting to compare their start and end dates. Short term changes caused by construction?
…
Simply put: GTFS routes correspond to OSM route_masters, GTFS trips correspond to OSM routes - rendering two GTFS routes in one map could fill the whole map with icons though.
Yep, this is on the to-do list. E.g. in the PTNA analysis, take the GTFS info from the CSV list (feed, route_id) and compare it with the route_master.
The date formatting looks good. The hiding has a bug where the last row isn’t included (it always shows up). Hiding rows is a good feature, but the one I was actually suggesting was hiding columns where no row has changes. That would reduce the horizontal width of the table, which would be nice.
True, but intentional: the “change” is that the Franklin Line Shuttle disappeared in the new GTFS feed.
That’s quite hard to implement. I would even have to mark those columns in the head of the table before actually starting the analysis. Alternatively, the “hiding” could be coded completely in JavaScript: hide a column if it contains no cell colored in orange.
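Whichever side does the hiding, the decision itself is small. The sketch below shows the logic only, in Python rather than in the page's JavaScript, and assumes the table is available as rows of booleans where `True` means "this cell would be colored orange"; the function name `columns_to_hide` is hypothetical.

```python
def columns_to_hide(changed_flags):
    """Indices of columns in which no row has a changed cell.

    `changed_flags` is a list of rows; each row is a list of booleans
    (True = cell is highlighted as changed). Rows may be ragged.
    """
    if not changed_flags:
        return []
    width = max(len(row) for row in changed_flags)
    return [
        col
        for col in range(width)
        if not any(col < len(row) and row[col] for row in changed_flags)
    ]


flags = [
    [False, True, False],
    [False, False, False],
]
print(columns_to_hide(flags))  # [0, 2]
```

Because the flags are collected first and the columns are chosen afterwards, the table head does not need to be marked before the analysis starts.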
For me too, it is a good start. Of course, such a table may not be the ultimate tool for seeing where the differences from the previous GTFS feed are, but it can help a lot because detailed information is shown. In my humble opinion, the sequence information (stop_id, stop_name and lat/lon) doesn’t tell me much, as the MD5 is not helpful (for me, a text such as ‘identical’ or ‘different’ would do the job and save some space in this table).
Currently, when I’m using PTNA (the part with OSM relations, not the GTFS part), I look at the colour in the ‘last modification’ column. If it is orange, there are some differences, and I will have a look and try to correct the OSM relations if possible. I think such a table would be helpful in the GTFS part of PTNA: it could display an orange column somewhere to show whether there are any differences, and not only in the validation date. It would give the GTFS part of PTNA the same kind of sensitivity to changes as the OSM part. So, good work @ToniE.
I’m quite sure there will be extensions to this functionality. As discussed before, a graphical comparison will probably be useful, and the same kind of functionality already exists in the OSM world. Here is an example of the comparison of a hiking trail using the Knooppuntnet site: it compares the OSM hiking relation with a trace and uses a dedicated colour scheme to show the differences in shape. OK, not all GTFS feeds have shapes, but it could be a useful way to see what was discussed above about new roads (in case there are new roads). For the comparison of stops and/or platforms, there are already some proposals.
‘identical’ and ‘different’ would be OK if you focus on the same row in the table only.
GTFS feeds based on MentzDV (those with route_ids like 19-210-s23-1,…) may be handled differently.
Often they include versioned route_ids for the same bus (19-210-s23-1,19-210-s23-2,…). In this case you might want to know whether two versions of the same bus route differ - a different MD5 is a good and short indicator (32 characters long, instead of presenting all stop names).
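A rough sketch of that check follows. It assumes, based only on the example ids above, that the version is the trailing `-<number>` part of the route_id; that is a guess about the id layout, not a documented rule, and the helper names are made up. A feed whose unversioned ids happen to end in `-<number>` would be grouped wrongly.

```python
from collections import defaultdict


def split_version(route_id):
    """Split '19-210-s23-1' into ('19-210-s23', 1).

    Ids without a trailing '-<digits>' part are returned unchanged
    with version None.
    """
    base, sep, version = route_id.rpartition("-")
    if sep and version.isdigit():
        return base, int(version)
    return route_id, None


def versions_differ(md5_by_route_id):
    """For each base id: True if its versions have different digests."""
    digests_by_base = defaultdict(set)
    for route_id, digest in md5_by_route_id.items():
        base, _version = split_version(route_id)
        digests_by_base[base].add(digest)
    return {base: len(digests) > 1 for base, digests in digests_by_base.items()}
```

The digests passed in would be the stop sequence MD5 values from the overview table, one per route_id.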
Sure, the page under discussion only provides an overview of what has changed. For the details, you have to select two routes and click on “Compare selected Routes” - there is currently no code behind that.
That’s a good approach, and it should be applied only to single OSM routes (== GTFS trips), so as not to fill the map with too many icons, …
Well, it is shorter than the list, but for a human reading/memorising it, it is still pretty (and unnecessarily) long. You could cut the MD5 after the first few (5?) characters with … to expand on click/hover to save space.
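One way this could look, as a sketch only: the visible text is cut to the first characters, and the full digest goes into the HTML `title` attribute so that it appears on hover. The helper names `shorten_digest` and `digest_cell` are invented for this example; expanding on click would need extra script on the page and is not covered.

```python
import html


def shorten_digest(digest, keep=5):
    """First `keep` characters of the digest, followed by an ellipsis."""
    if len(digest) <= keep:
        return digest
    return digest[:keep] + "…"


def digest_cell(digest, keep=5):
    """HTML snippet showing the short form, with the full digest on hover."""
    full = html.escape(digest)
    short = html.escape(shorten_digest(digest, keep))
    return f'<span title="{full}">{short}</span>'


print(digest_cell("d41d8cd98f00b204e9800998ecf8427e"))
# <span title="d41d8cd98f00b204e9800998ecf8427e">d41d8…</span>
```

The short form is for display only: two different digests can share the same first five characters, so the comparison itself should still use the full value.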