[RFC] Feature Proposal - GTFS Tagging standard

Peter_Elderson · December 2, 2023, 8:25am

Just brainstorming: Wouldn’t this work better, and completely generically, as an application with a GTFS database that can be queried through an API using specific IDs, names/refs, or simply the geolocation as an argument?
The total workload required would be different (programming and hosting) but less (no mapping at all). And any application could use it without altering its data structure.

spaanse · December 2, 2023, 10:51am

There already exists transit.land.
In my experience, the site is slow and gives incomplete information (it does not always show all route variants). I don’t know how performant their API is.
Furthermore, the onestop IDs - designed to unify duplicate stops across feeds - do not work.
See s-u1hpwr8cp5-arnhemcentraal and s-u1hpwpxstf-arnhemcentraal.
Or see r-u1h-stoptreinre19 and r-u1h-e19.
These should be the same, but are not.
Note: I cannot find a onestop ID for individual platforms. The Dutch feed actually refers to platform 6b for route RE19, the German feed to the whole station.

Another website is Transitfeeds, which is purely a repository of feeds and has a good interface to explore them. Unfortunately it has been discontinued and is no longer updated. The repository of feeds can be found at database.mobilitydata.org.

I think linking OSM and GTFS is useful from the GTFS side as well. OSM could be a solution to linking the same object across different feeds - it already has a single feature for each platform/station/… .
The positioning of stops in OSM is better; and routes contain the actual roads instead of a GPS trace (if a GTFS trip even has them).

skyper · December 2, 2023, 6:54pm

Maybe, we should go one step back and talk about what we try to solve and what the purpose of the feed relation should be.

If I have a stop or route and want to know the url of the gtfs feed tagged on the object, the url should also be on the object. Climbing up the ladder to find the url in the feed relation is probably more complicated than linking to some external source (wiki).
If I want to know the area covered by a gtfs feed a boundary relation could be the right choice.

I can understand that adjusting the url every month (or even weekly for Switzerland) is not appropriate, but that is a problem of GTFS in general. It is not solved by changing the url, whether on every object, on the feed relation or in the external source, as the OSM data needs to be checked and adjusted, too.

spaanse · December 2, 2023, 8:50pm

The goal is finding the URL from a platform, stop, station, route or route_master.
Users would not care for details about the feed (like area of operation), only the timetables in that feed.

On mapping area of operation to find the feed
With an is_in query we could find all surrounding areas of a stop, including the feed relation.
This does not work for the PT relations since they do not have coordinates.
Thus for those we need to climb down the relations to a stop and use their coordinates.
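
A rough sketch of that climb-down, assuming the OSM elements have already been loaded into plain dictionaries. The `elements` layout, the sample IDs and coordinates, and the `first_coordinate` helper are all hypothetical and not part of any OSM library; the resulting coordinate could then be fed into an is_in query.

```python
# Hypothetical in-memory layout: (type, id) -> element.
# Nodes carry lat/lon, ways carry node ids, relations carry (type, id, role) members.
elements = {
    ("relation", 1): {"members": [("relation", 2, "")]},           # route_master
    ("relation", 2): {"members": [("way", 10, ""),                 # route
                                  ("node", 100, "stop"),
                                  ("node", 101, "platform")]},
    ("way", 10): {"nodes": [200]},
    ("node", 100): {"lat": 51.98, "lon": 5.90},
    ("node", 101): {"lat": 51.99, "lon": 5.91},
    ("node", 200): {"lat": 52.00, "lon": 6.00},
}


def first_coordinate(key, elements, seen=None):
    """Return (lat, lon) of the first node reachable from `key`, or None."""
    seen = set() if seen is None else seen
    if key in seen or key not in elements:
        return None  # already visited (cyclic relations) or not loaded
    seen.add(key)
    element = elements[key]
    kind = key[0]
    if kind == "node":
        return (element["lat"], element["lon"])
    if kind == "way":
        children = [("node", ref) for ref in element["nodes"]]
    else:  # relation
        members = element["members"]
        # Try stop/platform members before the ways that make up the route.
        preferred = [m for m in members if m[2].startswith(("stop", "platform"))]
        others = [m for m in members if m not in preferred]
        children = [(m[0], m[1]) for m in preferred + others]
    for child in children:
        found = first_coordinate(child, elements, seen)
        if found is not None:
            return found
    return None


position = first_coordinate(("relation", 1), elements)
```

Even in this toy form the extra step is visible: a route_master needs two levels of recursion before any coordinate turns up.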

Another problem is that the area of operation does not cover all stops/routes of the feed.
It is normal for routes that cross the border to be included as well.
Extending the area with tentacles around those routes is stupid.
Should we then make the international stops a member of the relation?
What about platforms mapped as ways, how do we not confuse the MP algorithm with them?

Switzerland has a permalink for each yearly timetable:
2023: https://opentransportdata.swiss/de/dataset/timetable-2023-gtfs2020/permalink
2024: https://opentransportdata.swiss/de/dataset/timetable-2024-gtfs2020/permalink
Still it would be a hassle to ask permission for a mechanical edit every year.
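
As a small illustration, the two links above differ only in the year. The helper below is a sketch under the assumption that the same pattern holds for other years, which the Swiss portal does not promise; `swiss_permalink` is a made-up name.

```python
def swiss_permalink(year: int) -> str:
    """Build the yearly permalink, following the pattern of the 2023 and 2024 links.

    Assumption: other years use the same URL pattern.
    """
    return (
        "https://opentransportdata.swiss/de/dataset/"
        f"timetable-{year}-gtfs2020/permalink"
    )


url_2024 = swiss_permalink(2024)
```

If the URL is stored in one place only, the yearly change is a single edit instead of one per tagged object.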

Look at the following quote from the GTFS standards website:

Getting Started - General Transit Feed Specification (emphasis mine)
Datasets should be published at a public, permanent URL, including the zip file name.

I think it is helpful to separate the feed into two parts: permanent and temporary routes/stops.
Temporary routes and stops should not be mapped. Most weekly changes in the feed will concern these. Therefore these do not need to be checked or adjusted.
Permanent routes and stops will rarely change during the year.
These will probably only change significantly once a year - when the new timetables are rolled out.

I think we only need to check OSM data when the yearly timetable change occurs.
In that period a lot of changes will be made to the public transport objects.
During that period you don’t want a mechanical edit that changes URLs on all objects.
Such an edit would make it more difficult to revert earlier changesets that contain mistakes.
A single change on a feed relation or wiki is a lot easier and less intrusive.

spaanse · January 2, 2024, 1:54pm

I have updated the proposal.
The feed relation has now been removed, in favour of listing the feeds on a wiki page.
I think this makes the proposal far simpler and more likely to succeed.

Any feedback on this new version is welcome.

ToniE · January 2, 2024, 2:27pm

I just added a request “Section for Best Practice” - this section is not part of the “proposal” though?

I’ve seen so many different GTFS feeds and how they organize their data and how “useful” it can be (or not) for OSM.

Just to avoid adding gtfs:* tags here and there to routes and stops, where the data is no longer valid the next day, with the next update.

skyper · January 8, 2024, 3:39pm

Thanks for the update.

Could you please list all new tags without a wiki page so far which are included in the proposal and describe them with a few words, e.g. gtfs:location_type or gtfs:platform_code. Thanks a lot in advance.

I see that you use gtfs:route_long_name and similar. Do you propose to deprecate gtfs:name, gtfs:long_name and gtfs:short_name?

spaanse · January 9, 2024, 9:54am

For all of these: they correspond to GTFS columns for which precise documentation can be found on Reference - General Transit Feed Specification
I included the more important columns as a collapsed table in the background section.
The definition of these will be: “The exact value of the corresponding column in the GTFS feed”

gtfs:stop_code - “short text/number that identifies the location for riders”
Whether it is actually public-facing can differ. OVApi has the station code (railway:ref) for stations (public facing) but the last part of the IFOPT (ref:IFOPT) for bus stops (not public facing).
gtfs:stop_name - The stop name according to the feed. May differ from name in abbreviation, capitalisation, … .
gtfs:location_type - stops.txt lists all sorts of locations, location_type distinguishes between these. In theory the value can be deduced from the type of OSM object. I included it so that a data consumer does not have to and can just use this tag to distinguish between a bus stop and its platform in the GTFS feed.
gtfs:platform_code - The letter/number that identifies a platform of a bus or train station. Will likely match ref but can again differ in capitalisation.
gtfs:route_long_name - Full name of route often with destinations
gtfs:route_short_name - Short identifier for route - e.g. bus number
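
To make the “exact value of the corresponding column” rule concrete, here is a minimal sketch that copies the columns listed above from a stops.txt row into gtfs:* tags. The sample rows are made up, `gtfs_tags` is a hypothetical helper, and skipping empty columns is my own choice rather than something the proposal prescribes.

```python
import csv
import io

# Columns named in the list above.
STOP_COLUMNS = ["stop_code", "stop_name", "location_type", "platform_code"]
ROUTE_COLUMNS = ["route_long_name", "route_short_name"]

# Made-up sample data, not taken from a real feed.
SAMPLE_STOPS_TXT = """stop_id,stop_code,stop_name,location_type,platform_code
X1,1234,Example Centraal,0,6b
X2,,Example Centraal,1,
"""


def gtfs_tags(row, columns):
    """Copy non-empty column values verbatim into gtfs:<column> tags."""
    return {f"gtfs:{column}": row[column] for column in columns if row.get(column)}


rows = list(csv.DictReader(io.StringIO(SAMPLE_STOPS_TXT)))
platform_tags = gtfs_tags(rows[0], STOP_COLUMNS)
station_tags = gtfs_tags(rows[1], STOP_COLUMNS)
```

The values are kept as the strings found in the feed, so capitalisation and abbreviations stay exactly as the feed has them.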

There are others (like gtfs:wheelchair_boarding) that are also allowed under the rule that any column name can be tagged. I don’t think it is useful to include these in the list since they are not relevant for the main purpose of the proposal - specifying a way to find timetables from an OSM object.

skyper · January 18, 2024, 2:52pm

Well, I always find it confusing when a proposal mentions non-established keys which are not part of the proposal itself in the examples.
I am not sure if we want to import the complete GTFS data in OSM. Do we really need e.g. gtfs:location_type? The names of the fields in the GTFS specification can change and, even worse, many GTFS providers do not follow the specifications strictly enough. Therefore I would rather use common OSM tags with gtfs: as prefix, like gtfs:short_name or gtfs:long_name.

You did not tell me how to handle gtfs:name and I do not think that we need different keys for the name of the stops and the name of the routes.

spaanse · January 18, 2024, 7:35pm

I agree that it is not useful to import all columns into OSM - and if done they should mostly end up in regular tags. (like wheelchair_accessible, route color, …)
The aim of this proposal is to specify how to reference objects in a GTFS feed.

While making the examples I found out that they could be useful for identification as well. In the stops.txt table different sorts of objects are put in the same table. location_type may be the only way to distinguish platforms and the (bus) station (if platforms do not have a platform_code). Note that GTFS also uses the term ‘station’ for regular bus stops.
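
For reference, a small sketch of how a consumer could read location_type from a stops.txt row. The value meanings follow the GTFS reference (an empty value is treated as 0); the function names are hypothetical.

```python
# Meaning of location_type values in stops.txt, per the GTFS reference.
LOCATION_TYPES = {
    "": "stop or platform",   # empty is treated the same as 0
    "0": "stop or platform",
    "1": "station",
    "2": "entrance/exit",
    "3": "generic node",
    "4": "boarding area",
}


def describe_location(row):
    """Return a readable description of the row's location_type."""
    value = (row.get("location_type") or "").strip()
    return LOCATION_TYPES.get(value, "unknown")


def is_platform(row):
    """A type-0 location counts as a platform when it has a parent_station."""
    value = (row.get("location_type") or "").strip()
    return value in ("", "0") and bool(row.get("parent_station"))
```

With this, a platform without a platform_code can still be told apart from its station: the platform row has type 0 and a parent_station, the station row has type 1.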

I doubt that the field names in the GTFS specification will change, as that would require massive effort from thousands of transit agencies and apps that consume the data. Unless it would solve a big problem with the specification, it is unlikely to change.

As for providers that do not follow the specification strictly: by using the column names we can handle these cases.

The main goal is to eliminate guesswork for the data consumer. Currently gtfs:name has the meaning “the name of this stop according to a GTFS feed”. My proposal changes this to “the precise value of the name column in the GTFS feed (of the feed suffix)”. Having gtfs:name as an alias for gtfs:route_long_name, gtfs:route_short_name and gtfs:stop_name introduces more guesswork for the consumer.
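
A sketch of the difference from the consumer side. The mapping for gtfs:name reflects the alias reading described above; the helper names and sample row are hypothetical, and the feed suffix is left out for brevity.

```python
# Assumption: under the alias reading, gtfs:name may refer to any of these columns.
LEGACY_CANDIDATES = {
    "gtfs:name": ["stop_name", "route_long_name", "route_short_name"],
}


def candidate_columns(key):
    """Return the GTFS columns a consumer has to check for a given OSM key."""
    if key in LEGACY_CANDIDATES:
        return list(LEGACY_CANDIDATES[key])
    prefix = "gtfs:"
    if key.startswith(prefix):
        return [key[len(prefix):]]  # column-name keys map to exactly one column
    return []


def matching_rows(key, value, rows):
    """Return the feed rows in which any candidate column equals the tag value."""
    columns = candidate_columns(key)
    return [row for row in rows if any(row.get(c) == value for c in columns)]


# Made-up sample row.
sample_rows = [{"stop_id": "X1", "stop_name": "Example Centraal"}]
hits = matching_rows("gtfs:stop_name", "Example Centraal", sample_rows)
```

With gtfs:stop_name there is one column to compare against; with gtfs:name the consumer has to try three and hope only one of them matches.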