Import: Bus stops from GTFS

I understand. FOSSGIS 2020 conference page 109: not exhaustive, in German, from 2020. I tried DeepL Translator with copy and paste of (parts of) the PDF contents, and the translation is OK. At least it gives an overview of why we started developing PTNA and where we came from.

PTNA evolved in the meantime and GTFS analysis is now part of PTNA. Comparing a single GTFS trip with an OSM route is possible as well as a comparison between a GTFS route and an OSM route_master.

I started a thread here in the OSM Community: PTNA news some time ago (with 153 contributions) including a discussion on how to compare GTFS with OSM.

Exhaustive documentation is still missing, though. There are so many ideas crying out for implementation.

I did some online intros in the past with up to 20 participants. Benefit: with little preparation on my side, I can react to questions, dig into details here and there, … I could arrange such a session after September 16 and announce it in the “PTNA news: …” thread.

My next plans re. intro:

  • give one or two talks at the next FOSSGIS conference, March 2025 in Münster, NRW, Germany (for DE, AT, CH).
  • depending on the location of the next “State of the Map Europe” and/or “State of the Map”, repeat those talks, provide a workshop … (I don’t want to travel by aircraft though or do that online)

Back on the topic of the import:

I’ve realized that every time I run the import script with a bounding box, it risks deleting stops or creating duplicate stops near the edges. Therefore I want the next run of the import to finally cover the whole country. But that is a bit risky, as I’m still making changes to the code and might introduce a bug that would damage the map. Some of these changes cannot be tested on the dev instance because they rely on Overpass.

I might try to generate an OsmChange file instead of uploading the change directly from the script. Does anyone know of a way to visualize what an OsmChange would do before it’s applied? That would allow me to try to catch bugs before actually affecting OSM data. I’m hoping for something like OSMCha’s visualization.

Currently running the import on the entire country. Since no one objected and I am now confident in my code, I decided to go for it.

The first changeset is here: Changeset: 156526115 | OpenStreetMap (still ongoing, my code uploads everything one-by-one)

There will be multiple changesets because of the 10,000 element limit. You will be able to see them all here: Changesets by gtfs2osm-il | OpenStreetMap

Changes are being uploaded from north to south :slight_smile:
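
For readers wondering how such a split could work, here is a minimal sketch. It is not the actual gtfs2osm code: the function name `split_into_changesets` and the assumption that each pending change is a dict with a `lat` key are made up for illustration.

```python
from typing import Iterator

# Per-changeset element limit mentioned above.
MAX_ELEMENTS_PER_CHANGESET = 10_000


def split_into_changesets(
    changes: list[dict], limit: int = MAX_ELEMENTS_PER_CHANGESET
) -> Iterator[list[dict]]:
    """Yield batches of at most `limit` changes, ordered north to south."""
    # Higher latitude is further north, so sort descending by latitude.
    ordered = sorted(changes, key=lambda change: change["lat"], reverse=True)
    for start in range(0, len(ordered), limit):
        yield ordered[start:start + limit]
```

With 25,000 pending changes this would yield three batches (10,000, 10,000 and 5,000 elements), which matches the north/center/south pattern of three changesets.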

EDIT: The upload is finished! These are the changesets:

North: Changeset: 156526115 | OpenStreetMap | OSMCha
Center: Changeset: 156529002 | OpenStreetMap | OSMCha
South: Changeset: 156531963 | OpenStreetMap | OSMCha

It took so long, I really regret not doing it with OsmChange uploads. However, I was not sure that it would work with OsmChange because of potential rate caps.

Follow-up on GTFS and PTNA can be found at OSM Community: PTNA: news for Public Transport Network Analysis

Thank you! That was quicker than I expected :slight_smile:

I ran the import/update script a few more times since then, and I will try to do it regularly. I’ve also been fixing things (extracting bus stop nodes from ways) based on its output.

The most recent changeset includes some large position changes, because the code now allows the script to update nodes that are “too far” from their previous position, but only if that previous position was set by an earlier import. In other words, the bot will only refuse to update stops if there was human intervention.

There are still 27 stops where human intervention happened - those will need to be examined manually.

Some more manual clean-up, and ran the script again:

Manual clean-up:
Changeset: 156722530 | OpenStreetMap
Changeset: 156724803 | OpenStreetMap
Changeset: 156725172 | OpenStreetMap
Changeset: 156725439 | OpenStreetMap

Script:
Changeset: 156725845 | OpenStreetMap | achavi (achavi loads way faster than OSMCha for me)

This unfortunately includes bad data from MOT: name:en for Node: ‪בית ספר גולדה מאיר/ברקת‬ (‪1802995006‬) | OpenStreetMap is the Hebrew name. I need to email them about this but I’m in a hurry so it will have to wait at least a few hours.

I’ve now sent that email, but to be brutally honest I don’t think they’ll fix it. I probably need to find a better contact to send these issues to. If anyone knows a good address, let me know.

I went over and fixed all but one of the stops which gave me a “too far” error. I also ran the script again of course, the result: Changeset: 156783885 | OpenStreetMap | achavi - Augmented OSM Change Viewer  [attic]

The remaining troublesome stop is one where the GTFS position is definitely wrong, and we should contact the MOT to have it fixed. See the aerial view on govmap:

I will update its tags manually to the import’s schema. Hopefully the GTFS position will be fixed eventually.

Sorry if this was already discussed, but do you think there is a way to merge imported stops that have a bad position with existing stops whose position was corrected manually by mappers? Asking GTFS to correct the location is nice, but currently we lose all correct position contributions.

Another question: what is the object:city tag?

I’ve created a wiki page for gtfs2osm: Gtfs2osm - OpenStreetMap Wiki. It still requires a lot of work but it has the basic info.

Hi, sorry I missed this question. My bad!
Position contributions by mappers are respected and preserved. gtfs2osm only updates position when it gets updated in GTFS. If you move a bus stop, it should stay where you put it. The only exception is if the new position is less than 3m away from the GTFS position, in which case the script assumes you moved it by mistake and tried to put it back where you found it.
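
To make the rule concrete, here is a rough sketch of the decision as described above. It is an illustration only, not the gtfs2osm implementation: `resolve_position`, `distance_m` and the `(lat, lon)` tuple layout are assumptions, and the separate “too far” check is left out.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, good enough for short distances
SNAP_DISTANCE_M = 3.0       # the 3 m threshold described above


def distance_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle (haversine) distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def resolve_position(osm_pos, last_imported_pos, new_gtfs_pos):
    """Return the position the node should have after this run."""
    # GTFS moved the stop since the last import: follow GTFS.
    if new_gtfs_pos != last_imported_pos:
        return new_gtfs_pos
    # GTFS is unchanged and the node is within 3 m: treat it as an
    # accidental drag and snap back to the exact GTFS position.
    if distance_m(osm_pos, new_gtfs_pos) < SNAP_DISTANCE_M:
        return new_gtfs_pos
    # Otherwise a mapper moved it on purpose: keep the mapper's position.
    return osm_pos
```

For example, a node dragged about 1 m from an unchanged GTFS position would be snapped back, while one moved about 100 m would stay where the mapper put it.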

See Key:object:* - OpenStreetMap Wiki. Essentially, it’s addr:city for things that don’t really have an address. The old imports used addr tags, but this was incorrect usage of the tag.

I updated the wiki page for gtfs2osm, I hope it’s clear and easy to read now: Gtfs2osm - OpenStreetMap Wiki

In general, imports have been going pretty well! I try to run the script every day, but I often skip weekends and sometimes forget. I’m still a bit wary of setting it up to run automatically, there would have to be more failsafes for that to be a good idea.

Yesterday I went and manually fixed a whole bunch of minor issues. Most notably, I detached bus stop nodes from any ways.

I just added a couple of tweaks to the imported data: replace non-breaking space with normal space, and merge repeating spaces into a single space.

Changesets showing the changes:
Merge repeating spaces in GTFS data (achavi)
Replace non-breaking spaces from GTFS data with normal spaces (achavi)
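
Roughly, the two tweaks amount to something like the sketch below. The function name `normalize_spaces` is hypothetical, and the real code may handle other whitespace characters differently.

```python
import re


def normalize_spaces(value: str) -> str:
    """Replace non-breaking spaces and collapse runs of spaces."""
    # U+00A0 is the non-breaking space.
    value = value.replace("\u00a0", " ")
    # Merge two or more consecutive spaces into one.
    return re.sub(r" {2,}", " ", value)
```

The non-breaking spaces are replaced first so that a mixed run of normal and non-breaking spaces also collapses to a single space.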

I also added a FAQ to the wiki page :slight_smile:

I just made a script to create a geojson file highlighting stops that mappers have moved from their GTFS position. See it here:

And the results:

If you don’t want to run the script yourself you can just download moved_stops.geojson as a raw file and open it with JOSM or whatever other software. Looks like this (used Ctrl+A to Select All and highlight everything):

Zooming in to a random spot in Tel Aviv for example:

So this means these two stops were moved by mappers. We can use this to try and spot mistakes. For example, I am familiar with this stop in Haifa and I’m pretty sure this was wrongly moved:

If you use this to edit, you should move the stop back to where the line starts. If it’s less than 3 meters away from the line start, gtfs2osm will automatically snap it back to its exact original location.
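
For anyone who wants to build a similar file from their own data, here is a minimal sketch of the idea, not the actual script. It assumes each stop is a dict with hypothetical keys `ref`, `gtfs` and `osm`, where the positions are `(lat, lon)` tuples, and it draws one line per moved stop starting at the GTFS position.

```python
import json


def moved_stops_geojson(stops: list[dict]) -> str:
    """Build a GeoJSON FeatureCollection with one line per moved stop."""
    features = []
    for stop in stops:
        gtfs_lat, gtfs_lon = stop["gtfs"]
        osm_lat, osm_lon = stop["osm"]
        if (gtfs_lat, gtfs_lon) == (osm_lat, osm_lon):
            continue  # stop was not moved, nothing to highlight
        features.append({
            "type": "Feature",
            "properties": {"ref": stop["ref"]},
            "geometry": {
                "type": "LineString",
                # GeoJSON uses [longitude, latitude] order.
                # The line starts at the GTFS position and ends at the OSM one.
                "coordinates": [[gtfs_lon, gtfs_lat], [osm_lon, osm_lat]],
            },
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```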

I now also detect bus stops which do not serve any routes. These might be old leftover data, and should be surveyed to check whether they still exist:

Hi. I’m the creator of the older script.

The philosophy behind allowing OSM users to change the bus stops, was the basic assumption that on average, an OSM user’s edit is a good edit (otherwise OSM wouldn’t exist!).

This would mean that the fused OSM-MOT dataset is of higher quality than the plain MOT (Ministry of Transportation) dataset.

In practice, I saw many cases of people moving the bus stop to the correct side of the street, or fixing a stop whose MOT location is off. I also saw many tiny, sub-meter edits. I assumed these were errors where people editing the environment of the stop drag it ever so slightly during their workflow. This is where the 3-meter rule came from.

A critical note here is that the old script took the last-modified aspect into account: if the MOT location update was more recent, it did override the OSM location. In other words, the newer update always won. This is precisely why the old script compares two different GTFS files (deriving the approximate update date of the location update).

A totally different philosophy would be to treat upstream as the source of truth. This is much easier to code (the old script calculated deltas of two different GTFS releases to determine which MOT updates are newer than which OSM updates). But it has two significant drawbacks:

  1. Requiring OSM users to contact MOT is a barrier to entry, and most likely many good changes would simply be lost.
  2. Some good OSM edits will be overridden. Even if they’re only a few, this is bound to annoy OSM users, especially those who made the effort to survey physically. And it also contradicts the ground truth principle. This could create hostility towards the script.

If the MOT dataset is extremely high quality nowadays, a new philosophy could be justified. But if the script aggressively reverts good bus stop location changes, even only rarely, this could be a bad idea.

Additionally, if whenever an OSM user touches a stop location, it never again gets updated by MOT, this would surely create stale stops and miss perfectly useful MOT updates. So it’s wise to apply the most recent location update, whether it came from MOT or OSM.

I will thoroughly review the new code when I have the chance. But I wonder, how do you deduce the last update date? Is it encoded in GTFS nowadays? Do you rely on the script running very frequently and assume any detected MOT change is the most recent? Is there a stateful DB?

Edit: The most-recent-wins reasoning applies to any soft-updated tag, not just location.

I’m very happy to be scrutinized like this. In reality, I preserved nearly 100% of your original design goals, and I will happily award you credit for deciding them in the first place. To answer your concerns:

This is still the case. In fact, even bus stops that were moved by mappers before I made gtfs2osm, keep their human-mapped position unless there was a GTFS update. I even made a script to show which stops were moved compared to GTFS, and it’s the very latest comment before yours: Import: Bus stops from GTFS - #54 by NeatNit

This is one of the things I like the most about gtfs2osm. I just look at the edit history of the node. If a changeset has source=mot.gov.il, then it’s a GTFS import. I iterate through the changeset history and keep track of the last value placed in each tag by a GTFS import. And then at the end I simply compare the value from the new GTFS data with the last-imported value.

The code for this is here: gtfs2osm-il/gtfs2osmHistory.py at main - NeatNit/gtfs2osm-il - Codeberg.org
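
As a simplified illustration of the idea (not the linked code), the sketch below walks a node's versions from oldest to newest. It assumes each version is a dict with hypothetical keys `changeset_tags` and `tags`, and it ignores tag deletions for brevity.

```python
IMPORT_SOURCE = "mot.gov.il"  # changeset source tag that marks a GTFS import


def last_imported_tags(history: list[dict]) -> dict[str, str]:
    """Return, per tag, the last value that a GTFS import placed on the node."""
    last_imported: dict[str, str] = {}
    previous_tags: dict[str, str] = {}
    for version in history:  # oldest version first
        tags = version["tags"]
        if version["changeset_tags"].get("source") == IMPORT_SOURCE:
            # Only record tags this version actually changed, so mapper
            # edits carried along unchanged are not credited to the import.
            for key, value in tags.items():
                if previous_tags.get(key) != value:
                    last_imported[key] = value
        previous_tags = tags
    return last_imported


def tags_to_update(new_gtfs_tags: dict[str, str],
                   last_imported: dict[str, str]) -> dict[str, str]:
    """Tags whose GTFS value changed since the last import wrote them."""
    return {key: value for key, value in new_gtfs_tags.items()
            if last_imported.get(key) != value}
```

If a mapper renamed a stop and GTFS still has the old name, `tags_to_update` returns nothing for `name`, so the mapper's value survives. Once GTFS changes the name, the new value is applied.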

There are only two caveats for this approach:

  1. gtfs2osm can’t determine which bus stops in GTFS are brand new.
  2. gtfs2osm can’t detect when a bus stop has been removed from OSM - Overpass can’t give you data about deleted elements (unless you get very creative which I decided against).

My solution to this is to assume that the routes data is better than the stops data - if a bus stop is missing from OSM and is part of any bus route, I add it to the map. If it’s missing from OSM and is not part of any bus route, I don’t add it. The downside is that new bus stops are only imported when they start actually serving a route - unless, of course, a mapper adds it with ref.
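
In set terms, the rule could be sketched like this. The function name and the idea of passing plain collections of stop `ref` values are assumptions for illustration, not the real gtfs2osm interface.

```python
from typing import Iterable


def stops_to_add(gtfs_refs: Iterable[str],
                 osm_refs: Iterable[str],
                 routed_refs: Iterable[str]) -> set[str]:
    """Refs of stops to create: in GTFS, missing from OSM, served by a route."""
    missing_from_osm = set(gtfs_refs) - set(osm_refs)
    # Only trust a missing stop if at least one route actually uses it.
    return missing_from_osm & set(routed_refs)
```

A stop that is in GTFS but serves no route is left alone, whether it is brand new or was deliberately deleted from OSM.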

I know this approach isn’t perfect. Just today I found 4 bus stops in the Technion campus that are still part of a route in GTFS yet don’t exist in reality. But, I sent an email to the GTFS support, and I fully expect them to delete them within a week.

As I said - I welcome the scrutiny. Let me know if you find flaws in this approach. So far no one has complained.

Edit: come to think of it, a compromise is possible, where I keep some old version of the data to compare against to detect deleted OSM stops. I will consider implementing that, for exactly cases like the Technion stops that don’t actually exist. But I would have to very carefully decide what the rules are, because the script being stateless is one of my core design goals. The only “state” it keeps between runs is a cache of the node history, so that it doesn’t need to hammer the OSM API with thousands of changeset queries every day.

The greatest design flaw in the old design was the statefulness. I’m very happy you took action and improved it. Thank you for the great work :smiley:

You might want to look at the history of node 5210712683. It appears a user copied name:ar to name, and afterwards the script removed name:ar and name:en. It left the node with an Arabic name which does not appear in name:ar.

This is working as intended. Sometimes when they update a stop name they don’t add Arabic/English translations, leaving it with only Hebrew. In those cases my code removes name:en and name:ar. However, if mappers previously changed the name to Arabic, I don’t want to revert it back to Hebrew, so I just preserve the previous name.

Every now and then I email the GTFS support about stops that are missing translations, they usually fix it within a few days. I really wish they’d improve their internal QA to avoid the issue in the first place, but what can you do?