Israel GTFS release

How up-to-date is the data in the gtfs file? E.g., if a change is made to the Ministry’s database on Sunday, how long is it until the publicly-available gtfs file reflects the change? (Note that this is independent of how often the auto-update script is run)

How about some escape hatch, e.g., “The script never updates objects that have gtfs:auto-update=no”? (gtfs:auto-update would only be set manually, not by any script)

The GTFS file updates nightly. So, realistically, if the official database is changed on Sunday morning, the changes will be in the GTFS file by Monday.

I downloaded a gtfs file a couple of days ago and downloaded one today. It wasn’t the same. 2 stations were deleted and 11 were added. There are indeed frequent updates. Apparently nightly.

That could be done, but why? Note that the above strategy only touches stations when the ministry updates them. If you modify a station, the bot will not force-modify it back to fit the gtfs database. It’ll only modify it if the ministry actually updates the station.

The gtfs:id is needless overhead, because “ref” is unique enough. I’ve removed it as part of the cleanup. Also, I’ve changed source=israel_gtfs_v1 to israel_gtfs, because the plan is to make incremental updates, and v1 no longer makes sense.

Some people on the talk list seem to be interested in the permission. I’ve decided to translate the old comment by @yxejamir.



See also the terms&conditions from


So I think it’s okay to use this data in OpenStreetMap.

I’m not on the talk list, but I read the thread on the archives. To address the accuracy & verification concern:

Israel has something like 30,000 bus stops, and they change daily all across the country. There’s no way human mappers could ever verify the accuracy of all of them, unless you have someone working full-time on this. However, the data is considered extremely accurate, and inaccuracies are quite rare. We do have a system that announces the name of the next stop, which uses this data.

I think people from other countries don’t realize that this is not data from a single private operator, nor is it data for a single city - it’s government-generated data that controls the entire public transportation network in the country. If a bus stop is not in this dataset, it doesn’t exist. There will never be anything more accurate for bus stops in Israel than this dataset.

If accuracy is important to us, we must implement this importing script, otherwise the data on OSM will get stale quickly - just like the current data is stale and shows a lot of bus stops that have since been moved or canceled.

People who have read this entire forum thread probably know this already, but I’m posting this comment to help people coming from the talk list to understand the subject.

Thanks for the post!

So here’s my precise plan. I already have a working prototype, but there are some bugs that need resolving:

Column 1: Old GTFS dump
Column 2: New GTFS dump
Column 3: Openstreetmap

For each bus stop ref, find out in which columns it exists and in which
it doesn't. If multiple bus stops have the same ref in any column,
the script halts until I manually intervene.

Exception: platforms (ratzefeem) sometimes have
db ref duplication that we should merge into one.

X       : A single bus stop with that reference exists in that column
-       : No bus stop with that reference exists in that column
=>      : action to be taken

1 2 3
- - X  => Nothing. (or maybe delete?)
- X -  => Create.
- X X  => Update. (We would have created but an OSM mapper created it first)
X X -  => Create.
X - X  => Delete.
X - -  => Nothing. (We would have deleted but an OSM mapper deleted it first)
X X X  => Update.

Updating action: scans all tags and:
- if col1's tag value does not equal col2's tag value:
     sets the OSM bus stop's tag value to col2's tag value.
- tags that exist only on the OSM bus stop (not in either GTFS dump) are not touched (e.g. shelter, wheelchair).
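A minimal sketch of the table and the updating action above, in Python. This is not the actual script; the names `ACTIONS`, `decide` and `update_tags` are made up for illustration, and stops are assumed to be plain dicts of tags keyed by ref. The “- - X” row is left as "nothing" here, since deleting is still an open question at this point. Tags that exist in the old dump but vanish from the new one are not handled, because the plan doesn't say what to do with them.

```python
# Decision table keyed by (in old GTFS dump, in new GTFS dump, in OSM).
ACTIONS = {
    (False, False, True): "nothing",   # "- - X" (deleting is still under discussion)
    (False, True, False): "create",    # "- X -"
    (False, True, True): "update",     # "- X X"
    (True, True, False): "create",     # "X X -"
    (True, False, True): "delete",     # "X - X"
    (True, False, False): "nothing",   # "X - -"
    (True, True, True): "update",      # "X X X"
}


def decide(ref, old_refs, new_refs, osm_refs):
    """Return the action for one ref, given the set of refs in each column."""
    return ACTIONS[(ref in old_refs, ref in new_refs, ref in osm_refs)]


def update_tags(old_tags, new_tags, osm_tags):
    """Copy a value from the new dump only where it differs from the old dump.

    Tags the dumps agree on are left as they are in OSM, so mapper edits
    and OSM-only tags (shelter, wheelchair, ...) survive.
    """
    result = dict(osm_tags)
    for key, new_value in new_tags.items():
        if old_tags.get(key) != new_value:
            result[key] = new_value
    return result
```

For the “- X X” case there is no old dump entry, so passing an empty dict as `old_tags` makes every GTFS tag count as changed.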

Regarding the first row of the table (“- - X”), I think deleting is the right approach. If it has a GTFS ref, it means it used to be in the GTFS long ago, or someone inputted the incorrect GTFS ref when manually adding the stop. This means that it’s stale data. If someone added a bus stop that isn’t in the GTFS (those can’t legally exist for public buses, but can for private shuttles, such as shuttles from train stations to industrial zones) it won’t have a GTFS ref, so we shouldn’t touch it.

Alternatively, if we want to care about mappers accidentally adding private bus stops with a ref field (idk what they’ll write in such field in this case) and we want to preserve this data, we can add a condition that in the first table case (exists in OSM, not in the last two GTFS releases) only delete it if it has source=israel_gtfs, otherwise - keep it intact.

Another point worth considering: In Israel, each stop has a short identification number that is written on the stop sign. That is stop_code in the GTFS[1] and not the stop_id in the GTFS. We should make sure we use that, and not the stop_id field when we set the “ref” tag. stop_code is more useful for humans. Also, stop_id can theoretically change without the stop itself changing (I saw it happen in the past), but stop_code is pretty static.
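A rough sketch of what this could look like when reading stops.txt, keying stops by stop_code rather than stop_id. The column names are the standard GTFS ones; the function names `stops_by_code` and `ref_tags` are hypothetical, and whether the real script works this way is an assumption.

```python
import csv
import io


def stops_by_code(stops_txt):
    """Map stop_code -> row for the text content of a GTFS stops.txt.

    Rows with an empty stop_code are skipped, since they have no
    human-readable number to put in the OSM "ref" tag.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(stops_txt)):
        code = (row.get("stop_code") or "").strip()
        if code:
            result[code] = row
    return result


def ref_tags(row):
    """Build the OSM tags that come straight from one stops.txt row."""
    return {
        "ref": row["stop_code"].strip(),  # the number printed on the sign
        "name": row["stop_name"],
        "source": "israel_gtfs",
    }
```

stop_id is still available in each row for internal joins against other GTFS files, without ever being written to OSM.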

[1] see documentation:

Another point worth thinking about: At the moment, OSM only has the Hebrew names for the bus stops, but the GTFS file also contains translations of these names from Hebrew to English and Arabic (see translations.txt).

When importing stops, it’d be a good idea to import their translated name too, if one exists.
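A small sketch of how the translated names could be attached. It assumes translations.txt has the columns `trans_id`, `lang` and `translation`, with `trans_id` holding the Hebrew stop name; that layout is an assumption and should be checked against the actual file. The function names are made up for illustration.

```python
import csv
import io


def load_translations(translations_txt):
    """Map Hebrew name -> {language code: translated name}.

    Assumes the columns trans_id, lang, translation (not verified here).
    """
    table = {}
    for row in csv.DictReader(io.StringIO(translations_txt)):
        lang = row["lang"].strip().lower()
        table.setdefault(row["trans_id"], {})[lang] = row["translation"]
    return table


def name_tags(hebrew_name, translations):
    """Return name plus name:<lang> tags for every non-empty translation."""
    tags = {"name": hebrew_name}
    for lang, text in translations.get(hebrew_name, {}).items():
        if text:
            tags["name:" + lang] = text
    return tags
```

Stops with no entry in the translations table simply get the plain name tag.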

Another point worth considering regarding “- - X” is West Bank area C. Do they have marked bus stops? Do they have their own ref system? If so, the rules inside area C should never be “delete” for “- - X” (for bus stops lacking israel_gtfs), regardless of what they are in Israel. Luckily we have the proper relations to do this easily if needed.

Good point! I’ll first finish the “minimum viable product” and then see if I can implement this.

You’re correct about area C.
I think the safest way would be to never delete any bus stop that doesn’t have source=israel_gtfs - this way we’ll be sure we don’t destroy Palestinian bus stops or private shuttle bus stops.

I agree. This seems like the most reasonable approach.

- - X and israel_gtfs > delete
- - X without israel_gtfs > ignore
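As a sketch, the rule agreed on above fits in one small function. The name `action_for_osm_only_stop` is hypothetical, and `tags` is assumed to be a dict of the OSM stop's tags.

```python
def action_for_osm_only_stop(tags):
    """Decide what to do with a stop that is in OSM but in neither GTFS dump.

    Only stops that came from the import (source=israel_gtfs) are deleted;
    anything else (area C stops, private shuttles) is left alone.
    """
    if tags.get("source") == "israel_gtfs":
        return "delete"
    return "ignore"
```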

The script is working well. (Still no live runs).

There are 1573 bus stops without a “ref” tag that the script completely ignores. What should we do with them?

When the script first runs, it’ll create duplicates that do have a ref tag for most of those stops. Some of the ones that won’t have a duplicate created may be long gone.

Without both ref and gtfs:id?

Yes. These are the ones added by people, and not by the original import.

(I removed all gtfs:id tags, by the way. ref is unique enough)

I liked having gtfs:id for cases where the ref changes for the same bus stop. (Not sure if this can happen)

You can try to write a script to find pairs of gtfs and manual stops and merge them. But this is not trivial.

I could do that. Not trivial but not very hard. But it’s only safe for really close stops (e.g. <10m). We can’t know if a stop 20 meters away is a duplicate or a legit stop which is not in the gtfs db. Even manual armchair checks wouldn’t work.

We could merge obvious ones (manually or through a script) and keep the rest, hoping someone will survey them.

Or we could assume that the national gtfs db is perfect (not in positions, but in bus stop listings) and trust it by removing all ref-less stops after the incremental update is applied. But is it really perfect?

gtfs:id causes confusion. Examples: this bulk edit, editors confusing the two, and me confusing the two while writing the script. Having dual-ids also complicates the incremental updates script unless I ignore one of the ids.

gtfs:id may be needed in the future to grab routes/translations from the gtfs files. This may have been an oversight on my part. If we ever need it I’ll add it back, but it’s likely that I can make the script use it internally where needed without ever adding it to OSM.

Ref is public facing and appears to be permanent, and in case it ever changes, the bot will simply remove the stop and add a new one in its place. A possible drawback is that some values like shelter=yes/no would be lost, or maybe that could be a good thing; if the ref changes, it’s likely accompanied by construction work and change in the physical area, and someone should re-survey to see if the shelter remains or if one has been added. It’d be really strange if they just randomly change a ref without any construction work in an area.

Just in case, I am looking for cases where it actually changed, making sure there are none or very few.

Wrote an experimental script for “merging duplicates”. If X has ref, Y has no ref, and they are closer than the threshold, delete Y.
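A minimal sketch of that rule, not the actual script. It assumes each stop is a dict with `lat`, `lon` and an optional `ref`, and uses the haversine formula for distance; the function names are made up. It compares every ref-less stop against every stop with a ref, which is slow but simple.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def refless_duplicates(stops, threshold_m):
    """Return ref-less stops (Y) lying closer than threshold_m to a stop with a ref (X)."""
    with_ref = [s for s in stops if s.get("ref")]
    without_ref = [s for s in stops if not s.get("ref")]
    return [
        y for y in without_ref
        if any(
            haversine_m(x["lat"], x["lon"], y["lat"], y["lon"]) < threshold_m
            for x in with_ref
        )
    ]
```

Running it with several thresholds gives the candidate counts to compare before deleting anything.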

If the threshold is 5 meters, 46 stops would be deleted.
If the threshold is 50 meters, 1200 (out of 1573 ref-less) stops would be deleted.
100 meters - 1374
200 meters - 1451

Edit: Sorry. That’s the data assuming work is done on the 2012 stops. The numbers are higher if the incremental update is applied first:

5 meters - 127
10 meters - 402
50 meters - 1311
100 meters - 1471

Edit: Script source: (Current source will output different numbers, because it was modified to require human help for non obvious cases)