Import: Bus stops from GTFS

SafwatHalaby · February 27, 2025, 10:12am

Let’s examine three different approaches.

In the current approach, when MOT updates a stop name, it’s often a change of meaning: מדרגות שצפ became אלסנדיאן. So the leftover Arabic name and the updated name:he were saying two different things. The current approach keeps name and name:he consistent after syntactic updates (e.g. a Hebrew spelling fix), but leaves inconsistent data after semantic updates.

Secondly, per the Israeli naming guidelines, it’s always incorrect to have an Arabic name without a copy in name:ar. In fact, one of my old scripts used to copy name to name:he/name:ar/name:en when the matching key was missing.

That script is no longer active, but both @zstadler and @yrtimiD have shown interest in automating name tag enforcement. I don’t know if that will happen any time soon, but in principle you might end up with two bots disagreeing over tagging, with the GTFS bot omitting name:ar and the other re-adding it. Depending on the specific implementations at the time, this may be a one-time fight or an infinite loop.

Another approach would be to reset the name to Hebrew when MOT updates it without providing an Arabic update. This would keep data consistent in semantic updates, but might annoy someone if the Hebrew name only had a syntax update.

A third approach could be doing the above, but also adding fixme=gtfs2osm-il: The name was auto-reset to Hebrew because MOT updated the name, but did not provide an Arabic translation for the new name. If needed, translate back to Arabic and add the translation to "name" and "name:ar"

In my opinion, the third approach is a good tradeoff.

It never leaves inconsistent data.
It’s in the spirit of "the latest update wins".
It lets the user know the change wasn’t hostile/political in the case of a syntax update.

A disadvantage of the third approach is slightly more code complexity. One needs to be careful in maintaining the fixme:

Never overriding an existing fixme, and appending to it instead.
Never adding the same fixme string twice.
Removing the fixme when appropriate, taking care not to remove any non-gtfs content from it.
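The three bookkeeping rules above could look roughly like this. It is a minimal sketch, assuming the tags are a plain Python dict and that multiple fixme messages are joined with ";" (the usual OSM separator for multi-valued tags); add_fixme and remove_fixme are hypothetical names, not gtfs2osm functions.

```python
SEP = ";"  # assumed separator between independent fixme messages


def add_fixme(tags: dict, message: str) -> None:
    """Append `message` to fixme without overwriting or duplicating it."""
    current = tags.get("fixme", "")
    parts = current.split(SEP) if current else []
    if any(p.strip() == message for p in parts):
        return  # already present: don't chain a second copy
    # Append to any existing (possibly manual) fixme instead of replacing it.
    tags["fixme"] = f"{current}{SEP}{message}" if current else message


def remove_fixme(tags: dict, message: str) -> None:
    """Remove only `message`, leaving any manually added text untouched."""
    current = tags.get("fixme")
    if current is None:
        return
    kept = [p for p in current.split(SEP) if p.strip() != message]
    remaining = SEP.join(kept).strip()
    if remaining:
        tags["fixme"] = remaining
    else:
        del tags["fixme"]  # nothing else was in there, drop the key
```

Splitting on the separator and comparing whole parts is what keeps a manual fixme that merely mentions similar words from being touched.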

Using a dedicated note:gtfs2osm key might be simpler and less confusing for those who want to add an unrelated manual fixme, and it seems allowed, but “fixme” creates a visual indication of something requiring attention in most editors. Come to think of it, I think note:gtfs2osm is nice and simple.

Edit: used more correct tagging keys.

SafwatHalaby · February 27, 2025, 10:17am

On an unrelated note, the script’s changeset message links to this discussion and to the source code. Over a timespan of many years, these links are unstable. Back in my day it was forum.openstreetmap; the hottest git host and the hottest forum software change from time to time.

Consider linking to the script’s wiki page instead. It’s more likely to be eternal, and it’s one link rather than two, so it’s shorter for people to read. You can always update the source code / discussion links there.

NeatNit · February 27, 2025, 11:13am

That’s an older changeset; more recent ones do link to the wiki page: Changeset: 162411100 | OpenStreetMap

It took me a while to actually create that wiki page, but after I did, I added it to the changeset message.

You could argue that the other two links are unnecessary if the wiki link is there, but the way I see it, they do no harm. You could probably convince me to remove them if you try, though.

NeatNit · February 27, 2025, 11:21am

I have no problem changing the bot’s behavior, all of your suggestions are certainly doable. I just ask that at least one or two more people agree that it needs to be changed before I go and do it. I will always go with the community consensus.

I just don’t think the current behavior is that bad, especially since the name tag mismatch is always temporary: the MOT do eventually add Arabic names to all stops, and when they do, gtfs2osm updates name and name:ar. A single bus stop node should never have an issue for more than a few weeks.

As I said, I will go with whatever consensus there is, as long as it’s actually a consensus.

Edit: I also think it would be harmful for the bot to change the name tag from Arabic to Hebrew under any circumstances, if it won’t automatically change it back to Arabic when such data becomes available. But it would be possible to detect such cases from the changeset history and act accordingly.

NeatNit · February 27, 2025, 11:31am

This would not be an issue: gtfs2osm will not delete name:en or name:ar tags that didn’t come from a previous import. That is, after the other bot (or a human) adds name:ar, gtfs2osm will let it stay until MOT finally adds it back.

This is documented in the wiki page, please have a read.

SafwatHalaby · February 27, 2025, 12:10pm

I see. If these events are sporadic and MOT always eventually add Arabic names, it may not be worth any trouble at all!

But knowing this, I think the following is absolutely the right approach: if MOT’s new value for ar/en is null, pretend that MOT didn’t update the value at all. Essentially, this keeps the stale name:ar just as it keeps the stale name. This sticks to the convention at the cost of a tiny code change.

Basically, name:ar and name should always be in sync if the name is Arabic. If you decide to remove one, you should remove the other, and if you keep one, you should keep the other.
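A minimal sketch of the "null means no update" rule together with keeping name in sync, assuming the tags are a dict and that a missing MOT translation arrives as None. update_name_translation is a hypothetical helper; "name mirrors this key" is approximated by a plain string comparison with the old value.

```python
from typing import Optional


def update_name_translation(tags: dict, key: str, mot_value: Optional[str]) -> None:
    """Apply MOT's value for a translated key such as "name:ar" or "name:en".

    mot_value=None means MOT has no translation: treat it as "no update",
    keeping the existing (possibly stale) value instead of deleting it.
    """
    if mot_value is None:
        return
    old_value = tags.get(key)
    tags[key] = mot_value
    # If `name` was a copy of this translation (e.g. an Arabic-named stop),
    # update it as well, so name and name:ar never drift apart.
    if old_value is not None and tags.get("name") == old_value:
        tags["name"] = mot_value
```

Because the early return skips both keys, a missing translation leaves name and name:ar stale together rather than removing one of them.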

For the record, the name conventions have been a consensus for a long time. If name is in Hebrew, it should also be in name:he; if name is in Arabic, it should also be in name:ar. Israel is very multilingual, and sometimes the name language literally varies between adjacent shops. You can’t even determine the local language from a place polygon. So this makes it very simple to parse. Want Hebrew? Use name:he. Want Arabic? Use name:ar. Want the local language? Use name.
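The convention can be checked mechanically. A rough sketch, assuming the language of name can be guessed from its script (Hebrew block U+0590 to U+05FF, Arabic block U+0600 to U+06FF); script_of and follows_convention are hypothetical helpers, and a script guess is cruder than real language detection.

```python
from typing import Optional


def script_of(text: str) -> Optional[str]:
    """Rough guess: "he" or "ar" from the first Hebrew/Arabic character found."""
    for ch in text:
        if "\u0590" <= ch <= "\u05ff":
            return "he"
        if "\u0600" <= ch <= "\u06ff":
            return "ar"
    return None  # no Hebrew or Arabic characters


def follows_convention(tags: dict) -> bool:
    """True if a Hebrew/Arabic `name` is duplicated in name:he/name:ar."""
    name = tags.get("name")
    if name is None:
        return True  # nothing to check
    lang = script_of(name)
    if lang is None:
        return True  # can't tell the language, so don't flag it
    return tags.get(f"name:{lang}") == name
```

A consumer then only needs tags.get("name:he"), tags.get("name:ar") or tags.get("name") depending on which language it wants.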

The point is, when two bots disagree on a principle, and one bot is doing something that the other considers a mapping error, issues may arise depending on future implementations. Besides, there would be changeset noise even today (another bot would always copy name to name:ar after your bot removes it).

I’m afraid I don’t have much skin in this game, but if these links break (and they will break), then you’d have dead weight for all of eternity: generations upon generations of mappers would waste a few brain cycles parsing a dead link, a few seconds waiting for it to load, and a few joules of electricity. OSM servers would forever store and transmit needless data, emitting more CO2 and accelerating the extinction of mankind. And if codeberg.org gets owned by an evil madman in a decade (god forbid), you’d have no way to divert innocents away from an evil link.

NeatNit · February 27, 2025, 12:26pm

That’s a good point, I’ll make that change within the next few days.

God forbid indeed! But the same could be said about OSM - if the OSM wiki domain gets bought up by an evil madman (but the OSM database and history live on under a new name thanks to community effort), then the wiki link is just as problematic! And, oh god, what about all the time and energy we’re spending right now worrying about all this stuff? Gah!!

… I think I’ll keep the comment as it is for now. I might change my mind in a few days.

NeatNit · February 27, 2025, 3:10pm

While we’re on the topic of stop names, I would like your opinion on this: #42 - Determine language of name tag based on the city - NeatNit/gtfs2osm-il - Codeberg.org

Basically, when creating a new stop in OSM, I currently default to using Hebrew in the name tag. However, it would be smarter to look at all the other stops in the same city, and if the majority of them use the Arabic name, then the new stop will also be Arabic. I think this is a good plan and I intend to implement it when I have the time.
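A possible shape for the city-majority rule, as a sketch only: it assumes the caller has already gathered the tags of existing stops in the same city (how the city is determined is not specified here), and treats a stop as Arabic-named when name equals name:ar. default_name_language is a hypothetical name.

```python
from typing import Iterable


def default_name_language(stops_in_city: Iterable[dict]) -> str:
    """Language for `name` on a new stop: "ar" if most named stops in the
    city mirror name:ar, otherwise the current default, "he"."""
    named = [t for t in stops_in_city if "name" in t]
    arabic = sum(1 for t in named if t.get("name") == t.get("name:ar"))
    # Strict majority: a tie (or a city with no named stops) stays Hebrew.
    return "ar" if arabic * 2 > len(named) else "he"
```

Falling back to "he" on ties keeps the existing default whenever the evidence is inconclusive.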

The question comes from the unfortunate case - what should I do if the name should be Arabic, but there is no Arabic translation in the GTFS data? These are the options I’ve thought of:

1. Use the Hebrew name anyway. Not ideal, especially because new stops in GTFS are more likely to be missing translations.
   - However, it may be possible to use the Hebrew name and later change it to Arabic when the translation becomes available.
2. Don’t set the name tag at all, only name:he, until an Arabic translation becomes available.
3. Use a placeholder name that says something like “Error: name missing” in Arabic. This would require the least new code (existing code would detect that it’s in Arabic, and existing logic would preserve it until a translation is available in GTFS), but has the obvious downside of uploading what’s essentially a junk name to OSM.
4. Use machine translation. I am opposed to this idea, but it’s a possibility.

What do you think? Personally, I think option 2 is the best: leave the name tag missing until the preferred language becomes available.

SafwatHalaby · February 28, 2025, 12:15pm

A missing name is considered an error by many inspection tools and editors, and some bots or users may copy the local-language name into name.

What about just using the Hebrew name in stop creation when Arabic is not present, and then the following for updates:

# after possibly updating name:ar and name:he...
IF
  name is arabic
THEN
  name = name:ar
ELIF
  old mot arabic name was null AND
  new mot arabic name is not null AND
  is in arabic area
THEN
  name = name:ar
ELSE
  name = name:he
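The pseudocode above translates fairly directly into Python. This is a sketch under stated assumptions: "name is arabic" is approximated by looking for characters in the Arabic Unicode block, in_arabic_area is a hypothetical flag (for example, the outcome of the city-majority rule from #42), and the old/new MOT Arabic names are None when missing.

```python
from typing import Optional


def _looks_arabic(text: Optional[str]) -> bool:
    # Rough check: any character in the basic Arabic Unicode block.
    return bool(text) and any("\u0600" <= ch <= "\u06ff" for ch in text)


def choose_name(tags: dict, old_mot_ar: Optional[str], new_mot_ar: Optional[str],
                in_arabic_area: bool) -> Optional[str]:
    """New value for `name`; assumes name:ar and name:he were already updated."""
    if _looks_arabic(tags.get("name")):
        return tags.get("name:ar")  # an Arabic name stays in sync with name:ar
    if old_mot_ar is None and new_mot_ar is not None and in_arabic_area:
        return tags.get("name:ar")  # the translation finally arrived: switch
    return tags.get("name:he")
```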

I think using Hebrew temporarily is perfectly fine.

A name should reflect the ground truth and shouldn’t be forced. If MOT themselves don’t have an Arabic name for a stop, I doubt the stop has an Arabic name on signs, bus announcement systems, or public transport apps either. And stops aren’t the kind of thing that tends to get locally created names. So the Arabic name would likely be of no value: no one will search for it or navigate by it. And if for some reason an Arabic name is valid (perhaps MOT forgot to update the GTFS, but there actually is an official name), then a local user is free to add it, and when a name comes via GTFS, it will be applied just fine.

SafwatHalaby · February 28, 2025, 12:19pm

I have a code error. Correcting.
Edit: Corrected.

NeatNit · February 28, 2025, 1:55pm

Okay, I’ll implement something like that soon.

NeatNit · March 4, 2025, 6:07pm

This is now implemented: never delete name:* tags and add note:name when data is missing. closes #45 · 57a2789ff4 - NeatNit/gtfs2osm-il - Codeberg.org

Basically, if the GTFS Arabic translation is missing, gtfs2osm will not touch the name:ar tag. The same goes for name:en. It will also add a note:name tag to alert other mappers; see for example: Node: יגאל אלון/מישור הנוף (1803042572) | OpenStreetMap
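The note:name part of that behavior might be sketched as below. The note wording, the LANG_NAMES table and update_missing_note are all hypothetical; the real gtfs2osm message and logic may differ. It assumes MOT translations arrive as a dict with None for missing languages.

```python
from typing import Dict, Optional

# Hypothetical mapping from language code to the word used in the note.
LANG_NAMES = {"ar": "Arabic", "en": "English"}


def update_missing_note(tags: dict, mot_translations: Dict[str, Optional[str]]) -> None:
    """Set note:name when MOT lacks translations; clear it once they return."""
    missing = [LANG_NAMES[lang] for lang in ("ar", "en")
               if mot_translations.get(lang) is None]
    if missing:
        tags["note:name"] = ("MOT GTFS data has no " + " or ".join(missing)
                             + " name for this stop; existing name:* values were kept.")
    else:
        tags.pop("note:name", None)  # everything is back: remove the alert
```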

I manually restored name:en and name:ar from old values for all stops where they were missing: Changeset: 163213134 | OpenStreetMap

NeatNit · March 4, 2025, 8:34pm

I’ve now done something along those lines. No idea if my code works though, time will tell!

If you want to review the code: Comparing fab423429f..b250e9d1bb - NeatNit/gtfs2osm-il - Codeberg.org

NeatNit · March 5, 2025, 11:53pm

I just ran the import and unexpectedly got 4,001 changes, 3,967 of which added the note:name tag about missing English and Arabic names. Unfortunately I can’t debug this right now, as I have to go to sleep, so I don’t know whether there is a problem with my code or the GTFS data actually removed translated names for nearly 4,000 stops due to some error.

This may also somehow be due to an I/O issue I’m currently experiencing with my PC, but I would have expected that to cause a crash, so I doubt it’s that.

Anyway, it’s very good timing for the code change, because if this happened a week ago it would have removed name:en and name:ar from all of these stops! Fortunately, the result is actually minimal damage - just some note tags that I can debug in the next few days.

See the changeset on achavi, on OSMCha or on osm.org.

NeatNit · March 6, 2025, 10:52am

Okay, since most of my spare time is on public transport today, I decided to start looking into this using my phone. Conclusions so far:

gtfs2osm works perfectly on Android using Termux
I get the exact same result - missing translated names on thousands of stops.

I will have to look at the data more closely, but this could mean one of two things:

1. They’ve actually removed thousands of translations from the GTFS data. I can’t imagine why, but it could be some error in their upstream sources.
2. They’ve changed the translations format in some way that makes my code not find the translation.

I think 1 is more likely, but looking at CSV data on Android is a lot harder than just running a Python script, so I might not be able to conclude anything today.
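One way to tell the two cases apart is to count stops whose stop_name has no matching translation row. The thread doesn't describe the MOT feed's exact translations layout, so this sketch assumes the standard GTFS translations.txt columns (table_name, field_name, language, translation, field_value) and matches by field_value only; rows keyed by record_id are not handled. stops_missing_translation is a hypothetical helper that takes CSV text rather than file paths.

```python
import csv
import io


def stops_missing_translation(stops_csv: str, translations_csv: str, lang: str) -> list:
    """Return stop_ids whose stop_name has no `lang` translation row."""
    translated = {
        row["field_value"]
        for row in csv.DictReader(io.StringIO(translations_csv))
        if row["table_name"] == "stops"
        and row["field_name"] == "stop_name"
        and row["language"] == lang
    }
    return [
        row["stop_id"]
        for row in csv.DictReader(io.StringIO(stops_csv))
        if row["stop_name"] not in translated
    ]
```

If the count jumps between two feeds while the number of translation rows stays similar, a format or matching change (case 2) becomes more likely than removed translations (case 1).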

@SafwatHalaby I would be delighted if you take a look yourself, if you’re so inclined. No pressure.

Edit: and I can’t thank you enough for convincing me to preserve older names. A relative disaster has been averted and replaced with a minor nuisance!

NeatNit · March 6, 2025, 11:02pm

Okay, I’m now damn sure the issue is upstream. It seems that all the affected stops have - or ' (or the Hebrew maqaf ־, U+05BE) in their stop_name. And almost all of the stops that have one of these characters in their stop_name are affected.

That “almost” is really bugging me, I honestly can’t explain it. How exactly did they mess up?

Overpass: overpass turbo

node.all_stops["note:name"]->.affected_stops;
(node.all_stops["gtfs:stop_name:IL-MOT"~"['-]"];
 node.all_stops["gtfs:stop_name:IL-MOT"~"־"];)->.sus_chars; // separate line because Overpass doesn't like Unicode in regex sets

.affected_stops out count; // 3976
.sus_chars out count; // 4059
node.affected_stops.sus_chars; out count; // 3970
(.sus_chars; - .affected_stops;); out count; // 89
(.affected_stops; - .sus_chars;); out count; // 6

// Look at the data output to see the counts of above

.a;
// Uncomment either of these lines to see it on the map
//(.sus_chars; - .affected_stops;); // 89 stops which should have been affected but weren't (???)
//(.affected_stops; - .sus_chars;); // 6 unrelated - names were missing beforehand
out meta;
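The same set arithmetic can be cross-checked offline. A sketch, assuming the stops were exported as a hypothetical dict mapping stop_id to (gtfs:stop_name:IL-MOT value, whether note:name is present); classify is an illustrative name, and the counts mirror the Overpass lines above.

```python
# Hyphen, apostrophe and Hebrew maqaf (U+05BE).
SUSPICIOUS = set("-'\u05be")


def classify(stops: dict) -> dict:
    """stops maps stop_id -> (stop_name, has_note_name)."""
    affected = {sid for sid, (_, has_note) in stops.items() if has_note}
    sus_chars = {sid for sid, (name, _) in stops.items() if SUSPICIOUS & set(name)}
    return {
        "affected": len(affected),
        "sus_chars": len(sus_chars),
        "both": len(affected & sus_chars),
        "sus_not_affected": len(sus_chars - affected),
        "affected_not_sus": len(affected - sus_chars),
    }
```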

NeatNit · March 11, 2025, 4:43pm

They’ve fixed it, today’s import resolved the issue: Changeset: 163487505 | OpenStreetMap

SafwatHalaby · March 13, 2025, 3:35pm

Thank you for your efforts!

So far I haven’t had the chance to review the code (life), but the bot updates in my areas of interest make perfect sense, including en/ar handling.

I think the principles we’ve discussed above are working well. I’m reiterating/summarizing in case someone is skimming the discussion.

If name is Arabic/Hebrew, it always equals name:ar/name:he respectively.
If MOT lacks an Arabic/English name, the bot “unlocks” it, i.e. it stops touching name:ar/name:en (and name, if it’s Arabic), giving users the chance to translate manually if needed.
If these names reappear in the MOT data, things go back to normal, with MOT name data taking priority. This is usually what happens.

NeatNit · March 13, 2025, 4:18pm

I’m happy to hear the results are good so far. If/when you do review the code, note that I recently updated the readme and it should be a good place to start, especially the Code structure section.

NeatNit · March 13, 2025, 7:15pm

I just did some clean-up on note and fixme tags, and most of the note tags on bus stops were just listing the lines that stop there. This is better served by route_ref which is intended for this purpose.

It made me rethink: should gtfs2osm set the route_ref tag based on GTFS data? I already considered this back when I first developed gtfs2osm, but was told by some other mappers it’s useless data, better mapped with route relations instead of tags on bus stops. This tag is also not really used by renderers and apps. However, we’re not mapping for the renderer. The tag is documented, and clearly at least a few mappers think it’s valuable to map it (though some incorrectly did so using notes). Therefore, I think this is worth reconsidering:

Should gtfs2osm populate the `route_ref` tag with the routes serving each stop?

Yes
No


Today there are 63 bus stops with this tag (edit: originally 58; 63 after converting some lines tags to route_ref).
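If the answer is yes, route_ref could in principle be derived from three GTFS files. A sketch, assuming standard routes.txt, trips.txt and stop_times.txt columns passed in as CSV text (a real stop_times.txt is large and would be streamed from disk), and values joined with ";" as is usual for route_ref. Matching GTFS stop_id to OSM nodes is left out, and route_refs is a hypothetical name.

```python
import csv
import io
import re
from collections import defaultdict


def _natural(ref: str) -> list:
    # Sort "2" before "10": split into text and number chunks.
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", ref)]


def route_refs(routes_csv: str, trips_csv: str, stop_times_csv: str) -> dict:
    """Map stop_id -> route_ref value (route_short_name values joined by ';')."""
    short_name = {r["route_id"]: r["route_short_name"]
                  for r in csv.DictReader(io.StringIO(routes_csv))}
    trip_route = {t["trip_id"]: t["route_id"]
                  for t in csv.DictReader(io.StringIO(trips_csv))}
    refs = defaultdict(set)
    for st in csv.DictReader(io.StringIO(stop_times_csv)):
        refs[st["stop_id"]].add(short_name[trip_route[st["trip_id"]]])
    return {stop: ";".join(sorted(names, key=_natural)) for stop, names in refs.items()}
```

Using a set per stop means a line that serves the stop on many trips is listed once.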
