Multi-language/default language naming conventions

Minh_Nguyen · November 6, 2024, 7:22pm

That ship sailed back when OSM API v0.6 dropped formal support for multiple values and won’t come back to port until v0.7 at the earliest. In the meantime, there can only be a single name=* tag, data consumers expect that tag to be present as a prerequisite to the localized subkeys, and there are multiple approaches to reconciling the various values that lay claim to the same key, as a matter of convention rather than as part of the database format.

One of these approaches is to just pick a language as a convention among mappers. This is how all of India has name=* in English, despite the plethora of local languages that are sometimes more prominent on the ground than English. In parts of Canada, there seem to be local cultural or historical arguments against this approach. As an outsider, I can’t really speak to these issues or second-guess them.

Another approach is hybrid naming. This is popular in regions where two local languages have opposite noun–adjective order, taking advantage of a common writing system and the tendency for the “base name” to be shared between the languages as well. In Winnipeg, the hybrid French–English names conveniently allow road users to read half the sign and ignore the other half, even if the English comes out less than fully anglicized.

(De la Cathedrale Avenue is obviously named after the cathedral, not someone with the surname De la Cathedrale. Conversely, Avenue Fifth must be a little awkward in French, but there it is on the sign. Sometimes the all-important directional quadrant gets lost in translation, as in Promenade Shady Shores Drive W.)

Not every locality is lucky enough to be in a similar linguistic environment, and one can conceive of a situation in which a street has a distinct name in each language, so this is far from a general solution. Perhaps we can think of this signage practice as a kind of contraction, which is a kind of abbreviation, which we generally don’t do in OSM when we can help it. For the general case, most places with multilingual needs end up including all the names in some order, separated by some delimiter. It gets chaotic fast for any data consumer that tries to do things automatically on a global scale, or that tries to use OSM data in some unconventional manner.

Yes, this is an unsolved problem. In principle, during turn-by-turn navigation, a user unfamiliar with the area wants to hear names that match what they’ll see out the windshield, while a user familiar with the area may prefer names in their own language, but they’ll get along fine with what they see out the windshield too. We don’t have a good mechanism for explicitly indicating the name that’s useful for wayfinding (other than to map the signs themselves, which raises other issues). As a best effort, navigation systems read in name=* and maybe name:pronunciation=*, send it off to a text-to-speech voice for the user’s preferred language, and hope for the best.

Another generally unsolved problem is bilingual labeling. Ironically, the bilingual name=* values in Winnipeg are not optimized for actual bilingual labeling in renderers. OSM Carto just gets “lucky” because it doesn’t even try. But ideally, a user would want to see the name in both their own language and the local language(s), not only for streets but for other features as well. What they don’t really care about is whether French or English comes first on this road sign or any other sign. A renderer would do well to label “English (French)” to an English speaker, “French (English)” to a French speaker, and “Haida (French, English)” to a Haida speaker. “Haida (Franglais)” is cool but generally not as important, while “English (Frenglish)” or “French (Franglais)” would get annoying fast.
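To make the "English (French)" versus "French (English)" idea concrete, here is a minimal Python sketch of how a renderer might build such a label from the localized subkeys rather than from name=* itself. The function name `bilingual_label`, the `local_langs` parameter, and the example tags are all made up for illustration; how a renderer would actually learn which languages are local to a place is left open here.

```python
def bilingual_label(tags, user_lang, local_langs):
    """Build a label like "English (French)" from name:<lang>=* subkeys.

    tags        -- dict of OSM tags on one feature
    user_lang   -- language code the reader prefers (assumed input)
    local_langs -- language codes considered local to the area (assumed input)
    """
    primary = tags.get(f"name:{user_lang}")
    if primary is None:
        # No name in the reader's language: fall back to name=* unchanged.
        return tags.get("name", "")

    others = []
    for lang in local_langs:
        if lang == user_lang:
            continue
        name = tags.get(f"name:{lang}")
        # Skip missing names and names identical to ones already shown.
        if name and name != primary and name not in others:
            others.append(name)

    if not others:
        return primary
    return f"{primary} ({', '.join(others)})"


# Hypothetical feature, not taken from the database.
EXAMPLE_TAGS = {
    "name": "Rue Example Street",
    "name:en": "Example Street",
    "name:fr": "Rue Example",
    "name:de": "Beispielstraße",
}

print(bilingual_label(EXAMPLE_TAGS, "en", ["fr", "en"]))
print(bilingual_label(EXAMPLE_TAGS, "fr", ["fr", "en"]))
print(bilingual_label(EXAMPLE_TAGS, "de", ["fr", "en"]))
```

Note that the order of languages on the sign plays no role in this sketch: the reader's language always comes first, and the hybrid name=* value is only used as a last resort.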

To oversimplify this massive thread, mappers south of the border would like to adopt a somewhat more predictable approach: multiple values in name=* separated by semicolons. This would allow a navigation application to announce an upcoming turn based on the part of the sign that the user will read, without reading localized names that aren’t as useful for wayfinding. It would also allow a sophisticated renderer to avoid the Frenglish. This approach already has more software support than traditional approaches.^[1] We care enough about this that we’ve made our own renderer and are happy to help other communities adapt it to their needs.
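As a rough illustration of why a predictable delimiter helps data consumers, the following Python sketch splits a semicolon-delimited name=* and matches each part against the name:<lang>=* subkeys to work out which language it is in. This is only a sketch under stated assumptions, not how any particular renderer or router does it: the function names, the `NON_LANGUAGE_SUBKEYS` set (deliberately incomplete), and the example tags are hypothetical, and literal semicolons inside a name are not handled.

```python
# Subkeys of name:* that are not language codes (illustrative, incomplete).
NON_LANGUAGE_SUBKEYS = {"pronunciation"}


def split_names(tags):
    """Split a semicolon-delimited name=* value into its trimmed parts."""
    raw = tags.get("name", "")
    return [part.strip() for part in raw.split(";") if part.strip()]


def names_by_language(tags):
    """Pair each part of name=* with the language code of the matching
    name:<lang>=* subkey, or None when no subkey matches exactly."""
    localized = {
        key[len("name:"):]: value
        for key, value in tags.items()
        if key.startswith("name:")
        and key[len("name:"):] not in NON_LANGUAGE_SUBKEYS
    }
    pairs = []
    for part in split_names(tags):
        lang = next(
            (code for code, value in localized.items() if value == part),
            None,
        )
        pairs.append((lang, part))
    return pairs


def name_for_reader(tags, user_lang):
    """Pick the part of name=* in the reader's language if there is one,
    otherwise the first part listed."""
    pairs = names_by_language(tags)
    for lang, part in pairs:
        if lang == user_lang:
            return part
    return pairs[0][1] if pairs else None


# Hypothetical feature, not taken from the database.
EXAMPLE_TAGS = {
    "name": "Rue Example;Example Street",
    "name:fr": "Rue Example",
    "name:en": "Example Street",
}

print(names_by_language(EXAMPLE_TAGS))
print(name_for_reader(EXAMPLE_TAGS, "en"))
```

With a slash, dash, or space as the delimiter, the split step becomes ambiguous, because those characters also occur inside ordinary names.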

Unfortunately, the U.S. doesn’t have a critical mass of localities where the authorities are as committed to bilingualism as in Canada, so we’re looking to the Canadian community for a bit of leadership. Our rare opportunities to introduce bilingual names don’t last very long because they stand out so much from a database consistency perspective. What usually happens is that a mapper from abroad gets annoyed by the semicolon in OSM Carto and changes it to their locally preferred slash or dash or space, and then a local mapper gets annoyed by that choice and removes the non-English name altogether. Canada does have a critical mass of bilingual areas; if the Canadian community would support our desire for the semicolon as a delimiter, at least when hybrid naming isn’t possible, then we would have more of a leg to stand on.

One challenge we have in common is that many indigenous/First Nations territories and places are proudly bilingual, but a lone mapper exuberantly added indigenous names to these features throughout the continent using slashes and so far has refused to engage with other mappers about this typographical choice. As funny as Franglais can sound to the untrained ear, we might be able to change this practice more easily because it’s a completely arbitrary choice, without the historical and cultural considerations around hybrid street naming.

[1] Except that OSM Carto still has outsized ~~influence in the Electoral College~~ mindshare among mappers.