Multi-language/default language naming conventions

aseigo · November 6, 2024, 6:14pm

The same: Let the software handle localization, and keep internationalized data in the database.

name: should probably be seen as a deprecated fall-back field, at best.

If regions in the OSM model could also advertise the “official language set”, then software could really do the “right thing”.

For example, Biel/Bienne in Switzerland would note both French and German, while the country would advertise all four official languages. The polygon(s) representing Canada’s borders would have French and English, with overrides and augmentation for more appropriate local uses.

So in the case of New Brunswick, the “most near” civic border might be the Canadian one, and such software could then know to show both the name:en and name:fr fields on the map where they exist, unless the user overrides that and asks for it in their preferred localization.
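A minimal sketch of that selection logic, in Python. Everything named here is an assumption for illustration: OSM regions do not currently advertise an official language set, so `official_languages` stands in for whatever the nearest civic border would hypothetically provide, and `names_to_display` is not part of any existing renderer.

```python
def names_to_display(tags, official_languages, user_language=None):
    """Pick the names to show for one feature.

    tags: the feature's OSM tags as a dict.
    official_languages: hypothetical language set advertised by the nearest
        enclosing civic border (not something OSM provides today).
    user_language: optional override chosen by the user.
    """
    # An explicit user preference wins if the feature has a name in it.
    if user_language is not None:
        localized = tags.get(f"name:{user_language}")
        if localized:
            return [localized]

    # Otherwise show every official-language name that exists, in the
    # order the region advertises them, skipping duplicates.
    names = []
    for lang in official_languages:
        value = tags.get(f"name:{lang}")
        if value and value not in names:
            names.append(value)

    # Fall back to the plain name tag only when nothing localized exists.
    if not names and tags.get("name"):
        names.append(tags["name"])
    return names


# Illustrative tags for a bilingual street.
street = {"name:en": "York Street", "name:fr": "Rue York"}
print(names_to_display(street, ["en", "fr"]))
print(names_to_display(street, ["en", "fr"], user_language="fr"))
```

With no override, both names come back in the region's order; with `user_language="fr"`, only the French name does.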

Keep in mind that while we’re talking about English and French in here, there’s the issue of First Nations and Inuit languages as well. I’ve added official Haida names to places in Haida Gwaii (another place I’ve lived), as have others, via the name:hai tag. Over time, those have become the actual names in use, such as the shift from “Queen Charlotte City” to “Daajing Giids”.

So this is about more than just French and English, but the practical application of multilingualism.

It really sucks (at least imho) that even though there are lots of name:hai entries on the map of Haida Gwaii now, there’s not a generally accepted best practice in renderers of setting name preferences, nor is there a way that I’m aware of to advertise a location’s preferences.

Haida Gwaii maybe should default to the name:hai names where they exist, and fall back to the name:en, name:fr, or name tags where they don’t, but where do we define that in the data model? If anyone knows, please point me to it, but all I am aware of are the best practices documented per-region on the Multilingual names wiki page, which naturally is not (easily, reliably) consumable by software that may wish to make “good” decisions about localization during rendering.
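The fallback order itself is simple to express once a preference list exists; the open question is where that list would come from. A rough Python sketch, assuming a hypothetical per-region preference list (`HAIDA_GWAII_PREFERENCE` is made up for this example and is not defined anywhere in OSM data):

```python
# Hypothetical regional preference; OSM has no agreed place to store this.
HAIDA_GWAII_PREFERENCE = ["hai", "en", "fr"]


def best_name(tags, preferred_languages):
    """Return the first available name:<lang> value, else the plain name tag."""
    for lang in preferred_languages:
        value = tags.get(f"name:{lang}")
        if value:
            return value
    # Nothing localized matched, so use name as the last resort (may be None).
    return tags.get("name")


# Illustrative tags based on the renaming mentioned above, not copied from OSM.
town = {"name": "Queen Charlotte City", "name:hai": "Daajing Giids"}
road = {"name": "Example Road", "name:en": "Example Road"}

print(best_name(town, HAIDA_GWAII_PREFERENCE))  # Daajing Giids
print(best_name(road, HAIDA_GWAII_PREFERENCE))  # Example Road
```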

Going back to the issue of navigation vocalization for a moment: if you’ve ever used a navigation aid while e.g. driving and it’s in the wrong language, you’ll know how useless it becomes as the software mispronounces … well … just about everything. … and with OSM data, this is made harder due to the same shortcomings detailed above.

The data model should contain accurate data with enough metadata for the software to make use of it. We should not be thinking about it in political terms, but in practical ones, so that software renderings can present things in the most appropriate form possible in the context.

Jarek · November 6, 2024, 6:31pm

OK, but in practical terms, lots of OSM mappers currently use openstreetmap.org with carto tiles, and they see the name=* tag. So then they want to change the name=* tag to meet their aims of inclusion or multilingualism or representation. Which is what we’re seeing in practice in Winnipeg and in New Brunswick and in Iqaluit and on Grand River and probably elsewhere. So in practical terms, to me, we should decide what to do with the situation we have now, and not some ideal future where name=* could be seen as a deprecated fall-back field.

The current OSM solutions for the name=* value of multilingual street names, in practice, use either space (Hong Kong), dash (Brussels), or slash (Biel/Bienne) as a delimiter. So we could adopt one of these since they have precedent. Or we could set a fourth precedent with Avenue Blah Avenue. Or we could daydream about the future when carto is no longer the elephant in the room.

Minh_Nguyen · November 6, 2024, 7:22pm

That ship sailed back when OSM API v0.6 dropped formal support for multiple values and won’t come back to port until v0.7 at the earliest. In the meantime, there can only be a single name=* tag, data consumers expect that tag to be present as a prerequisite to the localized subkeys, and there are multiple approaches to reconciling the various values that lay claim to the same key, as a matter of convention rather than as part of the database format.

One of these approaches is to just pick a language as a convention among mappers. This is how all of India has name=* in English, despite the plethora of local languages that are sometimes more prominent on the ground than English. In parts of Canada, there seem to be local cultural or historical arguments against this approach. As an outsider, I can’t really speak to these issues or second-guess them.

Another approach is hybrid naming. This is popular in regions where two local languages have opposite noun–adjective order, taking advantage of a common writing system and the tendency for the “base name” to be shared between the languages as well. In Winnipeg, the hybrid French–English names conveniently allow road users to read half the sign and ignore the other half, even if the English comes out less than fully anglicized.

(De la Cathedrale Avenue is obviously named after the cathedral, not someone with the surname De la Cathedrale. Conversely, Avenue Fifth must be a little awkward in French, but there it is on the sign. Sometimes the all-important directional quadrant gets lost in translation, as in Promenade Shady Shores Drive W.)

Not every locality is so lucky to be in a similar linguistic environment, and one can conceive of a situation in which a street has a distinct name in either language, so this is far from a general solution. Perhaps we can think of this signage practice as a kind of contraction, which is a kind of abbreviation, which we generally don’t do in OSM when we can help it. For the general case, most places with multilingual needs end up including all the names in some order, separated by some delimiter. It gets chaotic fast for any data consumer that tries to do things automatically on a global scale, or that tries to use OSM data in some unconventional manner.

Yes, this is an unsolved problem. In principle, during turn-by-turn navigation, a user unfamiliar with the area wants to hear names that match what they’ll see out the windshield, while a user familiar with the area may prefer names in their own language, but they’ll get along fine with what they see out the windshield too. We don’t have a good mechanism for explicitly indicating the name that’s useful for wayfinding (other than to map the signs themselves, which raises other issues). As a best effort, navigation systems read in name=* and maybe name:pronunciation=*, send it off to a text-to-speech voice for the user’s preferred language, and hope for the best.

Another generally unsolved problem is bilingual labeling. Ironically, the bilingual name=* values in Winnipeg are not optimized for actual bilingual labeling in renderers. OSM Carto just gets “lucky” because it doesn’t even try. But ideally, a user would want to see the name in both their own language and the local language(s), not only for streets but for other features as well. What they don’t really care about is whether French or English comes first on this road sign or any other sign. A renderer would do well to label “English (French)” to an English speaker, “French (English)” to a French speaker, and “Haida (French, English)” to a Haida speaker. “Haida (Franglais)” is cool but generally not as important, while “English (Frenglish)” or “French (Franglais)” would get annoying fast.
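As a rough illustration of that labeling rule, here is a Python sketch that builds the label from the name:<lang> subkeys rather than from name=*. The function name, the `local_languages` parameter, and the sample tags are assumptions made for this example; this is not how any particular renderer is implemented.

```python
def bilingual_label(tags, user_language, local_languages):
    """Build "name in user's language (local-language names)" for one feature."""
    # Treat a missing or empty value the same way.
    primary = tags.get(f"name:{user_language}") or None

    # Collect local-language names, skipping the user's own language and
    # any value that merely repeats a name already shown.
    seen = {primary} if primary else set()
    local = []
    for lang in local_languages:
        if lang == user_language:
            continue
        value = tags.get(f"name:{lang}")
        if value and value not in seen:
            seen.add(value)
            local.append(value)

    if primary is None:
        # No name in the user's language: show the local names on their own,
        # or the plain name tag as a last resort.
        return ", ".join(local) if local else tags.get("name", "")
    return f"{primary} ({', '.join(local)})" if local else primary


avenue = {"name:en": "Niverville Avenue", "name:fr": "Avenue Niverville"}
print(bilingual_label(avenue, "en", ["fr", "en"]))  # Niverville Avenue (Avenue Niverville)
print(bilingual_label(avenue, "fr", ["fr", "en"]))  # Avenue Niverville (Niverville Avenue)
```

Because the label is assembled per user, the order of languages on the physical sign never enters into it.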

To oversimplify this massive thread, mappers south of the border would like to adopt a somewhat more predictable approach: multiple values in name=* separated by semicolons. This would allow a navigation application to announce an upcoming turn based on the part of the sign that the user will read, without reading localized names that aren’t as useful for wayfinding. It would also allow a sophisticated renderer to avoid the Frenglish. This approach already has more software support than traditional approaches.^[1] We care enough about this that we’ve made our own renderer and are happy to help other communities adapt it to their needs.
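To make the semicolon idea concrete, here is a small Python sketch of how a data consumer might split such a value and pick the part matching the user's language. The helper names are invented for this example, and the sketch does not handle names that themselves contain a semicolon.

```python
def split_names(name_value):
    """Split a semicolon-delimited name=* value into individual names."""
    return [part.strip() for part in name_value.split(";") if part.strip()]


def name_for_language(tags, language):
    """Pick the name=* component that matches name:<language>, if any."""
    candidates = split_names(tags.get("name", ""))
    localized = tags.get(f"name:{language}")
    if localized in candidates:
        return localized
    # No match: fall back to the first listed value, if there is one.
    return candidates[0] if candidates else None


# Illustrative tags using a Winnipeg street name discussed in this thread.
tags = {
    "name": "Avenue Niverville;Niverville Avenue",
    "name:fr": "Avenue Niverville",
    "name:en": "Niverville Avenue",
}
print(split_names(tags["name"]))
print(name_for_language(tags, "en"))  # Niverville Avenue
```

A navigation application could announce only the selected component, and a renderer could lay out the components however it likes instead of printing the raw value.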

Unfortunately, the U.S. doesn’t have a critical mass of localities where the authorities are as committed to bilingualism as in Canada, so we’re looking to the Canadian community for a bit of leadership. Our rare opportunities to introduce bilingual names don’t last very long because they stand out so much from a database consistency perspective. What usually happens is that a mapper from abroad gets annoyed by the semicolon in OSM Carto and changes it to their locally preferred slash or dash or space, and then a local mapper gets annoyed by that choice and removes the non-English name altogether. Canada does have a critical mass of bilingual areas; if the Canadian community would support our desire for the semicolon as a delimiter, at least when hybrid naming isn’t possible, then we would have more of a leg to stand on.

One challenge we have in common is that many indigenous/First Nations territories and places are proudly bilingual, but a lone mapper exuberantly added indigenous names to these features throughout the continent using slashes and so far has refused to engage with other mappers about this typographical choice. As funny as Franglais can sound to the untrained ear, we might be able to change this practice more easily because it’s a completely arbitrary choice, without the historical and cultural considerations around hybrid street naming.

[1] Except that OSM Carto still has outsized ~~influence in the Electoral College~~ mindshare among mappers.

Jarek · November 6, 2024, 7:42pm

I do think that having a Canadian-specific OSM map (e.g. at openstreetmap.ca or similar) would be a great idea, and could help many of our current tagging dilemmas, including the one being discussed here.

But I also realize it would be a great deal of work, especially considering the relatively small size of our Canadian OSM community; and on the other hand, having a non-Canadian group run it for us doesn’t seem like a great idea either.

Minh_Nguyen · November 6, 2024, 8:40pm

Oh yes, it would be something that one of you would “run”. But to be clear, this “renderer” is really just a fancy webpage built around a stylesheet. For certain classes of customizations, you might not need to maintain anything more than a few lines of JavaScript code in a GitHub repository that gets published to GitHub Pages. We can go over that in more detail in a separate thread, if you’re interested. Meanwhile, I think one positive outcome of this thread would be to come to an agreement about what to do when the names can’t be hybridized as in Winnipeg.

hoserab · November 7, 2024, 12:02am

Okay, I get your point, I really do, but why I keep harping on this is:

That’s no bueno. That’s not an acceptable compromise. You acknowledge that…

… but you don’t want to acknowledge that name=* is the database entry which is used to represent the primary name of things, and so to put the English name in there and exclude the French name from that field gives the English name primacy. Flawed as that data model may be, it is what it is. Your solution makes the English name more preeminent, more important, and that is contrary to the legal status, to the on-the-ground status, and is something of a slap in the face to the francophones living there.

So, again, knowing that name=Niverville Avenue isn’t a workable solution, what would you propose instead? name=Avenue Niverville;Niverville Avenue?

aseigo · November 7, 2024, 7:21am

I’ve responded to each reason presented for the status quo, and offered reasons why they are spurious. It would be nice to have a defensible reason for having non-extant names in the name= tag fields that goes beyond “it is how it was done”.

Declaring it “no bueno” is not an argument, it’s a refusal to engage in the form of ultimatum. If there is not a solid reason for how it is being done, that ought to be a signal that it can be improved.

In fact, it has been noted that the current approach has multiple drawbacks and shortcomings. “No bueno” is not a justification for living with them.

Can we discuss solutions instead of making declarations?

I specifically addressed (excuse the pun?) that already, actually!

name= is insufficient for multilingual situations, and there are well-known, proven methods for localization.

Which means OSM could indeed stick to the name=-is-canonical mistake and remain sub-par, or we could discuss how to improve the situation.

OSM data is not bound in such a fashion to Canadian law.
This does not address e.g. First Nations languages.
“More important” is a value judgment being made on a dataset which is open to definition. “name=” isn’t a law of the universe, it’s a schema choice.
It ignores the fact that bodging in names like this makes features such as routing unnecessarily poor (do you use multi-lingual routing? I do, and it’s pretty apparent when the data or the software is “doing it wrong”). So implicit in your “no pre-eminence” argument is an argument for a worse dataset, something you seem intent on not addressing.

Sorry, but I don’t know that.

You are asserting that is the case, but have yet to offer a convincing reason. The reasons offered are treating “name=” as if it were physical, static signage, which it is not. It is a dataset that only has any meaning or presence once parsed and rendered.

Let’s not confuse data schema with presentation.

Of the poor choices that are available in abusing a single field for multiple names, I agree that using the semi-colon for a separator is the least worst. It follows what house numbers do, and is a common separator in the dataset in general.

This is indeed also better than what is there, as it at least acknowledges the actual names rather than pretend “Avenue Niverville Avenue” is an extant name, which it is not.

It is, however, still an answer with multiple outstanding problems. None of which you appear willing to consider. I look at this and think:

How does that show up in renderers, both visual and audio?

Why is the French name first, doesn’t that grant it “primacy”? How do we choose order, since appearance in the dataset apparently matters?

How do we enforce this in the schema so that there isn’t drift over time, or should policies change in e.g. presentation we can “flip a switch” and adapt the data to it?

How does this extend to additional languages, such as the First Nations language issue? Do we just tack on another one? Where does it go in the list? What is even the point of the name:<language code> fields in that case?

What is becoming clearer to me in this conversation is that schema dogmatism in the form of e.g. “name= has political implications and consequences” is a blocker for some, to the detriment of the quality of data in the OSM database.

This is perhaps an issue that should be taken up in a more global context within the OSM community, with a focus on localization, usability, and renderer appropriateness rather than regional language politics.

I’ll look for a more appropriate place for this discussion, and revisit the issue of street names in Canadian cities should a workable technical solution be agreed upon.

edit: I keep forgetting to include this in my replies, and instead of spamming this thread even harder I’ll just include it here: “In some regions that have multiple official languages, guidance instructions will include street names and destinations in multiple languages, which is verbose and usually undesirable.” This is from https://wiki.openstreetmap.org/wiki/Multilingual_names … it then goes on to note that the name:<language-code> tags are (part of) the solution (with renderer support being the other part).

This leaves the issue of “default rendering preference” for multi-lingual locations open, for which I am drafting a proposed improvement for comment by the OSM community.

Jarek · November 7, 2024, 2:59pm

We can do both.

I think we need a community consensus on the format to use in name now - because it’s still used, and because empirically mappers find it important right now and put various suboptimal formats there (which is why the thread was created in the first place!). I don’t think putting our heads in the sand and saying “well they shouldn’t do that” when seeing “Rue York Street” is a workable solution.

I think we should also discuss how to improve this, which might well be “use name:<lang> and create better tooling”, which will take time, which is fine.

Look, I’m sorry to keep banging on about New Brunswick, but it is a practically (as well as legally) bilingual region. You might be able to say that English should have primary status in St. Boniface because Winnipeg and Manitoba are majority English-speaking, but I don’t see how you can assert that one language should be primary over the other in New Brunswick. So we need an agreed solution for name for bilingual names now – while we don’t have tooling that would let us only use name:<lang>. So I keep on asking - what is your best solution for New Brunswick now? Semicolon? With which language first?

Are there, practically, any locations in Canada with commonly-used names in three or more languages?

Minh_Nguyen · November 7, 2024, 3:14pm

I guess that depends on one’s perspective:

Jarek · November 7, 2024, 3:24pm

Do you believe 唐人街 (“Chinatown”) is the Chinese street name for that part of Somerset Street, rather than perhaps for the place=neighbourhood?

Based on its Wikidata I would also suggest that the neighbourhood itself (Ottawa Chinatown) doesn’t have a commonly-used French name. Montreal’s does, though there the English isn’t official, and I would hesitate before attempting to label that neighbourhood with three languages in name…

Minh_Nguyen · November 7, 2024, 3:50pm

Haha, I should’ve chosen a better example. My point is indeed that we’re focusing too much on street names and addresses and not enough on other kinds of features: places, buildings, POIs, rivers. (Chinese street name signs do occur in other places, such as Calgary’s Chinatown, but I’ve refrained from including them in name=* because I’m not familiar enough with the area to determine whether it’s functional or decorative.)

Plenty of data consumers already know what to do with semicolons in names. If we’re concerned that the legion of diehard fans of OSM Carto would summarily undo semicolons en masse in spite of a community consensus here, then rest assured that at least Nominatim, GraphHopper, and Valhalla all handle semicolons gracefully – all three data consumers are featured on osm.org alongside OSM Carto.

phodgkin · November 7, 2024, 4:06pm

There’s no policy that OSM Carto won’t ever reformat semi-colons. Sure, it’s hard work to make changes. More can be added to the discussion here.

Minh_Nguyen · November 7, 2024, 4:15pm

Right, the concern expressed earlier was that changes to the data are somehow blocked on OSM Carto, whereas the developers of that renderer usually say they expect the opposite, in order to avoid forcing a tagging solution on mappers.

Jarek · November 7, 2024, 4:23pm

Can you point to some examples where a local community accepted a suboptimal rendering in carto in order to change tagging, was able to maintain this tagging for an extended period of time, and carto then made changes to support the new tagging?

Or is this an “it’s possible in theory so you should do that” argument?

If a new scheme were to be accepted, who would volunteer to patrol street names in Canada to make sure they are using this new scheme?

Or, who will volunteer to operate and maintain a non-carto map website for Canada?

Minh_Nguyen · November 7, 2024, 4:42pm

Despite having filed that OSM Carto feature request, I’m not holding my breath for it to make any changes regarding name labeling, for various reasons. Life’s too short.

One of the participants in this thread was eager to change all the street names in Winnipeg to English based on their perception of the output of a navigation application (unsure which one), not even OSM Carto. Let’s say they had done so across New Brunswick instead – would you not have noticed and reverted it or escalated it to the forum? Granted, making a sweeping change in the first place would require some effort. But it’s premature to even discuss this before there’s agreement on what the format should be ideally.

If you’re open to using OSM Americana en français as a starting point, then someone just needs to bonk the Fork button on this repository and configure a free GitHub Pages subdomain. This takes about 10 minutes for the basics, before considering other things from the community’s wishlist. I’d be happy to walk anyone through the process.

Jarek · November 7, 2024, 5:33pm

I will give it a shot, for the sake of doing something that might help here. (I hope you don’t mind if I rename the fork…) I do appreciate Americana for roads - maybe it can help us avoid the problem of whether ref is ON 8 or 8.

I’m not sure its current state, like not rendering any footways or paths, makes it a decent general-purpose map replacement for cities, where footbridges are pretty important parts of the map. But I’ll try.

Minh_Nguyen · November 7, 2024, 5:55pm

I appreciate your willingness to work constructively toward a solution. You can name it anything you like.

OSM Americana itself is indeed a work in progress. (Is there any other kind?) The project also welcomes contributions regardless of where you’re from. That could be another option if you’re worried about taking on long-term responsibility for a fork. We may have a bias toward a certain visual presentation, but our two countries aren’t very dissimilar in terms of cartographic conventions anyways.

phodgkin · November 7, 2024, 6:17pm

Sure, there’s a huge chicken and egg problem with proposing new tagging / a different interpretation of tags. Carto has too many conflicting goals to ever be popular - I’m not sure it has fans!

It tries to be a world-wide style, but it’s impossible for an individual deployment of the style to suit everywhere, and hence it pleases nobody.

But it would be pretty trivial to adjust the handling of name on import and deploy a regionally-tweaked fork of Carto. This might even be possible on GitHub Pages?

That said, OSM Americana might be a better starting point stylistically. Vector styles will always have an advantage for multi-lingual support.

SomeoneElse · November 7, 2024, 6:52pm

If a map developer has chosen that value over name:xx tags (where xx is chosen based on the user’s language or based on the location of the POI) that’s a choice they’ve made. In multilingual areas, name is basically just “some sort of label” in a format that local mappers have chosen: “-” as a separator in Belgium, “/” elsewhere and “;” in other places.

It’d be nice if we could change OSM from having “just one, assumed to be the most important, name field”, but realistically that’s never going to happen. Any developer who wants a sensible name in a particular language should use name:xx fields first, and then fall back to name if that is not available. For some map technologies that might be difficult, but it doesn’t stop it from being a goal.

Jarek · November 7, 2024, 7:14pm

The fork is now live at OpenStreetMap Canadiana with a few quick changes: recentring on Manitoba, changing the HTML <title>, and adding some indigenous languages to the language selector UI.

I will endeavour to keep it reasonably up to date with Americana for at least a few months. Please don’t share the URL widely as we would want to change the URL if it turns out to be useful. I will probably start a separate thread here for any feedback or change requests.

It’s looking good!

Now we can decide what format(s) to use for name for data users that don’t support name:<lang>

Indeed, for starting a new-ish project in 2024 it seems obvious to start with vector styles.