Multiple delimited names in the name tag

To be clear, the focus of this discussion is the name key, which is not in a single language worldwide. This discussion arose because even a map that uses name:* keys can achieve an extra bit of sophistication by including the name(s) in the local language(s).

If we suppose that a renderer is already showing the name in one or more of the user’s preferred languages, all that’s left is to show any remaining names in the local languages. Maybe the renderer can pull in some external comprehensive source of what’s spoken or signposted in every locality, but a very tempting simpler alternative is to just use name. What a renderer can do with name depends on how its values are delimited:

  • If the values are separated by an arbitrary, human-readable delimiter, then it can only display the entire name tag verbatim, potentially repeating names that have already been listed.
  • If the values are separated by a predictable semicolon, then it can display only the names that haven’t been listed yet. It can also apply some nice punctuation or spacing between the names.
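The difference between the two cases above can be sketched roughly as follows. This is a minimal illustration, not any renderer's actual code: `build_label` is a hypothetical helper, and the tag values are made up for the example rather than copied from the database.

```python
def build_label(tags, preferred_langs):
    """Return the names to show: the user's preferred-language names first,
    then any names from a semicolon-delimited name tag not yet listed."""
    shown = []
    for lang in preferred_langs:
        value = tags.get(f"name:{lang}")
        if value and value not in shown:
            shown.append(value)
    # Treat name as a semicolon-delimited list and skip exact repeats.
    for local in tags.get("name", "").split(";"):
        local = local.strip()
        if local and local not in shown:
            shown.append(local)
    return shown


# Hypothetical tagging with a semicolon: the repeated name is dropped.
semicolon_tags = {"name": "Bolzano;Bozen", "name:en": "Bolzano"}
print(build_label(semicolon_tags, ["en"]))  # ['Bolzano', 'Bozen']

# Hypothetical tagging with a human-readable delimiter: the whole value
# is kept verbatim, so "Bolzano" appears twice in the label.
dash_tags = {"name": "Bolzano - Bozen", "name:en": "Bolzano"}
print(build_label(dash_tags, ["en"]))  # ['Bolzano', 'Bolzano - Bozen']
```

The renderer is then free to join the returned list with whatever punctuation or line breaks suit its style.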

Note that names may overlap between unrelated languages. It’s “Bolzano” to both Italian and English, so an English speaker would want the Italian name to be omitted. No matter how important Italian is locally, its name for the city is redundant to the English name. At this point, we don’t even need to know that it’s Italian, just that this seven-character-long name has already appeared earlier in the label.

I would caution against relying too heavily on user preferences. Let’s consider your ideal desired behavior again:

Is there an example of an online map (doesn’t have to be OSM-based) that implements such complex fallbacks dynamically based on user preferences alone? What does the form look like to specify your preferences? Most users would not want to fill out a questionnaire just to see a map.

In reality, some language fallbacks are handled automatically for the user based on a very simple user preference. OSM Americana currently implements the ICU locale fallback algorithm, which is the bare minimum needed to make browser preferences align with the data in the map tiles. Users don’t need to explicitly add en as a fallback to en-US because Americana knows to strip off the region code.
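As a rough sketch of the idea of stripping off the region code, the following truncates a locale identifier one subtag at a time. It is a simplification for illustration only: it is neither the full ICU algorithm nor Americana's actual implementation, and `fallback_chain` and `localized_name` are hypothetical names.

```python
def fallback_chain(locale):
    """Return progressively less specific locales, e.g. en-US, then en."""
    parts = locale.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def localized_name(tags, locales):
    """Pick the first name:* tag matching the user's locales or their
    truncated fallbacks; fall back to the plain name tag."""
    for locale in locales:
        for candidate in fallback_chain(locale):
            value = tags.get(f"name:{candidate}")
            if value:
                return value
    return tags.get("name")


print(fallback_chain("en-US"))  # ['en-US', 'en']
print(localized_name({"name": "München", "name:en": "Munich"}, ["en-US"]))  # Munich
```

Simple truncation like this cannot express cross-language or cross-script fallbacks, which is where the more nuanced matching algorithms come in.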

If you speak a language like Serbian, you’d probably appreciate the more nuanced fallbacks in the CLDR language matching algorithm or MediaWiki’s homegrown alternative:

Typical UIs use English as a last resort fallback, but for maps it would be better to use the local language, hence the focus on name. Maybe the behavior could vary depending on the country you’re looking at. This crosses the boundary into what needs to be implemented server-side, where there’s less ability to respond to dynamic user preferences. But you know, OSM Americana is easy to fork – a Croatia-focused style can afford to hard-code some assumptions about its users’ language skills. Americana tries to avoid making assumptions because the U.S. is such a multilingual country.

The most sophisticated fallback strategies are difficult to implement, but internationalization is never an all-or-nothing affair. For a renderer unable to implement language identification and language-aware transliteration, presenting name is not a terrible alternative. The only catch is that it can contain multiple names separated by one of several punctuation characters that can reasonably appear inside a name too.


That is outstanding, Minh, thank you. The fallback chains diagram is most informative.

I think what might be happening at a LOT of “levels” of this discussion simultaneously is questioning where (or there being multiple questions of where) the fallback decisions happen. People have thought about this way more than me, for sure.

One thing that hasn’t been discussed is how much what OSM does is “seen as more-standard” behavior. In some sense, what we say and do now could seriously influence how things go forward. (Like that isn’t true a fair bit already).

Minh really nails it with fork-ability and how something “already somewhat like what YOU might like” isn’t terribly far away or impossible. It’s a “use your words,” (spec it out) and implement chain.

I love what I’m seeing here: excellent words (and even diagrams).

  • If the values are separated by an arbitrary, human-readable delimiter, then it can only display the entire name tag verbatim, potentially repeating names that have already been listed.
  • If the values are separated by a predictable semicolon, then it can display only the names that haven’t been listed yet. It can also apply some nice punctuation or spacing between the names.

Currently, the separators “ / ” and “ - ” are used; that’s not any possible arbitrary delimiter, just two alternatives.

Note that names may overlap between unrelated languages. It’s “Bolzano” to both Italian and English, so an English speaker would want the Italian name to be omitted.

A simple substring check would do it. It doesn’t matter to the English reader whether the “Bolzano” she sees is meant to be Italian or English; it is the same.

No matter how important Italian is locally, its name for the city is redundant to the English name. At this point, we don’t even need to know that it’s Italian, just that this seven-character-long name has already appeared earlier in the label.

Exactly, and hence we don’t want to repeat it. We do not really need to switch from “ - ” to semicolons to omit the localized string if it is already contained in the local label.

Here I would be careful, as this might be part of real names (depending how the mappers wrote it).


That’s why it is important not to confuse the map renderer with extra characters used only to create a list of names. Individual translations need to be in their respective name:LANG=value tags so the renderer can determine how to display the correct place names.

If only it were so simple. We’ve already established that both the slash and the hyphen are ambiguous because they aren’t only used as delimiters. Moreover, in some places like Morocco, Hong Kong, and Jerusalem, the delimiter is just a space. Am I expected to replace the semicolon with a space when mapping a bilingual Chinese–English POI elsewhere in the world?

To reiterate, a substring match would be too naïve, especially if a mere space can be a delimiter between names. “Milan (o)”? “Habana (La)”?
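To make the failure mode concrete, here is a small sketch of what a substring-based approach might do. Both functions are hypothetical, and the localized values passed in are assumptions chosen only to reproduce the two labels quoted above.

```python
def strip_substring(localized, name):
    """Naive approach: remove the localized name from the name tag, tidy up
    leftover spaces, hyphens and slashes, and show the remainder in parentheses."""
    remainder = name.replace(localized, "", 1).strip(" -/")
    return f"{localized} ({remainder})" if remainder else localized


def already_listed(localized, name):
    """Semicolon approach: exact comparison against each listed name."""
    return localized in [part.strip() for part in name.split(";")]


print(strip_substring("Bolzano", "Bolzano - Bozen"))  # Bolzano (Bozen)
print(strip_substring("Milan", "Milano"))             # Milan (o)
print(strip_substring("Habana", "La Habana"))         # Habana (La)
print(already_listed("Milan", "Milano"))              # False
```

The first case looks fine, but the other two show a fragment of a different name being mistaken for a separate name, which an exact comparison against a delimited list avoids.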

But let’s suppose we ignore the pesky African and Asian languages and cater to only the European languages’ delimiters. What is the order of precedence for these features, which are just the tip of the iceberg?

Of course name:* tags are important. However, this discussion presupposes that there are situations where, despite these name:* tags, there’s still a need to place multiple names in name, due to multilingualism, a geopolitical dispute, or some other intentional ambiguity. I agree that delimiters can be confusing, which is why I’m advocating for the one delimiter that causes the least confusion.

It’s not as if this is an unsolved problem. The semicolon already works in a lot of software. However, it doesn’t look pretty in openstreetmap-carto, so mappers are incentivized to preserve the status quo from more than a decade ago when openstreetmap-carto’s labeling represented the state of the art.


Unfortunately, until OSM Carto shows that it is capable of working with OSM data as it is now and not how it was 10 years ago, it isn’t going to be able to be part of a sensible discussion going forward. For the most part, it’s not a technical restriction; it’s essentially a social one within the project.


Precisely. Let us (OSM’s “good conscience” going forward) be the sensible discussion. Let Carto be Carto, which might “pick up the ball and run with it” eventually, but for now appears to be a blot on wider “social understanding” of improving this relatively minor, quite solvable problem in our data.

If a single (and the “most popular,” even as it is a sort of “front door” to OSM data) renderer is the source of confusion or dislike of the current toolchain, let’s thank @SomeoneElse for pointing this out, encourage semicolon to be the delimiter of choice (perhaps so it eventually becomes “the standardized” delimiter), know that some are going to “hold their nose” as they see this (incorrectly?) render in Carto, and move ahead with much better data in our map. Renderers will catch up with well-defined / well-structured data. Or, at least, they should.

Tag. Tag well. And (does it need to be said again?!), to the extent you can, don’t tag for renderers!


Yes. In AU and NZ we have dual named geographic features that have “/” as part of the name.

What is the order of precedence for these features, which are just the tip of the iceberg?

Are we sure this is correctly tagged? It seems like there are two English and French names in the name tag; eventually one has to go in alt_name. Why are English and French “split”? How can it be solved with a semicolon?

with semicolon, how would you do it here?

also how is this written with semicolons?

The region has three officially sanctioned names: name_it, name_de and name_lld. Lld is too minor a minority, so dropped. So concatenation of name_it;name_de remains.

The reality is more complicated. First, in this case, if the user agent is in an English locale, name_en is neither name_it nor name_de but a mix of both (it just copies name), so deduplication will fail. Perhaps because the region does not actually have a proper English name? Much like Bolzano/Bozen, where the value in name_en only says that in the US/UK/? the city is commonly referred to by its Italian name instead of its German name? Unlike e.g. Munich or Vienna, which have an original English name.

Second problem, if the user agent is in e.g. French locale, how to construct an unbiased name from name_it;name_de, and how to know, that it should be based on it/de?

These would be Dover - Calais;Douvres - Calais and Trentino-Alto Adige;Trentino-Südtirol, respectively, but I’m not necessarily advocating for the use of a semicolon in these cases. Rather, I’m pointing out that the absence of semicolons here makes semicolons necessary in other situations.

These are examples of customary combinations familiar to a local community that can’t easily be derived by concatenating name:en with name:fr or name:it with name:de. As we discussed earlier, name is a fine key for such shorthand, which inevitably includes dashes and slashes. But as long as name is used for this purpose, then the other purpose of displaying a rote list of names must use a different delimiter if a data consumer is to recognize it as a list. Note that Trentino-Alto Adige/Südtirol is one of the regions that, according to the wiki, ostensibly uses a dash as the delimiter between two languages’ names, but here we can see the reality is not as simple.

Given that that’s a ferry route between England and France, perhaps use “Dover” (English) for the English end and “Calais” (French) for the French end? 🙂 **

** with apologies to whoever wrote that gag for Spitting Image in the 1980s

This wasn’t exactly my point. Delimiters like “/” or “-” are common in several areas as part of a single-language name, so asking software to split on them is not possible without a lot of mistakes. So if multiple names have to be listed with equal priority in the data (for whatever reason local mappers have), there is definitely a need for another delimiter.
In OSM the most common one is “;”, which additionally seems to be unproblematic for real-world names.
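A quick sketch of why splitting on dashes and slashes goes wrong, using the semicolon-delimited values suggested earlier in the thread as input. The function names are hypothetical, and the regular expression only covers the two space-padded separators mentioned above.

```python
import re


def split_heuristic(name):
    """Guess at a list by splitting on space-padded dashes and slashes."""
    return re.split(r" - | / ", name)


def split_semicolon(name):
    """Split only on the semicolon, leaving dashes and slashes inside names."""
    return [part.strip() for part in name.split(";") if part.strip()]


route = "Dover - Calais;Douvres - Calais"
print(split_heuristic(route))  # ['Dover', 'Calais;Douvres', 'Calais']
print(split_semicolon(route))  # ['Dover - Calais', 'Douvres - Calais']

region = "Trentino-Alto Adige;Trentino-Südtirol"
print(split_semicolon(region))  # ['Trentino-Alto Adige', 'Trentino-Südtirol']
```

The heuristic tears the route name apart at the dash that belongs to it, while the semicolon split leaves each name intact.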

Listing values in the name tag for equally important names of an object seems to be the consensus, based on actual usage in such areas. So I think this is nothing we need to talk about. What does need to be discussed is how to make these listed names machine-readable.


This prompts me to spell out my intermediate summary of this topic: It is not about displaying names, but it is about labelling stuff in a way to make map users feel at home.

Which directly would lead me to propose a new tag name_label, which is a format string, eg. in the case of Bolzano/Bozen this one might work out fine:

<name_user-agent-locale><consumer-delimiter-global-start><name_it>?\consumer-delimiter-local><?name_de>?<consumer-delimiter-end>

where <> marks placeholders and the question mark means: only shown when name_user-agent-locale=name_it|de, or respectively (in the case of the hyphen) not shown if there is a match.

I wrote “labeling” as shorthand, because openstreetmap-carto happens to only label features’ names, which is quite reasonable. It just happens to consume name verbatim, which is unfortunate. Other renderers, and indeed other kinds of data consumers, may have reasons to label things based on other name keys or other non-name-related keys.

For example, many navigation applications ignore a motorway’s name entirely in favor of a route number to avoid longwinded instructions. Or consider that a map may want to append some of its own explanatory text to a road label as an alternative to introducing yet another confusing color or dash pattern for roads. Obviously, glosses such as “(under construction)” and “(closed)” shouldn’t be hard-coded in any name tag in OSM.

Due to this diversity, I don’t think mappers have enough context to explicitly tag what a data consumer should display, only enough to tag what is true about the feature. One of those facts can be the feature’s “native name(s)”.

If I understand your proposal correctly, this name_label tag would clarify what’s in each part of name. This is not far from the language_format key that @imagico informally proposed back in 2017. However, I don’t think either that simpler syntax or your more complex syntax would be worth the effort for just allowing a renderer to append a native name onto a localized name without duplication. Christoph was trying to solve other problems at the same time, such as language-aware font selection.

Any kind of name metadata key would run into the same problem that no one would want to repeat this information on every individual feature, yet regional defaults are both unenforceable (as we’ve seen in South Tyrol) and impractical for data consumers.

In my opinion, the only suggestion so far that adheres to the KISS principle is to adopt the semicolon delimiter more broadly. But even that is beyond my ambitions: I would just like mappers to be able to use the semicolon without fearing criticism for making the map look ugly. This incremental improvement wouldn’t in any way preclude a more comprehensive solution in the future.


Great that you are not advocating for the semicolon in this case. We already have the individual language names for these, and the term “Trentino-Alto Adige/Südtirol” is a common form for the region; it is even part of the Italian constitution, Art. 116 (La Costituzione - Articolo 116 | Senato della Repubblica):

La Regione Trentino-Alto Adige/Südtirol è costituita dalle Province autonome di Trento e di Bolzano.

note how the region name is “bilingual” while the provinces are named only in Italian (in German it would be Trient and Bozen).

So then we’re in agreement that the vast majority of multilingual name concatenations, which are not specified as such in a constitution, do not strictly need to be invented by mappers using the same punctuation?

Then it sounds like it’s just the local name, rather than multiple delimited names in one field.

In that case, shouldn’t it be the exact same string for the name:it Italian forms?

I am not mapping in an area with such complications, (so I do not personally bother if “the community” decides to do it one way or the other) but I know from discussions that there can be a lot of tension around language and names in these areas, and that the status quo is the result of years of discussion, so I would rather not touch it.

no, as I wrote, it is “a common form”, not the only one, there is a purely Italian version that doesn’t have any “Südtirol” in it (the Italian alphabet doesn’t have an ü)