Multiple delimited names in the name tag

Yes. My problem is that it is done by basically concatenating two strings together. Instead, we should establish a use case where the map renderer is able to display both names correctly while Nominatim knows to treat each as an equally valid name for a particular location.

The issue I see with the user’s preferred language is how you decide what that is. If I am travelling in Europe, I need to see the name on the signs, not the name in English. It is confusing enough when the signs only say Rijsel.

Then there is a recurring problem: Brussels. In my experience, the two names appear separately on the signposts. If my phone had to choose which one to display, which would it choose? In my case French is my second language, but I am sure that is not true for all native English speakers.

You may find this post interesting. It explains how we are showing both translated and local names in the Americana map style. In short we get the translated name from a name:lang tag, or wikidata, but then also display the local name from the name tag.

It is all fine and dandy to show a translated/foreign name in big letters instead of the local name, with the local name in small print, if there is only one local name (in the name tag, not local_name) and the user agent is in neither locale. It fails to deduplicate when the user agent is in one of the locales whose names are string-concatenated with ad hoc delimiters into an OSM “name”. Vector maps are clearly at a disadvantage here compared with tile servers, which do not try to accommodate their users in such a way.

Why? I mean, the local government must have already made a choice, did they not? There are street signs on the ground, right? And on them is some text in some order; that is the text that IMHO should go in the name=* tag. It solves the issue (well, at least shifts the blame :smile:) to the on-the-ground situation. If the city sign:

  • says only “Brussel”, then it will be name=Brussel
  • says “Bruxelles - Brussel”, then it should be name=Bruxelles - Brussel
  • says “Brussel - Bruxelles”, then it should be name=Brussel - Bruxelles
  • says “Brussel” in one line and then below it “Bruxelles” then it should be name=Brussel\nBruxelles (ok mostly kidding about this last one - we should not use newline character in name, but replace it with some common alternative like name=Brussel / Bruxelles)

But it won’t solve the problem for the Carto renderer on osm.org (or other TMS services). To really solve the problem, one requires vector tiles and support for user preferences, and in that case one should probably ignore the name=* tag in most cases (i.e. it should be used only as a last-resort fallback if no other *name:* option exists at all).

(but one should also take care of local community preferences, i.e. https://wiki.openstreetmap.org/wiki/Multilingual_names)

If only it was that simple. In Ireland, in both Derry/Londonderry and Dingle/An Daingean, there are 3 names in 2 languages, and all are to some extent in use. In Ukraine, some streets that might have officially been called one thing a few months ago are now called something else following the Russian invasion. In Italy, in some places there have been OSM name disputes about “how official” some signposts are. I could go on; there are plenty of other examples.

I think this is taking signs too literally. Can a or :fleur_de_lis: be a delimiter?

In the bilingual places I’m familiar with, a delimiter between the names in two languages is just a matter of typographical style or signmakers’ convenience. You’ll encounter inconsistency from one sign to another. If you call up the local government and ask which one is official, they’ll roll their eyes at you and hang up. They’ve got better things to worry about.

We should distinguish between the raw data in OSM and how a data consumer will output something to end users. Indeed, the OSM data model doesn’t support newline characters, and newlines would be problematic inline (when displaying a fully-qualified location name inside a list of search results, or in the title of a webpage), but in many cases, there’s nothing inherently wrong about a renderer displaying the multiple names on different lines if there’s enough space.

There are indeed places where the delimiter is specified officially. In New Zealand, a slash officially and commonly separates the English and Māori names of places such as Aoraki / Mount Cook. But arguably the full name is now “Aoraki / Mount Cook”; many of the name:* tags on this node include a slash.

I think this discussion is about less formal cases where people continue to use one name or the other and the delimiter is not something to get hung up on. Why not let the data consumer figure out the appropriate delimiter based on the local language or an intuitive one based on the user’s preferred language? (After all, in some bilingual regions like Hong Kong, the two languages use different Unicode characters for a slash and different characters for a dash.)

Meanwhile, there’s the use case mentioned above, which I think is underappreciated. If a map intentionally displays the preferred-language name followed by the local-language names, for the benefit of the user, then it needs to avoid repeating a name that both the preferred language and one of the local languages happen to share. This is difficult and unreliable if the delimiter is unpredictable.
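To make the deduplication problem concrete, here is a minimal Python sketch of a data consumer that has to guess the delimiter. The regular expression and function names are hypothetical, not taken from any existing map style: guessing at “ - ” or “ / ” handles one ad hoc tag, misses an en dash, and wrongly splits a single official name that contains a slash.

```python
import re
from typing import List

# Hypothetical heuristic: assume " - " or " / " separates names in a name tag.
GUESSED_DELIMITER = re.compile(r"\s+[-/]\s+")


def guess_split(name: str) -> List[str]:
    """Split a name tag on the delimiters a data consumer might guess at."""
    return [part.strip() for part in GUESSED_DELIMITER.split(name) if part.strip()]


def names_to_show(preferred: str, local: str) -> List[str]:
    """Preferred-language name first, then any guessed local names that differ."""
    return [preferred] + [n for n in guess_split(local) if n != preferred]


print(names_to_show("Bruxelles", "Bruxelles - Brussel"))  # ['Bruxelles', 'Brussel']
print(guess_split("Bruxelles – Brussel"))  # en dash: not split at all
print(guess_split("Aoraki / Mount Cook"))  # one official name, wrongly split in two
```

No heuristic of this kind can tell an ad hoc separator from punctuation that is part of a real name.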

Well, I would still use what is on the street signs on the ground for name=* (ignoring clear vandalism like broken / defaced parts of signs). Other names go in other tags (name:en_GB, name:en_IE, name:ga, int_name, nat_name, reg_name, loc_name, alt_name, …)

In Italy, in some places there have been OSM name disputes about “how official” some signposts are. I could go on; there are plenty of other examples.

Well, those signposts are still on the ground, so they are what the user on the ground will see, and should IMHO thus be mapped as name=*, even if they are perhaps not worthy of official_name=* tag.

But also note that I’ve linked to https://wiki.openstreetmap.org/wiki/Multilingual_names which has specific sections for each local country consensus which should always take priority if it exists (for example, in Croatia we’ve decided to use exclusively Croatian names in name=*, and add name:it=* or name:sr=* etc. - even for areas which have dual-lingual signage), regardless of what global OSM consensus (or lack thereof) might be.

In Ukraine, some streets that might have officially been called one thing a few months ago are now called something else following the Russian invasion.

That’s one of the reasons why we do not edit there currently. See Russian–Ukrainian war - OpenStreetMap Wiki.

Also see OSMF Disputed Territories Information.pdf which says that we should generally use “on the ground” name=* (or at least that is how I read it):

Names
The OpenStreetMap community operates under the “on the ground” principle. If a name appears on the ground, for example a street sign, then that is the preferred name to use since a navigation system that does not use the same names as those that are signposted is just clearly impractical. This is recorded as a “name” in our database and is the one generally used on our main example map.

However, we recognise that different national, ethnic, culture or language groups may utilise a different name. We accommodate this by providing the facility for contributors belonging to different groups to record what they see as the name for a particular feature. For example, “name:en” records the name used by the English-speaking community and “name:es” that employed by the Spanish-speaking community. The Spanish community can then make a map using “name:es” in preference to the generic “name” or “name:en”.

We encourage different groups and communities to use our data, collect and contribute data important to you, and to make your own maps that are harmonious with your general usage, culture and legal system.

I’ve hinted at it before, but to be clear: there will always be people unhappy with whatever name is displayed by default (unless you display all of them, and then there will be even more people unhappy about why you are cluttering the map, or why you are using one order and not the other, or why you are displaying the aggressor’s names by default at all in an attacked country, etc.).

The best we can likely do is to finally implement vector tiles on osm.org (and hope other apps will do that too), and then have user preferences for which names and scripts (name:hr? name:en? name:zh? name:uk? name:ru? loc_name? official_name?) the user prefers to see (and in which order, if they prefer to see several of them).

And even then, I foresee that users won’t be completely happy, as the ideal name-display combination is quite a complex set of preferences. Here is an example of ideal name-display preferences for me:

  • if inside Croatia, prefer name / loc_name / official_name / alt_name. Do not show “/” if one of them is missing (or skip it if it is the same as the others). Fall back to name:hr if both are missing. Fall back to name if there is still no name.
  • if not inside Croatia, show the first of name:hr, name:bs, name:sr-Latn, name:sl, name:en that exists. If none of those exist, take the first of int_name / name:sr / official_name / name / loc_name / name:uk / name:ru, and transliterate it to Latin script if it uses a non-Latin script.
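Purely to show how quickly such preferences turn into code, here is a rough Python sketch of the two rules above. Several things are assumptions: the caller is assumed to already know whether the feature is inside Croatia, transliteration is a hypothetical callable passed in (identity by default) because a real transliterator is out of scope, and Slovenian is taken to be the language code sl.

```python
from typing import Callable, Dict, Iterable, Optional

Tags = Dict[str, str]


def join_distinct(values: Iterable[Optional[str]], sep: str = " / ") -> str:
    """Join the present values, skipping missing ones and exact repeats."""
    kept = []
    for value in values:
        if value and value not in kept:
            kept.append(value)
    return sep.join(kept)


def label_for(
    tags: Tags,
    in_croatia: bool,
    # Hypothetical hook; expected to leave Latin-script text unchanged.
    transliterate: Callable[[str], str] = lambda s: s,
) -> Optional[str]:
    if in_croatia:
        joined = join_distinct(
            tags.get(k) for k in ("name", "loc_name", "official_name", "alt_name")
        )
        return joined or tags.get("name:hr")
    # Outside Croatia: first preferred Latin-script language that is tagged.
    for key in ("name:hr", "name:bs", "name:sr-Latn", "name:sl", "name:en"):
        if tags.get(key):
            return tags[key]
    # Otherwise the first available generic name, passed through transliteration.
    for key in ("int_name", "name:sr", "official_name", "name",
                "loc_name", "name:uk", "name:ru"):
        if tags.get(key):
            return transliterate(tags[key])
    return None
```

Even this simplified version leaves script detection to the hook, which illustrates why a set of checkboxes would struggle to express it.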

A map allowing me to set preferences of that level of complexity would probably make me mostly happy with the displayed names (although I’m quite sure I’d find quite a few tweaks to make if it were implemented).

I do not see any reasonable combination of checkboxes that would allow me to express those preferences; instead it would likely need to be some small snippet of personalized code.

Or (more likely in the osm.org map case) it would be extremely simplified (e.g. just an ordered list of preferred languages: “hr”, “bs”, “sl”, “sr-Latn”, “en”), which, if implemented, would require an automated mass edit to copy name to name:hr wherever it is missing in Croatia, for the map to be at least somewhat more usable to the average Croatian than in the current situation.
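The simplified alternative is easy to sketch in Python, and the sketch also shows why that mass edit would be needed. The tag values below are hypothetical: with the preference list “hr”, “bs”, “sl”, “sr-Latn”, “en”, a feature that has name and name:en but no name:hr ends up labelled in English.

```python
from typing import Dict, List, Optional


def pick_name(tags: Dict[str, str], preferred_languages: List[str]) -> Optional[str]:
    """Return name:<lang> for the first preferred language present, else name."""
    for lang in preferred_languages:
        value = tags.get(f"name:{lang}")
        if value:
            return value
    return tags.get("name")


PREFS = ["hr", "bs", "sl", "sr-Latn", "en"]

# Hypothetical feature in Croatia without name:hr: the English name wins.
lakes = {"name": "Plitvička jezera", "name:en": "Plitvice Lakes"}
print(pick_name(lakes, PREFS))  # Plitvice Lakes

# After copying name to name:hr, the Croatian name is shown as intended.
lakes["name:hr"] = lakes["name"]
print(pick_name(lakes, PREFS))  # Plitvička jezera
```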

In theory (but so far only in theory), default_language would obviate the need for explicitly tagging name:* in the local language. In reality, OpenMapTiles and Mapbox Streets backfill translated names from Wikidata labels, papering over the lack of a name:* tag for the local language. When your language doesn’t have its own name for something, Wikidata gladly accepts a transliteration as your language’s label for it. This probably isn’t a solution for osm.org, which is naturally more focused on the OSM database as it is.

But also note that I’ve linked to Multilingual names - OpenStreetMap Wiki which has specific sections for each local country consensus which should always take priority if it exists (for example, in Croatia we’ve decided to use exclusively Croatian names in name=*, and add name:it=* or name:sr=* etc. - even for areas which have dual-lingual signage), regardless of what global OSM consensus (or lack thereof) might be.

This is the “majority” approach; in the case of multiple signed names it is IMHO a very poor representation, because it neglects the influence of other local (“minority”) languages, which is big enough for them to make it onto street signs.

IMHO we should try to represent minority language names with the prominence corresponding to their local relevance.

You wrote: “local country consensus” but I believe it must be “local consensus”, “country” is not the correct scale for this, and the fact that you also write about “areas which have dual-lingual signage” (implying some have it and others not) leads me to believe it should be the people in those areas to decide individually for every place.

In general I agree with you, but in the case of multilingual signs it’s not that easy.
The delimiter might be different on different signs or might be replaced with something like a picture or logo.

There are three options for adding it to our database:

  1. Mapper defines delimiter (currently used)
  2. mapping multiple names with a defined delimiter: the data user can define the display delimiter based on their users’ cultural background/preferences or for other reasons, or local mappers could add a local_delimiter=* to enable data users to display the name as is common locally.
  3. mapping no name, instead only name:<lang> and add a separate default_language=* with a ;-separated list of local languages to be used.
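A minimal Python sketch of option 2, assuming the defined delimiter in the data is “;” and using the local_delimiter=* key proposed here (it is not an established tag, so treat the key and its handling as hypothetical). The data consumer picks its own display delimiter but can choose to honour the local one.

```python
from typing import Dict, List


def split_names(value: str) -> List[str]:
    """Split a ';'-delimited name value into its individual names."""
    return [part.strip() for part in value.split(";") if part.strip()]


def display_name(
    tags: Dict[str, str],
    consumer_delimiter: str = "/",
    honour_local_delimiter: bool = True,
) -> str:
    """Render all names, joined by the local or the consumer's delimiter."""
    delimiter = consumer_delimiter
    if honour_local_delimiter and tags.get("local_delimiter"):
        delimiter = tags["local_delimiter"]  # hypothetical key from this thread
    # OSM values are normally trimmed, so the surrounding spaces are added here.
    return f" {delimiter} ".join(split_names(tags.get("name", "")))


print(display_name({"name": "Bruxelles;Brussel"}))  # Bruxelles / Brussel
print(display_name({"name": "Bruxelles;Brussel", "local_delimiter": "-"}))  # Bruxelles - Brussel
```

Maps that ignore both the delimiter and local_delimiter would simply show the raw “;”, which is the “a bit more ugly until adjusted” cost mentioned below.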

I would prefer the 2nd over the 3rd, as it won’t break anything; some maps would just get a bit uglier until they are adjusted.

In the case of Derry/Londonderry, as discussed in the previously linked wikipedia article, it varies - depending on when the sign was put up, who put up the sign, and what aspect of the place the sign was about (e.g. council, airport, something else).

In the case of Dingle/An Daingean, the dispute was twofold - it’s officially in a Gaeltacht so names should be Irish only, but some people locally are keen for the well-known-to-tourists English name “Dingle” to also appear; and also as I understand it** the locally preferred Irish name was An Daingean rather than Daingean Uí Chúis.

That’s why it’s useful to have a name that is a “genuine multilingual name” for use by people when people really do use that as a name (people really might use the name “Dingle An Daingean” in speech). It’s only possible to do this if people don’t use name as a dumping ground for “all possible names”. As an example of that, https://www.openstreetmap.org/node/21395759/history is called Aberteifi in one language and Cardigan in another; I’ve never heard it called both as one name*** - but if you look here you’ll see I’ve avoided that errant slash.

Given the worldwide usage of “name” as “all possible names stuffed into one field” as opposed to genuine bilingual names I suspect that there’s no longer any chance to have “name” as the field for “genuine multilingual names”; maybe we need another one?

** this was mostly from speaking to people in pubs. It was not a representative survey :slight_smile:
*** A search for both words finds lots of articles where both proper names appear, but none as part of one single name.

I’m just stating it as it is (this was just the example I’m most familiar with; there are many more examples at the wiki page I linked). I personally would follow local consensus (even if I personally might not think it is the best solution), and lacking that guidance fall back to “everything as written on the street sign on the ground, replacing newlines with a ‘/’ sign”.

If we’re talking about how it would best be resolved long-term globally, I’d tend to abolish the problematic name tag in multilingual places completely (to remove contention, which is a unifying goal IMHO worthier than mere names!), provide vector tiles on osm.org, keep the official name in official_name, keep language-specific ones in name:XX, have the app/user choose their preferences (using their browser preferences as the default unless overridden by a cookie or whatever), and work to have the chosen solution widely supported in apps before breaking the existing one.

But I’m open to any other suggestions! It’s just that the ones I’ve seen so far that do not take user preferences into account (for which vector tiles often seem a prerequisite) do not seem very good to me.

(Yeah, sure, I don’t read Chinese or Arabic letters and I only dabble in Cyrillic, so a huge part of the osm.org map is unreadable to me and other Latin-script-only users, which is certainly quite a bad solution; but the same can be said for all that pesky Latin-alphabet-only USA / EU, which is quite likely equally unintelligible to some Russian or Chinese citizen. And that brokenness is currently a much bigger name-rendering issue than mere multi-name local renderings like the already overly verbose Schweiz / Suisse / Svizzera / Svizra. Now try to imagine also having all of it transliterated into several different alphabets: there wouldn’t be any room left to show anything else on the map!)

So, as an example, mappers in Italian city of Trieste with its Slovene-speaking minorities should be setting its own multilingual policy there (and in surrounding regions), separating themselves from the majority decision of all other Italian mappers setting the policy for the rest of the Italy?

I’m not opposed to the idea, but I don’t think it is something that can be decided globally; rather, it should be decided case by case by that particular country’s community (especially as sub-country rules are likely going to create some headaches for data consumers, unless something like a per-region default_language=* is more widely accepted), so the benefits should outweigh the problems.

Communication is the key IMHO. Trying to force global opinions down the locals’ throats (regardless of whether it is planet->country or country->region) is unlikely to have a net-positive effect. Yes, minorities are definitely a problem, as they are by definition disadvantaged by majority-based democratic decisions (and it is not only visible in name rendering: we’ve had examples with Discourse communities/moderators, and we certainly have a zillion examples with tagging schemas and the Proposal voting process for them, etc., which are all biased against minority opinions).

That’s why I tend towards solutions (like the one I suggest above) which put user preferences at the top, instead of trying to force “one size fits all” solutions for everyone (which are unlikely to fit anyone, and might likely have negative effects of their own).

Good, so perfect candidate to put that Irish-only name in official_name=*, right?

but some people locally are keen for the well-known-to-tourists English name “Dingle” to also appear

Well-known-to-tourists should be int_name=*, right?

and also as I understand it** the locally preferred Irish name was An Daingean rather than Daingean Uí Chúis.

and that one is loc_name=*, right?

Given the worldwide usage of “name” as “all possible names stuffed into one field”

Yeah, I agree the name tag is (ab)used mostly as “text to actually render for those renderers which are too dumb to cater to user-specified preferences due to technical issues”, like the TMS one currently on osm.org.

as opposed to genuine bilingual names I suspect that there’s no longer any chance to have “name” as the field for “genuine multilingual names”; maybe we need another one?

I thought official_name=* was the one we were supposed to use for officially mandated names (including officially mandated multilingual names).

called Aberteifi in one language and Cardigan in another; I’ve never heard it called both as one name*** - but if you look here you’ll see I’ve avoided that errant slash.

Umm, if I look there, I see you avoided rendering not only errant slash, but also avoided all of the other names too except Aberteifi (so it basically renders only name:cy, and ignores name:en, name:ur and name? – was that what I was supposed to see? :smile:)

In my opinion there has to be a global consensus about how to add multilingual names of the same level of importance, and at the local level the mappers need to find agreement on whether the area has a multilingual name.

I agree. Maybe earlier suggested default_language=* ?

About “same level of importance”: that never happens in reality on the ground either. Some name will always be in the preferred position (i.e. first). In the same way, “default_language=hr - it” would indicate that hr is preferred to it in a multilingual name (e.g. instructing the renderer to show name:hr first, followed by a literal “ - ”, followed by name:it).
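As a rough illustration of that instruction, here is a Python sketch. It assumes default_language holds a “;”-separated, ordered list of language codes (as suggested earlier in the thread, rather than the “hr - it” shorthand) and that the renderer chooses the literal separator; neither is an established convention, and the sample tags are hypothetical.

```python
from typing import Dict


def ordered_label(tags: Dict[str, str], separator: str = " - ") -> str:
    """Join name:<lang> values in default_language order, falling back to name."""
    languages = [
        code.strip() for code in tags.get("default_language", "").split(";") if code.strip()
    ]
    names = []
    for lang in languages:
        value = tags.get(f"name:{lang}")
        if value and value not in names:  # skip missing and duplicate names
            names.append(value)
    return separator.join(names) if names else tags.get("name", "")


pula = {"default_language": "hr;it", "name:hr": "Pula", "name:it": "Pola", "name": "Pula"}
print(ordered_label(pula))  # Pula - Pola
```

Swapping the order to “it;hr” would put the Italian name first, so the preferred position lives in one tag instead of in every name value.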

That was exactly what you were supposed to see. “name:cy” and “name:en” are both names that are used there; arguably either would be valid to pick, and I picked the Welsh one. “name:ur” is just a transliteration of Cardigan, and its use in that language (at least according to a handy web search engine) is mostly as the item of clothing. That name doesn’t really belong in OSM at all: if you pitched up in the centre of town and tried to speak exclusively in Urdu, communication might be a challenge. As demonstrated above, a map that wanted to show Urdu names could of course do that via Wikidata translations.

  2. mapping multiple names with a defined delimiter, the data-user can define the delimiter based on their users cultural background/preferences or other reason

we could have a tag to suggest a delimiter.

I apologize in advance for the length, but several blue-sky ideas have been floated so far that I think could benefit from concrete counterexamples.

Any delimiter you like

This is very meta. Pretty soon there will be a need for default_name_separator=; because no one would want to blanket a multilingual region in the same name_separator=; tag over and over again. Nothing would know how to interpret this key today; maybe in another few years’ time?

Meanwhile, the technology already exists to understand what ; means when it occurs in a name tag. If some local communities prefer not to use it for now, that’s their prerogative. But for those communities that are already quietly using a semicolon, it shouldn’t be necessary for them to explicitly indicate that they want name to work just like any other key in OSM, such as destination (which already takes multiple names in the local language).
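For a sense of how little a consumer needs, here is a Python sketch of a generic splitter that treats name exactly like destination. It also treats a doubled “;;” as a literal semicolon; that escaping convention is an assumption here (it is sometimes suggested for OSM values but rarely relied on), and the sample tags are hypothetical.

```python
from typing import List


def split_values(value: str) -> List[str]:
    """Split an OSM multi-value string on ';', treating ';;' as a literal ';'."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == ";" and value[i + 1 : i + 2] == ";":
            current.append(";")  # assumed escape: keep one semicolon in the value
            i += 2
            continue
        if ch == ";":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return [part for part in parts if part]


# The same function serves both keys.
tags = {"name": "Turtlewood Drive;Ngụy Văn Thà", "destination": "Houston;Galveston"}
for key in ("name", "destination"):
    print(key, split_values(tags[key]))
```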

Any order you like

This strikes me as an oversimplification of the reality on the ground. Maybe someday someone will figure out how to use default_language in regions that have a strict system of multilingual names, but some places are just more complex.

To show you where I’m coming from, here are some pretty typical examples from places I’ve visited in the U.S. I’m very curious how folks think default_language would help us determine either a standard name order or a standard separator other than the semicolon that all data consumers already handle in some fashion and some already handle elegantly.

When the City of Houston turned streets such as Turtlewood Drive and Bellaire Boulevard into “Turtlewood Drive :traffic_light: Ngụy Văn Thà” and “Đại Lộ Sàigòn :traffic_light: Bellaire Boulevard”, respectively, they just stuck the Vietnamese-language signs wherever there was enough room for an additional sign:

Some of the English signs are so faded that an English-speaking traveler may need to rely on the Vietnamese signs in some cases. A map could show them something like “Turtlewood Dr. / Ngụy Văn Thà” or “Turtlewood Drive — Ngụy Văn Thà” or “Turtlewood Dr. (Ngụy Văn Thà)” or “Turtlewood Dr.” above the street and “Ngụy Văn Thà” below it. The specific delimiter here only matters to the map style designer. The order of the names in name doesn’t matter much either, because the preferred-language name will come first in any savvy map style or navigation guidance instruction.

The city only dual-named the through streets in this neighborhood, but other things are named differently. Turn 90 degrees clockwise and you’ll see a restaurant whose name is signposted in interleaved English, Chinese, and Vietnamese above some shops that are in English only or Vietnamese only:

The San Francisco Bay Area, where I live, happens to be very linguistically diverse. Many places of worship around me offer services in multiple languages and make every effort to unify their congregations despite a language barrier. This Jehovah’s Witness Kingdom Hall serves both English and Spanish speakers on equal terms. The placement of English to the left of Spanish is purely coincidental. As far as I know, they don’t have a preferred delimiter either.

This supermarket has two signs visible from the street. The logo on the sign in the foreground puts Korean on top of English, while the sign on the façade puts English to the left of Korean:

This doctor’s office posts its English name above and to the left of its Vietnamese name, but I think he mostly serves Vietnamese-speaking patients:

And let’s not forget that sometimes a feature can have multiple names regardless of language. Before it moved earlier this year, this flag and costume store was either “Funhouse/Flaghouse” or “Flaghouse/Funhouse”, depending on whether you looked at the sign on the front or the rear. (Customers typically parked in front and entered around back.) If I recall correctly, the receipt had both names printed on it, separated by the delimiter “***”.

Any language you like

I agree that user preferences matter a lot for rendering, but showing only the user-preferred language isn’t a panacea. OSM Americana’s local-name gloss has a lot of precedent in the American map publishing industry. Here’s a page of a small world atlas I used in school. It’s designed for students in geography class, so it’s representative of more serious reference works like those by the National Geographic Society:

Rome is “Rome (Roma)” and Naples is “Naples (Napoli)”. Wherever an anglicized name matches the local name minus diacritical marks, it restores the diacritics, as in “València” for Valencia. The only novel aspect of Americana’s language support is that it automatically chooses the main language based on your individual preference instead of making you buy a separate copy from the bookstore. But otherwise it’s a conservative approach that doesn’t necessarily open the floodgates to the complicated fallback preferences suggested earlier.
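The atlas rules described above can be sketched in a few lines of Python. This only approximates the printed atlas’s behaviour and is not Americana’s code: it glosses the local name in parentheses, skips it when identical, and prefers the accented local spelling when the two names differ only in diacritics (compared via Unicode NFD decomposition).

```python
import unicodedata
from typing import Optional


def strip_diacritics(text: str) -> str:
    """Drop combining marks after NFD decomposition ('València' -> 'Valencia')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def atlas_label(preferred: str, local: Optional[str]) -> str:
    if not local or local == preferred:
        return preferred
    if strip_diacritics(local) == strip_diacritics(preferred):
        return local  # same name apart from accents: restore the diacritics
    return f"{preferred} ({local})"


print(atlas_label("Rome", "Roma"))          # Rome (Roma)
print(atlas_label("Valencia", "València"))  # València
```

Note that the equality check only works if the local value is a single name; with several names concatenated ad hoc, there is nothing reliable to compare against.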

There are a couple of things this atlas does that Americana can’t currently do based on OSM data. It avoids repeating a name just because English and one of the local languages happen to agree on a name. Americana also can’t automatically transliterate local names into Latin script for readability. However, that’s more of a problem to solve outside of OSM, since for example English and German require different transliteration systems for the same source language.

As long as it’s not Japanese

Anything that concatenates two arbitrary languages’ names will run into situations like this. Perhaps the most complicated example is Japanese. A high-quality Japanese-language map, such as one powered by Mapbox GL JS, will display some text vertically to better fit the allotted space, just like on shop signs.

The punctuation characters for vertical text are very different than the ones for horizontal text:

There are some very nuanced conventions for when to rotate individual characters or keep them upright in vertical text. Acronyms tend to stay upright, and if possible they get crammed horizontally into a single character block in a practice called tate-chu-yoko. This stuff keeps graphics engineers up at night.

Japan is largely monolingual, but if a renderer wants to combine Japanese text with text in some other language, it might need to try a little harder than a slash.

Meanwhile, there are other use cases for names that require data consumers to split the name on a delimiter. Colocated offices very often have multiple signposted names that appear in arbitrary order. Presumably you’d want to search for your doctor by name, not by all her associates’ names:
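A minimal Python sketch of that search use case, assuming the office’s names are stored as a “;”-delimited name value. The feature ID and names are made up for illustration; the point is that each name is indexed on its own, so a query for one doctor matches regardless of the order on the sign.

```python
from typing import Dict, List


def build_name_index(features: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each individual (case-folded) name to the features that carry it."""
    index: Dict[str, List[str]] = {}
    for feature_id, tags in features.items():
        for name in tags.get("name", "").split(";"):
            name = name.strip()
            if name:
                index.setdefault(name.casefold(), []).append(feature_id)
    return index


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Exact (case-insensitive) lookup of a single name."""
    return index.get(query.strip().casefold(), [])


# Hypothetical colocated office with two signposted names.
features = {"node/1": {"name": "Alice Tran, MD;Bob Nguyen, MD"}}
index = build_name_index(features)
print(search(index, "bob nguyen, md"))  # ['node/1']
```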

Overall, I think we should apply the same principle as we do with abbreviations: avoid misspelling, causing offense, or violating trademark law, but otherwise aim for structured data that can be readily consumed.

I don’t agree with this. To really solve the problem, we need machine-understandable delimiters when you have multiple values in a name tag. This has nothing to do with vector tiles at all, and I’m not sure why it keeps coming up as the solution to a data problem.

Below is the OSM Americana map of Brussels, localized in French:

The logic we use – now, today – is to show labels in the user’s preferred language, and then below it in parentheses, the local name if it is different from the main label.

So this label SHOULD read:

Bruxelles
(Brussel)

However, instead, the style draws the user-preferred French “Bruxelles” at the top and the local-language label “Bruxelles - Brussel” below it, because it has no way to know that “Bruxelles - Brussel” is two separate names that should each be checked to see if it’s the same as the main label. If the name tag had multiple delimited names, the code could simply check each name to determine whether it’s different from the main label, and then only display the ones that aren’t already displayed.
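Here is a small Python sketch of the check described above, under the assumption that the name tag uses “;” between names. It is not Americana’s actual implementation, just an illustration: with a machine-readable delimiter the duplicate is dropped, while the ad hoc “ - ” value reproduces the current behaviour.

```python
from typing import List


def local_names(name_tag: str, delimiter: str = ";") -> List[str]:
    """Individual local names from a delimited name tag."""
    return [n.strip() for n in name_tag.split(delimiter) if n.strip()]


def two_line_label(preferred: str, name_tag: str) -> str:
    """Preferred-language name, then any local names that differ, in parentheses."""
    extra = [n for n in local_names(name_tag) if n != preferred]
    if not extra:
        return preferred
    return preferred + "\n(" + " / ".join(extra) + ")"


print(two_line_label("Bruxelles", "Bruxelles;Brussel"))    # Bruxelles, then (Brussel)
print(two_line_label("Bruxelles", "Bruxelles - Brussel"))  # duplicate not detected
```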

The suggestion of “just ignoring the name tag” is not a useful suggestion for this use case. The idea of “the name that is used locally” is an important concept that is meaningful to display on maps. It’s important even if there are multiple local names. It’s the style’s job to figure out what to do in the case of multiple local names. However, a style can’t do that job if it can’t determine whether or not the name tag contains multiple delimited entries.

Further, this problem is solvable in both vector AND raster tiles. Even if a raster tile server is not doing localization, it can still take the multiple delimited names in a name tag and render them with a human-readable delimiter, such as a dash, a slash, or a line break.
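For a raster tile server without localization, the same data is just as easy to use. A minimal Python sketch, assuming “;” as the stored delimiter; choosing a line break for map labels and a slash for inline contexts (search results, page titles) is a hypothetical style decision, in line with the earlier point that newlines are fine on a map but problematic inline.

```python
from typing import List


def split_names(name_tag: str) -> List[str]:
    """Individual names from a ';'-delimited name tag."""
    return [n.strip() for n in name_tag.split(";") if n.strip()]


def render_text(name_tag: str, inline: bool = False) -> str:
    """Stack names on separate lines for map labels; use a slash when inline."""
    names = split_names(name_tag)
    return " / ".join(names) if inline else "\n".join(names)


print(render_text("Bruxelles;Brussel"))               # two stacked lines
print(render_text("Bruxelles;Brussel", inline=True))  # Bruxelles / Brussel
```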
