Multiple delimited names in the name tag

In my opinion there has to be a global consensus about how to add multilingual names of the same level of importance, and at the local level mappers need to agree on whether the area has a multilingual name.

1 Like

I agree. Maybe the default_language=* suggested earlier?

About “same level of importance”, that never happens in reality on the ground either. Some name will always be in the preferred position (i.e. first). Likewise, “default_language=hr - it” would indicate that hr is preferred to it in a multilingual name (e.g. instruct the renderer to show name:hr first, followed by a literal “ - ”, followed by name:it).

That was exactly what you were supposed to see. “name:cy” and “name:en” are both names that are used there; arguably either would be valid to pick, and I picked the Welsh one. “name:ur” is just a transliteration of Cardigan, and its use in that language (at least according to a handy web search engine) is mostly as the item of clothing. That name doesn’t really belong in OSM at all - if you pitched up in the centre of town and tried to speak exclusively in Urdu, communication may be a challenge. As demonstrated above, a map that wanted to show Urdu names could of course do that via wikidata translations.

2 Likes
  1. mapping multiple names with a defined delimiter; the data user can define the delimiter based on their users’ cultural background/preferences or for other reasons

We could have a tag to suggest a delimiter.

I apologize in advance for the length, but several blue-sky ideas have been floated so far that I think could benefit from concrete counterexamples.

Any delimiter you like

This is very meta. Pretty soon there will be a need for default_name_separator=; because no one would want to blanket a multilingual region in the same name_separator=; tag over and over again. Nothing would know how to interpret this key today; maybe in another few years’ time?

Meanwhile, the technology already exists to understand what ; means when it occurs in a name tag. If some local communities prefer not to use it for now, that’s their prerogative. But for those communities that are already quietly using a semicolon, it shouldn’t be necessary for them to explicitly indicate that they want name to work just like any other key in OSM, such as destination (which already takes multiple names in the local language).
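For anyone who wants to see how little is involved, here is a minimal Python sketch of splitting a tag value on the semicolon. The function name split_values is made up for illustration, and it assumes the ";;" convention for escaping a literal semicolon; it is not taken from any particular data consumer.

```python
def split_values(value: str) -> list[str]:
    """Split a tag value on ';', treating ';;' as an escaped literal semicolon.

    Assumption: ';;' is the escape sequence for a literal semicolon, and the
    NUL character never occurs in real tag values, so it is safe to use as a
    temporary placeholder.
    """
    placeholder = "\x00"
    parts = value.replace(";;", placeholder).split(";")
    # Trim surrounding whitespace, drop empty entries, restore escaped semicolons.
    return [p.replace(placeholder, ";").strip() for p in parts if p.strip()]


print(split_values("Turtlewood Drive;Ngụy Văn Thà"))
```

The same function works unchanged for name, alt_name, or destination, which is rather the point.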

Any order you like

This strikes me as an oversimplification of the reality on the ground. Maybe someday someone will figure out how to use default_language in regions that have a strict system of multilingual names, but some places are just more complex.

To show you where I’m coming from, here are some pretty typical examples from places I’ve visited in the U.S. I’m very curious how folks think default_language would help us determine either a standard name order or a standard separator other than the semicolon that all data consumers already handle in some fashion and some already handle elegantly.

When the City of Houston turned streets such as Turtlewood Drive and Bellaire Boulevard into “Turtlewood Drive :traffic_light: Ngụy Văn Thà” and “Đại Lộ Sàigòn :traffic_light: Bellaire Boulevard”, respectively, they just stuck the Vietnamese-language signs wherever there was enough room for an additional sign:

Some of the English signs are so faded that an English-speaking traveler may need to rely on the Vietnamese signs in some cases. A map could show them something like “Turtlewood Dr. / Ngụy Văn Thà” or “Turtlewood Drive — Ngụy Văn Thà” or “Turtlewood Dr. (Ngụy Văn Thà)” or “Turtlewood Dr.” above the street and “Ngụy Văn Thà” below it. The specific delimiter here only matters to the map style designer. The order of the names in name doesn’t matter much either, because the preferred-language name will come first in any savvy map style or navigation guidance instruction.

The city only dual-named the through streets in this neighborhood, but other things are named differently. Turn 90 degrees clockwise and you’ll see a restaurant whose name is signposted in interleaved English, Chinese, and Vietnamese above some shops that are in English only or Vietnamese only:

The San Francisco Bay Area, where I live, happens to be very linguistically diverse. Many places of worship around me offer services in multiple languages and make every effort to unify their congregations despite a language barrier. This Jehovah’s Witness Kingdom Hall serves both English and Spanish speakers on equal terms. The placement of English to the left of Spanish is purely coincidental. As far as I know, they don’t have a preferred delimiter either.

This supermarket has two signs visible from the street. The logo on the sign in the foreground puts Korean on top of English, while the sign on the façade puts English to the left of Korean:

This doctor’s office posts its English name above and to the left of its Vietnamese name, but I think he mostly serves Vietnamese-speaking patients:

And let’s not forget that sometimes a feature can have multiple names regardless of language. Before it moved earlier this year, this flag and costume store was either “Funhouse/Flaghouse” or “Flaghouse/Funhouse”, depending on whether you looked at the sign on the front or the rear. (Customers typically parked in front and entered around back.) If I recall correctly, the receipt had both names printed on it, separated by the delimiter “***”.

Any language you like

I agree that user preferences matter a lot for rendering, but showing only the user-preferred language isn’t a panacea. OSM Americana’s local-name gloss has a lot of precedent in the American map publishing industry. Here’s a page of a small world atlas I used in school. It’s designed for students in geography class, so it’s representative of more serious reference works like those by the National Geographic Society:

Rome is “Rome (Roma)” and Naples is “Naples (Napoli)”. Wherever an anglicized name matches the local name minus diacritical marks, it restores the diacritics, as in “València” for Valencia. The only novel aspect of Americana’s language support is that it automatically chooses the main language based on your individual preference instead of making you buy a separate copy from the bookstore. But otherwise it’s a conservative approach that doesn’t necessarily open the floodgates to the complicated fallback preferences suggested earlier.

There are a couple things this atlas does that Americana can’t currently do based on OSM data. It avoids repeating a name just because English and one of the local languages happens to agree on a name. Americana also can’t automatically transliterate local names into Latin script for readability. However, that’s more of a problem to solve outside of OSM, since for example English and German require different transliteration systems for the same source language.
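To make the atlas convention concrete, here is a rough Python sketch of the gloss rule as I read it off the page: show the preferred-language name with the local name in parentheses, unless the two only differ by diacritics, in which case show the local spelling alone. The function names are hypothetical, and this is not Americana’s actual code.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'València' becomes 'Valencia'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def gloss_label(preferred: str, local: str) -> str:
    """Build a label from a preferred-language name and a single local name."""
    if preferred == local:
        # Same name in both languages: no need to repeat it.
        return preferred
    if strip_diacritics(local).casefold() == preferred.casefold():
        # The preferred name is just the local name minus diacritics:
        # restore the diacritics instead of glossing.
        return local
    return f"{preferred} ({local})"


print(gloss_label("Rome", "Roma"))
print(gloss_label("Valencia", "València"))
```

Note that this only handles one local name per place, which is exactly the limitation a multilingual name runs into.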

As long as it’s not Japanese

Anything that concatenates two arbitrary languages’ names will run into situations like this. Perhaps the most complicated example is Japanese. A high-quality Japanese-language map, such as one powered by Mapbox GL JS, will display some text vertically to better fit the allotted space, just like on shop signs.

The punctuation characters for vertical text are very different than the ones for horizontal text:

There are some very nuanced conventions for when to rotate individual characters or keep them upright in vertical text. Acronyms tend to stay upright, and if possible they get crammed horizontally into a single character block in a practice called tate-chu-yoko. This stuff keeps graphics engineers up at night.

Japan is largely monolingual, but if a renderer wants to combine Japanese text with text in some other language, it might need to try a little harder than a slash.

Meanwhile, there are other use cases for names that require data consumers to split the name on a delimiter. Colocated offices very often have multiple signposted names that appear in arbitrary order. Presumably you’d want to search for your doctor by name, not by all her associates’ names:

Overall, I think we should apply the same principle as we do with abbreviations: avoid misspelling, causing offense, or violating trademark law, but otherwise aim for structured data that can be readily consumed.

2 Likes

I don’t agree with this. To really solve the problem, we need machine-understandable delimiters when you have multiple values in a name tag. This has nothing to do with vector tiles at all, and I’m not sure why it keeps coming up as the solution to a data problem.

Below is the OSM Americana map of Brussels, localized in French:

The logic we use – now, today – is to show labels in the user’s preferred language, and then below it in parentheses, the local name if it is different from the main label.

So this label SHOULD read:

Bruxelles
(Brussel)

However, instead, the style draws the user-preferred French “Bruxelles” at the top and the local language label of “Bruxelles - Brussel” below it, because it has no way to know that “Bruxelles - Brussel” is two separate names that should each be checked to see if it’s the same as the main label. If the name tag had multiple delimited names, the code could simply check each name to determine whether it’s different from the main label, and then only display the ones that aren’t already displayed.
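As a rough illustration of that check (a Python sketch, not the style’s real implementation; the function name and the " / " joiner for multiple leftover names are my own assumptions):

```python
def label_lines(preferred: str, name: str, delimiter: str = ";") -> list[str]:
    """Return the lines of a label: the preferred name, then any local names
    from the delimited name tag that are not already shown."""
    local_names = [part.strip() for part in name.split(delimiter) if part.strip()]
    extras: list[str] = []
    for local in local_names:
        # Skip names identical to the main label, and skip repeats.
        if local != preferred and local not in extras:
            extras.append(local)
    if not extras:
        return [preferred]
    return [preferred, "(" + " / ".join(extras) + ")"]


# With a machine-readable delimiter the duplicate can be dropped:
print(label_lines("Bruxelles", "Bruxelles;Brussel"))
# With a free-form " - " the whole string looks like one different name:
print(label_lines("Bruxelles", "Bruxelles - Brussel"))
```

The second call reproduces the problem: nothing tells the code that “Bruxelles - Brussel” is two names.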

The suggestion of “just ignoring the name tag” is not a useful suggestion for this use case. The idea of “the name that is used locally” is an important concept that is meaningful to display on maps. It’s important even if there are multiple local names. It’s the style’s job to figure out what to do in the case of multiple local names. However, a style can’t do that job if it can’t determine whether or not the name tag contains multiple delimited entries.

Further, this problem is solvable in both vector AND raster tiles. Even if a raster tile server is not doing localization, it can still take the multiple delimited names in a name tag and render them with a human-readable delimiter, such as a dash, a slash, or a line break.

12 Likes

Here’s a page of a small world atlas I used in school. It’s designed for students in geography class, so it’s representative of more serious reference works like those by the National Geographic Society:

Rome is “Rome (Roma)” and Naples is “Naples (Napoli)”. Wherever an anglicized name matches the local name minus diacritical marks, it restores the diacritics, as in “València” for Valencia.

It doesn’t seem to care about multiple languages though: Cagliari is just Cagliari https://salimbasarda.net/wp-content/uploads/2015/09/Cagliari-cartello-bilingue.jpg
and Alicante is just Alacant.

Yes, it’s just a simplistic atlas for students, not a particularly serious reference work, so it omits some details for clutter avoidance. What if it’s just picking one local language and ignoring the others, assuming that the geography teacher won’t penalize the student for a partial answer? However ill-considered that might be, reproducing that effect using OSM would still require splitting the name on a predictable delimiter.

I understood it this way: the mapper could specify, in a tag like name_separator=*, whichever delimiter is used or common locally. So in the case of Brussels:
name=Bruxelles;Brussel + name_separator= - (not sure how to handle the space). That way a data user can display the map the way it’s common locally, as is enforced today by writing it in name.
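A quick Python sketch of how a data user might apply this, purely hypothetical since name_separator is only a proposal in this thread. For the space question, it assumes the tag holds the bare character and the consumer pads it with spaces; the " / " fallback is also just an assumption.

```python
def display_name(tags: dict[str, str], fallback_separator: str = " / ") -> str:
    """Join the semicolon-delimited names with a locally suggested separator.

    Assumption: name_separator (a proposed, not established, key) holds the
    bare separator character, and the consumer adds the surrounding spaces.
    """
    names = [part.strip() for part in tags.get("name", "").split(";") if part.strip()]
    suggested = tags.get("name_separator", "").strip()
    separator = f" {suggested} " if suggested else fallback_separator
    return separator.join(names)


print(display_name({"name": "Bruxelles;Brussel", "name_separator": "-"}))
```

A data user who prefers a slash or a line break can simply ignore the suggestion.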

2 Likes

That would also work in your case. Of course, it isn’t done just by ignoring name.

More or less the process would be:
Check for default_language → get the corresponding name:* values → use name as fallback → check for the user defined name=* → do your filtering and display the label

Technically, it’s similar to splitting name to get the local names. It might be a compromise: name could be whatever the local community wants it to be, and with default_language the more advanced data users can build their own names. But I still think the delimited name is the easier and more stable approach.
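The process above could be sketched in Python roughly like this. It assumes the “hr - it” value format for default_language mentioned earlier in the thread (a proposal, not an established key), and the tags in the example are only illustrative.

```python
from typing import Optional


def build_label(tags: dict[str, str], user_language: Optional[str] = None) -> str:
    """Build a label from default_language, name:* and name, in that order."""
    # 1. Check default_language (assumed format: codes separated by " - ").
    codes = [c.strip() for c in tags.get("default_language", "").split(" - ") if c.strip()]
    # 2. Get the corresponding name:* values, in the stated order.
    local = [tags[f"name:{code}"] for code in codes if f"name:{code}" in tags]
    # 3. Use name as the fallback when nothing was found.
    if not local and "name" in tags:
        local = [tags["name"]]
    # 4. Check for the name in the user's own language.
    user_name = tags.get(f"name:{user_language}") if user_language else None
    if user_name is None:
        return " - ".join(local)
    # 5. Filter out local names already shown, then display.
    others = [n for n in local if n != user_name]
    if not others:
        return user_name
    return f"{user_name} ({' - '.join(others)})"


example = {
    "name": "Pula - Pola",
    "default_language": "hr - it",
    "name:hr": "Pula",
    "name:it": "Pola",
}
print(build_label(example))
print(build_label(example, "it"))
```

It is about as much code as splitting name, but it only works where default_language and every name:* value are actually mapped.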

1 Like

That would also work in your case. Of course, it isn’t done just by ignoring name.

More or less the process would be:

Check for default_language → get the corresponding name:* values → use name as fallback → check for the user defined name=* → do your filtering and display the label

You then need to ensure name:en is sensible and free of howlers such as ‘Arch of Triumph’, added by Mapbox for some reason but now fixed after I commented.

And there are still multilingual places such as Brussels, where my app currently displays what is on the sign, and experience tells me that works. If my app is set to English, does it have to choose between French and Dutch? Neither is what it says on the sign. I don’t need to understand street names, just match the strings. Many street names don’t mean anything to a native speaker anyway: Wyle Cop/Mardol?

3 Likes

Maybe you didn’t get the point? The main point is that how a multilingual name is labelled should be under the control of your app, not up to a local mapper. Whether your app shows English, French or Chinese is up to the app or its settings.

Check osm.org and search for some famous cities like Tokyo, Bangkok, Shanghai. Good luck :wink: Indeed, you don’t need to understand 上海市, but it might be helpful to know how to pronounce it, at least if you want to tell someone where you are.

Totally true, but I assume the quality of name:* will increase dramatically when it’s used more prominently. So I don’t have any concerns about it.

As someone creating a map or app, how do I show “what is written on the signs”?

2 Likes

Wait for it… map the signs! :wink:

2 Likes

Depending which sign…
Choose between town signs:



Highway exit:

or how the town calls itself on their website:
http://www.cottbus.de/

:smiley:

I described two possible ways in:

1 Like

That example of Cottbus (in OSM) is also interesting because I’d imagine that any name(s) used might reflect a political stance too, because of this. I’m not familiar with the background discussion in OSM DE about the current “German - Low Sorbian” naming, but I bet someone here is :slight_smile:

It was the starting point of the “new” discussion in the German category about how to put something like “dual” names in our database.
The name is something like a political_name, maybe a bit comparable to your Derry/Londonderry, where almost nobody calls it by the combined name, only by either the first part or the second. In your case it’s based on religious background; in my case it’s based on language.

1 Like

Do you have some links to areas which are already using a semicolon commonly?

1 Like

I pulled down an overpass query today of all names with a semi-colon in order to do some cursory analysis.

This is my overpass query:

[out:csv(::type,::id,::user,wikidata,name,place,boundary,highway,amenity)][timeout:2500];
nwr[name~";"];
out;

Below is a link to the raw data. I’ve included some convenient pivot tables showing primary tag prevalence in separate tabs. This spreadsheet is in LibreOffice Calc (.ods) format.

In summary, this is used in:

  • 489 place=* objects
  • 38 boundary=* objects
  • 13,401 highway=* objects, of which 761 are highway=motorway_junction
  • 285 amenity=* objects

Those were just the top-level tags that I checked. There are still another 17K or so objects that are some other top-level tag, such as power features and landuse/land cover areas.
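If you’d rather not open the spreadsheet, counts like the ones above can be reproduced with a few lines of Python. This is only a sketch: it assumes the query result was saved as tab-separated text with a header row (Overpass’s default CSV layout, as far as I know), and the function name is made up.

```python
import csv
import io
from collections import Counter


def count_by_key(csv_text: str, keys: list[str]) -> Counter:
    """Count rows that have a non-empty value in each of the given columns."""
    # Assumption: tab-separated, unquoted fields, first line is the header.
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts: Counter = Counter()
    for row in reader:
        for key in keys:
            if row.get(key):
                counts[key] += 1
    return counts


# Tiny made-up sample in the same column layout as the query above.
sample = (
    "@type\t@id\tname\tplace\thighway\tamenity\n"
    "node\t1\tA;B\ttown\t\t\n"
    "way\t2\tC;D\t\tprimary\t\n"
)
print(count_by_key(sample, ["place", "boundary", "highway", "amenity"]))
```

An object with more than one of these keys is counted once per key, so the totals can overlap.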

Examples

Below is a gallery of screen grabs from osm-carto showing rendered semi-colons, just to give a flavor of the diversity of usages:

A highway=motorway_junction near Munich, Germany:
image

A highway=secondary near Kyoto, Japan:
image

A highway=primary near Constantine, Algeria. The semi-colon is difficult to spot amongst the Arabic text, but it’s there!
image

A landuse=vineyard near Marseille, France.
image

An amenity=place_of_worship in Cincinnati, Ohio, USA:
image

An amenity=school near Brno, Czechia:
image

2 Likes

Not for dense usage. So far, any current usage would be sparse and largely confined to things that don’t show up prominently in openstreetmap-carto. (After all, mappers don’t really want to make openstreetmap-carto look ugly.) If you need areas of the map to test out semicolon replacement, you could try some of the examples elsewhere in this thread or in the openstreetmap-carto feature request.

Unlike with many feature requests related to tagging, I don’t think it should be necessary to point to an organic upswell in usage for the semicolon delimiter to be taken up by additional data consumers, for the following reasons:

  • In general, the semicolon is by far the most common delimiter between distinct values, if not in name itself then in other name keys such as alt_name, int_name, short_name, and destination.
  • The semicolon in name has long been supported by multiple data consumers.
  • So far, no cases have come up in which a delimiter would be a misinterpretation of a semicolon in a name, but for those cases there’s an escape sequence anyways.

At one point, slashes and dashes were considered for a general delimiter syntax, but semicolons won out. It would be interesting to track down those ancient discussions to see which keys people had in mind at the time.

2 Likes