Multiple delimited names in the name tag

Here’s a page of a small world atlas I used in school. It’s designed for students in geography class, so it’s representative of more serious reference works like those by the National Geographic Society:

Rome is “Rome (Roma)” and Naples is “Naples (Napoli)”. Wherever an anglicized name matches the local name minus diacritical marks, it restores the diacritics, as in “València” for Valencia.

It doesn’t seem to care about multiple languages, though: Cagliari is just Cagliari (https://salimbasarda.net/wp-content/uploads/2015/09/Cagliari-cartello-bilingue.jpg) and Alicante is just Alacant.

Yes, it’s just a simplistic atlas for students, not a particularly serious reference work, so it omits some details to avoid clutter. What if it’s just picking one local language and ignoring the others, assuming that the geography teacher won’t penalize the student for a partial answer? However ill-considered that might be, reproducing that effect using OSM would still require splitting the name on a predictable delimiter.

As I understood it, the mapper could specify whichever delimiter is used or common locally in a tag like name_separator=*. So in the case of Brussels:
name=Bruxelles;Brussel + name_separator= - (not sure how to handle the spaces). That gives a data user the possibility of displaying the map the way it’s common locally, just as is enforced today by writing the delimiter directly into name.
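
A minimal sketch of how a data consumer might apply this idea, assuming the names in name are separated by semicolons and that name_separator (only a proposal in this thread, not an established key) carries the locally preferred joiner. The function name and the fallback joiner are made up for illustration. Since it's unclear whether surrounding spaces would survive in a tag value, the sketch strips the value and pads it with spaces itself, which is just one possible answer to the space question.

```python
def render_local_label(tags, default_joiner="/"):
    """Join the semicolon-delimited names in `name` with the local separator."""
    names = [part.strip() for part in tags.get("name", "").split(";") if part.strip()]
    # name_separator is the hypothetical key proposed above; fall back to
    # default_joiner when it is missing or empty after stripping.
    joiner = tags.get("name_separator", "").strip() or default_joiner
    return f" {joiner} ".join(names)


brussels = {"name": "Bruxelles;Brussel", "name_separator": "-"}
print(render_local_label(brussels))
```

A consumer that prefers its own punctuation could simply ignore name_separator and pass a different joiner.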


That would also work in your case. Of course, it isn’t done by just ignoring name.

More or less the process would be:
Check for default_language → get the corresponding name:* values → use name as fallback → check for the user defined name=* → do your filtering and display the label

Technically, it’s similar to splitting name to get the local names. It might be a compromise: name could be whatever the local community wants it to be, and with default_language the more advanced data users could build their own names. But I still think using the delimited name is the easier and more stable approach.
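
The lookup order described above could be sketched roughly like this. It is only an illustration under a few assumptions: that default_language may hold several semicolon-delimited language codes, that "user defined name" means the name:* value for the user's preferred language, and that the final filtering step just drops duplicates. The function name, the joiner and the Brussels tag values are hypothetical.

```python
def build_label(tags, user_language=None, joiner=" / "):
    # Step 1: languages listed in default_language (assumed semicolon-delimited).
    codes = [c.strip() for c in tags.get("default_language", "").split(";") if c.strip()]
    # Step 2: the corresponding name:* values, skipping any that are missing.
    local_names = [tags[f"name:{c}"] for c in codes if f"name:{c}" in tags]
    # Step 3: fall back to the plain name tag.
    if not local_names and "name" in tags:
        local_names = [tags["name"]]
    # Step 4: the name in the user's preferred language, if one is tagged.
    user_name = tags.get(f"name:{user_language}") if user_language else None
    # Step 5: filter out duplicates, putting the user's language first.
    parts = []
    for candidate in ([user_name] if user_name else []) + local_names:
        if candidate not in parts:
            parts.append(candidate)
    return joiner.join(parts)


brussels = {
    "name": "Bruxelles - Brussel",
    "default_language": "fr;nl",
    "name:fr": "Bruxelles",
    "name:nl": "Brussel",
    "name:en": "Brussels",
}
print(build_label(brussels))        # local names only
print(build_label(brussels, "en"))  # user's language first
```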


You then need to ensure name:en is sensible and not a howler such as ‘Arch of Triumph’, which was added by Mapbox for some reason but is now fixed after I commented.

And there are still multilingual places such as Brussels, where my app currently displays what is on the sign, and experience tells me that works. If my app is set to English, what if it has to choose between French and Dutch? Neither is what it says on the sign. I don’t need to understand street names, just match the strings. Many street names don’t mean anything to a native speaker anyway: Wyle Cop/Mardol?


Maybe you didn’t get the point? The main point is that how a multilingual name is labelled should be under the control of your app, not up to a local mapper. Whether your app shows English, French or Chinese is up to the app or up to your settings in the app.

Check osm.org and search for some famous cities like Tokyo, Bangkok, Shanghai. Good luck :wink: Indeed, you don’t need to understand 上海市, but it might be helpful to know how to pronounce it, at least if you want to tell someone where you are.

Totally true, but I assume the quality of name:* will increase dramatically when it’s used more prominently. So I don’t have any concerns about it.

As someone creating a map or app, how do I show “what is written on the signs”?


Wait for it… map the signs! :wink:


Depends on which sign…
Choose between town signs:



Highway exit:

or what the town calls itself on its website:
http://www.cottbus.de/

:smiley:

I described two possible ways in:


That example of Cottbus (in OSM) is also interesting because I’d imagine that any name(s) used might reflect a political stance too, because of this. I’m not familiar with the background discussion in OSM DE about the current “German - Low Sorbian” naming, but I bet someone here is :slight_smile:

It was the starting point of the “new” discussion in the German category about how to put something like “dual” names in our database.
The name is something like a political_name, maybe a bit comparable to your Derry/Londonderry, where almost nobody calls it that, but rather by either the first part or the second. In your case it’s based on their religious background, in my case it’s based on language.


Do you have some links to areas which are already using a semicolon commonly?


I ran an Overpass query today for all names containing a semi-colon in order to do some cursory analysis.

This is my Overpass query:

[out:csv(::type,::id,::user,wikidata,name,place,boundary,highway,amenity)][timeout:2500];
nwr[name~";"];
out;

Below is a link to the raw data. I’ve included some convenient pivot tables showing primary tag prevalence in separate tabs. This spreadsheet is in LibreOffice Calc (.ods) format.

In summary, this is used in:

  • 489 place=* objects
  • 38 boundary=* objects
  • 13,401 highway=* objects, of which 761 are highway=motorway_junction
  • 285 amenity=* objects

Those were just the top-level tags that I checked. There are still another 17K or so objects that are some other top-level tag, such as power features and landuse/land cover areas.
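
For anyone who wants to redo the tallies without a spreadsheet, here is a rough sketch. It assumes the query result was saved as tab-separated text with a header row (the Overpass CSV default, as far as I know) and that the column names match the keys listed in the query. The sample rows are made up purely to exercise the function; they are not taken from the real export.

```python
import csv
import io
from collections import Counter

PRIMARY_KEYS = ("place", "boundary", "highway", "amenity")


def tally_primary_keys(csv_text):
    """Count how many rows carry each of the checked top-level keys."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        for key in PRIMARY_KEYS:
            if row.get(key):  # an empty column means the key is absent
                counts[key] += 1
        if row.get("highway") == "motorway_junction":
            counts["highway=motorway_junction"] += 1
    return counts


# Made-up sample rows, only to show the expected layout.
sample = "\n".join([
    "@type\t@id\tname\tplace\tboundary\thighway\tamenity",
    "node\t1\tA;B\ttown\t\t\t",
    "node\t2\tC;D\t\t\tmotorway_junction\t",
    "way\t3\tE;F\t\t\tsecondary\t",
])
print(tally_primary_keys(sample))
```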

Examples

Below is a gallery of screen grabs from osm-carto showing rendered semi-colons, just to give a flavor of the diversity of usages:

A highway=motorway_junction near Munich, Germany:
image

A highway=secondary near Kyoto, Japan:
image

A highway=primary near Constantine, Algeria. The semi-colon is difficult to spot amongst the Arabic text, but it’s there!
image

A landuse=vineyard near Marseille, France.
image

An amenity=place_of_worship in Cincinnati, Ohio, USA:
image

An amenity=school near Brno, Czechia:
image


Not for dense usage. So far, any current usage would be sparse and largely confined to things that don’t show up prominently in openstreetmap-carto. (After all, mappers don’t really want to make openstreetmap-carto look ugly.) If you need areas of the map to test out semicolon replacement, you could try some of the examples elsewhere in this thread or in the openstreetmap-carto feature request.

Unlike with many feature requests related to tagging, I don’t think it should be necessary to point to an organic upswell in usage for the semicolon delimiter to be taken up by additional data consumers, for the following reasons:

  • In general, the semicolon is by far the most common delimiter between distinct values, if not in name itself then in other name keys such as alt_name, int_name, short_name, and destination.
  • The semicolon in name has long been supported by multiple data consumers.
  • So far, no cases have come up in which a delimiter would be a misinterpretation of a semicolon in a name, but for those cases there’s an escape sequence anyways.
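
To illustrate how little code the splitting takes, here is a minimal sketch. It assumes the escape sequence in question is a doubled semicolon (;;) standing for a literal semicolon; the function name is made up, and real data consumers may well handle edge cases differently.

```python
def split_names(value):
    """Split a semicolon-delimited value, treating ';;' as a literal ';'."""
    names, current, i = [], "", 0
    while i < len(value):
        if value.startswith(";;", i):
            current += ";"  # escaped semicolon stays inside the name
            i += 2
        elif value[i] == ";":
            names.append(current.strip())  # delimiter ends the current name
            current = ""
            i += 1
        else:
            current += value[i]
            i += 1
    names.append(current.strip())
    return [name for name in names if name]  # drop empty entries


print(split_names("Bruxelles;Brussel"))
```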

At one point, slashes and dashes were considered for a general delimiter syntax, but semicolons won out. It would be interesting to track down those ancient discussions to see which keys people had in mind at the time.


By the way, even though a spaced hyphen ( - ) is being used as a delimiter in some regions, it’s also very commonly used as a non-delimiter, including sometimes in those same regions. Some examples:

I don’t know if these particular spaced hyphens are culturally significant, but they aren’t necessarily tagging errors. (This reminds me that I happen to speak a language where one is always expected to put spaces around a hyphen.)

In OSM, these non-delimiter spaced hyphens tend to occur on things other than places, such as roads. I wonder if previous discussions around delimiting names may have been focused too heavily on place names (and welcome signs) at the expense of everything else in OSM that can also be multilingual. Personally, I find this unfortunate, living in a country where multilingualism arises organically and place nodes are the feature type least likely to need multiple values in name.


Way: Carabinieri Bolzano - Bozen (395318127) | OpenStreetMap (int_name, but the partially multilingual name is tricky too)

I wonder what the semicolon suggestion for this one would be

The int_name reads much like a description. As a semicolon-delimited list of proper names I’d suggest `Carabinieri Bolzano;Carabinieri Bozen`: Italian first, as this is Bozen, where the census shows an Italian majority. I’d not localize carabinieri; AFAIK this cannot be done.

Adding additional delimiters for name lists is probably going to be a non-starter, as that requires changes to any code that parses lists. I believe in creating a name_delimiter key instead. This would allow the labeling processor to use the culturally appropriate joining character as needed.

The province of Bolzano has a 69% German, 26% Italian and 4.5% Ladin population. As the name refers to the provincial command (admin level 6), maybe these numbers should be taken into account, rather than the municipal demographics?

IMHO in this case, “name=Carabinieri Bozen - Bolzano” is preferable to “name=Carabinieri Bolzano;Carabinieri Bozen” because the former is already the best possible “unbiased” name and it seems unlikely that any actual software implementation will produce it automatically from the latter.

The only delimiter that I’m suggesting to “add”, the existing semicolon delimiter, requires no changes to any code that already parses name as a list and most likely trivial code to those that don’t. On the contrary, recognizing any other sequence as a delimiter would require nontrivial changes to any code that already parses name as a list, including Mapbox Streets, GraphHopper, Valhalla, Nominatim, the Overpass API, and Sophox.

If the delimiter used for a particular feature is inviolable, never to be replaced with something else in any context, then the delimiter should just be hard-coded in name. Otherwise, it would be far too easy for a data consumer to be unaware of a separate key such as name_delimiter and accidentally offend someone’s culture. But not nearly every case of multiple names requires such a rigid approach to delimiters, as the :traffic_light: example from Houston colorfully illustrates. So far no one has suggested a delimiter that must be used at that intersection, to the exclusion of other punctuation characters.

Many users are actually quite biased toward their own language. If a map shows the German speaker “Carabinieri Bozen (Carabinieri Bolzano)” and the Italian speaker “Carabinieri Bolzano (Carabinieri Bozen)” instead of showing “Carabinieri Bozen - Bolzano” to both, the map would be biased in favor of the user. I don’t see a problem with that.
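
A minimal sketch of that kind of user-biased labelling, assuming the feature carries the semicolon-delimited name suggested earlier in the thread plus name:de and name:it. The tag values and the function name are hypothetical, and the parenthesized style is just one choice a renderer could make.

```python
def biased_label(tags, user_language):
    """Put the user's language first and the remaining names in parentheses."""
    names = [n.strip() for n in tags.get("name", "").split(";") if n.strip()]
    preferred = tags.get(f"name:{user_language}")
    if preferred:
        ordered = [preferred] + [n for n in names if n != preferred]
    else:
        ordered = names  # no name in the user's language: keep the tagged order
    if not ordered:
        return ""
    primary, rest = ordered[0], ordered[1:]
    return f"{primary} ({', '.join(rest)})" if rest else primary


# Hypothetical tagging for the Carabinieri example.
carabinieri = {
    "name": "Carabinieri Bolzano;Carabinieri Bozen",
    "name:de": "Carabinieri Bozen",
    "name:it": "Carabinieri Bolzano",
}
print(biased_label(carabinieri, "de"))
print(biased_label(carabinieri, "it"))
```

A user whose language has no name:* value falls through to the order in name, so the mapper's ordering still matters for everyone else.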
