Multiple delimited names in the name tag

dieterdreist · December 24, 2022, 1:02am

Here’s a page of a small world atlas I used in school. It’s designed for students in geography class, so it’s representative of more serious reference works like those by the National Geographic Society:

Rome is “Rome (Roma)” and Naples is “Naples (Napoli)”. Wherever an anglicized name matches the local name minus diacritical marks, it restores the diacritics, as in “València” for Valencia.

It doesn’t seem to care about multiple languages, though: Cagliari is just Cagliari https://salimbasarda.net/wp-content/uploads/2015/09/Cagliari-cartello-bilingue.jpg
and Alicante is just Alacant.

Minh_Nguyen · December 24, 2022, 1:04am

Yes, it’s just a simplistic atlas for students, not a particularly serious reference work, so it omits some details for clutter avoidance. What if it’s just picking one local language and ignoring the others, assuming that the geography teacher won’t penalize the student for a partial answer? However ill-considered that might be, reproducing that effect using OSM would still require splitting the name on a predictable delimiter.

aighes · December 24, 2022, 8:40am

I understood it to mean that the mapper could specify, in a tag like name_separator=*, whichever delimiter is used or common locally. So in the case of Brussels:
name=Bruxelles;Brussel + name_separator= - (not sure about how to handle the space). That way a data user can display the map the way it’s common locally, which today is enforced by writing the delimiter directly in name.
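
A rough sketch of how a data consumer might apply such a tag, assuming name holds a semicolon-separated list. The name_separator key is only the proposal from this post, not an established key, and the function name, the fallback joiner, and the choice to pad the separator with spaces are assumptions made for illustration:

```python
def join_names(tags, default_separator=" / "):
    """Build a display label from a semicolon-delimited name.

    name_separator is the hypothetical key proposed in this thread.
    Padding it with a space on each side is an assumption.
    """
    names = [part.strip() for part in tags.get("name", "").split(";") if part.strip()]
    separator = tags.get("name_separator")
    if separator is None:
        joiner = default_separator
    else:
        joiner = f" {separator.strip()} "
    return joiner.join(names)


brussels = {"name": "Bruxelles;Brussel", "name_separator": "-"}
print(join_names(brussels))  # Bruxelles - Brussel
```

Stripping and re-padding the separator sidesteps the question of whether the space belongs in the tag value, at the cost of not supporting unspaced joiners.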

aighes · December 24, 2022, 9:01am

Would also do in your case. Of course, it’s not done with ignoring name. More or less it’s like

More or less the process would be:
Check for default_language → get the corresponding name:* values → use name as fallback → check for the user defined name=* → do your filtering and display the label
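
One possible reading of that process as code. This is only a sketch: default_language is the key proposed in this thread, it is assumed here to hold a semicolon-separated list of language codes, and the last step ("filtering") is interpreted as putting the app user's language first. The function name and parameters are hypothetical.

```python
def build_label(tags, user_language=None, joiner=" / "):
    """Sketch of the proposed lookup order; not an established algorithm."""
    # Step 1: read the ;-separated list of local languages (hypothetical key).
    languages = [code.strip()
                 for code in tags.get("default_language", "").split(";")
                 if code.strip()]
    # Step 2: collect the matching name:<lang> values in the listed order.
    local_names = [tags[f"name:{code}"] for code in languages
                   if f"name:{code}" in tags]
    # Step 3: fall back to plain name when no localized value was found.
    if not local_names and "name" in tags:
        local_names = [tags["name"]]
    # Step 4 (app-specific filtering): show the user's language first, if tagged.
    preferred = tags.get(f"name:{user_language}") if user_language else None
    if preferred:
        rest = [name for name in local_names if name != preferred]
        return preferred if not rest else f"{preferred} ({joiner.join(rest)})"
    return joiner.join(local_names)


tags = {
    "default_language": "fr;nl",
    "name": "Bruxelles - Brussel",
    "name:fr": "Bruxelles",
    "name:nl": "Brussel",
    "name:en": "Brussels",
}
print(build_label(tags))        # Bruxelles / Brussel
print(build_label(tags, "en"))  # Brussels (Bruxelles / Brussel)
```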

Technically it’s similar to splitting name to get the local names. It might be a compromise: name could be whatever the local community wants it to be, and with default_language the more advanced data users can build their own labels. But I still think using the delimited name is the easier and more stable approach.

trigpoint · December 24, 2022, 1:56pm

ZeLonewolf:

The suggestion of “just ignoring the name tag” is not a useful suggestion for this use case.

Would also do in your case. Of course, it’s not done with ignoring name. More or less it’s like

aighes:

mapping no name, instead only name:<lang> and add a separate local_language=* with a ;-separated list of local languages to be used.

More or less the process would be:

Check for default_language → get the corresponding name:* values → use name as fallback → check for the user defined name=* → do your filtering and display the label

You then need to ensure name:en is sensible and not a howler such as ‘Arch of Triumph’, which was added by Mapbox for some reason but has now been fixed after I commented.

And there are still multilingual places such as Brussels, where my app currently displays what is on the sign, and experience tells me that works. If my app is set to English, does it have to choose between French and Dutch? Neither is what it says on the sign. I don’t need to understand street names, just match the strings. Many street names don’t mean anything to a native speaker anyway: Wyle Cop/Mardol?

aighes · December 24, 2022, 9:17pm

Maybe you didn’t get the point? The main point is that how a multilingual name is labelled should be under the control of your app, not up to a local mapper. Whether your app shows English, French or Chinese is up to the app or up to your settings in the app.

Check osm.org and search for some famous cities like Tokyo, Bangkok, Shanghai. Good luck. Indeed, you don’t need to understand 上海市, but it might be helpful to know how to pronounce it, at least if you want to tell someone where you are.

Totally true, but I assume the quality of name:* will increase dramatically when it’s used more prominently. So I don’t have any concerns about it.

SomeoneElse · December 24, 2022, 9:23pm

As someone creating a map or app, how do I show “what is written on the signs”?

Minh_Nguyen · December 24, 2022, 9:55pm

Wait for it… map the signs!

aighes · December 24, 2022, 10:06pm

Depending which sign…
Choose between town signs:

https://upload.wikimedia.org/wikipedia/commons/7/73/Ortsschild_Cottbus.jpg

Highway exit:

or how the town calls itself on their website:
http://www.cottbus.de/

I described two possible ways in:

SomeoneElse · December 24, 2022, 10:23pm

That example of Cottbus (in OSM) is also interesting because I’d imagine that any name(s) used might reflect a political stance too, because of this. I’m not familiar with the background discussion in OSM DE about the current “German - Low Sorbian” naming, but I bet someone here is

aighes · December 25, 2022, 8:24am

It was the starting point of the “new” discussion in the German category about how to put something like “dual” names in our database.
The name is something like a political_name, maybe a bit comparable to your Derry/Londonderry, where almost nobody uses the combined form, only the first part or the second. In your case it’s based on religious background; in my case it’s based on language.

pnorman · December 26, 2022, 7:46am

Do you have some links to areas which are already using a semicolon commonly?

ZeLonewolf · December 26, 2022, 5:00pm

I ran an Overpass query today to pull down all names containing a semi-colon in order to do some cursory analysis.

This is my overpass query:

[out:csv(::type,::id,::user,wikidata,name,place,boundary,highway,amenity)][timeout:2500];
nwr[name~";"];
out;

Below is a link to the raw data. I’ve included some convenient pivot tables showing primary tag prevalence in separate tabs. This spreadsheet is in LibreOffice Calc (.ods) format.

In summary, this is used in:

489 place=* objects
38 boundary=* objects
13,401 highway=* objects, of which 761 are highway=motorway_junction
285 amenity=* objects

Those were just the top-level tags that I checked. There are still another 17K or so objects that are some other top-level tag, such as power features and landuse/land cover areas.
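
For anyone who wants to reproduce tallies like these from the raw Overpass CSV, something along these lines might work. It is only a sketch: it assumes the default tab-separated CSV output with a header row, counts an object once per listed key it carries, and uses made-up sample rows rather than the real data.

```python
import csv
import io
from collections import Counter


def count_primary_tags(csv_text, keys=("place", "boundary", "highway", "amenity")):
    """Count how many rows have a non-empty value for each of the given keys."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        for key in keys:
            if row.get(key):
                counts[key] += 1
    return counts


# Invented rows for illustration only; not taken from the real result set.
sample = (
    "@type\t@id\t@user\twikidata\tname\tplace\tboundary\thighway\tamenity\n"
    "node\t1\tuser_a\t\tA;B\ttown\t\t\t\n"
    "way\t2\tuser_b\t\tC;D\t\t\tprimary\t\n"
    "node\t3\tuser_c\t\tE;F\t\t\tmotorway_junction\t\n"
)
print(count_primary_tags(sample))
```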

Examples

Below is a gallery of screen grabs from osm-carto showing rendered semi-colons, just to give a flavor of the diversity of usages:

A highway=motorway_junction near Munich, Germany:

A highway=secondary near Kyoto, Japan:

A highway=primary near Constantine, Algeria. The semi-colon is difficult to spot amongst the Arabic text, but it’s there!

A landuse=vineyard near Marseille, France.

An amenity=place_of_worship in Cincinnati, Ohio, USA:

An amenity=school near Brno, Czechia:

Minh_Nguyen · December 26, 2022, 6:32pm

Not for dense usage. So far, any current usage would be sparse and largely confined to things that don’t show up prominently in openstreetmap-carto. (After all, mappers don’t really want to make openstreetmap-carto look ugly.) If you need areas of the map to test out semicolon replacement, you could try some of the examples elsewhere in this thread or in the openstreetmap-carto feature request.

Unlike with many feature requests related to tagging, I don’t think it should be necessary to point to an organic upswell in usage for the semicolon delimiter to be taken up by additional data consumers, for the following reasons:

In general, the semicolon is by far the most common delimiter between distinct values, if not in name itself then in other name keys such as alt_name, int_name, short_name, and destination.
The semicolon in name has long been supported by multiple data consumers.
So far, no cases have come up in which a delimiter would be a misinterpretation of a semicolon in a name, but for those cases there’s an escape sequence anyways.
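
To make the delimiter handling concrete, here is a minimal sketch of splitting a value on semicolons while honoring an escape. It assumes the escape sequence is a doubled semicolon (;;) standing for one literal semicolon, and that whitespace around each name can be trimmed; the function name is hypothetical.

```python
def split_names(value):
    """Split a tag value on ';', treating ';;' as an escaped literal semicolon."""
    parts = []
    current = []
    i = 0
    while i < len(value):
        if value[i] == ";":
            if i + 1 < len(value) and value[i + 1] == ";":
                # Escaped semicolon: keep one ';' and skip both characters.
                current.append(";")
                i += 2
                continue
            # Unescaped semicolon: close the current name.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(value[i])
        i += 1
    parts.append("".join(current).strip())
    # Drop empty entries caused by leading, trailing, or repeated delimiters.
    return [part for part in parts if part]


print(split_names("Bruxelles;Brussel"))  # ['Bruxelles', 'Brussel']
print(split_names("A;;B;C"))             # ['A;B', 'C']
```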

At one point, slashes and dashes were considered for a general delimiter syntax, but semicolons won out. It would be interesting to track down those ancient discussions to see which keys people had in mind at the time.

Minh_Nguyen · December 26, 2022, 6:33pm

By the way, even though a spaced hyphen ( - ) is being used as a delimiter in some regions, it’s also very commonly used as a non-delimiter, including sometimes in those same regions. Some examples:

Way: Bundessprachenamt - Sprachenzentrum Nord (38880137) | OpenStreetMap
Node: Bund der Vertriebenen - Landesverband Brandenburg (7038187988) | OpenStreetMap
Way: Fernwärmeleitung Kraftwerk Lippendorf - Stadtwerke Leipzig (439440235) | OpenStreetMap
Node: Parken Pelz - Parkplatz Flughafen Leipzig / Flughafentransfer (5299207188) | OpenStreetMap
Way: Carabinieri Bolzano - Bozen (395318127) | OpenStreetMap (int_name, but the partially multilingual name is tricky too)
Relation: Mariafjellet - Skardbekken naturreservat / Tjaetsiegaske eatnemedavje (13346667) | OpenStreetMap

I don’t know if these particular spaced hyphens are culturally significant, but they aren’t necessarily tagging errors. (This reminds me that I happen to speak a language where one is always expected to put spaces around a hyphen.)

In OSM, these non-delimiter spaced hyphens tend to occur on things other than places, such as roads. I wonder if previous discussions around delimiting names may have been focused too heavily on place names (and welcome signs) at the expense of everything else in OSM that can also be multilingual. Personally, I find this unfortunate, living in a country where multilingualism arises organically and place nodes are the feature type least likely to need multiple values in name.
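
To illustrate the ambiguity, here is what a naive consumer would get if it treated every spaced hyphen as a delimiter. The function is a deliberately simplistic sketch, not code from any real data consumer, and the inputs are two of the names listed above:

```python
def naive_hyphen_split(name):
    """Treat every spaced hyphen as a delimiter (deliberately too simple)."""
    return name.split(" - ")


examples = [
    "Bundessprachenamt - Sprachenzentrum Nord",
    "Carabinieri Bolzano - Bozen",
]
for example in examples:
    print(naive_hyphen_split(example))
# ['Bundessprachenamt', 'Sprachenzentrum Nord']
# ['Carabinieri Bolzano', 'Bozen']
```

In the first case a single name is cut in two; in the second, the right-hand piece is only a fragment ("Bozen") rather than a complete name for the feature.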

dieterdreist · December 26, 2022, 11:13pm

Way: ‪Carabinieri Bolzano - Bozen‬ (‪395318127‬) | OpenStreetMap (int_name, but the partially multilingual name is tricky too)

I wonder what the semicolon suggestion for this one would be

Hungerburg · December 26, 2022, 11:22pm

The int_name reads much like a description. As a semicolon-delimited list of proper names I’d suggest `Carabinieri Bolzano;Carabinieri Bozen`, with Italian first, as this is Bozen, where the census shows an Italian majority. I’d not localize carabinieri; AFAIK this cannot be done.

IanH · December 26, 2022, 11:45pm

Adding additional delimiters for name lists is probably going to be a non-starter, as that requires changes to any code that parses lists. I believe creating a name_delimiter key would work better. This would allow the labeling processor to use the culturally appropriate joining character as needed.

dieterdreist · December 27, 2022, 12:14am

As a semicolon delimited list of proper names I’d suggest `Carabinieri Bolzano;Carabinieri Bozen` - Italian first, as this is Bozen, where census shows an Italian majority, and - I’d not localize carabinieri, AFAIK this cannot be done.

The province of Bolzano has a population that is 69% German, 26% Italian, and 4.5% Ladin. As the name refers to the provincial command (admin level 6), maybe these numbers should be taken into account rather than the municipal demographics?

IMHO in this case, “name=Carabinieri Bozen - Bolzano” is preferable to “name=Carabinieri Bolzano;Carabinieri Bozen” because the former is already the best possible “unbiased” name and it seems unlikely that any actual software implementation will produce it automatically from the latter.

Minh_Nguyen · December 27, 2022, 12:30am

The only delimiter that I’m suggesting to “add”, the existing semicolon delimiter, requires no changes to any code that already parses name as a list and most likely trivial code to those that don’t. On the contrary, recognizing any other sequence as a delimiter would require nontrivial changes to any code that already parses name as a list, including Mapbox Streets, GraphHopper, Valhalla, Nominatim, the Overpass API, and Sophox.

If the delimiter used for a particular feature is inviolable, never to be replaced with something else in any context, then the delimiter should just be hard-coded in name. Otherwise, it would be far too easy for a data consumer to be unaware of a separate key such as name_delimiter and accidentally offend someone’s culture. But not nearly every case of multiple names requires such a rigid approach to delimiters, as the example from Houston colorfully illustrates. So far no one has suggested a delimiter that must be used at that intersection, to the exclusion of other punctuation characters.

Many users are actually quite biased toward their own language. If a map shows the German speaker “Carabinieri Bozen (Carabinieri Bolzano)” and the Italian speaker “Carabinieri Bolzano (Carabinieri Bozen)” instead of showing “Carabinieri Bozen - Bolzano” to both, the map would be biased in favor of the user. I don’t see a problem with that.
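
A minimal sketch of that kind of user-biased labeling, assuming name is a semicolon-delimited list and each entry is also present as name:<lang>. The tag values below are hypothetical (they follow the semicolon suggestion from this thread, not the feature’s actual tagging), and the function name and fallback joiner are made up for the example:

```python
def localized_label(tags, user_language, fallback_joiner=" - "):
    """Put the user's language first and the remaining local names in parentheses."""
    names = [part.strip() for part in tags.get("name", "").split(";") if part.strip()]
    preferred = tags.get(f"name:{user_language}")
    if not preferred:
        # No name in the user's language: show all local names, unbiased.
        return fallback_joiner.join(names)
    others = [name for name in names if name != preferred]
    if not others:
        return preferred
    return f"{preferred} ({', '.join(others)})"


# Hypothetical tagging for illustration only.
tags = {
    "name": "Carabinieri Bolzano;Carabinieri Bozen",
    "name:de": "Carabinieri Bozen",
    "name:it": "Carabinieri Bolzano",
}
print(localized_label(tags, "de"))  # Carabinieri Bozen (Carabinieri Bolzano)
print(localized_label(tags, "it"))  # Carabinieri Bolzano (Carabinieri Bozen)
```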