Multiple delimited names in the name tag

That example of Cottbus (in OSM) is also interesting because I’d imagine that any name(s) used might reflect a political stance too, because of this. I’m not familiar with the background discussion in OSM DE about the current “German - Low Sorbian” naming, but I bet someone here is :slight_smile:

It was the starting point of the “new” discussion in the German category about how to put something like “dual” names in our database.
The name is something like a political_name, maybe a bit comparable to your Derry/Londonderry, where almost nobody uses the combined form, only the first part or the second. In your case it’s based on religious background, in my case it’s based on language.

Do you have some links to areas which are already using a semicolon commonly?

I pulled down an overpass query today of all names with a semi-colon in order to do some cursory analysis.

This is my overpass query:

[out:csv(::type,::id,::user,wikidata,name,place,boundary,highway,amenity)][timeout:2500];
nwr[name~";"];
out;

Below is a link to the raw data. I’ve included some convenient pivot tables showing primary tag prevalence in separate tabs. This spreadsheet is in LibreOffice Calc (.ods) format.

In summary, this is used in:

  • 489 place=* objects
  • 38 boundary=* objects
  • 13,401 highway=* objects, of which 761 are highway=motorway_junction
  • 285 amenity=* objects

Those were just the top-level tags that I checked. There are still another 17K or so objects that are some other top-level tag, such as power features and landuse/land cover areas.
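If anyone wants to reproduce this kind of tally without a spreadsheet, here is a minimal Python sketch. It assumes the query output was saved as tab-separated text with a header row (which I believe is the default for out:csv, but please check your own export). The function name and the sample rows are made up for illustration; they are not taken from the real data.

```python
import csv
import io
from collections import Counter

# Top-level keys taken from the column list in the query above.
PRIMARY_KEYS = ("place", "boundary", "highway", "amenity")

def tally_primary_keys(csv_text):
    """Count rows that have a non-empty value for each primary key."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    for row in reader:
        matched = [key for key in PRIMARY_KEYS if row.get(key)]
        for key in matched:
            counts[key] += 1
        if not matched:
            # None of the checked keys is set on this object.
            counts["other"] += 1
    return counts

# Made-up sample rows, NOT real data from the query.
sample = (
    "@type\t@id\t@user\twikidata\tname\tplace\tboundary\thighway\tamenity\n"
    "node\t1\texample\t\tA;B\t\t\tmotorway_junction\t\n"
    "way\t2\texample\t\tC;D\t\t\t\tschool\n"
    "way\t3\texample\t\tE;F\t\t\t\t\n"
)
print(tally_primary_keys(sample))
```

An object carrying two of the checked keys is counted once under each, so the per-key totals can add up to more than the number of rows.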

Examples

Below is a gallery of screen grabs from osm-carto showing rendered semi-colons, just to give a flavor of the diversity of usages:

A highway=motorway_junction near Munich, Germany:
image

A highway=secondary near Kyoto, Japan:
image

A highway=primary near Constantine, Algeria. The semi-colon is difficult to spot amongst the Arabic text, but it’s there!
image

A landuse=vineyard near Marseille, France.
image

An amenity=place_of_worship in Cincinnati, Ohio, USA:
image

An amenity=school near Brno, Czechia:
image

Not for dense usage. So far, any current usage would be sparse and largely confined to things that don’t show up prominently in openstreetmap-carto. (After all, mappers don’t really want to make openstreetmap-carto look ugly.) If you need areas of the map to test out semicolon replacement, you could try some of the examples elsewhere in this thread or in the openstreetmap-carto feature request.

Unlike with many feature requests related to tagging, I don’t think it should be necessary to point to an organic upswell in usage for the semicolon delimiter to be taken up by additional data consumers, for the following reasons:

  • In general, the semicolon is by far the most common delimiter between distinct values, if not in name itself then in other name keys such as alt_name, int_name, short_name, and destination.
  • The semicolon in name has long been supported by multiple data consumers.
  • So far, no cases have come up in which a delimiter would be a misinterpretation of a semicolon in a name, but for those cases there’s an escape sequence anyways.
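To make the last point concrete, here is a rough Python sketch of how a data consumer might split a name value. It assumes the escape sequence is a doubled semicolon (`;;`) standing for one literal semicolon, which is my understanding of the general semicolon-separator convention; treat that as an assumption and check the wiki before relying on it. The function name `split_names` is made up.

```python
def split_names(value):
    """Split a tag value on single semicolons.

    Assumption: ';;' is an escaped, literal ';' and does not split.
    """
    parts = []
    current = []
    i = 0
    while i < len(value):
        if value[i] == ";":
            if i + 1 < len(value) and value[i + 1] == ";":
                current.append(";")  # escaped semicolon, keep one
                i += 2
                continue
            # A single semicolon ends the current name.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(value[i])
        i += 1
    parts.append("".join(current).strip())
    # Drop empty entries caused by leading or trailing delimiters.
    return [part for part in parts if part]

print(split_names("Carabinieri Bolzano;Carabinieri Bozen"))
print(split_names("A;;B;C"))
```

A value without any semicolon comes back as a one-element list, so existing single-name features need no special handling.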

At one point, slashes and dashes were considered for a general delimiter syntax, but semicolons won out. It would be interesting to track down those ancient discussions to see which keys people had in mind at the time.

By the way, even though a spaced hyphen ( - ) is being used as a delimiter in some regions, it’s also very commonly used as a non-delimiter, including sometimes in those same regions. Some examples:

I don’t know if these particular spaced hyphens are culturally significant, but they aren’t necessarily tagging errors. (This reminds me that I happen to speak a language where one is always expected to put spaces around a hyphen.)

In OSM, these non-delimiter spaced hyphens tend to occur on things other than places, such as roads. I wonder if previous discussions around delimiting names may have been focused too heavily on place names (and welcome signs) at the expense of everything else in OSM that can also be multilingual. Personally, I find this unfortunate, living in a country where multilingualism arises organically and place nodes are the feature type least likely to need multiple values in name.

Way: ‪Carabinieri Bolzano - Bozen‬ (‪395318127‬) | OpenStreetMap (int_name, but the partially multilingual name is tricky too)

I wonder what the semicolon suggestion for this one would be

The int_name reads much like a description. As a semicolon-delimited list of proper names I’d suggest `Carabinieri Bolzano;Carabinieri Bozen`: Italian first, as this is Bozen, where the census shows an Italian majority. I’d not localize “carabinieri”; AFAIK this cannot be done.

Adding additional delimiters for name lists is probably going to be a non-starter, as that requires changes to any code that parses lists. I believe in creating a name_delimiter key instead. This would allow the labeling processor to use the culturally appropriate joining character as needed.

As a semicolon-delimited list of proper names I’d suggest `Carabinieri Bolzano;Carabinieri Bozen`: Italian first, as this is Bozen, where the census shows an Italian majority. I’d not localize “carabinieri”; AFAIK this cannot be done.

The province of Bolzano has a 69% German and 26% Italian population (and 4.5% Ladin). As the name refers to the provincial command (admin level 6), maybe these numbers should be taken into account, rather than the municipal demographics?

IMHO in this case, “name=Carabinieri Bozen - Bolzano” is preferable to “name=Carabinieri Bolzano;Carabinieri Bozen” because the former is already the best possible “unbiased” name and it seems unlikely that any actual software implementation will produce it automatically from the latter.

The only delimiter that I’m suggesting to “add”, the existing semicolon delimiter, requires no changes to any code that already parses name as a list and most likely trivial code to those that don’t. On the contrary, recognizing any other sequence as a delimiter would require nontrivial changes to any code that already parses name as a list, including Mapbox Streets, GraphHopper, Valhalla, Nominatim, the Overpass API, and Sophox.

If the delimiter used for a particular feature is inviolable, never to be replaced with something else in any context, then the delimiter should just be hard-coded in name. Otherwise, it would be far too easy for a data consumer to be unaware of a separate key such as name_delimiter and accidentally offend someone’s culture. But not nearly every case of multiple names requires such a rigid approach to delimiters, as the :traffic_light: example from Houston colorfully illustrates. So far no one has suggested a delimiter that must be used at that intersection, to the exclusion of other punctuation characters.

Many users are actually quite biased toward their own language. If a map shows the German speaker “Carabinieri Bozen (Carabinieri Bolzano)” and the Italian speaker “Carabinieri Bolzano (Carabinieri Bozen)” instead of showing “Carabinieri Bozen - Bolzano” to both, the map would be biased in favor of the user. I don’t see a problem with that.
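For what it’s worth, a user-biased label like that is cheap to build once name is a plain list. Below is a minimal Python sketch, assuming the feature were tagged with a semicolon-delimited name plus name:de and name:it. The function name, the “ / ” fallback joiner, and the tag values are all made up for illustration; this is not how any existing renderer works.

```python
def localized_label(tags, user_lang):
    """Put the user's language first, other local names in parentheses."""
    # Simplification: split on every ';' and ignore any escape sequence.
    local_names = [n.strip() for n in tags.get("name", "").split(";") if n.strip()]
    preferred = tags.get("name:" + user_lang)
    if preferred is None:
        # The user's language is not tagged: show the local names as they are.
        return " / ".join(local_names)
    # Deduplicate: drop the local name that equals the preferred one.
    others = [n for n in local_names if n != preferred]
    if not others:
        return preferred
    return f"{preferred} ({' / '.join(others)})"

# Hypothetical tagging of the feature discussed above.
tags = {
    "name": "Carabinieri Bolzano;Carabinieri Bozen",
    "name:it": "Carabinieri Bolzano",
    "name:de": "Carabinieri Bozen",
}
print(localized_label(tags, "de"))
print(localized_label(tags, "it"))
print(localized_label(tags, "fr"))
```

With these tags, a German speaker gets “Carabinieri Bozen (Carabinieri Bolzano)”, an Italian speaker gets the reverse, and a French speaker gets both local names joined by the fallback. The exact-match deduplication only works if name:xx repeats one of the values in name verbatim.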

Many users are actually quite biased toward their own language. If a map shows the German speaker “Carabinieri Bozen (Carabinieri Bolzano)” and the Italian speaker “Carabinieri Bolzano (Carabinieri Bozen)” instead of showing “Carabinieri Bozen - Bolzano” to both, the map would be biased in favor of the user. I don’t see a problem with that.

I agree about the bias. This is the easy case and can be solved by always preferring e.g. name:de for a German map. My comment was about how to produce/store a good “neutral” name for those who don’t want to prioritize any given language. Repeating “Carabinieri” in both German and Italian isn’t a very good solution: it’s an unnecessary repetition, and hardly anybody labelling manually would use a label like “Carabinieri Bozen (Carabinieri Bolzano)”. You’d either use a double name for the city or omit one language altogether.

The admin_level 6 (province) Relation: ‪Bolzano - Bozen‬ (‪47046‬) | OpenStreetMap also has Italian first. You are invited to change that to the less biased version :wink:

Reading the multiple foreign name tags suggests that the choice of delimiter itself might depend on language. The official_name:de|it BTW are incomplete.

On admin_level 4 (region) the language delimiter changes to a slash - Relation: ‪Trentino-Alto Adige/Südtirol‬ (‪45757‬) | OpenStreetMap as the single names already contain a dash/hyphen.

Having a canonical delimiter should help in deduplicating, if the user language is among the local languages. Can it help though, to construct an unbiased name to put in parentheses, in case the user is in a third language?

PS: Americana map style in this case does not need any of that, regions only show user language name, and that is fine.

You will never have a problem with a monolingual label. The problems start when you want to show multilingual labels, but in a different way than the mapper wants to see them. Therefore you need to know which local languages exist (I assume they are all available in OSM) and decide on an order. If my users are considered to be mainly Germans, I would prefer to label everything “DE - IT”. Doing this with name:de - name:it in the whole of Italy would not be helpful, e.g. Rom - Roma, as there are no signs in German in Rome.

In Belgium, the rules in OSM for the order of multi lingual names in name are driven by keeping peace within the community (first one decides) rather than any consistency. But consistency is what you want to see on the map. Especially if you neither know French nor Dutch.

Ha! I wish! :slight_smile:

(drifting off from “delimiters” here somewhat, but for context…)

Unfortunately, we get these too, for example disputes about the spelling of a word in a particular language. One recent one was about the New Zealand English spelling of some words of Māori heritage.

I think the real problem with this is that it will be counter-intuitive to users unfamiliar with general OSM practice, and will immediately look poor on rendered maps until the tools catch up, which will in turn lead to some reversion of the name tag. The best technical solution might not correspond with user expectations.

The former could be addressed by offering support in iD, as other editors are used by experienced contributors. From what you say using a semicolon should be the least complex for the renderers. If we move in this direction, it’s probably best to try and synchronise, say, iD and Carto-CSS support.

A few thoughts about how this might be done in iD:

  • For names containing typical delimiters, ask before upload whether it is a multilingual name, and replace the delimiter with “;” in the advanced tagging view; possibly create an additional tag such as those suggested in this thread.
  • Rename current name:xx tags from “multilingual name(s)” to something like “language-specific name(s)”
  • Encourage population of the relevant name:xx tags
  • Warn if a semicolon-separated value of the name tag is not matched by one of the name:xx tags
  • It would be lovely if one could directly construct the name tag from name:xx tags in the editor, but I suspect that this would require significant changes to the i/f for the single tag.
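The warning in the fourth bullet could be sketched roughly like this. It is written in Python purely for illustration (iD itself is not written in Python), the function name is made up, and the tags are a hypothetical example rather than a real feature.

```python
def unmatched_name_values(tags):
    """Return the values in `name` that no name:xx tag repeats verbatim."""
    # Simplification: every key starting with "name:" is treated as a
    # language subkey, and any escape sequence in `name` is ignored.
    localized = {value for key, value in tags.items() if key.startswith("name:")}
    values = [n.strip() for n in tags.get("name", "").split(";") if n.strip()]
    return [value for value in values if value not in localized]

# Hypothetical tags: the German value has no matching name:de yet.
tags = {"name": "Bolzano;Bozen", "name:it": "Bolzano"}
print(unmatched_name_values(tags))
```

An editor could show one warning per returned value and offer to add the missing name:xx tag. A real check would also need to skip non-language subkeys under name:.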

PS. Anecdata on multilingual name usage: In 1973 I did a language exchange with a boy from Liège. My ferry to Ostend (/Oostende/Ostende) was delayed by a strike and I missed my booked train. The railway timetables in the station (or at least those I could find) were all in Flemish, and I did not know the Flemish name of Liège. I worked this out by looking at the timetable of my booked train, which arrived in Luik at the right time, and was able to get to Liège on the last train of the day, which left shortly afterwards.

iD has a “multicombo” field type (looks like a token field) that could be useful for hiding the raw semicolon from the user and making multiple names feel more natural. It’s currently used in the Destination field, for example. However, it doesn’t currently work with the multilingual subkeys. It probably would need to wait until iD adds support for alt_name; otherwise, mappers would be too incentivized to stuff everything in name, undifferentiated.

openstreetmap-carto would be an easy case technically. If I understand correctly, the holdup is that the maintainers are concerned about poor optics if they refine multilingual labeling anywhere in the world before improving font selection in regions that speak CJK languages or Arabic-script languages.

This is a nice idea, but it only covers the case when the names come from different languages. So far there have been a few examples in this thread where the names come from the same language but are on equal footing. Sometimes they can be distinguished by keys like loc_name and reg_name, though iD doesn’t have fields for those keys yet. A fix is waiting for review:

I think this could be partially addressed by careful phrasing of the warning. Current iD warnings tend to avoid nuances about edge cases & different tagging approaches, e.g. “outdated” often means “a more widely used scheme”. It would be interesting to know if newish users assiduously auto-correct the warnings or ignore some of them.

That’s fair, if a mapper thinks that the common-sense, language-neutral name of a POI needs to combine each language’s name in a non-obvious manner, then I think they should be able to put that combination in name, cognizant of the tradeoff that speakers of German or Italian would see some duplication (a minor annoyance at most).

But if we allow for this kind of combination, then it needs to be possible to distinguish it from the POI next door that has a less sophisticated concatenation of the two languages’ names. Otherwise, the Italian speaker would now see “Bolzano (Carabinieri Bozen)”.

It’s only possible to combine the names in this manner because German and Italian share certain similarities, such as how they’d write “carabinieri”. Similarly, I’m reminded of a supermarket I used to shop at, named “Senter Market” in English and “Chợ Senter” in Vietnamese. To save space, the sign out front said “Chợ Senter Market”. (I always got a kick out of hearing the Garmin GPS say, “Arriving at Choh Senter Market.”)

But we can’t necessarily encode those combinations in name even if signposted. In the Houston seafood buffet example earlier, the sign spliced Chinese characters between each English and Vietnamese word. Even if that’s how the sign is laid out, seeing it written that way in any other medium looks like garbled text. I think there’s a limit to how faithfully we reproduce signs, because we aren’t in the signmaking business.

(based on the complaints the DWG gets) new users treat suggestions by iD, the NSI, OsmOse etc. as gospel that must be followed. OsmOse goes out of its way to indicate that its suggestions are only suggestions; iD does not.

Whilst it can be helpful to prompt a set of brand tags for something that is “a normal brand outlet”, it doesn’t help if the NSI information is incomplete, vague or otherwise wrong. As an example (that you’re familiar with but others may not be), see this fast food place: there, brand info for a brand that operates in the Southern US was added to a chicken shop in London based on the very generic name (and I’m not making this up) “Chicken Express”. While fixing one rogue entry there is helpful, it only nibbles at the edges of the problem: no OSM editing software should be automatically adding tags based on something as flimsy as a generic name like “Chicken Express”. The NSI is still wrong; it’ll suggest this brand anywhere in the US when it only operates in Texas and a couple of neighbouring states.
