Name lists in OSM Americana; or: how I learned to love semicolons

Minh_Nguyen · January 9, 2023, 12:50am

OpenStreetMap Americana has joined other prominent OSM data consumers in supporting the semicolon delimiter in names.

I tell you, a cat must have three different names

Normally, keys like name are only for the feature’s primary name in the local language, while other names are relegated to different keys such as alt_name, official_name, and short_name. But sometimes a feature has multiple names of equal standing due to an inconsistency, border, or dispute in the real world. Likewise, names in other languages normally go in language-specific subkeys like name:es for Spanish. But in some places, multiple languages are commonly spoken, so there isn’t a single “default” language.

Historically, mappers in some officially multilingual regions have standardized on various ad-hoc delimiters between these names, such as a slash or hyphen, with the expectation that a renderer would lazily show the entire name tag verbatim. To put a positive spin on it, let’s call this map of mostly the Baltic Sea a celebration of linguistic diversity:

By default, though, Americana does some nicer things with names like tailoring the map to your preferred language. It also appends the name of a city in the local language, giving the map a more cosmopolitan flair and maybe prepping you for your next trip overseas. The main name key is supposed to be suitable for this purpose, but appending the name tag verbatim isn’t an option: if one of the names in it happens to match the name in your preferred language, you’d get a verbose, repetitive label, crowding out other labels for no good reason.

Unfortunately, there’s not much else to do when the name tag forms a list using common punctuation or mere spaces, because these delimiters are so ambiguous:
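To make the ambiguity concrete, here is a minimal Python sketch. It is not Americana's code: `naive_split` and `split_semicolons` are hypothetical helpers, and the sample names are chosen only for illustration. Guessing at slashes, hyphens, or spaces tears apart names that were never lists, whereas a semicolon rarely occurs inside a real name.

```python
import re


def naive_split(name):
    """Hypothetical: guess at ad-hoc delimiters (slash, hyphen, space)."""
    return [part for part in re.split(r"[/\- ]+", name) if part]


def split_semicolons(name):
    """Split only on the semicolon, trimming stray whitespace."""
    return [part.strip() for part in name.split(";") if part.strip()]


# A single hyphenated name is wrongly treated as two names.
print(naive_split("Wilkes-Barre"))
# A single name containing a space is wrongly treated as two names.
print(naive_split("San José"))
# A semicolon-delimited list splits cleanly, spaces inside names intact.
print(split_semicolons("Ash Road;West County Line Road"))
```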

Semicolons to the rescue

Some 15 years ago, the global community standardized on the semicolon as a uniform, machine-readable character to separate multiple values in a single tag. It was intended to apply to all tags, and data consumers have done lots of things with it in non-name tags. But mappers have adopted it more slowly for name than for other keys, probably due to the effect it has on renderers that label name verbatim.

To show the benefit of a more structured approach, Americana now parses semicolons out of every name tag. When the name:* tag for your preferred language contains multiple names separated by semicolons – say, name:es=Aquí;Allí;Allá – each semicolon turns into something more presentable, such as a line break. The same happens with name if you set your preferred language to an unsupported language like mul (as in “multilingual”), or if the feature is only tagged with multiple values in name but not tagged with a more specific name:*. Your preferred language’s name for the feature only appears once, even if one of the local languages calls it by the same name.
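The behavior described above can be sketched roughly as follows. This is a Python illustration under my own assumptions, not Americana's actual implementation: `split_names` and `build_label` are hypothetical names, the tags are made up, and the real style may handle more cases (such as case or script differences) than this does.

```python
def split_names(value):
    """Split a semicolon-delimited tag value into trimmed, non-empty names."""
    return [part.strip() for part in value.split(";") if part.strip()]


def build_label(tags, preferred_lang):
    """Return (primary_lines, gloss_lines) for a feature.

    primary_lines: names in the preferred language, one per line.
    gloss_lines: local names from `name` that are not already shown.
    """
    local_names = split_names(tags.get("name", ""))
    localized = split_names(tags.get(f"name:{preferred_lang}", ""))
    if not localized:
        # No name:* for this language (e.g. "mul"): show every value of name.
        return local_names, []
    # Assumption: names match if they are equal ignoring case.
    seen = {name.casefold() for name in localized}
    gloss = [name for name in local_names if name.casefold() not in seen]
    return localized, gloss


# Hypothetical tags for illustration only.
tags = {"name": "Here;Aquí", "name:es": "Aquí;Allí;Allá"}
print(build_label(tags, "es"))
print(build_label(tags, "mul"))
```

With Spanish as the preferred language, the three Spanish names become the primary lines and only "Here" is left as the local gloss, since "Aquí" would otherwise appear twice. With mul, both values of name are shown and there is nothing left to gloss.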

See for yourself

San Francisco has multiple names in Chinese, each of them quite common both locally and abroad. Its name:zh tag contains three names separated by semicolons, which show up if you set your preferred language to Chinese:

In Kaser and New Square, New York, most residents speak Yiddish, so name contains both English and Yiddish. Depending on your preferred language, you’ll see the names in English, Yiddish, and your preferred language laid out appropriately, without repeating a name:

This works for anything that Americana labels – not only places but also parks, airports, roads, and more. This creek in Denmark has two names in name, both elegantly shown together on the map:

This community garden in San José, California, has no Spanish name, so speakers of the city’s second most common language see the names in English and Vietnamese instead:

Tagging for the renderer common good

Americana can only lay out the labels intelligently because the semicolon delimiter is predictable and unambiguous. Unfortunately, ad-hoc delimiters are still much more common than semicolons in names globally and even in the U.S., where there has never been much discussion about delimiters. Hopefully as more data consumers add support for semicolons, mappers will follow suit.

Thanks to everyone who’s already using the standard semicolon to separate multiple values in the name tag in those tricky situations when other name keys just won’t do. This gives renderers and data consumers of all stripes the flexibility to do something a little more useful with the name tag, and now you no longer have to worry about the map looking sloppy.

clay_c · January 9, 2023, 2:21am

This looks fantastic! Thank you so much for your work, Minh.

SomeoneElse · January 9, 2023, 7:27pm

To be honest, I’ve tried rereading this several times and communication isn’t really occurring. I think part of what you’re saying is “sometimes places have more than one name in a particular language, and in those cases it helps to be able to parse a known delimiter out of the ‘name:language’ value”. If so, fair enough - that bit makes sense.

What I don’t understand is how it could possibly apply to the “name” tag. In the “Ash Road; West County Line Road” example, you can get those from name:left and name:right; you don’t need to do anything with the name tag.

Taking Derry/Londonderry again as an example, the “name” tag is more than just a combination of two language names (both English BTW**) - it’s a compromise reached by two language communities that is now essentially one name in its own right. How would you suggest that “Londonderry/Derry” is stored so that that form (with the stroke) can be shown to map users?

A slightly different example is this hill in Wales. It has an English and a Welsh name, and currently the “name” tag is used to hold the “name you should probably use for it” value. If you’re suggesting that the name tag shouldn’t have the “name you should probably use”, where should that go? This is probably here in Americana, but doesn’t seem to be shown.

Finally, where would “the value that will normally be seen on roadsigns” go?

** Somewhat confusingly, if you get the long-distance bus to Derry, which is run by Bus Éireann, you’ll see two names on the front - one of the English ones, and the Irish one - but that’s something that can be done in “the name of the bus route” I guess; similarly, the local airport has its own name.

Minh_Nguyen · January 9, 2023, 8:17pm

I was being a bit cheeky with a T. S. Eliot/Andrew Lloyd Webber reference. Sorry it went over everyone’s head.

Yes, I avoided a road/border example because of the potential for confusion. It’s true that many examples can be handled by name:left and name:right (or, for routers, name:forward and name:backward). That leaves the question of what to put in name. If you omit name, someone will inevitably fill it in, incompletely. If you add both names to name, the delimiter doesn’t really matter, so a semicolon is reasonable. If you omit name entirely, is it really accurate to say noname=yes? Indeed, an ideal renderer would give the street two separate labels running along either side of the line. In this case, name is a stepping stone for data consumers on the way to using the more specific tags.
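As a rough sketch of that stepping-stone idea (Python, purely illustrative): a consumer could prefer name:left and name:right when they exist and otherwise fall back to every value in name. The helper `names_by_side` is hypothetical, and which name belongs on which side in the sample tags is my assumption, not something taken from the actual way.

```python
def names_by_side(tags):
    """Return the names to show on each side of a way.

    Prefers the side-specific tags; otherwise falls back to all of the
    semicolon-delimited values in name, since the side is unknown.
    """
    all_names = [n.strip() for n in tags.get("name", "").split(";") if n.strip()]

    def side(key):
        value = tags.get(key)
        return [value] if value else all_names

    return {"left": side("name:left"), "right": side("name:right")}


# Hypothetical tagging: the left/right assignment is assumed for illustration.
fully_tagged = {
    "name": "Ash Road;West County Line Road",
    "name:left": "Ash Road",
    "name:right": "West County Line Road",
}
name_only = {"name": "Ash Road;West County Line Road"}

print(names_by_side(fully_tagged))
print(names_by_side(name_only))
```

With only name present, both sides get both names, which is less precise but still complete.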

There are also cases where the authorities on either side apply their name to both sides, such as this road that runs just inches north of the border between Michigan and Ohio (which once fought a war over the border). To this day, Williams County, Ohio, continues to post street signs on its side of the border calling it County Road T, even though the road lies entirely within Hillsdale County, Michigan, which maintains both sides as Territorial Road. They aren’t being petty: deliverers and emergency responders need to be able to find the addresses on either County Road T or Territorial Road.

Or consider the case I brought up in this openstreetmap-carto issue of a street where two authorities have joint authority over both sides of the street. They disagree about the road name, to the point of posting competing signs up and down the street at regular intervals. Should it go without a name in favor of loc_name and reg_name? If there’s this much outcry about less sophisticated renderers showing semicolons in labels, imagine if the labels went away entirely because of an absent name.

I don’t suggest any change to Londonderry/Derry. As you say, it has become a single compound name. Users benefit from knowing why it’s called Stroke City. No one would call it Stroke City if it were routine for a city to have a stroke in its name.

If the name that English speakers should use is in fact “Twmpa”, borrowed from Welsh, then perhaps “Lord Herefords Knob” should be relegated to alt_name:en? If someone needs to know that Twmpa comes from Welsh, they can consult Wiktionary or Wikidata Lexicographical Data or infer that based on name:cy.

A more extreme example is the situation several years ago when World War II–era German names for places in Poland were being tagged as name:de, even though the politically correct tag would’ve been some variation on old_name:de.

Americana doesn’t label hills yet. Depending on stylistic considerations, it may or may not end up showing local-language glosses on natural features.

I’m not sure how this question is in opposition to what Americana is now doing. The point of local-language glosses is to give some sense of what the signs would say, or more generally, what the locals would say in a moderately formal context.

Kovoschiz · January 9, 2023, 9:06pm

The semicolon and other delimiters such as a space or slash do not always mean the same thing. San Francisco is a good example. While I haven’t fully investigated the entire history, you have a Simplified and a Traditional form of the older name, plus the newer name. The usual meaning of the semicolon, separating multiple values, applies between the older and newer names. Simplified and Traditional are 2 forms of the exact same name, not 2 different names in 2 languages. That partly justifies why the bilingual countries I edit in use a space, to emphasize that the names are the same, or correspond to each other. A semicolon would be used to add other names with different meanings/concepts if needed.

Minh_Nguyen · January 9, 2023, 9:20pm

The San Francisco example reflects not only a difference between Simplified and Traditional Chinese written forms but also between national naming practices. My understanding is that Mandarin speakers in the PRC and ROC use a coined name with its own history, while Cantonese speakers in Hong Kong use a straightforward transliteration, but Chinese speakers in the U.S. don’t necessarily cut along these lines cleanly.

These distinctions are captured to some extent in subkeys such as name:cmn-Hans and name:yue, but it remains valid for a user to request a map in Chinese (zh). Some software will automatically fall back to a Mandarin name in Simplified Chinese, owing to market realities, but that would be somewhat ironic and politically charged for San Francisco. Last time I checked, even proprietary map vendors in the PRC label it as both 圣弗朗西斯科 and 旧金山.

kucai · January 10, 2023, 2:24am

How do you differentiate what you wrote concerning transliteration? San Francisco in Chinese doesn’t sound like it even exists. If this case is like what happened in my country, some mapper just put in Chinese names for the hell of it because, when translated, they sounded like the official-language pronunciation - basically using Chinese letters to pronounce it in another language.

Minh_Nguyen · January 10, 2023, 3:10am

Like many languages, the Chinese languages have their own special names for some famous places. For everywhere else, there’s a standard transcription table mapping phonetic sounds to characters. Apparently each Chinese-speaking country has its own standard.

What’s tricky about San Francisco is that Hong Kong has largely abandoned the traditional name in favor of a phonetic transcription, but the PRC and ROC have not. So here in the San Francisco Bay Area, you can see both names on signs, on TV, etc. (The U.S. government doesn’t regulate the English language, let alone Chinese.)

You might want to check whether the name:zh tags in your country can be found in a dictionary, adhere to a particular language standard, or are just one mapper’s whim. Sometimes language enthusiasts strive for language coverage a little too zealously. That said, I have noticed that “eyeballing” a phonetic transcription is common practice among South Asian languages, both in OSM and elsewhere. Whether it’s right or wrong is not really for me to judge, since every language does things so differently in reality.