Tagging names transformed from one language to another

Senihtu · February 3, 2025, 8:50am

Someone asked for examples to try out.
Here is one: Way: Добровольск (54690823) | OpenStreetMap

Here we have

a Russian name transcribed into de, lt and pl
a presumably Old Prussian (sic!) old name translated into lt, de and pl
and then I know someone who was born there in 1944. Their (German) passport says “Schloßberg”. The name goes back to the 16th century. Since it still appears in a passport, it is in use! It is still missing from OSM.

I don’t tag things like this: it quickly becomes political, and I can use my OSM time more productively elsewhere.
But anyone who wants to is welcome to try it out.

Kovoschiz · February 3, 2025, 9:25am

Yes, there’s a problem here of users adding unverified Cantonese romanizations directly to name:en=, and even the endonym may be dubious (historical names no longer in use, names made up by individuals or groups of friends, disagreements between hikers). There have been related edit wars, to the point that a press conference was held and reported in the news. So I have been using name:en-t-zh= / name:en-t-yue= to discourage people from adding name:en= again without thought. These are not name:yue-Latn=, as the presence of some English words shows.

darkonus · February 3, 2025, 12:46pm

Yes, my scheme will indeed improve the object’s data in OpenStreetMap. Instead of using a semicolon as a separator in old_name:de, there will be two distinct old names in German:

One from the era of Nazi toponymic Germanization: old_name:de = Schloßberg
Another, tagged as a transformation from Old Prussian: old_name:de-t-prg = Pillkallen

The hierarchical relationships will be as follows:

name = Добровольск
 └── name:ru = Добровольск
      ├── name:de-t-ru = Dobrowolsk
      ├── name:en-t-ru = Dobrovolsk
      ├── name:pl-t-ru = Dobrowolsk
      └── name:lt-t-ru = Dobrovolskas

old_name:de = Schloßberg

old_name:prg = ?
 ├── old_name:de-t-prg = Pillkallen
 ├── old_name:lt-t-prg = Pilkalnis
 └── old_name:pl-t-prg = Pilkały
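A hierarchy like the one above can be derived mechanically from the keys themselves. The following Python sketch assumes keys of the form <base>:<language tag> with at most one -t- extension; parse_name_key and group_by_source are hypothetical helper names, not part of any existing OSM tool. It groups each transformed name under its base key and source language, without checking whether a parent such as old_name:prg actually exists.

```python
from collections import defaultdict


def parse_name_key(key):
    """Split an OSM name key such as 'old_name:de-t-prg' into
    (base_key, target_tag, source_tag).

    Assumes '<base>:<language tag>' with at most one '-t-' extension.
    """
    base, _, lang = key.partition(":")
    if not lang:
        return base, None, None  # plain 'name' or 'old_name'
    target, sep, source = lang.partition("-t-")
    return base, target, (source if sep else None)


def group_by_source(tags):
    """Collect transformed names under (base_key, source_language)."""
    groups = defaultdict(list)
    for key, value in tags.items():
        base, target, source = parse_name_key(key)
        if source is not None:
            groups[(base, source)].append((target, value))
    return dict(groups)


# Values taken from the Dobrovolsk scheme above.
tags = {
    "name:ru": "Добровольск",
    "name:de-t-ru": "Dobrowolsk",
    "name:lt-t-ru": "Dobrovolskas",
    "old_name:de": "Schloßberg",
    "old_name:de-t-prg": "Pillkallen",
    "old_name:pl-t-prg": "Pilkały",
}
for (base, source), names in group_by_source(tags).items():
    print(f"{base}:{source}")
    for target, value in names:
        print(f"  └── {base}:{target}-t-{source} = {value}")
```

Splitting at the literal "-t-" is a simplification; a full BCP 47 parser would also handle other extensions and case differences.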

Minh_Nguyen · February 3, 2025, 2:19pm

Yes, the assumption is that this key is only necessary when the pronunciation can’t reliably be inferred somehow from the name. In other words, it’s only for the exceptions. Text-to-speech engines can already handle the rest relatively well.

Probably the first time a news article ever mentioned OpenHistoricalMap:

https://www.hk01.com/即時體育/761319/行山-共享地圖openstreetmap爆改地名大戰-山友憂慮增加意外

not:name=* might be applicable in some of these cases. There’s already about as much tooling as you’d expect for that key. The main difference is that not:name=* and its subkeys intentionally don’t get indexed by geocoders.
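To make “not indexed” concrete, here is a minimal sketch of a hypothetical indexing filter. The set of name keys in NAME_BASES is an assumption for illustration; real geocoders use their own, more complete rules. The point is only that not:name=* and its subkeys are skipped, so values recorded there never become search matches.

```python
# Hypothetical list of name keys a geocoder might index; real geocoders differ.
NAME_BASES = {"name", "alt_name", "old_name", "official_name", "short_name"}


def indexable_names(tags):
    """Return searchable name values, skipping not:name=* and its subkeys."""
    names = []
    for key, value in tags.items():
        if key.startswith("not:"):
            continue  # not:name records a name that should NOT match this feature
        if key.split(":")[0] in NAME_BASES:
            # Semicolon-separated values become separate search terms.
            names.extend(part.strip() for part in value.split(";") if part.strip())
    return names


# Placeholder values, not real place names.
tags = {
    "name": "Peak A",
    "alt_name:en": "Peak B;Peak C",
    "not:name": "Peak D",
    "not:name:en": "Peak E",
}
print(indexable_names(tags))
```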

darkonus · February 4, 2025, 4:41pm

Today, I spent several hours classifying the various foreign-language names of the Paektu / Baekdu / Changbai / Golmin Sanggiyan Alin volcano on the Chinese-North Korean border. It was quite interesting but also exhausting—and probably a little incompetent on my part :).

The hypothesis that anything that cannot be properly tagged often ends up in alt_name is proving to be true. I managed to extract all the “extra” names from alt_name—names that mappers couldn’t assign to separate tags due to the limitations of our exonym tagging system. alt_name is a great tag, but right now, it serves as a graveyard for names that land there solely because of the imperfections in OpenStreetMap’s name-tagging system:

name = 长白山 백두산

name:mnc = ᡤᠣᠯᠮᡳᠨ ᡧᠠᠩᡤᡳᠶᠠᠨ ᠠᠯᡳᠨ
 ├── name:uk-t-mnc = гора Голмін Санґіян
 ├── name:sk-t-mnc = Golmin Šanggijan Alin
 └── name:cs-t-mnc = Golmin šanggijan alin

name:ko-Hani = 白頭山
name:ko = 백두산
 ├── name:ru-t-ko = Пэктусан
 ├── name:sk-t-ko = Päktusan
 ├── alt_name:ru-t-ko = вулкан Пэктусан
 ├── name:uk-t-ko = Пектусан
 ├── alt_name:uk-t-ko = Пекту
 ├── name:ja-t-ko = 白頭山
 ├── name:cs-t-ko = Pektusan
 ├── name:en-t-ko = Paektu Mountain
 ├── alt_name:en-t-ko = Baekdu Mountain
 ├── name:zh-t-ko = 白头山
 ├── name:zh-Hans-t-ko = 白头山
 ├── name:zh-Hant-t-ko = 白頭山
 ├── name:zh-Latn-pinyin-t-ko = Báitóu Shān
 └── name:lzh-t-ko = 白頭山

name:zh = 长白山
name:zh-Hans = 长白山
name:zh-Hant = 長白山
name:zh-Latn-pinyin = Chángbái Shān
name:lzh = 長白山
 ├── name:ko-t-zh = 장백산
 ├── name:uk-t-zh = гора Чанбай
 ├── name:sk-t-zh = Čchang-paj-šan
 ├── name:ru-t-zh = вулкан Чанбайшань
 ├── name:cs-t-zh = Čchang-paj šan
 └── name:en-t-zh = Changbai Mountain

# Struggle to attach
name:vi-t-zh = Núi Bạch Đầu
alt_name:uk-t-zh = Байтоушань
alt_name:ru-t-zh = вулкан Байтоушань

If we imagine this diagram being generated automatically, some exonyms would struggle to attach to any specific branch due to the existence of two separate branches for Chinese names: one for native Chinese and another for a Chinese version of the Korean name. This isn’t necessarily a bad thing, as the tags themselves remain accurate and effectively describe the data they contain.
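The attachment problem can be reproduced with a key-only sketch. Assuming a transformed name’s candidate parents are all name:* keys whose target language matches its source language, the hypothetical find_parents below groups those candidates by their own origin. Once both name:zh and name:zh-t-ko exist, every *-t-zh key gets two candidate origins; the diagram above placed most of them by matching romanizations against values, which keys alone cannot do.

```python
def split_lang(key):
    """Return (target_tag, source_tag) for a key like 'name:zh-Hant-t-ko'."""
    lang = key.partition(":")[2]
    target, sep, source = lang.partition("-t-")
    return target, (source if sep else None)


def primary(tag):
    """Primary language subtag, e.g. 'zh' for 'zh-Hant'."""
    return tag.split("-")[0].lower()


def find_parents(tags, derived_key):
    """Group candidate parent keys of a transformed name by their own origin.

    More than one origin means the keys alone do not say which branch
    the derived name belongs to.
    """
    _, source = split_lang(derived_key)
    if source is None:
        return {}  # not a transformed name, nothing to attach
    origins = {}
    for key in tags:
        if key == derived_key or not key.startswith("name:"):
            continue
        target, origin = split_lang(key)
        if primary(target) == primary(source):
            origins.setdefault(origin, []).append(key)
    return origins


# Subset of the tags from the diagram above.
tags = {
    "name:zh": "长白山",
    "name:zh-Hant": "長白山",
    "name:zh-t-ko": "白头山",
    "name:zh-Hant-t-ko": "白頭山",
    "name:ko": "백두산",
    "name:vi-t-zh": "Núi Bạch Đầu",
}
parents = find_parents(tags, "name:vi-t-zh")
print("ambiguous" if len(parents) > 1 else "attached", parents)
```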

Original name tags:

alt_name = Paektu-san;Paektu;Baekdu;Changbai Mountain;Baekdu Mountain
alt_name:cs = Čchang-paj šan;Golmin šanggijan alin
alt_name:en = Baekdu Mountain;Changbai Mountain
alt_name:lzh = 白頭山
alt_name:ru = вулкан Байтоушань;вулкан Чанбайшань;вулкан Пэктусан
alt_name:sk = Čchang-paj-šan;Golmin Šanggijan Alin
alt_name:uk = Пекту;Байтоушань;гора Чанбай;гора Голмін Санґіян
alt_name:zh = 白头山
alt_name:zh-Hans = 白头山
alt_name:zh-Hant = 白頭山
alt_name:zh-Latn-pinyin = Báitóu Shān
name = 长白山 백두산
name:cs = Pektusan
name:en = Paektu Mountain
name:ja = 白頭山
name:ko = 백두산
name:ko-CN = 장백산
name:ko-Hani = 白頭山
name:lzh = 長白山
name:mnc = ᡤᠣᠯᠮᡳᠨ ᡧᠠᠩᡤᡳᠶᠠᠨ ᠠᠯᡳᠨ
name:ru = вулкан Байтоушань (Пэктусан)
name:sk = Päktusan
name:uk = Пектусан
name:vi = Núi Bạch Đầu
name:zh = 长白山
name:zh-Hans = 长白山
name:zh-Hant = 長白山
name:zh-Latn-pinyin = Chángbái Shān
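Before moving values out of alt_name, it helps to see which individual names already appear under several keys. This sketch splits semicolon-separated values and indexes them by value; it assumes no single name contains a semicolon itself, and value_index is a hypothetical helper name.

```python
from collections import defaultdict


def split_values(value):
    """Split an OSM multi-value string on ';', trimming whitespace.

    Assumes no individual name itself contains a semicolon.
    """
    return [part.strip() for part in value.split(";") if part.strip()]


def value_index(tags, bases=("name", "alt_name")):
    """Map each individual name value to every key that contains it."""
    index = defaultdict(list)
    for key, value in tags.items():
        if key.split(":")[0] in bases:
            for part in split_values(value):
                index[part].append(key)
    return index


# Subset of the original tags listed above.
tags = {
    "alt_name": "Paektu-san;Paektu;Baekdu;Changbai Mountain;Baekdu Mountain",
    "alt_name:en": "Baekdu Mountain;Changbai Mountain",
    "name:en": "Paektu Mountain",
}
for value, keys in value_index(tags).items():
    if len(keys) > 1:
        print(value, keys)  # names duplicated across keys
```

On this subset, “Changbai Mountain” and “Baekdu Mountain” are each recorded in both alt_name and alt_name:en.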

Minh_Nguyen · February 4, 2025, 6:41pm

If alt_name=* seems like a “graveyard”, it may be because few if any rendered maps label features with alternative names. But alt_name=* is very well supported among geocoders, likely better supported than these -t- extension subkeys. On the other hand, if you’re focused on “representing truth” rather than practical usage by data consumers, then I don’t see how it would be a graveyard, since the tag is just as accessible as any other name tag in the raw data.

Thanks for bringing this to my attention. It’s a good example of the complex state of Vietnamese exonyms. Here are the most common names for this mountain (with demotic Han in parentheses), ordered from most to least traditional:

núi Trường Bạch (𡶀長白): native Vietnamese núi (𡶀, “mountain”) + Trường Bạch (長白), Sino-Vietnamese transcription of Chinese 長白山 (= Sino-Vietnamese Trường Bạch sơn, “as-far-as-the-eye-can-see white mountain”)
núi Bạch Đầu (𡶀白頭): native Vietnamese núi (𡶀, “mountain”) + Bạch Đầu (白頭), Sino-Vietnamese transcription of Sino-Korean 백두산/白頭山 (= Sino-Vietnamese Bạch Đầu sơn, “white head mountain”). Đầu is also considered a native Vietnamese word for “head”.
núi Paektu: native Vietnamese núi (𡶀, “mountain”) + Paektu, McCune–Reischauer transcription of Sino-Korean 백두산/白頭山 (= McCune–Reischauer Paektusan, “white head mountain”)

Sino-Vietnamese is properly defined as the variant of literary Chinese associated with Vietnam (lzh-Latn-VN or lzh-Hani-VN). Individual Sino-Vietnamese words are read (pronounced) like Vietnamese words, but the grammar is archaic Chinese, closer to modern Chinese (zh) than Vietnamese. For example, the character for “mountain” comes first in the Vietnamese names but last in the Chinese, Sino-Korean, and Sino-Vietnamese names.

For most parts of the world, people in Vietnam regard Sino-Vietnamese exonyms as quaint, even obsolete, if they understand them at all, whereas overseas speakers regard non-Sino-Vietnamese exonyms as lazy code-switching, if they understand them at all. However, it’s even more complicated in the Sinosphere.

Historically, Vietnamese speakers used Sino-Vietnamese exonyms for everything Korean, but in the last few decades, people in Vietnam have switched to Revised Romanization for South Korea as the two countries have normalized relations. This sometimes bleeds over into using Revised Romanization for North Korean names too, just for simplicity, but Sino-Vietnamese is more common, and there’s the occasional McCune–Reischauer as in núi Paektu. Often Vietnamese speakers just use whichever name happens to be handy, regardless of its etymology. On the other hand, some writers embed the unadulterated Sino-Vietnamese name directly into Vietnamese text, for a more literary feel.

How to boil all these nuances down into a seven-character code? You can’t.

Jarek · February 4, 2025, 6:57pm

These look/sound like transcriptions of the Chinese name “Báitóu Shān” that is derived from the Korean name - so name:uk/ru-t-zh-t-ko?

darkonus · February 4, 2025, 6:57pm

Thank you for the wonderful and detailed message about nuances I wasn’t aware of. Perhaps my “slightly” incompetent research should simply be called “incompetent.” But it’s exactly this kind of message from you that shows how professionally members of our community can approach issues like naming.

I also fully agree that the tags I proposed won’t be able to capture the complexities of name evolution in the world.

That said, wouldn’t you agree that transferring this data into more appropriate tags is a worthy task for a mapper?

alt_name = Paektu-san;Paektu;Baekdu;Changbai Mountain;Baekdu Mountain
alt_name:cs = Čchang-paj šan;Golmin šanggijan alin
alt_name:en = Baekdu Mountain;Changbai Mountain
alt_name:lzh = 白頭山
alt_name:ru = вулкан Байтоушань;вулкан Чанбайшань;вулкан Пэктусан
alt_name:sk = Čchang-paj-šan;Golmin Šanggijan Alin
alt_name:uk = Пекту;Байтоушань;гора Чанбай;гора Голмін Санґіян

Apologies for the inappropriate comparison of alt_name to a cemetery—I was too harsh.

darkonus · February 4, 2025, 6:59pm

Possible, but it doesn’t seem elegant to me to describe such nested transformations.
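Besides elegance, the nested form runs into a formal rule: RFC 5646 does not allow a singleton subtag such as t to appear more than once in a language tag (outside the private-use part after x), so uk-t-zh-t-ko is not a well-formed tag. The sketch below checks only that one rule and is not a full validator; repeated_singletons is a hypothetical name.

```python
def repeated_singletons(tag):
    """Return singleton subtags that occur more than once in a language tag.

    RFC 5646 forbids repeating a singleton outside the private-use part,
    so any result here means the tag is not well-formed.
    """
    seen, repeated = set(), []
    for subtag in tag.lower().split("-"):
        if subtag == "x":  # everything after 'x' is private use
            break
        if len(subtag) == 1:
            if subtag in seen and subtag not in repeated:
                repeated.append(subtag)
            seen.add(subtag)
    return repeated


for tag in ("uk-t-zh", "zh-Latn-pinyin-t-ko", "uk-t-zh-t-ko"):
    print(tag, repeated_singletons(tag) or "ok")
```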

Minh_Nguyen · February 4, 2025, 7:24pm

Personally, I feel more comfortable researching etymologies in detail for Wiktionary or Wikidata than for OpenStreetMap or OpenHistoricalMap. I had something like eight tabs open to various Wikimedia projects just to research this mountain in Vietnamese. Wikimedia’s projects have well-established practices around citing sources and documenting discrepancies among them. There’s also something liberating about being able to write about something in full sentences or create structured linguistic data, as opposed to the cryptic haiku of IETF language tags.

It sounds like you’re only interested in recording the list of languages that contribute to a given etymology, rather than the full etymological detail. This reminds me of how some mappers only want to indicate the existence of some past rail infrastructure, ignoring all the other essential information about a railway’s history.

If your goal is to put one set of Ukrainian names on an even footing with another set of names in OSM, I don’t think linguistic taxonomy or cladistics is going to advance this goal very effectively. If you don’t find alternative base keys like alt_name, nat_name, and official_name to be acceptable, how about the OSM-centric approach we ultimately took for Gacería with name:gaceria=*?

darkonus · February 4, 2025, 8:02pm

If I’m being honest, I didn’t quite understand your assumption here. Are you referring to mass name changes? I have no such plans, neither for Ukrainian names nor for any others, and I’m actually against that. I was simply interested in the idea of tagging transformed toponyms and started this discussion to explore it. Along the way, it became clear that this approach could help solve certain data-related issues.

For example, I believe we should free mappers from situations where they have to choose which of three equally valid English exonyms to put in name:en and which ones to list in alt_name:en separated by semicolons. You see, sometimes having parallel names is just a natural characteristic of certain places.

darkonus · February 4, 2025, 8:41pm

Ugh. I think I see my mistake now. The “equally valid exonyms” argument doesn’t really apply to this mountain. In that example, I rushed to extract names from the alt_name space, even though mappers had reasons to record them there. Still, this does not justify the use of semicolons.

darkonus · February 5, 2025, 9:49am

Several variants of automatic rendering would become possible if we had parallel data for name:en-t-ko, name:en-t-zh, name:uk-t-ko and name:uk-t-zh.
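One such variant can be sketched in a few lines: join the transformed names for a chosen language in a chosen source order, falling back to name:<lang> and then name. The function name, the separator, and the fallback order are assumptions for illustration; the tag values come from the diagram earlier in the thread.

```python
def parallel_label(tags, lang, sources, sep=" / "):
    """Join name:<lang>-t-<src> values in the given source order.

    Falls back to name:<lang>, then to name, when no transformed names exist.
    """
    keys = (f"name:{lang}-t-{src}" for src in sources)
    parts = [tags[key] for key in keys if key in tags]
    if parts:
        return sep.join(parts)
    return tags.get(f"name:{lang}", tags.get("name", ""))


tags = {
    "name": "长白山 백두산",
    "name:en-t-ko": "Paektu Mountain",
    "name:en-t-zh": "Changbai Mountain",
    "name:uk-t-ko": "Пектусан",
    "name:uk-t-zh": "гора Чанбай",
}
print(parallel_label(tags, "en", ["ko", "zh"]))
print(parallel_label(tags, "uk", ["zh", "ko"]))
```

Because the source order is a parameter, a renderer could put either name first, or show only one, without the data itself taking sides.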

darkonus · February 6, 2025, 7:08am

Thank you all for the discussion, examples, and feedback. I’m planning to take the next step and turn this into an official proposal. Even if it doesn’t get full support, going through the process will be a valuable experience for me. Plus, I hope this idea sparks interest among more OpenStreetMap contributors!

hoserab · February 7, 2025, 2:06am

Respectfully, I don’t understand what you believe this really improves over name:<language>. We can see the transliterated names from Korean and Chinese simultaneously in this example, but other than being an interesting piece of trivia, what purpose does it serve? It solves a ‘problem’ in that it allows two transliterated/translated names to be displayed simultaneously, but why would I want that? Selfishly, as an English-speaking user of OSM data, I don’t really care what the mountain is called in any language other than English.

This discussion still really centres on this:

In the case of that mountain at (or near, depending on one’s political beliefs…) the Chinese-North Korean border, most of us would call it “Paektu Mountain” in English. Despite being an older transliteration (which is why ‘Baekdu’, the more modern one, is an alt_name:en), Paektu has much more currency in English than the Chinese “Changbai”. Pretty much no native English speaker, if they’ve even heard of this mountain, would call it “Changbai Mountain”. Many see that transliteration simply as the Chinese government’s attempt to promote the global use of a name based on their language, rather than the Korean one, for their own political purposes.

It’s maybe a fascinating etymological exploration, but is this serving any real purpose for use on a map? Again, other than being explicit about transliterated names, e.g. name:ko-Latn=Baekdusan, name:zh-Latn-pinyin=Chángbáishān, I don’t care to see transliterated names of both Korean and Chinese origin in English on an English ‘version’ of the map: I only care what we call it in English.

peanuthole · February 7, 2025, 2:44am

I think the proposal could work well as an addition to the current naming scheme. The name:en of the mountain would still be Paektu, but all the alternative names could benefit from a richer tagging scheme.

darkonus · February 7, 2025, 6:51am

Well, for example, I personally don’t care about roof colors or material types of buildings. Or about having old tree stumps on the map. But those tags exist anyway.

The Chinese name is just as ancient as the Korean one. However, it is possible that the Korean variant is indeed more widely used in English due to political reasons. After all, I chose this volcano merely as an example and do not claim accuracy in tagging.

For the vast majority of people using maps, this is indeed irrelevant. However, for those who create cartographic products—including OpenStreetMap contributors as well as developers of applications and services—this is far from insignificant. One example is the choice made by many local communities to display names in multiple languages in the name tag using a separator. This is, in fact, how the name of this volcano is recorded: first in Chinese, then in Korean. Contributors aim to avoid making a political choice between the two languages.

A good example of how such cases could be resolved in OpenStreetMap is New Zealand, where many geographic features have both an English name and a traditional Māori name.

name = Aoraki / Mount Cook

name:en = Mount Cook
 └── name:uk-t-en = гора Кука

name:mi = Aoraki
 └── name:uk-t-mi = Аоракі

In summary, we will get the following advantages:

OpenStreetMap contributors who know a given language can either select one primary exonym and mark the others as alternative names or place two or more exonyms on equal footing. In both cases, the data will clearly indicate which exonyms originate from which languages.
Data users and developers will be able to flexibly utilize both the information on exonym priority and, most importantly, the information on the languages from which the exonyms originate.

Minh_Nguyen · February 7, 2025, 4:02pm

The mockup labeling the two names as “Chinese” and “Korean” struck me as misleading. It’s accurate to indicate that one name is Chinese-derived and the other Korean-derived. It’s also accurate to indicate that one is preferred in China while the other is preferred in North Korea, as your flag mockup did. But appending the English word “mountain” makes both names English, no longer “Chinese” or “Korean”. One of these names is far and away more common than the other among English speakers, so name:en=* and alt_name:en=* have obvious values. If people in these countries prefer one name or the other when speaking English, name:en-CN=* and name:en-KP=* might also be applicable.

If name:en=* is for the user who prefers English and name:en-KP=* is for the user who prefers North Korean English, then name:en-t-ko=* is for the user who is trying desperately to rid their life of any vestige of the Chinese language. Unluckily for them, 백두산 is Korea’s reading of a literary Chinese name, 白頭山, which is also sort of Korean (hanja). Literary Chinese is a written interchange medium that multiple cultures interpreted based on their own local spoken languages, kind of like how metric unit symbols are universal but are read differently depending on the culture.
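A data consumer choosing among name:en-KP=*, name:en=* and name:en-t-ko=* needs some fallback rule. One standard option is the truncation step of RFC 4647 “Lookup” matching: drop the last subtag, plus any single-character subtag left dangling, until a key matches. Under that rule both en-KP and en-t-ko fall back to plain name:en. The sketch below applies this to name:* keys; the function names are hypothetical, and the name:en-CN value is invented purely for illustration.

```python
def truncate(tag):
    """One RFC 4647 'Lookup' truncation step: drop the last subtag, then any
    single-character subtag (such as the 't' singleton) left at the end."""
    subtags = tag.split("-")[:-1]
    if subtags and len(subtags[-1]) == 1:
        subtags = subtags[:-1]
    return "-".join(subtags)


def lookup_name(tags, preferred):
    """Return the most specific name:<tag> value, else the plain name tag."""
    tag = preferred
    while tag:
        value = tags.get(f"name:{tag}")
        if value is not None:
            return value
        tag = truncate(tag)
    return tags.get("name")


tags = {
    "name": "长白山 백두산",
    "name:en": "Paektu Mountain",
    "name:en-CN": "Changbai Mountain",  # hypothetical value for illustration
}
for preferred in ("en-CN", "en-KP", "en-t-ko", "fr"):
    print(preferred, "->", lookup_name(tags, preferred))
```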

darkonus · February 7, 2025, 4:52pm

Thank you for your critique. Obviously, when creating these connection schemes, my main goal was to illustrate the idea rather than provide definitive conclusions. As a mapper, I would never make edits in this part of the world based on superficial assumptions about the origins of place names.

The language I know best is Ukrainian. Consequently, I would feel quite confident mapping Ukrainian exonyms. In the example above with New Zealand, I am almost 100% certain that the tags I chose for the Ukrainian names derived from English and Māori are correct.

Perhaps this would be better? Note that the mountain icon is pressed; these additional labels might not be visible until it is clicked.

Minh_Nguyen · February 7, 2025, 6:00pm

“From Chinese” and “From Korean” are more accurate at least. It would still feel a bit odd to me as a user, unless names in general are annotated with similar etymologies. But that’s a design decision for the application developer to make. There are certainly uses for that sort of presentation. A biblical scholar might prefer a specialized map of the Holy Land blanketed with annotations comparing the classical and modern names of places, as a sort of critical apparatus.

Maps for a more general audience typically make country-based distinctions on features of international interest, but not always explicitly. For example, most English-language maps label this territory as “Falkland Islands (Islas Malvinas)”. Even if the map generally only labels places in English, it will pointedly include “Islas Malvinas”, the Spanish-language name preferred by Argentina, for diplomatic reasons. For this purpose, name:es=* and perhaps int_name:AR=* would be more informative than name:en-t-es=*. (Since any discrepancy in an international or official name would be a matter of diplomacy rather than linguistics, should we allow int_name=* and official_name=* to be qualified by ISO 3166 country codes instead of IETF language tags?)

For what it’s worth, New Zealand’s English speakers often hybridize the names of features significant to Māori heritage, blurring the line between an “English” name and a name used by English-speakers:

Taking your example at face value, I understand that Ukrainian speakers consider the Māori- and English-derived names to be equally valid but never say both names in tandem. This is a situation that I would’ve used a semicolon-delimited name:uk=* tag for. Perhaps your suggested name:uk-t-mi=* and name:uk-t-en=* tags would clarify the situation further, but it would make more sense if there’s a phenomenon of Ukrainian speakers systematically preferring one source of Ukrainian names over the other, rather than a one-off exception for Aoraki/Mount Cook.

In practice, American English speakers don’t tend to use that particular hybrid name. Few of us have any proficiency in Māori, so the hybrid names seem more unwieldy to us. However, New Zealand’s practice of hybrid naming has started to enter official usage here, with some parks’ English names being changed to incorporate local indigenous names. These indigenous names aren’t necessarily exactly equivalent. For example, Máyyan 'Ooyákma – Coyote Ridge Open Space Preserve is the English name of the park, full stop. Máyyan 'Ooyákma comes from the Chochenyo language, but when the local Muwekma Ohlone people say this name, they’re most likely referring to Coyote Ridge, not the open space preserve that surrounds it. Meanwhile, many local English speakers still say “Coyote Ridge Open Space Preserve” as shorthand. Should there be a name:en-t-en=* for that shorthand, or would some *_name=* variant be good enough?