Tagging names transformed from one language to another

hoserab · February 8, 2025, 2:51am

Fair. But what I was really getting at was perfectly summarized by Minh:

There’s a big difference between

name:ko-Latn=Baekdusan
name:zh-Latn-pinyin=Chángbáishān
and
name:en-t-ko=Baekdu Mountain
name:en-t-zh=Changbai Mountain

The former two are the transliterations from the original writing systems to the Latin alphabet; the latter two are “English names” half-transliterated from their respective source languages. However, none of these is what the mountain is actually (predominantly) called in English.
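To make the difference between those two key shapes concrete, here is a rough Python sketch of how a data consumer might split such keys apart. The function name `parse_name_key` is hypothetical, and the sketch only handles the simple shapes quoted above (it is case-sensitive and does not validate subtags).

```python
def parse_name_key(key):
    """Split an OSM-style name key into prefix, target tag, and optional source tag.

    Hypothetical helper: handles only shapes like "name:ko-Latn" and "name:en-t-ko".
    """
    prefix, sep, tag = key.partition(":")
    if not sep or not tag:
        return None  # no language suffix at all, e.g. plain "name"
    # Everything before "-t-" describes the resulting name; everything after
    # describes the language it was transformed from.
    target, _, source = tag.partition("-t-")
    return {"prefix": prefix, "target": target, "source": source or None}


for key in ("name:ko-Latn", "name:zh-Latn-pinyin", "name:en-t-ko", "name:en-t-zh"):
    print(key, parse_name_key(key))
```

The first two keys come back with no source (they are the same language in a different script), while the last two report `en` as the target and `ko` or `zh` as the source.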

As I wrote in my first reply, I think a big part of your original premise is sound: there are some name:en=* values (and probably the same holds for many other languages) currently in the OSM data that probably shouldn’t be tagged as such, because they’re merely a transliteration, translation or other such form of transformation from the original endonym. That’s worth identifying in some manner; your proposed solution could be workable for such a purpose. But as I cautioned earlier, there are many perfectly valid name:en=* values in the data that you may mistakenly conclude “shouldn’t be recorded in English”. It’s hard to tell sometimes what’s genuinely “the name in English” and what’s not, and especially prone to error if your default presumption is that a transliterated name is probably not “the name”.

As pointed out earlier, I think name:en=Kapyĺ for this small Belarusian city is a great example of something that is obviously not true, and quite blatantly is just a transliteration from the original Cyrillic script to Latin. I would wager almost every English-speaker in the world has no idea what an ĺ would sound like. Amusingly the int_name, name:de and name:fr tags are all Kapyl, sans the acute accent, and the city itself is tagged as the administrative seat of the “Kapyl District”. Like I wrote earlier, the acute accent is a giveaway that this is categorically not “the name in English”: English uses some diacritical marks, especially for foreign names and loanwords, but I can’t think of a single use of an acute accent on an ‘l’ or any other consonant. (In fact, pretty much the only accent marks one sees on consonants are the c-cedilla in words like façade and the n-tilde in words like piñata.)
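The "accent on a consonant" giveaway could be turned into a rough QA check, sketched below in Python. This is only a heuristic under my own assumptions: the function name `suspicious_consonants` is made up, only ç and ñ are whitelisted (per the observation above), "y" is treated as a vowel, and letters with no canonical decomposition (such as Đ) are not caught at all. A hit is a hint for human review, not proof that the value is wrong.

```python
import unicodedata

# Assumption: the only accented consonants considered normal in English text.
ALLOWED = {"\u00e7", "\u00f1"}  # c-cedilla, n-tilde
VOWELS = set("aeiouy")


def suspicious_consonants(name):
    """Return the accented consonants in `name` that English rarely uses."""
    found = []
    for ch in unicodedata.normalize("NFC", name):
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0].lower()
        # A character counts as accented if its decomposition carries combining marks.
        has_mark = any(unicodedata.combining(c) for c in decomposed[1:])
        is_consonant = base.isascii() and base.isalpha() and base not in VOWELS
        if has_mark and is_consonant and ch.lower() not in ALLOWED:
            found.append(ch)
    return found


for value in ("Kapyĺ", "Kapyl", "façade", "piñata", "Hà Nội"):
    print(value, suspicious_consonants(value))
```

Of these five values, only "Kapyĺ" is flagged (for the ĺ).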

I agree with your “more information is better” and “give the data consumer more flexibility” principles, but still don’t really see an ultimate usable purpose. Your example of tagging Ukrainian names for places in Crimea based on either Russian or Crimean Tatar sources is kind of fascinating, as a piece of etymological trivia for someone who doesn’t speak any Ukrainian (or Russian, or Crimean Tatar), but… how would a data consumer make practical use of it? Not to say it couldn’t be done, and maybe for political or cultural reasons in places like Ukraine (or more specifically Crimea) app functionality that could pick and choose between the etymologies to display could be usable/desirable/popular, but from my perspective it’s hard for me to picture any real-world use. If anything, it may lead to the kind of misconceptions that Minh pointed out with respect to displaying both “Changbai Mountain” and “Baekdu Mountain”: it makes it seem as though both are commonly or interchangeably used in English, when that’s not really true.

Apologies for not giving this a ringing endorsement; I do actually think this is otherwise a fascinating idea.

darkonus · February 8, 2025, 9:11am

I’d like to clarify everything once again using English as an example. My idea is not to “clean up” the space of English names but to refine it. There will be name:en, which, broadly speaking, represents the internal part of English. And there will be categories of name:en with markers indicating that these names have been transformed from other languages.

However, both of these categories belong to the English language space, as both keys start with name:en.

It would look something like this:

Original English names
Main Street
Cherry Hill
Lakewood
Salt Lake City
New York
Greenwich

And:

Local names	Transformed names in English
ꦔꦪꦺꦴꦒꦾꦏꦂꦠ	Yogyakarta
凉水镇	Liangshui
بنها	Banha
Hà Nội	Hanoi

Whether this is a good idea or not is up to each person to decide. But it’s definitely worth discussing, so thank you for sharing your thoughts!

darkonus · February 8, 2025, 8:21pm

I don’t quite understand your point. Let me walk you through my interpretation of this name, and you can point out where I’m mistaken.

First, the original name in the Chochenyo language: Máyyan ’Ooyákma. Then, after the dash, we have the English name: Coyote Ridge. Finally, there’s a descriptive or status part: Open Space Preserve, also in English.

Please explain, in as much detail as possible, why the full name should be included in name:en. There should be a clear criterion for this, not just a preference.

hoserab · February 8, 2025, 10:01pm

How do you make this distinction? For example, to me the name Hanoi is no less “an internal part of English” and no more “transformed from another language” than the name Greenwich, from the Saxon Grenewic, or (New) York, from the Old Norse Jórvik. Hanoi isn’t merely a transliteration or other ‘transformation’ of Hà Nội into the English alphabet: Hanoi is the city’s name in English. That is its “English name”.

I don’t know much about Máyyan 'Ooyákma – Coyote Ridge Open Space Preserve, but I can easily believe that is its name in English if that’s what English speakers commonly call it.

darkonus · February 9, 2025, 12:25am

If English speakers have recently taken a name from another language and changed it for convenience (transcription, transliteration, simplification, adaptation, translation), then it is a transformed name.

However, if a name has long been established in English, changing “from within”—it belongs to the internal part of the language.

Here are examples of internal Ukrainian (latn) names: Avdiivka, Andriivka, Kamianske, Chornomorske, Mykhalchyna-Sloboda, Holubivka, Zhmerynka, Chaplynka, Okhtyrka.

And transformed names in Ukrainian (latn):
Karachi, Stambul, Kinshasa, Lakhor, Mumbai, San-Paulu, Tiantszin, Ukhan, Tokio, Dunhuan.

There is a clear distinction between these two types of names.

hoserab · February 9, 2025, 1:27am

It’s not very clear to me.

The distinction you are making seems to me to be “names of things in Ukraine” and “names of things outside Ukraine”, or “names of Ukrainian etymological origin” and “names of things not of Ukrainian etymological origin”. Which is… fine, I guess, but it doesn’t take a rocket surgeon to deduce that “То́кіо” is etymologically of Japanese origin. However, that does not mean “То́кіо” is not its name in Ukrainian. You don’t need a name:uk-t-ja tag extension to tell you the name of the capital city of Japan comes from Japanese, do you?

How “recent” is “recent”? When does a name “truly” enter into a language’s lexicon? Sorry, but this seems ridiculously subjective. Maybe this distinction is more obvious or culturally relevant for a Ukrainian, but as an English (and French) speaker, this distinction is not relevant to me at all. Maybe that’s what I and others are most struggling with here: I just don’t understand why and how this distinction is important.

hoserab · February 9, 2025, 1:36am

I’m trying to think of ways to explain my point of view using a real-life example and maybe this will work:

I come from Alberta, Canada. In northern Alberta there is a lake called Lac La Biche. The etymology of this name is French; “Lac” = lake, “La Biche” = doe (a deer, a female deer). The English name of this lake is not “Doe Lake”: the English name is Lac La Biche.

Minh_Nguyen · February 9, 2025, 1:53am

Just as in Ukrainian, an English toponym very often consists of a specific paired with a generic. It’s also acceptable to omit the generic when the context is abundantly clear, but name=* doesn’t necessarily reflect this informal shortened name.

The feature in question is a nature preserve surrounding a ridge that’s known as Coyote Ridge in English. Because parks are human organizational constructs, we pay a little more attention to their official names than for natural features. The reserve’s owner made the decision to officially rename the preserve from “Coyote Ridge Open Space Preserve” to “Máyyan 'Ooyákma – Coyote Ridge Open Space Preserve”, redundantly prepending the Chochenyo name of the ridge. The ridge is still Coyote Ridge in English and Máyyan 'Ooyákma in Chochenyo.

One could argue that this is only the official_name:en=*, and that common usage still favors keeping “Coyote Ridge Open Space Preserve” as the name:en=*. However, the people involved with the preserve included the Chochenyo name very intentionally, and I don’t know of any controversy regarding that change. For a map that otherwise has all the formal names, selectively omitting the Chochenyo part of the English name would surprise visitors as much as truncating the name to “Coyote Ridge” on the map. “Coyote Ridge Open Space Preserve” is currently tagged as alt_name:en=* rather than old_name:en=*, out of recognition that people will sometimes skip the parts of a name that they don’t know how to pronounce or type.

Sometimes we do simplify the name for practical reasons. I’ve been tempted to shorten “The Heekin Family/PNC Grow Up Great Adventure Playground” to just “Great Adventure Playground” based on what people have been shortening it to online, despite the signs, but it’s really a gray area. I’m not sure everyone who passes by even knows whether it’s “Great Adventure Playground” or just “Adventure Playground” after removing the names of the sponsors (the Heekin Family and PNC’s Grow Up Great initiative). A lot of parks and other recreational facilities are like this.

Right, and this is one end of a spectrum. On the other end, many English speakers write “Huế” to avoid confusion with “hue”. In the middle, places can take a variety of forms, like Đà Nẵng (Da Nang, Danang), Điện Biên Phủ (Dien Bien Phu, Dienbienphu, Dienbien Phu), and even Vietnam (known as Viet Nam in diplomatic contexts). Anglicization is a process, not an event. These names at every step of the process are all English names by virtue of being used in English.

You want a clear criterion, rather than a preference, but your preference for a clear criterion is at odds with the local English-speaking population. As I said, English liberally borrows from other languages, not always very cleanly. Because the English language is almost completely unregulated, Anglicization is neither deterministic nor rule-based, nor is de-Anglicization. The transformation codes you’re proposing presuppose a clear distinction between pure and impure English that doesn’t generally exist in practice. I do not deny the existence of such a phenomenon in Ukrainian, however.

alan_gr · February 9, 2025, 8:32am

It might be helpful if you could explain how your concept would work for English for well known examples such as Beijing and Mumbai, where the commonly used English name has changed during my lifetime.

darkonus · February 9, 2025, 8:43am

It seems like I’m starting to understand why my idea is facing such strong resistance. This issue will quickly become very political in communities when name:fr has to be left as is, but name:en needs to be specified as name:en-t-fr. When name:es has to be left, but name:en needs to be specified as name:en-t-es.

The Ukrainian community will also “split” when the same thing happens—when for some settlements in Ukraine, name:ru has to be left as is, but name:uk needs to be specified as name:uk-t-ru.

Now I understand that the proposal for these keys will be pointless.

darkonus · February 9, 2025, 10:30am

@alan_gr, at your request:

Beijing naming scheme

name = 北京市
name:zh = 北京市
name:zh-Hans = 北京市
name:zh-Hant = 北京市
name:zh-Latn-pinyin = Běijīng shì
alt_name = 北京
alt_name:zh = 北京
old_name = 北平
old_name:zh = 北平
 ├── name:en-t-zh = Beijing
 ├── alt_name:en-t-zh = Peking;Peking Municipality
 └── old_name:en-t-zh = Peiping Municipality;Pei-p'ing Shih;Peiping Municipal Administrative Area

Mumbai naming scheme

name:mr = मुंबई
 └── name:en-t-mr = Mumbai

old_name = Bombay
old_name:en = Bombay

alan_gr · February 9, 2025, 10:47am

Thanks. Would these also still have the existing name:en tags, not shown in your examples?

darkonus · February 9, 2025, 10:56am

No, there are no additional name:en tags that aren’t shown in my examples. I even added old_name:en = Bombay in the Mumbai naming scheme.

alan_gr · February 9, 2025, 11:03am

So from the mapper point of view, a mapper who knows the city is called Beijing in English can no longer tag that? They would have to know what language it originated from, and understand the more complex tagging. And the same for mappers tagging exonyms in all other languages.

And from the data user point of view: maps rendering names in English would be broken in the short term. They would need to search for all tags of the form name:en-something. And if there is more than one they would have to break the tie somehow.

Have I got that right? It seems like a lot of disruption even where there are no political complications.
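To illustrate the data user side, here is a minimal Python sketch of the lookup a renderer might need if plain name:en could be absent. The function `english_label` and its tie-break rule are my own assumptions, not anything an existing renderer does; the tag values are the ones quoted earlier in this thread.

```python
def english_label(tags, lang="en"):
    """Pick a label for `lang` from a dict of OSM tags (hypothetical fallback order)."""
    # 1. A plain name:<lang> wins if it exists.
    plain = tags.get(f"name:{lang}")
    if plain:
        return plain
    # 2. Otherwise collect every name:<lang>-t-<source> candidate.
    prefix = f"name:{lang}-t-"
    candidates = sorted((k, v) for k, v in tags.items() if k.startswith(prefix) and v)
    if candidates:
        # Arbitrary tie-break: first key alphabetically. Nothing in the data
        # says which candidate English speakers actually use.
        return candidates[0][1]
    # 3. Last resort: the local name.
    return tags.get("name")


beijing = {"name": "北京市", "name:en-t-zh": "Beijing"}
mountain = {"name:en-t-ko": "Baekdu Mountain", "name:en-t-zh": "Changbai Mountain"}
print(english_label(beijing))   # Beijing
print(english_label(mountain))  # Baekdu Mountain, but only because "ko" sorts before "zh"
```

The second result shows the tie-break problem: the choice between the two mountain names falls out of string sorting, not out of any knowledge about English usage.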

darkonus · February 9, 2025, 11:10am

Of course, mappers will still be able to tag name:en. The tag with the extension is for those who are sure about which language the name originates from and want to specify this in OSM.

Provided that these tags spread slowly and developers gradually adapt their processing systems, there shouldn’t be any issues. A few name:en-t-<lang_code> tags are quite a unique case that data users would indeed need to handle separately.

Jarek · February 9, 2025, 4:16pm

I expect that you will see a chicken-and-egg problem: mappers will be mostly unwilling to remove name:en until data consumers do support name:en-t-<lang>, but data consumers won’t bother supporting name:en-t-<lang> tags that have very few uses.

darkonus · February 9, 2025, 4:21pm

To bypass this, we can leave the tag name:<lang_code> for the object and add the same value in name:<lang_code>-t-<lang_code>.
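If both tags are kept side by side like this, the two values can drift apart over time. A small check along the following lines could catch that; it is only a sketch, and the function name `mismatched_transformed_names` is hypothetical.

```python
def mismatched_transformed_names(tags):
    """List (extended_key, plain_key) pairs whose values differ or whose plain key is missing."""
    problems = []
    for key, value in tags.items():
        if not key.startswith("name:") or "-t-" not in key:
            continue
        # "name:uk-t-ka" -> "name:uk"
        plain_key = key.split("-t-", 1)[0]
        if tags.get(plain_key) != value:
            problems.append((key, plain_key))
    return problems


consistent = {"name:uk": "Токіо", "name:uk-t-ja": "Токіо"}
drifted = {"name:en": "Peking", "name:en-t-zh": "Beijing"}
print(mismatched_transformed_names(consistent))  # []
print(mismatched_transformed_names(drifted))     # [('name:en-t-zh', 'name:en')]
```

Note that an extended tag without any plain counterpart is also reported, since under this scheme the plain tag is supposed to stay.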

darkonus · February 9, 2025, 4:51pm

So, to summarize:

Exonyms exist.
The nature of most exonyms is transformation, which is an established linguistic phenomenon.
For most exonyms, it is possible to clearly determine the source and the language from which they were transformed.
A standard for describing such transformations has existed for 13 years.
Even though this information is not important to everyone, OSM contains a lot of data that is only relevant to specific user groups.
The question of the linguistic origin of names is as politically sensitive as the names themselves.
Making assumptions can easily introduce inaccurate data about the linguistic origin of a name.

I remain convinced that the tags I proposed have the right to exist—and, in fact, they already do.

UPD: For your interest — mapping and fast rendering in QGIS of Ukrainian exonyms derived from Māori place names (highlighted in red) and Ukrainian exonyms originating from English:

mueschel · February 19, 2025, 8:54am

I think it’s reasonable to add these transformed names for major features well known in different countries, like names of cities or mountains.
On the other hand, I highly doubt the usefulness of such tags on tiny POIs such as small fast food restaurants, like here:

While the shop might possibly have a version of their name transliterated to the Latin alphabet, I doubt that it has an English name and much less a name transformed from Turkish to English while being located in Ukraine.

darkonus · February 19, 2025, 12:55pm

Thank you for the example. At the time I made the change, it seemed to me that the chosen tag was correct.

As a counterexample, I would like to present this restaurant node, where the tag I proposed is appropriate:

name	СТУМАРІ
name:ka	სტუმარი
name:uk	СТУМАРІ
name:uk-t-ka	СТУМАРІ

Photo of the signboard:

Screenshot of the packaging with the inscription: