After a quick search on this forum, I couldn’t find a discussion specifically about this topic. If this has already been covered, my apologies.
I believe most active mappers here are familiar with the concepts of exonym and endonym, but for clarity, here is a brief explanation.
What are endonyms and exonyms?
- Endonym – The name of a geographical feature used by local inhabitants in their native language. For example, Wien for Austria’s capital and Firenze for the well-known Italian city.
- Exonym – A name for a geographical feature, used in another language, that differs from the local name. For instance, the capital of Austria is known as Wien in German, but Vienna in English, Vienne in French, and Відень in Ukrainian. Similarly, Firenze in Italian is Florence in English, Florencia in Spanish, and Florenz in German.
Exonyms are especially common for country names, capitals, and major cities. However, many lesser-known geographical features have a name in only one language.
Examples of endonyms and exonyms in Ukraine
Endonym (name, name:uk) | Exonym (name:en) |
---|---|
Чорне море | Black Sea |
Дунай | Danube |
Димерчин ставок | |
Крим | Crimea |
Закарпаття | Transcarpathia |
Харитонівська сільська громада | |
Київ | Kyiv |
Витягайлівка | |
Цмоки | |
вулиця Леонтія Свічки | |
2-й провулок Сергайовки | |
урочище Попові Корита | |
Although the OSM Wiki advises against putting transliterated, transcribed, or translated names into multilingual name:* tags, name:en often contains exactly such transformed versions.
For example, here are some names listed under name:en for places in Belarus, even though none of these can truly be considered part of the standard English lexicon:
Nyasvizh
Stowbtsy
Valozhyn
Kletsk
Viliejka
Stolin
Luniniec
Kapyĺ
And similarly for Egypt:
Manial Shiha
Hadayek helwan
Wadi Hof
Ghamaza Al-Kubra
Qiblya
Dahshur
Proposal: Using BCP 47 Extension T Standard for transformed content
BCP 47, through its extension T (defined in RFC 6497), allows marking transformed content, including transliterations, transcriptions, and even translations. Here is a quote from that specification:
Identification of transformed content can be done using the ‘t’ extension defined in this document. This extension is formed by the ‘t’ singleton followed by a sequence of subtags that would form a language tag as defined by BCP47. This allows the source language or script to be specified to the degree of precision required.
(…)
For example:
Language Tag | Description |
---|---|
ja-t-it | The content is Japanese, transformed from Italian. |
ja-Kana-t-it | The content is Japanese Katakana, transformed from Italian. |
und-Latn-t-und-cyrl | The content is in the Latin script, transformed from the Cyrillic script. |
And here is how language tags would appear in OpenStreetMap:
name:ja-t-it
name:ja-Kana-t-it
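To make the structure of such keys concrete, here is a minimal Python sketch that splits a key like name:ja-Kana-t-it into the language of the content, the source language after the t singleton, and any optional RFC 6497 fields such as m0. The function parse_name_key and the NameKey type are hypothetical, not part of any existing OSM tool; the sketch assumes the name: prefix, ignores private-use (x-) subtags, and does not validate subtags against the IANA registry.

```python
from typing import Dict, NamedTuple, Optional


class NameKey(NamedTuple):
    """Components of an OSM name key such as 'name:ja-Kana-t-it'."""
    language: str           # tag of the content itself, e.g. 'ja-Kana'
    source: Optional[str]   # source tag after the 't' singleton, e.g. 'it'
    fields: Dict[str, str]  # RFC 6497 fields, e.g. {'m0': 'ungegn-2012'}


def _is_tkey(subtag: str) -> bool:
    # Field keys are one letter followed by one digit: m0, s0, d0, ...
    return len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit()


def parse_name_key(key: str, prefix: str = "name:") -> NameKey:
    """Split a (hypothetical) name:<bcp47> key into its transform parts."""
    if not key.startswith(prefix):
        raise ValueError(f"not a {prefix}* key: {key!r}")
    subtags = key[len(prefix):].split("-")
    lowered = [s.lower() for s in subtags]  # BCP 47 tags are case-insensitive
    if "t" not in lowered:
        # Plain tag such as 'en' or 'sr-Latn': nothing is marked as transformed.
        return NameKey("-".join(subtags), None, {})

    t_index = lowered.index("t")
    language = "-".join(subtags[:t_index])
    rest = subtags[t_index + 1:]

    # The source tag runs until the first field key or another singleton.
    i = 0
    source_parts = []
    while i < len(rest) and len(rest[i]) > 1 and not _is_tkey(rest[i]):
        source_parts.append(rest[i])
        i += 1

    # Each field is a key followed by one or more value subtags (3-8 chars).
    # Anything after the last field (e.g. another extension) is ignored here.
    fields: Dict[str, str] = {}
    while i < len(rest) and _is_tkey(rest[i]):
        tkey = rest[i].lower()
        i += 1
        values = []
        while i < len(rest) and len(rest[i]) >= 3:
            values.append(rest[i])
            i += 1
        fields[tkey] = "-".join(values)

    return NameKey(language, "-".join(source_parts) or None, fields)
```

For example, parse_name_key("name:en-t-uk") returns NameKey(language='en', source='uk', fields={}), while a plain key such as name:en yields source=None. A data consumer could use the source component to tell a transformed name apart from a true exonym.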
Why is this needed?
Exonyms can differ greatly from the endonym and from one another, often with entirely unrelated origins. Consider the different names for Germany:
Name | Etymology |
---|---|
Deutschland (endonym) | From Old High German diutisc (“of the people”), derived from Proto-Germanic þeudō (“people”). -land means “country,” so Deutschland means “land of the people.” |
Німеччина (Ukrainian exonym) | From Old East Slavic нѣмьць (“mute person”), referring to Germanic people who did not speak Slavic languages. |
Germany (English exonym) | From Latin Germania, a term used by the Romans for the lands beyond the Rhine. The origin is unclear but may derive from Gaulish germani (“neighbors” or “twins”). |
Allemagne (French exonym) | Derived from the name of the Alemanni tribe, who lived in southwestern Germany and Alsace. |
These are true exonyms and, in my opinion, belong in name:uk, name:en, name:fr, etc.
For a small Ukrainian town such as Згурівка, however, the names in other languages are not independent exonyms but transformations (transliterations or transcriptions) of the original Ukrainian name.
Using the “t” singleton for transformed names
For such cases, the t singleton can indicate that a name has been transformed from the original language:
name:be-t-uk = Згурыўка
name:ru-t-uk = Згуровка
name:en-t-uk = Zghurivka
name:crh-t-uk = Zhuriwka
name:ro-t-uk = Zgurivka
Benefits and challenges of this approach
This approach makes tagging more accurate, because a transformed version of Згурівка does not truly belong to the Belarusian, Russian, or English vocabulary. It also distinguishes native English names such as New York or Main Street from transliterated names such as Khreshchatyk or Zghurivka, which are not part of the standard English lexicon. However, data consumers would have to adapt their processing to recognize the -t-<lang_code> extension in tag keys: a renderer that currently reads only name:en, for example, would not pick up name:en-t-uk. This may pose an initial technical challenge.
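As a rough illustration of the adaptation this would require from data consumers, the sketch below picks a label for a given language: it prefers a true exonym in name:<lang>, then falls back to a transformed name:<lang>-t-* key, and finally to name. The function pick_name and its "kind" labels are hypothetical; real renderers have their own fallback chains.

```python
from typing import Dict, Optional, Tuple


def pick_name(tags: Dict[str, str], lang: str) -> Tuple[Optional[str], str]:
    """Choose a label for `lang`, preferring a true exonym over a transformed name.

    Returns (name, kind), where kind is 'exonym', 'transformed', 'default' or 'none'.
    """
    exact = tags.get(f"name:{lang}")
    if exact:
        return exact, "exonym"
    # Keys such as 'name:en-t-uk' mark names transformed into `lang` from
    # another language; sorting makes the choice deterministic if there are several.
    prefix = f"name:{lang}-t-"
    transformed = sorted(k for k in tags if k.startswith(prefix))
    if transformed:
        return tags[transformed[0]], "transformed"
    if "name" in tags:
        return tags["name"], "default"
    return None, "none"


zghurivka = {
    "name": "Згурівка",
    "name:uk": "Згурівка",
    "name:en-t-uk": "Zghurivka",
    "name:ro-t-uk": "Zgurivka",
}
kyiv = {"name": "Київ", "name:en": "Kyiv"}
```

Here pick_name(kyiv, "en") returns ("Kyiv", "exonym"), while pick_name(zghurivka, "en") returns ("Zghurivka", "transformed"). Returning the kind is one possible design: a map style could, for instance, render transformed names differently from established exonyms.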
Advanced usage: Specifying transformation methods
BCP 47 extension T allows even more precise tagging by indicating the transformation method. This can be useful for historical and cartographic research.
Example: Different Romanization standards for the same Ukrainian place name:
Transformation Method | Tag | Transformed Name |
---|---|---|
US Board on Geographic Names (1965) | name:und-Latn-t-uk-Cyrl-m0-bgn-1965 | Kam’yanyy Brid |
UNGEGN Standard (2012) | name:und-Latn-t-uk-Cyrl-m0-ungegn-2012 | Kamianyi Brid |
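Building such keys programmatically can be sketched as follows. make_name_key is a hypothetical helper; it checks only the RFC 6497 syntax (a field key is one letter plus one digit, each value subtag is 3 to 8 alphanumeric characters) and does not check whether a mechanism such as bgn or ungegn is actually registered in Unicode CLDR, which maintains the field values for extension T.

```python
import re
from typing import Dict, List, Optional

# RFC 6497 syntax: a field key is one letter plus one digit (e.g. m0),
# and each value subtag is 3 to 8 ASCII letters or digits.
TKEY = re.compile(r"[A-Za-z][0-9]")
TVALUE = re.compile(r"[A-Za-z0-9]{3,8}")


def make_name_key(language: str, source: Optional[str] = None,
                  fields: Optional[Dict[str, List[str]]] = None,
                  prefix: str = "name:") -> str:
    """Build a key such as 'name:und-Latn-t-uk-Cyrl-m0-ungegn-2012'.

    Hypothetical helper: checks field syntax only, not whether a value
    (bgn, ungegn, ...) is registered in CLDR.
    """
    fields = fields or {}
    if not source and not fields:
        return prefix + language  # not transformed: plain name:<lang>

    parts = [language, "t"]
    if source:
        parts.append(source)
    # Emit fields in sorted key order so the result does not depend on dict order.
    for tkey in sorted(fields):
        values = fields[tkey]
        if not TKEY.fullmatch(tkey):
            raise ValueError(f"invalid field key: {tkey!r}")
        if not values or not all(TVALUE.fullmatch(v) for v in values):
            raise ValueError(f"invalid values for {tkey}: {values!r}")
        parts.append(tkey)
        parts.extend(values)
    return prefix + "-".join(parts)
```

For instance, make_name_key("en", "uk") gives name:en-t-uk, and make_name_key("und-Latn", "uk-Cyrl", {"m0": ["ungegn", "2012"]}) gives the UNGEGN key from the table above.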
In conclusion, I believe that using BCP 47 extension T is particularly important for OpenStreetMap, because few databases in the world are as open and international. Precisely identifying languages, scripts, and transformed names is essential for the development of the map and for international cooperation within the project. I think this topic is worth discussing, and I look forward to a constructive discussion. Thank you for your attention!