Users of the Latin script have come to expect that, for areas where it is not the local script, maps are available that show the names of geographic locations in Latin script in addition to the local script, so that they are able to read them. This is not controversial, and I think it should be equally uncontroversial that users of writing systems other than the Latin script are entitled to the same expectation. I am most familiar with the Bulgarian Cyrillic script (which is foreign to me; my native language is Dutch): paper maps in Cyrillic of areas Bulgarians commonly travel to are available, and Google Maps shows many place names first in Cyrillic and then in the local script if the preferred language is set to Bulgarian.
With this post I would like to start a thread to discuss whether and how to add transliterated names of locations to OSM. This subject has been touched upon in other threads, such as:
https://community.openstreetmap.org/t/name-ru-bij-nederlandse-plaatsen/78589/24
https://community.openstreetmap.org/t/tool-to-find-and-fix-incomplete-multi-lingual-names/143881/7
https://community.openstreetmap.org/t/tagging-names-transformed-from-one-language-to-another/125144
but, as far as I know, it has not been discussed systematically.
I would like to address some of the concerns raised in these earlier discussions here.
Transliteration usually requires local knowledge, such as the pronunciation of the name. Cyrillic as used for Bulgarian has been developed for writing the Bulgarian language, so it is adapted to it and is almost completely phonetic. To be able to correctly transliterate a foreign name, its pronunciation needs to be known. As far as I know, AI (LLMs) is limited to written language, so it does not have the necessary information about pronunciation. Therefore a human is needed to correctly transliterate a name, and it can't be automated. I assume this is also true for other scripts, which were usually developed to write the sounds of a specific language (Japanese katakana, for instance). These correct transliterations need to be stored somewhere, and since OSM aims to store geographic information, it would be a good place to store them. We already store exonyms, so why not transliterations? Or would Wikidata be a better place to store all non-local names?
Transliteration is not translation: the name is still in the local language, but written in a script that is not commonly used for it. The same is often true for place names in other writing systems that are transliterated to Latin script, but that doesn't stop us from adding them to the map. They are so useful for map users who can't read the local script, so why not help them?
It's true that there is often no ground truth that can be used to verify a transliteration, as the foreign script is unlikely to be displayed on any sign. I think we have to assume good faith here, and trust that the mapper who added it is familiar enough with both the location (how the name is pronounced) and the foreign script. There are often other mappers with the same familiarity who can check the correctness and adjust it if necessary. Just like translation, transliteration is not an exact science, and sometimes choices have to be made that can be disputed (whether the place has an exonym in the foreign language or not, whether to transliterate or translate, which foreign character to use for a given sound, etc.). That doesn't mean it's complete anarchy, however: most cases are obvious, there are conventions on how to transliterate difficult cases, and it can be discussed which transliteration is most likely to be understood correctly by map users who read that script. I think the fuzziness of transliteration is similar to that of tagging the surface of an unpaved highway: there are plenty of discussions here on the differences between ground, dirt, compacted and gravel… Anyway, if you can't verify the correctness of a tag, that is not a reason to remove it: "Don't remove tags that you don't understand".
That's only a problem if we run out of database space. We could limit the number of names in foreign scripts that differ only slightly (Bulgarian Cyrillic vs. Russian Cyrillic, for instance) by deciding that the Cyrillic transliteration should go in name:ru=* (Russians being the largest group of users of the Cyrillic script) and that name:bg should only be added if the spelling is different. There is such an agreement between the Serbian and Bulgarian OSM communities, see https://community.openstreetmap.org/t/serbian-names-for-bulgarian-places/111488. The same may be possible with other closely related writing systems such as Arabic & Persian, Devanagari & Bengali, Thai & Lao, etc. (I don't know enough about these scripts to know whether this is feasible). We could also agree that if several different transliterations are possible in a certain writing system, we add only one: the one most useful to map users. We should also not transliterate alt_name and similar tags. If a true exonym exists, it should always be preferred.
That can be done, but I don't see (yet) why it is necessary or useful to make the distinction.
