Multilingual names in Bulgaria

rhhs · May 10, 2023, 2:44pm

I like the picture (it’s used on Romanization of Bulgarian - Wikipedia) because it nicely illustrates how not to treat personal names of foreigners (Sofia municipality got it wrong and replaced the sign later) and how to treat names used by businesses. I could take a picture of how it looks now (I live a 10-minute walk from that junction), but it no longer has the Happy advertising, so it wouldn’t be as illustrative.

rhhs · May 10, 2023, 4:06pm

Is this a better illustration picture? I think not.

Dimitar155 · May 10, 2023, 8:59pm

“Petofi Sandor” might be a good replacement for that photo if you need one (Google Maps).

rhhs · May 11, 2023, 5:24am

That’s a nice one! The Latin script one also follows the Hungarian standard of putting the last name first. But the sign below it is an issue: Прага transliterated would be Praga, but instead Praha is written, which is what it would be in Czech. So maybe the “don’t transliterate” rule should not only be applied to foreign person names, but also to foreign geographic names? Maybe not: Google Maps and Google Maps

plamen · May 11, 2023, 3:45pm

Sofia municipality has gone a bit overboard. Unless these are somewhat old signs from before the law:
Art. 9. (1) The names of historical figures and geographic names from modern foreign literary languages that use a version of the Latin alphabet are written in their original form.

(2) The names of historical figures and geographic names from modern foreign literary languages that do not use a version of the Latin alphabet are written according to the transliteration system of the respective language.

So how should the famous bul. Bruksel be written? In French? Or English?

rhhs · May 11, 2023, 4:29pm

So if we stick to the law (I’m in favour in principle), булевард Копенхаген would have to be bul. København, then? I can imagine Sofijska obshtina doesn’t have an ø in their letter case…

Dutch, of course! Bul. Brussel! Or maybe bul. Bruxelles - Brussel, according to https://wiki.openstreetmap.org/wiki/Multilingual_names#Brussels ?

What a can of worms!

plamen · May 11, 2023, 6:07pm

They’ll hand it to some advertising agency for a modest sum.
There’s no such precedent, at least in Europe. There is one bar in the UK and one street in Cyprus named Copenhagen. Well, and a few features in the Scandinavian countries, but there they understand their ø.

rhhs · May 14, 2023, 8:26am

I updated the wiki to reflect that the rule about not transliterating foreign names (of persons and geographic features), but using the original Latin spelling instead, is part of the transliteration law.
I also updated int_name=bul. København; we could be fined up to 300 lv. for not following the law (Art. 12 (3)). Now searching for streets named after Nâzım Hikmet…

Wulfmorn · May 16, 2023, 10:59am

It bothers me a bit that int_name is used for “transliterated to Latin alphabet”, because what is the equivalent for “transliterated to a non-Latin alphabet”?

We get obvious transliterations of towns that have no reason to have a Russian name

name = Tynset 
name:ru = Тюнсет

Using int_name is a practical solution, but it feels so itchy. It should be name:transliteration:latin and name:transliteration:cyrillic. As it is now, no obvious distinction is made between translations, transliterations and exonyms.

Anyway, I also think that duplication of name to name:bg will be a constant struggle to keep up to date. It would be much simpler to set a good example with name + int_name. Apps with a need to show either value will adapt and a precedent is set for other countries.

But again, I don’t know if int_name is a good thing to have in OSM. Might as well be “universal_name” or “useful_name”. It’s all equally ambiguous. This stuff keeps me up at night.

rhhs · May 16, 2023, 3:28pm

You are right that, in hindsight, it is not the most elegant name for a tag. If I had the chance to start from scratch, name:bg-Latn= would have been better, because it clearly expresses that the name is still in Bulgarian but written in the Latin script instead of the official Cyrillic. This is how they do it in the Serbian community, where it is more important to get this right because Serbia has one official language but two official scripts.
Alternatively, name-Latn= could be an option for a name in any language transliterated to Latin script. However, apart from Bulgaria, int_name is also used in Belarus, Greece, Kazakhstan and North Macedonia, so you’d have to convince 5 communities to change their habits. Now that would keep me up at night!
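
To make the difference concrete, here is a minimal sketch of how a data consumer might read the language and script out of a key such as name:bg-Latn. The function name `parse_name_key` is hypothetical, and the pattern only handles a language subtag plus an optional script subtag, not the full BCP 47 grammar.

```python
import re

# Matches keys like "name:bg" or "name:bg-Latn".
# Assumption: only a 2-3 letter language code and an optional
# 4-letter script code are handled; regions and variants are not.
NAME_KEY = re.compile(r"^name:([a-z]{2,3})(?:-([A-Z][a-z]{3}))?$")


def parse_name_key(key):
    """Return (language, script) for a localized name key, else None."""
    match = NAME_KEY.match(key)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_name_key("name:bg-Latn"))  # ('bg', 'Latn')
print(parse_name_key("name:ru"))       # ('ru', None)
print(parse_name_key("int_name"))      # None
```

With int_name the last call is the point: nothing in the key tells a consumer which language or script the value is in.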

plamen · May 16, 2023, 4:19pm

int_name is a popular tag in Japan and South Korea as well.
To cover all situations probably name:transliteration:xxx is a good choice.

The modern transliteration system in Bulgaria by and large follows English, including in its choice of letters. The č and š were dropped. Not to mention the signs in French at the railway stations.

Dimitar155 · May 16, 2023, 6:10pm

Don’t forget Mongolia, Kyrgyzstan, Russia and Tajikistan. Kazakhstan is supposed to change their alphabet to Latin in the coming years (Kazakhstan Presents New Latin Alphabet, Plans Gradual Transition Through 2031 - The Astana Times).

There are quite a few name tags out there. Adding another one is very easy but making sure that most data consumers and apps use it would be quite difficult.

plamen · May 16, 2023, 9:22pm

I meant that it is a suitable tag for a new project. int_name is so popular that replacing it with another tag, or even ignoring it, is quite difficult nowadays. By and large it repeats the transliteration (not only in Bulgaria) into English, which is the de facto international language today.

Minh_Nguyen · May 16, 2023, 10:36pm

There’s already a well-established pattern for this in other languages, consistent with the Internet standard BCP 47: name:bg-Latn. The standard even has a way to specify which transliteration standard is being used, though that’s rarely seen in OSM apart from Chinese and Korean.

It’s pretty straightforward for some renderers to make use of the more fully-qualified fields. For a Planetiler-powered renderer such as OSM Americana, adding bg-Latn would merely entail editing this line. If necessary, iD can be tweaked to expose the bg-Latn language code in the Multilingual Names field.
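
As a rough illustration of what such a change amounts to, here is a minimal sketch of a label fallback chain. It is not OSM Americana’s or Planetiler’s actual code; `pick_label` and `LATIN_FALLBACKS` are hypothetical names, and the order of the keys is an assumption.

```python
# Keys to try, most specific first, when a Latin-script label is wanted.
# Assumption: this order is illustrative, not any renderer's real list.
LATIN_FALLBACKS = ["name:bg-Latn", "int_name", "name:en", "name"]


def pick_label(tags, preferred_keys=LATIN_FALLBACKS):
    """Return the first non-empty value among preferred_keys, else None."""
    for key in preferred_keys:
        value = tags.get(key)
        if value:
            return value
    return None


street = {"name": "булевард Копенхаген", "int_name": "bul. København"}
print(pick_label(street))  # bul. København
```

Supporting a new key then means adding one entry to the list, which is why the renderer side is the easy part compared with retagging.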

Wulfmorn · May 17, 2023, 9:18am

If you are duplicating name to name:bg, why not duplicate int_name to name:bg-Latn? If adding transliteration tags is uncommon now, it won’t become more common by itself. Break that ice, Bulgaria!

This page has an interesting point, however, at the very end: Transliteration code - OpenStreetMap Wiki
transliterations are in most cases unnecessary since it is usually possible to automatically transliterate between scripts in the renderer.
Basically, let the machines do the work, but then of course, it’s gonna have flaws, or it’s gonna be a lot of work to get the bots perfect.
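
A minimal sketch of the “let the machines do it” approach for Bulgarian, using a plain lookup table. The table follows the Bulgarian Streamlined System as I understand it; `transliterate_bg` is a hypothetical name, and the sketch deliberately leaves out the word-final -ия to -ia rule and any knowledge of foreign names.

```python
# Bulgarian Cyrillic to Latin, one letter at a time.
# Assumption: word-final "ия" -> "ia" and other exceptions are not handled.
BG_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y", "ю": "yu", "я": "ya",
}


def transliterate_bg(text):
    """Transliterate letter by letter; unknown characters pass through."""
    out = []
    for ch in text:
        latin = BG_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                   # digits, spaces, Latin text
        elif ch.isupper():
            out.append(latin.capitalize())   # "Ш" -> "Sh"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate_bg("Прага"))                # Praga
print(transliterate_bg("булевард Копенхаген"))  # bulevard Kopenhagen
```

The output shows the flaw discussed earlier in the thread: the table produces Praga and Kopenhagen, while the law asks for the original spellings Praha and København.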

rhhs · May 17, 2023, 9:58am

That’s why…

Minh_Nguyen · May 18, 2023, 5:52pm

This guidance is about blanketing the database with “sound-it-out” transcriptions that can be generated by a lookup table, not for real transliterations that require context about the source language and even some etymology. If it’s signposted, that’s a good sign that it’s worth tagging somehow (no pun intended).

The anti-transliteration sentiment was bolstered by an early proof of concept by the German Mapnik developers demonstrating that Unicode’s ICU library (which powers text processing and localization on most platforms) could be used to automatically transliterate any name tag into Latin, no sweat. This approach later worked its way into OpenMapTiles. You can play with the online demo; set to “Any-Latn”.

ICU’s Any-Latn transliteration is good enough for demoing to an executive excited about not having to hire linguists or translators, but it won’t stand up to scrutiny by anyone who actually speaks the affected languages. It’s intended for collation (for ensuring that entries in various languages sort in a stable, predictable manner), not for display to end users as human-readable text. This explains why it uses a single transliteration system for all text in a given script, regardless of language.

If OSM were to explicitly indicate the language of a name tag, then a data consumer could select a more specific ICU transliterator or even a dedicated library. Some developers are experimenting with detecting the language based on a spatial query and regular expressions, but this will only work for certain writing systems and languages and countries, and there are plenty of unhandled edge cases.
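
A toy sketch of why the language matters and not just the script. It does not use ICU; the tables cover only a handful of lowercase letters, and the names `BASE`, `OVERRIDES` and `romanize` are hypothetical. The Bulgarian values follow the Streamlined System; for Russian, щ is commonly romanized as shch, and dropping the hard sign ъ is a simplification made here.

```python
# Letters that romanize the same way in both languages (tiny subset).
BASE = {"к": "k", "а": "a", "и": "i"}

# Letters whose romanization depends on the language.
OVERRIDES = {
    "bg": {"щ": "sht", "ъ": "a"},   # ъ is a vowel in Bulgarian
    "ru": {"щ": "shch", "ъ": ""},   # hard sign, dropped in this sketch
}


def romanize(text, lang):
    """Romanize lowercase text using the table for the given language."""
    table = {**BASE, **OVERRIDES.get(lang, {})}
    return "".join(table.get(ch, ch) for ch in text)


print(romanize("къща", "bg"))  # kashta
print(romanize("щи", "ru"))    # shchi
```

A single per-script table has to pick one of these readings for every name, which is the limitation described above.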

Wulfmorn · May 19, 2023, 9:02am

Good points. If the age of AI delivers, things may change rapidly, but it does indeed sound like proper transliteration tags would be a good way to go. I think int_name looks good initially, but it’s a pipe bound to clog.

I probably don’t get a vote on Bulgaria, but I’d vote for setting up Latin transliteration tags for Bulgaria and dropping all int_name tags. name:en is still OK, but only for places that have English names.

SomeoneElse · May 19, 2023, 10:30am

Er, I thought that the “German Mapnik” approach was absolutely developed for display to end users. What else does Mapnik do? It’s good enough so that I regularly use it myself.

rhhs · May 19, 2023, 11:24am

I’ll vote in favour too, but if you ask me to spend time on it… Would it be safe to automatically replace all int_name tags that are in Bulgaria by name:bg-Latn?
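
Whether a blanket replacement is safe is exactly the open question, so here is only a cautious sketch of one possible shape for it: copy rather than replace, and set aside values that are probably original foreign spellings instead of Bulgarian in Latin script. `migrate_int_name` is a hypothetical helper, and the non-ASCII check is a crude heuristic that would still let a value like Praha through.

```python
def migrate_int_name(tags):
    """Return (new_tags, needs_review) for one element's tags.

    Copies int_name to name:bg-Latn and keeps int_name in place.
    Assumption: a non-ASCII int_name (e.g. "bul. København") is likely an
    original foreign spelling and is flagged for manual review instead.
    """
    new_tags = dict(tags)
    int_name = tags.get("int_name")
    if int_name is None or "name:bg-Latn" in tags:
        return new_tags, False          # nothing to do
    if not int_name.isascii():
        return new_tags, True           # leave untouched, ask a human
    new_tags["name:bg-Latn"] = int_name
    return new_tags, False


print(migrate_int_name({"name": "улица Шипка", "int_name": "ulitsa Shipka"}))
print(migrate_int_name({"name": "булевард Копенхаген",
                        "int_name": "bul. København"}))
```

The first call gains a name:bg-Latn tag; the second is returned unchanged with the review flag set.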