Multilingual names in Bulgaria

That was the goal, but the implementation is based on a “transliterate anything to Latin” ICU transform that is not designed for this purpose. It’s only good enough for display to users who are fine with equating Japanese with Chinese, adopting the PRC’s language policy in Taiwan, conflating Bulgarian and Ukrainian with Russian, and various other faux pas.

Even then, the output is closer to what you’d see in a Chinese or Russian dictionary’s pronunciation guide than anything that would ever be posted on a sign or labeled in an atlas. It doesn’t distinguish between common and proper nouns. Words are rife with diacritics that would be unnecessary in normal usage but that ICU requires for sorting stability. For some scripts like Thai and Hanzi, it doesn’t even know where words begin and end.

ICU has other transforms that achieve higher accuracy (while remaining very pedantic), but they require advance knowledge of the input language. That’s the real limitation, which is addressed by keys such as name:bg but not by name alone.

Would it be safe to automatically replace all int_name
I think you could assess that by running an Overpass query on them. If you can’t immediately find problematic cases, chances are that converting them all with a few manual fixes is a viable workload. But if you immediately see that there are going to be a lot of issues, well, then you know.

Theoretically yes. In practice it would be very easy if we had a script with two lists. One list would hold Bulgarian names that can be transliterated automatically (Ivan Vazov, Vasil Levsky…). The other would hold hard-coded transliterations (like Прага → Praha, Лайош Кошут → Lajos Kossuth, Гладстон → Gladstone). I will try to create such a script when I have more free time, but it won’t be soon.
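
For what it’s worth, here is a rough sketch of how the two-list idea might look in Python. The letter table is my assumption (it follows the Bulgarian Streamlined System as I understand it, and it leaves out the special word-final -ия → -ia rule); the post above doesn’t say which scheme the script would use. The names `BG_TO_LATIN`, `OVERRIDES` and `transliterate` are made up for illustration, and the overrides are just the examples from the post.

```python
# Assumed letter table (Streamlined System); the thread does not name a scheme.
BG_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y", "ю": "yu", "я": "ya",
}

# Hard-coded exceptions: names whose Latin form is not a transliteration.
OVERRIDES = {
    "Прага": "Praha",
    "Гладстон": "Gladstone",
}


def transliterate(name: str) -> str:
    """Return the override if one exists, else transliterate letter by letter."""
    if name in OVERRIDES:
        return OVERRIDES[name]
    out = []
    for ch in name:
        latin = BG_TO_LATIN.get(ch.lower())
        if latin is None:
            # Spaces, digits, punctuation and non-Bulgarian letters pass through.
            out.append(ch)
        elif ch.isupper():
            # Capitalize only the first Latin letter, e.g. "Ш" becomes "Sh".
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)
```

With this table, `transliterate("Иван Вазов")` gives `Ivan Vazov`, while `transliterate("Прага")` returns the hard-coded `Praha` instead of a letter-by-letter result. Names written in all caps would need extra handling that this sketch doesn’t attempt.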

I mean to automatically replace int_name=Vasil Levsky with name:bg-Latn=Vasil Levsky while making sure int_name=Deve Bair is not replaced by name:bg-Latn=Deve Bair (that could start a war :slight_smile: ).

That should be “Kossuth Lajos” :slight_smile:

That’s what I was thinking. It can do some extra validation to make sure that everything is correct.

Deve Bair should be in Macedonia so the script won’t be able to catch it.

Other countries, including Macedonia and Greece, also use int_name, so the script should check and only change those that are in Bulgaria.

I didn’t mention it, but the data will be sourced from Overpass turbo. It already has a filtering function, so it won’t return anything from nearby countries.
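
To make the filtering concrete, here is a minimal sketch of how the script could build such a query, kept in Python so that it can live next to the rest of the script. The helper name `build_int_name_query` and its defaults are hypothetical, and selecting the country by its `ISO3166-1` tag is only one way to do it; the actual query may well look different.

```python
def build_int_name_query(iso_code: str = "BG", timeout: int = 180) -> str:
    """Build an Overpass QL query for every element with int_name in one country.

    Assumption: the country is selected through the ISO3166-1 tag on its
    admin_level=2 boundary; other area filters would work as well.
    """
    return (
        f"[out:json][timeout:{timeout}];\n"
        # Store the country's area in a named set called .country
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        # Nodes, ways and relations that carry int_name inside that area
        'nwr["int_name"](area.country);\n'
        # Tags plus a centre point are enough for reviewing the names
        "out tags center;\n"
    )


if __name__ == "__main__":
    print(build_int_name_query())
```

The printed text can be pasted into Overpass turbo as-is. Restricting the search to the area means objects across the border, such as int_name values in neighbouring countries, never reach the script in the first place.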