Multilingual names in Bulgaria

The Multilingual names page on the OpenStreetMap Wiki says that names in Bulgaria should be given in Bulgarian in name=, transliterated using the official transliteration system, and entered in int_name=. Optionally, names in other languages can be given, with English as an example.
I would like to propose updating this so it conforms to the general guidance on the same page (Multilingual names - OpenStreetMap Wiki), i.e. that the Bulgarian name should be given not only in the name= field but also in an additional name:bg= field. This makes sure that map users who have Bulgarian as their top preferred language always see the name in Bulgarian and not the transliterated name.


First, I’m a little surprised that this section exists at all, since Bulgaria has only one official language. Almost all signposts are indeed transliterated, but that is true globally.

name = Атанасовско езеро
int_name = Atanasovsko ezero
name:xx = whatever_it_takes
is enough.
name:bg = Атанасовско езеро is not necessary; it is just a redundant tag, since the lake is inside Bulgaria’s borders. I have tagged many objects with name:bg, but that is another story :slight_smile:
For me, name:bg makes sense only for objects on the border, like the Danube, or outside Bulgaria.

The logic here is not clear to me.
What does this mean:

… In such case the solution would be to

  • display name:pl=*
  • if it is unavailable display name:en=*
  • if both are not present display name=*

IMHO a better solution would be:

  • display name:pl=*
  • if it is unavailable display name=*
  • if both are not present display name:en=* (or whatever language is selected as the fallback)
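The difference between the two orders can be shown with a minimal Python sketch. `pick_name` and the key lists are hypothetical illustrations, not code from any particular app; the tags are the Golden Sands example discussed further down in this thread.

```python
from typing import Mapping, Optional, Sequence


def pick_name(tags: Mapping[str, str], keys: Sequence[str]) -> Optional[str]:
    """Return the value of the first key in `keys` that has a non-empty value in `tags`."""
    for key in keys:
        value = tags.get(key)
        if value:
            return value
    return None


# Hypothetical fallback chains for a user whose preferred language is Polish.
WIKI_ORDER = ["name:pl", "name:en", "name"]      # order quoted from the wiki
PROPOSED_ORDER = ["name:pl", "name", "name:en"]  # local name before the fallback

golden_sands = {"name": "Златни пясъци", "name:en": "Golden Sands"}

print(pick_name(golden_sands, WIKI_ORDER))      # Golden Sands
print(pick_name(golden_sands, PROPOSED_ORDER))  # Златни пясъци

# With the wiki order, a Bulgarian user (chain name:bg, name:en, name) also
# gets the English name unless name:bg is present, which is why the wiki
# suggests repeating the local name in name:bg.
print(pick_name(golden_sands, ["name:bg", "name:en", "name"]))  # Golden Sands
```

In short, the wiki order needs the duplicated name:bg, while the proposed order gets the local name without it.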

This is app-specific logic, and I’m not sure which apps implement what.

I don’t see the point in duplicating that data. The only thing it does is make it harder to maintain.

@plamen name:en is more desirable, since not everyone can read Cyrillic, Greek, Arabic or Chinese script, while English is readable for most people.

Since all names in Bulgaria should be tagged in Bulgarian in the name= field and transliterated to Latin script in int_name=, we effectively have two “official languages” on OSM. This means there is always a potential for the wrong “language” to be displayed by a map user’s app if that app cannot determine which language is used in the name= value. I can’t judge how likely this is to happen, but apparently the community thought it likely enough to suggest repeating the name= value in a name:xx= field, so that local users always see the name in their local language.
name:en= is mentioned as optional. I suspect it is often used in Bulgaria to record names in Latin script because iD has no easy way to add int_name, and adding name:en= is easier. It would be nice if int_name were available as one of the languages in iD.

And to make it better for all mappers, we should add a third “official language”?

It’s the other way around: the app should determine which country the object is in and infer the language from that country’s official language. That way it will look for name and name:bg in Bulgaria, name and name:el in Greece, and so on.
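A minimal Python sketch of that idea, assuming the app already knows the object's country. The `OFFICIAL_LANGUAGE` table and `name_in_language` are hypothetical illustrations, not an existing library.

```python
from typing import Mapping, Optional

# Hypothetical lookup table: ISO 3166-1 alpha-2 country code -> official language.
OFFICIAL_LANGUAGE = {"BG": "bg", "GR": "el"}


def name_in_language(tags: Mapping[str, str], country: str, wanted: str) -> Optional[str]:
    """Return the name in language `wanted`, or None if it is not known.

    An explicit name:<wanted> tag wins; otherwise plain name=* is assumed to be
    in the official language of the country the object lies in.
    """
    explicit = tags.get(f"name:{wanted}")
    if explicit:
        return explicit
    if OFFICIAL_LANGUAGE.get(country) == wanted:
        return tags.get("name")
    return None


lake = {"name": "Атанасовско езеро", "int_name": "Atanasovsko ezero"}
print(name_in_language(lake, "BG", "bg"))  # Атанасовско езеро, no name:bg needed
print(name_in_language(lake, "BG", "el"))  # None
```

As later posts in the thread point out, this assumption breaks down for names such as name=McDonald's, which are not in the country's official language.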

It’s not likely to happen here since we have only one official script and only one official language. It’s a different story in Serbia (they have two official scripts) and Macedonia (where Macedonian and Albanian are official languages).

int_name should be added as a field some day, see Add some common name fields by 1ec5 · Pull Request #215 · openstreetmap/id-tagging-schema · GitHub.

@rhhs

Since all names in Bulgaria should be tagged in Bulgarian in the name= field and transliterated to Latin script in int_name=, we effectively have 2 “official languages” on OSM.

More or less this is true. All villages, streets, bus stops, etc. have their transliterated name in Latin script on their name boards. So in OSM, we should have name and int_name tags for these objects.
Note 1: int_name is not in any known language; think of it as a description.
Note 2: name is in Bulgarian, but many amenities like McDonald’s are tagged name=McDonald's, which is not Bulgarian. Reverse transliteration, such as name:bg=Макдоналдс, is rarely used.
Which language is used in the name tag cannot be decided unless there is an additional tag clarifying it, for example lang:name=Bulgarian.

It depends on the app. We cannot guard against every scenario in badly written apps. We could make life easier for app developers with an additional tag, but it would duplicate the name tag on every object.

The app is in English (or whatever Latin-script language you like).
The user is browsing in the UK:

  1. Show name:en
  2. If not present, show name
  3. Stop here if no name tag

The user is browsing in Bulgaria (location can be determined and is within Bulgarian borders)

  1. Show name:en
  2. If not present, show int_name (as closer to English)
  3. If not present, show the name tag (before that, the app can try to transliterate it to Latin script)
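The two procedures above can be sketched in Python as follows. `english_label` and the `in_bulgaria` flag are hypothetical; how the app decides that flag (e.g. from the boundary relation) is left out, and so is the optional automatic transliteration before falling back to name.

```python
from typing import Mapping, Optional


def english_label(tags: Mapping[str, str], in_bulgaria: bool) -> Optional[str]:
    """Pick a label for an English-language app, following the steps above."""
    if in_bulgaria:
        keys = ("name:en", "int_name", "name")  # int_name is closer to English than Cyrillic
    else:
        keys = ("name:en", "name")
    for key in keys:
        value = tags.get(key)
        if value:
            return value
    return None  # no name tag at all


lake = {"name": "Атанасовско езеро", "int_name": "Atanasovsko ezero"}
print(english_label(lake, in_bulgaria=True))                       # Atanasovsko ezero
print(english_label({"name": "Pudding Lane"}, in_bulgaria=False))  # Pudding Lane
```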

The above procedure can be used even when the app cannot decide that the object is in Bulgaria, simply by checking whether the name uses Latin script or not.
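A minimal sketch of that script check, replacing the location test with a test on the name itself. It uses Python's standard `unicodedata.name` and treats a name as Latin if every letter's Unicode character name starts with "LATIN"; `is_latin` and `english_label_by_script` are hypothetical helpers. Note that this detects only the script, not the language.

```python
import unicodedata
from typing import Mapping, Optional


def is_latin(text: str) -> bool:
    """True if the text has letters and all of them are Latin-script letters."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def english_label_by_script(tags: Mapping[str, str]) -> Optional[str]:
    """Prefer name:en, then int_name when name is not in Latin script, then name."""
    if tags.get("name:en"):
        return tags["name:en"]
    name = tags.get("name")
    if name and not is_latin(name) and tags.get("int_name"):
        return tags["int_name"]
    return name


print(english_label_by_script({"name": "McDonald's"}))  # McDonald's
print(english_label_by_script({"name": "Атанасовско езеро", "int_name": "Atanasovsko ezero"}))
```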

IMHO, adding name:bg wouldn’t be of much use.

@Dimitar155 The wiki page for “int_name” says:

Consider using language specific names instead; e.g., name:en=…

That isn’t really OK for Bulgaria, and probably not for other countries that use Cyrillic either.

The sign of the Golden Sands resort complex reads:
Златни пясъци
Zlatni pyasatsi

Which is more correct:
name:en=Zlatni pyasatsi or name:en=Golden Sands

Without int_name, an average English tourist will see “Golden Sands” in their app, while the signs at the entrance and exit will say “Zlatni pyasatsi”.

name:en=Golden Sands is correct if you are a foreigner. If we follow the letter of the law (Art. 8 of the Transliteration Act), it should be name:en=Zlatni pyasaci, possibly with alt_name:en=Golden Sands.

@Dimitar155

If we follow the letter of the law, then name:fr should also be name:fr=Zlatni pyasaci.
On the other hand, name:ru can’t be anything other than:
name:ru=Золотие пески (it may be slightly wrong, but I couldn’t find it in OSM)

int_name should stay, and the wiki should be corrected slightly in that direction, i.e. the “instead” should go and say “also” or something similar.

[website] Слънчев Ден (Sunny Day) resort complex

leisure=resort
name:en=Sunny Day
name:ru=Солнечный день
name=Слънчев ден
wikidata=Q12294060

Their own website also says “Sunny Day Resort”, so how could you write “Slanchev den” for them when that name would be unfamiliar to most tourists? But admittedly, this is not a populated place, and there are no signs like those with a transliteration.

I hadn’t thought about that part… In that case we can leave name:en as the English translation and int_name as the transliteration. The other option is to put two names in name:en, separated by a slash or a semicolon, but that is quite undesirable in OSM.

I think we should not change the well-established practice of adding names in Bulgarian in name= (Cyrillic) and in int_name= (Latin script, official Bulgarian transliteration). I’ve seen many cases where automatic transliteration results are horrible (often because transliteration for Russian Cyrillic is used, turning Хасково into Khaskovo :face_vomiting:), so we should do it manually to make sure it matches what is shown on signs on the ground.
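For illustration, a minimal Python sketch of the official Bulgarian transliteration (the Streamlined System of the 2009 Transliteration Act), including its rule that word-final “ия” becomes “ia”. It is a sketch only: `transliterate_bg` is a hypothetical helper, all-caps text and other special cases are not handled, and, as the post says, the result should still be checked against the signs on the ground. Applied to Хасково it gives Haskovo, not the Russian-style Khaskovo.

```python
# Streamlined System table (Bulgarian Transliteration Act, 2009), lowercase letters.
BG_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "h", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y",
    "ю": "yu", "я": "ya",
}


def transliterate_bg(text: str) -> str:
    """Transliterate mixed-case Bulgarian Cyrillic text to Latin script."""
    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        low = ch.lower()
        nxt = text[i + 1].lower() if i + 1 < len(text) else ""
        after = text[i + 2] if i + 2 < len(text) else ""
        if low == "и" and nxt == "я" and not after.isalpha():
            latin, step = "ia", 2  # word-final "ия" -> "ia", e.g. София -> Sofia
        elif low in BG_TO_LATIN:
            latin, step = BG_TO_LATIN[low], 1
        else:
            result.append(ch)  # spaces, digits, punctuation, Latin letters unchanged
            i += 1
            continue
        if ch.isupper():
            latin = latin[0].upper() + latin[1:]  # Ж -> Zh, Щ -> Sht
        result.append(latin)
        i += step
    return "".join(result)


print(transliterate_bg("Хасково"))        # Haskovo
print(transliterate_bg("Златни пясъци"))  # Zlatni pyasatsi
```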

About repeating the name in name:bg= to help “dumb” apps figure out that they should display the native language first for native users, I think we should follow the worldwide consensus. But maybe this consensus needs updating if most apps can now be assumed to be “smart” enough to identify the language in the name= field (by location within borders and by the script used). It was @Mateusz_Konieczny who wrote that section of the wiki, so I would like to invite him to give his opinion.

Could you please keep an eye on this on behalf of the Bulgarian OSM community? I hope to soon see this list with “International name” (or even better “Transliterated name”) at the top of the list. The sorting of languages seems to be location specific, but I think the order should be “International”, then “Bulgarian” (if we decide to add name:bg= everywhere), then “Turkish” (to respect that it is the most important minority language in Bulgaria) and then the rest.

Added to the list of things to look out for.

I will let you know when it’s added to iD. About the ordering of names, there is order already. For Bulgaria it is bg, en, ru, tr, de (see iD/data/territory_languages.json at 140e56768ec4a57f371c1ba32e8ca46aca168280 · openstreetmap/iD · GitHub). I’m not sure how it was chosen but it can always be changed.


In central London there are about 1,000 objects with name:en, such as: Way: Pudding Lane (4260247) | OpenStreetMap

Such a task is possible with some decent success in some regions, but in general it is not something that can always be done.

It also requires extreme preprocessing: building something like that would take months or years of work and resource-intensive preprocessing of OSM data to work semi-reliably across the world.

Are you aware of any tools that actually support this?

To take a simple case: such tools would likely fail on Node: Billige Zigaretten (4469421454) | OpenStreetMap (a shop in Poland near the German border, with a name in German).

I’m not, that’s why I asked for your opinion :slight_smile:

What do you think about Bulgaria in particular? For geographic names, there’s one official language and one official transliteration, so would it be sufficient to add name:bg only in border areas like @plamen suggested? We have to balance the effort of adding name:bg everywhere with the need for it to ensure all Bulgarian users get to see geographic names in their own language and script.

For business and institution names, I think it’s sufficient to map what’s on the ground, i.e. what is on the shop window or what the company itself uses. Lots of shops here use a brand name in Latin script only, so that’s the name and there is no need for a second one. If they use Cyrillic without a Latin transliteration, there is no point in adding the transliteration, since it isn’t visible anyway. Only if both are shown should both be tagged.

(off topic) If the only sign on a shop is a description of what they are selling, I’d tag it with name:signed=no :wink:

I would add name:bg only to objects with names in more than one language (as I do in Poland, which has a similar situation).

Where name and int_name are only ones, I would not bother with name:bg

Note: my familiarity with Bulgaria is low, I answered because I was asked directly.

That section was added in wiki to explain why blindly removing “duplicated” name tags on items with multiple languages tagged can cause data loss, it was not intended as request to put both name and name:XY on every single object with name.


I think we can agree that adding name:bg is a good idea, but we shouldn’t spend much effort on making sure all geographic names in Bulgaria have it. I added to the wiki that it’s recommended in border areas and where there are names in other languages.
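The guideline agreed here (add name:bg in border areas and where names in other languages are tagged, otherwise don't bother) could be expressed as a QA check. A sketch under assumptions: `name_bg_recommended` is hypothetical, `near_border` stands for whatever border-proximity test a tool would use, and only plain name:xx / name:xxx language keys are recognised.

```python
import re
from typing import Mapping

# name:xx / name:xxx keys with a 2- or 3-letter language code (simplified;
# ignores script subtags such as name:sr-Latn and keys like name:signed).
LANG_NAME_KEY = re.compile(r"name:[a-z]{2,3}")


def name_bg_recommended(tags: Mapping[str, str], near_border: bool) -> bool:
    """Hypothetical QA check: should this object get a name:bg tag?"""
    if "name" not in tags or "name:bg" in tags:
        return False
    has_other_language = any(
        LANG_NAME_KEY.fullmatch(key) is not None and key != "name:bg" for key in tags
    )
    return near_border or has_other_language


lake = {"name": "Атанасовско езеро", "int_name": "Atanasovsko ezero"}
print(name_bg_recommended(lake, near_border=False))  # False: name and int_name only
print(name_bg_recommended({"name": "Златни пясъци", "name:en": "Golden Sands"}, False))  # True
```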
I also decided to be bold and add more explanations about name-tagging shops etc and not transliterating names of foreigners to the wiki. Hope you like it :grinning:

@rhhs Not bad. The picture is old and the board has already been replaced, right? In that case, do you have a recent picture of this name board?

@Mateusz_Konieczny Yes, you are right. It would be difficult to write a universal algorithm that shows geographic names according to the user’s language preferences. I’m not aware of how programmers do that. For example, OsmAnd uses customized maps for each country, so language information could be inserted during conversion from the OSM database. They also had problems with geographic names in non-Latin-script countries like Greece in the past.

We don’t have information about which language the value of the name tag is written in. For names within Bulgarian borders, it is assumed to be Bulgarian, but the app has to calculate the object’s position relative to the boundary relation first. The other option is to have a name:bg tag on every object. This comes down to a space–time tradeoff in programming.
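Calculating “the object’s position according to the boundary relation” is typically a point-in-polygon test. A minimal ray-casting sketch in Python, assuming a single outer ring given as (lon, lat) pairs; real boundary relations are multipolygons with several rings (and possibly holes), and production tools usually rely on a geometry library and a spatial index instead. The square below is a toy example, not real border data, and points exactly on an edge are not handled specially.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count how often a ray going east from `point` crosses the ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


toy_country = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_ring((5.0, 5.0), toy_country))   # True
print(point_in_ring((15.0, 5.0), toy_country))  # False
```

This per-object computation is the “time” side of the tradeoff; storing name:bg everywhere is the “space” side.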

In Bulgaria we have transliterated names on every signboard, but they are not in any known language. We can’t say that this is English. The closest approach here is to use int_name for them.
Users who understand Latin-script languages can use the transliterated names for easy navigation.
Any routing app like OsmAnd should be aware of that and should offer int_name when the user asks for English, French, Polish, or any other Latin-script language. But again, the app has to calculate the position and offer int_name instead of name when the object is in Bulgaria or Greece.

So what about the is_in tag? It was declared obsolete in JOSM. It would have saved computation, but it would have to be on every single node or way.

This can be reliably calculated from OSM data, unlike the language of a name.

(and in cases where it cannot be, then parsing is_in will not be reliable either)

We also have the destination tag and Relation:destination_sign, where the language is not specified. int_name values on destination signs are not handled either.