So, I went and checked (the benefits of living nearby!) and OSM is actually “wrong”. The street signs in Biel/Bienne are German/French, not French/German. So if the data on OSM is meant to reflect actual practice, then the entire city of Biel/Bienne is incorrectly named.
Currently, fixing this means changing all the names in the city. Not great!
With the proposal, it would mean one edit to one tag.
(Your example of the Basque capital remains, however. But this makes it even more of an edge case …)
Unfortunately, order matters for rendering. When the pixels display on our screens, or are read out loud by text-to-speech, the names end up in some order. Aside from applicable laws where they exist, it should match what is on street signage to make maps as useful as possible. And street signage also displays names in some order.
That said: Can you provide examples of location where this would be an actual issue? Or, even more interestingly: can you provide an example location where the results of the proposal would be worse than what is currently on the map?
(I prefer discussing concrete use cases, so that we don’t end up spending energy on theoretical problems that don’t actually exist in practice.)
Heh, fair point! I was just following the official instructions, but you are correct…
name would be a fallback, and provide backwards compatibility for renderers that are not updated. Due to this, the proposal does not propose changes to the name tag, and it would remain as-is.
(Whether name should contain multiple languages or not would be up to editors interested in that area, as it is now … but it’s kind of a moot point as no matter what is put there, it has unresolvable drawbacks such as poor audio rendering, or how hard it is to make it globally consistent. Due to name being, at least imho, broken for multi-lingual cases, the proposal focuses instead on new tags which may actually resolve the problems seen with using a single name.)
Not necessarily the “broadest audience”, but arguably IMHO the best documented is Tilemaker. If you could show how easy it would be to write “is in a Gaeltacht” code in there, I’d probably use it.
I’m not sure languages:preferred is a great key name for this suggestion. The term “preferred” to me implies a hierarchy: “OSM is my preferred geo database; I prefer OSM to Google Maps”.
This feels problematic particularly applied to minority languages. A minority language might be second in sign order (e.g., bilingual signs in Kashubian) but tagging that the majority language is “preferred” seems ripe for misunderstandings and conflict.
Conversely, two languages might have equal legal status, and they are ordered purely for visual and practical reasons (e.g., the New Brunswick example of space-saving signs printed “Boul Babineau Blvd”), where tagging languages:preferred=fr;en also doesn’t seem great since it’s not actually preferred.
If the purpose is to order languages purely for visual rendering and the intent is to mimic local signage, I would suggest something like languages:order_used_on_signs or languages:sign_order or something else similarly descriptive that does not imply a hierarchy. Unfortunately it’s likely to be a longer key name.
That’s my cue to come back to something I glossed over when talking about Dingle. You said that “language policies are generally set by governmental agencies, and typically map to boundaries within their jurisdiction”. But the proposal seems to rely heavily not just on boundaries but on administrative boundaries. Gaeltacht areas in (the Republic of) Ireland are not administrative entities in the usual sense - certainly not part of the usual administrative hierarchy. Some (not all) Gaeltacht boundaries have been mapped and tagged as boundary=gaeltacht - no easy task as you can see from the Galway Gaeltacht:
But that is the easy part. In Northern Ireland, there are no formal Gaeltacht areas. But the flexibility of the name= tag, for all its flaws, allows for a reasonably satisfactory result for a Gaeltacht linked to a residential landuse area.
I take the point that the proposal doesn’t necessarily make the situation worse: mappers in Ireland could simply ignore it and rely on the name tag as always. That should work so long as everyone involved knows they should ignore preferred language tagging.
Incidentally, while it wasn’t something I was looking for, the Galway Gaeltacht map also highlights the issue of objects in more than one language area. Note how Lough Corrib, which is partly in the Gaeltacht, has been given a dual name. That’s not the usual practice in Ireland, but again the name= tag is flexible enough to give a reasonable result. Perhaps the special case (tag on a feature) would also handle this situation.
As it is not only for things with signage, this is also not great.
Would languages:order:* be better to your eyes? Or perhaps languages:display_order:* might work?
Do you have alternative suggestions to “official” and “preferred”? Having two tags serves actual use cases, so …
I am pretty sure that we can find editors who will look at the word “order” and respond as you have: who defines this order, why does this language have precedence over the other? Language is highly political, so I suspect at best we can minimize such responses.
For me, this is actually a great example of why using name is poor.
A small number of editors are deciding what the best visualization of the end names are. The result is it sounds pretty silly when text-to-speech is applied, and it prevents renderers from making any other sensible choices.
With languages:preferred=fr;en, a renderer would be free to see that there is a shared part of the name, and render it as it is currently hard-coded.
But then it could do this consistently. Not by the whim of local map editors, but for what makes sense for the use case in question.
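To make that concrete, here is a rough sketch of how a renderer could build the combined label itself from separate name:fr and name:en values. The function names, the " / " separator, and passing the language order in as a plain list are all my own assumptions for illustration, not anything the proposal specifies.

```python
def merge_shared(first: str, second: str) -> str:
    """Join two names, collapsing words shared at the seam.

    If the last words of `first` equal the first words of `second`, the
    shared words are emitted once. Otherwise the names are joined with
    " / " (an arbitrary separator chosen for this sketch).
    """
    a, b = first.split(), second.split()
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return " ".join(a + b[size:])
    return f"{first} / {second}"


def render_label(tags: dict, order: list) -> str:
    """Build a label from name:<code> tags in the given language order.

    Falls back to the plain name tag when none of the ordered languages
    has a name:<code> value on the feature.
    """
    names = [tags[f"name:{code}"] for code in order if f"name:{code}" in tags]
    if not names:
        return tags.get("name", "")
    label = names[0]
    for name in names[1:]:
        label = merge_shared(label, name)
    return label


street = {"name:fr": "Avenue Nerville", "name:en": "Nerville Avenue"}
print(render_label(street, ["fr", "en"]))  # Avenue Nerville Avenue
```

A different renderer could skip the merging step and show both names in full, which is the point: the choice moves from the mapper to the data consumer.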
As for space-saving signs, that’s a matter of the form factor of physical signage which has a set of requirements and constraints that a database does not have, namely: they are immutable, they are physically constrained while requiring ease of reading at a distance, and they must also represent (in this case) both languages.
The street signs in Biel have one line for the German name and one line for the French name (and no separator character). While you can perhaps infer an ordering, you definitely don’t have to. One of the more subtle things about Bienne is that the municipality is officially named Biel/Bienne (so that is not a composite label, that is the actual name).
PS: note no spaces in Biel/Bienne
PPS: as I just happened to have pgsql open
gis=> select strsp, count(strsp) from gwr_entrances where dplzname='Biel/Bienne' group by strsp;
strsp | count
-------+-------
9901 | 10999
9903 | 10901
or maybe
gis=> with temp as (select distinct strname,strsp from gwr_entrances where dplzname='Biel/Bienne') select strsp, count(strsp) from temp group by strsp;
strsp | count
-------+-------
9903 | 343
9901 | 351
In other words, there is no preference for either language in addresses or street names in Biel/Bienne in official usage.
As I understand it with Dingle / An Daingean the situation was that there were two Irish names; one “official”, one used locally.
FWIW what I currently do for languages on maps of IE/UK is to use:
name:cy for “somewhat Welsh speaking parts of Wales” (a bit beyond the isogloss)
name:gd for a box containing most Scots Gaelic areas.
name for Ireland, since as noted above it’s used mostly sensibly there - for example “Derry/Londonderry” where multiple names are needed, but just one name (either Irish or English) where that makes sense.
name:en for the remaining bits of England, Wales and Scotland
If it is also for things without signage, what determines the order of languages:preferred in that case?
And, what determines the order of languages:official where two languages have equal status?
Yes but that is why I suggested order_used_on_signs. Order on signs is often a hierarchy [1], but the goal would be to be as descriptive as possible – to describe what the hierarchy applies to – my intent was to say that the hierarchy if any applies to visual presentation of names on signs.
I was only using New Brunswick for the example of two languages with equal status and order on signs being mostly a matter of practicality, with no hierarchy intended.
[1] In my other message, I first described the signs as “Polish above, Kashubian below”, but had to delete it because I found it distracting how wrong it sounded to me.
Does the “official” naming, including signage, on the ground follow these boundaries?
It is probably safe to assume that in some nooks and crannies in the world, the name tag may be the best bodge available.
Currently, OSM is recording (and rendering) untold numbers of entire cities poorly (or even incorrectly), and that is worth improving even if we can “only” identify a 99% solution.
That said … looking at street-level imagery for the area in Belfast you link to, the street signage there is in both languages.
Do you know roughly how common that is in Belfast? The reason I ask is I wonder if this would be another candidate for “Special case: Defined on a Feature”, as that would allow tagging e.g. these specific streets in Belfast with the languages:preferred (or whatever it ends up being) field. But if it is thousands of streets in a single city, that may also be unworkable and looking for other options may be called for.
… or in lieu of that simply recognizing that sometimes name will still work better in a (relatively) small number of specific cases.
Not all things that have names have signs, but if we want to write them down in an official manner, even those tend to follow the policies and traditions for language in that area. Signage is generally an artifact of local requirements and customs, not the other way around.
Yes, I got that.
My question in return was if just “order” or “visual_order” would work for you?
If the word “sign” appears in the tag, we should expect questions about “what about things that don’t have signs” as well as “but the signs don’t display full names”.
(The latter is seen around various Canadian cities, as already noted by people in this thread. They are optimized for physical street signs with specific physical constraints and usage requirements, but there is a clear order on those signs despite abbreviations, e.g. French then English. The street signs are not trying to get people to vocalize “Avenue Nerville Avenue”, but are trying to display the French and the English name (in that order) in a space-conserving manner.)
I’m hoping we can find names for the tags that are acceptable to OSM editors and that do not invite additional confusion.
It isn’t about hierarchy, of course, but the physical constraints of displaying or vocalizing two things in the same space.
Those street signs you refer to do have an order to them, and it is consistent. Why? Because consistency helps, and in laying out multiple names some order results. OSM can (should?) reflect those implementations so that a person looking at a rendered map or listening to a vocalization of the name(s) can relate them to what they see on the street, without being encumbered by the literal layout choices made (for reasons the proposal covers in detail).
Of course, if someone can point us to city street signage that does not display names in some order, or even one where that order rotates from street to street in deference to fairness, that would be very interesting and represent a difficult use case to model.
Does “order” or “visual_order” work for you? Why or why not?
Yes, an Daingean was at one point the only official name, and the only name that was supposed to appear on local road signs. The preferred local names were Dingle in English and Daingean Uí Chúis in Irish. A plebiscite of local voters supported the compound name Dingle Daingean Uí Chúis, and that was recognised in legislation (after quite a long time).
Having these in various name:* tags as you detailed makes complete sense to me.
If multiple of these appear on e.g. signage or official maps, it would also make sense to have these displayed by OSM renderers as well. And as you detail them, it sounds like that may be definable by regions: show Welsh names alongside English in regions of Wales, etc, etc.
The map user should hopefully also be able to communicate their preferences to the renderer (e.g. to use the local names) to override these defaults.
There is allowance for exceptions for specific features in the proposal, so that should be good too.
… with the caveat of when that exception appears on the boundary=administrative feature itself, something that is noted in the Open Questions section of the proposal. That will need a solution at some point.
Broadly yes, e.g. within these boundaries road signs usually show only the Irish name of local towns, in contrast to dual names in the rest of the country. As noted, Dingle is very much an exception.
Street signs and road direction signs in Ireland (outside the Gaeltacht) usually display the Irish name above the English name. A bit like Biel/Bienne as mentioned, but with an extra touch of creative ambiguity to discourage anyone tempted to treat that as “Irish first, English second”: the Irish version is usually displayed in smaller type and often in italics (*). However mapping practice for name= in Ireland is closer to “what most people locally call it” than “a literal transcription of what is displayed on signs” - so in those areas, name= normally holds the English name only.
I think very uncommon for street signs, but it’s a long time since I’ve been there. There are quite a few schools and sports grounds with Irish or dual names, but from a quick look they are mostly tagged with only the name= tag. In any case, I mentioned this less for its own sake but because I wondered if there are larger-scale examples of cities with non-administrative areas that mainly use a minority language.
(*) With weird results in the case of Dún Laoghaire, which is in an English-speaking area but only has an Irish name, so despite being a substantial population centre, often appears on road signs in a small font surrounded by a lot of empty space.
If you’re referring to the languages:official=* part of your proposal, this is a simplistic concept that may work in some regions but not others. Many jurisdictions confer special status upon certain languages while avoiding the term official language. A language may be official in practice but not in theory, or in theory but not in practice. Some jurisdictions declare official languages for certain people or purposes but not for others.
For example, in my home state of California, the constitution declares English to be the official language and therefore prohibits people from spelling their names with diacritical marks, yet government agencies sometimes try too hard to communicate with me in Vietnamese, one of the many languages they’re federally mandated to provide for certain services. There are whole neighborhoods where you will find hardly any signs in the Latin writing system. That said, road signs are almost universally in “English” – which includes a plethora of Spanish-derived names, replete with diacritics.
Massachusetts declares English to be the “common public language”, but government materials are routinely provided in a variety of commonly spoken languages. Louisiana recognizes a constitutional right for “the people” to promote heritage languages, originally an implicit way to protect the French-speaking majority, but over time, this provision has been interpreted at times to protect a wide variety of minority languages too.
South Dakota declares English and Sioux to be co-official statewide, but good luck finding any wayfinding signs in Sioux outside Sioux reservations. Conversely, the tiny town of Oldenburg, Indiana, proudly signposts its streets in German, even though the town council lacks the authority to declare an official language. The state’s official language is English, but they’ve even managed to apply a German name to the state road running through town.
Your proposal isn’t just about language policy; it also calls for a languages:preferred=* key and seems to place more importance on it from a software perspective. Many places don’t regulate the languages that residents are allowed to use, let alone which languages software systems should associate with the place.
The proposal downplays the possibility that language usage can be fluid and nuanced:
This sometimes happens in the case of monuments, streets with historical names still in modern use despite language changes, or specific geographic features. This is uncommon, however, and nearly all features which are not administrative boundaries should not have languages:* tags.
I just wonder what’s your basis for making such sweeping claims about the world? Have you ever been to a Chinatown, or any ethnic enclave for that matter? At least in North America, even when an ethnic enclave has a well-defined boundary, language usage inside and outside is far from uniform. Going back to Oldenburg, the roads and shops may be named in German, but the buildings, the churches, and the maypole are named in English.
But really, what does this have to do with “preference” anyways? If the name of a thing is in a certain language, then say so. If the user prefers a particular language and there is a name in that language and the software is capable of presenting it, then the user’s preference is none of the mapper’s business.
Please name the “audio renderer” that you’ve been referring to in multiple threads, or at least describe it in more detail, so we can understand the limitations and constraints that have led you to your proposed solution.
A few weeks ago, I was wandering around Seattle’s International District looking for some Seattle-style teriyaki and bánh mì. At each intersection, the street signs at the northwest corner are in English followed by Japanese, while on the southeast corner, the signs for the very same streets are in English followed by Chinese. Further down Jackson, the signs are in Vietnamese instead. Meanwhile, the shop signs are a rowdy mix of all four languages.
Of course, there are other problems with assuming that signs are laid out linearly in plain text. This sign is laid out in French, English, and Spanish, but English clearly has priority, because it’s bigger:
But just because a name is bigger doesn’t mean it’s the main name. One time I noticed that a Vietnamese restaurant in Brunei posts its Malay name in Jawi first and much larger than the Malay name written in Latin or the English name, but it turns out the Jawi name is decorative or pro forma; no one uses it.
I would be remiss if I didn’t close this comment on a more positive note. There is certainly room for improvement in how OSM encodes feature names. But if the megathreads around name=* are any guide, consistency is almost a nongoal because it hardly exists in the real world. What we should be aiming for instead is to accurately describe each feature as people experience it, minimizing our own personal preferences as mappers to a reasonable degree, and structuring it just well enough for data consumers to adapt to user preferences. If data consumers sometimes have to rely on heuristics and generalizations, at least that’s better than mappers having to make those generalizations on their behalf. After all, there’s only one database but many data consumers.
This is provided for in the proposal, as is almost every single one of your other examples, except for ones like these two:
How does this affect the default rendering of names of features on maps? The answer is “It does not”, and therefore it is not relevant to the proposal.
This proposal is about rendering of names in multiple languages for locales where multi-lingual signage and designation is the mandated norm. It does not intend to cover recording multiple names in the same language for one feature.
The alt_name tag is already in use in the OSM dataset, though it could probably also be improved. Perhaps someone will propose improvements there at some point as well. But it is not in scope for this particular proposal.
The other examples in your post are covered by the proposal.
This is exactly why a preformatted text in a name tag is undesirable in nearly all cases, and is the motivation for the proposal in the first place.
It seems you have some key misunderstandings with regards to the proposal:
It is not about the politics of language, but the recommended default rendering of names retrieved from the OSM database. Currently this is done by OSM contributors by hard-coding it one name at a time into the name field, and it produces poor results all over the world.
User preference (“show it in my preferred languages”) is not at all impacted by the proposal. Software would still allow users to set their language preferences, and you and I could query the database for the languages we wish to.
The names of the tags are the least interesting and most easily changed part of the proposal. If other names such as “languages:visual_order” are less distressing than “languages:official”, great, let’s agree on a set of names that as many of the OSM contributors are comfortable with as possible.
There is no suggestion in the proposal that 100% of all cases will be perfectly covered. That is a non-goal. The goal is to significantly improve default rendering of maps, visually and otherwise (e.g. audio). If we can improve 99% of cases and not harm the remaining 1%, then we would have a roaring success. Perfect must not be allowed to become the enemy of good.
As for point #3 above, the naming of the proposed languages:* tags, it is already being discussed in this thread. I would love to read your feedback on the alternatives already put forward, and/or for you to suggest alternatives that would make sense to you.
edit: I forgot to respond to this:
Sat navs / navis / direction routers … many of these have the ability to voice the directions, and that is used all the time in cars, but also in other cases where visually reading them is either not possible or difficult.
If you have ever used such a device or service in one language while navigating in a city where names are in a different language and the device/service does not know what language(s) the names in the directions are in, you’ll know how bad (and funny) it can be.
To correctly voice directions, they need to know the language of the name, which is addressed in the proposal and which OSM currently fails at by shoving all languages into name with no metadata as to what language(s) are actually represented in the value.
I’m not sure I fully understand the references to audio. Is that a particular problem for multilingual regions, or does it apply everywhere? In monolingual regions where typically only name= may be populated, I don’t think OSM data directly says what language is used in name=? So does this proposal go beyond regions where multiple languages are typically rendered (which you have suggested elsewhere it is confined to)?
It’s an issue in two primary cases that I am aware of:
a) multi-lingual areas, as you noted! …
when one name is in e.g. French and the other in e.g. German, but the software does not have metadata marking which is which, the resulting audio can be fairly mangled for at least one of them. It’s not great, though locals can get used to it. Some software does a better job at auto-detecting languages than others, as well.
But it does not help when we have, for example, streets in Canada with the text “Avenue Nerville Avenue” in the name tag … how can the text-to-speech software even detect which language part is which? Is the first “Avenue” English or French? What about the second one?
b) when the language of the navigation aid does not match the language of the locality.
A few years back I had the experience of driving about with a friend who had a satnav in their car set to English (which they spoke) as we drove through French and German language towns. The names were absolutely butchered, to the point I could barely understand them even though I knew some of the streets by name, as the nav tried vocalizing the place names as if they were English.
As the name tag does not include language information in OSM, text-to-speech vocalizations of navigation directions using those name values rely on a mix of heuristics (“people speak XXX in this country…”) and language auto-detection. Neither is exactly perfect, nor is either available in every nav device/service.
Having language metadata for the name(s) gives a more reliable and simpler path to “vocalize this name with the correct pronunciation”, particularly when multiple languages are involved.
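As a rough illustration of what that metadata buys you: once each name comes with a language code, a navigation app can wrap it in an SSML `<lang>` element before handing the instruction to a speech engine. The sketch below only builds the markup. The function name and the "{name}" placeholder convention are made up for this example, and whether a given TTS engine honours `<lang>` is something I am assuming, not something I have verified for any particular product.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ssml_instruction(template: str, names: list, voice_lang: str = "en") -> str:
    """Build an SSML string for a spoken instruction.

    `template` contains the literal placeholder "{name}" (a convention
    invented for this sketch). `names` is a list of (language code, text)
    pairs, in the order they should be spoken.
    """
    spoken = " ".join(
        f"<lang xml:lang={quoteattr(code)}>{escape(text)}</lang>"
        for code, text in names
    )
    body = escape(template).replace("{name}", spoken)
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" '
        f"xml:lang={quoteattr(voice_lang)}>{body}</speak>"
    )


print(ssml_instruction(
    "Turn left onto {name}",
    [("fr", "Avenue Nerville"), ("en", "Nerville Avenue")],
))
```

With only a combined name value such as “Avenue Nerville Avenue” there is nothing to put in the xml:lang attributes, which is the gap described above.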
I find the audio use case helpful in reminding me that OSM isn’t a map but a database and that there are a lot of different ways OSM data gets used.
That’s fair, and I’m glad to hear it, but I think what I’m struggling to understand is the difference between languages:official=* and languages:preferred=* and why both are needed. The proposal says languages:official=* “distinguishes endonyms from exonyms”. To me, this is unfortunate because, although Hoa Thịnh Đốn is technically an exonym for Washington, D.C., this is because of the language’s geographic origin, not due to official government policy. Normally one speaks of exonyms as being used by foreigners, but Hoa Thịnh Đốn is only used by Americans, and particularly by the local residents, never by people abroad.
I guess the two keys are intended to triangulate around a list of names in the “native” language. Is the only difference between the two keys that a language can be “preferred” by common usage while “official” must be codified in law?
There’s probably a limit to how well we can sanitize names to be monolingual. Your example is tricky because French and English have influenced each other so much over the centuries. Not far from me, the street names are all of the form “Rue Bordeaux” and “Rue Le Mans”. These would be valid French names, except they’re actually English names that a real estate developer chose to give a more sophisticated air to the neighborhood.
This happens so frequently in English that most text-to-speech engines would have no problem faking French pronunciations that sound passable to an English speaker. In fact, if my phone tells me to turn onto Rue Paris, I’ll have an easier time understanding roo parriss than roo parree or /ʁy pa.ʁi/. Roo parriss would cause a Parisian to faint, but the more etymologically correct pronunciations would cause me to miss my turn. As you note, perfect isn’t the enemy of the good.
On the other hand, in some parts of the world, there’s a trend to prepend the indigenous name onto the English name. Up in the hills above me, there’s a nature preserve that goes by “Máyyan 'Ooyákma – Coyote Ridge Open Space Preserve”. That’s the full English name, out of respect for the Muwekma Ohlone Tribe. I have very little confidence that a TTS engine would faithfully pronounce this name in an arrival instruction. Once I find out how it’s supposed to be pronounced, I’ll add a name:pronunciation=* tag to hopefully steer these English voices in the right direction.
Language identification would be more important in a place like Hong Kong. There, the streets are all tagged bilingually. If you feed “夏慤道 Harcourt Road” into a TTS engine, it might insert an awkward pause or even produce an error because it isn’t prepared to vocalize text that’s so different from English. It’s also annoying to isolate the problematic text, because the delimiter is a mere space.
The status quo depends on the application to present name:en=* to an English speaker, but in another part of the world, the same behavior would force the user to look for a name that’s only in fine print or not even visible at all. One approach would be to only present name:en=* if the same value appears somewhere in name=*. (“Nerville Avenue” is a substring of “Avenue Nerville Avenue”.) Otherwise, try to present name=*. If name=* doesn’t contain anything that the TTS engine can vocalize, then fall back to name:en=* as a last resort.
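Spelled out as code, that fallback chain might look something like the sketch below. The `can_vocalize` check is a crude stand-in I made up (it just tests whether every letter is in the Latin script); a real application would ask its TTS engine instead.

```python
import unicodedata


def can_vocalize(text: str) -> bool:
    """Crude stand-in for "the TTS voice can read this" (an assumption).

    Returns True when the text has at least one letter and every letter
    is a Latin-script character according to its Unicode name.
    """
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        "LATIN" in unicodedata.name(ch, "") for ch in letters
    )


def spoken_name(tags: dict, user_lang: str = "en") -> str:
    """Pick the name to read aloud, following the three steps above."""
    local = tags.get("name", "")
    translated = tags.get(f"name:{user_lang}", "")
    # 1. Use name:<lang> only if it is visibly part of the local name.
    if translated and translated in local:
        return translated
    # 2. Otherwise try the local name as tagged.
    if can_vocalize(local):
        return local
    # 3. Last resort: the translated name, if there is one.
    return translated or local


print(spoken_name({"name": "Avenue Nerville Avenue", "name:en": "Nerville Avenue"}))
print(spoken_name({"name": "夏慤道 Harcourt Road", "name:en": "Harcourt Road"}))
```

Both calls print the English part, because in each case name:en is a substring of name.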
As you note, this isn’t perfect. For one thing, what if the user’s language is Japanese, so the default TTS voice has less of a problem parsing CJK text like 夏慤道? The phone could end up saying something like natsu-michi, but I don’t know if the user would really prefer that over a Japanese-accented rendition of hākoto-rodo.
Yes, I think languages:visual_order could be fine. languages:order would probably cause more questions than answers.
Unless you were hoping to have it used by a speech application, in which case maybe something like languages:presentation_order?
I would also encourage you to think about how the proposal to have languages:official being potentially used in rendering order [1] can be reconciled with regions that have two or more languages of equal official status.
The more I think about it, the more I’m starting to think that languages:official (or whatever name ends up being chosen) should be ordered only alphabetically (as the least controversial of the orders?), and the order should have no significance in any rendering or processing. Or perhaps it should not be tagged at all.
[1] From the proposal: “If no such tags are found, a renderer would instead display all name:<language code> tags which match those in the languages:official tag, respecting the order they appear in that tag” (emphasis mine).
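For anyone who wants to see where the order ends up mattering, here is a minimal sketch of the lookup as I read it from the thread. I am assuming that “such tags” in the quote means languages:preferred, that a tag on the feature itself overrides the enclosing boundary, and that name is the final fallback; the function names and the example street are hypothetical, and the proposal text should be checked for the real rules.

```python
LANGUAGE_KEYS = ("languages:preferred", "languages:official")


def language_order(feature_tags: dict, boundary_tags: dict) -> list:
    """Return language codes from the first languages:* tag found.

    The feature's own tags are checked before the enclosing boundary's
    (assumed precedence), and "preferred" before "official".
    """
    for tags in (feature_tags, boundary_tags):
        for key in LANGUAGE_KEYS:
            codes = [c.strip() for c in tags.get(key, "").split(";") if c.strip()]
            if codes:
                return codes
    return []


def names_to_display(feature_tags: dict, boundary_tags: dict) -> list:
    """List the name:<code> values to show, in tag order, else [name]."""
    names = [
        feature_tags[f"name:{code}"]
        for code in language_order(feature_tags, boundary_tags)
        if f"name:{code}" in feature_tags
    ]
    if names:
        return names
    return [feature_tags["name"]] if "name" in feature_tags else []


# Illustrative values only, not copied from the OSM database.
city = {"languages:official": "de;fr"}
street = {
    "name": "Rue de la Gare / Bahnhofstrasse",
    "name:de": "Bahnhofstrasse",
    "name:fr": "Rue de la Gare",
}
print(names_to_display(street, city))  # ['Bahnhofstrasse', 'Rue de la Gare']
```

In this reading the output order comes straight from the order inside languages:official, so sorting that tag alphabetically would change what gets rendered first unless the renderer ignores it.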