OSM Americana, your local language companion

OpenStreetMap Americana’s recent addition of a simulated globe signals a commitment to a global perspective, despite the blatantly American aesthetic. For one thing, it’s one of the most multilingual OSM-based maps, with the option to label the world in any of hundreds of languages. Multilingualism and local knowledge are equally important to those of us who map in a country as linguistically diverse as the United States. In support of these efforts, the Americana project is excited to share several recent improvements to map internationalization, along with a new option for Web developers.

More thorough language detection

Both OSM’s convention for name:*=* subkeys and OSM Americana’s localization support conform to BCP 47, the IETF standard for locale identifiers. Usually, the IETF language tags in OSM take the form of a simple ISO 639 language code, but most software uses more detailed language tags. This includes modern Web browsers.

If you use your browser in American English (en-US), OSM Americana will first consult the name:en-US=* tag. But differences between American and British English rarely affect place names, so it falls back to the much more common name:en=*, corresponding to English in general. Similarly, Brazilian Portuguese (pt-BR) falls back to Portuguese in general (name:pt=*). This is fine for most languages, because your browser indicates your region whether the application needs it or not.

By contrast, some languages like Chinese vary by multiple aspects. Your browser would indicate not only the language but also whether it’s Simplified or Traditional Chinese and whether that Traditional Chinese follows the Hong Kong or Taiwan standard. name:zh-Hant-TW=* is much less common than name:zh-Hant=*, and name:zh-TW=* also occurs. Based on your browser’s built-in functionality, OSM Americana can now check each of these keys before falling back to the compromise name:zh=* value as a last resort:

| zh-Hant-TW | zh-Hant | zh-TW | zh-Hani | zh-US | zh |
|---|---|---|---|---|---|
| name:zh-Hant-TW | name:zh-Hant-TW | name:zh-Hant-TW | name:zh-Hani-CN | name:zh-Hant-US | name:zh-Hans-CN |
| name:zh-Hant | name:zh-Hant | name:zh-Hant | name:zh-Hani | name:zh-Hant | name:zh-Hans |
| name:zh-TW | name:zh-TW | name:zh-TW | name:zh-CN | name:zh-US | name:zh-CN |
| name:zh | name:zh | name:zh | name:zh | name:zh | name:zh |
| name | name | name | name | name | name |
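The fallback order in each column can be sketched in a few lines of TypeScript. This is only an illustration of the pattern, not OSM Americana's actual code: the function names `parseSimpleTag` and `nameKeyFallbacks` are made up for this example, and it assumes the locale has already been expanded to its most specific form (for instance, by something like the browser's `Intl.Locale.prototype.maximize()`, which turns `zh-TW` into `zh-Hant-TW`).

```typescript
// Parsed pieces of a simple language tag such as "zh-Hant-TW".
interface SimpleTag {
  language: string;
  script?: string;
  region?: string;
}

// Split a tag into language, script, and region. This sketch stops at the
// first subtag it does not recognize, so variants and extensions are ignored;
// a full BCP 47 parser would handle them.
function parseSimpleTag(tag: string): SimpleTag {
  const parts = tag.split("-");
  const parsed: SimpleTag = { language: parts[0].toLowerCase() };
  for (let i = 1; i < parts.length; i++) {
    const part = parts[i];
    if (/^[A-Za-z]{4}$/.test(part) && !parsed.script && !parsed.region) {
      // Scripts are four letters, conventionally title-cased ("Hant").
      parsed.script = part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
    } else if (/^([A-Za-z]{2}|[0-9]{3})$/.test(part) && !parsed.region) {
      // Regions are two letters or three digits, conventionally upper-cased.
      parsed.region = part.toUpperCase();
    } else {
      break;
    }
  }
  return parsed;
}

// List the OSM keys to try, from most to least specific, for a locale that
// has ALREADY been maximized (hypothetical helper, not a library function).
function nameKeyFallbacks(maximizedTag: string): string[] {
  const { language, script, region } = parseSimpleTag(maximizedTag);
  const keys: string[] = [];
  if (script && region) keys.push(`name:${language}-${script}-${region}`);
  if (script) keys.push(`name:${language}-${script}`);
  if (region) keys.push(`name:${language}-${region}`);
  keys.push(`name:${language}`, "name");
  return keys;
}

console.log(nameKeyFallbacks("zh-Hant-TW"));
```

Calling `nameKeyFallbacks("zh-Hans-CN")` yields the keys in the last column of the table, ending with the plain `name` key.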

The result is less clutter whether you specify a more generic or more specific Chinese locale:

Although the map is more readable, it’s also potentially politically fraught in the case of Chinese. The choice to resolve zh to PRC Simplified Chinese comes from your browser, not OSM Americana. Fortunately, if you set your computer to any Chinese variant, a modern browser will always indicate the specific variant you prefer, so the map will avoid offending you in normal usage.

In case you do want a compromise listing of different names for the same place, you can specify an atypical combination like zh-US to fall back to name:zh-Hant=*, which includes Traditional Chinese from every region, or zh-Hani to fall back to name:zh=*, which includes all the Chinese variants. (The language picker only lists some common languages, so you need to manipulate the URL to see them.)

A tip of the hat to @user10 for the insight that made this improvement possible.

More places with local names

One of OSM Americana’s marquee features is its labels in both the user’s preferred language and the local language at the same time. At first, we only had dual language labels for cities, based on the consensus among American publishers of atlases and globes. Unfortunately, when you see a single language label, you can’t easily tell if it’s because your language just uses the name the locals call it or if it’s because the place is too minor to get special treatment. This is because OSM’s classification of places as cities, towns, and villages can be chaotic and unintuitive, especially in the U.S. The threshold between a town and a city is arbitrary and seldom followed, so we gave towns the dual language labels too.

But why stop there? We also enabled dual language labels for tribal reservations because language preservation is so important to many indigenous groups. Ironically, many populated places within these reservations got no such treatment because they were too small. Now we also dual-label villages for consistency, like Red Cliff on the Red Cliff Reservation in Wisconsin:

Our hyperlocal focus on language doesn’t stop at indigenous languages. Thanks to a thorough implementation of BCP 47, OSM Americana also supports all the language varieties listed in the IANA Language Subtag Registry, which includes seemingly every English from Scouse to Oxford.

For example, Boontling is an English-based jargon spoken only in the remote villages of Anderson Valley in far northern California. Now you can learn some Boontling by setting the language parameter to en-boont:

Why go through all this trouble to expose rare language varieties and child’s games like Boontling? One of OSM Americana’s goals is to serve as a quality assurance tool. By playing the role of a realistic, consumer-grade renderer, we hope to shine a light on unfortunate tagging that has gone unnoticed because other popular renderers play it safe. Boonville’s Boontling name was mistagged for nine years before OSM Americana exposed it. Many other names may be syntactically correct but factually incorrect.

More local dialects

The language URL parameter can include an ISO 3166 alpha-2 country code, such as pt-BR for Brazilian Portuguese versus pt-PT for European Portuguese. Besides these national dialects, the country codes are often useful for dealing with geopolitical disputes between countries that speak the same language. For example, Vietnam takes a stance on the English name of the sea to their east that differs from the English spoken elsewhere:

In some countries, people looked at these international disputes and figured they could have more parochial disputes with their neighbors within the same country and language. The reservoir that straddles the North Carolina–Virginia state line has two names in its name=* tag because of a geopolitical dispute. Virginians have always called it Buggs Island Lake, but the federal government renamed it to Kerr Reservoir and North Carolinians prefer that name. No one is willing to split the difference and accept a name change at the state line. The only road sign calls it Buggs Island Lake, but that’s because the only bridge is on the Virginia side. It got so bitter that, until a decade ago, officials on that side were even prohibited from uttering the name “Kerr”.

If your computer speaks American English, OSM Americana acknowledges the situation as neutrally as possible, listing both names on both sides of the border, thanks to a semicolon-delimited value list in the name:en=* tag:

This is a fair compromise but still a mouthful. Now, for more local color, you can customize the language further, specifying language=en-u-sd-usnc or language=en-u-sd-usva in the URL for North Carolina or Virginia English, respectively. The map immediately changes to prefer one name or the other:

This is possible because of corresponding name:en-u-sd-usnc=* and name:en-u-sd-usva=* tags on the same feature, taking advantage of a Unicode extension to BCP 47:

| English | Unicode | Subdivisions | United States | North Carolina |
|---|---|---|---|---|
| en- | u- | sd- | us | nc |
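To make the anatomy above concrete, here is a rough TypeScript sketch that takes such a tag apart and uses it to choose among a feature's names. The helper names and the tag values in `exampleTags` are invented for illustration (they are not copied from OSM), and the parser assumes a two-letter country code followed by a short subdivision suffix, which covers `usnc` and `usva` but not every subdivision identifier.

```typescript
interface SubdivisionLocale {
  language: string;    // e.g. "en"
  country: string;     // e.g. "US"
  subdivision: string; // e.g. "NC"
}

// Parse tags shaped like "en-u-sd-usnc". Returns null for anything else.
function parseSubdivisionTag(tag: string): SubdivisionLocale | null {
  const match = /^([a-z]{2,3})-u-sd-([a-z]{2})([a-z0-9]{1,4})$/i.exec(tag);
  if (!match) return null;
  return {
    language: match[1].toLowerCase(),
    country: match[2].toUpperCase(),
    subdivision: match[3].toUpperCase(),
  };
}

// Pick the names to show for a locale: the subdivision-specific key first,
// then the general language key, then plain name=*. Semicolon-delimited
// values are split into separate names.
function preferredNames(tags: Record<string, string>, locale: string): string[] {
  const sub = parseSubdivisionTag(locale);
  const keys = sub
    ? [`name:${locale.toLowerCase()}`, `name:${sub.language}`, "name"]
    : [`name:${locale}`, "name"];
  for (const key of keys) {
    const value = tags[key];
    if (value) return value.split(";").map((part) => part.trim());
  }
  return [];
}

// Illustrative tags only; the real feature's values may differ.
const exampleTags: Record<string, string> = {
  "name:en": "Kerr Reservoir;Buggs Island Lake",
  "name:en-u-sd-usnc": "Kerr Reservoir",
  "name:en-u-sd-usva": "Buggs Island Lake",
};

console.log(preferredNames(exampleTags, "en-u-sd-usva"));
console.log(preferredNames(exampleTags, "en"));
```

With these made-up tags, a Virginia locale gets a single name, while plain `en` gets both names from the semicolon-delimited list.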

The same Unicode subdivision syntax is also being used on some features of international interest, due to official naming disputes that don’t affect colloquial language as much. These names are tagged in official_name:*=* subkeys that OSM Americana hasn’t added support for. There are plenty of other naming disputes between federal and local authorities or between locals and out-of-towners, but these usually manifest as loc_name=* versus reg_name=* versus nat_name=* or name=* versus official_name=*.

In principle, we could supplement reg_name=* with name:en-u-sd-*=* to clarify exactly what “regional” means. And not only for disputes of an official nature: “The City” refers to different cities in different parts of the country. Traditional New Mexican Spanish has unique names for various places, like Wyoming, that a Spanish speaker from anywhere else would greet with a blank stare.

Sounding like a local

Now, let’s suppose you only speak English, normal English. Visiting a small town, you want to blend in with the locals but don’t want to overdo it. Did you know OSM can serve as a local pronunciation guide? By setting the language parameter to en-fonipa, you can see English names transcribed into IPA wherever it would be unpredictable based on the spelling alone.

See the difference between San Jose (ˌseən hoʊˈzeɪ) and San Jose (ˌseən ˈʤoʊz)? The good text-to-speech engines would pronounce them both as /ˌseən hoʊˈzeɪ/ while the bad ones would pronounce them both as /ˌseən ˈʤoʊz/, but there’s a big difference!

Putting it together, what if we could indicate meaningful pronunciation differences between local dialects that an off-the-shelf text-to-speech engine wouldn’t know about? The good folks of Louisville are quick to offer lessons on the proper pronunciation of that city’s name, /ˈluːəvəl/. They don’t want to hear /ˈluːivɪl/ even if it’s the more widely known pronunciation elsewhere.

Unlike the more common name:pronunciation=* subkey, the name:*-fon*=* subkeys explicitly indicate both the language and the phonetic alphabet (IPA, X-SAMPA, Kirshenbaum, etc.). By comparison, name:pronunciation=* often contains other alphabets or ad hoc spellings, which are less useful to data consumers but could be more useful if we move them to more standards-compliant subkeys. This will allow localized software like OSM Americana to pair the pronunciation guide with the right language. When a navigation application or screen reader sends the pronunciation guide to a text-to-speech engine, it’ll be able to select the most appropriate voice based on the language of the transcription.
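As a rough illustration of why these subkeys are easier for data consumers, the TypeScript sketch below splits a key such as name:en-fonipa into the language and the phonetic alphabet. The function names and the sample tags are hypothetical; the three variant subtags listed (`fonipa`, `fonxsamp`, `fonkirsh`) are the registered ones for the alphabets named above, but the list is not meant to be complete.

```typescript
// Variant subtags for phonetic alphabets (a partial list for this sketch).
const PHONETIC_VARIANTS: Record<string, string> = {
  fonipa: "IPA",
  fonxsamp: "X-SAMPA",
  fonkirsh: "Kirshenbaum",
};

interface Pronunciation {
  language: string;      // language tag without the phonetic variant
  alphabet: string;      // human-readable alphabet name
  transcription: string; // the tag's value
}

// Split "name:en-fonipa" into { language: "en", alphabet: "IPA" }.
// Returns null for keys that are not phonetic name subkeys.
function parsePronunciationKey(key: string): { language: string; alphabet: string } | null {
  if (key.indexOf("name:") !== 0) return null;
  const subtags = key.slice("name:".length).split("-");
  const kept: string[] = [];
  let alphabet: string | null = null;
  for (const subtag of subtags) {
    const lower = subtag.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(PHONETIC_VARIANTS, lower)) {
      alphabet = PHONETIC_VARIANTS[lower];
    } else {
      kept.push(subtag);
    }
  }
  if (alphabet === null || kept.length === 0) return null;
  return { language: kept.join("-"), alphabet };
}

// Collect every phonetic transcription found on a feature's tags.
function pronunciations(tags: Record<string, string>): Pronunciation[] {
  const found: Pronunciation[] = [];
  for (const key of Object.keys(tags)) {
    const parsed = parsePronunciationKey(key);
    if (parsed) {
      found.push({ language: parsed.language, alphabet: parsed.alphabet, transcription: tags[key] });
    }
  }
  return found;
}

// Hypothetical tags for illustration, not copied from OSM.
const sampleTags: Record<string, string> = {
  name: "Louisville",
  "name:en-fonipa": "ˈluːəvəl",
};

console.log(pronunciations(sampleTags));
```

A navigation application could use the `language` field to choose a text-to-speech voice and the `alphabet` field to decide how to pass the transcription to the engine.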

And so can you!

To promote multilingualism on OSM-based maps more generally, we’ve developed Diplomat, a new reusable plugin for localizing map labels in MapLibre GL JS using just a few lines of code. You can install this plugin as a traditional JavaScript script or as a module via NPM.

Unlike earlier localization plugins, Diplomat has a dual language labeling option, just like what you see in OSM Americana. It supports all the major OSM vector tile schemas, plus OpenHistoricalMap and custom GeoJSON overlays. It can switch to any valid language that the tiles expose, not just a commercial tile host’s hard-coded list.
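For readers curious what a dual language label looks like at the style level, here is a minimal TypeScript sketch that builds a MapLibre style expression by hand. To be clear, this is not Diplomat's API: `localizedName` and `dualLanguageLabel` are hypothetical helpers written for this example, and the real plugin handles many more cases. The sketch only relies on the standard `coalesce`, `get`, `case`, `==`, and `format` expression operators.

```typescript
// Build a "coalesce" expression that returns the first name key present
// on a feature, given keys ordered from most to least preferred.
function localizedName(keys: string[]): unknown[] {
  return ["coalesce", ...keys.map((key) => ["get", key])];
}

// Build a text-field expression that shows the localized name on the first
// line and the local name (plain name=*) in smaller text below it, unless
// the two are identical, in which case the name appears only once.
function dualLanguageLabel(preferredKeys: string[]): unknown[] {
  const localized = localizedName(preferredKeys);
  const local = ["get", "name"];
  return [
    "case",
    ["==", localized, local],
    local,
    ["format", localized, {}, "\n", {}, local, { "font-scale": 0.8 }],
  ];
}

// The resulting array could be used as a symbol layer's text-field value.
console.log(JSON.stringify(dualLanguageLabel(["name:en-US", "name:en", "name"])));
```

The point of the sketch is the shape of the expression; a plugin saves you from generating and maintaining this for every label layer in a style.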

OSM Americana is able to label so many languages because the underlying OpenStreetMap U.S. Tileservice exposes every valid name:*=* localized subkey in OSM, no matter how rare and obscure. If you host your own tiles, Planetiler has a simple option to expose all the languages. Alternatively, if you use openstreetmap.org vector tiles, Spirit is in the process of adding support for multilingual names.

You can see Diplomat in action in OSM Americana, of course. Or check out the AARoads Wiki, an English-language wiki about roads all over the world, where the map of this Omani highway complements the article’s Arabic references. On the OSM Wiki, any OSM Americana or OpenHistoricalMap slippy map automatically switches to your preferred interface language. You can insert a slippy map into any page using the {{Vector map}} or {{Map compare}} template.

Diplomat is in beta testing while we work out any warts in the API design. Please give the library a try and let us know of any issues you run into. Diplomat pairs nicely with the MapLibre Shield Generator, another OSM Americana spinoff that inserts route shields where they belong on any map. As the OSM Americana project proves out new techniques, we strive to put them in developers’ hands so that OSM-based maps can meet baseline user expectations more easily. Let us know what else is missing – we love to hear about route shields, but we also love to hear about anything else that makes a map a map.

Very good initiative. I tried the Telugu language, but complex letter shaping is broken. For a sample rendering of Telugu, take a look at the Telugu wiki. Hope you can fix it soon.

Yes, this is a known issue in MapLibre. There is a proof of concept for fixing the issue, which is slowly making its way into MapLibre step by step.

This is now fixed. As before, you can find the current preferred languages in the lower-right corner of the page:

Clicking the Change button brings up a redesigned dialog box that accepts any valid IETF language tag (such as an ISO language code), or you can enter a language by name as long as your browser knows its name in the current language. (Browser support varies.) Here’s what it looks like in Firefox and Safari:

The Americana project has gotten some feedback about language names that are still missing from this dialog box. While we’re interested in adding more languages to this interface, it would be even better to report these missing languages to Unicode’s CLDR project. An addition to CLDR will work its way into countless software systems across the computing industry and the Internet. The benefit to your language community would be similar to how contributing to OSM impacts much more than a single map provider.
