(Sorry for using English, and for the disorganized thoughts.)
The IETF’s BCP 47 standard is documented in the “Names” and “Multilingual names” articles. This standard specifies that pt-BR is the IETF language tag for Brazilian Portuguese. The standard is technically case-insensitive, but the well-established industry convention is to capitalize the region code and the first letter of the script code. We used to completely lowercase name:* subkeys, but in 2018, there was a mass migration to mixed case, thus name:zh-Hant instead of name:zh-hant.
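To make the casing convention concrete, here is a minimal Python sketch that rewrites a name:* key into the conventional mixed case. The function names normalize_bcp47_case and normalize_name_key are my own, purely for illustration; this is not taken from any editor or validator. It assumes the part after name: is a plain language tag and does not validate subtags against the IANA registry.

```python
def normalize_bcp47_case(tag: str) -> str:
    """Apply the conventional casing to a BCP 47 tag (hypothetical helper).

    Language lowercase, four-letter script subtag Title case,
    two-letter region subtag UPPERCASE, everything else lowercase.
    """
    subtags = tag.split("-")
    out = [subtags[0].lower()]
    # After a singleton (e.g. "x" or "u"), subtags stay lowercase.
    after_singleton = len(subtags[0]) == 1
    for sub in subtags[1:]:
        if after_singleton:
            out.append(sub.lower())
        elif len(sub) == 1:
            after_singleton = True
            out.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.capitalize())  # script, e.g. Hant
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())       # region, e.g. BR
        else:
            out.append(sub.lower())
    return "-".join(out)


def normalize_name_key(key: str) -> str:
    """Normalize the language part of a name:<tag> key; leave other keys alone."""
    prefix, sep, tag = key.partition(":")
    if prefix != "name" or not sep:
        return key
    return "name:" + normalize_bcp47_case(tag)


print(normalize_name_key("name:zh-hant"))
print(normalize_name_key("name:pt-br"))
```

A data consumer that wants to tolerate both the old and new casing could run keys through something like this before comparing them.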
According to the standard, pt means Portuguese in an unspecified region. A data consumer is typically aware of both the user’s preferred language and region and will fall back to the unqualified language code if necessary, typically using either the truncation fallback algorithm or the CLDR language matching algorithm. If a feature is tagged with name:pt=* and name:pt-BR=* but not name:pt-PT=*, a user who prefers pt-PT will see the name:pt=* value.
However, if the feature is tagged with name:pt-BR=* and name:pt-PT=* but not name:pt=*, a user who prefers pt-TL may see name:pt-BR=*, name:pt-PT=*, or name=*, depending on the application, browser, or operating system. A data consumer that implements the CLDR language matching algorithm would consider it a tie and probably choose name:pt-BR just because it comes earlier alphabetically. But if it uses the less sophisticated truncation fallback algorithm, the user will see name=* instead.
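Here is a rough Python sketch of the truncation fallback case, to show why the two taggings above behave differently. The function lookup_name and the sample features are hypothetical, and the Moscou/Moscovo values are only illustrative. It assumes exact, case-sensitive key matching and a final fallback to name=*; real renderers may differ on both points.

```python
def lookup_name(tags: dict, preferred: str):
    """Truncation fallback over name:* keys (hypothetical helper).

    Tries name:<preferred>, then drops the last subtag and tries again,
    and finally falls back to the plain name key.
    """
    subtags = preferred.split("-")
    while subtags:
        key = "name:" + "-".join(subtags)
        if key in tags:
            return key, tags[key]
        subtags.pop()
        # Also drop a dangling single-character subtag such as "x" or "u".
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    if "name" in tags:
        return "name", tags["name"]
    return None


# Illustrative features only, not real OSM data.
with_fallback = {"name": "Москва", "name:pt": "Moscovo", "name:pt-BR": "Moscou"}
without_fallback = {"name": "Москва", "name:pt-BR": "Moscou", "name:pt-PT": "Moscovo"}

print(lookup_name(with_fallback, "pt-PT"))     # resolves via name:pt
print(lookup_name(without_fallback, "pt-TL"))  # nothing under pt, so plain name
```

The CLDR language matching algorithm is more involved (it scores distances between locales), so I haven't tried to sketch it here.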
Unfortunately, some data consumers like OpenMapTiles and Mapbox Streets recognize only name:pt=* and ignore name:pt-BR=* and name:pt-PT=*. Planetiler-based tilesets have a little more flexibility; I’ve requested support for name:pt-BR=* and name:pt-PT=* for the vector tile server that powers OSM Americana and other community projects.
To avoid inconsistencies, we should try to set all three keys (name:pt-BR=*, name:pt-PT=*, and name:pt=*) whenever the dialects differ, even if some of them have identical values. name:pt=* could be the more locally relevant spelling, the more internationally popular spelling, or a semicolon-delimited list of the two spellings (see the precedent in Chinese). The specific fallback value doesn’t matter very much as long as we fill out as many name:pt-*=* tags as we can.
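A quality-assurance check along these lines could be quite small. The sketch below is only an illustration: missing_pt_keys is a made-up name, not a rule from any existing validator. It assumes that the presence of any regional key signals that the dialects differ, so a feature with only name:pt=* is left alone.

```python
PT_KEYS = ("name:pt", "name:pt-BR", "name:pt-PT")


def missing_pt_keys(tags: dict) -> list:
    """List the Portuguese name keys a feature still lacks (hypothetical check).

    Assumption: a regional key (name:pt-BR or name:pt-PT) means the dialects
    differ for this feature, so all three keys are expected.
    """
    has_regional = any(k in tags for k in ("name:pt-BR", "name:pt-PT"))
    if not has_regional:
        return []
    return [k for k in PT_KEYS if k not in tags]


# Illustrative tags only.
print(missing_pt_keys({"name:pt-BR": "Moscou", "name:pt-PT": "Moscovo"}))
print(missing_pt_keys({"name:pt": "Lisboa"}))
```

This only reports gaps; choosing the value for name:pt=* is still a judgment call for the mapper.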
Miscellaneous technical considerations
For user-facing text, most OSM-related software applications assume that pt is specific to Portugal. Some operating system platforms like iOS expect applications to provide specific Brazilian and Portuguese localizations; for Angolan and Timorese users, the operating system would choose one of them automatically.
Software | pt | pt-BR | pt-PT |
---|---|---|---|
OpenStreetMap Website, Nominatim | — | | |
Every Door | — | | |
Go Map!! | — | | |
iD, iD Tagging Schema, Editor Layer Index, OSM Community Index | — | | |
JOSM | — | | |
MapLibre Native | — | | |
Mapbox Maps SDK | — | | |
Organic Maps | — | | |
OsmAnd | — | | |
Potlatch | — | | |
StreetComplete | — | | |
taginfo | — | | |
Vespucci | — | | |
Waymarked Trails | — | | |
A few years ago, the iD project discussed using en to represent a compromise “international English”, to avoid confusion among translators into other languages, but this idea went nowhere because it would have created many interoperability problems.
Most software applications don’t have a practical reason to present a mix of Portuguese dialects to the user at the same time, except as a fallback. On the other hand, a mixed-language map is also atypical in general, yet it is common in the OSM ecosystem. We’re using name=* for the name in an unspecified (implicitly local) language, whereas the most specific ISO 639 language code would probably be either mul or und. BCP 47 allows either an ISO 3166-1 country code or a UN M.49 region code in the second position. The UN M.49 code for “unknown” is 0000, but so far no one has ever used name:*-0000, and it could potentially cause confusion with the date namespace that some mappers prefer.