Multiple delimited names in the name tag

iD does maintain a territory–language mapping based on industry-standard CLDR data. iD only uses it to decide which name:* fields to show first in a very long list of languages, so the stakes are very low. It doesn’t matter much that, for example, Portuguese is listed as a language for Switzerland, because all it means is that Portuguese is easier to find in the list.

I wonder how good a starting point default_language would actually be for building a more nuanced dataset. I think it would require more research than punctuation-fiddling. For example, are all the labels in Switzerland really only in German? Are all the labels of Morocco really only in Arabic? It’s nice that this key can occur on subnational boundaries such as South Tyrol, where signs are apparently posted only in German and never combined with Italian. But it isn’t nuanced enough for places where the streets are in German but everything else is in English. How is a data consumer to determine that a Chinatown anywhere outside Québec mostly uses Chinese when the Chinatown’s boundary may not be well-defined, let alone verifiable enough to add to OSM where it can be used as a source for this dataset?

These are all rhetorical questions, of course. Everyone knows Switzerland speaks four languages separated by slashes. Far be it from me to question that tagging.

If OSM insists that data consumers take a bring-your-own-languages approach to native name labeling, ignoring name entirely, then I see three possibilities:

  1. Each data consumer builds a slightly different meta-dataset of regional defaults, some even taking advantage of a backdoor for making OSM dependent on proprietary data. Some alternative distributions of OSM data might bring in other names that never work their way back to OSM.
  2. Data consumers who still care about open data turn to Wikidata. Some renderers have been quietly giving priority to Wikidata’s public-domain labels over OSM name:* tags for years, so this is not without precedent. But name remained the one name key that was entirely OSM’s own. I think some data consumers have appreciated the hyperlocal knowledge in name. A hard stance about delimiters could convince some data consumers to take another look at Wikidata’s native label (P1705) property, which is structured as a list.
  3. Data consumers ignore the entreaties in this thread. Nominatim continues to split name on semicolons.
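To make the first and third possibilities concrete, here is a minimal sketch of what each might look like on the consumer side. Everything in it is an assumption for illustration: the `REGION_LANGUAGES` table, its entries, and the function names `native_label` and `split_name` are hypothetical, and this is not how Nominatim or any particular renderer is actually implemented.

```python
# Hypothetical regional defaults. These entries are illustrative placeholders;
# a real consumer would have to research them region by region.
REGION_LANGUAGES = {
    "CH": ["de", "fr", "it", "rm"],
    "MA": ["ar"],
}


def native_label(tags, region, separator=" / "):
    """Bring-your-own-languages: build a label from name:* tags only.

    The plain `name` tag is ignored entirely. Returns None when the region
    is unknown or none of its assumed languages has a name:* tag.
    """
    parts = []
    for lang in REGION_LANGUAGES.get(region, []):
        value = tags.get(f"name:{lang}")
        # Skip missing tags and avoid repeating an identical name.
        if value and value not in parts:
            parts.append(value)
    return separator.join(parts) if parts else None


def split_name(name, delimiter=";"):
    """The alternative: trust `name` and split it on a single delimiter."""
    return [part.strip() for part in name.split(delimiter) if part.strip()]


tags = {"name": "Biel/Bienne", "name:de": "Biel", "name:fr": "Bienne"}
label = native_label(tags, "CH")
pieces = split_name("Biel;Bienne")
```

Note that the first function’s output depends entirely on the quality of the table, which is exactly the dataset each consumer would end up maintaining separately.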

Regardless, mappers may not particularly care where renderers get their labels from, except when their hand-curated choice of languages doesn’t seem to have an effect.
