In practice, it’s far from infinite because an individual tag value is limited to 255 characters. This is a hard limit enforced by the OSM API.
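To make the limit concrete, here is a rough sketch of how a tool might check whether a long list of alternative names still fits in one tag value. It assumes the 255 characters are counted as Unicode code points (which is what Python's `len()` measures) and that multiple values are joined with a semicolon. The function names are made up for illustration.

```python
MAX_TAG_VALUE_LENGTH = 255  # assumed to be counted in Unicode code points


def fits_in_tag(names, separator=";"):
    """Return True if the names, joined into one value, stay within the limit."""
    return len(separator.join(names)) <= MAX_TAG_VALUE_LENGTH


def names_that_fit(names, separator=";"):
    """Keep names in order until adding the next one would exceed the limit."""
    kept = []
    for name in names:
        candidate = separator.join(kept + [name])
        if len(candidate) > MAX_TAG_VALUE_LENGTH:
            break
        kept.append(name)
    return kept
```

With three 100-character names, only the first two survive, since the third would push the joined value past 255 characters.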
I’ve always thought of alt_name=* as a “junk drawer” of keywords for search engines to consume. It’s a form of SEO, similar to tags on Flickr or <meta name="keywords"> in HTML. This is not that unusual. Look at almost any official national-scale digital gazetteer and there will be a list of alternative names much longer than what we have in OSM. Our Gulf of Mexico feature has no alt_name=* in English or Spanish, but the GNIS entry that sparked so much debate a year ago lists more than 30 alternative names from a variety of sources. In principle, we could replicate the whole list in OSM.
The downside of this junk drawer is that it says nothing about why something is in the list. Unlike GNIS, we don’t track the source or justification of each individual value in alt_name=*. If a name’s inclusion there isn’t obvious, then we can clarify by moving it to a more specific key or subkey.
It isn’t just for Nominatim. Some East Asian maps also conventionally omit the generic when it’s already understood from the typography or symbol. But I agree that improved language awareness in data consumers might someday allow them to backfill this information when it’s missing from OSM. In the meantime, short_name=* would be a fine place to put these generic-less names.
The U.S. Census Bureau does systematically append generics to place names, but people don’t normally use these fully qualified names outside of demography. Instead, official_name=* contains the legal title such as “City of Paris”. That said, we did originally import the Census names as e.g. tiger:NAMELSAD=Paris city but removed them a couple years ago as import cruft.
More relevantly, the Vietnamese community has historically included the generic in name=*, but I’ve proposed to move it to official_name=* and border_type=* to give geocoders and renderers more flexibility.
Anyone can submit a request to the IANA Language Subtag Registry for a variant subtag representing a notable transliteration scheme.
In the meantime, the BCP 47 standard for IETF language tags allows us to use private use subtags of our own choosing. This would take the form of name:abc-x-defghijk, where abc is the language code and defghijk is an arbitrary string of one to eight ASCII letters or digits. OSM Americana recently added support for this and other miscellaneous extensions to BCP 47.
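As a minimal sketch of that key format, the helper below builds a name:abc-x-defghijk key and rejects a private use subtag that isn't one to eight ASCII letters or digits, which is what BCP 47 requires of each private use subtag. The function name is hypothetical, and it deliberately doesn't validate the language code itself.

```python
import re

# One private use subtag: 1 to 8 ASCII letters or digits.
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def private_use_name_key(language, scheme):
    """Build an OSM key like name:vi-x-hanviet from a language code and a scheme label."""
    if not PRIVATE_SUBTAG.fullmatch(scheme):
        raise ValueError(f"not a valid private use subtag: {scheme!r}")
    # Language tags are case-insensitive; lowercase is used here for consistency.
    return f"name:{language}-x-{scheme.lower()}"
```

For example, `private_use_name_key("vi", "hanviet")` gives `name:vi-x-hanviet`, while a ten-character label like `wadegiles1` is rejected for being too long.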
I can think of a couple reasons why these unaccented spellings wound up in alt_name:en=*. English is perhaps more aggressive than other languages at dropping diacritics when borrowing a toponym. Unlike most other major Latin script languages, English has no native diacritics that could conflict with the transliteration scheme. Native speakers only recognize at most a handful of diacritics from other languages like French, Spanish, or Māori, depending on the dialect, but (to my chagrin) most consider the diacritics inessential, a purely stylistic matter. Unaccented English is also what gave rise to the ASCII standard, which is so prevalent in computing.
For better or worse, English is often seen as the “international” language, the “default” Latin-script language. When a traffic sign anywhere in Asia has a subtitle in Latin script, it almost invariably contains English translations like “park” and “airport”, just as an English speaker would use, whereas these names only transcribe the generic without translating it. For this reason, some mappers prefer to put the lightly anglicized name in int_name=*. This has been especially common in China.
This would make it clearer where the “English” name comes from, but it would still assume that stripping diacritics from pinyin or Wade–Giles is a strictly English phenomenon. Are we OK with making that assumption? It would put the simplified spelling further out of reach of anyone who speaks some other Western language, such as French or Italian. I think those languages tend to strip diacritics from Chinese transliterations also.
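For what it's worth, the mechanical part of stripping diacritics is trivial for a data consumer, which is part of why I question storing the result as a name in any particular language. A rough sketch using only Python's standard library: decompose the text, drop the combining marks, and recompose. This is just one common approach, not what any specific geocoder or renderer does.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (tone marks, macrons, etc.) from Latin-script text."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", without_marks)
```

This turns “Běijīng” into “Beijing” and “Hà Nội” into “Ha Noi”. It only removes marks that Unicode treats as separable, so a letter like “Đ” passes through unchanged and would need its own mapping.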
The premise of this guideline is that software can automatically transliterate just as well. For CJK, this is almost like saying we don’t need translations in OSM or non-English versions of Wikipedia because Google Translate or ChatGPT already works well enough. Even if we don’t intend to comprehensively transliterate everything in OSM, I think there will always be some need for human-curated transcriptions of CJK names. Sure, the software will keep getting better, but these improvements are only possible because of craft translation and craft transliteration.
Besides, text transformation software inherently lacks the geographical context that an OSM element provides. We have name:pronunciation=* and name:en-fonipa=* because often two places named exactly the same in English are supposed to be pronounced completely differently. This is increasingly common in other Latin-script languages too, as globalization makes language communities more open to unadapted borrowings from other languages. Going in the other direction, text transformation software can’t reliably transliterate an English name into Chinese without knowing the English name’s correct pronunciation.
If OSM doesn’t encode transliterations explicitly, then most data consumers will naturally turn to Wikidata for this information before they try to generate them automatically. But Wikidata has a notability standard, not nearly as strict as Wikipedia but still much stricter than OSM.
Incidentally, old_name=* is primarily intended for old names that people nonetheless use normally. It isn’t intended to be a solution to the problem of finding each geographic reference in a Zheng He map or Journey to the West and resolving it to a present-day place or administrative boundary. That’s what Wikipedia is for.
If someone is researching the name history in such great detail in order to be able to use the obsolete date namespace syntax, what they’re really doing is historical mapping. Historical mapping belongs in OpenHistoricalMap, where we have much more sophisticated conventions for handling the evolution of a feature over time and recording details about dates and sources. Or Wikidata, where the data model is designed for time series data about individual attributes such as names.
This is largely why we didn’t copy all those obscure alternative names for the Gulf of Mexico from GNIS. Most of them came from centuries-old nautical charts that everyone forgot about until political commentators scrambled to justify the common name on a historical basis. But someone did attempt to catalogue a select few of them using the date namespace. I would like to remove those eventually, but for now they serve the purpose of nerd-sniping keyboard warriors who would otherwise edit war over their favorite name for the Gulf.
This is the norm in some languages like Vietnamese that have less consistent standards for exonyms and transliteration. So far, we’ve been content to use alt_name:vi=*. In principle, we could move these names into more specific subkeys like name:vi-x-mofa=* and name:vi-x-hanviet=*, but some names are just interchangeable without a particular rhyme or reason, like all the different ways to spell Australia when we start nitpicking about hyphens and tone marks.
Nominatim is very lenient about punctuation and spaces. Any half-decent geocoder would have to skip over punctuation when tokenizing input text in order to be usable. Geocoders can also match on many abbreviations automatically. This is one reason why we feel comfortable recommending that mappers avoid abbreviating.
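To illustrate the general idea, here is a toy tokenizer that ignores punctuation and case when comparing a query against a name. This is emphatically not Nominatim's actual algorithm, just a sketch of why punctuation variants rarely need to be mapped separately; both function names are invented for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, treating punctuation and spaces as separators."""
    # \w matches Unicode letters, digits, and underscore; everything else separates tokens.
    return re.findall(r"\w+", text.casefold())


def loosely_matches(query, name):
    """Return True if the query and the name produce the same token sequence."""
    return tokenize(query) == tokenize(name)
```

Under this scheme, “St. John's” and “St John’s” produce the same tokens whether the apostrophe is straight or curly, so neither spelling needs its own alt_name=* entry to be findable.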
I suspect these permutations are intended to avoid picking a side on a relatively minor stylistic matter. We could eliminate most of them under the principle that abbreviations don’t belong in alt_name=* either. But some minor variations would remain. Straight or curly apostrophes? Why not both!