Can we tag an infinite number of alternative names for a place?

An interesting aspect of the alt_name tag is that the OSM wiki does not clearly define what kinds of names should not be included in this tag. Instead, it appears that almost any alternative name can be placed in alt_name if no more specific tag is available. As a result, some elements contain a large number of alternative names within a single tag, sometimes exceeding a dozen entries:

alt_name = 'Aicha Dhkhira;’Aïcha Edkhera;'Aicha Edkhera;Aáicha Edjera;Aaicha Edjera;’Aïcha Dkhira;'Aicha Dkhira;Aïcha Dkira;Aicha Dkira;Aïcha Deïra;Aicha Deira;Aïcha Dhrira;Aicha Dhrira;’Aïcha Dhkhîra
alt_name = 万山镇;万山;Wanshan Zhen;Sheng-ch'i-hsien;Chiuhsingchi;Wanshanssu;Chiu-sheng-ch'i;Wan-shan-ch'ang;Wanshansze;Wanshan Tequ;Sheng-ch'i;Chang-chia-p'ing;Sheng-hsi-ku-ch'ih;Wan-shan-t'e-ch'u
alt_name = Litlgrønkjølen;Pederkjølen;Grønsjøkjølen;Hestnesmyra;Almyra;Møkkelbrynnkjølen;Simenmyra;Fjølabukjølen;Skallkjølen;Krokkjølen;Kattstokkjølen;Hestflyet;Myra synna Flena;Nordre Synstkjeldmyra;Søndre Møkkelbrynnkjølen;Vakkermyrene;Måsåmyra
alt_name_2 = Sommerløvkjølen;Småmyra;Langmyra;Sjømyra;Mormyrene;Hestvegmyra;Litlmyrstrupen;Hammarslangkjølen;Monsmyra;Raddkjølen;Storgolvmyra;Elgsmyra;Tørtallmyra;Slipvevmyra;Vetmyra;Kjerringnålmyra;Tvurrukjølen;Storrønningen;Litlgolvmyra;Vintervegmyra
alt_name_3 = Svartåskjølen;Tallåsremmet;Salbakkmyra;Gomormyra;Svenstrupen;Granåsflyet;Nilsmyra;Langkjølen;Dulpmyra

These are Aïsha in Mauritania, Wanshan in China, and Storkjølen in Norway. There are more instances around the world where the alt_name value becomes extremely long (more than 10 semicolon-separated names).

With roughly 300 such cases in total, these long alt_name tags are concentrated mainly in Norway, the Arabian Peninsula, Afghanistan, and China.
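For anyone who wants to look for such cases in their own data, here is a minimal Python sketch. It assumes the tags of an element are already available as a plain dict; the function names and the `threshold` parameter are hypothetical, and the sketch does not handle names that themselves contain a semicolon.

```python
def split_names(value: str) -> list[str]:
    """Split a semicolon-separated tag value into individual names."""
    return [part.strip() for part in value.split(";") if part.strip()]


def has_long_alt_name(tags: dict[str, str], threshold: int = 10) -> bool:
    """Return True if alt_name holds more than `threshold` names."""
    return len(split_names(tags.get("alt_name", ""))) > threshold


# Sample values taken from the Rebensteinermauern example in this thread.
example = {
    "name": "Rebensteinermauern",
    "alt_name": "Rebensteinmauern;Rebensteiner-Mauern;Rebensteinermauer;"
                "Rebensteiner-Mauer;Rebensteinmauer",
}

print(len(split_names(example["alt_name"])))
print(has_long_alt_name(example))
```

The threshold of 10 simply mirrors the cut-off used for the count above; any other value could be passed in.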

While some of these names indeed have different meanings, most of these “alternative names” are essentially identical except for variations in transliteration rules, spacing, or capitalization. Taken to an extreme, this means that theoretically an infinite number of “alternative names” could be created in this way. To avoid such situations, should we introduce clearer guidelines that limit how the alt_name tag is used?

To make the discussion more concrete, I would like to divide some potentially controversial uses of alt_name into several subtopics. We can then consider which of them most contributors would regard as proper alternative names. Because my own experience is limited, many of the examples come from Chinese cities and towns, but as shown earlier, similar situations occur in other parts of the world as well. Feel free to supplement this discussion with examples that you consider either appropriate or inappropriate uses of this tag.

#1 Generic suffixes in names

In East Asia, most mappers agree that the name tag should include a generic suffix indicating the administrative level, for example, 市 in 北京市 (Beijing), 区 in 新宿区 (Shinjuku), and 镇 in 西塘镇 (Xitang), which mean “municipality/city”, “district/city/ward”, and “town”, respectively. This differs from many European languages (and possibly other language groups; I am less familiar with them), but it helps distinguish nearby places that share the same proper name. For example, 承德市 (Chengde, city seat, capital=5) and 承德县 (Chengde, county seat, capital=6).

However, in everyday usage it is often acceptable, and sometimes more natural, to omit the generic suffix when referring to a place, provided that there is no ambiguity. For instance, it is more common to say “我在北京 (I’m in Beijing)” rather than “我在北京市 (I’m in Beijing Shi)”. Generally these are not considered “true” alternative names. Nevertheless, some mappers add them because of limitations in Nominatim: the place may not be found accurately unless the query matches the stored name exactly. As a result, we sometimes see alt_name=北京 for 北京市 (Beijing). However, this practice is relatively rare in other East Asian countries. For example, 東京都 (Tokyo) doesn’t receive alt_name=東京.

One issue with this approach is that the “alternative name” becomes unnecessary if the search engine improves. It can also look awkward when “true” alternative names exist in the same language, for instance, different CJK characters. Consider 回龙圩镇 (Huilongxu) that is tagged alt_name=迴龙圩镇;迴龙圩;回龙圩. Here 迴龙圩镇 is a “true” alternative, whereas 迴龙圩 and 回龙圩 are the same names without the generic suffix.

Similar situations occasionally appear in English-speaking countries. For example, alt_name=New York City for New York makes sense because it is commonly used. However, adding alt_name=Boston City for Boston may be considered unnecessary.

#2 Translations of generic suffixes

While topic #1 mainly concerns practical workarounds for search limitations, this topic concerns how generic suffixes are translated. According to OSM Wiki/Multilingual names, the general convention when translating Chinese (and often Japanese or Korean) place names is to omit the generic suffix. That means name=北京市 → name:en=Beijing, name=東京都 → name:en=Tokyo, and name=서울특별시 → name:en=Seoul (here the generic suffix “특별시” has three characters).

However, some mappers translate the suffix as well, sometimes using both translation and transliteration. For 本溪市 (Benxi), the suffix may be translated as City or Shi, resulting in entries 本溪, Benxi City, and Benxi Shi. One possible retagging solution is to store the translated full name in official_name:en. However, this raises another question: where should the transliteration Benxi Shi go? There is currently no dedicated tag for this case, so it often ends up in alt_name:en:

alt_name = 本溪
alt_name:en = Benxi Shi
official_name:en = Benxi City

This is somewhat analogous to translating name=New York as name:zh=纽约 while also adding alt_name:zh = 纽约市, which is how New York is currently tagged. But in practice, such usage is uncommon and unnecessary; for example, Rome does not have alt_name:zh=罗马市, alt_name:ko=로마시 or alt_name:ja=ローマ市.

#3 Transliterations from different systems

If topics #1 and #2 are particularly relevant to CJK languages, this issue is global. In the discussion Tagging names transformed from one language to another, darknos pointed out the current chaos in the use of transliterations. He also said:

alt_name is a great tag, but right now, it serves as a graveyard for names that land there solely because of the imperfections in OpenStreetMap’s name-tagging system.

These examples may provide some rationales for the above statement:

Romanizations without tone marks end up in alt_name

Let’s continue the story of “本溪市”. Historically there have been several transliteration systems for Mandarin Chinese: pinyin, tongyong, and wadegile. These systems typically include tone marks in their standard forms, such as name:zh-Latn-pinyin = Běnxī Shì and name:zh-Latn-wadegile = Pen3-ch'i1 Shih4. However, romanizations without tone marks are very common in real-world usage. Since no dedicated tag exists for these simplified forms, they often end up in alt_name:en. As a result 本溪市 (Benxi) is now tagged as alt_name=本溪;Benxi City;Pen-ch'i Shih;Pen-hsi Shih;Benxi Shi.
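To illustrate how mechanical the step from the toned form to the simplified form is, here is a small Python sketch that strips combining marks using the standard `unicodedata` module. The function name is hypothetical. It only covers diacritic-based systems such as pinyin; the digit-based tone notation in the Wade–Giles example above is not handled.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (tone marks, umlauts, ...) from Latin text."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining diacritics.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("Běnxī Shì"))  # Benxi Shi
```

One caveat: this also turns ü into u, so pinyin syllables that differ only by that letter collapse into the same spelling. That is one argument for still recording the toned form somewhere.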

Names in languages without ISO codes end up in alt_name

Besides the above three transliteration systems, there are several other systems with no ISO code, including the well-known postal romanization and Yale system, and the less-known EFEO, Barnett–Chao, Meyer–Wempe, etc. They all have subtle differences. Japanese also has multiple transliteration systems such as Hepburn, Nihon-shiki, and Kunrei-shiki. They do not have tone marks but differ in spelling conventions, for example, si versus shi, fu versus hu, and kyo versus kyō. You can find more examples in other languages. Many of them may lack an ISO language or script code, leaving alt_name as the only available tag if the mapper does not wish to create a new one.

Variants of old or official names end up in alt_name

The statement that “If none of that fits then alt_name=* can be used” has really confused cartographers, leading them to cram every name they think potentially relevant into this tag. It happens even when a mapper tries to use the more complicated date namespace. For instance, 白银区 (Baiyin) carefully records historical names using date ranges, but still contains alt_name=Haochiach'uan;Paiyin;Pai-yin Shih;Pai-yin Ch'ü;Baiyin Qu just because they are non-standard transliterations of older names or older official names (Haochiach’uan for 郝家川, Pai-yin Shih for 白银市). Similar tagging appears in many Chinese cities and towns as the generic suffixes changed along with the administrative types (e.g. 县 (County) → 市 (City), 乡 (Township) → 镇 (Town)).

Some mappers argue that these names are legitimate because they appear on historical maps from the 19th and 20th centuries, which are sometimes used as tagging sources. For example, in Mainland China, Wade–Giles romanizations were widely used throughout the 20th century before pinyin became the ISO standard in 1982. These old transliterations should theoretically be considered disused. However, many of them are still added as alt_name as previously mentioned.

We have to acknowledge that some of these historical names have become widely recognized internationally, for example, Peking for 北京市 (Beijing) and Tsingtao for 青岛市 (Qingdao). It remains difficult to determine from usage alone whether an old transliteration is necessary or appropriate in OSM.

Meanwhile, European and American cities that have been known for centuries may also have accumulated many transliterations over time, even within the same language. Some interesting examples are found in German cities. I wonder whether it makes sense to add alt_name:zh = 法兰克福;法蘭克福;法兰克佛;法蘭克佛;法兰库夫;法蘭庫夫;弗兰克福;富蘭克福;美因河畔法兰克福;美茵河畔法蘭克福;... for Frankfurt am Main and to add alt_name:zh=鲁尔河畔米尔海姆;米尔海姆(鲁尔河畔);米尔海姆;魯爾河畔米爾海姆;米爾海姆(魯爾河畔);鲁尔河畔的米尔海姆 ;鲁尔河米尔海姆;米尔海姆;米尔亨;米尔海姆安德鲁尔;... for Mülheim an der Ruhr.

#4 Variants with different writing conventions

The situation becomes even more complicated when transliteration systems combine with variations in hyphenation, spacing, numbers, and capitalization. If we want, Benxi Shi and Pen-ch'i Shih could theoretically appear in many forms: Ben Xi Shi, Ben-xi-shi, Benxishi, Benxi-Shi, Penhsishih, Penhsi shih, and more. This is already happening in places such as 万山镇 (Wanshan) and 小浦镇 (Xiaopu).

Similar patterns also appear in other countries. Rebensteinermauern has been tagged with alt_name=Rebensteinmauern;Rebensteiner-Mauern;Rebensteinermauer;Rebensteiner-Mauer;Rebensteinmauer. No. 63 or Benab has been tagged as alt_name=Number 63 Village;Number 63;63;No. 63;No. 63 Village;63 Village;#63 Village;#63;Benab;63 Benab;No 63 Benab;Number 63 Benab;. Ironically, these variants may also be motivated by search limitations if Nominatim cannot reliably normalize these transformations.
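To show how little actually distinguishes these variants, here is a rough Python sketch that reduces each name to a comparison key by ignoring case, spaces, hyphens, and other punctuation, and then groups names sharing a key. The function names are hypothetical, and this is not a description of how Nominatim or any other geocoder normalizes input.

```python
import re
from collections import defaultdict


def variant_key(name: str) -> str:
    """Casefold the name and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", name.casefold())


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Group names that differ only in case, spacing, or punctuation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[variant_key(name)].append(name)
    return dict(groups)


names = ["Benxi Shi", "Ben Xi Shi", "Ben-xi-shi", "Benxishi", "Benxi-Shi"]
print(group_variants(names))
```

All five spellings fall under the single key `benxishi`. Variants from a different romanization system, such as Penhsishih, would still form their own group, which matches the intuition that they belong to topic #3 rather than this one.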

Summary

Hopefully this overview illustrates how chaotic the current use of alt_name and alt_name:* tags can be. While these tags provide a great deal of flexibility, they also make it possible to generate an effectively unlimited number of “alternative names” through combinations of the situations described in sections #1 to #4. For this reason, it may be time to establish a clearer definition of how these tags should be used:

  1. Should we acknowledge or discourage the practice of adding or removing generic administrative suffixes (for example, 市, 区, 县, 镇, City, Shi) as alt_name values?

  2. Should we consider adopting the suggestion by darknos to introduce more explicit tags for transliterations, such as name:en-t-zh-Latn-pinyin and name:en-t-zh-Latn-wadegile? This could allow romanizations without tone marks, as well as other transliteration forms between languages, to be stored in a more structured way. Alternatively, should we recommend avoiding transliterations whenever possible as suggested in Avoid transliteration?

  3. Should we acknowledge or discourage the use of alt_name for names that differ only in spacing, hyphenation, or capitalization, which could probably be resolved at the technical level?

  4. Should we acknowledge or discourage the use of historical transliterations or translations that appear on older maps and were once widely used but are now obsolete in alt_name? Would it be better to store them using more explicit tagging, such as name:en:-1982 and disused:name:en, or even old_alt_name?


Thank you for mentioning the discussion about transformed names and for raising the important issue of transliteration. Here are my personal thoughts on these points.

Using more specific tags is something I support. Even though different ways of writing a name are just copies of the original, their presence in alt_name demonstrates their usefulness to mappers. If these names are important enough to be recorded, the most precise tags should be applied. The main recommendation to avoid transliterations can still be followed while using these specific -t- tags.

Adding truncated names to the alt_name tag just to assist a search engine that does not work perfectly seems unwise. When the original name is complete, it already contains all necessary information. It is better to leave it to data users to handle this information. That said, using a shorter name in the name tag and the full name in alt_name may be acceptable if the truncated (or elliptical) version is much more common locally.

From my perspective, a true alternative name is more than a version with different spaces or hyphens. A real alternative name should sound distinct, be spelled differently, and carry its own unique history or meaning. Using alt_name for minor changes in punctuation or capitalization is not the most effective way to apply the tag.


In practice, it’s far from infinite because an individual tag value is limited to 255 characters. This is a hard limit enforced by the OSM API.
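As a rough illustration of what that limit means for a semicolon-separated list, here is a Python sketch. The constant and function names are hypothetical, and it assumes that Python's `len()` on a string is close enough to how the API counts characters.

```python
MAX_TAG_VALUE_LENGTH = 255  # limit mentioned above


def fits_in_tag(names: list[str]) -> bool:
    """Check whether the names, joined by semicolons, fit in one tag value."""
    return len(";".join(names)) <= MAX_TAG_VALUE_LENGTH


def max_names_that_fit(names: list[str]) -> int:
    """Count how many names, taken in order, fit into a single tag value."""
    count = 0
    length = 0
    for name in names:
        # Every name after the first also costs one separator character.
        extra = len(name) + (1 if count else 0)
        if length + extra > MAX_TAG_VALUE_LENGTH:
            break
        length += extra
        count += 1
    return count


ten_char_names = ["x" * 10] * 30
print(max_names_that_fit(ten_char_names))  # 23
```

With names of ten characters each, 23 of them fit (23 × 10 characters plus 22 separators is 252), which is consistent with mappers spilling over into keys like alt_name_2 in the Norwegian example.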

I’ve always thought of alt_name=* as a “junk drawer” of keywords for search engines to consume. It’s a form of SEO, similar to tags on Flickr or <meta name="keywords"> in HTML. This is not that unusual. Look at almost any official national-scale digital gazetteer and there will be a field for alternative names much longer than we have in OSM. Our Gulf of Mexico feature has no alt_name=* in English or Spanish, but the GNIS entry that sparked so much debate a year ago lists more than 30 alternative names from a variety of sources. In principle, we could replicate the whole list in OSM.

The downside of this junk drawer is that it says nothing about why something is in the list. Unlike GNIS, we don’t track the source or justification of each individual value in alt_name=*. If a name’s inclusion there isn’t obvious, then we can clarify by moving it to a more specific key or subkey.

It isn’t just for Nominatim. Some East Asian maps also conventionally omit the generic when it’s already understood from the typography or symbol. But I agree that improved language awareness in data consumers might someday allow them to backfill this information when it’s missing from OSM. In the meantime, short_name=* would be a fine place to put these generic-less names.

The U.S. Census Bureau does systematically append generics to place names, but people don’t normally use these fully qualified names outside of demography. Instead, official_name=* contains the legal title such as “City of Paris”. That said, we did originally import the Census names as e.g. tiger:NAMELSAD=Paris city but removed them a couple years ago as import cruft.

More relevantly, the Vietnamese community has historically included the generic in name=*, but I’ve proposed to move it to official_name=* and border_type=* to give geocoders and renderers more flexibility.

Anyone can submit a request to the IANA Language Subtag Registry for a variant subtag representing a notable transliteration scheme.

In the meantime, the BCP 47 standard for IETF language tags allows us to use private use subtags of our own choosing. This would take the form of name:abc-x-defghijk, where abc is the language code and defghijk is an arbitrary string one to eight characters long. OSM Americana recently added support for this and other miscellaneous extensions to BCP 47.
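Here is a small Python sketch of building and checking a key of that shape. It is deliberately simplified: it assumes a bare two- or three-letter language code directly before `-x-`, whereas real BCP 47 tags can carry script, region, and other subtags first. The function name and the regular expression are my own.

```python
import re

# name:<language code>-x-<1 to 8 letters or digits>
PRIVATE_USE_KEY = re.compile(r"name:[a-z]{2,3}-x-[A-Za-z0-9]{1,8}")


def private_use_key(lang: str, label: str) -> str:
    """Build a name key with a private-use subtag, or raise ValueError."""
    key = f"name:{lang}-x-{label}"
    if not PRIVATE_USE_KEY.fullmatch(key):
        raise ValueError(f"not a valid private-use name key: {key!r}")
    return key


print(private_use_key("vi", "hanviet"))  # name:vi-x-hanviet
```

A label longer than eight characters, such as `wadegilesnotone`, is rejected, so a mapper would have to pick a shorter private label.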

I can think of a couple reasons why these unaccented spellings wound up in alt_name:en=*. English is perhaps more aggressive than other languages at dropping diacritics when borrowing a toponym. Unlike most other major Latin script languages, English has no native diacritics that could conflict with the transliteration scheme. Native speakers only recognize at most a handful of diacritics from other languages like French, Spanish, or Māori, depending on the dialect, but (to my chagrin) most consider the diacritics inessential, a purely stylistic matter. Unaccented English is also what gave rise to the ASCII standard, which is so prevalent in computing.

For better or worse, English is often seen as the “international” language, the “default” Latin-script language. When a traffic sign anywhere in Asia has a subtitle in Latin script, it almost invariably contains English translations like “park” and “airport”, just as an English speaker would use, whereas these names only transcribe the generic without translating it. For this reason, some mappers prefer to put the lightly anglicized name in int_name=*. This has been especially common in China.

This would make it clearer where the “English” name comes from, but it would still assume that stripping diacritics from pinyin or Wade–Giles is a strictly English phenomenon. Are we OK with making that assumption? It would put the simplified spelling further out of reach of anyone who speaks some other Western language, such as French or Italian. I think those languages tend to strip diacritics from Chinese transliterations also.

The premise of this guideline is that software can automatically transliterate just as well. For CJK, this is almost like saying we don’t need translations in OSM or non-English versions of Wikipedia because Google Translate or ChatGPT already works well enough. Even if we don’t intend to comprehensively transliterate everything in OSM, I think there will always be some need for human-curated transcriptions of CJK names. Sure, the software will keep getting better, but these improvements are only possible because of craft translation and craft transliteration.

Besides, text transformation software inherently lacks the geographical context that an OSM element provides. We have name:pronunciation=* and name:en-fonipa=* because often two places named exactly the same in English are supposed to be pronounced completely differently. This is increasingly common in other Latin-script languages too, as globalization makes language communities more open to unadapted borrowings from other languages. Going in the other direction, text transformation software can’t reliably transliterate an English name into Chinese without knowing the English name’s correct pronunciation.

If OSM doesn’t encode transliterations explicitly, then most data consumers will naturally turn to Wikidata for this information before they try to generate them automatically. But Wikidata has a notability standard, not nearly as strict as Wikipedia but still much stricter than OSM.

Incidentally, old_name=* is primarily intended for old names that people nonetheless use normally. It isn’t intended to be a solution to the problem of finding each geographic reference in a Zheng He map or Journey to the West and resolving it to a present-day place or administrative boundary. That’s what Wikipedia is for.

If someone is researching the name history in such great detail in order to be able to use the obsolete date namespace syntax, what they’re really doing is historical mapping. Historical mapping belongs in OpenHistoricalMap, where we have much more sophisticated conventions for handling the evolution of a feature over time and recording details about dates and sources. Or Wikidata, where the data model is designed for time series data about individual attributes such as names.

This is largely why we didn’t copy all those obscure alternative names for the Gulf of Mexico from GNIS. Most of them came from centuries-old nautical charts that everyone forgot about until political commentators scrambled to justify the common name on a historical basis. But someone did attempt to catalogue a select few of them using the date namespace. I would like to remove those eventually, but for now they serve the purpose of nerd-sniping keyboard warriors who would otherwise edit war over their favorite name for the Gulf.

This is the norm in some languages like Vietnamese that have less consistent standards for exonyms and transliteration. So far, we’ve been content to use alt_name:vi=*. In principle, we could move these names into more specific subkeys like name:vi-x-mofa=* and name:vi-x-hanviet=*, but some names are just interchangeable without a particular rhyme or reason, like all the different ways to spell Australia when we start nitpicking about hyphens and tone marks.

Nominatim is very lenient about punctuation and spaces. Any half-decent geocoder would have to skip over punctuation when tokenizing input text in order to be usable. Geocoders can also match on many abbreviations automatically. This is one reason why we feel comfortable recommending that mappers avoid abbreviating.

I suspect these permutations are intended to avoid picking a side on a relatively minor stylistic matter. We could eliminate most of them under the principle that abbreviations don’t belong in alt_name=* either. But some minor variations would remain. Straight or curly apostrophes? Why not both! :nerd_face:


Seems as a kind of mapping for the renderer to me.

Or perhaps spam or possibly vandalism (more a form of defacement).

But in both cases it could as well have been a new user with good intentions :-)

but as long as name variants are real then it is not an incorrect mapping for renderer and therefore it is OK

  1. One factor to consider is place= vs boundary=administrative , where it can be thought the generic suffix emphasizes the latter. Also, there could have been various degrees of Tagging For Renderer concerns, while Carto does render the latter, and distinctively (should be more about labels colliding and getting hidden in raster now). Interestingly, Tokyo has old_name:ja=東亰 for the kanji. I’m guessing there might have been some omission, as it does have official_name:it=Metropoli di Tokyo for the Metropolis only. Besides, it should be remembered a valid case is some name= don’t make sense without the generic suffix (eg compass direction entities “北区”, or other single-char “港区”, which are even worse in Mandarin and Cantonese than Japanese for the single-syllable pronunciation; or a special case Hokkai-dou in the to-dou-fu-ken system). Then there would be consistency/uniformity concerns, while English has them in official_name= / alt_name= “hidden” away.
  2. You can always add them first. I would assume you will only get debated when removing alt_name= , if there’s any concern about their necessity. (As a side note, I should disclaim, or take pride in having promoted name:*-t-*= and name:*-fonipa )
  3. One variation you haven’t considered is numbers. I have used *:*-u-nu-latn= vs *:*-u-nu-roman= vs *:*-u-nu-hant= to store different number writings, as they can be very inconsistent and conflicting, but have more practical needs.
  4. *_name= has the problem of whether old_alt_name= vs alt_old_name= should be used, or that they are different. was:alt_name= is clearer, although it will get mixed up with any other was:*= stages. Obviously it can only store 1 stage cleanly, and can’t be distinguished nicely afterwards. A was:alt_name:end_date= would only help between attributes. While the date suffix may seem the best, it has been argued as a poor data format, as you are storing variable data in the key, which is supposed to be more fixed and finite. (Personally one idea I have for OSM to do-minimal is name:en:*=* @ (-1982) similar to temporary:*= following *:conditional= , before having to OHM multiple objects)
1 Like
  1. I would rather discourage. However, Nominatim (for instance) should be able to “translate” Shi in place=city or town. But just like “airport” (in local language, browser languages and English) => aeroway=aerodrome
  2. Yes (for the first part), but I could live with the second, which is not observed. Well, see @darkonus comment for a proper solution (but will software cope with that? In practice I mean).
  3. Keep the work for soundex - which is difficult at multilingual level.
  4. Second option definitely better. And used by Nominatim by the way.

I’ve never seen alt_name misused the ways (plural!) you mentioned, just as a plain alternative name.

IMHO, keeping diacritics is a must. No problem for having name:en=Benxi Shi, but pinyin is pinyin, isn’t it?

No sense to duplicate names already present with the date namespace. (and yes, @Minh_Nguyen it’s a simplified version compared to OHM).

And the usage in other languages may differ : Pékin in name:fr and Beijing in alt_name:fr! Which, IMHO, are correct. Probably one day the opposite will be right.

Well, you have place=… that defines somehow the generic, isn’t it? The issue may reside in the “somehow”, language and feature dependent. As you said, name and official_name are sometimes more relevant.

And some have combined diacritics, Nguyễn ;-). In French it’s like Chinese without tones. If I ask “avez-vous des congres ?”, I’m asking if they have conger, not congresses!
But no, in Hanyu pinyin (Wikipédia), we do use diacritics, some not used in French, but it’s completely normal. By the way, mind the wording of Wikipedia in French, one of the rare localizations of this name. @Minh_Nguyen will love French ;-).

not perfect but better than Wikipedia, which is not geospatial, so Paris is Paris, but probably Paris (France) or Paris (Texas) for two places named Paris locally.
We can’t simply borrow the name from Wikidata. It works in general but not always, notability being probably the most important issue, but the many irrelevant versions without diacritics, etc. don’t make much sense either.


This recalls the recent discussion about ellipsis in toponyms. Keys like place=* and border_type=* are essential but not necessarily a replacement for detail in names. In some cases, mappers have gone too far trying to structure names for machine readability. Another parallel discussion is the one about whether the name=* of a chain hotel location can ever include the brand=*. In both instances, data consumers still need a reliable mechanism to reconstruct the fully qualified name, but real-world variation may make a structured representation inadequate or unusable for this purpose.

What sets Chinese and some related languages apart is that generics in these languages are more predictable and the grammar is utterly simple without any inflection. But exonyms are more complicated. 肯德基州 is one of the names for the U.S. state of Kentucky in Chinese. 州 is a common generic in Chinese, but not every U.S. state is a 州, and not every 州 is an administrative area. Also, 肯德基 without the generic most commonly refers to a fried chicken restaurant chain. A data consumer shouldn’t be expected to know all this just to support 肯德基州 in user input.

I was referring to data consumers, not mappers. Many popular renderers and geocoders have long used Wikidata’s labels as a fallback for when OSM lacks a name in a given language either intentionally or unintentionally. These localized names may be exonyms for famous places or they may be mere transliterations for less famous places. But either way it’s good enough for the data consumer to meet user expectations.

To clarify, all US states are translated as “州”. It refers to “State” more, but Commonwealth doesn’t have a direct counterpart, and basically isn’t translated. In some cases, or internationally viz British Commonwealth, the word can be translated as “Federation” confusingly.
Unless, you want to refer to the Cantonese or historical “麻省” for Massachusetts. That’s usually “Province”.
States are almost always mentioned with “州”. Exceptions, or other cases using the state names on its own include some non-state-established private universities (eg UPenn is “賓夕法尼亞大學” not “賓夕法尼亞州大學”).


Since this topic touches upon several concepts that interest me—specifically the omission of generic terms and the transformation of original names into different writing systems or languages—I cannot help but dive into a slightly more detailed analysis. This analysis might lack professional expertise, but I will give it a try. Let’s look at the names of the city and administrative unit provided by the user @higashimado.

We begin with the primary Chinese names—the endonyms. It appears that both names are commonly used in their full form, including the generic term 市 (Shì): 本溪市 (Běnxī Shì). As far as I know, this is quite common in other countries as well, where the name of a specific place and the administrative unit of which it is the center are identical. Interestingly, in Ukraine—where I am from—this is not the case, which I personally find quite helpful. The name of one of our cities (the administrative center) is Cherkasy, while the administrative unit is Cherkaska oblast. Of course, it is not our fault that objects of different natures often share the same name, but as mappers, we still have to deal with it.

I might be mistaken here, but it seems that all foreign-language names for the Běnxī Shì are derived from the Chinese original, and many of them underwent ellipsis during this transformation—dropping the generic term 市 (Shì):

Place: Běnxī Shì

Key Value
name:ar بنشي
name:cs Pen-si
name:de Benxi
name:en Benxi
name:et Benxi
name:fr Benxi
name:hr Benxi
name:pt Benxi
name:ru Бэньси
name:sv Benxi
name:vi Bản Khê
alt_name:en Benxi City

The primary English exonym is the shortened version, with the full name listed as an alternative. Conversely, the following languages use the full form as the primary name, with shortened versions in short_name:

Key Value
name:ja 本渓市
name:ko 번시시
alt_name:ko 본계시
short_name:ja 本渓
short_name:ko 번시;본계

Now, let’s look at the names assigned not to the place, but to the relation boundary=administrative.

Administrative unit: Běnxī Shì

Languages where the shortened name is primary:

Key Value
name:uk Беньсі
name:ru Бэньси
name:ar بنشي
name:cs Pen-si

Languages where the full name is primary (including English):

Key Value
name:ja 本渓市
name:ko 번시시
name:en Benxi City
short_name:en Benxi

As for the values listed in alt_name separated by semicolons for this object, my brief research suggests the following: Benxi Shi is Hanyu Pinyin without diacritics, and Benxi City is a hybrid of Hanyu Pinyin and a translated generic term. There are also two more historic variants: Pen-ch’i Shih and Pen-hsi Shih (both Wade–Giles?).


What can I say after this analysis? Not much, really. It shows that no matter how much we mappers pretend to be dispassionate—merely documenting the most common names—reality is more complex. By choosing a name, we are making a choice among many variants, often based not on objective criteria, but on linguistic habit: deciding whether to translate terms, add them, or cut them off. Removing hyphens, changing capitalization, or dropping diacritics.

If this practice becomes better documented in our Wiki, and we use more specific tags for different variants—such as -Latn or even the -t- tags—our data could become more structured. This would likely make it easier to validate and more versatile to use.

The omission of generic terms is currently being discussed in this thread; if you’re interested, you’re welcome to join!

Unlike Western languages, Vietnamese transcribes Chinese using Sino-Vietnamese instead of pinyin, in order to facilitate pronunciation and preserve the meaning of each character to some extent. The full Sino-Vietnamese name would be “Bản Khê thị”. Chinese-style generic suffixes like thị largely fell out of fashion in the mid 20th century, except to a limited extent in overseas communities. Nowadays, Benxi would be either “Bản Khê” or “thành phố Bản Khê” depending on the context. In a larger phrase, Vietnamese tends to introduce a common noun with a classifier in front ( bánh mì), a personal name with a kinship term (anh Minh), and a toponym with a generic (thành phố Bản Khê).

It is not just Vietnamese: Czech, and possibly other Slavic languages, use their own systems (Shanghai is Šanghaj, for example). If I read the wiki correctly, transliterations should go in the name:en/fr/cs etc. namespace, not in alt_name. Even int_name is better for transliteration than alt_name.

You’re likely right there. Although the name of the city is derived from 本溪湖 (Benxi Lake), the place itself was largely unknown to foreigners until coal mines were discovered there in the early 20th century. It then gradually developed into a city. We can assume that most names in other languages were derived from the Chinese name after that.

The different tagging patterns we see today are largely a hybrid outcome of two processes: the initial import of names from Wikidata and later efforts by the local community to normalize transliteration conventions. In practice, this works as follows.

In many cases, foreign-language names were initially imported from Wikidata. These could include the shortened form Benxi, the full transliteration Benxi Shi, or hybrid forms such as Benxi City. Because the place node and the administrative relation share the same Chinese name and the same Wikidata item, these names were often added to both objects.

Later, the local mapping community attempted to establish conventions to normalize multilingual names for both places and administrative relations. See OSM Wiki/Multilingual_names#China. Ideally, the tagging would look like this:

| Tag | Place | Relation |
| --- | --- | --- |
| name | 本溪市 | 本溪市 |
| name:zh | 本溪市 | 本溪市 |
| name:ko | 번시시 | 번시시 |
| name:ja | 本渓市 | 本渓市 |
| name:en | Benxi | Benxi City |

Here, name:ko and name:ja are consistent with name:zh, since the CJK writing systems often allow character-by-character translation. For foreign languages (especially English), the convention tends to omit the generic suffix for the place node while keeping it in the administrative relation. One possible reason is that the Chinese administrative system is influenced by the Soviet model, where generic terms commonly appear in relation names such as Mykolaiv Raion (relation 1738977) and Kondinsky District (relation 1442693).
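The place/relation convention for name:en described above can be written down as a small check. This is only a sketch under my own assumptions: the function name check_name_en, the plain-dict tag layout, and the hard-coded generic term "City" are hypothetical, and the actual convention on the wiki page may cover more cases.

```python
def check_name_en(place_tags: dict, relation_tags: dict, generic: str = "City") -> list[str]:
    """Return a list of deviations from the assumed name:en convention."""
    problems = []
    place_en = place_tags.get("name:en")
    relation_en = relation_tags.get("name:en")
    if place_en is None or relation_en is None:
        problems.append("name:en missing on place or relation")
        return problems
    # The place node is expected to omit the generic term.
    if place_en.endswith(" " + generic):
        problems.append("place name:en still carries the generic term")
    # The relation is expected to be the place name plus the generic term.
    if relation_en != f"{place_en} {generic}":
        problems.append("relation name:en is not the place name plus the generic term")
    return problems


print(check_name_en({"name:en": "Benxi"}, {"name:en": "Benxi City"}))
# []
```

A check like this would only make sense for languages where the community has actually agreed on the convention, which for now seems to be mainly English.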

However, most Chinese mappers are only familiar with English and sometimes Japanese or Korean. As a result, the conventions were mainly applied to name:en, name:ja, and name:ko, and most edits have focused on these languages. In principle, similar conventions could be applied to some other European languages, for example, name:es=Benxi for the place and name:es=Ciudad de Benxi for the administrative relation. However, it is unclear whether such forms would actually sound natural to native speakers of those languages.

For languages that are less commonly used in China, such as Arabic or languages from Southeast Asia or Africa, neither the local community nor native speakers have usually revised the imported tags. As a result, these names often remain exactly as they were imported from Wikidata and may never actually be used by map users.

In summary, the situation is quite complex. What makes it more problematic is that the choice of name variants is often not the result of discussion between native speakers and the local mapping community. Instead, it is frequently determined by only one of these groups or simply inherited from Wikidata without a clear source (sometimes even including mistakes, particularly for lesser-known places).

That would differ from the tagging convention in Spain itself, where a place=city node and a municipality administrative boundary typically have the same name tag. In the very common situation where there is also a province with the same name, even that higher-level admin unit has the same name tag, with e.g. “Provincia de Sevilla” in the official_name tag.

(I have just noticed that the MapTiler layer, with the browser language set to English, labels the city I live in as “Málaga City”, which I find quite jarring. I’m not sure where that comes from.)

It looks to be the English Wikidata label of the node Málaga (21750065): “Málaga City”.

Unfortunately, Wikidata labels sometimes include disambiguations or titles in the name, rather than in the description.

The issue is that the mentioned node Málaga is missing a name:en=Malaga.

So it is, although I wouldn’t have seen that as an issue given that other OSM-based maps I use, both desktop and Android apps, display “Málaga”. I’d say the issue is more with invented names on the Wikidata side (or maybe these are not really intended as names; I don’t know enough about Wikidata conventions). Very few Spanish places have a name:en tag (apart from genuine exonyms like Seville), but most of them don’t have this issue (“Zaragoza City” was the only one I noticed scanning the map quickly).

I could add the name:en tag, but I would probably fall into an existential crisis about whether the English name is “Malaga” or “Málaga”. I have never quite managed to decide this myself, and have probably mixed them inconsistently on this very forum. English Wikipedia uses the latter, for what it’s worth.

Anyway, I have probably diverted this thread too far from alt_name. I certainly won’t be adding versions with and without the accent to OSM tags…

It’s a matter of style – that is, personal preference. Most English speakers wouldn’t care so deeply about what goes in the name:en=* tag of a city in Spain. It doesn’t affect comprehension at all, and not too many monolingual English speakers live there anyways. But it is an interesting challenge here in the United States, where a good chunk of the country has inherited place names from Spanish and other accented languages, yet the stereotypical American only speaks American.

The diacritics can even take on a political dimension, regardless of whether Nominatim can handily perform diacritic folding. Stroke City it ain’t, but we apparently need an alt_name=* just to keep the peace.

The Americana OSM map illustrates that the OSM “gazetteer” can already be well developed.
