Fair. But what I was really getting at was perfectly summarized by Minh:

The mockup labeling the two names as “Chinese” and “Korean” struck me as misleading. It’s accurate to indicate that one name is Chinese-derived and the other Korean-derived. It’s also accurate to indicate that one is preferred in China while the other is preferred in North Korea, as your flag mockup did. But appending the English word “mountain” makes both names English, no longer “Chinese” or “Korean”. One of these names is far and away more common than the other among English speakers, so
name:en=* and alt_name:en=* have obvious values.
There’s a big difference between
name:ko-Latn=Baekdusan
name:zh-Latn-pinyin=Chángbáishān
and
name:en-t-ko=Baekdu Mountain
name:en-t-zh=Changbai Mountain
The former two are the transliterations from the original writing systems to the Latin alphabet; the latter two are “English names” half-transliterated from their respective source languages. However, none of these is what the mountain is actually (predominantly) called in English.
As I wrote in my first reply, I think a big part of your original premise is sound: some name:en=* values currently in the OSM data (and probably the equivalent tags for many other languages) shouldn’t be tagged as such, because they’re merely a transliteration, translation or other transformation of the original endonym. That’s worth identifying in some manner, and your proposed solution could be workable for that purpose. But as I cautioned earlier, there are many perfectly valid name:en=* values in the data that you may mistakenly conclude “shouldn’t be recorded in English”. It’s sometimes hard to tell what’s genuinely “the name in English” and what’s not, and the judgment is especially prone to error if your default presumption is that a transliterated name is probably not “the name”.
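To make the "merely a transliteration" case concrete, here is a rough Python sketch of one way a reviewer might surface candidates. It assumes an element's tags are available as a plain dict, and the function name is made up for illustration. All it does is report when name:en is identical to a name:*-Latn value on the same element. Per the caution above, a match is only a prompt for a human to look closer, not proof that the English name is wrong.

```python
import unicodedata


def matches_latn_transliteration(tags):
    """Return the name:*-Latn* keys whose value equals name:en (case-insensitive)."""
    english = tags.get("name:en")
    if english is None:
        return []

    def norm(text):
        # Compare in one Unicode form so precomposed and combining accents match.
        return unicodedata.normalize("NFC", text).casefold()

    matches = []
    for key, value in tags.items():
        if not key.startswith("name:"):
            continue
        # "name:zh-Latn-pinyin" -> ["zh", "Latn", "pinyin"]
        subtags = key[len("name:"):].split("-")
        if "Latn" in subtags and norm(value) == norm(english):
            matches.append(key)
    return matches
```

With the mountain's transliteration tags from above, a name:en=Baekdusan would be reported as matching name:ko-Latn, while name:en=Baekdu Mountain would not match anything, which also shows the limits of the check: it says nothing about which name English speakers actually use.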
As pointed out earlier, I think name:en=Kapyĺ for this small Belarusian city is a great example of something that is obviously not true, and quite blatantly is just a transliteration from the original Cyrillic script to Latin. I would wager almost every English-speaker in the world has no idea what an ĺ would sound like. Amusingly the int_name, name:de and name:fr tags are all Kapyl, sans the acute accent, and the city itself is tagged as the administrative seat of the “Kapyl District”. Like I wrote earlier, the acute accent is a giveaway that this is categorically not “the name in English”: English uses some diacritical marks, especially for foreign names and loanwords, but I can’t think of a single use of an acute accent on an ‘l’ or any other consonant. (In fact, pretty much the only accent marks one sees on consonants are the c-cedilla in words like façade and the n-tilde in words like piñata.)
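The "accent on a consonant" giveaway is simple enough to sketch in Python. This is only a heuristic under my own assumptions: the function name is invented, "y" is counted as a vowel, and the only exceptions allowed are the c-cedilla and n-tilde mentioned above. It flags a value for review and decides nothing on its own.

```python
import unicodedata

CONSONANTS = set("bcdfghjklmnpqrstvwxz")
# Consonant-plus-mark pairs that do show up in English loanwords (façade, piñata).
ALLOWED = {("c", "\u0327"), ("n", "\u0303")}


def suspicious_consonant_marks(name):
    """Return accented Latin consonants in `name` that English rarely uses."""
    hits = []
    base = None
    # NFD splits each accented letter into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", name):
        if unicodedata.combining(ch):
            if (
                base is not None
                and base.lower() in CONSONANTS
                and (base.lower(), ch) not in ALLOWED
            ):
                hits.append(unicodedata.normalize("NFC", base + ch))
        else:
            base = ch
    return hits
```

Run on Kapyĺ this returns the ĺ, while Kapyl, façade, piñata and Chángbáishān all come back empty, since accents on vowels are left alone.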
I agree with your “more information is better” and “give the data consumer more flexibility” principles, but I still don’t really see an ultimate practical purpose. Your example of tagging Ukrainian names for places in Crimea based on either Russian or Crimean Tatar sources is kind of fascinating as a piece of etymological trivia for someone who doesn’t speak any Ukrainian (or Russian, or Crimean Tatar), but… how would a data consumer make practical use of it? That’s not to say it couldn’t be done: maybe, for political or cultural reasons in places like Ukraine (or more specifically Crimea), app functionality that picks which etymology to display could be usable, desirable or popular. But it’s hard for me to picture any real-world use. If anything, it may lead to the kind of misconception that Minh pointed out with respect to displaying both “Changbai Mountain” and “Baekdu Mountain”: it makes it seem as though both are commonly or interchangeably used in English, when that’s not really true.
Apologies for not giving this a ringing endorsement; I do actually think this is otherwise a fascinating idea.