Multiple delimited names in the name tag

I absolutely agree it suffers from multiple flaws in its current state (even its own wiki page says so!). That’s why I said to improve on a solution like default_language=* / language_format=* and not “use it verbatim in its current state”. For starters, it should have clearly delineated variables (e.g. something more along the lines of “${name:hr} - ${name:sr-Latn}”).
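A minimal sketch of how a data consumer might expand such a template. It assumes a hypothetical `${key}` placeholder syntax (nothing like this exists in OSM tagging today) and a hypothetical helper `expand_template`; the values `aaa`/`bbb` are placeholders. This version uses the naive rule of substituting an empty string for missing keys:

```python
import re

# Hypothetical placeholder syntax: ${key}, where key is any OSM tag key.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_template(template: str, tags: dict[str, str]) -> str:
    """Replace every ${key} in the template with the feature's tag value.

    Missing keys become the empty string (the naive approach).
    """
    return PLACEHOLDER.sub(lambda m: tags.get(m.group(1), ""), template)


# Template as it might appear in a boundary's default_language tag.
template = "${name:hr} - ${name:sr-Latn}"
print(expand_template(template, {"name:hr": "aaa", "name:sr-Latn": "bbb"}))
# prints: aaa - bbb
```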

Some are easily fixable with better syntax, some already have handled counterparts (e.g. coastlines), and some are inherent but manageable (e.g. an Antwerp extract would either have to duplicate default_language for itself, have the extract generator add it automatically, store it separately somewhere, or simply let the user define their own preferred rendering, which may be the same as or different from the “official” one). And some are actually easier to deal with than the alternatives: e.g. the vandalism case requires fixing just one tag, instead of many thousands of potentially modified objects, for which a clean revert is problematic if a user blatantly replaced multiple name tags and those objects changed afterwards.

None of those problems seem insurmountable to me; but yes, they would need extra discussion if people are interested in such a more versatile solution, which is why I suggested it for consideration (if no one is interested, I certainly do not intend to start a one-man crusade over it :smile:)

  • What to do when one of the referenced keys is missing?

Whatever we want, eh? The simplest solution, “just substitute an empty string”, is admittedly not very nice, but even some basic handling (removing the fixed characters before a missing variable) produces much nicer results. If needed, one can go more advanced ways (e.g. POSIX shell variable expansion, a ternary conditional operator, etc.) or even hardcode some rules. But I’d personally prefer to keep it somewhat simple.
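One possible reading of that “basic handling” rule, as a sketch: the fixed text in front of a placeholder is treated as its separator and dropped when that placeholder resolves to nothing. The `${key}` syntax and the helper `expand_skipping_missing` are hypothetical. Text before the first placeholder is kept as a prefix, and a separator is also dropped when no value has been emitted yet, so a missing first name does not leave a dangling “ - ”:

```python
import re

# Hypothetical placeholder syntax: ${key}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_skipping_missing(template: str, tags: dict[str, str]) -> str:
    """Expand ${key} placeholders, dropping separators that would dangle."""
    parts: list[str] = []
    pos = 0
    emitted_value = False
    for i, match in enumerate(PLACEHOLDER.finditer(template)):
        literal = template[pos:match.start()]
        value = tags.get(match.group(1), "")
        if i == 0:
            parts.append(literal)  # text before the first placeholder: a prefix
        elif value and emitted_value:
            parts.append(literal)  # separator between two present values
        if value:
            parts.append(value)
            emitted_value = True
        pos = match.end()
    parts.append(template[pos:])  # text after the last placeholder
    return "".join(parts)


template = "${name:hr} - ${name:sr-Latn}"
print(expand_skipping_missing(template, {"name:hr": "aaa"}))       # aaa
print(expand_skipping_missing(template, {"name:sr-Latn": "bbb"}))  # bbb
```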

  • OSM XML doesn’t support newlines in tag values, but what if a newline is the best delimiter?

The same as you would do in the name=* case: you’d have to represent it somehow (the common syntax is the ASCII escape sequence \n, but one could use UTF-8 shenanigans instead). But it would be much simpler for a few experienced users to add it once, in a more powerful/customizable editor, for the whole country/locality, than to depend on zillions of users on the ground, with their different apps, all supporting it correctly, and to educate all those zillions of users to use it correctly when mapping every single name.
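If the \n convention were adopted for default_language-style templates, decoding it would take only a few lines on the consumer side. A sketch, assuming (hypothetically) that `\n` means a newline and `\\` a literal backslash; OSM itself defines no such escape convention, and `decode_escapes` is an illustrative name:

```python
def decode_escapes(value: str) -> str:
    r"""Turn the sequence \n into a newline and \\ into a single backslash."""
    out: list[str] = []
    i = 0
    while i < len(value):
        pair = value[i : i + 2]
        if pair == "\\n":
            out.append("\n")
            i += 2
        elif pair == "\\\\":
            out.append("\\")
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


template = r"${name:hr}\n${name:sr-Latn}"  # as typed into the tag value
print(decode_escapes(template))  # ${name:hr} and ${name:sr-Latn} on separate lines
```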

  • What should default_language be when the delimiter isn’t inherent to all the features contained within the boundary but rather depends on the map designer’s stylistic preference? (Do we agree that this is a legitimate opinion for a designer to have?)

Absolutely; in fact, giving that freedom to map designers is one of my main goals behind that idea (and even more extended than that: I think every user should have the possibility to become a simple map designer by tweaking the rendering profile to their needs if they so choose. Sure, the majority won’t, but they should be able to). So, instead of having name=aaa / bbb that some random mapper on the ground has chosen as “best”, with the map designer at their mercy, the map designer would be the one with the power to choose what to render. They could take that default_language and use it verbatim (e.g. similar to the current osm.org map), or they may decide to replace the “-” inside default_language (or “/” or “;” …) with a newline or a picture of a red star or whatever. Or they might extend the provided default_language with a newline and int_name. Or they may decide to disregard default_language altogether and render names for the whole world in Croatian only (or whatever). Or they can have a simple (or complex) set of user preferences they follow depending on the user’s wishes. IOW, map designers should be able to decide whatever they feel is best for their specific use case!
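To illustrate that kind of freedom, here is a sketch of a few designer-selectable policies applied to an already expanded default_language label. The policy names, the `render_label` helper, and the assumption that the separator is “ - ” are all hypothetical; `croatian_only` falls back to the original label when a feature has no name:hr:

```python
from typing import Callable

Tags = dict[str, str]
Policy = Callable[[str, Tags], str]


def with_int_name(label: str, tags: Tags) -> str:
    """Append int_name on its own line, if the feature has one."""
    int_name = tags.get("int_name")
    return f"{label}\n{int_name}" if int_name else label


# Hypothetical designer policies: `label` is the expanded default_language
# string, `tags` are the feature's own tags.
POLICIES: dict[str, Policy] = {
    "verbatim": lambda label, tags: label,
    "newline": lambda label, tags: label.replace(" - ", "\n"),
    "with_int_name": with_int_name,
    "croatian_only": lambda label, tags: tags.get("name:hr", label),
}


def render_label(label: str, tags: Tags, policy: str) -> str:
    """Apply the chosen policy; unknown policy names raise KeyError."""
    return POLICIES[policy](label, tags)


tags = {"name:hr": "aaa", "name:sr-Latn": "bbb", "int_name": "ccc"}
print(render_label("aaa - bbb", tags, "newline"))  # aaa and bbb on two lines
```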

Why not acknowledge the reality that name has multiple values in it?

I do acknowledge it is the reality (what I do not particularly accept is the claim that we should encourage users to do more of it, instead of less). In fact, that reality is one of the main reasons I think it is a very uphill battle trying to convince every mapper in the world to map names in some specific way. It would IMHO be much simpler (and more realistically doable) to hand-craft and curate a few hundred (or thousand) default_language tags than to try to handle every single name=* tag in existence and their additions/changes (even with such an effort supplemented by very good AI bots).

If default_language is such a good idea, it can contain a semicolon too.

Sure, it can, if some locality really likes such separation of names with literal semicolon “;” characters (hey, I’m not judging!)

My experience is somewhat different about that “effectively”. Sure, it is possible to support them, and some have done it (to some extent at least), but the majority I’ve seen do not really seem to handle them (and, even more importantly, it is often impossible to even define them in an ideal or even useful way, even with all the goodwill of data consumers at their disposal, as mentioned before with the sidewalk example). (Additionally, parsing it is terribly inefficient: if I want to find every element that offers Croatian cuisine, I have to do an extremely inefficient full-text search on cuisine=* tags to find results containing the croatian substring. If it were instead tagged as cuisine:croatian=yes, it would be a very fast indexed lookup.)
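A small sketch of the efficiency point, using made-up elements. With a naive index keyed on exact (key, value) pairs, cuisine:croatian=yes is a direct lookup, while cuisine=italian;croatian is stored as one opaque value and can only be found by scanning and splitting every cuisine=* value. (Some databases can index individual list tokens, but that takes extra, consumer-specific preprocessing.) The helper names are illustrative:

```python
from collections import defaultdict

# Hypothetical sample elements; tag values are illustrative only.
ELEMENTS = [
    {"id": 1, "tags": {"cuisine": "italian;croatian"}},
    {"id": 2, "tags": {"cuisine": "pizza"}},
    {"id": 3, "tags": {"cuisine:croatian": "yes"}},
]


def scan_list_values(elements, key, wanted):
    """Linear scan: fetch and split every element's value for `key`."""
    return [
        e["id"]
        for e in elements
        if wanted in (v.strip() for v in e["tags"].get(key, "").split(";"))
    ]


def build_tag_index(elements):
    """Naive index keyed on exact (key, value) pairs."""
    index = defaultdict(list)
    for e in elements:
        for k, v in e["tags"].items():
            index[(k, v)].append(e["id"])
    return dict(index)


index = build_tag_index(ELEMENTS)
print(index.get(("cuisine:croatian", "yes")))            # [3], direct lookup
print(index.get(("cuisine", "croatian"), []))            # [], value is opaque
print(scan_list_values(ELEMENTS, "cuisine", "croatian"))  # [1], needs a scan
```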

It would be nice if this topic could stay focused on multiple delimited names in the name tag and not get sidetracked into a discussion of language defaults for areas. Although the ideas are related, this topic is already quite long. Perhaps one of the @mods-general could split off the messages about language defaults into a separate linked thread?

I’m not strongly against it, but do note that many existing posts intertwine parts about the disadvantages of “multiple delimited names in the name tag” and suggestions to modify/improve them (which should IMHO definitely remain in this thread) with suggestions for alternative ways to accomplish a similar result (which might indeed benefit from being in a new thread).

Perhaps new replies, at least, should each be split into two different messages? (One in the new thread commenting on the parts related to default_language-like methods, and one in this thread commenting only on the name-like method. Although I do envision it would be hard to keep such messages usefully cross-referenced if one tries to compare their pros and cons. :slightly_frowning_face: )

default_language’s blast radius is too large for any data consumer to use for any use case that involves reuse or caching. Think of all the commotion whenever the coastline breaks and floods the world, or when the Great Lakes dry up, and how long it takes for the Standard layer to recover. Now imagine that multiplied by literally everything in a country at every zoom level. No changeset can cause anywhere near that scale of disruption by modifying individual name tags. Meanwhile, any legitimate change to a default_language tag would require modifying every name tag in the country. This is one of those ideas that sounds great on paper until considering how OSM is produced.


I will consider this, but I won’t be able to give this potential creation of a new language default thread any attention for 6 hours or so.

The good news: OSM Americana now supports the semicolon delimiter in name, name:*, and also ref (for things like terminal gate numbers and highway exit numbers).

The bad news: OSM Americana can’t support slashes, dashes, and spaces as delimiters. But just imagine if the places that use these delimiters were to migrate to semicolons.


Pompously announcing a really bad idea doesn’t make it less bad. The “other prominent OSM data consumers” are only doing a quick fixup to avoid ugly breakage in the name of being lenient in what they accept, that is not the same as “support”.

As has been pointed out multiple times in this thread, things are not so simple. Often in (proper) multi-lingual regions the -actual- name of the place is composite and is customarily written with a separator.

Please stop trying to rearrange the world according to a naive, CS-driven, concept of normalization.

The idea with semicolons may look neat and clean to a computer programmer, but it solves precious little for the name tag. name tags with multiple languages in them are just the very tip of the iceberg when it comes to problematic content. We also have descriptive names, names with extra info, categories in names, names with full route descriptions (any PT route). Each of these has its own particular problems for data users. When you add semicolons to the mix, you just pile yet another format on top of all those that already exist.

If we are looking pragmatically at the situation, then the de facto use of the name tag has long been as the label or display name of the place. That is nowhere written down, because we always strive for a name tag that adheres to the definition in the wiki. However, in reality it is what mappers tend to do (because of the feedback they get from the map) and because it helps to avoid conflicts. Maybe it’s time to just accept that the name tag is one of the ‘human’ tags in OSM, only to be displayed but not interpreted by a computer. As long as we make sure that the necessary data is also available in tags that are machine-readable, that’s a workable compromise.

My personal suggestion here would be to introduce a new tag display_name and get carto to render it preferentially where it now renders name. Then advertise the tag among data users and start slowly moving non-names into the new tag. No mass edits necessary. Just rename tags when you come upon a problematic use. If it takes 10 years to get to a clean name state, that’s fine. No rush.

I certainly don’t think it is a particularly good idea when a single data user imposes a format for a tag that breaks pretty much everybody else’s map.


That’s fine. If the simulated screenshot were wrong in any of these places, then by all means the name should stay as is. For the features that are using a semicolon, however, it’s clear that the mapper’s intention was not for the user to see a semicolon. If Americana is avoiding ugly breakage, then you’ve written a better headline than I was able to come up with.

Can you elaborate on what’s broken as a result of Americana or any of these other data consumers (plural) interpreting a semicolon as a value separator? Americana still renders slashes, dashes, and spaces as slashes, dashes, and spaces. If you’re concerned about semicolons getting misinterpreted, so far, I’ve come across only one name in the whole world that properly contains a semicolon in the real world – and it’s escaped as ;;. So if anything, that feature is broken in any data consumer that does not interpret the semicolon as Americana does.
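For reference, a sketch of splitting a value on semicolons while treating the doubled `;;` as a literal semicolon, as in the escaped name mentioned above. `split_semicolons` is an illustrative helper, not any particular consumer’s implementation:

```python
def split_semicolons(value: str) -> list[str]:
    """Split on ';' separators; a doubled ';;' stands for a literal ';'."""
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(value):
        if value[i] == ";":
            if value[i + 1 : i + 2] == ";":
                current.append(";")  # escaped semicolon: keep a single one
                i += 2
                continue
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(value[i])
        i += 1
    parts.append("".join(current).strip())
    return parts


print(split_semicolons("Biel;Bienne"))  # ['Biel', 'Bienne']
print(split_semicolons("x;;y;z"))       # ['x;y', 'z']
```

Note that values without any semicolon, such as Biel/Bienne, come back unchanged as a single element.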

This would be grist for a separate topic, about which I’m pretty sure you and I would see eye to eye.

If separating multiple equally primary names with the standard semicolon separator is such a bad idea then it would be helpful to explain why you think that. I don’t particularly want to see further proliferation of multiple names stuffed in one tag in cases where name + alt_name + *_name + name:* would be a better representation. However, with multiple names in the name tag being a common political compromise, I’d much prefer to see the standard semicolon delimiter used in those cases.


I’m glad to hear that you recognize the challenge that we face, and I appreciate that you have some concrete ideas to solve the problem. If this thread has shown anything, it’s that a lack of data standardization can cause real problems for real data consumers when alternate tagging schemes are in competition. If the community comes up with a better data modeling solution, I’m confident that the Americana project and the broader US mapping community would adopt it.

In the meantime, with my “maintainer of a community renderer project” hat on, I support the views of my fellow maintainers that supporting semi-colon delimiters is the least bad option available in the face of multiple conflicting methods to solve the same problem. Sitting around and waiting for the community to invent a better solution is inconsistent with the zeitgeist of the community around our renderer. Supporting innovation is an explicit goal of the project, and I expect we will continue to innovate in the future on long-standing challenges in OSM-based cartography. If that philosophy exposes areas where the OSM data model can do better, I consider that a positive outcome.

I recognize the unfortunate situation that rendered names with a semi-colon will look poor when rendered on maps that have not chosen to interpret a semi-colon as a delimiter. Rather than complain about the situation, I hope those on this thread with strong feelings will consider this a call to action to work with the community to solve it properly. I appreciated your response to a question on tagging standards during the recent OSMF election:

The evolution of tagging is a question I consider a core responsibility of the community that should not be decided top-down by the OSMF board. However, it is a topic where the board could give the necessary support to bring the topic forward by organizing a working group. As with the data model, such a working group would need to start with a study that researches the different options of standardization or consolidation of our tagging system, so that the community can have an informed discussion. Only then can we talk about how the OSMF can support a concrete evolution step.

If you were serious about this, and it wasn’t just an offhand statement to mollify the portion of the electorate that feels strongly about tagging standardization, consider this an opportunity to put your suggestion into action.

  • As has been pointed out, places do have composite names (@lonvia touched on other complexities that in the end cause similar issues): while they might be built by concatenating semi-independent strings, the result is still a name in its own right.

  • Turning the previously unstructured name tag into a structured tag is just a tremendously bad idea: it changes the semantics of one of the most used attributes (and @Minh_Nguyen was asking for all punctuation to be converted to semicolons, not just handling the odd misused tag) and will lose information on a big scale.

PS: poster child example: Biel/Bienne (Wikipedia)


The problem right now with non-standardized delimiters is that there is no way to distinguish the case of a “name in its own right” from there being two equally valid but different names used by different local linguistic groups. This is specifically the situation that normalizing on the semicolon separator for equally valid but different names can help distinguish. If the name really is hyphenated, don’t change the hyphen to a semicolon. If the name really isn’t hyphenated in practice but there are two different versions that are equally prominent and valid, then don’t use a hyphen, use a semicolon.

I believe that you are misunderstanding Minh’s comments. If the single name is understood to include hyphens and other punctuation, then they should be kept. The cases where a semicolon should be used are those where local speakers of one language use one name and local speakers of another use a different name, and those linguistic groups don’t have a unified understanding that the name should be compounded.


Seems a bit alarmist to say that semicolon-delimited multiple names is breaking anything or that it imposes a tagging scheme on others. Nobody is making anyone change existing ad-hoc delimiters that communities already use.

For those used to locally standardized delimiters in names, consider places where there may not be a standard delimiter. This road in Indiana, USA has two equally important names, both in English. These names are posted on separate signs, so there’s nowhere for a delimiter to go. The US doesn’t really have a precedent for this. Are we supposed to invent a delimiter? If so, why not a semicolon?


Possible alternative if one has reason to preserve the existing non-semicolon separator in the main name tag:

name=Biel/Bienne
name:separated=Biel;Bienne
name:de=Biel
name:fr=Bienne

(name:separated would override name for renderers like OSM Americana.)
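A sketch of how a renderer might honor the proposed override. `name:separated` is only a suggestion in this thread, not an established key, and `label_parts` is a hypothetical helper; the split is naive and ignores `;;` escapes:

```python
def label_parts(tags: dict[str, str]) -> list[str]:
    """Use name:separated when present, otherwise fall back to name."""
    raw = tags.get("name:separated") or tags.get("name", "")
    # Naive split on ';' (no handling of ';;' escapes).
    return [part.strip() for part in raw.split(";") if part.strip()]


biel = {
    "name": "Biel/Bienne",
    "name:separated": "Biel;Bienne",
    "name:de": "Biel",
    "name:fr": "Bienne",
}
print(label_parts(biel))  # ['Biel', 'Bienne']
```

Renderers that know nothing about `name:separated` would keep showing `name` unchanged.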

So it would do the wrong thing?


I’d love it if the Americana team could tone down the rhetoric two notches, from “hey, we’ve solved this problem for everybody, now please go ahead and change all the name tags” to “hey, we’ve solved a local issue we had; we acknowledge our renderer won’t work for everybody because of that, but that’s fine, it doesn’t have to, we’re Americana after all” :wink:

I would also appreciate it if you could recognize that while there might be renderers who have “not chosen to interpret a semicolon as a delimiter”, most renderers will have “chosen to not interpret a semicolon as a delimiter”.


Rendering Biel and Bienne on separate lines, with the user-preferred language on top, is not wrong. (Rendering the official name Biel/Bienne on one line is of course also fine, but the point is to give the renderer more options.)
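A sketch of the “preferred language on top” rendering, given the already split parts and the feature’s name:* tags. `stack_label` and the language-preference list are hypothetical inputs; parts that match none of the user’s languages keep their original order:

```python
def stack_label(parts: list[str], tags: dict[str, str], preferred_langs: list[str]) -> str:
    """Put parts matching the user's languages first, one part per line."""

    def rank(part: str) -> int:
        for i, lang in enumerate(preferred_langs):
            if tags.get(f"name:{lang}") == part:
                return i
        return len(preferred_langs)  # no match: after all matched parts

    # sorted() is stable, so equally ranked parts keep their original order.
    return "\n".join(sorted(parts, key=rank))


biel = {"name": "Biel/Bienne", "name:de": "Biel", "name:fr": "Bienne"}
print(stack_label(["Biel", "Bienne"], biel, ["fr"]))  # Bienne, then Biel
```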


I asked for no such thing. In this thread, I asked for the community to be aware of the longstanding usage of semicolons as an option, so that data consumers could support it without fear of a backlash. In a way, my request has been denied. :wink: (To clear up any confusion, the word “support” can mean “to know what to do with”, not necessarily “to campaign for”.) Unfortunately, this thread has become so long that folks just now coming into the discussion have probably gotten an overly simplistic view of the situation.

In the other thread I created, I thanked those who have been using a semicolon when appropriate. If expressions of gratitude to mappers are problematic, then let me replace it with a profound apology for being grateful.

You’re right that some may have made this choice. You must know much more about renderer developers’ intentions than I do; I had no idea most have considered and rejected the idea of pretty-printing semicolons.

Incidentally, there’s a new localized renderer on the scene, Tracestrack, which I just found out about from weeklyOSM. Their preferred delimiter? The empty string. It works well enough for Hong Kong, which separates Chinese and English names with a space:

https://twitter.com/tracestrack/status/1592246528152076289

Like Americana, they render the whole world. I can’t get it to show the English name of Milan, but on the bright side, Milano is an English name that happens to be my favorite snack.

The thought process that inevitably leads to this point is:

  1. Users want to see a map in a language they know. OSM has lots of name:* tags for this purpose, so we’ll show those instead of name.
  2. For things like cities, users also want to know the name in the local language, so we’ll get it from name. (And default_language would be a nonstarter, if for no other reason than the potential for massive vandalism.)
  3. Yuck, repeated names all over the place. Deduplicate the names by searching for the preferred-language name in name.

Americana took an additional step in avoiding false positives by requiring the matching duplicate name to be surrounded by semicolons (but not ;;) before removing it from the label. Unfortunately, “Americana avoids false positives” didn’t occur to me as a subject line last night.
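The deduplication step above could look roughly like this. It is an illustrative Python sketch, not Americana’s actual code: it splits name on single semicolons (leaving `;;` as an escaped literal) and drops only whole components equal to the preferred-language name, so “Bienne” is not stripped out of “Biel/Bienne”. The helper names, and the choice to put the preferred name on the first line, are assumptions:

```python
import re

# A ';' counts as a separator only when it is not part of an escaped ';;'.
SEPARATOR = re.compile(r"(?<!;);(?!;)")


def split_local_names(name: str) -> list[str]:
    """Split name into components, unescaping ';;' to ';'."""
    return [part.strip().replace(";;", ";") for part in SEPARATOR.split(name)]


def build_label(tags: dict[str, str], lang: str) -> str:
    """Preferred-language name first, then the remaining local names."""
    local = split_local_names(tags.get("name", ""))
    preferred = tags.get(f"name:{lang}")
    if preferred:
        # Only whole components equal to the preferred name are removed.
        local = [preferred] + [part for part in local if part != preferred]
    return "\n".join(part for part in local if part)


print(build_label({"name": "Biel;Bienne", "name:fr": "Bienne"}, "fr"))
print(build_label({"name": "Biel/Bienne", "name:fr": "Bienne"}, "fr"))
```

The first call yields “Bienne” over “Biel”; the second keeps the composite “Biel/Bienne” as the second line, because it is not an exact component match.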

That’s all well and good with the semicolons as separators and it may make sense in some cases.

Nevertheless, I find it somewhat selfish from the renderer’s point of view to want more options at all costs and to offer the user a personalized map with names in the user’s supposed preferred language. This ignores the efforts of genuinely bilingual or multilingual regions and the local community, which consciously use multilingual names in the name tag for very specific reasons; cities and municipalities in particular often use these multilingual names in a highly official capacity.

If it’s just a matter of making the display of multilingual names on the map “nicer”: well, my computer science studies were 30 years ago now; back then I was still programming in Turbo Pascal and C++, and today I don’t know anything about it anymore. But I assume that almost all features with a multilingual name tag also have the specific name:* tag for each language. This allows the name tag to be broken down into its language-specific components and then reassembled individually, in the desired order and with any desired separator.
I even managed to do this for the example of “Cottbus - Chóśebuz” with a small Excel table (data basis: a copy of the place=city node from the editor):
The whole thing works without a single semicolon in the name! (For the formulas in the Excel sheet you do need the odd semicolon, of course :wink:)

I have not yet taken into account the French name just added by a colleague. :slight_smile:
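The same decomposition as the Excel table, sketched in Python. It assumes the node carries name:de=Cottbus and name:dsb=Chóśebuz (Lower Sorbian) alongside name, finds which name:* values occur in name, and reassembles them in any order with any separator. The helper names are hypothetical, and plain substring matching could misfire when one language’s name is contained in another’s:

```python
def components_in_name(tags: dict[str, str]) -> list[tuple[str, str]]:
    """Return (language, value) pairs whose value occurs in name, in order."""
    name = tags.get("name", "")
    found = []
    for key, value in tags.items():
        # Only plain language subkeys such as name:de or name:sr-Latn;
        # this skips keys like name:etymology:wikidata.
        if key.startswith("name:") and key.count(":") == 1 and value:
            position = name.find(value)
            if position >= 0:
                found.append((position, key[len("name:"):], value))
    found.sort()  # order of appearance within the name tag
    return [(lang, value) for _, lang, value in found]


def reassemble(tags: dict[str, str], order: list[str], separator: str) -> str:
    """Rebuild the label from its components in the requested language order."""
    parts = dict(components_in_name(tags))
    return separator.join(parts[lang] for lang in order if lang in parts)


cottbus = {
    "place": "city",
    "name": "Cottbus - Chóśebuz",
    "name:de": "Cottbus",
    "name:dsb": "Chóśebuz",
}
print(reassemble(cottbus, ["dsb", "de"], " / "))  # Chóśebuz / Cottbus
```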
