[RFC] Feature Proposal - Add languages: tags for name rendering (2nd revision)

SomeoneElse · December 16, 2024, 11:39pm

I think that they’ve already provided the tools (calls to lua functions at data load and at update) that allow whoever’s loading the database to do whatever they want with whatever keys they want. If they need to store a name in just one language, or several, they can; if they also want to store a bunch of other flags that say things about names, they can do that too.

FWIW Tilemaker (another route to creating vector tiles, this time without a database) is similar, and you can even share quite a lot of the code between them.

The challenge here is that literally every data consumer will have different requirements, as there is no “correct” answer to “what language should this be shown in”. There’s actual lua code all over Github to do basic stuff like “use name A or name B” or “combine name A and name B”; I’ve certainly got plenty of examples - one literally on my screen at the moment is a name that’s composed of the “name” and “operator” tags for a feature.
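To make that concrete, here is a minimal sketch of the two kinds of helper I mean, written in Python rather than Lua purely for illustration. The function names and the default key order are hypothetical and not taken from any particular style.

```python
def pick_name(tags, preferred=("name:en", "name")):
    """Return the first non-empty value among the preferred keys, else None."""
    for key in preferred:
        value = tags.get(key)
        if value:
            return value
    return None


def name_with_operator(tags):
    """Combine "name" and "operator" into one label, skipping missing parts."""
    name = tags.get("name")
    operator = tags.get("operator")
    if name and operator and operator != name:
        return f"{name} ({operator})"
    # Fall back to whichever of the two exists (None if neither does).
    return name or operator
```

Every data consumer ends up with its own variant of these, which is rather the point: the key order and the way the parts are joined are decisions for the consumer.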

Speaking generally about the pgsql lua processing in osm2pgsql, that hasn’t been my experience. Database loads of my map style (with lots of lua) don’t take orders of magnitude longer than OSM Carto (with much less).

Minh_Nguyen · December 16, 2024, 11:43pm

The previous discussion cited turn-by-turn navigation instructions as a motivation for this proposal, so feedback from developers of existing OSM-based navigation software may be valuable too.

phodgkin · December 17, 2024, 9:24am

It may be that I’ve misunderstood the proposal. I understood it involved placing the preferred language order tagging in admin boundary relations. Hence every object with a name (or at least >1 name tag) would need to “know” which was the relevant admin boundary in a cascading CSS-type fashion.

This would go way beyond conventional tag transformation on import. I think the osm2pgsql flex output may be able to cope with this, but it is not obvious, at least to me.

I agree that the actual handling of the name tags would be relatively straightforward and usage dependent.

dieterdreist · December 17, 2024, 10:46am

I agree with those who wrote that “presentation_order” is not geodata, and is referring to rendering or similar data usage. At the very least, the tag should be renamed to something indicating why this order is suggested. The result could be similar, but the concept should refer to facts and not to “presentation”.

Maybe this is to store which languages are locally relevant, e.g. ordered by the number of speakers in the place? Should it cover the situation where some languages are legally mandatory and should these be distinguished in tagging from places where it is “voluntary”?

Jarek · December 17, 2024, 2:57pm

In the first thread the initial suggestion was to create languages:official and languages:preferred, but this was changed on the basis of difficulty ascertaining this data (e.g. lots of places don’t have “official” languages) and problematic social and political implications (e.g. in some places languages which are not official or which are in deep minority of speakers are actually widely used on signs).

What would the real-world significance be of whether the languages are legally mandatory or “only” voluntarily used on all signs?

We discussed something like languages:used_on_signs, do you think that would be better? As I understand it, presentation_order refers to “the order in which languages are presented on signs”. Naming is difficult, so could you add your suggestion?

The goal of the proposal here, as I understand it, is to encode in a machine-readable format the real-world practice of which languages are used for names and in which order. It’s meta-information about names. It’s about as geodata as tagging wikidata=* on an object.

I do think that the proposal could benefit from some example pictures of multilingual name signs seen in the real world, to make it more obvious what it proposes to solve.

Jofban · December 17, 2024, 3:45pm

And the city of Biel/Bienne may swap that ordering to match current OSM rendering:
languages:presentation_order=fr;de

Isn’t this the wrong order though? Street names are german first, then french, so languages:presentation_order=de;fr? I think it just hasn’t been corrected from the first proposal.

Minor point, but I see ; as an unordered list (or a set), while | seems more common for ordered lists (see e.g. Lanes - OpenStreetMap Wiki)

One thing I might add, is that currently, the OSM data for Biel/Bienne is not enough to actually determine the german or french name of the city, as both official_name:de and official_name:fr contain Biel/Bienne! How does this proposal deal with that? Is that out of scope?

I think this proposal is particularly useful for the case of signage. There, it can also be easily verified. If a common standard is put into law, I can see how that might be tagged at administrative boundaries, otherwise it can and should be tagged at the object itself (including e.g. a multilingual amenity=restaurant as rare as it may be).

Last point, there are several ways to combine the different names into one. The usual one is below each other, another one combines the shared part, and I’m sure there are situations where the signage itself uses - and /. How could that be encoded? We don’t necessarily have to encode that, though it might be useful as a “Look out for this sign” kind of thing. And of course 3D-micromapping.

aseigo · December 17, 2024, 6:58pm

I would not expect it to be done on ingestion, as that would also mean re-calculating all names when the relevant tags change.

My expectation is that it would, indeed, be done on actual render using a spatially-indexed set of admin boundaries that have these tags set on them. That set should be small for any given point, and would be the same for regions of areas.

When pulling straight from a GIS database such as PostGIS or Spatialite, it should be a single query for any given area. For mobile apps, this could be included as a pre-calculated set, again spatially indexed for fast lookup.
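As a rough sketch of the lookup itself, assuming each boundary has been reduced to a single outer ring plus its tags and admin_level: the most specific boundary containing the point wins. The data layout and function names here are hypothetical, and the brute-force scan stands in for what a real spatial index or a PostGIS query would do.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def presentation_order_at(x, y, boundaries):
    """Return the language order from the most specific tagged boundary, or None."""
    best = None
    for boundary in boundaries:
        order = boundary["tags"].get("languages:presentation_order")
        if not order or not point_in_ring(x, y, boundary["ring"]):
            continue
        # A higher admin_level means a smaller, more local boundary.
        if best is None or boundary["admin_level"] > best["admin_level"]:
            best = boundary
    if best is None:
        return None
    return best["tags"]["languages:presentation_order"].split(";")
```

Real boundaries are multipolygons with holes and exclaves, so this only illustrates the resolution rule, not the geometry handling.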

Minh_Nguyen · December 17, 2024, 7:15pm

This sounds plausible when you put it that way, but it would require brand-new technology in multiple parts of the software ecosystem that weren’t architected with this sort of functionality in mind.

In popular vector tile formats like MVT, each tile needs to have the full context for the renderer to evaluate each feature. There’s pretty much no such thing as global context, let alone an index like what you’re describing. The closest thing would be TileJSON metadata, which is optional and probably doesn’t support geometric data of any kind.

Furthermore, in the popular stylesheet languages like the Mapbox or MapLibre Style Specification, each individual feature is evaluated based on its properties without any global context either. The closest thing you can do is evaluate a style property based on a hard-coded GeoJSON feature, but not one from the tiles.

Raster maps are quite different in terms of their constraints. The OSM Carto maintainer has written about an experiment that attempts to tackle the problem in a manner similar to what you’re describing:

aseigo · December 17, 2024, 7:43pm

Indeed, it’s not a trivial topic. But it’s one that maps deal with, and it isn’t optional. Which is why tools to help should be provided.

If boundary=aboriginal_lands is meant to imply an administrative area then it should be included in the set of boundaries used to resolve names.

FWIW, this case is very similar (if not identical) to the case of Haida Gwaii.

Can you describe in more detail than “a list” what sort of “more” and “different” cases you would find useful?

Are you envisioning a list of every country and their administrative regions and a green checkmark next to each with an explanation of the tag values to be used?

A list of intriguing and unusual situations like Thai language street signs in American cities or Gaelic language districts in Northern Ireland?

If you can offer a bit of insight into what would be helpful for you to make a decision, I can see what is possible.

At the same time, I don’t want to spend time for no benefit, nor do I want to make the proposal’s text unapproachable by adding dozens of identical situations.

That varies so dramatically from location to location, I’m not sure what could be said other than “adhere to local customs and policies”.

That’s actually a strength of this proposal: if changes needed to be made, they can be made in very, very few places (as few as one!) to propagate a change.

It makes “getting it right the first time” less important, and “getting it right every time we tag something” no longer a concern.

This is already covered by the OSM wiki page on multi-lingual place names, which is linked in the proposal. I’m not sure it is worth duplicating those recommendations and guidelines, especially as they also vary by locale …

There is currently no global consensus about how name vs name:<lang code> works, beyond the recommendations on the multi-lingual names wiki page.

This is not because it has not been discussed to death, but because it has been hard to find global consensus (as noted in the proposal) and the amount of work required to make accurate changes to name tags in an area can be quite large.

I consider the issue of what exactly should be in the name tag to be unresolvable so long as we leave the decisions up to individual editors. This is a problem that exists currently, and not one I expect there to ever be a satisfactory conclusion to.

It’s why I started looking into other solutions, leading to this proposal, and why this proposal specifically does not provide guidance on the future use of the name tag.

Minh_Nguyen · December 17, 2024, 7:53pm

That’s an interesting approach. I wonder if this will require a high degree of user education, given the syntax. Whatever you call the subkey, it’s still a subkey of name:*, which looks like a property of name=*.

aseigo · December 17, 2024, 7:56pm

Note that the text says “to match the current OSM rendering”. Currently, the rendering is French / German.

For street lanes I can definitely see the visual metaphor being employed with |, but as counter-points:

; is a common separator in tags people are familiar with
a;b is no less or more ordered than a|b, both are one dimensional lists of graphemes with a separator character
| is quite similar to various graphemes such as l and as such can be a readability problem

I’m happy to adopt whatever is the OSM convention for ordered lists, however. I haven’t been able to find any documentation on this on the wiki, but I’m probably just looking in the wrong places. Do you have any pointers to conventions for ordered vs unordered lists of values in tags?

For exceptions (and they do exist, one is noted in the proposal even), the languages:presentation_order:self tag is proposed. This catches the cases you note here, as well as more difficult ones where the tag should not propagate further.

It shouldn’t be encoded at all, but left up to the renderer.

It is impossible to know what sort of separator is appropriate in all cases for all renderers, and encoding them differently for different regions would introduce inconsistencies between regions (including nearby ones).

Minh_Nguyen · December 17, 2024, 8:08pm

“Multiple values” discusses the various approaches to storing multiple values for a given attribute, with semicolon-delimited value lists as one option. Of the keys that require the use of semicolons, some define a meaning to the order, and there are some keys that might have a subtle implied order depending on the mapper, but in general the values of a semicolon-delimited list are unordered.
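For a data consumer, the practical difference is small: an ordered reading just means keeping the split result as a list instead of a set. A minimal sketch in Python, where the helper name is hypothetical and the trimming and de-duplication are my own assumptions rather than documented behavior:

```python
def split_ordered(value):
    """Split a semicolon-delimited tag value, keeping first-seen order."""
    seen = set()
    result = []
    for part in value.split(";"):
        item = part.strip()
        # Skip empty segments (e.g. a trailing ";") and repeated values.
        if item and item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Same members either way, but only the list form distinguishes the two values.
same_set = set(split_ordered("de;fr")) == set(split_ordered("fr;de"))
same_list = split_ordered("de;fr") == split_ordered("fr;de")
```

Here `same_set` is true and `same_list` is false, which is exactly the distinction a key would need to document if the order is meant to matter.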

aseigo · December 17, 2024, 8:13pm

Indeed! And this is, as you note, not a trivial problem.

However, I am not sure how to even begin tackling this aspect of things without an agreement on how we will record language metadata. If I show up with a proposal or, better yet, a patch to a given renderer for tags that are not at all in use, I would expect the odds of the change being accepted in the renderer to be ~0%.

However, if there is language metadata available in the dataset, then we can start working on the renderer issues. (By ‘we’ I mean those of us who are software developers and those familiar with GIS, not necessarily OSM editors, of course!)

So I see it as a chicken-and-egg problem:

Renderers won’t take on added complexity for something that does not exist
OSM editors are probably hesitant to add data that won’t be used

Oof.

One reason to focus on admin boundaries is that this is a (relatively) small number of features on the global map, and ones which are fairly well understood. This means that the effort required to add the metadata is kept low, mitigating the effort-spent risk by OSM editors … and raising the odds that it can be done fairly quickly and across enough of the world to be useful.

Only at that point would I expect we can start to get work done on the renderers.

This is also true for editing software, though with less tension between effort-and-effect.

As for rendering techniques that cannot dynamically access a spatially indexed dataset to inform their rendering, I can offer two ideas, though there are likely better ones out there:

The name tags could be kept up to date programmatically using the languages:preferred_order, so renderers incapable of anything more could continue to use the name tags without fear of them falling out of date.
The names could be made available via a separate service offered specifically for such renderers. This is similar to the first case, but dynamic and would not need to modify the OSM dataset itself.

And of course there is the simple answer of “just keep using name: as it currently is”, though I’m not a huge fan of that as a long-term solution.

How to iterate the software rendering is a discussion we can have once we have a way to record the metadata, and at that point I’m sure that we’ll find good, workable solutions as software devs put our heads together.

Indeed, I’m certainly not the first one to note this approach!

From a GIS software perspective, it’s kind of the “obvious” path forward, and so it is both unsurprising and comforting to see others have suggested it.

The meaningful step that is so far missing is for the OSM editing community to buy into the idea and agree on a way to record the necessary language metadata.

Minh_Nguyen · December 17, 2024, 8:18pm

Oh, fortunately, I don’t think this level of complexity would be necessary. As a general rule, a renderer, geocoder, or router doesn’t call out to a service on demand any time it needs to evaluate some feature. That would be prohibitively expensive and impractical for most use cases. Instead, data consumers would need to postprocess OSM data to copy the relevant details onto each feature, reminiscent of how renderers already deal with route relations or join OSM to Wikidata for translations. I suppose that’s why osm2pgsql came up earlier.
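A minimal sketch of that kind of postprocessing step, in Python for illustration: given a language order already resolved for each feature, write a ready-to-render label onto the feature so the renderer needs no global context. The `label` property, the `" / "` separator and the `order_for` callback are all hypothetical choices, not part of the proposal.

```python
def build_label(tags, order, separator=" / "):
    """Join the name:<lang> values in the given order, skipping repeats."""
    names = []
    for lang in order:
        value = tags.get(f"name:{lang}")
        if value and value not in names:
            names.append(value)
    if not names:
        # No localized names available: fall back to the plain name tag.
        return tags.get("name")
    return separator.join(names)


def denormalize(features, order_for):
    """Copy a precomputed label onto each feature's properties."""
    out = []
    for feature in features:
        props = dict(feature["tags"])
        order = order_for(feature) or []
        props["label"] = build_label(feature["tags"], order)
        out.append({"id": feature["id"], "properties": props})
    return out
```

The separator is deliberately a parameter, since how the names are joined is a rendering decision rather than something stored in the data.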

aseigo · December 17, 2024, 8:21pm

I have added this to the proposal, along with how we could deal with future tag additions if required.

Thanks for providing a concrete example in the current OSM dataset, as that was very helpful.

SomeoneElse · December 17, 2024, 8:41pm

What matters is not what the proposal says, but what mappers actually do. I genuinely don’t think that people are going to be able to create a cohesive hierarchy in the way that you describe**, for many of the reasons already mentioned, so no-one’s going to have to worry about doing that. What we might get is country defaults, with a layer below that (which may not be strictly administrative) with language tags on that too. If we get a set of gaeltacht and non-gaeltacht areas in Ireland, I’d be a happy bunny.

To give an example of how people do this stuff now, this is the code (before osm2pgsql) that I use to say “this bit is Welsh-language first”. In the Irish case I’d expect that that could be changed to use “something” that defined preferred languages based on data in OSM. Minutely updates are a challenge, because although the same lua code is called as at import, you need to figure out where you are, and if all you’ve got is a bunch of tags on a way, in lua you don’t. It’s not an insoluble problem, but it’s not straightforward.

** not strictly a “language” issue, but one classic example of “what people call things” is this.

SomeoneElse · December 17, 2024, 8:51pm

To be honest, I wouldn’t get into speculating about how data consumers might work now. I’m familiar with a number of display examples (CartoCSS, MVT and mkgmap / Garmin) and they’re all very different. The common factor in each case is the need to output data not directly to be displayed, but to help the thing doing the displaying do it better. For example, in the Shortbread schema you’ll find explicit “rows” and “cols” output for the ref value so that a nice highway shield can be drawn.

If you can figure out a way of storing data that will cope with some of the edge cases and (most importantly) actually get used, people will figure out how to consume the data.

aseigo · December 17, 2024, 9:13pm

I, too, expect the overwhelming number of areas to have either one, two, or three levels of language preference. Three levels will not be that unusual, and are actually quite easy to identify on the map currently. But your point stands…

Same!

That’s a great example; thanks for sharing. It does show the possibilities, as well as the details that will need work on the software side.

Jofban · December 17, 2024, 10:21pm

Yeah, I confused myself. Point withdrawn.

@Minh_Nguyen pointed to some examples, though honestly, the need for ordered lists is probably so low that no standard has been established. The only two use cases where order matters (and that I know of) are lane tagging and opening_hours syntax; the latter being so complex and custom that I wouldn’t use it as an OSM convention.
I probably overstated the usage of | since to my recollection those uses are all in some way related to lanes. Though I’ve tried to get it used in the context of phone numbers.
Whichever gets chosen, it’s probably safe to document that the order matters, even if it might be obvious.

Most map renderers should just ignore it, I agree. It’s just something of interest when actually wanting to map what exists on the ground.

Not quite the answer I want, so I ask more directly: Is there a law or statute that prescribes the ordering of names on street signage? One example suffices, ideally in english, otherwise please translate the relevant excerpt. I certainly wasn’t able to find one for the canton of Bern, but I’ve been wrong before.

Originally, I was going for a sort of “If there is no law, then it doesn’t belong on an administrative boundary”, but then I wondered about the national speed limits in Germany, since those aren’t mapped there either despite being law. And I found the StVO relation. It’s of type defaults, which then applies to the area of Germany. Maybe that would be a better approach. Also allows for reuse in case the same law applies to different regions (I’m thinking of the gaeltacht areas discussed here).
What to do if there is no law? I’d say only tag the object it applies to, though it can be difficult to discern if it’s a convention among all street signage.

Minh_Nguyen · December 18, 2024, 12:44am

If the proposal ends up looking anything like that StVO relation, I would have a hard time imagining nontrivial uptake by data consumers. However, I applaud the creativity in translating an entire legal code to a Turing-complete programming language.