Atlantic Ocean: repeated name removal

ezekielf · September 1, 2023, 6:07pm

OpenMapTiles only includes named features in its label layers since including all features would be a waste of resources. Clearly the code authors assumed that any feature with one or more *_name or name:* tags would also have a name tag and therefore testing for the presence of name would be sufficient to identify if a feature is named or not. This seems entirely reasonable to me, but the logic clearly breaks if we are going to accept the lack of name on a named feature as valid tagging.

More complex logic for identifying named features could surely be written. The condition would need to check for all possible name key variants, or all matching a certain pattern. Or we could just keep things simple and say that a feature with one or more *_name or name:* tags but lacking name is a tagging error.
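As a rough illustration of the "more complex logic" option, here is a minimal Python sketch that treats a feature as named if any tag key looks like a name key. The pattern and the `is_named` helper are assumptions made for illustration, not OpenMapTiles code; in particular, whether keys such as `old_name` should count is a judgment call.

```python
import re

# Hypothetical pattern: "name", "<prefix>_name", each optionally followed by ":<suffix>".
# Covers e.g. name, name:en, alt_name, loc_name, official_name:fr.
NAME_KEY = re.compile(r"^(?:[a-z]+_)?name(?::.+)?$")


def is_named(tags: dict) -> bool:
    """Return True if any name-like key carries a non-empty value."""
    return any(
        bool(NAME_KEY.match(key)) and bool(value.strip())
        for key, value in tags.items()
    )
```

With this check, a feature tagged only with `name:en` still counts as named, while `noname=yes` alone does not.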

pnorman · September 1, 2023, 6:19pm

In practice people are going to assume this.

Minh_Nguyen · September 12, 2023, 1:05am

Presumably the addition of name to any of these features would not be accompanied by the deletion of any *_name or name:*. If name is removed, inevitably leading to the addition of noname=yes, this also would be an oversimplification. Worse, it would be conflated with the situation in which something no longer has a primary name (just an old_name) in the local language but may still be known by a name in some less relevant language. This surely happens sometimes with places and natural features.

Hungerburg · September 22, 2023, 11:51pm

Obviously, there cannot be competing *name:*= on rivers where both sides speak the same language, but still have different names - so how is that handled, in case of the lakes mentioned? In my area this affects a river/stream that makes the country border.

amapanda_ᚐᚋᚐᚅᚇᚐ · September 23, 2023, 9:14am

I don’t know of a waterway with this, but in Ireland, we use regional language suffixes to tag that a certain place has a different name in (Hiberno) English name:en_IE and (British) English name:en_GB. Perhaps that approach could be useful here

Minh_Nguyen · September 23, 2023, 11:27am

To elaborate on my earlier example, this reservoir is located entirely in Ohio. However, locals and the state government, which manages the lake, insist on referring to it by a compound name, while the federal government insists on a simplified name due to federal naming policy. Both names are in the same language but have different scopes, which are clarified by loc_name, reg_name, and nat_name, with name breaking the tie in favor of the local and state name.

Adding a border to the situation, this reservoir straddles the Georgia–South Carolina state line and is managed by the federal government. The reservoir was originally named Clarks Hill Lake. Later, at the federal level, it was officially renamed after Thurmond, as a birthday present from a South Carolina senator to a Georgia senator. However, the states are free to continue calling it whatever they want. Most people on both sides of the border still call it Clarks Hill Lake, which the federal government recognizes as an alternative name. We have that as the name, with Thurmond relegated to an alternative name.

Similarly, this reservoir straddling the North Carolina–Virginia state line is also managed by the federal government. It was originally named Bugg’s Island Lake (or Buggs Island Lake due to the federal rule against apostrophes), but Congress later renamed it the John H. Kerr Reservoir after the Congressman from North Carolina. North Carolinians call it Kerr Lake, but Virginians prefer its original name, having tried unsuccessfully to revert the federal name.

Based on @amapanda_ᚐᚋᚐᚅᚇᚐ’s example, I’ve added name:en-US-NC and name:en-US-VA to the Kerr Lake relation (hyphens, not underscores, per the BCP 47 standard). It feels weird to use ISO 3166-2 codes here, as though there are two dialects called “North Carolina English” and “Virginia English”, but it gets the point across. I also changed the name to Kerr Lake;Buggs Island Lake, accounting for the fact that it mostly lies in Virginia.

Regardless, we don’t split the lake along the state line, because it’s still one lake in reality, and both states apply the name to the entire lake. This “dialect”-based solution also applies in cases where a jurisdiction insists on a particular name for a feature that it does not claim for itself at all – probably relevant to seas and the like. But it doesn’t absolve us from having to pick one or more names among the various possibilities.

amapanda_ᚐᚋᚐᚅᚇᚐ · September 23, 2023, 4:00pm

That’s what I initially assumed with my river basins map (see other forum post), until I saw all the gaps. Eventually I just looked for the presence of any tag key matching the /name(:.+)/ regex.

I tried to match up based on wikidata, but that had lots of gaps too.

Minh_Nguyen · September 23, 2023, 4:51pm

Rivers are also interesting because sometimes a name applies only to part of the river, even within a single language. For example, in Vietnamese, the overall Mekong River is called sông Mê Kông, but its distributaries are known by a variety of unrelated names, and collectively the river system within Vietnam is known as sông Cửu Long. In other words, the Mê Kông arises in China but the Cửu Long arises at the Cambodia–Vietnam border. In practice, Cửu Long is never translated into English, which applies the same name to the entire river.

harahu · September 23, 2023, 9:08pm

I feel like all of your examples don’t really touch on the scenarios where it’s actually hard to use name as a tie-breaker.

Take this river, for instance, on the border between Norway and Russia: https://www.openstreetmap.org/way/27653663

Here there are three local (and official) languages competing for the name tag. Norwegian, Russian and Northern Sami. None of them have a clear dominance of use in the area, so the tie can’t really be broken. Yet the river isn’t nameless; it has names in all three languages.

I’ve set the name to Vuorjám - Jakobselva - Ворьема, covering all three languages, but I dislike this solution. I’d love to be able to remove the name tag here, but still achieve the same rendering effect by having carto understand that it should make use of the name:no, name:ru, and name:se tags instead.

This is, IMO, the same problem as with the Atlantic Ocean, only that one cannot reach for the lazy option of “let’s just go with the English name”.

What would you do in this case?

IanH · September 23, 2023, 9:19pm

The presence of different languages for each region makes it easier. There is no need to split the baby in terms of language. You end up only having to handle one name per region/language.

ezekielf · September 23, 2023, 10:13pm

I would do this:

name=Vuorjám; Jakobselva; Ворьема
name:se=Vuorjám
name:no=Jakobselva
name:ru=Ворьема

A semicolon-separated list of names would also be a fine solution for the Atlantic Ocean name tag if several languages could be chosen. However, I suspect that with an object of global scale this list would get out of hand quickly.

harahu · September 23, 2023, 10:16pm

I could live with that. The problem is that carto doesn’t understand semicolons for names.

ezekielf · September 23, 2023, 10:26pm

The feature has been requested.

github.com/gravitystorm/openstreetmap-carto

Reformat semicolons in names

opened 06:13PM - 17 Dec 22 UTC

1ec5

text

### Expected behavior

Some features have `name` tags that contain multiple values separated by a semicolon. Some examples from the United States:

* [This `place=town` node](https://www.openstreetmap.org/node/158859890)’s `name` tag contains an English name and a Yiddish name separated by a semicolon. It is also tagged with `name:en` and `name:yi`, but the dual `name` is appropriate because of the widespread use of a language within the town that would be considered a minority language elsewhere.
* [This `amenity=place_of_worship` area](https://www.openstreetmap.org/way/350497025)’s `name` tag contains an Amharic name and an English name separated by a semicolon. Both names are [signposted equally prominently](https://www.mapillary.com/app/?pKey=299319525035558) and used interchangeably. No [`default_language`](https://wiki.openstreetmap.org/wiki/Key:default_language) tag applies in this case, because that key is intended for administrative boundaries, whereas this is a one-off feature.
* [This road](https://www.openstreetmap.org/way/679125385)’s `name` tag contains two English names separated by a semicolon. The road is maintained jointly by two highway departments that disagree on the name for political reasons, going as far as to post competing street name signs up and down the road. As a result, local residents also disagree on the name.

Unlike in [some countries](https://wiki.openstreetmap.org/wiki/Multilingual_names#United_States_of_America), there hasn’t historically been a consensus to separate dual names with an ad hoc delimiter such as a hyphen or slash. Instead, it’s not uncommon for mappers to use a standard [semicolon value separator](https://wiki.openstreetmap.org/wiki/Semi-colon_value_separator) as they would with any other key. Apart from consistency with other keys, a semicolon is much less likely to occur within a name in reality.

A mapper who uses the semicolon delimiter would expect a renderer to reformat the semicolon in some fashion. For example, Mapbox-based maps replace each semicolon with a fancy em dash. But perhaps a more language-agnostic treatment would be to replace each semicolon with a newline, just as with `ref`s in #750. A newline would be less ambiguous because it isn’t possible for a raw tag value to contain a newline.

### Actual behavior

Unfortunately, openstreetmap-carto renders the raw `name` tag verbatim, including the semicolon:

<img src="https://user-images.githubusercontent.com/1231218/208253823-c2e421c8-5434-4a21-84c2-b9f84001b1a4.png" width="400" alt="Kaser"> <img src="https://user-images.githubusercontent.com/1231218/208255424-ad004fee-3684-4f80-9f3d-f2a24563f7d9.png" width="400" alt="Debre Yibabe Kulbi Kidus Gabriel Ethiopian Orthodox Tewahedo Church"> <img src="https://user-images.githubusercontent.com/1231218/208255300-165f4491-d5bb-4a91-bbc4-c9074d3bab5e.png" width="400" alt="Cincinnati Columbus Cincinnati">

Without support for a semicolon delimiter, openstreetmap-carto encourages mappers to choose unpredictable delimiters instead. A previous version of the Kaser node used a slash, indistinguishable from an individual place name or POI name that contains a slash in reality. This is problematic for other data consumers, such as the router GraphHopper, that reasonably expect a semicolon delimiter.

### Implementation notes

#750 splits `ref` on `;` and recombines it with `\n`, primarily to choose a shield image based on the length of the longest name. However, a simple `replace()` could suffice for `name` on a point-placed label such as a place or POI.

https://github.com/gravitystorm/openstreetmap-carto/blob/62e8d547a39f8b40e774e98d295b4f9758bc41be/project.mml#L1775-L1781

There’s also a very rare [`;;` escape sequence](https://wiki.openstreetmap.org/wiki/Semi-colon_value_separator#Escaping_with_%27;;%27) for cases where a single name legitimately contains a semicolon. To handle this case, the `replace()` call can be nested inside another `replace()` call that replaces `\n\n` with `;`, or `regexp_replace()` can be called instead.

A newline may not be suitable within line-placed labels (roads, rivers, etc.). In these cases, perhaps an em dash could be used. Though slightly less language-agnostic than a newline, it’s still independent of the writing direction and no less ambiguous to the viewer than a hyphen or slash that’s hardcoded in the database.

/ref #1086 #4404
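The semicolon handling described in the issue can be sketched outside of SQL as well. The Python below is only an illustration: `format_name` is a hypothetical helper, and it protects the `;;` escape with a placeholder character instead of the nested `replace()` calls the issue proposes for openstreetmap-carto.

```python
def format_name(raw: str, separator: str = "\n") -> str:
    """Replace each ';' delimiter with `separator`, keeping ';;' as a literal ';'."""
    placeholder = "\x00"  # assumption: NUL never appears in a tag value
    # Hide escaped semicolons first so they are not treated as delimiters.
    parts = raw.replace(";;", placeholder).split(";")
    # Trim whitespace around each name and restore the literal semicolons.
    return separator.join(part.strip().replace(placeholder, ";") for part in parts)
```

A point-placed label could use the default newline, while a line-placed label could pass a different `separator`, as the issue suggests.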

As is typical, there was quite a bit of discussion but no indication that such a feature would be welcome. Best not to worry about what carto does and focus on other styles that are open to forward progress instead.

Minh_Nguyen · September 23, 2023, 11:55pm

Correct, that was in response to a question specifically about monolingual naming discrepancies, but I intentionally chose examples in which the federal and state agencies have shared jurisdiction over the same body of water: the U.S. Army Corps of Engineers owns the land and manages the reservoir’s water level, but you obtain a fishing permit from the state Department of Natural Resources. Depending on the direction you’re coming from, two signs just 100 feet (30 m) apart call it Kerr Lake Reservoir or Buggs Island Lake, respectively.

In a sense, this is similar to an international body of water: each country’s citizens prefer a different name depending on the language they speak, but the relevant international governmental organizations officially use English, and there’s a historical tradition of using English on the high seas in general (though some regions have older traditions in other languages).

I don’t think of name as a tie-breaker in these cases. Rather, it’s a fallback strategy for when the data consumer doesn’t have enough context to personalize the label further for the individual user. Back on land, the name(s) in the native language(s) are the most sensible “impersonal” fallback, but there are no such constraints at sea. Otherwise, we might as well be tagging the Sulu Sea’s name in Sama–Bajaw and the Natuna Sea’s name in Moken based on the local seaborne population.
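To make the "fallback" reading concrete, here is a minimal Python sketch of how a data consumer might choose a label. The `pick_label` function and its `preferred_languages` parameter are hypothetical and do not describe any particular renderer's behavior.

```python
from typing import Optional


def pick_label(tags: dict, preferred_languages: list) -> Optional[str]:
    """Return the first available name:<lang> for the user's languages, else name."""
    for lang in preferred_languages:
        localized = tags.get(f"name:{lang}")
        if localized:
            return localized
    # No usable language context: fall back to the impersonal name tag, if present.
    return tags.get("name") or None
```

Under this sketch, a feature with `name:*` tags but no `name` gets a label only when the consumer knows the user's language; otherwise it stays unlabeled.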

I got the impression that the OSM Carto maintainers were interested in displaying something nicer but weren’t on board with interpreting the semicolon differently than any other punctuation character, as they do with ref and as e.g. Mapbox and OSM Americana do with name. @imagico developed a proof of concept that relies more heavily on name than most vector styles. If I understand correctly, it would intentionally leave the Atlantic Ocean unlabeled because:

It lacks a name tag.
It has one or more name:* tags.
No administrative boundary relation surrounds the ocean that could have a default_language tag.

This seems consistent with OSM Carto’s language-neutral design goal, since it lacks a mechanism to personalize the map according to the user’s language. However, it might surprise users who expect the ocean to be acknowledged in some manner.