That is useful additional information about what is possible under “automated edits”, although I, at least, would prefer a solution where deprecated tags stop being created in the first place.
But if it is impossible to fully eradicate new uses of deprecated tags, such bots might be an option in some cases (as you say, in areas where the local community has approved it), e.g. when there is a simple one-to-one renaming. This obviously won’t work if the old deprecated tag is ambiguous, i.e. if it might be interpreted to mean different things for which different tags exist.
We have a process already, and it works just fine. That process is the Automated Edits Code of Conduct which of course we’re all aware of. And that’s exactly what I used to guide me when conducting the river modernization project, in which the community replaced, by way of automated edits, a tag which had been deprecated in a proposal, with support from the local communities involved.
The underlying subtext that I am hearing is that we would like it to be easier than what I just described to make mass edits after a tag is deprecated in a proposal.
This process must be able to do much more. It has to bring the relevant stakeholders along, that is, editors and data users. It must provide an information platform (e.g. by informing them directly). There must be a defined procedure, based on the “Automated Edits Code of Conduct”, for a) first double tagging (if possible) and then b) after a transition phase, phasing out the old tag.
In my view, deprecation is a process that can make use of automated edits as sub-processes, but it must also cover other functions.
This project wasn’t starting from scratch. There was never a serious risk of rivers vanishing from renderers after retagging, because natural=water enjoys better support (this being an argument in favor of the retagging). The retagging didn’t affect other use cases such as routing or geocoding. The analysis use case was also largely unaffected, as global queries had had to account for both tags for some time.
In other words, this was the best-case scenario. Unfortunately, many desired deprecations start out at a disadvantage because the migration path for data consumers is not as straightforward.
Sorry to disappoint, but this change did affect geocoding. It is a stellar example of something that might have looked like a simple cleanup but in reality was a structural change to the tagging that can suddenly pose quite a huge problem on the software side.
Here is the problem: we’ve had this double tagging of rivers with waterway=river/stream + waterway=riverbank. Often both were tagged with a name. This needs some deduplication for geocoding purposes. Nominatim did this by simply ignoring waterway=riverbank. After all, the river line is the more interesting result, and ignoring a tag is easy enough. Then @ZeLonewolf came along and changed the tagging from a simple waterway=riverbank to natural=water + water=river. What was previously a primary feature now became an attribute of another tag. Given the way tag processing works in Nominatim, there is no way to filter by such an attribute, so deduplication of rivers has been broken since this particular mass edit, because fixing it requires a change in the design of the software.
Please don’t discuss the particulars of how to implement secondary tag filtering now. It’s not really the point. The point is that any change to existing tagging may have an effect on data consumers that you have not foreseen. So if you want deprecation and ‘tagging cleanup’ on a major scale, you have to devise a plan that gets feedback from data consumers already during the RFC phase. I consider any proposal doomed that cannot show that considerable thought has gone into the secondary effects the change might have.
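To make the failure mode concrete, here is a minimal, hypothetical sketch of a geocoder that decides what to index by looking only at each object's primary key/value pair. It is not Nominatim's actual code; the names `IGNORED_PRIMARY`, `PRIMARY_KEYS`, `primary_tag` and `indexed_names` are assumptions made up for illustration.

```python
# Hypothetical illustration only: NOT Nominatim's real import logic.
# Each OSM object is modeled as a plain dict of tags.

# Primary key/value pairs this hypothetical geocoder deliberately skips,
# so that a named river area does not duplicate the named river line.
IGNORED_PRIMARY = {("waterway", "riverbank")}

# Keys this hypothetical geocoder treats as feature-defining ("primary").
PRIMARY_KEYS = ("waterway", "natural")


def primary_tag(tags: dict) -> tuple:
    """Return the first primary (key, value) pair found, or ()."""
    for key in PRIMARY_KEYS:
        if key in tags:
            return (key, tags[key])
    return ()


def indexed_names(objects: list) -> list:
    """Names that end up in the search index after primary-tag filtering."""
    names = []
    for tags in objects:
        primary = primary_tag(tags)
        if not primary or primary in IGNORED_PRIMARY:
            continue  # no primary tag, or explicitly ignored
        if "name" in tags:
            names.append(tags["name"])
    return names


old_style = [
    {"waterway": "river", "name": "Example River"},
    {"waterway": "riverbank", "name": "Example River"},
]
new_style = [
    {"waterway": "river", "name": "Example River"},
    {"natural": "water", "water": "river", "name": "Example River"},
]

if __name__ == "__main__":
    print(indexed_names(old_style))  # the river name appears once
    print(indexed_names(new_style))  # the river name appears twice
```

The skip rule can only name a primary key/value pair. Once the river area's identity moves into the secondary attribute water=river, the area's primary pair becomes natural=water, the rule no longer matches, and the name is indexed twice.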
It is not double-tagging, they are entirely separate features. One is the path of the river, and the other is the water-covered area of the river.
Tagging a name on a river area is always incorrect tagging. If Nominatim is indexing names on waterway=riverbank or natural=water + water=river, then that behavior is simply incorrect, and has been incorrect for over a decade.
False. Both tagging systems were widely in use at the time I became involved: sometimes just one, sometimes just the other, and sometimes both. If Nominatim was failing to correctly handle the natural=water + water=river style of tagging but not the waterway=riverbank style, then it was already failing to properly handle a significant percentage of the river area objects tagged worldwide when I became involved.
If this effort further exposed an issue with Nominatim that was already widely present, I would consider that a positive outcome.
I do agree, but who is going to do the work?
Using the passive voice (“we would do well to xxx”) almost guarantees that nothing will happen.
How about you actually suggest a rough draft outline of the text/chapter as you think it should go (click on that arrow and “Reply as linked topic” so others know to follow), and I promise I’ll jump right in and offer changes / more details. After a few iterations by interested parties, we might actually have something tangible that can be offered to the rest of the community for comment. Deal?
I hear you say that this is a non-issue, but is it really? I’m a Norwegian mapper, and I’ve used the style you suggest for my own projects where I try to achieve this effect, but it comes with a bitter aftertaste. The technique of mapping superimposed areas tagged natural=* is not well documented, and does not seem to be universally accepted. Whether data consumers treat it correctly seems quite hit and miss. I would not convert these cases to the style you suggest until that style:
Is well documented and standardised. For instance, does one use layer=-1 on the area signifying the river/lakebed?
Of all of the proposed tagging changes over the years, including “deprecations”, as a data consumer I’ve only ever been contacted once (by the parcel locker people) to say that a proposal might affect the data that I’m consuming. Thanks to them for doing that.
For what it’s worth, I don’t think the technique of tagging waterway=riverbank and natural=sand (or bare_rock, shingle, etc) on the same object is well documented or universally accepted either. I wouldn’t be surprised if some data consumers did something odd in both cases. I don’t see a problem with the data modeling though. One feature is an intermittent area of water, another feature is the intermittently exposed sand, rock, or shingle. The areas overlap in the real world, so it makes sense to model them as two overlapping areas in OSM as well. Improving documentation to mention this technique is a good idea though.
You’re certainly correct. Making a statement like this is easy, but doing the real work to make change is difficult. I probably do not have the available time and energy to get started on this in the near future, but I’d be supportive of such an effort.
From the imagery, it looks like an area of rock that forms a well-defined river channel. It’s probably usually mostly dry with a small stream making its way through (see the Bing imagery) but occasionally a raging torrent.
The question, I guess, is whether it should be mapped as one object or two. Is it only one (something that is sometimes wet and sometimes not), or two (the rock, and the water that’s there part of the year)?
However it gets mapped, there are ways for data consumers to decide what they want to show here. It’s much easier (using pretty much any rendering mechanism) if both sets of tags are on one object, but also possible if they’re not. Personally I don’t think “how easy it should be to render” should be the main factor here - it comes down to “do you think there is one object here or two”.
I’m not really sure whether the RfC period should be extended, since I assume that with most proposals the details have already been discussed on the mailing list and elsewhere beforehand, but I’ve always felt that the two-week voting period is way too short and should really be extended by at least a week, if not two. It can take almost a week just for everyone to be notified, let alone for most people to read through the proposal, the relevant mailing-list discussions, etc., and make an informed decision in such a short time.
(I’m aware that the general rule is “at least” 2 weeks and that it can be extended, but it’s clearly the default and people don’t usually extend it. One reason being that it can look bad on the proposer if they extend the vote period for a proposal that is likely going to be rejected after the 2 weeks ends.)
Proposals should include a machine-readable description of each set of tag changes. This requires that all the concepts described in human language also be available in a form that bots and renderers can immediately process. To be considered complete, a proposal would then have to define a data object or modifier for each concept it covers. The resulting data objects can then be analyzed and used by companion software such as editors and routers.
The wiki isn’t well suited for machine-readable schemas, unless you’re suggesting data items about proposals, but that could be a challenge to model well. Maybe there could be a voluntary step of creating a pull request against the id-tagging-schema repository, showing how its list of deprecations and suggested upgrades would change. But I’m unsure about making that a required step, since deprecations aren’t always as straightforward as replacing one set of tags with another.
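As a rough illustration of what a machine-readable description of tag changes could look like, and of why it cannot always be applied blindly, here is a minimal Python sketch. The rule format (`old`, `replace`, `candidates`) and the `upgrade` helper are hypothetical and are not the schema of id-tagging-schema or any other real project. The `example_key=old_value` rule is a made-up placeholder for an ambiguous deprecated tag.

```python
# Hypothetical machine-readable deprecation rules. The field names are
# illustrative assumptions, not any real project's schema.
RULES = [
    {
        # Simple one-to-one renaming, as in the river retagging above.
        "old": {"waterway": "riverbank"},
        "replace": {"natural": "water", "water": "river"},
    },
    {
        # Made-up ambiguous tag: it could mean either of two things,
        # so no automatic replacement is defined, only candidates.
        "old": {"example_key": "old_value"},
        "candidates": [{"example_key": "value_a"}, {"example_key": "value_b"}],
    },
]


def matches(tags: dict, pattern: dict) -> bool:
    """True if every key/value pair in pattern is present in tags."""
    return all(tags.get(k) == v for k, v in pattern.items())


def upgrade(tags: dict) -> tuple:
    """Apply the first matching rule.

    Returns (tags, status), where status is "upgraded", "unchanged",
    or "needs_review" (a mapper has to decide; tags are left as-is).
    """
    for rule in RULES:
        if not matches(tags, rule["old"]):
            continue
        if "replace" not in rule:
            return dict(tags), "needs_review"  # ambiguous old tag
        new_tags = {k: v for k, v in tags.items() if k not in rule["old"]}
        # Edge case: the replacement would overwrite an existing value,
        # e.g. natural=sand mapped on the same object as waterway=riverbank.
        if any(k in new_tags and new_tags[k] != v for k, v in rule["replace"].items()):
            return dict(tags), "needs_review"
        new_tags.update(rule["replace"])
        return new_tags, "upgraded"
    return dict(tags), "unchanged"


if __name__ == "__main__":
    print(upgrade({"waterway": "riverbank", "name": "Example River"}))
    print(upgrade({"waterway": "riverbank", "natural": "sand"}))
    print(upgrade({"example_key": "old_value"}))
```

Only the first case is upgraded automatically. The overlapping natural=sand object and the ambiguous tag are returned unchanged with `needs_review`, which reflects the point that a deprecation is not always a simple replacement of one set of tags with another.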
Proposer writes proposal, discusses it widely, and demonstrates consensus with a >75% approval during a 2+ week vote.
Proposer writes up technical procedure for making the change in the map.
Proposer discusses the proposed change with the communities involved to ensure there are no objections to making the edits.
Proposer makes the edits.
There is your process, with no use of the passive voice. It’s also the process that exists today and anyone can use!
In practice, the change from an older tagging scheme to a newer one is rarely a 1:1 substitution and a bunch of work is involved to deal with edge cases, outliers, or cases where the old tag is replaced by more detailed tagging that requires the mapper to make a decision.
The problem isn’t that we don’t have a “process”. The problem (if you can even call it a problem) is that the community rarely agrees on what to do, and most people aren’t willing to put in the effort of listening, technical analysis, persuasive writing and diplomacy needed to do it on a global scale.
This is the reality of an anarchic, community-driven process. Some have suggested that the community process is no longer effective and that it would be better for the OSMF to introduce a formal, bureaucratic process for handling tagging changes. That would certainly “solve” the problem of tagging changes being too hard by appointing an arbiter to decide on such things. However, I suspect that the voting membership of the OSMF would oppose board candidates that would advocate for foundation involvement in tagging issues.
Wouldn’t it be more like they are just implementing the process you outlined, which has already been decided on by the community? Otherwise, what exactly would they be deciding on that would be outside the scope of the already existing way of doing things?
I don’t see what the “issue” would be if they are just implementing a tag that the community has already voted on and approved. In a way, bureaucrats/arbiters are already involved in tagging issues when, for example, DWG members revert edits by people who use unpopular or unapproved tags. At least in the case of deprecation, it would have an actual, wider benefit to the community, beyond just tag fiddling to indulge the preferences of a small minority of confrontational users or whatever.