[RFC] Feature Proposal - landcover proposal V2

stevea · January 16, 2024, 12:32am

The latter quite possibly has a great deal to do with the former, but I could be wrong.

OSM has known for a long time that we have more than one way to either say (tag) something IN our data or do (render, display, route, geocode, overlay with specific tags…) something WITH our data.

Proposals that want to deprecate tags? Again, a very uphill climb. And if we’re going to fix this, let’s not continue to re-invent the wheel (of deprecation), but at least begin to talk about some smarter methods with which we might ease that transition — what’s being called “versioning,” which must be much more than suffixing .1 on the end of something or bumping v1 to v2, though it does include sane time, manner, place and circumstances where it is wise and correct to do that.

Honestly, I’m exhausted.

JoaoPaixao · January 16, 2024, 5:43am

Would you say that this approach of suffixes or even prefixes in order to distinguish “versions” is completely unreasonable or could something be planned and built with this method?

alan_gr · January 16, 2024, 6:35am

Are you talking about version tagging on individual objects, to be maintained by individual mappers, something like what was done with public transport v2? Could you give an example? I’m finding it hard to visualize the abstract concept.

JoaoPaixao · January 16, 2024, 4:54pm

I’m also trying to understand what scenario we’re talking about.

It could be:

key.v1=value
key=value.v1
v1.key=value
key=v1.value
key:v1=value
key=value:v1
v1:key=value
key=v1:value

And if so, as for one of these, it is inconceivable, impractical in the long term and would only bring more work, is that it? Or could something be worked on with this?

stevea · January 16, 2024, 5:31pm

“Versioning” would be a comprehensive system of integrating into OSM newer tags that have been Approved, using a scheme which deprecates old tags to be replaced. It would involve the identification of downstream use cases, and a phased approach to the introduction (of new) and deprecation (of old) in a way that is well-understood (and anticipated) by the community so as to minimize the disruption of “losing features” when a tag deprecates. It is a great deal of specification, social harmony, technical execution and agreement. The specific “numbering of versions” would be a tiny aspect of all this in comparison to to what I outline above.

OSM already does something (crudely) like this, when tags are newly introduced which duplicate or somewhat overlap existing tagging. But now, it is a “free for all,” rather chaotic anarchy, and we have three (or more?) ways to denote “a wooded area.” As in the example of farm → farmland OSM has some (modest, imperfect) success at doing this, but seldom if ever with a project-wide emphasis on minimizing impact to those who use the older tag.

“Zooming out” even further, (from being abstract about Versioning), Brian’s comments about “Proposals must define a specific solution” should guide new Proposals that introduce new tags which replace old tags, causing them to be deprecated. With Proposals that do this, identifying that there will be common ground in the process of deprecation of tags in general can lead to there being a boilerplate approach to such deprecation such that a “phased approach” (versioning) can and should emerge.

Public_transport v1 becoming v2 is another example, but it is seen that these exist simultaneously rather than v2 completely replacing v1; v1 wasn’t really deprecated, as it is so well-supported and widespread.

I hope that helps. I’m being very high-level here, deliberately avoiding the specifics.

ZeLonewolf · January 16, 2024, 6:57pm

I don’t think embedding versioning within the tagging is a viable solution because it’s just too weird and different from how we’ve been tagging. Plus it puts extra data in the database that doesn’t add information.

What is needed is a way to access the data in a version-independent way. In other words, if I need “version 1” OSM data, I get it in that format, and if I need “version 2” OSM data, I request it from the same source, but it comes out in a different format. Then mappers can muck about in the tagging and as long as the translation software keeps up to date, then the data consumer doesn’t have to care about the version differences and we can put known service lifetimes on how long we’ll maintain “version 1” before a data consumer must upgrade.

Bottom line, I think this is a software problem, not a schema one.

Furtermore, this problem is effectively solved with standardized schemas. So if you’re using OpenMapTiles, Shortbread, Overture, Daylight, etc., those schemas effectively handle the versioning for you. So there’s a great argument that this is already solved upstream.

JoaoPaixao · January 16, 2024, 7:01pm

If i understood correctly, what you mean by “versioning” is the method of transition from one data scheme to another, that involves replaced actual data.
If i asked what would be the diference between this method called “versioning” and what this proposal tries, would you say that is a “anarchy method”? Meaning, is the will of the person that make the proposal that chooses is own method, making difficult to consistent in this process of transitioning?

If farm -> farmland was an example of a transition of a well used tag, how it was made? How the mappers and data consumers made the transition?

ZeLonewolf · January 16, 2024, 7:17pm

Yes

This proposal proposes that we will accept that some number of data consumers will break and/or have bad data after some period of time, and that we should be okay with that because the data model will be better organized for future data consumers.

JoaoPaixao · January 16, 2024, 7:22pm

I agree that this is a software problem, it would be the one choosing and managing the version to operate on.

Then i ask you or somenone if knows how those software manage the versioning, what exactly they version, because in the moment we dont have a “versioning system” otherwise we wouldnt be debating on this, so what would be their “versioning system”?
Could we somehow help them with the data itself?

And the suffixes or prefixes approach, we agree that it would somehow duplicates information but wouldnt it be also helping these softwares to choose the data version more easily?

We dont need more than 2 versions, 1 in use and 1 for upgrade.
Of course we cant make the 1 for upgrade instantly the one for use, and delete the one currently used. We would need to incentivate the upgrade.

ZeLonewolf · January 16, 2024, 7:29pm

As someone that develops several pieces of software that consume OSM data, the answer is very simple:

When data changes in OSM in a way that I didn’t expect, my software stops working the way I expect it to.

I then have to go and find out why it broken. Often it is simply a mapper making a mistake or mapping in a weird way. But it could also be that there were multiple tagging schemes for something and I was only aware of one of them. So I then need to either modify my software or address the tagging dispute with the community or allow my software to remain broken.

As someone who is “plugged in” to the community, this is easier for me to do. For someone who is just developing software that uses OSM data and isn’t involved in the community aspects or the people or personalities, dealing with these diffences can be a real headache.

If it sounds like this is a really hard problem to solve, it is, and that’s why it hasn’t been solved yet.

JoaoPaixao · January 16, 2024, 8:15pm

I agree with what you said. So lets only consider the following hypothetical scenario.

We have the current data mapped as key=value, and now lets say we add the way of mapping version:key=value.
You could say with reason, that this duplicates information, right; by duplicating we get 2 types of data, the one without version property and the one with, right.
Couldnt you as a developer for the softwares that retrieve this data, filter the data by with property version or without?

Example

natural=sand in the normal way the map

and

v1:landcover=sand with version prefix.

Do we agree that if filter by the normal way only natural=sand would work and if filter by the version, only the landcover=sand would work?

If not, discard all the following text.

If so great, but there would be a problem, anyone could then make v2,v3 v… infinitetly right? What would be the criteria for new versions?

Solution that i would say to be the more reasonable, simply add the proposal with intent of adding information, not replacing or deprecating, and make a compilation of the other type of proposals.
This process of new version we would say to be develop in by year cicle. It that year, anyone could propose a change, deprecation (and addition) and make a consense with all the authors of other proposals so we dont have conflitcs between them. After a year all the proposals that would have conflict with the remaining in the new version, would be discarded for future versions and the rest would be approved to integrate the new version.

With this system, developers like you could work on render the new version to be supported. The previous would remain operating and rendering.

What if we add new values or keys to the “default version”, meaning the one with simply key=value (althrough i would advise to after a while deprecate this one and remain only with prefix versions)?
If the new values are only additions and never replacing or deprecating values, the process would stay the same as today, nothing will changed.
But what about the new replacing and deprecating?
It would be the new process, the version prefix system, it would be propose for the new version and integrate only if we wouldnt have conflicts.

its all this pure fiction and impossible with the reality, or could we do it?

ezekielf · January 16, 2024, 8:17pm

These schemas don’t really solve the problem. They break just the same as any other data consumer when a tagging style they recognize is replaced with one they do not in the source OSM database. An actual versioning and backwards compatibility solution for OSM would allow a period of time where mappers could submit either v1 or v2 tags to the API and consumers could get data out with either v1 or v2 tags. A translation layer on both ends would be required. With the tag agnostic philosophy of the project this seems unlikely to happen.

stevea · January 16, 2024, 8:27pm

Being realistic here, can we more-widely acknowledge that after nearly a year and almost 200 posts in this topic that the OP proposal for landcover is “not viable” any longer? It certainly has stalled (evident in its wiki’s Talk page, too) and we have seriously skidded away from the original topic into a meta-discussion about “versioning” and greater-encompassing topics of tag deprecation. This is a huge, deep, very difficult topic, and deserves its own (new) topic here, if not an entire Working Group within OSM to address for the longer term. Thanks for your consideration.

Tjuro · January 16, 2024, 8:29pm

A proper solution could be to store references to tags instead of tags themselves.

That way when a tag change is proposed all older tags will be updated in the new schema but still work in the old one, and all new tags will be backwards compatible with the old schema.

So, proposal authors could specify with tag a new tag will be in the old schema. For example, you could specify that highway=busway would look like highway=service for data users on the old schema.

02JanDal · January 16, 2024, 8:40pm

I rather interpret this conversation as the landcover proposal not being viable within the current context (i.e. the absence of a deprecation/versioning process). That is, it makes sense to put it on ice for now and instead focus on versioning etc. (possibly using it as a usecase), but then re-visiting it when we have the infrastructure (both technical and non-technical) to handle such a change.

dieterdreist · January 16, 2024, 8:44pm

can you illustrate with landuse=farm how this would help solve the issue?

02JanDal · January 16, 2024, 8:46pm

While a tagging scheme such as one of these is possible, “versioning” can mean many things (and I think even calling it “versioning” might be limiting, as there might be approaches that don’t really have anything to do with versioning but that solve the same problem).

Earlier in the thread I posted a few possible approaches:

Providing planet dumps that are “normalized” according to a specific version, a new version would come every X years, and dumps for the last Y versions are available
Provide “normalization” scripts in various languages so that each consumer can choose their target language which the rest of their system works with, without having to manually see what has changed over time
Just let it play out, but disallow mass edits, so that consumers would start to see gradual changes over time, and hopefully adapt on their side
Having official “version change days”, one every X years, on which all data is adjusted according to the new tagging scheme, and made very clear to all consumers that this will occur and on which days

All these, in various ways, solve the same problem (migrating between tagging schemes), all have trade-offs, and all will have (vocal) opponents. As @stevea says, devising this is a huge undertaking (but an important one, possibly even one of the most important ones OSM will undertake if it wants to stay relevant).

stevea · January 16, 2024, 8:46pm

“The current context” is OSM right now. The Proposal is effectively “dead on arrival” (sorry to be harsh in my wording) with the structure (both social and technical) OSM has right now. It would certainly need to be updated (as a v3?) if and when a comprehensive (clearly longer-term) solution might emerge along the lines you suggest. But to put it “on ice” as it is now and expect that we could simply “thaw” it with what effectively would be a new paradigm for OSM tagging is unrealistic.

Just my opinion.

ZeLonewolf · January 16, 2024, 8:49pm

(since I can’t use it as a reaction…)

It was never viable. Not because the discussion here is long, but because the proposal lacks a fundamental understanding of the technical landscape of the project. The length of the discussion simply serves as a body of evidence for the next time someone proposes an unreasonable change.

02JanDal · January 16, 2024, 8:51pm

That’s pretty much exactly what I said It is not viable right now, but it (or rather a future iteration of it) might be once OSM “is ready for it”.

It might be possible to just pick up where it is now, it might need to restart from scratch (that will depend on how a tagging-migration-solution ends up being), but regardless one should look at what was already done and discussed.