[RFC] Feature Proposal - landcover proposal V2

stevea · January 16, 2024, 5:31pm

“Versioning” would be a comprehensive system of integrating into OSM newer tags that have been Approved, using a scheme which deprecates old tags to be replaced. It would involve the identification of downstream use cases, and a phased approach to the introduction (of new) and deprecation (of old) in a way that is well-understood (and anticipated) by the community so as to minimize the disruption of “losing features” when a tag deprecates. It is a great deal of specification, social harmony, technical execution and agreement. The specific “numbering of versions” would be a tiny aspect of all this in comparison to what I outline above.

OSM already does something (crudely) like this, when tags are newly introduced which duplicate or somewhat overlap existing tagging. But now, it is a “free for all,” rather chaotic anarchy, and we have three (or more?) ways to denote “a wooded area.” As in the example of farm → farmland, OSM has some (modest, imperfect) success at doing this, but seldom if ever with a project-wide emphasis on minimizing impact to those who use the older tag.

“Zooming out” even further, (from being abstract about Versioning), Brian’s comments about “Proposals must define a specific solution” should guide new Proposals that introduce new tags which replace old tags, causing them to be deprecated. With Proposals that do this, identifying that there will be common ground in the process of deprecation of tags in general can lead to there being a boilerplate approach to such deprecation such that a “phased approach” (versioning) can and should emerge.

Public_transport v1 becoming v2 is another example, but it is seen that these exist simultaneously rather than v2 completely replacing v1; v1 wasn’t really deprecated, as it is so well-supported and widespread.

I hope that helps. I’m being very high-level here, deliberately avoiding the specifics.

ZeLonewolf · January 16, 2024, 6:57pm

I don’t think embedding versioning within the tagging is a viable solution because it’s just too weird and different from how we’ve been tagging. Plus it puts extra data in the database that doesn’t add information.

What is needed is a way to access the data in a version-independent way. In other words, if I need “version 1” OSM data, I get it in that format, and if I need “version 2” OSM data, I request it from the same source, but it comes out in a different format. Then mappers can muck about in the tagging and as long as the translation software keeps up to date, then the data consumer doesn’t have to care about the version differences and we can put known service lifetimes on how long we’ll maintain “version 1” before a data consumer must upgrade.

Bottom line, I think this is a software problem, not a schema one.

Furthermore, this problem is effectively solved with standardized schemas. So if you’re using OpenMapTiles, Shortbread, Overture, Daylight, etc., those schemas effectively handle the versioning for you. So there’s a great argument that this is already solved upstream.

João_Paixão · January 16, 2024, 7:01pm

If I understood correctly, what you mean by “versioning” is a method of transitioning from one data scheme to another that involves replacing actual data.
If I asked what the difference would be between this method called “versioning” and what this proposal attempts, would you say that the proposal follows an “anarchy method”? Meaning that whoever writes a proposal chooses their own method, which makes it difficult to be consistent in this process of transitioning?

If farm -> farmland was an example of a transition of a well-used tag, how was it made? How did the mappers and data consumers make the transition?

ZeLonewolf · January 16, 2024, 7:17pm

Yes

This proposal proposes that we will accept that some number of data consumers will break and/or have bad data after some period of time, and that we should be okay with that because the data model will be better organized for future data consumers.

João_Paixão · January 16, 2024, 7:22pm

I agree that this is a software problem; the software would be the one choosing and managing the version to operate on.

Then I ask you, or someone who knows, how that software manages the versioning and what exactly it versions, because at the moment we don’t have a “versioning system” (otherwise we wouldn’t be debating this). So what would their “versioning system” be?
Could we somehow help them with the data itself?

As for the suffix or prefix approach, we agree that it would somewhat duplicate information, but wouldn’t it also help this software choose the data version more easily?

We don’t need more than 2 versions: 1 in use and 1 for upgrade.
Of course we can’t instantly make the one for upgrade the one in use and delete the one currently used. We would need to incentivize the upgrade.

ZeLonewolf · January 16, 2024, 7:29pm

As someone that develops several pieces of software that consume OSM data, the answer is very simple:

When data changes in OSM in a way that I didn’t expect, my software stops working the way I expect it to.

I then have to go and find out why it broke. Often it is simply a mapper making a mistake or mapping in a weird way. But it could also be that there were multiple tagging schemes for something and I was only aware of one of them. So I then need to either modify my software or address the tagging dispute with the community or allow my software to remain broken.

As someone who is “plugged in” to the community, this is easier for me to do. For someone who is just developing software that uses OSM data and isn’t involved in the community aspects or the people or personalities, dealing with these differences can be a real headache.

If it sounds like this is a really hard problem to solve, it is, and that’s why it hasn’t been solved yet.

João_Paixão · January 16, 2024, 8:15pm

I agree with what you said. So let’s consider only the following hypothetical scenario.

We have the current data mapped as key=value, and now let’s say we add a way of mapping it as version:key=value.
You could say, with reason, that this duplicates information. By duplicating we get 2 types of data: the one without a version prefix and the one with it.
Couldn’t you, as a developer of software that retrieves this data, filter the data by whether or not it has the version prefix?

Example

natural=sand in the normal way of mapping

and

v1:landcover=sand with version prefix.

Do we agree that if you filter the normal way, only natural=sand would match, and if you filter by the version, only landcover=sand would match?
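To make the question concrete, here is a minimal sketch of that kind of filtering, assuming tags arrive as a plain dictionary and that the version is written as a `v1:` key prefix as in the example. The function name `split_by_version` is made up for illustration and is not part of any existing OSM tool.

```python
def split_by_version(tags, prefix="v1:"):
    """Separate plain tags from tags carrying the (hypothetical) version prefix."""
    plain = {k: v for k, v in tags.items() if not k.startswith(prefix)}
    # Strip the prefix so the versioned tags read like ordinary key=value pairs.
    versioned = {k[len(prefix):]: v for k, v in tags.items() if k.startswith(prefix)}
    return plain, versioned


tags = {"natural": "sand", "v1:landcover": "sand"}
plain, versioned = split_by_version(tags)
print(plain)      # {'natural': 'sand'}
print(versioned)  # {'landcover': 'sand'}
```

A consumer that only reads `plain` never sees the prefixed tags, and the reverse holds for `versioned`.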

If not, discard all the following text.

If so, great, but there would be a problem: anyone could then make v2, v3, v… infinitely, right? What would be the criteria for new versions?

The solution that I would say is the most reasonable: simply add the proposals whose intent is to add information, not to replace or deprecate, and make a compilation of the other types of proposals.
This process of creating a new version would run on a yearly cycle. In that year, anyone could propose a change, deprecation (or addition) and reach a consensus with the authors of the other proposals so that there are no conflicts between them. After a year, all the proposals that still conflict with the rest of the new version would be set aside for future versions, and the rest would be approved and integrated into the new version.

With this system, developers like you could work on supporting the new version in rendering. The previous one would remain operating and rendering.

What if we add new values or keys to the “default version”, meaning the one with simply key=value (although I would advise deprecating this one after a while and keeping only the prefixed versions)?
If the new values are only additions, never replacing or deprecating values, the process would stay the same as today; nothing would change.
But what about new replacements and deprecations?
Those would go through the new process, the version prefix system: they would be proposed for the new version and integrated only if there were no conflicts.

Is all this pure fiction and impossible in reality, or could we do it?

ezekielf · January 16, 2024, 8:17pm

These schemas don’t really solve the problem. They break just the same as any other data consumer when a tagging style they recognize is replaced with one they do not in the source OSM database. An actual versioning and backwards compatibility solution for OSM would allow a period of time where mappers could submit either v1 or v2 tags to the API and consumers could get data out with either v1 or v2 tags. A translation layer on both ends would be required. With the tag agnostic philosophy of the project this seems unlikely to happen.

stevea · January 16, 2024, 8:27pm

Being realistic here, can we more-widely acknowledge that after nearly a year and almost 200 posts in this topic that the OP proposal for landcover is “not viable” any longer? It certainly has stalled (evident in its wiki’s Talk page, too) and we have seriously skidded away from the original topic into a meta-discussion about “versioning” and greater-encompassing topics of tag deprecation. This is a huge, deep, very difficult topic, and deserves its own (new) topic here, if not an entire Working Group within OSM to address for the longer term. Thanks for your consideration.

Tjuro · January 16, 2024, 8:29pm

A proper solution could be to store references to tags instead of tags themselves.

That way when a tag change is proposed all older tags will be updated in the new schema but still work in the old one, and all new tags will be backwards compatible with the old schema.

So, proposal authors could specify which tag a new tag would appear as in the old schema. For example, you could specify that highway=busway would look like highway=service for data users on the old schema.
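A rough sketch of what such a declared fallback could look like on the reading side, assuming a lookup table maintained alongside approved proposals. The table `FALLBACKS` and the function `as_old_schema` are hypothetical; only the highway=busway → highway=service pair comes from the example above.

```python
# Hypothetical fallback table: new tag -> tag an old-schema consumer should see.
FALLBACKS = {
    ("highway", "busway"): ("highway", "service"),
}


def as_old_schema(tags):
    """Return a copy of tags with every known new tag replaced by its fallback."""
    out = {}
    for key, value in tags.items():
        # Tags without a declared fallback pass through unchanged.
        old_key, old_value = FALLBACKS.get((key, value), (key, value))
        out[old_key] = old_value
    return out


way = {"highway": "busway", "name": "Example Busway"}
print(as_old_schema(way))  # {'highway': 'service', 'name': 'Example Busway'}
```

The stored data keeps the new tag; only the view handed to an old-schema consumer changes.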

02JanDal · January 16, 2024, 8:40pm

I rather interpret this conversation as the landcover proposal not being viable within the current context (i.e. the absence of a deprecation/versioning process). That is, it makes sense to put it on ice for now and instead focus on versioning etc. (possibly using it as a usecase), but then re-visiting it when we have the infrastructure (both technical and non-technical) to handle such a change.

dieterdreist · January 16, 2024, 8:44pm

can you illustrate with landuse=farm how this would help solve the issue?

02JanDal · January 16, 2024, 8:46pm

While a tagging scheme such as one of these is possible, “versioning” can mean many things (and I think even calling it “versioning” might be limiting, as there might be approaches that don’t really have anything to do with versioning but that solve the same problem).

Earlier in the thread I posted a few possible approaches:

- Providing planet dumps that are “normalized” according to a specific version, a new version would come every X years, and dumps for the last Y versions are available
- Provide “normalization” scripts in various languages so that each consumer can choose their target language which the rest of their system works with, without having to manually see what has changed over time
- Just let it play out, but disallow mass edits, so that consumers would start to see gradual changes over time, and hopefully adapt on their side
- Having official “version change days”, one every X years, on which all data is adjusted according to the new tagging scheme, and made very clear to all consumers that this will occur and on which days
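As an illustration of the “normalization script” idea, here is a minimal sketch that chains per-version rule sets until the tags match a target version. The version numbers, the `MIGRATIONS` table and the `normalize` function are invented for this example. landuse=farm → landuse=farmland is the transition mentioned earlier in the thread, and landuse=forest → landcover=trees is purely illustrative.

```python
# Hypothetical rule sets: MIGRATIONS[n] upgrades tags from schema version n to n + 1.
MIGRATIONS = {
    1: {("landuse", "farm"): ("landuse", "farmland")},
    2: {("landuse", "forest"): ("landcover", "trees")},
}


def normalize(tags, source_version, target_version):
    """Apply each migration step in order until tags match target_version."""
    current = dict(tags)
    for version in range(source_version, target_version):
        rules = MIGRATIONS.get(version, {})
        upgraded = {}
        for key, value in current.items():
            # Tags without a rule in this step are carried over unchanged.
            new_key, new_value = rules.get((key, value), (key, value))
            upgraded[new_key] = new_value
        current = upgraded
    return current


print(normalize({"landuse": "farm", "name": "Home Farm"}, 1, 3))
# {'landuse': 'farmland', 'name': 'Home Farm'}
print(normalize({"landuse": "forest"}, 1, 3))
# {'landcover': 'trees'}
```

Because each step only knows about one version change, a consumer pinned to version 2 would call `normalize(tags, 1, 2)` and never see the later rename.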

All these, in various ways, solve the same problem (migrating between tagging schemes), all have trade-offs, and all will have (vocal) opponents. As @stevea says, devising this is a huge undertaking (but an important one, possibly even one of the most important ones OSM will undertake if it wants to stay relevant).

stevea · January 16, 2024, 8:46pm

“The current context” is OSM right now. The Proposal is effectively “dead on arrival” (sorry to be harsh in my wording) with the structure (both social and technical) OSM has right now. It would certainly need to be updated (as a v3?) if and when a comprehensive (clearly longer-term) solution might emerge along the lines you suggest. But to put it “on ice” as it is now and expect that we could simply “thaw” it with what effectively would be a new paradigm for OSM tagging is unrealistic.

Just my opinion.

ZeLonewolf · January 16, 2024, 8:49pm

(since I can’t use it as a reaction…)

It was never viable. Not because the discussion here is long, but because the proposal lacks a fundamental understanding of the technical landscape of the project. The length of the discussion simply serves as a body of evidence for the next time someone proposes an unreasonable change.

02JanDal · January 16, 2024, 8:51pm

That’s pretty much exactly what I said. It is not viable right now, but it (or rather a future iteration of it) might be once OSM “is ready for it”.

It might be possible to just pick up where it is now, it might need to restart from scratch (that will depend on how a tagging-migration-solution ends up being), but regardless one should look at what was already done and discussed.

ZeLonewolf · January 16, 2024, 8:58pm

Then, solve it, and be prepared to show your work. Whether this is a dismissal or a call to action only depends on your personal level of interest and motivation to roll up your sleeves and demonstrate a solution. I’ve certainly been told that “if you don’t like it, do it better” and then gone off and done so on this project. I’ve built software, executed hard tagging changes, and spoken at conferences and on YouTube on my exploits. And, I’m nobody. Just a hobbyist that felt like it.

Your language is in reverse, I think. OSM is not some entity which you sit outside of and chastise it to do better. OSM is you. It’s me, it’s @stevea, it’s @SomeoneElse entirely. It’s a software developer, an end user, it’s the OSM Foundation Board, a working group, a local chapter, an activist, a hobbyist. It’s nobody in particular. But it IS individuals that have seen problems and said “I can do better” and come up with the goods. That’s the reality in a decentralized project.

“OSM” doesn’t undertake anything. Individual contributors do.

Matija_Nalis · January 16, 2024, 9:01pm

True, but it is then up to them to find and fix the issue before distributing an updated dataset to their users.
IOW, the (theoretical) improvement of using such a dataset would be that when a tagging style changes, only one place (Overture or whoever) has to worry about it and make a backwards-compatible fix, instead of every data consumer having to handle it.

Of course, it also creates another set of problems, not least of which is that the tagging structure is ossified there, and you cannot make any improvements in data tagging, because if you change or introduce any new type of data, you’re back to square one. Also, since one has to check whether there has been vandalism or new or changed tags before releasing, there would be delays in updated maps, but that can be an advantage too sometimes.

An actual versioning and backwards compatibility solution for OSM would allow a period of time where mappers could submit either v1 or v2 tags to the API and consumers could get data out with either v1 or v2 tags. A translation layer on both ends would be required. With the tag agnostic philosophy of the project this seems unlikely to happen.

If someone would like to do such a translation (which I don’t see as a particularly useful idea), may I recommend creating a new API endpoint instead of such an ugly extra-tag-prefix/suffix kludge.

e.g. currently this API endpoint https://www.openstreetmap.org/api/0.6/relation/5876272 returns some XML. One could easily implement a translator API at https://whatever.example.com/api/0.7/relation/5876272 which would return that object in whatever way they prefer (e.g. replacing all natural=wood or landuse=forest with landcover=trees, or whatever other change and reasoning one might have to deprecate/rename tags), e.g. it would “translate” regular OSM-ATYL-tags to my-more-rigid-and-clearer-set-of-tags, like this:

<?xml version="1.0" encoding="UTF-8"?>
-<osm version="0.6" generator="CGImap 0.8.10 (412697 spike-07.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
+<osm version="0.7" generator="My New Cool API 0.9.37  (116321 whatever.example.com)" upstream="CGImap 0.8.10 (412697 spike-07.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
 <relation id="5876272" visible="true" version="6" changeset="118694112" timestamp="2022-03-20T12:24:14Z" user="Muki Mačković" uid="3636444">
  <member type="way" ref="87194173" role="outer"/>
  <member type="way" ref="392034935" role="outer"/>
  <member type="way" ref="753330854" role="inner"/>
  <member type="way" ref="1041941121" role="inner"/>
-  <tag k="landuse" v="forest"/>
+  <tag k="landcover" v="trees"/>
  <tag k="name" v="Park šuma Grmoščica"/>
  <tag k="type" v="multipolygon"/>
  <tag k="wikidata" v="Q97353235"/>
  <tag k="wikipedia" v="hr:Park-šuma Grmoščica"/>
 </relation>
</osm>

People can then easily change their JOSM (or whatever) configuration to use that new API endpoint, and enjoy their new & improved OSM-tagged worldview.
The same modifications (if wanted) can be made for objects being PUT on the server, if you prefer to upload only the “new and better” tags instead of the equivalent tags that the user has specified (you probably should create an explicit changeset tag that indicates you did such auto-translation, though)
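The read side of such a translator can be sketched in a few lines with Python’s standard `xml.etree.ElementTree`. This only shows the tag rewriting step on an already-fetched document; fetching from the upstream API, serving the new endpoint, and the upload direction are left out. The `RENAMES` table and the `translate` function are hypothetical names, and the renames are the ones from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-to-one renames, taken from the example above.
RENAMES = {
    ("landuse", "forest"): ("landcover", "trees"),
    ("natural", "wood"): ("landcover", "trees"),
}


def translate(osm_xml):
    """Rewrite matching <tag> elements and return the document as a string."""
    root = ET.fromstring(osm_xml)
    for tag in root.iter("tag"):
        pair = (tag.get("k"), tag.get("v"))
        if pair in RENAMES:
            new_k, new_v = RENAMES[pair]
            tag.set("k", new_k)
            tag.set("v", new_v)
    return ET.tostring(root, encoding="unicode")


# A trimmed-down stand-in for the API response shown above.
sample = """<osm version="0.6">
  <relation id="5876272">
    <tag k="landuse" v="forest"/>
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>"""

print(translate(sample))
```

Tags that are not listed in `RENAMES`, such as type=multipolygon, pass through untouched.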

The best thing? One can do that right now with minimal effort (certainly one or more orders of magnitude less effort than this discussion has entailed so far), and one can test and develop such a project on any small server/VPS, and gather a smaller (or bigger) number of enthusiastic beta-testers to improve and test it before proposing it as an official one. And even if it is not accepted as an official one, who cares? The ones who elected to use it would still reap all the benefits that such tag deprecation/renaming (supposedly) entails.

02JanDal · January 16, 2024, 9:12pm

I think you should look at the context of both times I posted those suggestions; the first time to exemplify that it is (or might be) possible to have a scheme that allows changing tags in a consumer-friendly way, the second time to exemplify other alternatives to prefix/suffix versioning.

I’d love to take a stab at this in a more serious manner, but considering that I already have two rather major OSM-related projects I’m working on right now which are already consuming most of my free time I have to politely decline.

I was referring to OSM as an “organism” or a community, as this is not something any one single person can do. Sure, it will require one or a few heroes who pull the community together around the task and do a large share of the work, but in the end it will go nowhere without the involvement of a large part of the community.

Do note that your comment comes across as somewhat condescending. I’m well aware what drives a project such as OSM forward.

SomeoneElse · January 16, 2024, 9:15pm

“landcover vs landuse” has been debated since at least 2010. This particular thread isn’t even a year old yet.

OSM (with its completely free tagging) has succeeded where other competitors have failed, just like Wikipedia succeeded after Nupedia failed.

What’s allowed OSM to be successful is allowing anyone to describe their local area so that they, or other people, can make a map from it. Restrict that, and you restrict what people can map - which wouldn’t be a good thing. That doesn’t mean that there can’t be a role for a set of data with a more or less fixed schema so that the problem Brian described above is less of an issue, and yes, the schemas for OpenMapTiles et al do intend to nail things down a bit so that a consumer of (say) vector tiles claiming to be using OpenMapTiles’ schema doesn’t have “lots of data suddenly missing” - but what Zeke said above is true - and it does mean that if people suddenly have the urge to change the tag in OSM for concept X from tag A to tag B, they’d better have a pretty good reason. One to one changes that literally add no value are NOT a good reason.

Maybe an example would help. I look after a couple of different sorts of maps - web based ones (click the changelog link to see more details and links to the sources of the components) and Garmin / mkgmap ones, source here. Where possible both use more or less the same schema, so that the way that I round up all possible taggings of (say) “broadleaved woodland” is the same in both cases, but the resulting map style code is very different…

If someone decides to change all tag A to tag B in the data, each map will just “see fewer examples of broadleaved woodland”. I’ve seen that happen with embassies, hill forts and pinfolds. That’s despite both those map styles being registered at taginfo, with a link to both github projects so that people can create an issue saying “I’m proposing to change tag A to tag B” alongside anything else that might apply.
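A toy illustration of that effect, assuming a consumer that recognises a fixed list of tag combinations. The combinations in `BROADLEAVED` and the retag to landcover=trees are illustrative only; they are not the actual rules from the map styles mentioned above.

```python
# Hypothetical list of tag combinations this consumer treats as broadleaved woodland.
BROADLEAVED = [
    {"natural": "wood", "leaf_type": "broadleaved"},
    {"landuse": "forest", "leaf_type": "broadleaved"},
]


def is_broadleaved_woodland(tags):
    """True if tags contain every key/value pair of at least one known combination."""
    return any(
        all(tags.get(k) == v for k, v in combo.items())
        for combo in BROADLEAVED
    )


before = [
    {"natural": "wood", "leaf_type": "broadleaved"},
    {"landuse": "forest", "leaf_type": "broadleaved"},
]
# The same two features after a retag that this consumer was never told about.
after = [
    {"landcover": "trees", "leaf_type": "broadleaved"},
    {"landuse": "forest", "leaf_type": "broadleaved"},
]

print(sum(map(is_broadleaved_woodland, before)))  # 2
print(sum(map(is_broadleaved_woodland, after)))   # 1
```

Nothing fails loudly here: the count just drops, which is why such changes tend to be noticed late.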