[RFC] Feature Proposal - landcover proposal V2

stevea · January 3, 2024, 12:55am

I don’t know; as I say, I don’t have anything ready-made. I am very encouraged by your response and that my post received as many thumbs-up as it did — more than any other post I’ve ever made here.

It may be appropriate for a new thread to emerge that begins to address these issues. This is a very long-term goal, and one which will take a lot of people, a lot of effort, a lot of listening and a lot of give-and-take. To start with, OSM already has some quite well established processes (like our Proposal and voting process, wiki development, et cetera). Things like a formal versioning process would need to be well discussed, well understood, widely agreed upon and to be well documented. I have been an ambitious and productive OSM contributor, but that’s a great deal of work and to spearhead such an endeavor takes vast dedication and indeed, leadership. I’m not sure I’m correct for that role (right now).

Again, if others want to pursue this, I think a new topic (thread) here in our Discourse / community forum is a good direction to take it in as a next step. I remain listening, both here and wherever this discussion may go, as I do have a great deal of passion to see OSM improve along these lines. But I don’t know how to go about it, except to offer a spirit of helpfulness and future contribution as a listening member and my longer-term perspective in the project. Thank you for your kind response.

KoiAndBlueBird · January 3, 2024, 1:50am

Apart from being sensible and intelligent, it was a very welcome change from the general tone of this discussion.

Sounds good. On which board should we post it? Would you be willing to do a small write-up? Is there anything to discuss beforehand? I think that the scope should be thought out.

Speaking of others, please, chime in (if you like)!

If I am being overly pushy here, please tell me and I’ll stop. Your text was actually so motivating to me that I want to start with the process; even though it will be a long way and depends heavily on others.

stevea · January 3, 2024, 1:57am

I’ll ask you to continue to show initiative yourself, as I think you’re doing great so far. You are welcome to copy-paste any or all of my post, quoting it, linking it, et cetera. While it is several paragraphs, I tried to make it as concise as possible (though the iceberg is indeed quite large), to provide a possible “jumping-off point,” and it appears that’s what you are proposing to do. I am encouraging you, or anyone else, to do so.

You might start a thread in the General talk Category titled something like “Longer-term suggestions for difficult tag disambiguation, possibly a formal versioning process” and introduce a pointer back to this topic’s posts right here. I am not performing some sort of “hit and run” here, “fleeing the scene,” rather, I am encouraging more support for these ideas to come forth and hopefully flourish. I’ve gotten the ball rolling, it would be good for others to throw some shoulder into this, as I don’t want to be a single point of contact for what is clearly a widely-shared set of ideas that seem to resonate with a good-sized segment of our community. Let’s let this continue to unfold, so far so good.

Sticking to landuse / landcover, in particular the seemingly intractable problem of wood vs. forest, is a good place to start. (1A, 1B, “landcover as a general problem to solve with a worldwide solution…” are components of this — icebergs are big).

KoiAndBlueBird · January 3, 2024, 2:12am

I think I understand. I’ll be waiting and letting our conversation stir a bit here before making the new post. As you can imagine, I am not super keen on doing this “alone”. More voices, different voices, critical voices are needed here.

What we need is not rush, it is dedication and commitment. Indeed:

pnorman · January 5, 2024, 6:50am

It’s not the mappers who bear the cost of tagging changes, it’s data consumers. It’s very easy for mappers - they just update their editor, since most people don’t work with raw tags.

Most tagging changes so far involve tags not used by any data consumers or seldom used ones. Some tagging changes involve data consumers noticing flaws in the model.

With the features involved, virtually every rendered map using OSM data, many analytic uses, and pretty much every other use I can think of except routing will be changed. And by changed, I mean broken in most cases.

The idea that users will change their logic in advance is, put simply, absurd. Of the ones we know about, we have no way of notifying most of them, and we don’t even know about most users who will be impacted. The first they will know is when their maps or data analysis or whatever they’re doing breaks. In most cases, this will go to a team that has responsibility for many services, and in some cases they won’t have touched the maps service because it has just worked for the last five years.

Some of those will look at it and decide to go with a commercial provider under the mistaken assumption that they won’t have their software break because of a data change they weren’t expecting.

I dislike getting involved in tagging disagreements for a number of reasons, one of which is that I am a maintainer or developer of many map rendering styles. However, as a maintainer, I have been trying to adopt the principle that a “style should not adopt practices that would result in making consuming OSM data more difficult for most data consumers”. An example of this would be troll tags.

I cannot see a situation where I would support the changes in this proposal in a style I maintain. I might have to reluctantly accept it, such as if a client told me to “fix the broken OSM data”, but I would not endorse it.

We gain no improvements in what we can map. We break almost everyone’s usage of our data. And who would see any gains? The small number of people who look at raw tags would be the only ones seeing the new tag keys and values. Most mappers won’t see any change, except as everything breaks.

Tordanik · January 5, 2024, 8:56am

You paint an accurate picture of the current state of affairs where we’re unable to make improvements without incurring massive costs for each change. But surely, the conclusion cannot be to stop improving the OSM data model. Even if we can live without changes that only affect people looking at raw tags (which includes data consumers who may consider adopting OSM in the future, and for whom an idiosyncratic and messy data model is a turn-off), all these same costs are incurred by, and therefore deter us from, making changes that do introduce new capabilities.

Therefore, my conclusion is that we need a better process for making changes to the OSM data model. This might very well involve a versioned specification for at least the “core” of the OSM tagging data model, which I expect would make it both more feasible to make changes to the data model and make tracking those changes as a data consumer less onerous. (After all, what’s the current alternative? Watch a few thousand wiki pages and sift out the substantial changes from the formatting tweaks?)

Of course, pulling that off would require the OSM community to demonstrate an uncharacteristically high level of coordination.

SomeoneElse · January 5, 2024, 10:06am

For the avoidance of doubt, this proposal is NOT an improvement to the OSM data model (or even much of a change, actually). It’s just an administrative thing that would require “busywork” for no gain by almost all data consumers.

That does make sense…

02JanDal · January 5, 2024, 10:54am

I think this is important, but rather than just saying “we can’t do this sort of thing because of this”, it would be more productive to say “we need to take this into account”.

A good versioning (or migration, or whatever) process as suggested by @stevea would then be able to take such problems into account, hopefully solving them or at least making the problem smaller.

As an example of how this particular issue (data gets suddenly broken for data consumers) could be solved in a versioning/migration process:

New tagging scheme is accepted
1. Objects get dual-tagged in both the old and new tagging scheme (preferably automatically by the editors, might in many cases be implemented just by adjusting the prefixes), QA tools warn of either tagging scheme missing
2. New tagging scheme is applied to existing objects (either in automatic mass edits or over time)
Time passes (measured in years)
Old tagging scheme is officially deprecated
1. Editors and QA tools warn of old tagging scheme being present
2. Old tagging scheme is removed (either in automatic mass edits or over time)

This would give data consumers time to adjust. As these issues often have already been present for a long time, migrating over a long time (3-6 years possible) should not be unreasonable. Do note that this is just one possible solution, which might or might not work together with any other requirements the versioning/migration process would have.
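The two phases above could be checked by a QA tool roughly as follows. This is only a sketch: the `Phase` names, the `qa_warnings` function and the placeholder tags `old_key=x` / `new_key=x` are all made up for illustration and do not correspond to any existing editor or QA tool.

```python
from enum import Enum


class Phase(Enum):
    DUAL_TAGGING = 1  # both schemes are expected on every object
    DEPRECATED = 2    # the old scheme should no longer be present


# Placeholder tags (assumption): stand-ins for "old scheme" and "new scheme".
OLD_TAG = ("old_key", "x")
NEW_TAG = ("new_key", "x")


def has_tag(tags, tag):
    """True if the tag dict carries exactly this key=value pair."""
    key, value = tag
    return tags.get(key) == value


def qa_warnings(tags, phase):
    """Return QA warnings for one object's tags in the given phase."""
    old, new = has_tag(tags, OLD_TAG), has_tag(tags, NEW_TAG)
    warnings = []
    if phase is Phase.DUAL_TAGGING:
        # Warn when only one of the two schemes is present.
        if old and not new:
            warnings.append("new scheme missing")
        if new and not old:
            warnings.append("old scheme missing")
    elif phase is Phase.DEPRECATED:
        # Warn when the old scheme is still present.
        if old:
            warnings.append("deprecated old scheme present")
    return warnings
```

An object that carries both tags produces no warning in the first phase and one warning in the second, which is the point where the old tag would be removed.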

Just throwing a few other random ideas for solutions out here, not really part of the topic

Providing planet dumps that are “normalized” according to a specific version, a new version would come every X years, and dumps for the last Y versions are available
Provide “normalization” scripts in various languages so that each consumer can choose their target language which the rest of their system works with, without having to manually see what has changed over time
Just let it play out, but disallow mass edits, so that consumers would start to see gradual changes over time, and hopefully adapt on their side
Having official “version change days”, one every X years, on which all data is adjusted according to the new tagging scheme, and made very clear to all consumers that this will occur and on which days
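To make the “normalization” script idea a bit more concrete, here is a rough sketch of what such a script could do for a single object. Everything in it is an assumption for illustration: the version numbers, the `CANONICAL` table and the placeholder tags `old_key=x` / `new_key=x` are invented, and no such official rule set exists.

```python
# Hypothetical rule table: for each target version, map any known
# equivalent tag to the tag that version treats as canonical.
CANONICAL = {
    1: {("new_key", "x"): ("old_key", "x")},
    2: {("old_key", "x"): ("new_key", "x")},
}


def normalize(tags, target_version):
    """Return a copy of tags rewritten to the target version's scheme.

    Tags without a rule are passed through unchanged.
    """
    rules = CANONICAL[target_version]
    result = {}
    for key, value in tags.items():
        new_key, new_value = rules.get((key, value), (key, value))
        result[new_key] = new_value
    return result
```

A consumer whose system only understands version 1 would call `normalize(tags, 1)` on every object and would not need to care whether a mapper used the old tag, the new tag, or both.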

This got a lot longer than intended, but my point is: We as a community have problems (such as tagging schemes that aren’t ideal), and it’s a lot better if we actually try to solve them, rather than just vetoing any attempt to improve the situation.

That said, it is very important that problems such as the one you’ve outlined are mentioned (so that they can be taken into account). However, I think it would be a lot better if such posts had a tone of “great initiative, here’s a thing to consider” rather than “that’ll never work, stop trying to improve anything”.

It might in the end very well be that no workable versioning scheme can be devised. But we should not assume that outcome before trying.

os-emmer · January 9, 2024, 7:49pm

@02JanDal thank you for your very productive suggestions.

I really like this idea. I think we all agree that a landuse=forest and a natural=wood is nearly always completely covered with trees, so adding landcover=trees would not be wrong. Once we have added landcover=trees to every landuse=forest and natural=wood, we can inform the data consumers that they should change their software and give them a few years. After that period of time, we can start to redefine landuse=forest and natural=wood to have a clear and distinct meaning.

That way nothing breaks, no data consumer has to instantly change anything and we could still improve the situation.
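The additive first step could be sketched like this for a single object. The function name `add_landcover_trees` is made up, and leaving an existing landcover value untouched is my own assumption about how conflicts would be handled, not something that has been agreed.

```python
# Tags that, per the suggestion above, imply tree cover.
TREE_AREA_TAGS = {("landuse", "forest"), ("natural", "wood")}


def add_landcover_trees(tags):
    """Return a copy of tags with landcover=trees added where it applies.

    Existing tags are never removed or overwritten.
    """
    result = dict(tags)
    is_tree_area = any(result.get(k) == v for k, v in TREE_AREA_TAGS)
    # Assumption: an existing landcover value wins over the automatic one.
    if is_tree_area and "landcover" not in result:
        result["landcover"] = "trees"
    return result
```

Because the old tags stay in place, a consumer that only reads landuse=forest or natural=wood sees exactly the same data as before.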

I am not sure how that works exactly, but in taginfo, there is a list with projects using a tag, see here. Does anyone know, how complete this list is?

Maybe we could introduce some kind of mailing list for data consumers where they get such updates. Posting to this mailing list should be limited to a small group of community-trusted users to prevent spam. What do others think about that?

SomeoneElse · January 9, 2024, 8:56pm

While it’s not complete (for example, openstreetmap-carto hasn’t added itself here) that list is free for data consumers to add themselves to if they want to.

Entries there will tend to have some sort of contact option listed. For example, this is one of mine, and the project URL is a github link that allows new issues to be created.

In addition, data consumers can monitor the usage of the tags that their project uses. I use this script for that.
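For anyone wondering what such monitoring boils down to, here is a minimal sketch. It is not the script linked above; the function name, the `drop_ratio` threshold and the snapshot format (a dict of `(key, value)` to object count) are all assumptions, and how the counts are collected is left out.

```python
def disappeared_or_dropped(previous, current, drop_ratio=0.5):
    """Compare two {(key, value): count} snapshots of tag usage.

    Returns (tag, old_count, new_count) for every tag whose count fell
    to zero or to at most drop_ratio times its previous count.
    """
    flagged = []
    for tag, old_count in sorted(previous.items()):
        new_count = current.get(tag, 0)  # a missing tag counts as zero
        if old_count > 0 and new_count <= old_count * drop_ratio:
            flagged.append((tag, old_count, new_count))
    return flagged
```

Run against yesterday’s and today’s counts for the tags a project uses, this would flag a combination that is being retagged away before the icons vanish from the map.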

I’ve been contacted a couple of times about suggested tag changes, although a cynic might point out that “no longer in the data” occurs in the project changelog many more times than that (i.e. most of the time I find out that something has disappeared either when icons disappear from the map or I get an email because a combination that I’m interested in has disappeared from the OSM data).

dieterdreist · January 10, 2024, 10:38pm

no, it is not Cartographer10 who has to sort something out, it is the people who make a map from the data who will have to decide what to show or prefer or how to mix landuse and landcover in a rendering (maybe depending on which landuse and landcover).

dieterdreist · January 10, 2024, 10:43pm

they do not have options though. They have to live with either landcover or landuse mapped.

Try to find a forest in the data. You cannot automatically tell whether there is a forest or just a few trees in a city.

ZeLonewolf · January 10, 2024, 11:48pm

From my purely American perspective, I do not know what “a forest” is. Would one tree be a forest? ten trees? a hectare of trees? A square kilometer?

In American terminology, a forest is just any large area of trees. It would also be correct, but very archaic, to say “a wood” to describe the exact same concept. The American poet Robert Frost wrote in 1916 in The Road Not Taken:

Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

In modern American terminology, we would also say the woods or the forest to represent the same concept.

It would be unusual conversational usage to say a forest. Now, there IS the concept of a named forest, such as the US National Forests, but these are really just a type of protected area or nature reserve.

Language note: natural=wood and landuse=forest are probably used interchangeably here because the plain (American) English terminology is also used interchangeably. Whereas, in German-speaking areas you have a “Wald” and “Forst” which are both tree-related places but are actually distinct concepts. Thus German speakers no doubt expect wood and forest to correspond to Wald and Forst thanks to the false cognates.

Now, because I’m familiar with past discussions, I think what you might be describing when you say forest is something like “land used for forestry” which is quite a different concept indeed - for example, what is expressed by boundary=forestry, however, that tagging has not proven popular.

Given all this, it is not surprising that we’ve failed to get this right.

Friendly_Ghost · January 10, 2024, 11:49pm

Which is why it’s our responsibility to decide on tagging schemes and to use them consistently. Providing stability to users is part of that. I’m all for innovation, but this is pedantic retagging without any meaningful new content.

Landcover/landuse tagging is binary. Something is tagged with one of those tags or isn’t. Real life forests, on the other hand, are a vague concept. How many trees make a forest? How many grains of sand make a heap?

stevea · January 11, 2024, 12:19am

I’ve been (very patiently) both untangling these in my mind and working with others to achieve consensus to do so for almost 15 years (in OSM), and I only see slow progress here. Thanking my dialect-sharing fellow contributor (though he is thousands of km away on the coast of another ocean) @ZeLonewolf as I echo his sentiments, I additionally say this: to my US English ear, a “wood” is a rather indiscriminately-purposed group of trees (or maybe it has serious protected_area semantics associated with it), while a “forest” has an additional semantic of “timber production” (or maybe parts of it are used for mushroom gathering or cultivation, or parts which are sort of meadow-like are converted into vineyards, or around here, we get patches of greenhouse_horticulture popping up in forests…). A wood is “more primeval,” (though might or might not be virgin, never-cut trees), a forest often / usually / always? has some sort of “productive value is obtained by humans from the area, its trees, some combination, or additional activities which are associated with ‘treed’ lands” semantic associated with it.

This remains complicated, everybody. We all have cultural and linguistic biases about these words and concepts, and while OSM has put much effort into unsnarling this, we still have a long ways to go.

ZeLonewolf · January 11, 2024, 12:32am

It is interesting that you say this, because I know no such distinction. I suspect that’s because we don’t really do timber production here (you would have to go to far more remote parts of New England for that). What I would say is that for me, linguistically, “forest” has a mildly stronger degree of formality and stiltedness than “woods”, in the same way that a person might be called “James” instead of “Jim”.

Of course, if you say the word “forestry”, I would be confident that we’re talking about the same thing.

SomeoneElse · January 11, 2024, 12:47am

With a “data consumer” hat on, I have to say that that simply isn’t true, at least not for me.

What nobody does is to take an OSM key such as landuse that they are interested in and then see what values OSM users have assigned to it. The approach is the other way around - “I’m interested in trees” (so look at forest, wood, tree and various others). Some of those values may have landuse keys, some natural, among others (boundary).

To take another example, someone who’s interested in (say) the road network isn’t going to make an assumption about what highway means in OSM; they’re going to see how OSMers have mapped real-life objects to give them clues what to look for.

SomeoneElse · January 11, 2024, 1:31am

Absolutely. Here’s another example - someone recently wrote a diary entry about fox coverts. These are small areas of woodland historically associated with fox hunting. The “land cover” would be certain sorts of trees (and undergrowth), but the use would be … what exactly?

It certainly wasn’t primarily fox hunting. Even before hunting of foxes by dogs in the UK was (mostly) banned nearly 20 years ago, hunts didn’t operate that often. Fox coverts mostly existed for the same reason as other small patches of woodland - to provide wood, to provide a bit of cover from the wind - and also just to look nice. The idea that there is just one land use (and just one land cover) simply isn’t true.

stevea · January 11, 2024, 2:15am

Yes, “same thing” roughly, maybe sharply, though there are so, so, so many flavors. “Forestry” can mean many things (human activities) happening upon a land denoted (as a polygon boundary, say). We have smeared landuse (where an overlap happens: the “active verb” context of the word forestry means “use” that is active and present) in many places with zoning confusions, but it gets better as the semantic meaning has become less blurry: a landuse tag means “being used in this way right now,” whereas zoning, or proposed or permitted use, is not the same as actual use (over the whole parcel, say). That’s a big part of the blurriness that isn’t talked about out loud very often, but it’s a form of confused tagging that is certainly out there in our map. (Where it comes to distinguishing landcover). “We are what we render,” not exactly, but renderings can both enlighten and confuse.

We wish to “paint our map so we can see it.” While we are what we render, our data are also evolving for the better.

We seem to be in a phase of teaching ourselves what we do not know. I’m OK with that.

It is our tags that are most important. Renderings and overlays are neat, yet they are interpretations. I think it best we interpret the data directly, as “what do we mean by these tags?” That’s simply how I think all this works, really. Tag well.

02JanDal · January 11, 2024, 9:27am

Just to give the perspective from another language (Swedish, though also having worked with landcover/landuse classification, and additionally being fluent in German though I don’t know if that influences it) (and who also happens to have a harvester working in his forest right now):

In daily speech, we’d call it all “skog”, though it’s possible to be more specific, for example “skogsdunge” (small area of forest, usually within a field or residential area, normally no economic value (that is, no forestry activities), I’d tag that as natural=wood), “naturskog” (any size, no forestry but is left to grow naturally, commonly, but not exclusively, part of nature reserves, I’d tag that as natural=wood) or “skogsskifte” (part of a forest with forestry activities, roughly 1:1 with the parcels in the forest, I’d tag that as landuse=forest though for the entire forest, not on individual parcels).

Since so much of our country is forest, and at least in “my” part of the country the vast majority of that is used for economic value, I usually tag all forest areas over about a hectare with landuse=forest, since that’s a pretty safe bet and actually differentiating between “naturskog” and economic forest usually requires on-the-ground knowledge (and most of my mapping is armchair mapping).

Having worked some with landuse/landcover classification in a previous job, I also want to add some input on that (general) topic:

To begin with, the differentiation between the two is very hard for most people, which is something OSM should take into account (so that we don’t raise our entry bar too high).

But, there definitely is a significant difference between the two, though they aren’t completely orthogonal. As an example, a landuse of “forestry” would usually contain landcover such as “trees”, “scrubs” and “grass”, with some “water”, “road” and “wetland” interspersed. A landuse of “industry” on the other hand would contain landcover such as “built area”, with some “water”, “grass” and “road” as well. But these aren’t rules; a landuse of “forestry” could for example also contain a small section of “built area” with a logging cabin.

Landuse is often more “coarse” than landcover, an entire neighborhood could have the landuse “residential”, but within it there can be landcover of “building”, “grass”, etc. Though depending on how detailed one wants to get it is also possible for landuse to be hierarchical (e.g. a small vegetable garden, within a residential garden, within a larger residential area). Currently, I tag landuse=farmland excluding impediments and ditches, though technically they are part of the landuse, according to some definitions. All of earth is covered by some landcover, while not every square meter has a use (e.g. the ocean, large tundras, etc.), though it’s not always possible to easily tell if there’s a use (one would have to know some forestry to be able to tell if a forest is actively forested and thus well kept, or is just left to grow without any intention to use it for anything). Landuse can overlap (a ditch between an industrial complex and a residential neighborhood, used to drain both, would be included in both), landcover is (usually) exclusive.

Right now, this is handled… weirdly in OSM, since there is no clear distinction between landuse and landcover. I have actually talked to a person who chose not to use OSM because of this. That’s only one data point, but it means that at least one person was put off by the current state.

That said, maybe there is a way forward to keep the current tagging scheme with minor changes that’ll work out for most producers and consumers and reduce the current confusion. But the fact that most of the geospatial world handles it as landuse+landcover, despite the confusion for newcomers, makes me doubt that.

I think it would make sense to introduce a clearer distinction between landuse and landcover in OSM, because I think the current state of affairs hinders us more than a change would. Though maybe we can do without any new keys, just refining the existing ones?

Most values of natural=* already map quite nicely to most common cases of landcover except built/exploited landcover, but that’s also covered already (by building=*, area:highway=*, etc.). So rather than introducing a new key, we could maybe just go through the landuse and natural keys and move individual ones between them (if required)?

Coupled with clarifying the distinction in the wiki, especially for newcomers, and possibly adjusting presets (maybe all landuse should be called e.g. “Used for forestry”), I think we could get quite far in improving the situation, without introducing major sweeping changes. We’d have to document clearly that natural=* means (non-artificial) landcover so that it does not get diluted again, but I think it should be doable to keep it clearer if it’s just documented.

I also propose that we try to keep this topic about discussing how we could change, and anyone who wants to discuss that a change isn’t necessary or worth it should create a new, separate thread.