We need to talk about the Canvec elephant in the room

Sorry for the catchy title, but I am actually highly surprised nobody has raised this issue in the Canada sub forum before.

Of course, the Canvec import has a long history in OSM, but over the past few months, a massive import of Canvec data has been taking place.

That by itself should not necessarily be an issue if done with due care - although in many similar cases it would have attracted quite a bit of attention and started discussions - but I would like to raise a particular issue regarding the current import:

There is a highly inconsistent and apparently broken import of “woodland / forest” type polygons taking place, which seems to be caused by the import tools and processes currently in use. Polygons of neighboring Canvec tiles do not match in their forest cover. Some of that may be a “natural” consequence of the data being produced at different dates by Natural Resources Canada, which is the main producer of the data if I am not mistaken. However, the differences are so big that I doubt this is the main or real cause; they point in the direction of the import processes themselves instead.

Looking at the map, this already appears to have been an issue with the old Canvec data imported many years ago, but it does not seem to have been fixed before the decision to continue importing Canvec “woodland / forest” data this year.

To investigate this issue a bit further, I downloaded official Canvec data in GIS format directly from the government’s website and made the comparison shown below. As you can see, the two recently imported tiles in the left image, taken from OpenStreetMap, show an incongruent discontinuity between the tiles, whereas the same tiles seem to match perfectly in the actual data shown in the right image, visualized in a GIS. This raises some serious doubts regarding the current import process.
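For anyone who wants to put a number on this kind of seam mismatch rather than eyeball it, here is a minimal Python sketch. It assumes you have already extracted, for each of the two neighboring tiles, the stretches of the shared edge that touch forest polygons, expressed as (start, end) distances along that edge. The function name and the input format are my own illustration and not part of any Canvec or OSM tool.

```python
def _covered(intervals, x):
    """True if position x along the shared edge lies inside any forest interval."""
    return any(start <= x <= end for start, end in intervals)


def seam_mismatch(left, right):
    """Total length of the shared tile edge where exactly one side has forest.

    left, right: lists of (start, end) distances along the shared edge,
    one list per tile (hypothetical input format).
    """
    # Every interval boundary is a point where coverage can change.
    points = sorted({p for start, end in left + right for p in (start, end)})
    mismatch = 0.0
    for a, b in zip(points, points[1:]):
        mid = (a + b) / 2
        # Coverage is constant between two consecutive breakpoints,
        # so testing the midpoint is enough.
        if _covered(left, mid) != _covered(right, mid):
            mismatch += b - a
    return mismatch


# Forest along the whole edge on one side, with a gap on the other side.
print(seam_mismatch([(0, 10)], [(0, 4), (6, 10)]))
```

A result close to zero would suggest the tiles line up; a result close to the full edge length would suggest the kind of discontinuity described above.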

The discontinuity between tiles is especially disturbing, of course, but there is the secondary and partly related issue of what actually constitutes a “forest”. Canvec has a very fine-grained classification for forests, but only the “Forest” and “Dense” classes seem to be what is generally considered “woodland/forest” in OpenStreetMap. Reviewing satellite imagery as a comparison has also shown me that the “Open” and “Sparse” classes are mostly wetland scrub with sparse trees, which can hardly be classified as forest. The existing classifications also don’t seem to be handled consistently from tile to tile.

As a consequence, much of what is now displayed as one huge green “forest” in OpenStreetMap’s Canada coverage is in reality a highly diversified and highly fractured forest cover that is really poorly represented by the current import. There is essentially a huge over-representation of forest cover.

It seems that there are two problems here with the current Canvec import tool: 1) clearly topologically incorrect construction of “woodland / forest” polygons based on the underlying technical data leading to “tiles” being almost entirely turned into one giant forest polygon, and 2) likely misclassification and / or inconsistent usage of the existing forest classes for building “woodland / forest” polygons.
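To make point 2 a bit more concrete, here is a rough Python sketch of how the classes could be handled consistently. The class labels are the ones I mentioned above; the actual attribute names and values in the Canvec files may well differ, and the function name is hypothetical. The only OSM tag used is `natural=wood`. What the “Open” and “Sparse” classes should become is deliberately left open, since that is for the community to decide.

```python
# Class labels as quoted in this post; the real Canvec attribute values
# are an assumption and may be spelled or coded differently.
FOREST_LIKE = {"Forest", "Dense"}
NEEDS_REVIEW = {"Open", "Sparse"}


def suggest_osm_tags(canvec_class):
    """Return suggested OSM tags, or None if the polygon should not be
    imported as woodland without manual review."""
    if canvec_class in FOREST_LIKE:
        return {"natural": "wood"}
    # "Open" / "Sparse" and any unknown class: do not guess a tag.
    return None


for cls in ["Forest", "Dense", "Open", "Sparse"]:
    print(cls, "->", suggest_osm_tags(cls))
```

The point is only that the same rule should be applied to every tile, so that neighboring tiles cannot end up with different interpretations of the same class.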

What do people in the Canada community think of the current situation, and what remedies could be employed?


Honestly, at least in urban/suburban/rural areas in Quebec, I found that the forest coverage from Canvec is always outdated and most of the time needs to be completely retraced. I don’t think importing forest cover in batch from Canvec has helped in any way getting accurate tree coverage. I may be wrong in other parts of the country though.


Do you happen to mean this user?

I’ve tried to contact them in the past about an issue with their import but I wasn’t successful.

(Not a member of the Canadian community - just someone who occasionally does QA on island mapping and noticed some poor quality data)


Yes, I guess urban areas may easily become outdated due to rapid urban development. Natural and managed woodland areas aren’t static either, but they at least develop at a much slower pace, and the most remote natural areas that are already in their latest succession stages should be relatively stable as well.

I don’t want to judge whether an import of forest data itself is useful or not, that is up to the Canadian community, although I do understand the desire to have more extensive landcover, similar to other countries (I am not a Canadian myself).

My main concern for now is the clearly broken and inconsistent import.

Yes, I left a changeset comment on the same account quite a while ago.

Unfortunately, due to the inability to easily go back to your own changeset comments, and the vast amount of edits associated with that account, I haven’t been able to find that changeset again.

What I remember from the response I got (yes, I did get one) was a kind of surprise about the issue with the woodland, as if the account owner wasn’t aware of the issues. That in turn is slightly surprising to me, considering how obvious the issues are, and the near certainty that any serious hobbyist craft mapper would always check their own work and immediately detect the problem.

After that changeset, the addition of woodland temporarily seemed to halt in favor of continuing to add the other stuff like lakes and waterways, up until the recent resumption.


Thanks! Slightly off topic but you can find a list of your changeset comments through your own page at https://hdyc.neis-one.org/ :slight_smile:


Yes, slightly off topic, but thanks anyway for pointing out that website, which I wasn’t aware of. Super nice to see those personal statistics as summarized there, and the ability to easily see your own changeset comments.

As for more technical background: part of the issue is likely related to the translation from the Canvec topological data model to OSM. One old (2010) but still interesting wiki page related to this is this one:
https://wiki.openstreetmap.org/wiki/CanVec:_Geometric_Model

And some more about Canvec:


Hi. I created an account on the forum because I was notified about this post through Discord, where I am active. Regarding the forests, it’s something that I don’t currently have a fix for. Because that is a bit above my pay grade, I’ve opted to import only tiles that have little to no forest cover. I know that this is not a solution and simply avoids the problem, but I figured that at least adding the data in non-forest regions was still helpful to the map.

Regarding my forest cover: When I first started with Canvec, I did do tiles with forests. However, I don’t enjoy importing those because of the lack of consistency (it always felt like adding questionable data, but I guess if there is a better solution the forests can simply be deleted). I did run into a section of forest during my current import of tile 065. This is circled in red on the map. I won’t be going further south, so that I don’t have to deal with those tiles. The blue line on the map shows the approximate forest cover in the current Canvec data available for OSM. I don’t import tiles below that blue line anymore. North of the blue line, I will continue importing. South, I won’t.

To osmuser63783 - After the import of tile 065 is done, I’m going back and cleaning up previous tiles I did, and Banks Island is the first section to be cleaned up. This was the first set of tiles that I did, so not everything is stitched together as well as it could be. The island nodes are simply a problem with canvec data, and those will be removed. I did not forget about this, I promise - it’s been hanging in the back of my mind for a while. I simply have not gotten to it as my current project is tile 065.

Regarding the entire rest of Canada: I have no idea how to redo the forest MPs. My best guess would be to get a brand new set of forest cover data from Canvec itself that includes the different denominations. This can be imported using placeholder tags, then changed later. The landform tag with current Canvec data is an example of that. If we can actually get this data, it would of course be imported separately from the existing waterways/bodies. Side note: I do plan to write a proposal to deprecate the landform tag sometime in the future, as it is inconsistent with the rest of the world.

Once again, I know I’m not actually solving anything here; I just wanted to clarify a few things. Please ask if you need more clarification, I am happy to discuss workflow. I have my process down to a science and am super efficient. That’s also why I import in the vast wilderness of NU and NWT. No other editors/civilization are there to worry about messing up.

Another clue about forest is that in early Canvec/OSM versions, forest came from photo interpretation, while in the last one it came from satellite image classification. We should also remember that Canvec/OSM is not updated anymore (since 2012).

I think as a community we need to make it clear that a data source not updated since 2012 is no longer suitable for importing, even if it might be of occasional use as a general reference.

There are longer-term questions about what to do with CanVec, but let’s stop putting more data in OSM that we know is at least 12 years out of date, and often more.


I agree that the data is not useful in urban areas, but the vast majority of Canada’s square mileage is undeveloped, and as such I feel the lack of updates to those portions of the country is irrelevant. The millions of random bodies of water in the Canadian wilderness have not changed in centuries and likely will not change for centuries more.
Also, due to the lack of people contributing to OSM in Canada, how long would it take to map the country without Canvec? I agree that some of it is questionable in quality, but a mostly good data source that spans the country is a lot easier to clean up than to map the whole country without it. For reference, it takes me about an hour to stitch, fix, and conflate roughly 150k-200k changes worth of data. How long would it take to do that by hand instead? It’s such a gargantuan task that the country would never be finished.


While I don’t have anything to add to this conversation at the moment, I do want to mention that I am reading everything posted here and would welcome any feedback to my import workflow :slightly_smiling_face:

I think any new CanVec imports need to go through some consultation. What was proposed for the imports was based on a CanVec that was being kept up to date, which stopped in 2012.


CanVec is too outdated to be trusted. It shows rails near where I live as Track Status: Operational, on a line that has been a footpath for almost 3 years and presumably disused for longer than that.


@dmich9 and @jmarchon, good that you both joined, as you both appear to have been doing the majority of these imports.

@dmich9: I agree it is likely that the majority of geological stuff like lakes and riverbeds is still valid data.

These are the less contentious features of Canvec. Natural stuff like vegetation will change over a decade or more, especially if it is not a late successional stage vegetation. Still, the biggest problem here is not so much the import of vegetation related data, but the very obviously broken toolchain causing topologically wrong data for these features.

This appears to be not such a big issue with water features, but it is a clear major issue with the woodland data. I also have doubts about the wetland data, but haven’t reviewed that in depth.

However, I do not agree that “fixing and cleaning up” afterwards is the right approach here, especially not with the totally broken toolchain for woodland. This imported data cannot be repaired; it would amount to having to re-digitize all of it.

As such, IMO, the only viable way forward is to stop importing woodland altogether, and do a thorough review of the other nature related features (wetland only?) being imported before accepting them as valid input.

I would also strongly recommend that both of you revisit all the data you have added, and remove all of the woodland data added so far. Truly, maybe only 20-30% of the tiles have proper data; the rest is plagued by the topological errors and is essentially totally broken data.

If there is enough consensus within the Canadian community, the water features, probably as the only part of Canvec, could likely continue to be imported if done with care.

Yes, I do have a few questions:

  • Do you follow the instructions for the Canvec import as listed here:
    Canvec Import Guide - OpenStreetMap Wiki
    and thus use the data from:
    https://ftp.maps.canada.ca/pub/nrcan_rncan/vector/osm/

  • Have you looked at Natural Resources Canada’s “Geospatial Extraction Tool” as a possible alternative for getting (woodland) data? Yes, it will be a lot more cumbersome and involve some form of data conversion, but at least the data will be in a proper, GIS-ready topological structure. I would still recommend running it through the QGIS geometry validity checker though, as the official government data I downloaded through that tool still contained minor geometry errors according to the checker.
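To give an idea of what such a validity check looks for, here is a small plain-Python sketch that tests a single polygon ring for two common problems: a ring that is not closed, and edges that cross each other. This is only a simplified illustration written for this post; it is not the QGIS checker or its API, and it ignores things a real checker also handles, such as holes, duplicate nodes and edges that merely touch.

```python
def _orient(a, b, c):
    """Signed area test: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 properly cross (touching does not count)."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def ring_problems(ring):
    """Return a list of problems found in a ring given as [(x, y), ...]."""
    problems = []
    if len(ring) < 4:
        return ["fewer than 4 points"]
    if ring[0] != ring[-1]:
        problems.append("ring is not closed")
    n = len(ring) - 1  # number of segments
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last segment are neighbors in a closed ring
            if _segments_cross(ring[i], ring[i + 1], ring[j], ring[j + 1]):
                problems.append(
                    "self-intersection between segments %d and %d" % (i, j)
                )
    return problems


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
print(ring_problems(square))
print(ring_problems(bowtie))
```

The pairwise comparison is quadratic in the number of nodes, so for real tiles with large multipolygons a proper GIS tool is the practical choice; the sketch is only meant to show the kind of error being checked.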

I do also agree with Paul that a wider discussion is warranted. I understand the strong desire to fix the “white plains” of Canada and have some proper content there instead of a solid background. But adding broken data is not the way forward IMO.


I do use the data from the source you listed. I initially followed the Canvec Import Guide, but have slightly tweaked it to be more efficient. I am not opposed at all to getting the data from a different source - but I’m not sure how exactly to do so.
Ultimately, I am not very experienced with the technicalities of GIS data. I enjoy doing Canvec because the process is relatively simple - as the data is already in OSM format.

Quick side note as well - you may have noticed that I am still importing Canvec as this conversation is going on - but I can assure you that these contain little to no woodland data.

I may or may not have written that guide :wink:


I’m really torn on writing this reply, because I don’t want to dissuade you from contributing, but two things you wrote here really strike at fundamental questions of “What’re we doin’ here?”:

  1. You’re not wrong that using Canvec data is almost certainly the fastest way to fill in the gargantuan ‘blanks’ on the map, and certainly “doing it by hand” would take a person eons. However, when the source data is knowingly out-of-date and of questionable quality/precision/accuracy to begin with, it raises the question: would you rather fill gaps on the map with info that’s kinda-sorta correct, or wait for better sources to come along?

  2. If the answer to 1. is “I’d rather fill the gaps now so that the map will be finished,” keep in mind that’s really only true in the sense that the map will be “full”. You make good points about how “millions of random bodies of water in the Canadian wilderness have not changed in centuries and likely will not change for centuries more,” but… that’s really kind of a half-truth in my opinion. Many of them haven’t changed, but many of them have. Rivers run new courses, lakes become ponds, ponds become wetlands, wetlands become clearings, clearings become forests a lot quicker than one might think. No data source is ever going to be perfect, because they’re just snapshots of an ever-changing world. OSM, such as it is, will never truly be “finished”.

I appreciate that you’re mostly mapping stuff way up north that frankly we’re unlikely to get better sources for in the coming decades, and probably when you’re “finished” most people won’t touch it again for years to come. But that Canvec data really is quite spurious when looked at with a finer-toothed comb. I’ve written before that anecdotally most of the Canvec-import data I come across in my OSM travels (mostly Southern and Central Alberta) was already inaccurate and out-of-date when it was imported in the first place over a decade ago, let alone now. You write “[…] a mostly good data source that spans the country is a lot easier to clean up than to map the whole country without it,” but in my experience the exact opposite is true: it’s easier to scrap it and start from scratch than to salvage it.


Just a quick note that I haven’t seen in this thread yet. No map can be finished, and no map can ever be exactly true - something will always become out of date. The same is true of any data sources. Canvec is certainly somewhat outdated. How up to date is the aerial imagery, though? I doubt the whole north is being overflown and photographed every other year. Most OSM editing tools don’t make it easy to find out the date of the imagery being used, either.
