I wondered how the Canadian community feels about the address interpolation lines that come with the CanVec imports. They have become a noticeable issue in the Nominatim search engine.
The address interpolation QA report gives you an overview of the interpolation lines Nominatim takes issue with. At the time of writing, more than 70,000 OSM ways are flagged as problematic in Canada. The two main errors are:
interpolations without intermediate numbers (an even interpolation from 6100 to 6102, for example, or worse, one going from 6100 to 6100). You might argue that such an interpolation line is still useful to indicate that the exact location of the number is unknown. Examples: way 235749794, way 92668977.
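The first check can be sketched in a few lines of Python. This is a minimal illustration, not Nominatim's actual QA code: the function name is hypothetical, and it only handles the `even`, `odd` and `all` values of `addr:interpolation` (anything else, such as `alphabetic`, is rejected).

```python
def intermediate_numbers(start: int, end: int, kind: str) -> list[int]:
    """House numbers strictly between the two end nodes of an
    addr:interpolation way (hypothetical helper, illustration only)."""
    steps = {"even": 2, "odd": 2, "all": 1}
    if kind not in steps:
        raise ValueError(f"unsupported addr:interpolation value: {kind!r}")
    step = steps[kind]
    lo, hi = sorted((start, end))  # the direction of the way does not matter here
    return list(range(lo + step, hi, step))


# An interpolation is flagged when it implies no intermediate numbers at all.
for start, end in [(6100, 6102), (6100, 6100), (6100, 6110)]:
    numbers = intermediate_numbers(start, end, "even")
    print(start, end, numbers or "flagged: no intermediate numbers")
```

Sorting the end points first means a way drawn from the higher to the lower number is judged the same way as one drawn in the other direction.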
On top of that, there are interpolations that don't follow any street or path that is visible in OSM or in aerial imagery (way 143606764). These are not even flagged in the QA report.
Finally, I've noticed that the interpolation lines are not deleted when the actual house numbers are added. Example: OpenStreetMap
What would be the best way to go about improving the quality of this data? Would it be possible to add some preprocessing to the import that avoids importing the worst offenders?
Edit: removed way 143579633 as a bad example, because the interpolations in the middle of nowhere do have addr:street, so there is some information about the address.
Well, it certainly is a bit of a mess. A lot of the issues I've seen in my neck of the woods seem to have come from old imports that were cut off in odd places, causing interpolation points to "clamp" to the edge of… tiles, I assume? Before the import from earlier this year, you could see the cutoff lines in some parts of Longueuil.
As for how I feel about it, though, I think this data is indispensable, and I'd rather have the current slightly broken version than nothing.
I've only really been active for something like a year, maybe a year and a half, but the vast majority of the work I've done in that period is surveying addresses. This is because prior to the most recent import, huge portions of Longueuil had no address data at all. And sadly, since most of the regional mapping attention is focused on Montreal, fully surveying these addresses and keeping them up to date is likely impossible without imported data. With the current data, it's at least possible to look up the rough location of addresses.
I've not been involved in any data imports, but I would welcome any work to improve their accuracy. In my experience, the recent import fixed a lot of weird and broken interpolations in my area, but it still seems to stretch single addresses to cover the full area around the building, as you say. (Here's a recent example for folks; I could locate a bunch of these if needed.)
As an aside, I'm glad this came up, because I haven't been deleting interpolations very consistently after address surveys. I wasn't entirely sure whether it should or shouldn't be done, so I mostly stopped deleting the interpolations along streets I'd surveyed, out of an abundance of caution (that example wasn't one of mine, though!).
This is probably just a case of newbie-itis though!
You're right. In order to respect the OSM API's capabilities at the time the CanVec product was created (2010-2012), the map sheets were divided into square tiles varying in size based on the number of features in each tile.
In addition to the sorts of errors or "quirks" of the interpolation ways you've already identified, I found that the ones I deleted last year in Bragg Creek, Alberta, were often placed on the wrong sides of roads (i.e. the "odd" and "even" numbers were erroneously swapped), or went the wrong way (i.e. the lower number at the start and the higher number at the end of the interpolation way were erroneously swapped).
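Where a street already has some individually surveyed addresses, swapped sides can at least be detected automatically. A minimal Python sketch, assuming a single straight road segment from `a` to `b` and positions in a projected, metric coordinate system (all names are hypothetical): the side of a point is the sign of the 2D cross product, and an interpolation is flagged when its parity lies on the opposite side from the surveyed numbers of the same parity. It cannot catch a street where everything is swapped consistently and nothing has been surveyed yet; that still needs local knowledge.

```python
from typing import Iterable

Point = tuple[float, float]  # (x, y) in a projected, metric coordinate system


def side_of_road(a: Point, b: Point, p: Point) -> str:
    """Side of point p relative to the directed road segment a -> b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross == 0:
        return "on"
    return "left" if cross > 0 else "right"


def swapped_side_interpolations(
    a: Point,
    b: Point,
    surveyed: Iterable[tuple[int, Point]],
    interpolations: Iterable[tuple[int, Point, str]],
) -> list[int]:
    """surveyed: (housenumber, position) of individually mapped addresses.
    interpolations: (way_id, midpoint, kind) with kind "even" or "odd".
    Returns ids of interpolations whose parity is on the 'wrong' side."""
    observed: dict[str, set[str]] = {}
    for number, pos in surveyed:
        parity = "even" if number % 2 == 0 else "odd"
        observed.setdefault(parity, set()).add(side_of_road(a, b, pos))

    suspects = []
    for way_id, midpoint, kind in interpolations:
        sides = observed.get(kind)
        # Only judge when the surveyed evidence for this parity is unambiguous.
        if sides and len(sides) == 1 and side_of_road(a, b, midpoint) not in sides:
            suspects.append(way_id)
    return suspects


road_a, road_b = (0.0, 0.0), (100.0, 0.0)  # a road running east
surveyed = [(2, (10.0, 5.0)), (4, (20.0, 5.0)), (1, (10.0, -5.0))]
interpolations = [(101, (50.0, 5.0), "even"), (102, (50.0, -5.0), "even"), (103, (60.0, 5.0), "odd")]
print(swapped_side_interpolations(road_a, road_b, surveyed, interpolations))  # [102, 103]
```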
As I said in my reply to the list: I generally replace the interpolations with actual building-by-building or POI-by-POI addresses, and delete the interpolations when I'm done. Their accuracy is often dubious at best. In the absence of more precise addresses I think they're better than nothing, but once more precise addresses are added, in my opinion they become entirely redundant and not worth keeping.
You ask,
What would be the best way to go about improving the quality of this data? Would it be possible to add some preprocessing to the import that avoids importing the worst offenders?
and I'm a little concerned: are you thinking of "re-importing" the CanVec data? I wouldn't. I think it's a pretty common sentiment that CanVec was better than nothing, but that going forward it isn't accurate enough to satisfy most users, and there's no point going back to the same "well" to fix the original errors.
Frankly, as time-consuming as it would be, I think the only way to do it properly is slowly and methodically: double-checking each interpolation one by one and fixing or replacing it.
Also, as an aside:
FYI those addresses are simply out of alignment with the underlying data; the associated roads are on the map, but located about 600 m south of the address interpolation ways, and tagged (without names) as highway=path.
Thanks for the responses. It is great to see that people are busy slowly replacing the interpolations with building house numbers sourced from surveys. And yes, please do delete the interpolation lines when a street is completely surveyed. It will help reduce confusion for data users (and also tell the next mapper that the street is done).
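Whether a street is "completely surveyed" with respect to one interpolation can be checked mechanically. A minimal Python sketch with hypothetical names: it lists the numbers an interpolation implies that have not yet been mapped individually. An empty result suggests the interpolation has become redundant; a non-empty one does not prove the street is unfinished, because some implied numbers may simply not exist on the ground.

```python
def implied_numbers(start: int, end: int, kind: str) -> list[int]:
    """All house numbers covered by an addr:interpolation way, end points included.
    kind must be "even", "odd" or "all" in this sketch."""
    step = {"even": 2, "odd": 2, "all": 1}[kind]
    lo, hi = sorted((start, end))
    return list(range(lo, hi + 1, step))


def missing_numbers(start: int, end: int, kind: str, surveyed: set[int]) -> list[int]:
    """Implied numbers that have no individually mapped address yet."""
    return [n for n in implied_numbers(start, end, kind) if n not in surveyed]


print(missing_numbers(2, 10, "even", {2, 4, 8}))            # [6, 10]
print(missing_numbers(2, 10, "even", {2, 4, 6, 8, 10}))     # [] -> candidate for deletion
```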
Oh no, I wasn't thinking that. Some of the broken interpolations were created only yesterday (way 1282312246). This is also what prompted me to write in the forum here. I was wondering whether it would be better to pause these imports and improve the process before continuing.
That's more along the lines of what I was thinking. In particular, I'm wondering whether there is something remote armchair mappers could help with.
One thing that comes to mind is the interpolation lines that are cut at CanVec tile boundaries. It should be fairly easy to create a list of the problematic ways and to fix them remotely by joining the ways. This could be a MapRoulette challenge.
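Such a list can be built offline from an extract. A minimal Python sketch, assuming the interpolation ways and their nodes have already been loaded into plain dictionaries (the names and data layout are hypothetical): it groups end nodes that carry no `addr:housenumber` by their rounded coordinates and reports pairs of interpolation ways that meet at such a position, which is what a way cut at a tile edge typically looks like.

```python
from collections import defaultdict
from itertools import combinations


def split_candidates(ways: dict[int, list[int]], nodes: dict[int, dict]) -> list[tuple[int, int]]:
    """ways: {way_id: [node_id, ...]} for addr:interpolation ways.
    nodes: {node_id: {"lat": float, "lon": float, "tags": dict}}.
    Returns pairs of ways whose untagged end nodes share a position."""
    ways_at = defaultdict(set)
    for way_id, node_ids in ways.items():
        for end_id in (node_ids[0], node_ids[-1]):
            node = nodes[end_id]
            if "addr:housenumber" in node["tags"]:
                continue  # a proper address end point, not a cut
            # OSM stores coordinates with 7 decimal places.
            key = (round(node["lat"], 7), round(node["lon"], 7))
            ways_at[key].add(way_id)
    pairs = set()
    for way_ids in ways_at.values():
        pairs.update(combinations(sorted(way_ids), 2))
    return sorted(pairs)


ways = {1: [10, 11], 2: [12, 13], 3: [14, 15]}
nodes = {
    10: {"lat": 45.0, "lon": -73.0, "tags": {"addr:housenumber": "2"}},
    11: {"lat": 45.001, "lon": -73.0, "tags": {}},
    12: {"lat": 45.001, "lon": -73.0, "tags": {}},
    13: {"lat": 45.002, "lon": -73.0, "tags": {"addr:housenumber": "20"}},
    14: {"lat": 45.1, "lon": -73.1, "tags": {"addr:housenumber": "1"}},
    15: {"lat": 45.2, "lon": -73.1, "tags": {"addr:housenumber": "9"}},
}
print(split_candidates(ways, nodes))  # [(1, 2)]
```

Each reported pair could become one task in a challenge; the actual joining would still be done and checked by a mapper in an editor.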
Other improvements are more problematic without local knowledge, and I wouldn't want to start larger edits without the consent of the local community. Condensing one-address interpolation lines to points might work with armchair mapping. At least short ones like way 357011203 should be fairly safe to do (except that it is sometimes not clear whether the number exists at all); longer ones like way 1281057082 are a bit trickier, but it might be possible to identify from aerial imagery the one house along the road that the address likely belongs to.
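Finding the one-address cases and proposing a starting position can be automated, while the choice of house stays with the mapper. A minimal Python sketch with hypothetical names: it treats an interpolation as "one-address" when both end numbers are equal, and returns the point halfway along the way as a placeholder for a single address node. Distances use an equirectangular approximation, which is an assumption that only holds for short ways.

```python
import math
from typing import Optional

LatLon = tuple[float, float]


def point_along(coords: list[LatLon], fraction: float) -> LatLon:
    """Position at `fraction` (0..1) of the way's length (equirectangular approximation)."""
    scale = math.cos(math.radians(coords[0][0]))  # shrink longitude degrees
    xy = [(lon * scale, lat) for lat, lon in coords]
    lengths = [math.dist(p, q) for p, q in zip(xy, xy[1:])]
    remaining = fraction * sum(lengths)
    for (lat1, lon1), (lat2, lon2), length in zip(coords, coords[1:], lengths):
        if length > 0 and remaining <= length:
            t = remaining / length
            return (lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1))
        remaining -= length
    return coords[-1]


def condense_candidate(start: str, end: str, coords: list[LatLon]) -> Optional[LatLon]:
    """For a one-address interpolation, return a placeholder position for a
    single address node; a mapper still has to move it to the right house."""
    if start != end:
        return None
    return point_along(coords, 0.5)


print(condense_candidate("24", "24", [(45.0, -75.0), (45.0, -74.999)]))  # roughly (45.0, -74.9995)
print(condense_candidate("2", "4", [(45.0, -75.0), (45.0, -74.999)]))    # None
```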
Maybe you already had other ideas for armchair fixing.
That's a case that is clearly not fixable without going out and surveying. But that raises the question: isn't it better to delete the data completely in that case? Right now it does quite a bit of damage because it leads people to the wrong place. Better to have no data. And when the error is eventually fixed by surveying, it can be done with on-the-ground data. The CanVec data will not help.
Yeah, that's tricky… I don't really have a better suggestion than leaving it to local mappers, if we have 'em. Your example of the "short, single-address ways" is emblematic of the sorts of problems we'll face without boots-on-the-ground knowledge: there's a building at the corner of Earnscliffe Avenue and Little Road with potentially three different addresses. Is it 2 Little Road (following the interpolation way on the west side of the building)? Is it 4 Little Road (following the interpolation way on the south side of the building)? Or is it 24 Earnscliffe Avenue, following the interpolation way on the east side of the building? It's perfectly conceivable they may all be correct, in some way: maybe there are multiple entrances to the building that each have different addresses. I don't know how to figure this out without the aid of someone with a lot of local knowledge.
Well, we could just move the interpolation ways 600ish metres south, and that would certainly improve the accuracy quite a bit. Whether the interpolations themselves are correct in the first place is another, underlying issue.
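The arithmetic behind such a shift is simple if the offset really is uniform. A minimal Python sketch, assuming a spherical Earth and that the offset (roughly 600 m in this case) has been measured against the highway=path ways or imagery first; the function name is hypothetical, and the shift says nothing about whether the interpolation numbers themselves are right.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius, spherical-Earth approximation


def shift_south(coords: list[tuple[float, float]], metres: float) -> list[tuple[float, float]]:
    """Move every (lat, lon) vertex `metres` due south; longitudes are unchanged."""
    dlat = math.degrees(metres / EARTH_RADIUS_M)  # about 0.0054 degrees for 600 m
    return [(lat - dlat, lon) for lat, lon in coords]


print(shift_south([(45.0, -75.0), (45.001, -75.001)], 600))
```

Moving due south only changes latitude, which is why no cos(latitude) correction is needed here.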
What I do is move the interpolation lines out of buildings in JOSM and place them in front of the building, because if they are inside the building, they're usually mistaken for the address of the building and never get surveyed (e.g. StreetComplete will not ask for an address if the interpolation is inside the building). A lot of places where I am (GTA area) have this, and it's annoying because I want to survey the addresses, but SC doesn't give me the option to.
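Finding the ways that need this treatment can be scripted before opening JOSM. A minimal Python sketch with hypothetical names, using the even-odd (ray casting) point-in-polygon test on projected, metric coordinates: it flags interpolation ways with at least one node inside a building outline. It does not reproduce StreetComplete's own logic, and it misses ways that cross a building without having a node inside it.

```python
Point = tuple[float, float]  # projected, metric coordinates


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Even-odd (ray casting) test; a repeated closing vertex is optional."""
    x, y = p
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Does a ray from p towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def interpolations_inside_buildings(interpolations: dict[int, list[Point]],
                                    buildings: list[list[Point]]) -> list[int]:
    """Ids of interpolation ways with at least one node inside a building outline."""
    return sorted(
        way_id
        for way_id, nodes in interpolations.items()
        if any(point_in_polygon(p, b) for p in nodes for b in buildings)
    )


building = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
interpolations = {1: [(2.0, 2.0), (8.0, 2.0)], 2: [(-5.0, -5.0), (-5.0, 15.0)]}
print(interpolations_inside_buildings(interpolations, [building]))  # [1]
```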
Once it gets manually surveyed, I delete the interpolations via JOSM, but that is a manual process.
I really don't see how this is realistically achievable, even if the Canadian community suddenly grows by an order of magnitude. There are now 1.4 million interpolation lines in Canada. From what I understand from the responses in this thread, they are all potentially of questionable quality and therefore all need to be resurveyed.
That makes me wonder whether this data belongs in OSM in the first place. It would be much better to prepare the data as an external dataset in a way that it can be easily used with OSM as fallback data. We've been doing this for many years now in the US with the house number interpolation data from TIGER. Nominatim (the search engine) can import it on the side and use it as a fallback in the US, but it will always prefer house numbers from OSM if they are available. The added bonus of an external dataset would be that it can be easily updated with each new version of CanVec.
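The "OSM first, external data second" rule can be illustrated with a small lookup. This is a minimal Python sketch, not how Nominatim actually implements its TIGER fallback: the table layout and names are hypothetical, and each external row is simplified to a straight line between the positions of its first and last number.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]


@dataclass
class ExternalInterpolation:
    """One row of a hypothetical external table derived from CanVec."""
    street: str
    start: int
    end: int
    step: int          # 2 for odd/even ranges, 1 for all
    start_pos: Point   # position of house number `start`
    end_pos: Point     # position of house number `end`


def geocode(street: str, number: int,
            osm_addresses: dict[tuple[str, int], Point],
            external: list[ExternalInterpolation]) -> tuple[Optional[Point], str]:
    # 1. Exact addresses mapped in OSM always win.
    if (street, number) in osm_addresses:
        return osm_addresses[(street, number)], "osm"
    # 2. Otherwise fall back to linear interpolation in the external table.
    for row in external:
        lo, hi = sorted((row.start, row.end))
        if row.street != street or not lo <= number <= hi:
            continue
        if (number - row.start) % row.step != 0:
            continue  # wrong parity for this range
        if row.start == row.end:
            return row.start_pos, "interpolated"
        t = (number - row.start) / (row.end - row.start)
        x = row.start_pos[0] + t * (row.end_pos[0] - row.start_pos[0])
        y = row.start_pos[1] + t * (row.end_pos[1] - row.start_pos[1])
        return (x, y), "interpolated"
    return None, "not found"


osm = {("Example Street", 4): (100.0, 5.0)}
table = [ExternalInterpolation("Example Street", 2, 10, 2, (0.0, 0.0), (80.0, 0.0))]
print(geocode("Example Street", 4, osm, table))  # ((100.0, 5.0), 'osm')
print(geocode("Example Street", 6, osm, table))  # ((40.0, 0.0), 'interpolated')
```

Keeping the external rows out of OSM means a new CanVec release only replaces the table, and every surveyed OSM address automatically takes precedence.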
There is a quirk in the current tagging schema for interpolations. Each interpolation creates at least two address nodes that are, at first sight, indistinguishable from exactly mapped house numbers. You'd have to check whether the address node is part of an interpolation way to understand that it is not an exact number but an estimate. Very few data users do that. And that's where low-quality interpolations do a lot more harm than it seems at first sight.
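The extra step a data user would have to take looks roughly like this. A minimal Python sketch with hypothetical names and data layout: any address node that is a member of a way tagged `addr:interpolation` is marked as an estimate rather than an exactly mapped number.

```python
def classify_address_nodes(address_nodes: dict[int, str],
                           interpolation_ways: dict[int, list[int]]) -> dict[int, str]:
    """address_nodes: {node_id: addr:housenumber}
    interpolation_ways: {way_id: [node_id, ...]} for ways tagged addr:interpolation.
    Returns {node_id: "estimate" or "exact"}."""
    # Address nodes that belong to an interpolation way only anchor that
    # interpolation, so their position should be treated as approximate.
    on_interpolation = {nid for node_ids in interpolation_ways.values() for nid in node_ids}
    return {nid: "estimate" if nid in on_interpolation else "exact" for nid in address_nodes}


print(classify_address_nodes({1: "2", 2: "10", 3: "5"}, {100: [1, 7, 2]}))
# {1: 'estimate', 2: 'estimate', 3: 'exact'}
```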
CanVec's address interpolations most often come from municipalities, which provided them to their provincial governments, which in turn made them available to the federal government (i.e., CanVec) without a priori validation. We therefore cannot speak of wall-to-wall data quality (a mari usque ad mare qualitas ;-)). Rather, data quality should be assessed on a municipal basis.
I agree with Pierre that we must identify the problems to be solved and propose simple solutions.
I think an import of address interpolations from Canvec could only be relevant where nothing else exists.
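```
// Interpolation ways in the current map view ({{bbox}} is an Overpass Turbo shortcut)
way["addr:interpolation"]({{bbox}});
// First and last nodes of those ways that have no house number
node(w:1,-1)[!"addr:housenumber"]; out meta;
// All nodes at exactly the same position (including those end nodes)
node(around:0); out meta;
// Interpolation ways that use any of these nodes
way(bn)["addr:interpolation"];
out meta; >; out meta;
```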
Next, note that the red circles mark the nodes where two ways meet. All you have to do here is select the adjacent ways and press the shortcut key C to combine the two ways.
@lonvia In the Nominatim database, 334 CA interpolations didn't have a parent place (road) assigned. That's usually because Nominatim cannot find a road with the same name nearby. I believe I have fixed 80% of those now. It's not easy to check because Nominatim doesn't update the interpolation in the database after a nearby OSM change.
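Because Nominatim does not re-evaluate these interpolations after nearby edits, a rough independent check can help. A minimal Python sketch, explicitly not Nominatim's algorithm (which has its own name normalisation and search logic): it assumes an exact match between `addr:street` and the road's name, a hypothetical 200 m radius, and distances measured to road nodes only, and it reports interpolations with no matching road nearby.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def orphan_interpolations(interpolations: dict[int, tuple[str, tuple[float, float]]],
                          roads: list[tuple[str, list[tuple[float, float]]]],
                          radius_m: float = 200) -> list[int]:
    """interpolations: {way_id: (addr:street, midpoint)}; roads: [(name, node positions)].
    Returns ids of interpolations with no same-named road node within radius_m."""
    orphans = []
    for way_id, (street, mid) in interpolations.items():
        if not any(name == street and any(haversine_m(mid, p) <= radius_m for p in pts)
                   for name, pts in roads):
            orphans.append(way_id)
    return sorted(orphans)


interpolations = {1: ("Earnscliffe Avenue", (45.0, -75.0)), 2: ("Little Road", (45.0, -75.0))}
roads = [("Earnscliffe Avenue", [(45.0005, -75.0)]), ("Little Road", [(44.9946, -75.0)])]
print(orphan_interpolations(interpolations, roads))  # [2]: the matching road is about 600 m away
```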