Address interpolation imports from CanVec

I wondered how the Canadian community feels about the address interpolation lines that come with the CanVec imports. They have become a noticeable issue in the Nominatim search engine.

The address interpolation QA report gives you an overview of the interpolation lines Nominatim takes issue with. At the time of writing, more than 70,000 OSM ways are flagged as problematic in Canada. The two main errors are:

On top of that, there are interpolations that don’t follow any street or path that is visible in OSM or aerial photos (Way: 143606764 | OpenStreetMap). These are not even flagged in the QA report.

Finally, I’ve noticed that the interpolation lines are not deleted when the actual house numbers are added. Example: OpenStreetMap

What would be the best way to go about improving the quality of this data? Would it be possible to add some preprocessing to the import that avoids importing the worst offenders?

Edit: Way: 143579633 | OpenStreetMap removed as a bad example, because the interpolations in the middle of nowhere do have addr:street, so there is some information about the address.


Well, it certainly is a bit of a mess. A lot of the issues I’ve seen in my neck of the woods seem to have come from old imports that were cut off in odd places–causing interpolation points to ‘clamp’ to the edge of tiles, I assume? Before the import from earlier this year, you could see the cutoff lines in some parts of Longueuil.

As far as how I feel about it though, I think this data is indispensable and I’d rather have the current slightly-broken version than nothing.

I’ve only really been active for something like a year, maybe a year and a half, but the vast majority of the work I’ve done in that period is surveying addresses. This is because prior to the most recent import, huge portions of Longueuil had no address data -at all-. And sadly, since most of the regional mapping attention is limited to Montreal, fully surveying these addresses and keeping them up to date is likely impossible without imported data. With the current data, it’s at least possible to look up the rough location of addresses.

I’ve not been involved in any data imports, but I would welcome any work to improve their accuracy. From my experience, the recent import fixed a lot of weird & broken interpolations in my area, but it still seems to stretch single addresses to cover the full area around the building, as you say. (Here’s a recent example for folks; I could locate a bunch of these if needed.)

As an aside–I’m glad this came up, because I haven’t been deleting interpolations very consistently after address surveys. I wasn’t entirely sure if it should or shouldn’t be done, so I mostly stopped deleting the interpolations along streets I’d surveyed out of an abundance of caution (that example wasn’t one of mine though!)

This is probably just a case of newbie-itis though!


You’re right. In order to respect OSM API capabilities at the time the Canvec product was created (2010-2012), the map sheets were divided into square tiles varying in size based on the number of features in each tile.

This is something that came up last year on the mailing list as well: [Talk-ca] Which is preferred? Addr:interpolation with a source or adding in actual addresses?

In addition to the sorts of errors or ‘quirks’ of the interpolation ways you’ve already identified, I found that the ones I deleted last year in Bragg Creek, Alberta were often placed on the wrong sides of roads (i.e. the ‘odd’ numbers and ‘even’ numbers were erroneously swapped around), or went the wrong way (i.e. the lower number at the start and higher number at the end of the interpolation way were erroneously swapped around).
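Checking which side of a road an interpolation way sits on is plain geometry. A minimal sketch, assuming planar (x, y) coordinates (for example a local projection in metres) and a road given as the list of its centreline nodes: the sign of the 2D cross product says whether a point is left or right of a road segment, and a majority vote over the way's nodes gives the way's side. The function names are hypothetical. Comparing the result for all odd (or all even) ways along one street could flag swapped sides, but only where the municipality keeps each parity on a consistent side, which is an assumption rather than a rule.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side_of_segment(a: Point, b: Point, p: Point) -> int:
    """+1 if p lies left of the directed segment a->b, -1 if right, 0 if on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)


def nearest_segment(road: List[Point], p: Point) -> Tuple[Point, Point]:
    """The road segment whose closest point is nearest to p."""
    best, best_d2 = (road[0], road[1]), float("inf")
    for a, b in zip(road, road[1:]):
        dx, dy = b[0] - a[0], b[1] - a[1]
        length2 = dx * dx + dy * dy
        # Projection parameter of p onto the segment, clamped to [0, 1].
        t = 0.0
        if length2 > 0:
            t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
            t = max(0.0, min(1.0, t))
        cx, cy = a[0] + t * dx, a[1] + t * dy
        d2 = (p[0] - cx) ** 2 + (p[1] - cy) ** 2
        if d2 < best_d2:
            best, best_d2 = (a, b), d2
    return best


def side_of_road(road: List[Point], way: List[Point]) -> int:
    """Majority vote over the interpolation way's nodes: +1 left, -1 right, 0 undecided."""
    votes = sum(side_of_segment(*nearest_segment(road, p), p) for p in way)
    return (votes > 0) - (votes < 0)
```

Left and right are relative to the direction in which the road way is drawn, so both ways being compared need to be measured against the same road.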

As I said in my reply to the list: I generally replace the interpolations with actual building-by-building or POI-by-POI addresses, and delete the interpolations when I’m done. Their accuracy is often dubious at best. In the absence of more precise addresses I think they’re better than nothing, but with the addition of more precise addresses in my opinion they become entirely redundant and otherwise not worth keeping.

You ask,

What would be the best way to go about improving the quality of this data? Would it be possible to add some preprocessing to the import that avoid importing the worst offenders?

and I’m a little concerned: are you thinking of “re-importing” the CanVec data? I wouldn’t; I think it’s a pretty common sentiment that CanVec was better than nothing, but going forward isn’t accurate enough to satisfy most users, and there’s no point going back to the same “well” to fix the original errors.

Frankly, as time-consuming as it would be, I think the only way to do it properly is slowly and methodically, double-checking and fixing/replacing one by one.

Also, as an aside:

FYI those addresses are simply out of alignment with the underlying data; the associated roads are on the map, but located about 600 m south of the address interpolation ways, and tagged (without names) as highway=path.

Thanks for the responses. It is great to see that people are busy replacing the interpolation slowly with building house numbers sourced from surveys. And, yes, please do delete the interpolation lines when a street is completely surveyed. It will help reduce confusion for the data users (and also tell the next mapper that the street is done).

Oh no, I wasn’t thinking that. Some of the broken interpolations were created only yesterday (Way: 1282312246 | OpenStreetMap). This is also what prompted me to write in the forum here. I was wondering if it wouldn’t be better to stop these imports for a bit and improve the process before continuing.

That’s more along the lines I was thinking. In particular, I’m wondering if there is something remote armchair mappers could help with.

One thing that comes to mind are the interpolation lines that are cut on CanVec tile boundaries. It should be fairly easy to create a list of problematic ways and to fix them remotely by joining the ways. Could be a MapRoulette challenge.
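A list of tile-boundary splits could be produced roughly like this. It is a sketch over a hypothetical in-memory model (node id mapped to (lat, lon, tags), way id mapped to (tags, node ids)), not an existing tool: it collects end nodes of addr:interpolation ways that lack addr:housenumber, groups them by position, and treats exactly two ways with identical tags meeting at such a point as a join candidate. Groups with more than two ways are left for a human, since they may hide duplicates. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory model, e.g. filled from an Overpass download:
# node id -> (lat, lon, tags) and way id -> (tags, list of node ids).
Nodes = Dict[int, Tuple[float, float, Dict[str, str]]]
Ways = Dict[int, Tuple[Dict[str, str], List[int]]]


def join_candidates(nodes: Nodes, ways: Ways) -> List[Tuple[int, int]]:
    """Pairs of interpolation ways whose number-less end nodes touch."""
    ends = defaultdict(set)  # (lat, lon) -> ids of ways with a loose end there
    for way_id, (tags, refs) in ways.items():
        if "addr:interpolation" not in tags:
            continue
        for node_id in (refs[0], refs[-1]):
            lat, lon, node_tags = nodes[node_id]
            if "addr:housenumber" not in node_tags:
                ends[(lat, lon)].add(way_id)
    pairs = []
    for way_ids in ends.values():
        # Exactly two ways with identical tags: a likely tile cut.
        # Anything else (one way, or three or more) is left for a human.
        if len(way_ids) == 2:
            a, b = sorted(way_ids)
            if ways[a][0] == ways[b][0]:
                pairs.append((a, b))
    return pairs


def merged_node_list(nodes: Nodes, ways: Ways, a: int, b: int) -> List[int]:
    """Node list of ways a and b joined at their common end (b's joint node dropped)."""
    ra, rb = ways[a][1], ways[b][1]

    def pos(node_id: int) -> Tuple[float, float]:
        return nodes[node_id][0], nodes[node_id][1]

    # Orient a so that it ends at the joint, then b so that it starts there.
    if pos(ra[0]) in (pos(rb[0]), pos(rb[-1])):
        ra = ra[::-1]
    if pos(rb[-1]) == pos(ra[-1]):
        rb = rb[::-1]
    return ra + rb[1:]
```

`merged_node_list` only computes the joined node list; when the two end nodes are separate but coincident, an editor would still need to merge them before combining the ways.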

Other improvements are more problematic without local knowledge, and I wouldn’t want to start larger edits without the consent of the local community. Condensing one-address interpolation lines to points might work with armchair mapping. At least short ones like Way: 357011203 | OpenStreetMap should be fairly safe to do (except that it is sometimes not clear whether the number exists at all). The longer ones like Way: 1281057082 | OpenStreetMap are a bit trickier, but it might be possible to identify from aerials the one house along the road that the address likely belongs to.

Maybe you already had other ideas for armchair fixing.

That’s a case that is clearly not fixable without going out and surveying. But that raises the question: isn’t it better to delete the data completely in that case? Right now it does quite a bit of damage because it leads people to the wrong place. Better to have no data. And when the error is fixed by surveying, it can be done with on-the-ground data. The CanVec data will not help.

Phew! :joy:

Yeah, that’s tricky. I don’t really have a better suggestion than leaving it to local mappers, if we have 'em. :slightly_frowning_face: Your example of the “short, single-address ways” is emblematic of the sorts of problems we’ll face without boots-on-the-ground knowledge: there’s a building at the corner of Earnscliffe Avenue and Little Road with potentially three different addresses. Is it 2 Little Road (following the interpolation way on the west side of the building)? Is it 4 Little Road (following the interpolation way on the south side of the building)? Or is it 24 Earnscliffe Avenue, following the interpolation way on the east side of the building? It’s perfectly conceivable they may all be correct, in some way: maybe there are multiple entrances to the building that each have different addresses. I don’t know how to figure this out without the aid of someone with a lot of local knowledge.

Well, we could just move the interpolation ways 600ish metres south, and that would certainly improve the accuracy quite a bit. :stuck_out_tongue: Whether the interpolations themselves are correct in the first place is another, underlying issue.


What I do is move the interpolation lines outside of buildings in JOSM, placing them in front of the building, because if they are inside the building, they are usually mistaken for the address of the building and never get surveyed (e.g. StreetComplete will not ask for an address if the interpolation is inside the building). A lot of places where I am (GTA area) have this, and it’s annoying because I want to survey the addresses, but SC doesn’t give me the option to.

Once it gets manually surveyed, I delete the interpolations via JOSM, but that is a manual process.
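Finding interpolation ways that run through a building outline, the case described above, could be automated with a point-in-polygon test. A minimal sketch using even-odd ray casting, assuming planar coordinates and a building given as its outer ring; the function names are hypothetical, and this does not reproduce StreetComplete's own logic.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting; ring is the building outline as (x, y) vertices."""
    x, y = p
    inside = False
    # Pair every vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does the horizontal ray going right from p cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def interpolation_inside_building(way: List[Point], building: List[Point]) -> bool:
    """Flag an interpolation way if any of its nodes falls inside the building."""
    return any(point_in_polygon(p, building) for p in way)
```

Only the nodes are tested, so a way that crosses a building without having a node inside it would not be flagged; a segment-intersection test would be needed for that case.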

I really don’t see how this is realistically achievable, even if the Canadian community suddenly grows by an order of magnitude. There are now 1.4 million interpolation lines in Canada. From what I understand from the responses in this thread, they are all potentially of questionable quality and therefore all need to be resurveyed.

That makes me wonder whether this data belongs in OSM in the first place. It would be much better to prepare the data as an external dataset in a way that it can be easily used with OSM as fallback data. We’ve been doing this for many years now in the US with the house number interpolation data from TIGER. Nominatim (the search engine) can import it on the side and use it as a fallback in the US, but it will always prefer house numbers from OSM if they are available. The added bonus of an external dataset would be that it can be easily updated with each new version of CanVec.
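The fallback behaviour described here (use an exact OSM address when one exists, otherwise an estimate from an external range table) can be sketched as follows. The data layout, the function name `lookup`, and the straight-line geometry of each range are assumptions for illustration; this is not Nominatim's code.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]
# Hypothetical layouts: exact OSM addresses keyed by (street, number), and an
# external range table of (street, first, last, parity, start coord, end coord).
Exact = Dict[Tuple[str, int], Coord]
Range = Tuple[str, int, int, str, Coord, Coord]


def lookup(street: str, number: int, exact: Exact, ranges: List[Range]) -> Optional[Tuple[Coord, str]]:
    """Prefer an exact OSM address; otherwise estimate a position from a range."""
    if (street, number) in exact:
        return exact[(street, number)], "exact"
    for name, first, last, parity, start, end in ranges:
        if name != street or not min(first, last) <= number <= max(first, last):
            continue
        if (parity == "odd" and number % 2 == 0) or (parity == "even" and number % 2 == 1):
            continue
        # Linear position between the range's end points.
        f = 0.0 if last == first else (number - first) / (last - first)
        pos = (start[0] + f * (end[0] - start[0]), start[1] + f * (end[1] - start[1]))
        return pos, "interpolated"
    return None
```

Returning the "exact"/"interpolated" label alongside the position keeps the estimate visibly separate from surveyed data, which is the point of keeping the ranges outside OSM.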

There is a quirk with the current tagging schema of interpolations. Each interpolation creates at least two address nodes that are at first sight indistinguishable from exactly mapped house numbers. You’d have to check whether the address node is part of an interpolation way to understand that it is not an exact number but an estimate. Very few data users do that. And that’s where the low-quality interpolations do a lot more harm than it seems at first sight.
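For a data user, telling the two kinds of address node apart means checking way membership. A minimal sketch, assuming node tags and interpolation ways are already loaded into plain dictionaries and lists (a hypothetical layout; the function names are illustrative):

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: node id -> tags, and a list of (tags, node ids) for ways.
NodeTags = Dict[int, Dict[str, str]]
WayList = List[Tuple[Dict[str, str], List[int]]]


def interpolation_node_ids(ways: WayList) -> Set[int]:
    """Ids of all nodes used by any addr:interpolation way."""
    ids: Set[int] = set()
    for tags, refs in ways:
        if "addr:interpolation" in tags:
            ids.update(refs)
    return ids


def classify_addresses(nodes: NodeTags, ways: WayList) -> Dict[int, str]:
    """Label every address node as an exact address or an interpolation endpoint."""
    on_interpolation = interpolation_node_ids(ways)
    return {
        node_id: "interpolation endpoint" if node_id in on_interpolation else "exact"
        for node_id, tags in nodes.items()
        if "addr:housenumber" in tags
    }
```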


The interpolations do their job well, and it is possible to improve their quality before thinking about new imports / new problems.

Rather than criticizing both the data and the OSM community like this, I think it would be more motivating for local communities if we identified the problems to be solved and proposed simple solutions they can carry out.

Could we come up with Overpass queries, and in some cases use JOSM filters as a second step, to solve these? Is there a contributor familiar enough with Overpass to complete the solution below? Several years ago I used JOSM to identify and fix several of these problems, notably where the start/end node does not carry the address tag.

  • way for a single address
    way[addr:interpolation] Node(1)[addr:housenumber]=Node(-1)[addr:housenumber]

  • way with a missing start or end address
    (
    way[addr:interpolation] Node(1)![addr:housenumber]
    way[addr:interpolation] Node(-1)![addr:housenumber]
    )

  • way whose addresses are inconsistent with the interpolation type

    (
    way[addr:interpolation=even] Node(1)[addr:housenumber] is odd
    way[addr:interpolation=even] Node(-1)[addr:housenumber] is odd
    way[addr:interpolation=odd] Node(1)[addr:housenumber] is even
    way[addr:interpolation=odd] Node(-1)[addr:housenumber] is even
    )
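The three checks above are pseudocode. As a sketch, the same rules can be written as a plain Python function over the tags of an interpolation way and of its first and last nodes, assuming those tags have already been downloaded; `check_interpolation` is a hypothetical name, and only purely numeric house numbers get the parity test.

```python
from typing import Dict, List


def check_interpolation(tags: Dict[str, str], first: Dict[str, str], last: Dict[str, str]) -> List[str]:
    """Problems found for one interpolation way, given the tags of its end nodes."""
    start = first.get("addr:housenumber")
    end = last.get("addr:housenumber")
    # Check 2: a start or end address is missing (e.g. a way cut at a tile edge).
    if start is None or end is None:
        return ["missing start/end house number"]
    problems = []
    # Check 1: the way stands for a single address.
    if start == end:
        problems.append("single address")
    # Check 3: end numbers inconsistent with odd/even interpolation.
    kind = tags.get("addr:interpolation")
    if kind in ("odd", "even"):
        want = 1 if kind == "odd" else 0
        for number in (start, end):
            # Values like "12A" are skipped rather than guessed at.
            if number.isdigit() and int(number) % 2 != want:
                problems.append(f"{number} does not match {kind}")
    return problems
```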


Canvec’s address interpolations most often come from municipalities, which provided them to their provincial governments, which made them available to the federal government (i.e., Canvec), without a priori validation. We cannot therefore speak of wall-to-wall data quality (a mari usque ad mare qualitas ;-)). Rather, data quality should be assessed on a municipal basis.

I agree with Pierre on the fact that we must identify the problems to be solved and propose simple solutions.

I think an import of address interpolations from Canvec could only be relevant where nothing else exists.

A little spring cleaning of OSM address interpolation ways
Identify them and join them when they are split into two segments

Addresses in OSM are important so that the various road navigation tools work well and deliveries, emergency services, etc. can find their way quickly. In Canada, we often find ways tagged addr:interpolation representing a series of addresses along a street. The first and last nodes normally carry addr:housenumber, with the lowest address at the start and the highest at the end.
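To make the convention concrete: a way tagged addr:interpolation=even between 2 and 10 stands for 2, 4, 6, 8 and 10. A minimal sketch of that expansion, handling only the values all, odd and even (other values are outside this sketch); the function name is hypothetical.

```python
from typing import List

# Step between consecutive house numbers for the addr:interpolation values handled here.
STEP = {"all": 1, "odd": 2, "even": 2}


def implied_numbers(start: int, end: int, interpolation: str) -> List[int]:
    """House numbers an interpolation way stands for, listed in the way's direction."""
    step = STEP[interpolation]  # KeyError for values this sketch does not handle
    if interpolation in ("odd", "even"):
        want = 1 if interpolation == "odd" else 0
        if start % 2 != want or end % 2 != want:
            raise ValueError("end numbers do not match the interpolation type")
    direction = 1 if end >= start else -1
    return list(range(start, end + direction, direction * step))
```

A way drawn from the higher number to the lower one is listed in that order, so the output also shows at a glance which end carries which number.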

During the Canvec import, these ways were often cut in two, which breaks the address series. Below is a recipe to fix this problem.

I suggest the following recipe using JOSM + Download from Overpass API + the Admin Boundaries map paint style.

  1. F12 / Map Paint Styles / select Admin Boundaries / click > to add it to your personal list.

  2. a. Zoom in on the area where you want to download the addresses to check
    b. Download / Download from Overpass API
    c. Query (then click Download):

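    // interpolation ways in the current view
    way["addr:interpolation"]({{bbox}});
    // their first/last nodes without a house number (likely cut points)
    node(w:1,-1)[!"addr:housenumber"]; out meta;
    // any other node at exactly the same position
    node(around:0); out meta;
    // interpolation ways using those nodes, plus all of their nodes
    way(bn)["addr:interpolation"];
    out meta; >; out meta;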

Then note that the red circles mark the nodes where two ways meet. All you need to do is select the adjacent ways and press the C shortcut to combine them.

  3. Use the Validation function in the right-hand panel. A list of issues will appear, which then lets you fix the various segments.

Voilà!


I can go through the interpolation lines while importing them and merge the adjacent ones at the tile borders. Would that help?


The problem we have is that too many contributors without sufficient knowledge of address data have done imports without validating the data. I am currently reviewing Chateauguay, south of Montreal, where a contributor did a second import a few years later on top of the existing data, systematically creating duplicate ways and nodes. Subsequent edits added to the problem, as contributors did not realize that the ways and nodes were duplicated.

So yes, it would be preferable to restrict the current access of inexperienced contributors to this import data.

As I have already shown, it is easy to identify the data that needs reviewing. But partial corrections via MapRoulette are just as damaging as the imports. MapRoulette contributors will only see two segments to join, without seeing the duplicates hidden behind them, and will fix things only partially, adding yet more problems. Where there are two segments plus duplicates (4 ways + 8 nodes), I see cases where only one of the duplicated portions was deleted.

      interpolation 1:  n(housenumber) w(interpolation) n() - n() w(interpolation) n(housenumber)  
      interpolation 2:  n(housenumber) w(interpolation) n()
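Duplicates like these could be listed before anyone starts joining segments. A sketch, assuming each way is available as (tags, list of (lat, lon)) in a hypothetical dictionary: two interpolation ways count as duplicates when their addr:* tags match and their geometries are identical in either direction, after rounding coordinates to 7 decimal places (an arbitrary choice). The function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
# Hypothetical layout: way id -> (tags, list of (lat, lon)).
Ways = Dict[int, Tuple[Dict[str, str], List[Coord]]]


def duplicate_groups(ways: Ways, digits: int = 7) -> List[List[int]]:
    """Groups of interpolation ways with the same addr:* tags and the same geometry."""
    groups = defaultdict(list)
    for way_id, (tags, coords) in ways.items():
        if "addr:interpolation" not in tags:
            continue
        # Ignore non-address tags such as source=*, which may differ between imports.
        addr_tags = tuple(sorted((k, v) for k, v in tags.items() if k.startswith("addr:")))
        rounded = tuple((round(lat, digits), round(lon, digits)) for lat, lon in coords)
        # Direction-independent key: a reversed copy still counts as a duplicate.
        geometry = min(rounded, rounded[::-1])
        groups[(addr_tags, geometry)].append(way_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

This only catches whole-way duplicates; a duplicate that was itself split at a different point would need a comparison segment by segment.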

I would like Lonvia to substantiate the interpretation that Canadian contributors find the data to be of dubious quality, which would justify re-importing it. Coming from this developer, that is a demotivating statement for the community, to say the least :thinking:

Of course I see possible improvements in the form of the data. But contrary to Lonvia’s statement, I find that the interpolation data is generally correct, in Quebec at least. I invite Lonvia to look at the data more closely :eye_in_speech_bubble:. For Quebec, see the Government of Quebec WMS layer (the source of the data used by Canvec for Quebec): wms:https://servicescarto.mern.gouv.qc.ca:443/pes/services/Territoire/AQ_ADRESSES_WMS/MapServer/WmsServer?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.3.0&SERVICE=WMS&REQUEST=GetMap&LAYERS=Adresses&STYLES=&CRS={proj}&WIDTH={width}&HEIGHT={height}&BBOX={bbox}

I’m the Canadian contributor who finds the data to be of dubious quality :grin: :

I should explain that I am only one voice among everyone. It’s only my opinion, based on the data in Alberta, especially southern Alberta.

@lonvia In the Nominatim database, 334 CA interpolations didn’t have a parent place (road) assigned. That’s usually because Nominatim cannot find a road with the same name nearby. I believe I have now fixed 80% of those. It’s not easy to check, because Nominatim doesn’t update the interpolation in the database after a nearby OSM change.

Examples:

  • Interpolations with road name ‘unknown road 59’. Removed. Changeset: 151679814 | OpenStreetMap
  • Interpolations but no road on the map yet. I created the roads.
  • Interpolation where the name was slightly different. E.g. “Ryan s Lane” vs “Ryan’s Lane”
  • Interpolations from 10 years ago where today there is no road or house nearby.
  • Interpolations where one node didn’t have a house number. Sometimes those were next to another interpolation and could be merged together
  • Interpolations on RV camping areas which represent parking spot numbers, not house numbers. I didn’t delete them. Way: 696410754 | OpenStreetMap
  • Interpolations where the start and end house number were the same. If there was clearly only one house I converted it to an address node
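Name mismatches such as “Ryan s Lane” vs “Ryan’s Lane” can be hunted down by comparing a normalized form of both names. A sketch with illustrative rules (accents dropped, case folded, apostrophes and other punctuation turned into spaces); this is not the matching Nominatim performs, and the function name is hypothetical.

```python
import re
import unicodedata


def normalize_street(name: str) -> str:
    """Loose comparison key for street names (illustrative rules only)."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    name = name.casefold()
    # Treat apostrophes like the space in "Ryan s Lane".
    name = re.sub(r"[’'`]", " ", name)
    # Any other punctuation (hyphens, periods) also becomes a space.
    name = re.sub(r"[^\w\s]", " ", name)
    # Collapse runs of whitespace.
    return " ".join(name.split())
```

Listing interpolations whose addr:street differs from the nearest road name but matches it after normalization would give a short list of candidates for manual renaming.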

A couple are impossible to fix