Proposed import of addresses in Montreal

Hello,

I’m making this topic to gauge whether I should proceed with an import of the City of Montreal’s Adresse ponctuelle dataset. This would also include removing CanVec address interpolation for the relevant area, as it would be redundant.

The “how” of this import is detailed on the wiki page I created regarding the matter. But in short:

  • The City’s dataset contains points representing addresses; each point represents a single address or a range of addresses. Each point also contains a disassembled version of its street name.
  • First, the street names were assembled in QGIS.
  • The city each address belongs to (addr:city) was not in the dataset, so this data was added from a different dataset under the same license.
  • To know which addresses from the City’s dataset should not be imported, all nodes and ways containing an addr:housenumber were downloaded for the relevant extent. I then selected all those which were not CanVec interpolation and transferred them into QGIS.
  • If the housenumbers of a point in the import dataset were also found on the nearest existing non-CanVec OSM feature (matched by centroid), that point was omitted from the import.
  • Points from the City’s dataset which indicate a single address could be represented as a single point.
  • However, for points indicating a range of addresses for a single place, I found it best to use interpolation (more on this later). The geometry was generated as 1 m virtual ways.
  • The data was then further processed and cleaned up in JOSM.
  • The wiki page also details how to (safely) remove the existing CanVec address interpolation for the territory of the agglomeration. This will be done on the same day as the import (assuming the import is a good idea at all).
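To illustrate the deduplication step above, here is a minimal Python sketch. The tuple format, nearest-centroid matching by Euclidean distance, and exact-string housenumber comparison are my simplifying assumptions, not the actual QGIS workflow:

```python
import math

def dedupe_import_points(import_points, existing_addrs):
    """Drop import points whose housenumber already exists on the
    nearest existing (non-CanVec) OSM address feature.

    import_points / existing_addrs: lists of (x, y, housenumber)
    tuples in projected coordinates (metres).
    """
    kept = []
    for ix, iy, inum in import_points:
        # Find the nearest existing address centroid, if any.
        nearest = min(
            existing_addrs,
            key=lambda e: math.hypot(e[0] - ix, e[1] - iy),
            default=None,
        )
        if nearest is not None and nearest[2] == inum:
            continue  # already mapped in OSM: omit from the import
        kept.append((ix, iy, inum))
    return kept
```

In practice the matching would also have to handle ranges and formatting differences, but the shape of the filter is the same.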

Why use small 1 m segments of address interpolation to represent places with multiple addresses? Listing them individually on the same point or representing each address as a separate point can be problematic for places that have a large range of addresses (e.g., hundreds).
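A minimal sketch of how a range point could become a 1 m interpolation way, in Python. The coordinate handling (projected metres, a horizontal segment centred on the point) and the parity-based choice of addr:interpolation scheme are my assumptions, not the actual generation script:

```python
def interpolation_way(x, y, num_from, num_to):
    """Turn a range point into a 1 m interpolation way: two end
    nodes 1 m apart, centred on the original point, tagged per the
    addr:interpolation scheme (odd / even / all)."""
    if num_from % 2 == num_to % 2:
        scheme = "even" if num_from % 2 == 0 else "odd"
    else:
        scheme = "all"  # mixed parity: every number in the range
    half = 0.5  # metres, in projected coordinates
    node_a = {"x": x - half, "y": y, "addr:housenumber": str(num_from)}
    node_b = {"x": x + half, "y": y, "addr:housenumber": str(num_to)}
    way_tags = {"addr:interpolation": scheme}
    return node_a, node_b, way_tags
```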

As for using a symbol (like a dash) to represent a range, the disadvantage of this method is that Nominatim can’t find what is inside that range – only its extremes. For example, try searching for “5623 9e Avenue.” The address exists here, but unfortunately Nominatim is unable to find it.

The practice of using virtual ways to represent addresses is detailed on the wiki here.

An example of the data that will be imported (in .osm format) is available on the wiki page.

Please let me know if this import is a good idea. It may be too risky, and using address interpolation like this may seem odd.

Thanks for your feedback!

I think this would generally be a big improvement over the current state.

Using dashes in addresses to indicate ranges is super poor practice, and I am very, very happy to see you getting that sorted out.

I’m not against using a short interpolation way; it would certainly be a new way to do it. Is there any reason to use interpolation over, say, addr:housenumber=1;2;3;4, as is more common? For cases that go over the 255-character tag value limit, you can make a couple of clustered nodes as needed.
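For illustration, splitting a long range into a few semicolon-delimited addr:housenumber values under the 255-character limit could be sketched like this (a hedged sketch in Python; where the chunk boundaries fall is arbitrary):

```python
def housenumber_values(numbers, limit=255):
    """Split a list of housenumbers into one or more semicolon-
    delimited addr:housenumber values, each within OSM's 255-char
    tag value limit, so very long ranges become a few clustered
    nodes instead of one oversized tag."""
    chunks, current = [], ""
    for n in map(str, numbers):
        candidate = n if not current else current + ";" + n
        if len(candidate) > limit:
            chunks.append(current)  # close the full chunk
            current = n
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```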

Listing each individual address or representing each of them as their own node might not be suitable for this particular dataset because there are many cases where the addresses between the “from” and “to” fields don’t all exist.

For example, there are nodes within small buildings with a range of 50. Listing all addresses on a single node or mapping them as individual nodes would cause confusion and be incorrect.

Also, there’s the question of how to cleanly and accurately distribute nodes in cases where there actually are hundreds of addresses for a single location.

I’m not suggesting listing each of them on its own node, but rather using a single node with the full range expressed as a semicolon-delimited set. This is pretty common in areas I have edited from larger datasets. If an interpolation says 1 through 10 but 7 is somewhere else… the interpolation is giving the wrong impression, no?

As for placement, near the centroid seems to work fine. The placement of the interpolation way has the same issue as “which way do we cluster groups of nodes,” so I’m not sure it solves that issue any better.

I’ll also say that nodes are way way easier for other mappers to work with. If, say, another local mapper wants to add detailed placement it is way easier for them to edit a bunch of nodes than to deal with an interpolation.

I have been researching the multiple sources of addresses in the Montreal area recently. I now have a database of all residential addresses matched across three databases: the official assessment roll, postal data (Postes Canada/Canada Post), and Adresses Québec. By default, all addresses are snapped to the cadastre centroid, but I manually corrected every lot larger than 10,000 square metres, moving its point to the main entrance of the main building to reduce dissemination errors caused by large polygon centroids. Feel free to contact me to get access to this data and discuss how we could import it into OSM. I also corrected and validated all the street names in the Montreal area (from Sainte-Adèle to Saint-Jean-sur-Richelieu, and from Sorel to Ontario in the west).

Sorry for the delayed reply.

The dataset that my proposal is based on already places the address points within their respective building, but not on their exact entrance. If it’s thought to be a good idea, the points can be imported as-is and can then be refined (moved to each building’s entrance) once they’re imported, since this manual process would take some time.

I took a break from this import since some stuff got in the way, but if there’s still interest, and if you think what’s above is a good idea, I can proceed with it. I need to add some extra steps to the process (detailed on the wiki page) but once that’s done, I can post an example dataset and the community can let me know if it’s okay.

Hi everyone,

The dataset is ready to import. You can download the osm file here.

I’ll wait until next week to upload the file linked above as well as remove the redundant CanVec interpolation.

Please let me know if you find any issues!

Can you add the addr:province=QC, and a source tag, so we can distinguish them from other data sources? Thanks!

Sure I’ll add the province tag.

As for the source tag, based on the wiki page, we shouldn’t use it on individual elements anymore. It will of course be included on the changeset.

I know the older CanVec data has it though, so if you think local convention warrants it, I can add a source tag to each address too. My preference is to not add it though.

I agree, I think it’s better that way.

It is just that I would like to detect these imported addresses using overpass, so a source tag would have helped here. I will move a lot of them to the correct entrance and add postal codes. Without a special tag in the data, I cannot detect them and know which were updated and which were not. At the very least, please provide the complete list of osm ids for all the address nodes that were imported after each changeset. Thanks!

This is the “safest” way to keep track of these nodes, since everything else (location, tags including source:*, etc.) can later be edited.
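As a sketch of how such an id list could be used, the ids could be chunked into Overpass QL queries; the chunk size and the plain `out;` output form here are my assumptions:

```python
def overpass_queries(node_ids, chunk=500):
    """Build Overpass QL snippets that fetch the imported address
    nodes by id (from the published id list), split into chunks so
    no single query becomes unwieldy."""
    queries = []
    for i in range(0, len(node_ids), chunk):
        ids = ",".join(str(n) for n in node_ids[i:i + chunk])
        queries.append(f"node(id:{ids});out;")
    return queries
```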

I shared my database with @Spearfish7424 in a private message. I’ll let him decide what to do with it and how to import the data. Thanks!

I received your database @ChaireMobiliteKaligrafy, thank you.

You mentioned that you validated existing OSM street names using Odonymie Québec. This is super useful, because if there are ever discrepancies between the import’s data and your validation work, they can be detected with the Nominatim QA Tool: any differences will be flagged under the “Suspicious addr:street tag” section. From there, a tool like JOSM can change a large number of addresses at a time to fix any inconsistencies.

As for how to identify the addresses I’ll import, I’ll be sure to provide a complete list of their osm ids.

Nice! Thanks!

Thank you for doing so many high-value imports! This will save a lot of time for those of us who do StreetComplete, and help people find locations a lot.

I know of a few areas where the addresses added via StreetComplete are not formatted in an ideal manner: some use hyphens for ranges instead of a proper delimiter (some of which are mine; I learned too late that software couldn’t parse them, and StreetComplete itself suggests this format to users), and some include unit numbers. It might be good to try to filter for these and replace their data, or at least flag them for a manual look on a second pass.

I’m glad you find them helpful!

I’ll look into it after tomorrow’s import. I’m hesitant to automatically overwrite existing (manually created) OSM data though, so a GeoJSON that can be loaded into iD to help spot such addresses might be helpful in this case.

I’ll post the file on the wiki page for the import and DM you if I make it.

Everything was uploaded successfully, including deleting the redundant CanVec data.

A complete list of the OSM IDs that were created, which @ChaireMobiliteKaligrafy was interested in and may be helpful to others, is available here.

Thank you to everyone for your feedback!

Looks nice! No more interpolations (except the small ones you added where the data was in ranges), and the addresses I added to entrances are mostly not duplicated. Nice job! I will continue to look around. Thanks! If you want to import my database as well, feel free to ask and we can do it together :slight_smile:

About the hyphens: unfortunately, most address databases do not include all the exact values in between for these ranges. We would need a way to fetch postal data and get all the units and civic numbers for a range. This is possible but difficult.