Importation des adresses de la Ville de Sherbrooke

Je prépare actuellement l’importation des adresses de la Ville de Sherbrooke dans OSM.

Mon objectif principal est d’ajouter les adresses manquantes des 60 000 bâtiments répertoriés dans Osm. La plupart de ces bâtiments ont été cartographiés il y a deux ans. L’importation se fera de manière itérative, un type d’intégration à la fois, des cas les plus simples aux plus complexes. Les adresses existantes seront validées et les bâtiments manquants seront ajoutés si possible.

J’ai utilisé le gabarit prescrit (en anglais) pour détailler le projet. La description détaillée est disponible ici.

J’attends vos commentaires. Merci :slight_smile:

There are many points with the same coordinates but different addresses. If you open them in JOSM they’ll be merged into a single point with non-duplicated values separated by ;, for example:

ADRESSE=1670 BOULEVARD DE PORTLAND;325 RUE BRYANT
LIEN_VOIE=DE
NOM_VOIE=BRYANT;PORTLAND
NO_CIVIQUE=325;1670
OBJECTID=61251;61250
RUE=RUE BRYANT;BOULEVARD DE PORTLAND
TYPE_VOIE=RUE;BOULEVARD

(note how LIEN_VOIE just has DE, even though that only applies to the second TYPE_VOIE, a similar thing happens for RUE if an address has the same street name, there’s just one copy)

There can be even more than two addresses per point:

42284 1
 3427 2
 1212 3
 1369 4
  295 5
  594 6
   82 7
  127 8
   34 9
  146 10+

One point has 112 addresses.

I also noticed that ADRESSE and RUE don’t contain dashes, for example:

ADRESSE=1126 RUE SAINT LOUIS
MUNICIPALITE=43027
NOM_VOIE=SAINT-LOUIS
NO_CIVIQUE=1126
OBJECTID=47056
RUE=RUE SAINT LOUIS
TYPE_VOIE=RUE

so you have to use NOM_VOIE, because OSM names do have dashes.

The data doesn’t use accented (non-ASCII) letters like À Â Ç É È Ê Ë Ô Î Ï Û Ü.

There’s an ORIENTATION column

Normally the street name is TYPE_VOIE+NOM_VOIE but for streets like “12E AVENUE” it should be the other way around.

Here, “rue” values are not appropriate for import (i.e., all capitalized, missing hyphens and street orientation). They are replaced by the value of the “toponymie” field of Sherbrooke “Segments de rue” dataset (ODbL Compliance verified). This field matches actual street signs in Sherbrooke.

What you should do is build all the names from TYPE_VOIE+NOM_VOIE+LIEN_VOIE+ORIENTATION, get all the street names in OSM and then match them (by lowercasing both names and comparing, with some special cases) then you have 3 data sets, street names that match to OSM, govt street names that don’t match to OSM, OSM street names that don’t match to a govt street name. Here’s a Python script that does this which you can run by downloading the dataset as geojson, renaming the file to Adresse.geojson, saving this code as sherbrooke.py in the same folder as the geojson file:

# Gets street names from OpenStreetMap and government data
# and compares them to find streets that are in one but not the other
# and vice versa.

import sys
import re
import difflib
import os

import requests
import geopandas as gpd
import osmnx as ox


GOV_FILENAME = "Adresse.geojson"

# load city file
gov = gpd.read_file(GOV_FILENAME)
# filter out address points without name
gov = gov[(gov["RUE"].notnull())]
# print(gov.columns)
# print(gov.head())

orientation = {
    "N": "Nord",
    "S": "Sud",
    "E": "Est",
    "O": "Ouest",
}


def to_osm_name(row):
    name = row["TYPE_VOIE"]
    if row["TYPE_VOIE"] == "AVENUE" and re.match(r"^[0-9]+[A-Z]?$", row["NOM_VOIE"]):
        name = row["NOM_VOIE"] + " " + row["TYPE_VOIE"]
    else:
        if "LIEN_VOIE" in row and row["LIEN_VOIE"] is not None:
            if row["LIEN_VOIE"].endswith("'"):
                name += " " + row["LIEN_VOIE"] + row["NOM_VOIE"]
            else:
                name += " " + row["LIEN_VOIE"] + " " + row["NOM_VOIE"]
        else:
            name += " " + row["NOM_VOIE"]
    test_name = name
    if "ORIENTATION" in row and row["ORIENTATION"] is not None:
        name += " " + orientation[row["ORIENTATION"]]
        test_name += " " + row["ORIENTATION"]
    if test_name != row["RUE"] and test_name.replace("-", " ") != row["RUE"]:
        print(
            f"Warning: {row['RUE']} != {name}",
            file=sys.stderr,
        )
    name = name.replace("-ST-", "-Saint-")
    name = name.replace(" ST-", " Saint-")
    # CHEMIN DU 6E RANG
    if re.match(r"^CHEMIN DU [0-9]+E RANG$", name):
        name = name[len("CHEMIN DU ") :]
    if name == "CHEMIN DU 4E RANG-DE-SAINT-FRANCOIS":
        name = "4E RANG DE SAINT-FRANÇOIS"
    return name


gov["osm_name"] = gov.apply(to_osm_name, axis=1)
gov["join_key"] = gov["osm_name"].str.lower()

# rename column so iD shows the name when loading the file
gov["name"] = gov["osm_name"]

osm = ox.features_from_place(
    "Sherbrooke, Estrie, Quebec, Canada",
    {
        "highway": [
            "motorway",
            "motorway_link",
            "primary",
            "primary_link",
            "secondary",
            "secondary_link",
            "tertiary",
            "tertiary_link",
            "residential",
            "unclassified",
            "service",
            "living_street",
            "track",  # Chemin Rawson
            "construction",  # Rue Étienne-Desmarteau
            "footway",  # Ruelle Whiting
        ]
    },
)
# filter out streets without name
osm = osm[osm["name"].notnull()]
osm = osm.loc[osm.index.get_level_values("element") == "way"]
osm.to_file("osm_streets.geojson", driver="GeoJSON")
osm = osm[["name", "geometry", "highway"]]
# print(osm.columns)
# print(osm.head())

print("Loaded data", len(gov), len(osm), file=sys.stderr)


def osm_name_to_join_key(name):
    name = name.lower()
    # replace all french accents/letters with their non-accented equivalents
    name = (
        name.replace("à", "a")
        .replace("â", "a")
        .replace("ç", "c")
        .replace("é", "e")
        .replace("è", "e")
        .replace("ê", "e")
        .replace("ë", "e")
        .replace("ô", "o")
        .replace("î", "i")
        .replace("ï", "i")
        .replace("û", "u")
        .replace("ü", "u")
    )
    return name.lower()


osm["join_key"] = osm["name"].apply(osm_name_to_join_key)


joined = gov.merge(osm, on="join_key", how="left", suffixes=("_gov", "_osm"))

# save just the rows in osm streets that are not in the joined dataframe
osm_not_in_gov = osm[~osm["join_key"].isin(joined["join_key"])]
osm_not_in_gov.to_file("osm_not_in_gov.geojson", driver="GeoJSON")

# do it the other way around and
# save just the rows in govt that are not in the joined dataframe
joined = osm.merge(gov, on="join_key", how="left", suffixes=("_osm", "_gov"))
gov_not_in_osm = gov[~gov["join_key"].isin(joined["join_key"])]
gov_not_in_osm.to_file("gov_not_in_osm.geojson", driver="GeoJSON")

osm_keys = set(osm_not_in_gov["join_key"])
gov_keys = set(gov_not_in_osm["join_key"])

# print(f"{len(osm_keys)} OSM keys not in Govt", file=sys.stderr)
# for key in sorted(osm_keys):
#     name = osm[osm["join_key"] == key].iloc[0]["name"]
#     # find the closest match in gov_keys
#     closest = difflib.get_close_matches(key, gov_keys, n=3)
#     print(f"{name} ({key})", file=sys.stderr)
#     for c in closest:
#         c_name = gov[gov["join_key"] == c].iloc[0]["osm_name"]
#         print(f"    {c_name} ({c})", file=sys.stderr)


# print()
# print(f"{len(gov_keys)} Govt keys not in OSM", file=sys.stderr)
# for key in sorted(gov_keys):
#     name = gov[gov["join_key"] == key].iloc[0]["osm_name"]
#     # find the closest match in gov_keys
#     closest = difflib.get_close_matches(key, osm_keys, n=3)
#     print(f"{name} ({key})", file=sys.stderr)
#     for c in closest:
#         c_name = osm[osm["join_key"] == c].iloc[0]["name"]
#         print(f"    {c_name} ({c})", file=sys.stderr)

print()
print(f"{len(gov_keys)} Govt street names not in OSM", file=sys.stderr)
print(f"{len(osm_keys)} OSM street names not in Govt", file=sys.stderr)


def open_file_in_josm(file_path):
    josm_url = "http://127.0.0.1:8111/open_file"
    full_path = os.path.abspath(file_path)
    response = requests.get(f"{josm_url}?filename={full_path}")
    print(f"Opening {file_path}: {response.status_code} - {response.reason}")


open_file_in_josm("osm_not_in_gov.geojson")
open_file_in_josm("gov_not_in_osm.geojson")

# create a dict mapping the join_key to name
pairs = list(zip(osm["join_key"], osm["name"]))
osm_dict = {}
for j, n in pairs:
    # find duplicates in the join_key that have different names
    if j in osm_dict:
        if osm_dict[j] != n:
            print(
                f"Duplicate join_key {j!r} with different names: {osm_dict[j]!r} and {n!r}",
                file=sys.stderr,
            )
    else:
        osm_dict[j] = n
# duplicate in OSM, but only this one in government data
osm_dict["rue de la prairie"] = "Rue de La Prairie"

gov_in_osm = gov[gov["join_key"].isin(osm["join_key"])]
gov_in_osm["addr:street"] = gov_in_osm["join_key"].map(osm_dict)
gov_in_osm["addr:housenumber"] = gov_in_osm["NO_CIVIQUE"]
# drop all other columns except for geometry
gov_in_osm = gov_in_osm[["geometry", "addr:street", "addr:housenumber"]]
gov_in_osm.to_file("gov_in_osm.geojson", driver="GeoJSON")
# open the file in JOSM
open_file_in_josm("gov_in_osm.geojson")

then execute these commands:

pip install requests geopandas osmnx
python sherbrooke.py

gov_in_osm.geojson will contain OSM-ready data with points with addr:housenumber and addr:street with the latter matching an existing street name in OSM. Let me know if you have trouble running the code, I can email you the geojson file.

I recommend using the Conflation plugin.

You still need to figure out what to do for duplicate addresses though, you can’t import that data as-is, addr shouldn’t have multiple values separated by ;.

This street is missing house outlines. Dataset has addresses for parks/green areas, like 2611 Rue des Dahlias.

Hi Boris,

I should first mention that your analysis of the import data is excellent!

As you read in the detailed project description, this issue was identified, and is actually one of the motivation to iteratively import the data, simple cases first.

As suggested in the wiki concerning Buildings with multiple house numbers, I used to separate housenumber values with coma, but the render :wink: is not appealing when there is more than 2-3 values. The best approach I found so far (still according to the wiki) is to add each address value, as a node, on main entrance side of the building outline. As explained, I plan to do this manually in a later iteration, unless I develop a suitable algorithm.

For the cases of multiple buildings for one housenumber,I will either add the address to the perimeter of the containing entity (e.g., amenity=university), or create a multipolygon building if there is no such entity.

Regarding the street names, I already have upload the “Segments de rue” dataset in my PostgreSQL database and used the “toponymie” field (instead of “rue”) to feed the “addr:street” key.

Finally, thanks for the code even if the solutions were already developed using Postgresql/PostGIS/Osmosis. However, I’ll take a closer look at it regarding Josm aspects as well as to the Conflation plugin.

1 Like