Hamilton address open data import

The City of Hamilton maintains an open dataset with address data for the city. I am interested in importing this data into OSM, replacing the current CanVec address interpolation data. Thankfully, this data is under the Hamilton Open Data Licence, which is already an approved license. This is not a formal import proposal; since I am new to OSM, I would like to ask a few questions first.

  1. Is anyone already working on this import (or any other import in Hamilton)? I took a look around and didn’t see any posts so I assume not.
  2. Are there existing generic tools for handling address imports? Or do people write new tools every time to handle their specific case?
  3. What data should be imported? addr:housenumber and addr:street are the most important, and addr:city=Hamilton can be set everywhere I add an address. Should addr:province=ON be set (the current CanVec nodes do not have it)? Should postal codes be imported?

Here is a rough idea of how I think this import will work. Initially, I want to focus on single residential buildings in the neighborhoods surrounding McMaster University (where I live). Eventually my goal would be to import as much of the address data as is useful.

  1. Convert the address data to use OSM tags. I already have a simple script for this that reads GeoJSON and writes OSM XML. I think it might also be possible to do this directly in JOSM.
  2. Load the address data into JOSM. Scan over the import and tweak the positions of the new address nodes so that each is closest to the building it should address (typically only needed on curved sections or when there is another building such as a shed nearby).
  3. Export the adjusted addresses layer from JOSM. Use another script (I have not yet written this one) that will copy the address attributes to the nearest building and remove the address interpolation data. The script will report issues such as multiple addresses on one building.
  4. Review the final data in JOSM and upload.
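
As a rough illustration of step 1, here is a minimal sketch of a GeoJSON to OSM XML conversion using only the Python standard library. The property names `HOUSE_NUMBER` and `STREET_NAME` are placeholders and not the dataset's confirmed field names; they would need to be replaced after checking the real schema. Negative node ids are used because they mark objects that do not exist in OSM yet.

```python
import json
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical property names, not the dataset's confirmed schema.
FIELD_TO_TAG = {
    "HOUSE_NUMBER": "addr:housenumber",
    "STREET_NAME": "addr:street",
}


def geojson_to_osm(geojson: dict) -> ET.Element:
    """Turn GeoJSON point features into an <osm> element of new nodes."""
    root = ET.Element("osm", version="0.6", generator="address-sketch")
    next_id = -1  # negative ids mark objects not yet uploaded
    for feature in geojson.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this sketch only handles point features
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is lon, lat
        node = ET.SubElement(
            root, "node", id=str(next_id), lat=str(lat), lon=str(lon)
        )
        next_id -= 1
        properties = feature.get("properties") or {}
        for field, tag in FIELD_TO_TAG.items():
            value = properties.get(field)
            if value is None or str(value).strip() == "":
                continue  # skip empty attributes instead of writing blank tags
            ET.SubElement(node, "tag", k=tag, v=str(value).strip())
    return root


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a GeoJSON file and write the converted nodes as OSM XML."""
    with open(src_path, encoding="utf-8") as src:
        root = geojson_to_osm(json.load(src))
    ET.ElementTree(root).write(dst_path, encoding="utf-8", xml_declaration=True)
```

Calling `convert_file("addresses.geojson", "addresses.osm")` (both file names are placeholders) would produce a file that can be opened as a layer in JOSM for review.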

I will conduct some further testing and refine this process. Notably, it doesn’t currently account for multiple addresses for one building (townhouses, etc.) or address nodes that should be left as is (instead of being attached to a building).
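
The matching script from step 3 is not written yet, but the core idea can be sketched roughly as follows. This assumes addresses and buildings have already been reduced to `(id, lat, lon)` tuples (with a centroid standing in for each building), and the 30 m cut-off is an arbitrary placeholder, not a tested value. Buildings that receive more than one address are reported rather than tagged.

```python
import math
from collections import defaultdict


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate distance in metres (equirectangular, fine at city scale)."""
    earth_radius_m = 6371000.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return earth_radius_m * math.hypot(x, y)


def match_addresses(addresses, buildings, max_distance_m=30.0):
    """Assign each address to its nearest building centroid.

    addresses and buildings are lists of (id, lat, lon) tuples.
    Returns (matches, conflicts, unmatched):
      matches   - building id -> list of address ids
      conflicts - the subset of matches with more than one address
      unmatched - address ids with no building within max_distance_m
    """
    matches = defaultdict(list)
    unmatched = []
    for addr_id, lat, lon in addresses:
        best = min(
            buildings,
            key=lambda b: distance_m(lat, lon, b[1], b[2]),
            default=None,
        )
        if best is None or distance_m(lat, lon, best[1], best[2]) > max_distance_m:
            unmatched.append(addr_id)
        else:
            matches[best[0]].append(addr_id)
    conflicts = {b: a for b, a in matches.items() if len(a) > 1}
    return dict(matches), conflicts, unmatched
```

This is a brute-force comparison of every address against every building, which should be acceptable for a few neighbourhoods at a time; a spatial index would be the obvious next step for the whole city.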

I welcome any advice or feedback.

Every dataset is different, so people usually write their own scripts to convert the data into the format OSM uses.
I recommend sticking with GeoJSON instead of converting to XML, to avoid conversion errors.

You can use JOSM and the conflation tool to make sure there are no duplicates; there’s no need for your own script.

Postal codes are always a valuable thing to have in addresses.
There isn’t a consensus on including provinces in the address, since you can use geospatial queries to get that data, but having it all tagged makes it easier for the apps that use the data.

Hello,

I don’t have experience doing an import, so I can’t comment, but hopefully others can.

I do have experience working with address data in OSM, so I will comment on this:

IMO, addr:city=Hamilton is not needed. Hamilton has a well-defined border in OSM (relation 7034910) that OSM tools can use to figure out the city (e.g. in Nominatim: https://www.openstreetmap.org/search?lat=43.265048&lon=-79.947751&zoom=19). The same goes for the province.

What could be beneficial is including the community (Dundas, Stoney Creek, etc.) in the addr:suburb tag. I see it in the open data as the COMMUNITY field. In OSM we have nodes for them (e.g. node 249641649 for Dundas), but not areas. It looks like there are some duplicate street addresses within Hamilton (e.g. nodes 9405898132 and 907786134, both 1 King Street East), so having the disambiguating addr:suburb would help.

The postal codes would be extremely useful. Please do include them whenever you can. The six-character postal codes are generally not freely available in Canada (Canada Post wants you to buy them), so if Hamilton is making them available under an open license (and I do see them in the open data preview), it would be fantastic to include them in OSM.

Thank you, I tried this out and it works great for what I need (automatic matching with the ability to easily review and tweak). I will play around a bit to find the best workflow.

Good idea, I will explore this (and look at what is already mapped).

What can be done to ensure that the postal codes are correct? Is it acceptable to compare a subset of the data with other sources (which may not be compatible with the OSM license) in order to validate the data?

You could presumably do some spot-checks as part of pre-import validation of the data you’re about to suggest importing. I don’t think the data you’re spot-checking against would have to be under a compatible license.

I expect the number of comparisons done as part of this validation will in practice be low, because there isn’t a substantial amount of postal code data we could get to compare against (unless you work at a commercial address provider). It’ll probably be manual checks against the Canada Post postal code lookup?
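
Before any manual lookups, the format side of the check can be automated. Here is a small sketch, assuming the postal codes have already been pulled out of the dataset as plain strings. It checks the Canadian letter-digit-letter digit-letter-digit pattern (the letters D, F, I, O, Q and U are never used, and W and Z never appear in the first position) and draws a reproducible random sample for manual spot-checks. A code that passes only has a plausible shape; this says nothing about whether it belongs to that address.

```python
import random
import re

# Canadian postal code shape: A1A 1A1. D, F, I, O, Q, U are never used,
# and the first letter additionally excludes W and Z.
POSTAL_CODE_RE = re.compile(
    r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z]\d[ABCEGHJ-NPRSTV-Z]\d$"
)


def is_plausible_postal_code(value: str) -> bool:
    """Return True if value has the shape of a Canadian postal code."""
    normalized = value.replace(" ", "").upper()
    return bool(POSTAL_CODE_RE.match(normalized))


def spot_check_sample(records, k=20, seed=0):
    """Pick up to k records for manual checking; the seed makes it repeatable."""
    records = list(records)
    return random.Random(seed).sample(records, min(k, len(records)))
```

For example, `is_plausible_postal_code("L8S 4L8")` accepts a well-formed code, while a value containing the letter O in place of a zero would be rejected.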

Postal codes are present in the dataset, but do not appear in its description. The City may have forgotten to remove them when creating the datasets. Therefore, they may not be available in future versions!

The “Postal code” attribute is visible for me after pressing the “Load more” button at the end of the attribute list.

:joy: didn’t click on “show more”!
Thanks

That’s a good idea. Data import can be complex. A quick look at the data reveals:

  • Duplicate addresses with the same coordinates, but different apartments (perhaps a way to add addr:flats?).
  • Addresses aren’t necessarily on the corresponding buildings/map features (some are far apart).

I don’t know their data model, but:
Buildings can have different addresses, some can even have addresses on different streets.
Multiple buildings can have the same address.

It is therefore useful to find out as much as possible before committing.

Thanks for taking a look!

From looking at a few examples, there is always an address point with UNIT_NUMBER_COMPLETE empty, then other points (with the same coordinates) where that field has a unit number. I think I will ignore the unit numbers for now, as they are a lot less useful (and much harder to work with and verify).
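
Dropping the unit-level points could be done with a filter along these lines. It is only a sketch, and it assumes the features are GeoJSON dictionaries whose `UNIT_NUMBER_COMPLETE` property is missing, null, or blank on the base address point.

```python
def base_address_features(features):
    """Keep only features whose UNIT_NUMBER_COMPLETE is missing or blank."""
    kept = []
    for feature in features:
        properties = feature.get("properties") or {}
        unit = properties.get("UNIT_NUMBER_COMPLETE")
        # ASSUMPTION: "empty" may appear as a missing key, null, or whitespace.
        if unit is None or str(unit).strip() == "":
            kept.append(feature)
    return kept
```

It would be worth comparing the counts before and after filtering, in case some locations turn out to have only unit-level points and no base point.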

I think they are being placed on the center of the lot/parcel. In most cases, this is closest to the desired building, exceptions being curves in the road and other buildings (like sheds). Some of this may be fixed by adjusting the conflation parameters in JOSM, some will need manual work.