Incorporating Data from Ontario OpenData into OSM

Hello everyone,

I am fairly new to OSM, and I have to say I love the idea of open source maps. That being said, I went to the city I grew up in and labelled maybe 30-40 buildings before deciding that there has to be a better way :slight_smile:

I decided to work on a project to pull in government data to provide more building footprints for the region. Also, please let me know if this is the wrong place to post, and kindly redirect me to a more appropriate area to disseminate this information.


I knew that Ontario has an open data initiative, so I checked to see if there was any interesting GIS data. Lo and behold, I stumbled upon a database of footprints for the city:

I knew that I had to be careful incorporating this data, so I devised a methodology for “merging” in the new data while preserving the tags of what is currently there. I view the existing OSM data as ground truth and very important to preserve, so it should not be deleted.


Everything is done in Python. I make extensive use of osmnx and geopandas.

I first extract the building polygons from OSM, as well as the polygons from the open data repository. I then compute which buildings overlap, along with an overlap ratio for each pair.

If the buildings overlap by more than 50%, I determine that they are the same building. I then pull in the polygon from the government-provided GIS file while maintaining the metadata of the existing building.

If the overlap is 50% or less, I assume that the existing building is correct, and discard the government data.
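
To make the rule above concrete, here is a minimal sketch of the decision step only. It is not the code from my script: the `Footprint` class, the `merge` function and the choice to measure overlap against the smaller of the two footprints are assumptions made for illustration, and the areas are assumed to have been computed beforehand (for example with geopandas) in a projected coordinate system.

```python
from dataclasses import dataclass, field

OVERLAP_THRESHOLD = 0.5  # the 50% cutoff described above


@dataclass
class Footprint:
    """Hypothetical container for one building outline."""
    geometry_id: str          # stand-in for the actual polygon
    area: float               # polygon area, e.g. in square metres
    tags: dict = field(default_factory=dict)


def overlap_ratio(intersection_area: float, osm_area: float, gov_area: float) -> float:
    """Intersection area relative to the smaller footprint (assumed definition)."""
    smaller = min(osm_area, gov_area)
    if smaller <= 0:
        return 0.0
    return intersection_area / smaller


def merge(osm: Footprint, gov: Footprint, intersection_area: float) -> Footprint:
    """Apply the 50% rule to one OSM/government pair."""
    ratio = overlap_ratio(intersection_area, osm.area, gov.area)
    if ratio > OVERLAP_THRESHOLD:
        # Same building: take the government outline, keep the existing OSM tags.
        return Footprint(gov.geometry_id, gov.area, dict(osm.tags))
    # Not a confident match: keep the existing OSM building untouched.
    return osm


existing = Footprint("osm-1", 100.0, {"building": "house", "addr:housenumber": "12"})
city = Footprint("gov-7", 120.0)
merged = merge(existing, city, intersection_area=80.0)
print(merged.geometry_id, merged.tags)
```

In this sketch a pair that overlaps by exactly 50% is treated as a non-match, so the existing OSM building wins in borderline cases.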

Preliminary Results

To visualize this I have prepared some graphics:



And here are some images of buildings that matched successfully, and of those that did not. The red outline is the original footprint, while the green is the new one:



Note that among the unsuccessful matches there are buildings mapped by the community that were missed by the city:

Points of Concern:

For some of the data, I would deem the city footprints to be more accurate. However, here is a counterexample from a small section of the city, kindly labelled by someone (or perhaps imported):

As you can see, although the city data seems correct, the original data is more detailed, showing awnings, balconies and other features that were missed by the city.

There are a few possible approaches to address this:

- Keep the original polygon, always
- Raise the threshold for acceptability, or
- Devise a heuristic where the polygon with the highest number of vertices (which we deem to be more accurate) or a multi-polygon is preferred over the alternative.
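
As a rough illustration of the third option, here is a minimal sketch that compares vertex counts. The function names are hypothetical, and polygons are assumed to be given as lists of closed coordinate rings (exterior first, then any inner rings), so holes in a multi-polygon add to the count. Ties go to the existing OSM outline.

```python
def vertex_count(rings):
    """Count the vertices of a polygon given as a list of coordinate rings.

    A closed ring repeats its first point at the end; that duplicate
    is not counted.
    """
    total = 0
    for ring in rings:
        n = len(ring)
        if n > 1 and ring[0] == ring[-1]:
            n -= 1
        total += n
    return total


def prefer_more_detailed(osm_rings, gov_rings):
    """Return which outline to keep: 'government' only if it has strictly more vertices."""
    if vertex_count(gov_rings) > vertex_count(osm_rings):
        return "government"
    return "osm"


# An L-shaped outline traced by a mapper versus a plain rectangle from the city.
osm_outline = [[(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4), (0, 0)]]
gov_outline = [[(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]]
print(prefer_more_detailed(osm_outline, gov_outline))
```

Vertex count is only a proxy for detail: an outline with many points is not necessarily better aligned, so this would probably work best combined with the overlap threshold or with manual review.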

Further Work

I have found that a large number of Ontario cities, such as Hamilton or Burlington, provide open data at a similar level of detail. Contingent on community approval, the same approach could be extended to them.

Additional Resources

I have uploaded my proof-of-concept script to GitHub here:
There is still a lot of work to be done. As of the time of writing, this repository is still bare-bones. But feel free to critique, or even contribute! :slight_smile:

It seems a good idea.

Please follow the Import Guidelines.
You should contact the Canada community.

Hi, seconding muralito’s recommendation to have a look at the Import Guidelines. You’re already off to a good start with your thorough documentation. :slight_smile:

The main potential roadblock here may be the legal situation. From a quick look at the Brantford Open Data License, it seems to require attribution? This may not be compatible with OSM’s approach to attribution: Data users only need to give attribution to OpenStreetMap, but not to any of our data sources. So please verify that this is ok! (If it turns out there’s a legal incompatibility, there might still be a chance to get an explicit confirmation from them that being listed as a source somewhere on the OSM wiki or in changesets is sufficient attribution.)

As for the actual methodology, I recommend not replacing existing data automatically if there’s a risk that it might introduce errors or reduce quality. Keep in mind that, in addition to less detailed outlines, there may be additional issues which are hard to check for (e.g. POI or address nodes ending up next to the new building polygon instead of inside it, or building polygons intersecting with roads). So I would suggest manual review of potentially tricky cases.