I am fairly new to OSM, and I have to say I love the idea of open source maps. That being said, I went to the city I grew up in and labelled maybe 30-40 buildings before deciding that there has to be a better way.
I decided to work on a project to pull in government data to provide more building footprints for the region. Also, please let me know if this is the wrong place to post, and kindly redirect me to a more appropriate area to disseminate this information.
I knew that Ontario has an open data initiative, so I checked to see if there was any interesting GIS data. Lo and behold, I stumbled upon a database of footprints for the city:
I knew that I had to be careful incorporating this data, so I devised a methodology for “merging” in the new data while preserving the tags of what is currently there. I view the existing tags as ground truth and very important to preserve, so they should not be deleted.
Everything is done in Python. I make extensive use of osmnx and geopandas.
I first extract the building polygons from OSM, as well as the polygons from the city's open data repository. I then find the buildings that overlap and compute an overlap ratio for each pair.
If the buildings overlap by more than 50%, I determine that they are the same building. I then pull in the polygon from the government-provided GIS file while maintaining the metadata of the existing building.
Otherwise, I assume that the existing building is correct and discard the government data.
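The matching rule above can be sketched roughly as follows. This is only an illustration, not the code from my repository: it uses shapely directly (geopandas builds on it), the function names `overlap_ratio` and `merge_building` are made up for this example, and I am assuming the overlap ratio means the intersection area divided by the area of the smaller polygon, which is just one possible definition.

```python
from shapely.geometry import Polygon


def overlap_ratio(existing, candidate):
    """Intersection area divided by the smaller polygon's area.

    Assumption: this denominator is one reasonable choice, not
    necessarily the one used in the actual script.
    """
    smaller = min(existing.area, candidate.area)
    if smaller == 0:
        return 0.0
    return existing.intersection(candidate).area / smaller


def merge_building(existing, existing_tags, candidate, threshold=0.5):
    """Return (geometry, tags) for one OSM building.

    Above the threshold, take the government geometry but keep the
    OSM tags. Otherwise keep the OSM building untouched.
    """
    if overlap_ratio(existing, candidate) > threshold:
        return candidate, existing_tags
    return existing, existing_tags


# Toy footprints in arbitrary planar units (hypothetical data).
osm_shape = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
osm_tags = {"building": "yes", "name": "Library"}
city_shape = Polygon([(1, 0), (11, 0), (11, 10), (1, 10)])  # mostly overlaps
far_shape = Polygon([(9, 9), (19, 9), (19, 19), (9, 19)])   # barely overlaps

matched_geom, matched_tags = merge_building(osm_shape, osm_tags, city_shape)
kept_geom, kept_tags = merge_building(osm_shape, osm_tags, far_shape)
```

With real data the areas should be computed in a projected coordinate system rather than in raw latitude/longitude, otherwise the ratio gets distorted.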
To visualize this I have prepared some graphics:
And here are some images of buildings that matched successfully, and of those that did not. The red outline is the original, while the green is the new footprint:
Note that in the case of unsuccessful imports the community noted buildings that were missed by the city:
Points of Concern:
For some of the data, I would deem the city footprints to be more accurate. However, this is a counterexample for a small section of the city, kindly labelled by someone (or perhaps imported):
As you can see, although the city data seems correct, the original data is more detailed, showing awnings, balconies and other features that were missed by the city.
There are a couple of possible approaches to address this:
-Keep the original polygon, always
-Raise the threshold for acceptability, or
-Devise a heuristic where the polygon with highest number of vertices (we deem this to be more accurate) or multi-polygons are preferred over the alternative.
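To make the third option more concrete, here is a minimal sketch of what such a heuristic could look like. It is purely hypothetical and not part of the repository: footprints are represented as plain lists of parts, each part being a list of (x, y) vertices, and the names `vertex_count` and `prefer_more_detailed` are invented for this example. Whether more vertices really means more accuracy is exactly the assumption that would need checking.

```python
def vertex_count(parts):
    """Total number of vertices over all parts of a footprint."""
    return sum(len(ring) for ring in parts)


def prefer_more_detailed(existing, candidate):
    """Pick the footprint assumed to be more detailed.

    A multi-part footprint wins over a single-part one; otherwise the
    footprint with more vertices wins. Ties keep the existing OSM
    footprint, so community data is never replaced without a reason.
    """
    existing_key = (len(existing) > 1, vertex_count(existing))
    candidate_key = (len(candidate) > 1, vertex_count(candidate))
    return candidate if candidate_key > existing_key else existing


# Hypothetical data: the OSM footprint has a small extension (6 vertices),
# while the city footprint is a plain rectangle (4 vertices).
osm_footprint = [[(0, 0), (4, 0), (4, 1), (5, 1), (5, 3), (0, 3)]]
city_footprint = [[(0, 0), (5, 0), (5, 3), (0, 3)]]

chosen = prefer_more_detailed(osm_footprint, city_footprint)
```

In this example the OSM footprint is kept because it has more vertices. A check like this would run only after two buildings have already been matched by overlap.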
Contingent on community approval, this could be extended further: I have found that a large number of Ontario cities provide open data at a similar level of detail, such as Hamilton http://open.hamilton.ca/datasets/9b0ccd920ab34810a155b9f21ed1b075_8 or Burlington https://navburl-burlington.opendata.arcgis.com/datasets/buildings.
I have uploaded my proof of concept script to Github here: https://github.com/Gezili/merge-ontario-data
There is still a lot of work to be done here. At the time of writing, this repository is still bare-bones. But feel free to critique, or even contribute!