Hello mappers, I am proposing to import from the Google Open Buildings dataset, sourced from Google.
Documentation
This is the wiki page for my import: https://wiki.openstreetmap.org/wiki/Import/Catalogue/Sabu_Raijua_building_import
This is the source dataset’s website: http://sites.research.google/open-buildings
The data download is available here: http://sites.research.google/open-buildings/#download
This is a preview file which shows the geometric data: https://github.com/cascafico/Sabu-Raijua-building-import/blob/main/sabu_raijua-gt30-disjoined.geojson
This is a file I have prepared which shows the data after it was translated to the OSM schema and validated in JOSM: https://github.com/cascafico/Sabu-Raijua-building-import/blob/main/sabu_raijua-confidende0752-gt25-disjoined.osm
License
I have checked that this data is compatible with the ODbL.
This data is distributed under ODbL v.1.
Abstract
The dataset was released in May 2023 and features AI building recognition on Google aerial imagery. The import is part of the South Asia v.3 release by Google. I cropped an area that contains approximately 52k buildings, which could be added to the 7k already in planet.osm. The tagging schema is trivial except for a potential new tag “plus_code”, which aims to give an alternative textual address based exclusively on the object's centroid. Conflation should be trivial too: any candidate that touches an existing OSM building will not be imported.
I have no relation with Google nor with the area subject of this import proposal.
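The conflation rule from the abstract ("any candidate that touches an existing OSM building is not imported") can be sketched in plain Python. This is only an illustration, not the process used to build the files linked above: the function names are made up, rings are assumed to be simple exterior rings given as (lon, lat) pairs, and coordinates are treated as planar, which is acceptable for a yes/no touch test at building scale.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]  # exterior ring as (lon, lat) pairs, closed or open


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle a-b-c; 0 means collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _within_box(a: Point, b: Point, p: Point) -> bool:
    """True if p lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_touch(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if the segments cross, overlap, or share a point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Collinear cases: an endpoint lying on the other segment.
    return ((d1 == 0 and _within_box(q1, q2, p1))
            or (d2 == 0 and _within_box(q1, q2, p2))
            or (d3 == 0 and _within_box(p1, p2, q1))
            or (d4 == 0 and _within_box(p1, p2, q2)))


def point_in_ring(p: Point, ring: Ring) -> bool:
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        if (a[1] > p[1]) != (b[1] > p[1]):
            x = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if p[0] < x:
                inside = not inside
    return inside


def _edges(ring: Ring) -> List[Tuple[Point, Point]]:
    n = len(ring)
    return [(ring[i], ring[(i + 1) % n]) for i in range(n)]


def rings_touch(a: Ring, b: Ring) -> bool:
    """True if the two polygons overlap, touch, or one contains the other."""
    ax, ay = [p[0] for p in a], [p[1] for p in a]
    bx, by = [p[0] for p in b], [p[1] for p in b]
    if max(ax) < min(bx) or max(bx) < min(ax) or max(ay) < min(by) or max(by) < min(ay):
        return False  # bounding boxes are apart, so the rings cannot touch
    if any(segments_touch(p1, p2, q1, q2)
           for p1, p2 in _edges(a) for q1, q2 in _edges(b)):
        return True
    # No edge contact: the only remaining case is full containment.
    return point_in_ring(a[0], b) or point_in_ring(b[0], a)


def keep_disjoint(candidates: List[Ring], existing: List[Ring]) -> List[Ring]:
    """Keep only candidates that touch none of the existing buildings."""
    return [c for c in candidates
            if not any(rings_touch(c, e) for e in existing)]
```

The pairwise loop is quadratic, so for roughly 52k candidates against 7k existing buildings a spatial index or a GIS tool would be the practical choice; the sketch only pins down what "touches" means.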
Given this is the first (to my knowledge) import from the Google dataset it’s probably worth someone from the Legal Working Group signing off that there isn’t some super weird thing about how it’s licensed.
Do you have a sense of the error rate? Either building objects that aren’t actually in the imagery (false positives), missing buildings, wrong shapes, etc.
My understanding is that plus code tagging is seen as completely redundant and is discouraged, as it can be generated from latitude and longitude. There’s discussion on the wiki about general plus code support; see this note: Proposal Open Location Code - OpenStreetMap Wiki
I’ve loaded the prepared file into JOSM, together with the current OSM data, for a chunk of the full area. Running the validator shows there’s likely a large number of buildings that conflict with currently mapped roadways/paths, e.g. plus_code=5QX3HV4W+37H9.
You’ll want to do a run finding/filtering buildings that run through mapped roads/paths.
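A filtering run of that kind could look roughly like the sketch below. It is not the JOSM validator's logic, just a plain-geometry approximation: it flags a building when any segment of a way properly crosses any edge of the building outline. The function names and the id-to-coordinates dictionaries are hypothetical, and coordinates are treated as planar.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _side(a: Point, b: Point, c: Point) -> float:
    """Positive or negative depending on which side of line a-b point c is."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True only for a proper crossing; merely touching does not count."""
    return (_side(q1, q2, p1) * _side(q1, q2, p2) < 0
            and _side(p1, p2, q1) * _side(p1, p2, q2) < 0)


def buildings_crossing_ways(
    buildings: Dict[str, Sequence[Point]],  # building id -> exterior ring
    ways: Dict[str, Sequence[Point]],       # way id -> ordered node coordinates
) -> List[Tuple[str, str]]:
    """Return (building id, way id) pairs where the way cuts the outline."""
    hits = []
    for b_id, ring in buildings.items():
        n = len(ring)
        edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
        for w_id, line in ways.items():
            segments = list(zip(line, line[1:]))
            if any(crosses(s1, s2, e1, e2)
                   for s1, s2 in segments for e1, e2 in edges):
                hits.append((b_id, w_id))
    return hits
```

Buildings that appear in the result are candidates for manual review rather than automatic removal, since either the building or the highway may be the misplaced one.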
Did you do any filtering based off of the confidence score? I see they’ve recommended using only buildings with confidence score greater than 0.752 for what they mark as 90% confidence for this sector.
Picking a few things that are somewhat ambiguous on the various imagery I have in JOSM, and pulling their records from the sector file, there’s some chunk of these additions that have scores below that threshold.
To figure out which data is more reliable between the Google buildings and the OSM highways, I would use the few Strava traces available on the main island. As far as I can see, there could be a slight highway offset (say 2-5 meters) that could lead to building and highway crossings, particularly in dense areas.
Update: having processed buildings with confidence > 0.752, disjoined from the OSM ones, the JOSM validator raises just 78 building-highway crossing issues. I think that is a good result for a 20k-object dataset.
I recommend dropping plus_code as a tag, because a plus code can be derived from the location. I find this explainer from Frederik Ramm about Open Location Codes & OpenStreetMap very good. tl;dr: it’s easier, faster, and more reliable to make other software support plus codes than to add them as OSM tags.
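To illustrate why the tag is redundant, here is a minimal sketch of deriving a plus code from a latitude/longitude pair. It follows the Open Location Code encoding as I understand it from the public specification (base-20 digit pairs, then a 5x4 grid refinement); the function name and the restriction to code lengths 10 to 15 are my own choices, and it is not the code Google used to fill the dataset.

```python
ALPHABET = "23456789CFGHJMPQRVWX"  # the 20 Open Location Code digits

LAT_UNITS_PER_DEGREE = 8000 * 5 ** 5  # finest latitude step (15-digit code)
LNG_UNITS_PER_DEGREE = 8000 * 4 ** 5  # finest longitude step (15-digit code)


def encode_plus_code(lat: float, lng: float, length: int = 10) -> str:
    """Encode a WGS84 position as a full plus code of 10 to 15 digits."""
    if not 10 <= length <= 15:
        raise ValueError("this sketch handles code lengths 10 to 15 only")
    lat = min(90.0, max(-90.0, lat))        # clip latitude
    lng = (lng + 180.0) % 360.0 - 180.0     # normalise to [-180, 180)

    # Work in integers to avoid floating point drift while extracting digits.
    lat_val = int(round((lat + 90.0) * LAT_UNITS_PER_DEGREE, 6))
    lng_val = int(round((lng + 180.0) * LNG_UNITS_PER_DEGREE, 6))
    lat_val = min(lat_val, 180 * LAT_UNITS_PER_DEGREE - 1)  # keep lat=90 in range
    lng_val = min(lng_val, 360 * LNG_UNITS_PER_DEGREE - 1)

    # Digits 11 to 15: each digit picks a cell in a 5-row by 4-column grid.
    grid = ""
    for _ in range(5):
        grid = ALPHABET[(lat_val % 5) * 4 + (lng_val % 4)] + grid
        lat_val //= 5
        lng_val //= 4

    # Digits 1 to 10: five (latitude, longitude) pairs in base 20.
    pairs = ""
    for _ in range(5):
        pairs = ALPHABET[lat_val % 20] + ALPHABET[lng_val % 20] + pairs
        lat_val //= 20
        lng_val //= 20

    code = pairs + grid
    return code[:8] + "+" + code[8:length]


print(encode_plus_code(20.3700625, 2.7821875))  # 7FG49QCJ+2V
```

The example value in this thread (5QX3HV4W+37H9) has 12 digits, which would correspond to `length=12` here. Any consumer that has the building centroid can compute the code on demand, which is the argument against storing it in OSM.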
One thing you may wish to try is to make a small dataset of (for example) the 100 buildings on either side of the quality line you are currently using (0.752), and let that help guide whether the cut line is too high or too low.
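A small sketch of how such a review sample could be pulled. It assumes each record is a dictionary with a numeric `confidence` field, which is how I expect the dataset rows to look once loaded; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def sample_around_threshold(
    records: List[Dict],
    threshold: float = 0.752,
    n: int = 100,
    key: str = "confidence",  # assumed field name
) -> Tuple[List[Dict], List[Dict]]:
    """Return the n records just below and the n records just above the cut."""
    below = sorted((r for r in records if r[key] < threshold),
                   key=lambda r: r[key], reverse=True)[:n]
    above = sorted((r for r in records if r[key] >= threshold),
                   key=lambda r: r[key])[:n]
    return below, above
```

Checking both lists against imagery gives a rough feel for how many real buildings the cut discards and how many false positives it still lets through.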
I have updated some statistics on dataset filtering. See wiki import data - background. I would go for 0.752 (90%) confidence and more than 25 square meters.
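For reference, the proposed cut can be expressed as a short filter over a GeoJSON FeatureCollection. This is a sketch only: the property names `confidence` and `area_in_meters` are assumed to match the source dataset's columns, the function name is made up, and the strict "greater than" comparisons follow the wording used earlier in this thread.

```python
from typing import Dict


def filter_buildings(
    collection: Dict,
    min_confidence: float = 0.752,  # the 90% threshold discussed above
    min_area_m2: float = 25.0,
) -> Dict:
    """Keep features above both the confidence and the area thresholds."""
    kept = [
        feature for feature in collection["features"]
        if feature["properties"]["confidence"] > min_confidence   # assumed name
        and feature["properties"]["area_in_meters"] > min_area_m2  # assumed name
    ]
    return {"type": "FeatureCollection", "features": kept}
```

The input would typically come from `json.load()` on the cropped GeoJSON file, and the result can be written back with `json.dump()`.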