Sabu Raijua building import proposal

Sabu Raijua building import

Hello mappers, I am proposing to import buildings from Google’s Open Buildings dataset.

Documentation

This is the wiki page for my import:
https://wiki.openstreetmap.org/wiki/Import/Catalogue/Sabu_Raijua_building_import
This is the source dataset’s website:
http://sites.research.google/open-buildings
The data download is available here:
http://sites.research.google/open-buildings/#download
This is a preview file which shows the geometric data:
https://github.com/cascafico/Sabu-Raijua-building-import/blob/main/sabu_raijua-gt30-disjoined.geojson
This is a file I have prepared which shows the data after it was translated to OSM schema and JOSM validated:
https://github.com/cascafico/Sabu-Raijua-building-import/blob/main/sabu_raijua-confidende0752-gt25-disjoined.osm

License

I have checked that this data is compatible with the ODbL.
This data is distributed under ODbL v1.0.

Abstract

The dataset was released in May 2023 and features AI building recognition on Google aerial imagery. The data is part of the South Asia v.3 release by Google. I cropped an area that contains approximately 52k buildings, which could be added to the 7k already in planet.osm. The tagging schema is trivial except for a potential new tag “plus_code”, which aims to give an alternative textual address based exclusively on the object centroid. Conflation should be trivial too: any candidate that touches an existing OSM building will not be imported.
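A minimal sketch of that conflation rule, using only the Python standard library. It is a conservative approximation, not the processing actually used for this import: it compares bounding boxes instead of exact polygon outlines, so it may drop a few candidates that only come close to an existing building, but it never keeps one that really touches. The function names and the (lon, lat) ring representation are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]                # (lon, lat)
BBox = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def bbox(ring: Sequence[Point]) -> BBox:
    """Axis-aligned bounding box of a polygon ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_touch(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or share an edge or a corner."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def drop_touching(candidates: List[Sequence[Point]],
                  existing: List[Sequence[Point]]) -> List[Sequence[Point]]:
    """Keep only candidates whose box touches no existing building box."""
    existing_boxes = [bbox(ring) for ring in existing]
    kept = []
    for ring in candidates:
        box = bbox(ring)
        if not any(boxes_touch(box, other) for other in existing_boxes):
            kept.append(ring)
    return kept
```

This compares every candidate with every existing building, which is fine for a test extract; for the full 52k by 7k comparison a spatial index would be the obvious next step.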

I have no relation to Google or to the area covered by this import proposal.

4 Likes

Good work!
I checked your file to make sure the conflation turned out well, and it looks like it has.

If you’ve got the support of the community, I believe the import will turn out fine.

I’ll also ask that you use the standard changeset tags that have been provided on the plan outline:
https://wiki.openstreetmap.org/wiki/Import/Plan_Outline#Changeset_Tags

Thanks,
James

Love to see a new dataset being used!

Given this is the first (to my knowledge) import from the Google dataset, it’s probably worth someone from the Legal Working Group signing off that there isn’t some super weird thing about how it’s licensed.

Do you have a sense of the error rate? Either building objects that aren’t actually on the imagery (false positives), or missing buildings, wrong shapes, etc.

My understanding is that plus code tagging is seen as completely redundant and discouraged, as the code can be generated from latitude and longitude. There’s discussion on the wiki about general plus code support; see this note: Proposal Open Location Code - OpenStreetMap Wiki

2 Likes

I’ve loaded the prepared file into JOSM, as well as current OSM data for a chunk of the full area. Running the validator shows there’s likely a large number of buildings that conflict with currently mapped roadways/paths, e.g. plus_code=5QX3HV4W+37H9.

You’ll want to do a run finding/filtering buildings that run through mapped roads/paths.
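One way such a run could look, as a rough standard-library sketch and not a replacement for the JOSM validator: it flags a building when any edge of its outline intersects any segment of a highway way. The function names are hypothetical, coordinates are treated as plane (x, y) pairs, and a very short way lying entirely inside a building outline would not be caught.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product: > 0 if a->b->c turns left, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a: Point, b: Point, p: Point) -> bool:
    """For p collinear with a-b: True if p lies within the segment's extent."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if ((d1 > 0 > d2) or (d1 < 0 < d2)) and ((d3 > 0 > d4) or (d3 < 0 < d4)):
        return True  # proper crossing
    # Collinear or endpoint-touching cases.
    return ((d1 == 0 and _on_segment(q1, q2, p1))
            or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1))
            or (d4 == 0 and _on_segment(p1, p2, q2)))


def building_crosses_way(ring: Sequence[Point], way: Sequence[Point]) -> bool:
    """True if any edge of the closed building ring meets any way segment."""
    for i in range(len(ring) - 1):
        for j in range(len(way) - 1):
            if segments_intersect(ring[i], ring[i + 1], way[j], way[j + 1]):
                return True
    return False
```

Buildings for which `building_crosses_way` returns True against any highway or path could be set aside for manual review instead of being uploaded.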

According to the website:

How is the data licensed?

The data is shared under the Creative Commons Attribution (CC BY-4.0) license and the Open Data Commons Open Database License (ODbL) v1.0 license. As the user, you can pick which of the two licenses you prefer and use the data under the terms of that license. However, please note the liability disclaimer in the footnotes.

And as mentioned by @Cascafico

Legally, we can use it.

2 Likes

Did you do any filtering based on the confidence score? I see they’ve recommended using only buildings with a confidence score greater than 0.752, which they mark as 90% precision for this sector.

I picked a few buildings that look ambiguous on the various imagery I have in JOSM and pulled their records from the sector file; some of these additions have scores below that threshold.

From the provided “score thresholds” file:

s2_token  confidence_threshold_80%_precision  confidence_threshold_85%_precision  confidence_threshold_90%_precision
2c5       0.651                               0.714                               0.752

We can decide on whatever threshold we’d like to use, of course, and I would love to hear folks’ thoughts about it.
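To make comparing the candidate thresholds easier, here is a small sketch that looks up the cut-off for a given precision level from the row quoted above. The tab-separated layout and the helper name `threshold_for` are assumptions; the real thresholds file may use a different delimiter.

```python
import csv
import io

# The single row quoted in this thread, written out as tab-separated text.
THRESHOLDS = (
    "s2_token\tconfidence_threshold_80%_precision\t"
    "confidence_threshold_85%_precision\tconfidence_threshold_90%_precision\n"
    "2c5\t0.651\t0.714\t0.752\n"
)


def threshold_for(s2_token: str, precision: int, table_text: str = THRESHOLDS) -> float:
    """Return the confidence cut-off for one S2 cell and precision level."""
    column = f"confidence_threshold_{precision}%_precision"
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        if row["s2_token"] == s2_token:
            return float(row[column])
    raise KeyError(f"no thresholds listed for S2 token {s2_token!r}")


for level in (80, 85, 90):
    print(level, threshold_for("2c5", level))
```

Counting how many candidate buildings survive at each of the three cut-offs would give a quick picture of what a looser or stricter threshold costs.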

2 Likes

To figure out which data is more reliable between Google buildings and OSM highways, I would use the few Strava traces available on the main island. As far as I can see, there could be a slight highway offset (say 2-5 meters) that could lead to building and highway crossings, particularly in dense areas.

Update: having processed buildings with confidence > 0.752, disjoined from OSM ones, the JOSM validator raises just 78 building-highway crossing issues. I think that is a good result for a dataset of 20k objects.

1 Like

I recommend dropping that as a tag, because a plus code can be derived from the location. I find this explainer from Frederik Ramm about Open Location Codes & OpenStreetMap very good, tl;dr: It’s easier, faster, and more reliable to make other software support plus codes, rather than adding that as OSM tags.
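To illustrate how little is needed to derive the code on the fly, here is a minimal sketch of encoding a latitude/longitude pair as a standard 10-digit plus code. It is not the official Open Location Code library; the function name is made up for this example, and codes longer than 10 digits (such as the 12-digit values in the prepared file) are out of scope.

```python
import math

ALPHABET = "23456789CFGHJMPQRVWX"  # the 20 digits used by Open Location Code


def plus_code(lat: float, lon: float) -> str:
    """Encode a point as a 10-digit plus code (8 digits, '+', 2 digits)."""
    # Work in units of 1/8000 degree, the resolution of the fifth digit pair.
    lat_units = int(math.floor((lat + 90.0) * 8000))
    lat_units = max(0, min(lat_units, 180 * 8000 - 1))  # keep latitude in range
    lon_units = int(math.floor(((lon + 180.0) % 360.0) * 8000))

    digits = []
    for _ in range(5):
        # Collect pairs from least to most significant; reversed below.
        digits.append(ALPHABET[lon_units % 20])
        digits.append(ALPHABET[lat_units % 20])
        lat_units //= 20
        lon_units //= 20

    code = "".join(reversed(digits))  # latitude digit first in every pair
    return code[:8] + "+" + code[8:]


print(plus_code(47.0000625, 8.0000625))  # 8FVC2222+22
```

Any consumer that has the building centroid can run the same computation, which is the core of the argument against storing the value as a tag.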

4 Likes

One thing you may wish to try is to make a small dataset of, for example, the 100 buildings on either side of the current quality line you are using (0.752), and let that help guide whether the cut line is too high or too low.
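A possible way to pull that review sample, sketched under the assumption that each building is available as an (id, confidence) pair; the function name and record layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, float]  # (building id, confidence score)


def sample_around_threshold(records: Sequence[Record],
                            threshold: float = 0.752,
                            n: int = 100) -> Tuple[List[Record], List[Record]]:
    """Return the n records just below and the n records just above the cut."""
    below = sorted((r for r in records if r[1] < threshold),
                   key=lambda r: r[1], reverse=True)[:n]
    above = sorted((r for r in records if r[1] >= threshold),
                   key=lambda r: r[1])[:n]
    return below, above
```

Checking both lists against imagery shows how many real buildings the cut rejects and how many doubtful ones it lets through.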

2 Likes

Besides, I would say that the buildings are more precise than the highway network.

1 Like

I have updated some statistics on dataset filtering. See wiki import data - background. I would go for a confidence above 0.752 (90% precision) and an area of more than 25 square meters.
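The combined filter can be sketched as below. The column names `confidence` and `area_in_meters` are assumed to match the published Open Buildings CSV, and the three sample rows are invented purely to exercise the code.

```python
import csv
import io
from typing import Dict, List

# Hypothetical rows for illustration only, not real records from the dataset.
SAMPLE_CSV = """latitude,longitude,area_in_meters,confidence
-10.4801,121.8502,48.3,0.91
-10.4810,121.8510,19.7,0.88
-10.4822,121.8533,61.0,0.70
"""


def keep_row(row: Dict[str, str],
             min_confidence: float = 0.752,
             min_area_m2: float = 25.0) -> bool:
    """True if the building clears both the confidence and the area cut."""
    return (float(row["confidence"]) > min_confidence
            and float(row["area_in_meters"]) > min_area_m2)


def filter_buildings(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text and return only the rows that pass keep_row."""
    return [row for row in csv.DictReader(io.StringIO(csv_text)) if keep_row(row)]


print(len(filter_buildings(SAMPLE_CSV)))  # 1
```

In the sample, the second row fails the area cut and the third fails the confidence cut, so only the first row is kept.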

1 Like

Current local community discussion :