Import proposal

I want to compare the latest upload candidate with the current building data on OSM.

  1. Open the latest upload candidate with JOSM : :arrow_upper_right:
  2. Filter the data to only show ways that contain building=yes.
  3. Download the current existing building data by using the Overpass API.
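For step 3, the thread does not show the Overpass query that was used, so the following is only a sketch of what one could look like. The function name, the bounding box values, and the timeout are placeholders for illustration; `way["building"]` matches any building value, not only `building=yes`.

```python
def overpass_building_query(south, west, north, east, timeout=60):
    """Build an Overpass QL query for building ways inside a bounding box.

    Hypothetical helper: the actual query used in the thread is not shown.
    Overpass expects the bounding box as (south, west, north, east).
    """
    if not (south < north and west < east):
        raise ValueError("expected south < north and west < east")
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:xml][timeout:{timeout}];\n"
        f'way["building"]({bbox});\n'
        "(._;>;);\n"   # also fetch the nodes of the matched ways
        "out meta;"    # include metadata so an editor can load the data
    )


# Placeholder bounding box, not the real import area.
query = overpass_building_query(-10.7, 121.7, -10.4, 122.0)
print(query)
```

The printed text is meant to be pasted into an Overpass-capable download dialog or sent to a public Overpass endpoint; which one fits depends on your setup.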

Here’s the result :

  • Light pink : Current OSM data
  • Grey : Latest upload candidate

Looks good.

I have read the wiki, @Cascafico. That’s a good approach for excluding buildings by those criteria.

  1. Could you please describe just the QGIS steps in more detail?
  2. @rtnf’s analysis makes me think this is good data. Thanks for that. I would like to contribute to this as a volunteer. How can I help?

First, select a region using this tool and copy the WKT string.

Then open this Colab notebook, paste the WKT string, and run the first code block.

Wait for it to finish, then run the step 2 code block to download the result.


It’s a CSV file.

Open the downloaded file by using QGIS. Layer → Add Layer → Add Delimited Text Layer

Geometry definition → WKT. Set up the CRS. Add.

Right click the layer → Filter… "confidence" >= 0.75 AND "area_in_meters" >= 25
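For anyone who wants to apply the same filter outside QGIS, here is a minimal Python sketch of it. The column names `confidence` and `area_in_meters` come from the filter expression above; the three sample rows are made up purely for illustration.

```python
import csv
import io

MIN_CONFIDENCE = 0.75  # threshold from the filter expression above
MIN_AREA_M2 = 25.0     # threshold from the filter expression above


def filter_buildings(rows, min_confidence=MIN_CONFIDENCE, min_area=MIN_AREA_M2):
    """Keep rows that satisfy both thresholds (same logic as the QGIS filter)."""
    kept = []
    for row in rows:
        confident = float(row["confidence"]) >= min_confidence
        large = float(row["area_in_meters"]) >= min_area
        if confident and large:
            kept.append(row)
    return kept


# Made-up sample data, not taken from the real dataset.
sample_csv = """confidence,area_in_meters,geometry
0.91,48.2,"POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))"
0.62,120.0,"POLYGON((2 2, 2 3, 3 3, 3 2, 2 2))"
0.80,12.5,"POLYGON((4 4, 4 5, 5 5, 5 4, 4 4))"
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kept = filter_buildings(rows)
print(len(kept), "of", len(rows), "rows kept")
```

For a real download, replace the `io.StringIO` sample with an opened CSV file.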

Right click the layer → Export → Save Features As… → GeoJSON.
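The WKT to GeoJSON conversion that QGIS performs here can be illustrated roughly as follows. This is a simplified sketch, not what QGIS does internally: it only handles a single-ring `POLYGON`, the function name is hypothetical, and the coordinates are placeholders.

```python
import json
import re


def wkt_polygon_to_geojson(wkt, properties=None):
    """Convert a single-ring WKT POLYGON into a GeoJSON Feature dict.

    Polygons with holes and other geometry types are rejected in this sketch.
    """
    match = re.fullmatch(
        r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None or "(" in match.group(1):
        raise ValueError("only single-ring POLYGON WKT is handled in this sketch")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # WKT and GeoJSON both use x (lon), y (lat) order
        ring.append([float(x), float(y)])
    return {
        "type": "Feature",
        "properties": properties or {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Placeholder coordinates for illustration only.
feature = wkt_polygon_to_geojson(
    "POLYGON((121.8 -10.5, 121.8 -10.4, 121.9 -10.4, 121.8 -10.5))",
    {"confidence": 0.91},
)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection))
```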

Open the geojson file by using JOSM.

Light pink : current OSM building data
Grey : Google Open Building

Real good.

But at this point, I still don’t know how to remove candidates that intersect with pre-existing OSM buildings.

Basically, in QGIS you should save both the WKT and OSM data in an editable format such as GeoPackage.

Small and low-confidence geometries can be removed from the candidate dataset with “Select by Expression” in the attribute table.

To remove geometries touching OSM buildings: open the “Processing” menu, find the search field, and type “Select by Location”. Set the layer fields; basically all you need to do is tick the checkbox for elements that are disjoint. Run it and save the resulting layer as GeoJSON for loading in JOSM.
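To make the “disjoint” selection concrete, here is a small pure-Python sketch of the idea: keep only candidates that share no point with any existing building. It is not how QGIS implements “Select by Location”. It assumes simple closed rings without holes, compares every pair (no spatial index, so it will be slow on large areas), and uses made-up squares as sample data.

```python
def _orient(a, b, c):
    """Cross product sign: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_bbox(a, b, p):
    """True if p lies within the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if the segments share at least one point (touching counts)."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and _in_bbox(q1, q2, p1)) or (d2 == 0 and _in_bbox(q1, q2, p2))
            or (d3 == 0 and _in_bbox(p1, p2, q1)) or (d4 == 0 and _in_bbox(p1, p2, q2)))


def point_in_ring(pt, ring):
    """Ray-casting test; ring is a closed list of (x, y) with first == last."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rings_disjoint(a, b):
    """True if the two polygons share no point at all."""
    for p1, p2 in zip(a, a[1:]):
        for q1, q2 in zip(b, b[1:]):
            if segments_intersect(p1, p2, q1, q2):
                return False
    # No edges meet, so one polygon is either fully inside the other or apart.
    return not (point_in_ring(a[0], b) or point_in_ring(b[0], a))


def select_disjoint(candidates, existing):
    """Keep candidates that are disjoint from every existing polygon."""
    return [c for c in candidates if all(rings_disjoint(c, e) for e in existing)]


# Made-up squares for illustration.
osm_buildings = [[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]]
candidates = [
    [(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)],                    # overlaps
    [(5, 5), (6, 5), (6, 6), (5, 6), (5, 5)],                    # far away
    [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5), (0.5, 0.5)],  # fully inside
]
kept = select_disjoint(candidates, osm_buildings)
print(len(kept), "candidate(s) kept")
```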


Apparently, my PC is way too weak to continue this process :sweat_smile:

Hello Mas @rtnf, thank you for sharing. This is interesting to me, but is there any particular reason why these numbers were chosen as the thresholds?

Area: I assumed a 5x5 meter room is the minimum for a dwelling, and smaller shapes get closer and closer to pixel size, hence more prone to errors. Do we prefer more huts at the cost of a higher false positive ratio? Of course the local community’s contribution is important, particularly on these issues.

Confidence: I took this suggestion, which seems reasonable to me; if we had the human resources to check data quality broadly, I would drop this filter.

Anyway, I think going forward with relatively large and reliable objects could be a starting point: if needed, you can populate further later, following the same procedure.


Looks like you are processing the WKT directly. Working on GeoPackage (or shp) will probably solve the issue. Personally, I extracted from 50k polygons with 7k OSM inputs in 5-6 minutes, using part of 8 GB of RAM.

I used the threshold numbers that Cascafico chose earlier. The main goal is to filter for quality data, because small buildings (area_in_meters) with low accuracy (confidence) may just be artifacts of the AI processing.

As for the exact threshold numbers, they don’t have to be 0.75 and 25. There may be another combination that works better. But to know for sure, we would need up-to-date satellite imagery or on-the-ground verification by local OSM contributors to manually check every building in this dataset.


Do you mind detailing the process?



Load the Google Open Buildings CSV, setting the geometry as WKT.

Zoom to your area of interest, select the objects, and export them, saving the elements in GeoPackage format.

Load the OSM buildings previously extracted in GeoJSON format with this query.

Fill in the “Select by Location” form to select buildings disjoint from (not touching) the OSM ones.


The “confidence” threshold (0.75) comes from a secondary file provided by Google which has “recommended” scores for each region.


After a week of settling, in order to avoid AOI conflict issues, I uploaded the published OSM file.

Issues solved:

  • building-highway crossings
  • building crossings with join (SHIFT-J)
  • building simplification (SHIFT-Y, 0.25m)
  • fix for the resulting building self-crossings

I took the opportunity to fix (offset) some highways where Bing aerial images and Google buildings were consistent.


Thank you for sharing and addressing the issues above.

  1. I loaded it in JOSM, added the tag “building=yes,” and ran the validation, resulting in this output (picture attached). Could you please confirm if this is the expected result and the correct workflow to import?

  2. Your imported buildings look amazing, with a complete changeset comment and source.

  • But I noticed some weirdly shaped buildings like these: 1, 2.
  • And crossing buildings: 1, 2.


Thanks for your post-import report. For some reason I missed about a hundred geometry warnings. I’m fixing them in a single changeset, but I don’t understand why JOSM did not raise them before uploading.

Thank you. This is what I can do voluntarily for your effort. I checked your uploaded data in Sabu Raijua and fixed some self-intersecting ways and crossing buildings.

This sometimes happens to me as well: JOSM shows nothing before uploading, and I notice new(?) warnings only after uploading my data. I usually re-download my uploaded data as a new layer and run the JOSM validator.

I found that Google uses the latest imagery, as mentioned on their page. This building does not exist yet in Bing imagery, but it appears in Google Maps satellite mode. I am leaving it as is for now. This leaves us with a gap if we use Google’s dataset.


I confirm the images used for building recognition are pretty fresh, or at least fresher than Bing. I saw the same somewhere in Argentina. It seems Maxar (which is usually up to date in Europe) is still unavailable, so right now in many cases we cannot double-check for false positive buildings.

@Cascafico do you have a documentation link/directory for this import process?

I posted the wiki link in the first message; anyway, here it is :slight_smile:

Ah thank you, I didn’t notice that there was a hyperlink text!
