Poor-quality imports of Microsoft AI buildings

Hello OSM UK community,

I am writing this post to express my concern about the current additions of microsoft/BuildingFootprints data to OSM, as well as the large number of these buildings already present in the database. These buildings, particularly unsplit (blobbed) semidetached and terraced houses, cause significant problems for other editors trying to improve the map, especially StreetComplete users, while only offering an illusion of quality building coverage.

According to taginfo, there are currently 820,000 buildings with the tag source=microsoft/BuildingFootprints, with the majority being created after January 2024.

The proportion of buildings with this tag that still need to be corrected can only be estimated: many buildings with this tag are detached buildings or have since been split and improved, while on the other hand there are many older changesets that imported AI buildings without applying any source= tag. I am most concerned about a subset of these: semidetached and terraced houses imported as single, grouped buildings.

While I acknowledge that many older buildings within OSM are unsplit semis and terraces, I believe we should avoid adding such buildings to the database now that we have reasonable quality aerial imagery and cadastral parcels to assist with splitting buildings. My main concern is that these buildings are much more difficult to correct later on than if they had never existed in the first place, particularly if other users have added tags such as housenumbers and roof shapes. Speaking of housenumbers, users surveying with StreetComplete often make errors when assigning housenumbers to unsplit buildings, given the polygons do not accurately represent the real-world situation. Of course it is more time consuming to map buildings accurately, and the UK has a serious lack of buildings, but I feel we should always prioritise the accuracy and usefulness of our data over simply the number of buildings mapped per day. I am particularly disappointed by the fact that there are many users with thousands of changesets importing these poor-quality buildings, who one might expect to understand the implications of bringing such data into OSM. I have attempted to raise my frustrations with some users before, but was unfortunately met with radio silence or dismissal of my concerns. (Please excuse the passive-aggressive tone, but you should see some of the buildings people are accepting into the database!)

Now, some illustrative examples of problematic buildings:

Buildings added by PicaPico in changeset 158855738; this user has imported over 183,000 AI buildings.

Buildings added by GinaroZ in changeset 176196991; this user has imported over 6,500 AI buildings.

Buildings added by Grove11 in changeset 175983510; this user has imported over 45,000 AI buildings.

Buildings added by RyanBush in changeset 171320900; this user has imported over 165,000 AI buildings.

In conclusion:

It’s going to take years to clean up and split all these poor-quality buildings that have been mass-imported, and unmanageably large amounts continue to be imported. (This is coming from someone who has added or improved ~190,000 buildings.) My view is that we should expressly disallow the import of such subpar and objectively incorrect buildings into OSM, specifically unsplit semis and terraces or buildings with very poor geometry. In some places, I feel it is necessary to perform mass-deletions of these buildings, so that they can be manually remapped with some semblance of care and quality. In most cases, it’s simply more time consuming to realign and split existing AI buildings than to delete and redraw them. Additionally, if these had been manually drawn by a new user, we’d leave a helpful comment explaining that the buildings should be split. Why make an exception just because they were suggested by AI? Ultimately, my one goal is to have OSM in the UK be as accurate and complete as possible, and in my view the continuing import of Microsoft AI buildings is a barrier to achieving that.

If you’ve made it this far, thank you for reading.

LGS.

Additional:

I wanted to tack a poll onto this thread to seek the opinion of the UK community members on whether you would support deletion of poor quality buildings without immediate replacement, or if you would only like to see these imprecise buildings deleted as they are being split/replaced. (Assume said buildings are still on v1 with no useful tags added subsequently).

  • It’s ok to delete poor quality AI buildings without replacing them immediately
  • Poor quality AI buildings should only be deleted when being split/replaced
20 Likes

I completely agree.

I think I used Microsoft Buildings in 3 changesets and, as you said, it’s so time consuming to realign and split AI slop buildings. On top of that, it’s also very time consuming for the mapper, before closing the changeset, to cherry-pick the buildings that look good and are accurate (which is what people using Rapid/MapWithAI should be doing in the first place). After all that fuss, it’s much quicker to just draw in buildings manually.

Even though OSM is pretty much all about improving things over time, there simply aren’t enough people actively editing everywhere to keep up with the rate at which badly drawn AI buildings can be added, meaning these could be left for years. It’s far better for things to be added right first time.

15 Likes

I totally agree as well! I would much rather that the building data be slowly added manually by careful mappers than just mass importing this AI rubbish.

What’s the point/benefit of importing buildings that will just have to be fixed up later? Might as well do it right the first time!

11 Likes

I have a slightly different perspective - whilst I think it would be ideal if buildings were moved, orthogonalised and split before being added, having the rough outlines there can be useful.

I often split a semi-detached house or terrace and link it to the street at the same time using the terracer plugin. By matching these split parts to UPRNs the postcode can be found, and then by filtering on this postcode, the street can often be added to other buildings on the same street.

There are also some limitations of the tooling that may be contributing to this - I don’t believe iD has a feature to quickly split buildings, and the terracer plugin in JOSM struggles with odd-shaped buildings. Similarly, I don’t believe StreetComplete has a way to split a building on the go.

3 Likes

It does struggle, which is a good reason to ensure the splitting is done when they are created.

Speaking from personal experience of buildings I should have split when I drew them, rather than waiting until I had surveyed the housenumbers.

5 Likes

The ‘Split Objects’ thing in the UtilsPlugin2 plug-in is useful for this.

I agree about the slop as well, the outlines are often hilariously wrong.

4 Likes

A different source, but similar effects sometimes occur in OS OpenMap Local, where a single block is drawn in place of several complex, separate buildings. For example, way 1367948873.

I only found out by accident after @JassKurn and I decided to map the same unmapped area at the same hour and day and clashed. :grinning_face:

Yikes, I know a lot of older buildings were traced from OS openmaps… but there’s no reason to trace such imprecise buildings in 2025 with better aerials and cadastral parcels…

3 Likes

I suspect the OS data is intended for cartographic use and not spatial analysis where a representation of reality is important. Or it is really old and has not been subject to any qualitative improvement. Doesn’t really matter as the outcome is the same for OSM.

Splitting a building in iD is more of a workaround than a real function. Draw all but one of the new building parts yourself, reusing old nodes wherever it makes sense. Then disconnect the original building from the new ones and move or delete superfluous nodes from the original building to form the last building of the now split terrace. (BTW, merging buildings in iD is literally the same, just reversed.)

As far as I’m aware, the only possible way to manipulate a building outline with StreetComplete is by using the Address Overlay. If you put the address onto the outline an entrance=yes will be created automatically (along with any door related quests if these are enabled).

2 Likes

I was thinking about it too. I wasn’t particularly impressed with these additions - in many cases it was easier and faster to delete the building and map it from scratch. Some of the buildings in the attached screenshots are atrocious.

However, I’d rather have Microsoft improve the AI tool in the following ways (comments welcome):

  • Ignore more complicated buildings (e.g. bent or very irregular) and unimportant buildings (garages/sheds), and focus on simple and obvious cases (easily identifiable detached, semidetached and terraced houses)
  • Map them with a simplified geometry (simple rectangles, ignoring all the extensions, garages etc)
  • Split the semidetached and terraced houses.
  • Tag them at least as building=house

This would be a useful base for mappers to build on and would open up possibility for adding addresses, POIs etc. I noticed having an “unfinished” map tends to be quite motivating for new mappers to join and add a few more details, so I would rather err on having fewer better buildings than more low quality ones.

4 Likes

Tagging AI geometry as houses risks tagging being wrong which is worse than missing. Missing Maps tells its contributors to use building=yes for that reason.

3 Likes

Unfortunately, this wouldn’t really work when the imagery is crustier than an Asda Smartprice pizza… in the middle of a large city with crisp quality imagery maybe.

And if we’re just asking the AI to map the simple rectangles, you may as well just draw those in yourself… and split them yourself while you’re at it.

3 Likes

@Wynndale, @ceirios, don’t underestimate AI - it can work well if trained correctly and used in a way that supports manual mapping rather than trying to replace it 100%. Right now it is calibrated for recognising all the buildings, which is neither needed nor useful, and it prevents actually useful features from being added in simpler cases.

Look at this screenshot - if the model put more effort in 80% of the simple cases and ignored the 20% with likely poor certainty scores that would be a whole different picture.

As for usefulness - everything can be mapped manually and I’ve done my share of that too. Would I like a simple GUI with decent candidates that I can approve or reject with a single click? Yes. Thinking about that, I could easily add tagging myself (in bulk). But splitting semis would be great, as it is more involved.


However, Microsoft may have a different idea for their AI tool. I wouldn’t be surprised if their use case was:

(hypothetically) OSM is still missing many buildings and progress is slow/uneven, so let’s just strip all the buildings out of the OSM database and replace them downstream with an AI-generated layer that shows lower quality buildings but with much higher coverage.

If that was the case, then (1) we shouldn’t import them, (2) it may be difficult to convince Microsoft to alter their algorithm, as the current one serves this hypothetical goal well.

1 Like

That was exactly what Facebook (also part of “Overture”, like Microsoft) did with its “Daylight” map, wasn’t it? I can see the logic from the point of view of someone who “just wants to show that it is a built-up area” in maps in their products.

1 Like

I am cleaning up one of these Microsoft imports in two places I’m working on. I see garages tagged as houses. Shops tagged as houses. It seems everything gets to be a house.

1 Like

I would much prefer building=yes to building=house. Canonsburg, which I’m fixing up, seems to be completely made up of building=house, even when it’s a garage, a shop, or a terrace.

10 Likes

This note was created by a new mapper who had the sense to give up when asked for the number of an unsplit terrace.

2 Likes

StreetComplete users (including myself) really rely on accurate and up-to-date building polygons to survey addresses correctly. Even something as simple as a single missing house can confuse everything and leave you standing at the end of the street with one more housenumber and no house to put it on! I’ve surveyed areas in St Albans with really old building=terrace polygons and I just ended up dropping address nodes wherever I saw housenumbers because it was too hard to work out which building polygons covered which houses. StreetComplete is designed to be easy to use for new contributors, but we need to provide good quality building data to ensure the quests they get actually make sense!

3 Likes

I was also thinking about ways to do QA on buildings like this (and others not tagged with source=microsoft/BuildingFootprints) and was wondering how to identify buildings that cross multiple cadastral parcels - i.e. ones that likely need to be split.

@rskedgell - I saw in your methodology for importing UPRNs and postcodes that you’re matching INSPIRE polygons to buildings and importing only when there’s a 95% geometry match. Would it theoretically be possible to identify the opposite? For example, building polygons where less than 75% of the geometry is located within a single INSPIRE polygon, or something like that? This could help to identify buildings that need to be split which might otherwise be missed by the inevitable AI building cleanup.
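
To make the idea concrete, here is a rough sketch of the "less than 75% in a single parcel" check in plain Python. It is only an illustration under some assumptions, not anyone's actual import methodology: the function names (`best_parcel_share`, `needs_split`) are made up, coordinates are assumed to be in a projected CRS in metres (such as British National Grid), polygons are simple vertex lists without holes, and each parcel is assumed to be convex so that Sutherland-Hodgman clipping is valid. Real INSPIRE parcels are not always convex, so a proper geometry library would be needed in practice.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order, first point not repeated


def signed_area(poly: Polygon) -> float:
    """Shoelace formula; positive when vertices run counter-clockwise."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def area(poly: Polygon) -> float:
    return abs(signed_area(poly)) if len(poly) >= 3 else 0.0


def side(a: Point, b: Point, p: Point) -> float:
    """Cross product: >= 0 when p is on or left of the directed line a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject: Polygon, parcel: Polygon) -> Polygon:
    """Sutherland-Hodgman clip of `subject` to a CONVEX `parcel` (assumption)."""
    if signed_area(parcel) < 0:
        parcel = parcel[::-1]  # normalise to counter-clockwise
    output = list(subject)
    for i in range(len(parcel)):
        if not output:
            break
        a, b = parcel[i], parcel[(i + 1) % len(parcel)]
        previous, output = output, []
        for j in range(len(previous)):
            cur, prev = previous[j], previous[j - 1]
            d_cur, d_prev = side(a, b, cur), side(a, b, prev)
            if (d_cur >= 0) != (d_prev >= 0):
                # Edge prev->cur crosses the parcel boundary: add the crossing.
                t = d_prev / (d_prev - d_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if d_cur >= 0:
                output.append(cur)
    return output


def best_parcel_share(building: Polygon, parcels: List[Polygon]) -> float:
    """Largest fraction of the building's area lying inside any one parcel."""
    total = area(building)
    if total == 0:
        return 0.0
    return max((area(clip(building, p)) / total for p in parcels), default=0.0)


def needs_split(building: Polygon, parcels: List[Polygon],
                threshold: float = 0.75) -> bool:
    """Flag buildings where no single parcel holds `threshold` of the area."""
    return best_parcel_share(building, parcels) < threshold


# Two side-by-side parcels, one unsplit semi straddling both, one detached house.
parcel_a = [(0, -5), (10, -5), (10, 20), (0, 20)]
parcel_b = [(10, -5), (20, -5), (20, 20), (10, 20)]
unsplit_semi = [(0, 0), (20, 0), (20, 10), (0, 10)]
detached = [(1, 1), (9, 1), (9, 9), (1, 9)]

print(needs_split(unsplit_semi, [parcel_a, parcel_b]))  # flagged for review
print(needs_split(detached, [parcel_a, parcel_b]))      # left alone
```

The 75% threshold is just the figure floated above; it would need tuning against real data, since extensions and slightly misaligned outlines will also pull the share down.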