Is there a comprehensive list of map features that can/cannot intersect with buildings?

Background - I’m already working with the public Estonian address dataset, where I have set up a monthly script that updates changed addr:* tags for buildings. As a byproduct I get a list of building footprints that are not in OSM. Some of these buildings have already been loaded into OSM through a manual conflation process with JOSM, in small chunks. Going forward, though, there should be an automatic process that takes building footprints and assesses whether they can be imported without manual conflation.

So, are there any validation rules that state what kinds of objects buildings cannot intersect with?

The naive solution would involve taking all ways/nodes around the building area and checking whether they are inside the building or intersect it. However, I am not satisfied with this approach, since landuse/boundary/power line/leisure ways can legitimately intersect with buildings, and POI/addr nodes can be located inside buildings.
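
For illustration, the naive approach could look roughly like this. This is a pure-Python sketch with hypothetical data shapes (feature dicts with a `coords` list); a real implementation would use proper geometry predicates (e.g. shapely), since this vertex-based test misses ways that cross the footprint without having a vertex inside it:

```python
# Naive conflict check: flag any existing feature that has a vertex
# inside the candidate footprint. Bounding boxes are used as a cheap
# pre-filter. All names/structures here are hypothetical.

def bbox(coords):
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def point_in_polygon(pt, poly):
    # Standard ray-casting test.
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = (x2 - x1) * (y - y1) / (y2 - y1) + x1
            if x < x_cross:
                inside = not inside
    return inside

def naive_conflicts(footprint, nearby_features):
    """footprint: list of (x, y); nearby_features: dicts with 'coords'."""
    conflicts = []
    fb = bbox(footprint)
    for feat in nearby_features:
        coords = feat["coords"]
        if not bboxes_overlap(fb, bbox(coords)):
            continue
        # NOTE: misses edge-only crossings; a real check also needs
        # segment-segment intersection.
        if any(point_in_polygon(p, footprint) for p in coords):
            conflicts.append(feat)
    return conflicts
```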

The preferred method would be to identify a list of objects that buildings should not intersect with:

The obvious starting point is other buildings, identified by the building tag.

Secondly, here’s a quote from the wiki:

What will happen if a building present in your dataset is already mapped as man_made=storage_tank without a building tag?

It turns out the man_made tag has complex usage patterns. In most cases it should not intersect with buildings, but there are exceptions. Here is a list of values where intersection is acceptable: clearcut, cutline, quarry, wastewater_plant, water_works, works, embankment, courtyard.

highway tag - this is fortunately straightforward, although highway=proposed is one exception.

barrier tag - I assume there shouldn’t be any exceptions with it?

natural tag - a fairly complex tag. By default it shouldn’t intersect, but with values like wood/scrub/wetland/grassland/coastline/sand/beach it can.
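
To make the rules above concrete, here is a minimal sketch of such a whitelist filter. The value lists are taken verbatim from this post; treat them as a starting point rather than a complete ruleset, and note that the function name and tag-dict shape are my own invention:

```python
# Decide whether a feature with the given tags may overlap a building,
# per the (incomplete) rules discussed in this thread.

MAN_MADE_OK = {"clearcut", "cutline", "quarry", "wastewater_plant",
               "water_works", "works", "embankment", "courtyard"}
NATURAL_OK = {"wood", "scrub", "wetland", "grassland", "coastline",
              "sand", "beach"}

def intersection_allowed(tags):
    """tags: dict of OSM key -> value for the existing feature."""
    if "building" in tags:
        return False                      # never overlap another building
    if "man_made" in tags:
        return tags["man_made"] in MAN_MADE_OK
    if "highway" in tags:
        return tags["highway"] == "proposed"
    if "barrier" in tags:
        return False                      # assumed: no exceptions
    if "natural" in tags:
        return tags["natural"] in NATURAL_OK
    return True                           # key not on the blocklist
```

A feature carrying several of these keys is judged by the first matching rule here, which is a simplification; a real validator would probably evaluate all keys and take the strictest result.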

I assume this list isn’t complete - am I missing anything significant? Or are there better methods to validate whether a given building footprint might cause any issues with existing OSM data?

Other mandatory details when the word “import” is mentioned:

Source data quality - already very high. I also have additional methods to verify that I’m not importing non-existent buildings.
Licence - the dataset licence is compatible with OSM.

I am currently working on something similar, so I understand the problems you are having.

Generally, the ground that a building footprint sits on should be relatively empty, but some things are fine to cross it, like highways, railways, power lines, etc.

However, you could also look at it differently: it does not matter whether you delay your import until the ground has been cleared. It has to be done anyway. So either you wait for a mapper to clear the ground and then import, or you import all new buildings and wait for a mapper to clear the ground afterwards; the end result is the same.

The real problem is buildings that were already imported but have changed over the years. I see many cases where a new building would get imported on top of another building because that building is out of sync.

The problem with updating existing buildings in OSM is that many of these buildings have modifications attached to them, like building passages, entrances, and pedestrian squares. This makes automatic updating of existing buildings hard.
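
One cheap pre-filter for this could be checking whether the existing building way shares nodes with any other way; if it does, automatic replacement is probably unsafe and the building should go to manual review. A minimal sketch with a deliberately simplified data model (the dict shapes are hypothetical, not any real OSM library):

```python
# Detect buildings "entangled" with other features: the building way
# shares one or more nodes with another way (e.g. an attached passage,
# a fence, or an adjacent pedestrian square).

def entangled(building_way, all_ways):
    """building_way: {'id': way_id, 'nodes': [node_id, ...]};
    all_ways: dict of way_id -> list of node ids."""
    bnodes = set(building_way["nodes"])
    for wid, nodes in all_ways.items():
        if wid == building_way["id"]:
            continue
        if bnodes & set(nodes):           # at least one shared node
            return True
    return False
```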

I agree, there are situations where imported data with minor issues might be far more valuable than no data at all. However, in this specific case I’m looking for a way to distinguish building footprints that can be automatically imported without any potential issues. Since building imports are fairly common, I’m hoping I’m not the first person who has encountered this.

If the external dataset has high quality and is updated regularly, you could monitor the changes. This might be helpful in detecting demolished buildings or ruins. And if the external dataset has stable IDs/ref codes, that can be very valuable as well.
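
As a sketch of what such monitoring could look like when stable ref codes exist: diff two snapshots of the external dataset keyed by their ID. The record layout is an assumption for illustration, not taken from any real dataset:

```python
# Diff two dataset snapshots keyed by a stable ref code.
# Removed refs are candidate demolitions; changed refs are
# geometry/attribute edits worth re-conflating.

def diff_snapshots(old, new):
    """old/new: dicts mapping stable ref -> feature record."""
    added = [r for r in new if r not in old]
    removed = [r for r in old if r not in new]
    changed = [r for r in new if r in old and new[r] != old[r]]
    return added, removed, changed
```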

Exactly, over time this problem will only become more significant. One solution could be to break it down into smaller parts. Working with smaller subsets might help reduce complexity and eliminate certain edge cases. Then it’s easier to assess whether a particular problem can be handled by an automatic conflation process or needs manual review.
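
One simple way to form such subsets is to bucket candidate footprints by a coarse grid cell based on their centroid, so each cell can be conflated and reviewed independently. A sketch; the cell size is an arbitrary assumption:

```python
# Partition candidate footprints into independent grid-cell buckets.

from collections import defaultdict

def grid_key(lon, lat, cell=0.05):
    """Coarse grid cell containing a centroid (cell size in degrees)."""
    return (int(lon // cell), int(lat // cell))

def partition(footprints, cell=0.05):
    """footprints: list of (lon, lat, payload) centroid tuples."""
    buckets = defaultdict(list)
    for lon, lat, payload in footprints:
        buckets[grid_key(lon, lat, cell)].append(payload)
    return buckets
```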