San Antonio Import

Hi All,
I am currently looking to import MS buildings + addresses into the Bexar County area. I am still working out a plan for how to do this, so I have not written a wiki page yet. So far this is my plan:

  1. Update existing OSM buildings with address points that intersect them (or, for a building with no tags, the closest address point); if there is a conflict, manually review the conflicting data.
  2. Import all MS buildings that do not intersect any existing OSM data (areas/lines) and add addresses to them as above.
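
For step 1, the matching logic could be sketched like this in plain Python (the data layout here is hypothetical: each building is a dict with a `geom` vertex list and a `tags` dict; in practice you'd use PostGIS `ST_Intersects`/`ST_Distance` instead):

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Edge crosses the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = (x2 - x1) * (y - y1) / (y2 - y1) + x1
            if x < x_cross:
                inside = not inside
    return inside

def centroid(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def match_address(pt, buildings):
    """Return the building containing pt, else the nearest untagged building."""
    for b in buildings:
        if point_in_polygon(pt, b["geom"]):
            return b
    untagged = [b for b in buildings if not b["tags"]]
    if not untagged:
        return None
    return min(untagged, key=lambda b: math.dist(pt, centroid(b["geom"])))
```

The nearest-fallback only considers untagged buildings, matching the "closest point to a building that has no tags" rule above.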

Since there is so much data, I am currently working with PostGIS to do the updating/merging. I am using the snapshot schema in my DB since it is lossless. I did see that user data is missing (possibly stripped by the data source?). Is it necessary for updating objects on the OSM side? If so, what other data is needed for updating an object?

Any suggestions to how this should be done would be appreciated.


I had a role in the Denver building import. You can read about our process here:

It was based in part on the Los Angeles, California Buildings Import.

If you work through the buildings first, you can utilize various tools in JOSM to make adding the address data a lot faster. I’ve written a lot about address additions from this dataset in my OSM Diary. Here’s one example you may find interesting: watmildon's Diary | Using the JOSM Conflation plugin to add 1500 addresses in 10 minutes | OpenStreetMap

I’m sure there are fancy QGIS tools too, but they may also rely on having good, accurate building footprints first.

I did see that; the issue is that San Antonio has about 600k addresses. I got the suggestion to add the addresses separately from the buildings, as different imports, since it would probably take quite a bit of work to get the data where it needs to be for an import.

The Microsoft Buildings dataset doesn’t contain any user-contributed data from OSM, so you’ll need to merge the datasets, for example using the conflation plugin mentioned above.

Imports commonly perform these two steps in the opposite order. One advantage is that, if you discover that a building no longer exists in reality, you don’t have to review it twice. Another is that you can use process of elimination when conflating with the existing buildings. On the other hand, your proposed approach would avoid the case where an existing OSM building incorrectly intersects with a building in the Microsoft dataset, preventing it from getting imported at all.

So I was referring to the OSM file I downloaded from Geofabrik (which has all data for Texas). Loading the data into a local PostGIS DB, I saw all the user IDs were 0 and the user table is blank.
Also, while looking at the data, I saw that some of the MS buildings intersected each other. I was thinking of resolving this programmatically, where I would either keep the larger of the two objects or try to merge them somehow (haven’t gotten that far). Maybe for conflicting data something like the Tasking Manager can be used instead of trying to merge things that have conflicts. Maybe keep the larger object but tag it as suspect?
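
A crude sketch of the keep-the-larger idea (assumed data layout: each building is a dict with a `geom` vertex ring and a `tags` dict). It uses bounding-box overlap as a cheap stand-in for a real PostGIS `ST_Intersects` test, and tags the smaller footprint with a `fixme` for manual review rather than deleting it:

```python
def bbox(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    ax1, ay1, ax2, ay2 = bbox(a)
    bx1, by1, bx2, by2 = bbox(b)
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def shoelace_area(poly):
    """Polygon area via the shoelace formula."""
    area = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def resolve_overlaps(buildings):
    """For each overlapping pair, keep the larger footprint and mark the
    smaller one as suspect. Returns (keep, flagged_for_review)."""
    for i in range(len(buildings)):
        for j in range(i + 1, len(buildings)):
            a, b = buildings[i], buildings[j]
            if bboxes_overlap(a["geom"], b["geom"]):
                smaller = min(a, b, key=lambda x: shoelace_area(x["geom"]))
                smaller["tags"]["fixme"] = "overlaps neighbouring footprint"
    return ([b for b in buildings if "fixme" not in b["tags"]],
            [b for b in buildings if "fixme" in b["tags"]])
```

The O(n²) pair loop is fine for a review pass on a few thousand conflicts; at 600k features you'd want a spatial index (PostGIS GiST, or shapely's STRtree).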

What tools did y’all use to edit all that data (combine geometries, merge addresses, resolve conflicting data, etc.)?

Meant to reply here

Talk about a multiple tab snafu.

The two most common tools are QGIS (some with custom python) and JOSM (lots of plugins). If you haven’t seen the import catalog, it’s worth reading through some things there to get a sense of what the “state of the art” is. Import/Catalogue - OpenStreetMap Wiki

OK, let’s try this again, without remnants of a very old discarded draft post…

Geofabrik omits any user identifiers from its publicly downloadable files to avoid running into problems under the GDPR. If you really need user names and IDs, it’s available to logged-in OSM users. However, I’m not sure it would be useful to your import. If you intend to verify in step 2 that it isn’t a building you’ve already touched in step 1 by checking the last-changed user ID, it would introduce a reasonably realistic race condition in which someone else edits the building in the meantime.

Oh, intersections between two buildings in the Microsoft dataset? I’ve seen this happen before but didn’t hear back from the RapiD/MapWithAI team when I reported the issue. Hopefully it’s very rare.


Oh, that makes sense. I’ll probably download the data with the user data just in case.

Yeah, when I was going through the MS buildings there were a good amount that had overlap.

As I recall:

  1. PostGIS to convert to OSM and split into “tasks”, and add address information to buildings
  2. Additional processing with custom Python program to deal with multipart buildings. Your input data probably doesn’t have multipart buildings.
  3. Tasks where there were zero existing OSM buildings were uploaded automatically at this point.
  4. Created tasking manager project with one task per file generated in step 1.
  5. An OSM mapper would work on one task at a time using the Tasking Manager and JOSM. If there were duplicate buildings between the input data and OSM, they were handled with the Replace Geometry function in JOSM, which preserved the history of these objects.
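
The "split into tasks" part of step 1 can be approximated with simple grid bucketing by centroid (a sketch of the idea, not the actual Denver scripts, which did this in PostGIS):

```python
from collections import defaultdict

def split_into_tasks(features, cell_size):
    """Group features into grid cells ("tasks") keyed by the cell containing
    each feature's centroid. features: dicts with a 'geom' vertex list."""
    tasks = defaultdict(list)
    for f in features:
        xs = [p[0] for p in f["geom"]]
        ys = [p[1] for p in f["geom"]]
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        key = (int(cx // cell_size), int(cy // cell_size))
        tasks[key].append(f)
    return dict(tasks)
```

Each cell then becomes one file, and one task in the Tasking Manager project.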


If you’re having trouble loading the data into JOSM but still wish to use its tooling, the file can be split into smaller chunks.

The full file I generated from the data you had earlier was 191.8 MB, which you said you weren’t able to load due to running out of RAM.

But after cutting out just the northeast of the city, I was able to load it into JOSM with default RAM allocation.

Try it out yourself and see if it works. I saved another file with this extract.
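
If you want to do the same kind of spatial cut yourself, here's a minimal sketch that clips a GeoJSON FeatureCollection to a bounding box by each polygon's average vertex position (a rough centroid; tools like `osmium extract` or JOSM's built-in download-by-area do this properly):

```python
import json

def extract_bbox(geojson_text, min_lon, min_lat, max_lon, max_lat):
    """Keep only features whose (rough) centroid falls inside the bbox.
    Assumes simple Polygon geometries; the first ring is the outer ring."""
    fc = json.loads(geojson_text)
    kept = []
    for f in fc["features"]:
        ring = f["geometry"]["coordinates"][0]
        lon = sum(c[0] for c in ring) / len(ring)
        lat = sum(c[1] for c in ring) / len(ring)
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            kept.append(f)
    return {"type": "FeatureCollection", "features": kept}
```

Run this a few times with different boxes and each output file should be small enough for JOSM's default RAM allocation.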

I imported buildings for Connecticut, and for data manipulation I used the Python packages GeoPandas and Shapely, which allow basic geometric manipulations.

My workflow was something like this:

  1. Use QGIS or JOSM to convert the building data to GeoJSON (or Shapefile) format.
  2. Load it into GeoPandas using the Fiona package.
  3. Use GeoPandas to fix tags and geometry.
  4. Split the dataset into smaller fragments and export them as GeoJSON files.
  5. Upload all files to Google Cloud and create a project in the Tasking Manager pointing to the files stored in Google Cloud.
  6. Use JOSM to work one task at a time, using the Conflation plugin or Replace Geometry to resolve conflicts with existing buildings.
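
Step 3 ("fix tags") might look like this on plain GeoJSON properties. The `RENAME` map here is a made-up example of source-field-to-OSM-tag normalization; with GeoPandas you'd rename/assign columns on the GeoDataFrame instead:

```python
# Hypothetical source-field names; adjust to whatever your dataset actually has.
RENAME = {"Height": "height", "BuildingType": "building"}

def fix_tags(feature):
    """Normalize one GeoJSON feature's properties into OSM-style tags."""
    props = feature.setdefault("properties", {})
    for src, osm_key in RENAME.items():
        if src in props:
            props[osm_key] = props.pop(src)
    # Every imported footprint should at least carry building=*.
    props.setdefault("building", "yes")
    return feature
```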

I don’t think there is a tool that can do the conflation programmatically and preserve the history of the object at the same time. But you could go the route you suggested: first upload all buildings that don’t overlap with existing OSM data, then resolve the conflicts manually in JOSM.

I personally would do this:

  1. Use this Overpass query inside JOSM to download all existing buildings and save them as GeoJson (You will need OpenData plugin if I am not mistaken).
  2. Load them, as well as your polished import dataset into GeoPandas.
  3. Filter out buildings that overlap between the two datasets.
  4. Use upload script to upload the non-conflicting new buildings. (I’ve never used this script, but worst case you can use JOSM)
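
Step 3 (filtering out overlaps) could be sketched like this; it uses bounding boxes as a cheap overlap test, whereas GeoPandas would give you a proper `sjoin`/`overlay`:

```python
def bbox(ring):
    """Axis-aligned bounding box of a vertex ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def non_conflicting(new_rings, existing_rings):
    """Drop any new building whose bbox touches an existing OSM building;
    what's left is the safe-to-upload set, conflicts go to manual review."""
    existing_boxes = [bbox(r) for r in existing_rings]
    return [r for r in new_rings
            if not any(boxes_overlap(bbox(r), eb) for eb in existing_boxes)]
```

Note the bbox test over-flags (two nearby-but-disjoint buildings can have overlapping boxes); that errs on the safe side, pushing borderline cases into the manual JOSM queue.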

Just in case, here are links to the imports and the GitHub code:
import1, import2, Github

Ah thanks! I’ll try it out

Regarding the above Overpass query, would I also need the nodes/metadata to update the tags, or just the way itself and its data?

Edit: Nvm I misunderstood the query.

So I noticed that when I save the file as GeoJSON, it loses the metadata (OSM ID, user, timestamp, etc.). Is this how you edited existing OSM buildings, or did you import the data into Python a different way?

I always modified existing OSM data in JOSM, because I couldn’t find a way to do a lossless conversion between the OSM format and Shapely’s geometry format.

But if you only want to update tags you could make it work with some of the existing packages.
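
For a tag-only update, the key is to leave the way's `id`/`version` attributes intact so the API will accept the change. A minimal sketch on raw OSM XML using the standard library (the tag values here are made up):

```python
import xml.etree.ElementTree as ET

def add_address(osm_xml, way_id, housenumber, street):
    """Add addr:* tags to one way in an OSM XML snippet while leaving the
    id/version attributes untouched, which the API needs for an update."""
    root = ET.fromstring(osm_xml)
    way = root.find(f"way[@id='{way_id}']")
    present = {t.get("k") for t in way.findall("tag")}
    for k, v in (("addr:housenumber", housenumber), ("addr:street", street)):
        if k not in present:  # don't clobber tags a mapper already set
            ET.SubElement(way, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")
```

A real upload would also need the enclosing changeset handling, but the metadata-preservation point is the same.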

And you could use one of the Overpass API packages for selective download of buildings:

I don’t know how compatible their data formats are, so you might need to write your own transformation between Overpass and OSM API data structures.

If there is going to be any modification of existing OSM data I would highly recommend doing so in JOSM as @Mashin suggests. This is what the Denver building import did. @SherbetS had a good suggestion as to how to deal with large amounts of data in JOSM. In addition to losing the necessary metadata, if too much time elapses (e.g. more than an hour) between when the data is extracted from OSM and when you upload changes, you run the risk of conflicts (because someone else has modified the data after it was extracted from OSM).