San Antonio Import

Hi All,
I am currently looking to import MS buildings + addresses into the Bexar County area. I am still working out a plan for how to do this, so I have not written a wiki page yet. So far this is my plan:

  1. Update existing OSM buildings with address points that intersect them (or, for a building with no tags, the closest address point); if there is a conflict, manually review the conflicting data.
  2. Import all MS buildings that do not intersect any existing OSM data (areas/lines) and add addresses to them as above.
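
For step 1, the matching logic could be sketched like this in plain Python (the data layout here is hypothetical: each building is a dict with a `geom` vertex list and a `tags` dict; in practice you'd use PostGIS `ST_Intersects`/`ST_Distance` instead):

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Edge crosses the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = (x2 - x1) * (y - y1) / (y2 - y1) + x1
            if x < x_cross:
                inside = not inside
    return inside

def centroid(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def match_address(pt, buildings):
    """Return the building containing pt, else the nearest untagged building."""
    for b in buildings:
        if point_in_polygon(pt, b["geom"]):
            return b
    untagged = [b for b in buildings if not b["tags"]]
    if not untagged:
        return None
    return min(untagged, key=lambda b: math.dist(pt, centroid(b["geom"])))
```

The nearest-fallback only considers untagged buildings, matching the "closest point to a building that has no tags" rule above.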

Since there is so much data, I am currently working with PostGIS to do the updating/merging. I am using the snapshot schema in my DB since it is lossless. I did see that user data is missing (possibly stripped by the data source?). Is it necessary for updating objects on the OSM side? If so, what other data is needed for updating an object?

Any suggestions to how this should be done would be appreciated.


I had a role in the Denver building import. You can read about our process here:

It was based in part on the Los Angeles, California Buildings Import.

If you work through the buildings first, you can utilize various tools in JOSM to make adding the address data a lot faster. I’ve written a lot about address additions from this dataset in my OSM Diary. Here’s one example you may find interesting: watmildon's Diary | Using the JOSM Conflation plugin to add 1500 addresses in 10 minutes | OpenStreetMap

I’m sure there are fancy QGIS tools too, but they may also rely on having good, accurate building footprints first.

I did see that; the issue is that San Antonio has about 600k addresses. I got the suggestion to add the addresses separately from the buildings, as different imports, since it would probably take quite a bit of work to get the data where it needs to be for an import.

The Microsoft Buildings dataset doesn’t contain any user-contributed data from OSM, so you’ll need to merge the datasets, for example using the conflation plugin mentioned above.

Imports commonly perform these two steps in the opposite order. One advantage is that, if you discover that a building no longer exists in reality, you don’t have to review it twice. Another is that you can use process of elimination when conflating with the existing buildings. On the other hand, your proposed approach would avoid the case where an existing OSM building incorrectly intersects with a building in the Microsoft dataset, preventing it from getting imported at all.

So I was referring to the OSM file I downloaded from Geofabrik (which has all data for Texas). Loading the data into a local PostGIS DB, I saw all the user IDs were 0 and the user table is blank.
Also, while looking at the data, I saw that some of the MS buildings intersected each other. I was thinking of resolving this programmatically, where I would either keep the larger of the two objects or try to merge them somehow (haven’t gotten that far). Maybe for conflicting data something like the Tasking Manager can be used instead of trying to merge things that have conflicts. Maybe keep the larger object but tag it as suspect?
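
A crude sketch of the keep-the-larger idea (assumed data layout: each building is a dict with a `geom` vertex ring and a `tags` dict). It uses bounding-box overlap as a cheap stand-in for a real PostGIS `ST_Intersects` test, and tags the smaller footprint with a `fixme` for manual review rather than deleting it:

```python
def bbox(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    ax1, ay1, ax2, ay2 = bbox(a)
    bx1, by1, bx2, by2 = bbox(b)
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def shoelace_area(poly):
    """Polygon area via the shoelace formula."""
    area = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def resolve_overlaps(buildings):
    """For each overlapping pair, keep the larger footprint and mark the
    smaller one as suspect. Returns (keep, flagged_for_review)."""
    for i in range(len(buildings)):
        for j in range(i + 1, len(buildings)):
            a, b = buildings[i], buildings[j]
            if bboxes_overlap(a["geom"], b["geom"]):
                smaller = min(a, b, key=lambda x: shoelace_area(x["geom"]))
                smaller["tags"]["fixme"] = "overlaps neighbouring footprint"
    return ([b for b in buildings if "fixme" not in b["tags"]],
            [b for b in buildings if "fixme" in b["tags"]])
```

The O(n²) pair loop is fine for a review pass on a few thousand conflicts; at 600k features you'd want a spatial index (PostGIS GiST, or shapely's STRtree).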

What tools did y’all use to edit all that data (combine geometries, merge addresses, resolve conflicting data, etc.)?

Meant to reply here

Talk about a multiple tab snafu.

The two most common tools are QGIS (some with custom python) and JOSM (lots of plugins). If you haven’t seen the import catalog, it’s worth reading through some things there to get a sense of what the “state of the art” is. Import/Catalogue - OpenStreetMap Wiki

OK, let’s try this again, without remnants of a very old discarded draft post…

Geofabrik omits any user identifiers from its publicly downloadable files to avoid running into problems under the GDPR. If you really need user names and IDs, it’s available to logged-in OSM users. However, I’m not sure it would be useful to your import. If you intend to verify in step 2 that it isn’t a building you’ve already touched in step 1 by checking the last-changed user ID, it would introduce a reasonably realistic race condition in which someone else edits the building in the meantime.

Oh, intersections between two buildings in the Microsoft dataset? I’ve seen this happen before but didn’t hear back from the RapiD/MapWithAI team when I reported the issue. Hopefully it’s very rare.


Oh, that makes sense. I’ll probably download the data with the user data just in case.

Yeah, when I was going through the MS buildings there were a good amount that had overlap.

As I recall:

  1. PostGIS to convert to OSM and split into “tasks”, and add address information to buildings
  2. Additional processing with custom Python program to deal with multipart buildings. Your input data probably doesn’t have multipart buildings.
  3. Tasks where there were zero existing OSM buildings were uploaded automatically at this point.
  4. Created tasking manager project with one task per file generated in step 1.
  5. An OSM mapper would work on one task at a time using the Tasking Manager and JOSM. If there were duplicate buildings between the input data and OSM, they were handled with the Replace Geometry function in JOSM, which preserved the history of these objects.
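
The "split into tasks" part of step 1 can be approximated with simple grid bucketing by centroid (a sketch of the idea, not the actual Denver scripts, which did this in PostGIS):

```python
from collections import defaultdict

def split_into_tasks(features, cell_size):
    """Group features into grid cells ("tasks") keyed by the cell containing
    each feature's centroid. features: dicts with a 'geom' vertex list."""
    tasks = defaultdict(list)
    for f in features:
        xs = [p[0] for p in f["geom"]]
        ys = [p[1] for p in f["geom"]]
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        key = (int(cx // cell_size), int(cy // cell_size))
        tasks[key].append(f)
    return dict(tasks)
```

Each cell then becomes one file, and one task in the Tasking Manager project.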


If you’re having trouble loading the data into JOSM but still wish to use its tooling, the file can be split into smaller chunks.

The full file I generated from the data you had earlier was 191.8 MB, which you said you weren’t able to load due to running out of RAM.

But after cutting out just the northeast of the city, I was able to load it into JOSM with default RAM allocation.

Try it out yourself and see if it works. I saved another file with this extract.
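
If you want to do the same kind of spatial cut yourself, here's a minimal sketch that clips a GeoJSON FeatureCollection to a bounding box by each polygon's average vertex position (a rough centroid; tools like `osmium extract` or JOSM's built-in download-by-area do this properly):

```python
import json

def extract_bbox(geojson_text, min_lon, min_lat, max_lon, max_lat):
    """Keep only features whose (rough) centroid falls inside the bbox.
    Assumes simple Polygon geometries; the first ring is the outer ring."""
    fc = json.loads(geojson_text)
    kept = []
    for f in fc["features"]:
        ring = f["geometry"]["coordinates"][0]
        lon = sum(c[0] for c in ring) / len(ring)
        lat = sum(c[1] for c in ring) / len(ring)
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            kept.append(f)
    return {"type": "FeatureCollection", "features": kept}
```

Run this a few times with different boxes and each output file should be small enough for JOSM's default RAM allocation.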

I imported buildings for Connecticut, and for data manipulation I used the Python packages GeoPandas and Shapely, which allow basic geometric manipulations.

My workflow was something like this:

  1. Use QGIS or JOSM to convert the building data to GeoJSON (or Shapefile) format.
  2. Load it into GeoPandas using the Fiona package.
  3. Use GeoPandas to fix tags and geometry.
  4. Split the dataset into smaller fragments and export them as GeoJSON files.
  5. Upload all files to Google Cloud and create a project in the Tasking Manager pointing to the files stored in Google Cloud.
  6. Use JOSM to work one task at a time, using the Conflation plugin or Replace Geometry to resolve conflicts with existing buildings.
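
Step 3 ("fix tags") might look like this on plain GeoJSON properties. The `RENAME` map here is a made-up example of source-field-to-OSM-tag normalization; with GeoPandas you'd rename/assign columns on the GeoDataFrame instead:

```python
# Hypothetical source-field names; adjust to whatever your dataset actually has.
RENAME = {"Height": "height", "BuildingType": "building"}

def fix_tags(feature):
    """Normalize one GeoJSON feature's properties into OSM-style tags."""
    props = feature.setdefault("properties", {})
    for src, osm_key in RENAME.items():
        if src in props:
            props[osm_key] = props.pop(src)
    # Every imported footprint should at least carry building=*.
    props.setdefault("building", "yes")
    return feature
```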

I don’t think there is a tool that can do the conflation programmatically and preserve the history of the object at the same time. But you could go the route you suggested: first upload all buildings that don’t overlap with existing OSM data, then resolve the conflicts manually in JOSM.

I personally would do this:

  1. Use this Overpass query inside JOSM to download all existing buildings and save them as GeoJson (You will need OpenData plugin if I am not mistaken).
  2. Load them, as well as your polished import dataset into GeoPandas.
  3. Filter out buildings that overlap between the two datasets.
  4. Use upload script to upload the non-conflicting new buildings. (I’ve never used this script, but worst case you can use JOSM)
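
Step 3 (filtering out overlaps) could be sketched like this; it uses bounding boxes as a cheap overlap test, whereas GeoPandas would give you a proper `sjoin`/`overlay`:

```python
def bbox(ring):
    """Axis-aligned bounding box of a vertex ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def non_conflicting(new_rings, existing_rings):
    """Drop any new building whose bbox touches an existing OSM building;
    what's left is the safe-to-upload set, conflicts go to manual review."""
    existing_boxes = [bbox(r) for r in existing_rings]
    return [r for r in new_rings
            if not any(boxes_overlap(bbox(r), eb) for eb in existing_boxes)]
```

Note the bbox test over-flags (two nearby-but-disjoint buildings can have overlapping boxes); that errs on the safe side, pushing borderline cases into the manual JOSM queue.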

Just in case, here are links to the imports and the GitHub code:
import1, import2, Github

Ah thanks! I’ll try it out

Regarding the above Overpass query, would I also need the nodes/metadata to update the tags, or just the way itself and its data?

Edit: Nvm I misunderstood the query.

So I noticed that when I save the file as GeoJSON, it loses the metadata (OSM ID, user, timestamp, etc.). Is this how you edited existing OSM buildings, or did you import the data into Python a different way?

I always modified existing OSM data in JOSM, because I couldn’t find a way to do a lossless conversion between the OSM format and Shapely’s geometry format.

But if you only want to update tags you could make it work with some of the existing packages.
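
For a tag-only update, the key is to leave the way's `id`/`version` attributes intact so the API will accept the change. A minimal sketch on raw OSM XML using the standard library (the tag values here are made up):

```python
import xml.etree.ElementTree as ET

def add_address(osm_xml, way_id, housenumber, street):
    """Add addr:* tags to one way in an OSM XML snippet while leaving the
    id/version attributes untouched, which the API needs for an update."""
    root = ET.fromstring(osm_xml)
    way = root.find(f"way[@id='{way_id}']")
    present = {t.get("k") for t in way.findall("tag")}
    for k, v in (("addr:housenumber", housenumber), ("addr:street", street)):
        if k not in present:  # don't clobber tags a mapper already set
            ET.SubElement(way, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")
```

A real upload would also need the enclosing changeset handling, but the metadata-preservation point is the same.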

And you could use one of the Overpass API packages for selective download of buildings:

I don’t know how compatible their data formats are, so you might need to write your own transformation between Overpass and OSM API data structures.

If there is going to be any modification of existing OSM data I would highly recommend doing so in JOSM as @Mashin suggests. This is what the Denver building import did. @SherbetS had a good suggestion as to how to deal with large amounts of data in JOSM. In addition to losing the necessary metadata, if too much time elapses (e.g. more than an hour) between when the data is extracted from OSM and when you upload changes, you run the risk of conflicts (because someone else has modified the data after it was extracted from OSM).