Salem Oregon Address Import Proposal

Salem Oregon Address Import

My fellow Americans, I am proposing to import the Salem, Oregon Primary Addresses dataset, sourced from City of Salem GIS.

Documentation

This is the wiki page for my import:
https://wiki.openstreetmap.org/wiki/Salem_Oregon_Address_Import
This is the source dataset’s website:

(The data download is available there)

This is a file I have prepared which shows the data after it was translated to OSM schema:
Address/downtown-sample-changeset.osm · main · zyphlar / salem-import · GitLab

License

I have checked that this data is compatible with the ODbL.
This data is distributed under Public Domain (no copyright) as per a screenshotted email from Travis at the City of Salem GIS.

Abstract

  • The dataset contains address POIs for everything in Salem and West Salem proper, excluding Keizer, Hayesville, Middle Grove, and Four Corners. Apartments and suites are tagged separately.
  • It’s 92MB uncompressed.
  • The data has been processed via QGIS to be 24MB. We’ll then import the processed GeoJSON via JOSM.
  • Tags have been translated according to the filter/processing steps in the README.
  • Since these are just address POIs, we’re simply deleting any new conflicts (duplicate addresses) as they’re obviously unneeded.
  • I plan to either do all the work myself, or segment the work and use OSMUS Tasking Manager. It’s only one city with quite clean data, however, so DIY shouldn’t be hard.
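To make the translate-then-drop-conflicts steps above a bit more concrete, here is a minimal Python sketch. It is only an illustration: the actual workflow uses QGIS and JOSM, the real tag mapping is the one in the README, and the source field names (ADD_NUM, STREET, UNIT, ZIP) and sample addresses below are hypothetical placeholders, not the City of Salem schema.

```python
# Hypothetical source field names; the real mapping lives in the import README.
FIELD_TO_TAG = {
    "ADD_NUM": "addr:housenumber",
    "STREET": "addr:street",
    "UNIT": "addr:unit",
    "ZIP": "addr:postcode",
}


def translate(properties):
    """Map one source record's properties to OSM addr:* tags, skipping blanks."""
    tags = {}
    for field, tag in FIELD_TO_TAG.items():
        value = properties.get(field)
        if value is not None and str(value).strip():
            tags[tag] = str(value).strip()
    return tags


def address_key(tags):
    """Case-insensitive key used to decide whether two addresses conflict."""
    return tuple(tags.get(t, "").lower() for t in FIELD_TO_TAG.values())


def drop_conflicts(new_records, existing_tag_sets):
    """Keep only translated records whose address is not already mapped."""
    seen = {address_key(t) for t in existing_tag_sets}
    kept = []
    for properties in new_records:
        tags = translate(properties)
        key = address_key(tags)
        if key not in seen:
            kept.append(tags)
            seen.add(key)  # also drops duplicates within the new data
    return kept


# Made-up sample data for illustration only.
existing = [{"addr:housenumber": "555", "addr:street": "Liberty Street Southeast"}]
incoming = [
    {"ADD_NUM": 555, "STREET": "Liberty Street Southeast", "UNIT": "", "ZIP": None},
    {"ADD_NUM": 100, "STREET": "High Street Northeast", "UNIT": "B", "ZIP": "97301"},
]
result = drop_conflicts(incoming, existing)
```

Here the first incoming record matches an address that already exists and is dropped, while the second is kept with all four tags. Matching on an exact, case-insensitive key is a simplifying assumption; abbreviations or spelling differences in existing data would still need manual review.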

Are you intending to add the address info to the buildings, or as standalone nodes? I couldn’t figure out from the wiki page.

I am not adding them to buildings; these would be standalone nodes. In other imports we discovered that conflating building and address was mostly cosmetic for, say, single family homes, and quickly got into trouble with strip malls and existing buildings. This would just be an import of all address POIs minus any conflicts.

There is some precedent for importing buildings with merged addresses, however the vast majority appear to be KiloCrimson’s Microsoft (MapWithAi?) not-imports so I wouldn’t say there’s important local consensus. Way History: 797144359 | OpenStreetMap

Portland also has lots of conflated addresses and houses, but both of these examples come from building imports, not address imports. As far as I can tell, most buildings in Salem are already imported, so conflating addresses with them decently would introduce tons of delay.

I tried downloading the above file, and it does not appear to have been translated to the OSM schema. Perhaps I am missing something.

My mistake, the .OSM file was right next to that one and I misclicked.

No problem. I will try to take a look.

It isn’t purely cosmetic. At the end of the day, an address in the U.S. is an attribute of something rather than a location in its own right. I suspect (without any local knowledge) that there isn’t anything particularly unique about how Marion County addresses parcels or buildings that would require keeping the address points separate, unlike in, say, Queens, New York.

OSM-based geocoders aren’t currently set up to automatically conflate addresses with buildings when you search for them. It isn’t clear that such automatic conflation would be any more accurate than a more manual one at the data level. But you’re right that there needs to be a lot of care when it comes to strip malls and any other multitenant building. If you aren’t ready to handle those cases, then building conflation would be counterproductive.

Most documented address imports in the U.S. have merged the addresses with buildings – or to put it another way, have tagged buildings with their addresses rather than mapping address delivery points in their own right. However, if there’s a particular goal you have in mind for addresses as points, such as a later conflation step or a later campaign to turn some of the address points into points of interest, then it isn’t wrong to keep them separate either.

If anything it’s just a desire to make the map usable for average people as a Google Maps alternative before the West Antarctic Ice Sheet collapses rendering the entire project moot. Last time I imported all address POIs in The Villages, Florida separately, it took about a day singlehandedly with basically no downsides besides such geocoding edge cases (“I really need to know what kind of object this address is associated with but don’t want to do a complex query!” versus “how do I get to my aunt’s house?”). Last time I conflated addresses and buildings, that import still hasn’t finished resolving the overlapping building part of the task: it’s painful and slow, because buildings are incredibly complex (what happens when the address node is hovering above a garden shed, or a driveway?).

I’d be happy to go in and merge buildings and addresses manually neighborhood-by-neighborhood or even include merging as a step in JOSM during the import, but we have to be honest here: 95% of the beneficiaries of such an action will be tract homes. We had this debate in Santa Rosa, CA with that import, but again, of all the problems to solve on OSM, do we really care about whether a single family home is a rectangle and a node, or just a rectangle? Coverage is far more important than support for nuanced queries on unimportant buildings IMO. We can get an entire city worth of coverage in an afternoon, or we can hand-curate the details of suburbs and take months.


That’s a good point: it’s a good idea to conflate addresses with buildings only after conflating them with parcels. That way you can apply heuristics like “biggest building on the parcel” or “closest building to the street” without running into as many edge cases. If you don’t have parcel data, then forget about conflating with buildings.
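The “biggest building on the parcel” heuristic could be sketched roughly as follows. This is an assumption-laden illustration, not anyone's actual import tooling: it assumes planar (projected) coordinates, simple polygons without holes, and uses the mean of a building's vertices as a cheap stand-in for its centroid. The function names and sample geometry are made up.

```python
def polygon_area(ring):
    """Planar shoelace area of a ring given as [(x, y), ...] (not closed)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def vertex_mean(ring):
    """Average of the vertices; a cheap stand-in for a true centroid."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def point_in_polygon(point, ring):
    """Ray-casting test; points exactly on an edge may go either way."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def biggest_building_on_parcel(parcel, buildings):
    """Return the id of the largest building whose vertex mean lies in the parcel."""
    candidates = [
        (polygon_area(ring), building_id)
        for building_id, ring in buildings.items()
        if point_in_polygon(vertex_mean(ring), parcel)
    ]
    return max(candidates)[1] if candidates else None


# Made-up geometry: a parcel with a house and a shed, plus a neighbor's building.
parcel = [(0, 0), (30, 0), (30, 20), (0, 20)]
buildings = {
    "house": [(5, 5), (17, 5), (17, 15), (5, 15)],
    "shed": [(22, 12), (26, 12), (26, 16), (22, 16)],
    "neighbor": [(35, 5), (50, 5), (50, 18), (35, 18)],
}
chosen = biggest_building_on_parcel(parcel, buildings)
```

In this toy case the house is chosen over the shed, and the larger neighboring building is ignored because it sits outside the parcel. A building that straddles a parcel boundary is assigned by its vertex mean alone, which is exactly the kind of case that still needs a human look.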

And even with it (for Sonoma County) it’s still incredibly error prone and an uphill slog. Buildings are built across parcels, the biggest building is sometimes a barn instead of a house, etc. Ultimately for high quality data like this where you’ve got someone hand-placing unit number nodes inside of high-rises, I think it’s best to start with that and continue reworking things as the micromapping dictates. We can make sure Auntie’s house and address are merged right around the time Amazon maps her twelve foot driveway, her suburban sidewalk, and her lawn :wink:

In the meantime this approach is immediately effective for the highest importance objects like a strip mall or medical office complex, because we don’t immediately get bogged down in the underlying complex building and 10000% more people need to navigate there than Auntie’s house. I used to be a “one object one element” purist but that’s before I spent an entire afternoon just trying to conflate a warehouse district’s buildings. In the end I’ll probably have to go back to Sonoma, query for all the addresses that fell through the cracks, and just create centroid nodes for those anyway. (With manual review, ofc, since it’d be quite dirty.)

I’m a somewhat active and somewhat experienced OSM contributor who also happens to work with the city, and I can try to answer any questions you may have.

Nice! I don’t really have any questions for the city; Travis has been great. My big questions are for wider Oregon, like Marion and Lane counties. AFAICT Lane County tries to copyright their stuff, leaving Eugene high and dry.

The address database is the regional address database for emergency services. I can’t vouch for outside the city, but the roads and addresses are in active use by 911 for Lincoln, Polk, and Marion. In the city, our tech puts the address on the building footprint, usually near the center of the building that is the dwelling where people live (except the weird apartments and high rise condos). We have buildings in our impervious surface layer which lags about a year behind. I can try to check tomorrow about the specifics of placing points, when, how, and where etc. I’m curious myself.

They’ve done a great job! No problems there. But yeah, coverage stops at the city limits. Four Corners, Hayesville and Keizer are blank, so unless those jurisdictions have something special I’d have to go down to the county parcel dataset for that, which is never quite as good (unit numbers, placement, and “oh five parcels are one address / there’s five addresses on one parcel”, etc.).

Oh, ok. Well let me check on the stuff outside the city and see what I can find out.

FWIW, the same person maintains Lincoln, Marion, and Polk, but we only get address updates for 911; we don’t get any as-builts or plans to show us where the address point should be, so we just have taxlot and situs outside the city. If there’s a visible building in the aerial, the point will go on there. However, while we do update the regional 911 database, we probably aren’t going to share outside city limits.

tldr, just stick to the city, get the rest from surrounding counties

Thanks! That’s what I tend to do anyway.

As it’s been 14 days, I’ll begin this import.

How is it going? Did you finish? Any funny gotchas?

99% done! I’m leaving the last apartment complex for a friend who’s interested in maps and OSM to see what an import is like.

Only gotcha was that the JOSM validator was less aggressive at deduplication than I’d like (it seemingly ignores buildings and POIs with addresses by default, and only keys in on standalone address nodes? Which is confusing because I swear I’ve seen a duplicate address warning for plain buildings…) so I made a patch and PR for it to optionally highlight ALL identical addresses in the area even if they’re a Jamba Juice.
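For anyone wanting to run a similar check outside JOSM, the idea of flagging every identical address regardless of object type can be sketched in a few lines of Python. This is not the patch itself and does not use any JOSM API; the element structure (a list of dicts with "id" and "tags") and the sample data are hypothetical.

```python
from collections import defaultdict

# Tags that together identify an address for this comparison (an assumption).
ADDRESS_KEYS = ("addr:housenumber", "addr:street", "addr:unit")


def find_duplicate_addresses(elements):
    """Group elements sharing the same address, whatever else they are tagged as."""
    groups = defaultdict(list)
    for element in elements:
        tags = element["tags"]
        if "addr:housenumber" not in tags or "addr:street" not in tags:
            continue  # not enough to call it an address
        key = tuple(tags.get(k, "").strip().lower() for k in ADDRESS_KEYS)
        groups[key].append(element["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up sample: a standalone address node, a building, and a shop POI
# that all carry the same address, plus one unrelated address.
elements = [
    {"id": "n1", "tags": {"addr:housenumber": "120", "addr:street": "Commercial Street Northeast"}},
    {"id": "w1", "tags": {"building": "retail", "addr:housenumber": "120",
                          "addr:street": "Commercial Street Northeast"}},
    {"id": "n2", "tags": {"amenity": "fast_food", "name": "Jamba",
                          "addr:housenumber": "120",
                          "addr:street": "commercial street northeast"}},
    {"id": "n3", "tags": {"addr:housenumber": "122", "addr:street": "Commercial Street Northeast"}},
]
dupes = find_duplicate_addresses(elements)
```

With this sample, n1, w1, and n2 are reported as one group and n3 is left alone. Whether a shop sharing its building's address counts as a real duplicate is a judgment call, so the output is best treated as a review list rather than a delete list.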