Proposed import of DeKalb County Georgia address points dataset

Hello,

There’s a rather large gap in the Atlanta area for OSM-based routing (through OsmAnd, for example) due to the lack of address data outside of specific points of interest. The Atlanta Regional Commission has an open data portal where various governments, mostly county and local, can upload data such as address points. To start with, I’m proposing to import DeKalb County’s address point set. Assuming this goes well, I plan to propose imports for Fulton and other neighboring counties in the Atlanta area as well. Because not all buildings are necessarily mapped yet, I’m also suggesting importing the addresses as nodes rather than as areas/buildings.

I am admittedly not the most familiar with this process, but based on other proposals, the plan would be to create batches of the data (~400,000 points for DeKalb alone) and ensure the address information lines up with the streets each point is near. Notably, what I’m not seeing in the data is directional street name suffixes like “Southeast”, which would need to be filled in based on the nearby roads. Deduplication against existing POI nodes that already carry address info would also take place. The import would be done through a separate user account to keep things easier to follow and correct.
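
As a rough illustration of the suffix correction described above, here is a minimal Python sketch. It assumes the nearby OSM road names have already been collected for each address point; the function names and the list of directionals are hypothetical and not taken from the dataset or any existing tool.

```python
# Assumption: Atlanta-area streets use these four quadrant suffixes.
DIRECTIONALS = {"Northeast", "Northwest", "Southeast", "Southwest"}


def strip_directional(name):
    """Split a street name into (base name, trailing directional or None)."""
    words = name.split()
    if len(words) > 1 and words[-1] in DIRECTIONALS:
        return " ".join(words[:-1]), words[-1]
    return " ".join(words), None


def match_street(address_street, nearby_road_names):
    """Return the single nearby road name that equals address_street once
    directionals are ignored, or None if there is no match or more than one."""
    target = strip_directional(address_street)[0].casefold()
    matches = {
        road
        for road in nearby_road_names
        if strip_directional(road)[0].casefold() == target
    }
    return matches.pop() if len(matches) == 1 else None
```

Returning None for ambiguous cases (for example, a point near both the Northeast and Southeast segments of the same street) keeps those addresses out of the automatic path so they can be reviewed by hand.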

If anyone, especially in the Atlanta area, would like to help, I’d welcome it. And if someone sees a potential issue with using this data, I would love to understand it.


Unfortunately it looks like this dataset doesn’t indicate its license. Do you know if the commission has indicated the license or copyright status elsewhere?

Do any local governments have building data that you could import at the same time, to avoid the inconvenience of conflating the two datasets after they’ve been imported into OSM?

Do you know if the commission has indicated the license or copyright status elsewhere?

The third section down implies CC-BY-4.0, but I’ll ask just to make sure. I’ll get a response in a few business days.

Do any local governments have building data that you could import at the same time, to avoid the inconvenience of conflating the two datasets after they’ve been imported into OSM?

It would appear so, with DeKalb County (CC-BY-4.0) as the source, though I do foresee some difficulty in aligning the address nodes with the buildings. Based on a few footprints I’m personally familiar with, the dataset draws separate footprints for homes and decks/courtyards/etc., but somewhat inconsistently, and at a cursory glance there is no metadata indicating what each footprint is for. Atlanta does generally have well defined building footprints with a few holes here and there, though I’m not against trying to join the two together. My main question would be whether the standard in address tagging is to use nodes or buildings/areas. Nodes would, in theory, be easier to keep up to date, I would think, but reference tagging should solve the issue regardless, assuming the county doesn’t create a new set of IDs on data updates.
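
To make the reference-tagging idea concrete, here is a minimal sketch of how a later county release could be compared against what was imported, assuming the county keeps its IDs stable. The function name and the shape of the inputs (dicts keyed by the county ID) are hypothetical.

```python
def diff_by_ref(imported, release):
    """Compare two mappings of county ID -> address tags.

    Returns (added, removed, changed) as sorted lists of IDs.
    Assumes IDs are stable between releases, which needs to be verified.
    """
    added = sorted(release.keys() - imported.keys())
    removed = sorted(imported.keys() - release.keys())
    changed = sorted(
        ref
        for ref in imported.keys() & release.keys()
        if imported[ref] != release[ref]
    )
    return added, removed, changed
```

If the county does regenerate IDs on each update, nearly everything would show up as both added and removed, which would itself be a quick signal that the IDs can’t be relied on.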

OK, CC BY is compatible with our license as long as they agree to a waiver.

Either is acceptable, but there are tradeoffs, especially if you’re interested in improving navigation.

Most addresses officially represent parcels, which we don’t map, so points are technically a fine simplification. You’d probably want to find out whether the dataset tends to put the point over the rooftop, at the end of the driveway, at the mailbox, or simply at the centroid of the parcel. Addresses at parcel centroids can cause unpredictable navigation problems, like navigating to the street behind a wooded lot, past a creek or other obstacle, since a parcel can take any shape.

Tagging an address on a building may not be as technically semantically correct, but it’s a pragmatic choice that matches what users usually want when they search for an address. Most imports have attempted to join parcel datasets to building datasets. Otherwise, if the address points aren’t guaranteed to be on the rooftop, then conflation isn’t going to work well anyways.
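
One way to get a feel for where the dataset places its points is to measure what share of them fall inside a building footprint. The sketch below uses a plain ray-casting test on simple rings (no holes) and treats coordinates as planar, which is an approximation. The function names are hypothetical, and a real run would want a spatial index rather than checking every footprint for every point.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if point (x, y) lies inside the ring,
    given as a list of (x, y) vertices without a repeated closing vertex."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rooftop_share(points, footprints):
    """Fraction of points that fall inside at least one footprint ring."""
    if not points:
        return 0.0
    hits = sum(
        1
        for p in points
        if any(point_in_polygon(p, ring) for ring in footprints)
    )
    return hits / len(points)
```

A share close to 1 on a sample would suggest rooftop placement. A much lower share would point toward parcel centroids or driveway ends, and toward trouble for conflation.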


Tagging an address on a building may not be as technically semantically correct, but it’s a pragmatic choice that matches what users usually want when they search for an address. Most imports have attempted to join parcel datasets to building datasets. Otherwise, if the address points aren’t guaranteed to be on the rooftop, then conflation isn’t going to work well anyways.

Looks like the data is split between primary addresses that appear to be on top of buildings for the most part, and secondary addresses to demarcate specific units that can be anywhere nearby but at least sometimes seem to be near the unit. I’ll need to do a bit more research into the dataset to verify that this is the intended meaning.

In any case, this leads me to think secondary address types should be treated as a separate set of batches from primary addresses.

It does appear that much of the downtown area in Fulton County has buildings with addresses, so it makes sense to me to try to join the two datasets for DeKalb County to keep with the same flow. I foresee at least 4 sets of batches from this then:

  1. Primary addresses attached to specific buildings
  2. Primary addresses that can’t be attached to a specific building, which require more QA
  3. Secondary addresses, e.g. apartment units, as nodes
  4. Addresses whose type can’t be determined, for which notes would be created(?)
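
The four batches above could be sketched roughly as follows. This assumes each address record has already been given a `kind` ("primary", "secondary", or None when unknown) and a list of the building IDs it falls inside. Both field names and the batch labels are hypothetical and not taken from the county dataset.

```python
from collections import defaultdict


def assign_batch(kind, building_ids):
    """Pick one of the four proposed batches for a single address record."""
    if kind == "secondary":
        return "secondary_node"
    if kind == "primary":
        # Exactly one containing building is the only unambiguous case.
        if len(building_ids) == 1:
            return "primary_on_building"
        return "primary_needs_qa"
    return "undetermined_note"


def group_batches(records):
    """Group records (dicts with 'id', 'kind', 'building_ids') by batch."""
    batches = defaultdict(list)
    for record in records:
        batch = assign_batch(record["kind"], record["building_ids"])
        batches[batch].append(record["id"])
    return dict(batches)
```

Primary addresses that fall inside zero buildings or inside several are both sent to the QA batch here. Whether those two cases deserve separate handling is an open question.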