Proposal: import all addresses in Cook County, IL, USA

Hi, I’ve been looking through the open data available from the Cook County, Illinois government and found a public-domain mapping of addresses to lat/long. Adding these addresses to OSM would be very helpful, since right now many suburbs have very sparse address nodes, if any.

This wouldn’t be a full automated import. Instead, I’m planning to load address nodes into JOSM one suburb at a time and merge them into the OSM data a few blocks at a time, with each changeset covering a few blocks (the size will depend on how long I can work on it that day), while checking against Bing aerial imagery to make sure everything looks correct.

I’m expecting this project to take around 6-12 months to complete.

Data: https://datacatalog.cookcountyil.gov/GIS-Maps/Cook-County-Address-Points/78yw-iddh/about_data

Data Transformation:

  1. Load the address data into Python, one suburb at a time.
  2. Query OSM Overpass API for all nodes in that suburb.
  3. Remove all addresses from the address data that exist in OSM.
  4. Write the addresses that haven’t been removed to geojson, for import into JOSM.
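A minimal sketch of steps 1-4, assuming the county data has `Lat`/`Long` columns (the actual column names in the export may differ) and matching existing OSM addresses on a normalized (street, housenumber) key. The Overpass query string and area filter are illustrative:

```python
def overpass_query(area_name):
    # Build an Overpass QL query for all address nodes in the named area.
    # (Matching the area by name is an assumption; an area ID is more robust.)
    return (
        '[out:json][timeout:120];'
        f'area[name="{area_name}"]->.a;'
        'node(area.a)["addr:housenumber"];'
        'out body;'
    )

def existing_keys(osm_nodes):
    # Key each existing OSM address by (street, housenumber), case-folded.
    return {
        (n["tags"].get("addr:street", "").lower(),
         n["tags"].get("addr:housenumber", ""))
        for n in osm_nodes
    }

def to_geojson(rows, existing):
    # Keep only county rows whose (street, housenumber) isn't already in OSM,
    # and emit them as a GeoJSON FeatureCollection for JOSM.
    features = []
    for row in rows:
        key = (row["STNAMECOM"].lower(), row["ADDRNOCOM"])
        if key in existing:
            continue
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": [float(row["Long"]), float(row["Lat"])]},
            "properties": {
                "addr:city": row["Inc_Muni"],
                "addr:housenumber": row["ADDRNOCOM"],
                "addr:postcode": row["Post_Code"],
                "addr:street": row["STNAMECOM"].title(),
                "addr:state": "IL",
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

Note that exact string matching on the street name will miss existing OSM addresses whose street names are spelled or expanded differently, so some fuzzier matching will likely be needed in practice.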

Here is how I will map the properties from the government data to OSM tags:

    "addr:city":        input['Inc_Muni'],
    "addr:housenumber": input['ADDRNOCOM'],
    "addr:postcode":    input['Post_Code'],
    "addr:street":      input['STNAMECOM'].title(),
    "addr:state":       'IL',
    "addr:county":      'Cook',
    "addr:country":     'US',

Since I’ll be copying a few blocks at a time, I’ll probably discover other filters I’ll need to apply, or existing nodes I’ll need to flag for modification.

Please let me know if there are any issues, if you need more info, or if you have any suggestions.

Hello! I am excited you’re planning to work on one of the biggest “holes” in the OSM data for most everyday consumers: addresses. I’ve added loads of addresses myself so am happy to answer questions about tactics/strategies/data etc.

Some things off the top of my head. Cook County addresses are already part of the National Address Database. ESRI has periodically taken NAD (done some polishing) and dropped it into a feed for consumers. It is super super easy to pull these down via Rapid, or the MapWithAI plugin if JOSM is your jam. The nice part about going that route is that you get, generally, really clean addr:street name expansions and a few other QA checks. The downside, of course, is that it will lag the county dataset.

I’ve written a lot about how I go about deduping/conflating/importing. You may find some of that helpful. I’d also recommend setting something up on a tasking manager to help keep track of your progress and encourage other folks to join in. I have one for the greater Phoenix area that is going in fits and starts.

General thoughts on address addition from NAD

JOSM and conflation

Secondarily cleaning up TIGER tags

The Phoenix Address project

Small addition… Generally US addresses on OSM don’t include county. There’s somewhat of a split about whether addr:country is helpful. I don’t add it but some folks do.

Yay, addresses! Agree with @watmildon on not including addr:county or addr:country. Since I’m guessing you’ll get widespread agreement in principle to go forward with this, it would be helpful if you did the data transformation so that we could judge how that processed data looks, rather than the raw data, before uploading. A couple things on that:

  • It looks like some addresses have data in the subaddid field, which should be mapped to the addr:unit tag.
  • Some of the STNAMECOM values are not fully expanded, for example I see at least “PL” in a couple spots where it should be “PLACE”. I also see values like “WEST ST JOSEPH AVENUE” or “LAWRENCE ST E”, both of which will need to be expanded further (Saint and Street respectively, hooray for ambiguity). Also, using only str.title() will get you 10Th and 3Rd on ordinal numbers, so you’ll need to handle that. And names like “MCDONALD” will come out “Mcdonald”. Feel free to copy my code for handling some of these cases.
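A sketch of a normalizer covering the cases raised above (the suffix and directional tables are deliberately tiny and would need to cover the full USPS abbreviation list; the ambiguous interior “ST” = Saint case is left as plain “St” for manual review):

```python
import re

# Illustrative subsets; a real run needs the full abbreviation tables.
SUFFIXES = {"PL": "Place", "ST": "Street", "AVE": "Avenue", "RD": "Road",
            "DR": "Drive", "CT": "Court", "BLVD": "Boulevard", "LN": "Lane"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}

def expand_street(raw):
    words = raw.upper().split()
    # The street-type suffix is the last word, or second-to-last if the
    # name ends in a directional (e.g. "LAWRENCE ST E").
    suffix_pos = len(words) - 1
    if len(words) > 1 and words[-1] in DIRECTIONS:
        suffix_pos -= 1
    out = []
    for i, w in enumerate(words):
        if i == suffix_pos and w in SUFFIXES:
            out.append(SUFFIXES[w])                 # ST -> Street, PL -> Place
        elif w in DIRECTIONS and (i == 0 or i == len(words) - 1):
            out.append(DIRECTIONS[w])               # leading/trailing directional
        elif re.fullmatch(r"\d+(?:ST|ND|RD|TH)", w):
            out.append(w[:-2] + w[-2:].lower())     # 10TH -> 10th, not 10Th
        elif w.startswith("MC") and len(w) > 2:
            out.append("Mc" + w[2:].capitalize())   # MCDONALD -> McDonald
        else:
            out.append(w.capitalize())
    return " ".join(out)
```

Anything this heuristic can’t resolve confidently (interior “ST”, unusual surnames) is best dumped to a review list rather than auto-expanded.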

I think most folks will want a little more detail on how you plan to query and match existing data to these new addresses, as I think that will be the most challenging part of this import. You should also make a Wiki page to document your process and potentially help others in the future!

@watmildon thanks for the information! I wasn’t originally going to do the building outlines, but with the Rapid data I’ll do the outlines too. This is my first time doing a major mapping project, so there are a lot of data sources I don’t know about (I’ve only done corrections/additions to local data up till now).

I loaded the NAD data in JOSM. It looks like the NAD address locations tend to be closer to the middle of each property, and sometimes don’t intersect with the building outline:
[image]
While the Cook County open data tends to have the addresses close/over the main building.
[image]
I think this might cause issues when merging the outline with the address.

I really like the process you wrote about with the Conflation plugin, but I need to do a little research/proof of concept to see which method I want to go with.

Also, I didn’t know about the project page. I’ll create a project after we settle on an approach.

@whammo Thanks for the information and the code! I’ll create a GitHub repo, with the process and showing what the data will look like before I start doing anything.

When you say Wiki page, is that something that would go into a Diary, or is there another page (sorry, I’m new to projects. I’ve only used the Wiki for looking up tag information)?

Also, I’ll remove the County tag.

If you’re working from the county data source it’s pretty useful to get a semi-permanent record of it on the wiki. This will help other folks know where the data came from and whatever modifications you made so they could (in theory) verify and replicate your project. An example is this one for Milwaukee.

The conflation plugin is great. Really helps identify the super obviously correct placements vs ones that need more assistance. It’s super common for datasets to have address info set at the parcel center and not building center where we like things. If you get a good rhythm you can make a lot of progress quickly but it’s definitely easier if the building footprints are already on OSM.

For buildings, the county may have a better dataset than those in Rapid which are computer generated.

Hello @chudified,

This sounds like a good project that will benefit OSM.

Could we have access to the fully converted data (in .osm format and with the OSM tagging)? If you have already provided a link, I apologize for asking. In many cases government data isn’t of high quality, and sometimes, despite the best intentions, the data doesn’t get converted in a satisfactory manner (abbreviations should generally be expanded, names should be in title case, etc.).

Hi @tekim, I haven’t posted the data yet, but I’m going to change my approach based on the feedback from watmildon and whammo, so it’ll take me a little time to post data. Is there a platform you typically use to share data?

I love the project and hope there will be similar projects for the rest of the state!

County: the data is already there, and it would be helpful for some use cases (for example, looking up the county for a given address). Keep it?

One request - would it make sense to save info about address import source?
I scanned Addresses - OpenStreetMap Wiki and it does not look like it suggests a field to capture it.

(3) Merging addresses:

  • How are you going to decide which is right when NAD and the existing OSM data disagree?
  • If that’s a manual effort, can you create an OSM project and ask the OSM crowd for help scrubbing data?
  • Is special handling needed for interpolated addresses already in OSM?

Variation on the above: do we need any provisions for future address re-imports? Probably to express something along the lines of: the NAD data was trusted vs. a human manually adjusted something?

Does the license info need to be saved, in case it is ever challenged or becomes more restrictive in the future?

Quality checks:
Are there tools to check that the same (or similar-looking) address is not defined multiple times? I had decent experience using address search in OsmAnd (but that assumes the address data is already in OSM, that OsmAnd has the latest data, and that the search is fairly local).

Again, this is an awesome project, hats off to you!

Thanks!

I’m not sure I understand the benefit of having source information. We’re trying to make the locations as accurate as possible, and the source data also changes periodically. I’m not sure that having it tagged would be actionable.

Here’s the approach I’m working on, which I’ll show you guys when I have some sample data/code (real life has stopped me from spending as much time as I’d like on this).

I’m thinking of this project as a few steps:

  1. Generate a list of existing OSM addresses that need to be fixed (missing information, abbreviations, etc…).
  2. Use the list to fix the addresses in OSM (if this list is huge, this step can be postponed till after step #4). I can start a project for this if others are interested in helping.
  3. Create map data with building outlines and address points for all addresses. Remove any addresses that are already in OSM, and any addresses flagged in #1 with data issues (if we skipped step #2 and an address can’t be matched, then remove all addresses within a certain radius, to be safe).
  4. Import sections of the data created in step #3 into JOSM and merge it into OSM block by block while looking at Aerial Imagery. This way there’s a person looking at all the data, and (hopefully) catching any issues before they make it into OSM. Multiple people can be involved in this step if there’s interest. The key here is to only insert data and not update it to make this an agile, streamlined process.
  5. Find any issues where additional input is needed (any addresses that couldn’t be matched in #1, any matched addresses whose OSM and source locations are far apart, etc…).
  6. Research/update OSM using the information from #5
  7. Repeat
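The “remove everything within a radius, to be safe” rule from step #3 could look like this sketch (the 100 m radius and the field names are placeholders, not a decided value):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * asin(sqrt(a))

def exclude_near_problems(candidates, problem_nodes, radius_m=100):
    # Drop any candidate county address within radius_m of an unmatched or
    # problematic OSM address, so unresolved conflicts never get imported
    # automatically and are left for the manual-review pass (step #5).
    return [
        c for c in candidates
        if all(haversine_m(c["lat"], c["lon"], p["lat"], p["lon"]) > radius_m
               for p in problem_nodes)
    ]
```

Erring on the side of excluding too much here is cheap, since anything dropped just comes back around on the next pass of step #7.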

I’m working on #1 and #3 right now. My goal is to have step #4 done as soon as possible (I want to use Organic Maps for offline navigation! :slight_smile: ) If step #2 doesn’t take too long then it might speed up step #3.

I still need to add the data source information to the wiki.
