I got confirmation over email that this dataset is in the public domain. It contains the addresses, street names, apartment numbers (where applicable), and zip codes of buildings within and slightly outside of Williamson County, Texas. The only warning is a data accuracy advisory on the website I’ve linked.
I’ve got a general plan laid out so far:
Automated imports could be possible, but only as far as matching street names to buildings. Individual house numbers may be problematic if imported blindly.
This dataset contains 388K+ entries, which will take forever to import. I was primarily thinking about importing individual neighborhoods rather than everything.
Dataset is official and even contains records of houses that are yet to be built.
I still need to find a way to verify that what I’m importing matches what is actually there. I would use Bing Streetside, but the captures are low enough in quality that the addresses are either barely visible or completely unintelligible. I would use OpenAddresses, but I get parsing errors whenever I try to load it into editors, and none of the other download options seem to work.
I’m open to criticism/questions regarding this, as I’m still fairly new and this is my first import proposal. But I’m unsure if I’ll go forward with this, because I can’t find anything reliable (that isn’t copyrighted) to cross-check the addresses and ensure that they are correct.
Yes, it is a good idea to split it up. You might consider doing so by street+city as this can make it easy to spot some types of errors, such as if all the addresses fall along the given street except for one that is miles away. Also, sometimes there are slight differences in how a street name is spelled in the addresses and how it is in OSM, and this would allow you to research the proper way to proceed and deal with all impacted addresses at once.
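To make the street+city idea concrete, here is a rough sketch of how the converted data could be grouped and scanned for an address that sits miles away from the rest of its street. The field names (`street`, `city`, `lat`, `lon`) and the 2 km threshold are my assumptions, not anything taken from the county's dataset, so adjust them to whatever your converted records actually look like.

```python
import math
from collections import defaultdict
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def group_by_street_city(records):
    """Bucket address records by (street, city), ignoring case and padding."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["street"].strip().lower(), rec["city"].strip().lower())
        groups[key].append(rec)
    return dict(groups)


def find_outliers(group, max_km=2.0):
    """Return records farther than max_km from the group's median position."""
    mlat = median(r["lat"] for r in group)
    mlon = median(r["lon"] for r in group)
    return [
        r for r in group
        if haversine_km(r["lat"], r["lon"], mlat, mlon) > max_km
    ]
```

Anything this flags still needs a human look: a long rural road can legitimately have addresses several kilometres apart, so treat the threshold as a starting point rather than a rule.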
Just because the data is official doesn’t mean that it is without error. Once you convert the data I would perform some sort of sanity checks on it, both to check the original data and to make sure something didn’t go wrong with the conversion. Short of an in-person visit I don’t think there is any way to “cross check” the addresses. You are looking for things that are obviously wrong: multiple addresses in the exact same location, for example.
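As one example of such a sanity check, a minimal sketch for finding multiple addresses at the exact same location might look like the following. It assumes each converted record is a dict with `lat` and `lon` keys (hypothetical names), and the rounding precision is an arbitrary choice on my part.

```python
from collections import defaultdict


def find_stacked_addresses(records, precision=7):
    """Return {(lat, lon): [records]} for every location used more than once.

    Coordinates are rounded to `precision` decimal places so that tiny
    floating-point differences from the conversion do not hide duplicates.
    """
    by_location = defaultdict(list)
    for rec in records:
        key = (round(rec["lat"], precision), round(rec["lon"], precision))
        by_location[key].append(rec)
    return {loc: recs for loc, recs in by_location.items() if len(recs) > 1}
```

Since this dataset includes apartment numbers, some stacked points will be legitimate units in one building. The ones worth a closer look are stacks where the house numbers or street names differ.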
I am very aware of this, and can confirm ghost houses do exist in this dataset.
I will hand edit these into the map, as I don’t have the mapping experience (<30 mapping days) or the knowledge level to do automatic imports. Converting the data is relatively easy though, and the toughest part would be navigating between editors to get info.
Thankfully these kinds of errors are obvious at a glance on the map.
This node reports “120 Diorite Drive”, but the street only goes up to 116 Diorite Drive. It is also outside the bounds of any recognized house area.
If you are “hand editing” these one by one then this probably isn’t an import, and if so, you don’t need community approval. However, your approach sounds slow, tedious, and error prone.
I have assisted with data conversion for several address imports, and it is not “relatively easy.”
I would contact the GIS folks that manage this data and see if they have an explanation. This could be a legit address for some utility infrastructure, and as such, may be appropriate for inclusion in OSM. Addresses are not just for buildings.
Not always. For example, are you going to notice if the postal code for one house in the middle of a block is different from that of all of the nearby houses? Will you notice if a totally bogus postal code is used? What about two addresses at exactly the same location (yes, if you are truly doing these by hand one by one)? What about cases where the postal code doesn’t match the city in the address?
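These are the sort of checks that are easy to script and hard to do by eye. Here is a rough sketch, assuming records are dicts with `street`, `city`, and `postcode` keys (hypothetical names) and that postal codes should be plain 5-digit ZIP codes. It has no authoritative ZIP-to-city table behind it; it only flags values that disagree with the majority of their neighbours in the same data.

```python
import re
from collections import Counter, defaultdict


def malformed_postcodes(records):
    """Return records whose postcode is not exactly five digits."""
    return [
        r for r in records
        if not re.fullmatch(r"\d{5}", str(r["postcode"]).strip())
    ]


def minority_values(records, group_fields, value_field):
    """Group records by group_fields and return those whose value_field
    differs from the most common value in their group."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[f] for f in group_fields)].append(r)
    flagged = []
    for recs in groups.values():
        counts = Counter(r[value_field] for r in recs)
        if len(counts) < 2:
            continue  # everyone in the group agrees
        majority, _ = counts.most_common(1)[0]
        flagged.extend(r for r in recs if r[value_field] != majority)
    return flagged
```

Calling `minority_values(records, ("street", "city"), "postcode")` picks out the one house whose postal code differs from the rest of its street, and `minority_values(records, ("postcode",), "city")` picks out addresses whose city disagrees with most others sharing that postal code. A street that really does cross a ZIP boundary will show up too, so the output is a review list, not a list of errors.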
One thing I would recommend is using the mapwithai JOSM plugin. It will flag cases where the street name in the address doesn’t match the name of any nearby street.
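If you want a similar check outside of JOSM, something along these lines could work on the converted data. To be clear, this is not how the plugin does it; it is just a sketch using Python's standard `difflib`, and it assumes you already have a plain list of street names extracted from OSM for the area (how you get that list is up to you). The function name and the 0.8 cutoff are my own choices.

```python
import difflib


def unmatched_streets(address_streets, osm_street_names, cutoff=0.8):
    """Map each address street name with no exact OSM match (ignoring case)
    to its closest OSM street name, or None if nothing is similar enough."""
    osm_by_key = {name.strip().lower(): name for name in osm_street_names}
    report = {}
    for street in sorted(set(address_streets)):
        key = street.strip().lower()
        if key in osm_by_key:
            continue  # exact match, nothing to review
        close = difflib.get_close_matches(key, list(osm_by_key), n=1, cutoff=cutoff)
        report[street] = osm_by_key[close[0]] if close else None
    return report
```

A result like `"Diorite Dr"` mapped to `"Diorite Drive"` suggests an abbreviation to expand for every address on that street at once, while a `None` means the street may be missing from OSM or badly misspelled in one of the two sources.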
Do addresses always show up in Bing street imagery? The OP stated that they often do not (“but the captures are just low quality enough to either be barely visible or completely unintelligible”). If the address is visible and readable, is the complete address posted, or just the house number? It is not always obvious or logical what the street in the address should be, nor what the city or postal code should be.