Following in the footsteps of @pmfox’s Rockbridge County, Virginia Address Import, I am proposing a very similar import for an adjacent area. This import uses the same state dataset as Peter’s, and the same scripts by @tekim to process the data (thanks, Mike, they worked great!).
While the scope of this proposed import is three counties instead of one, it will be carried out by a team of local Lynchburg mappers (myself included) using a shared OSM account created for this task. I have also split the data using QGIS into 400 batches of roughly 2-5 square miles each. The median batch size is 115 addresses, though the largest batch has 1,783 because density varies. This size should allow each batch to be processed quickly, especially in rural areas where density and data conflicts are at a minimum.
The processed data can be found in this Google Drive folder, including a map for the 400 batches and their approximate size.
Here is the Wiki page. Let me know your feedback, and feel free to refer to the previous Rockbridge import for previously discussed issues and methodology; I have relied heavily on that thread for this import.
You are very welcome! I am glad they are being used and are of value. If you need any tweaks, please let me know. Like @pmfox’s project, this seems like a worthwhile endeavor.
This sounds like a good plan vs. an “all at once” approach.
It might be better if everyone had their own import account, so that if there are questions later the specific mapper is notified of any changeset comments. But I am not sure…
I’m not sure what the official standards are, but at most it would only be three of us. The DWG representative who approved the increased rate limit appeared to be fine with it!
As it has been two weeks without disapproval, I am beginning the import. I have completed 8 batches as a test, three of them large (>600 addresses). One issue from the Python script arose when a road name ends in “Extension”: “Forest St Ext” became “Forest Saint Ext”. The data is already processed, and this situation is rare enough to handle manually. The workflow using ColoredStreets and MapWithAI’s validator was enough to catch it.
The largest challenge is where addresses have already been mapped. Significant progress by a local mapper in Altavista required a delicate touch, and I erred on the side of trusting the existing data in areas where a ground survey was clearly completed. For the town of Bedford, it appears someone has added many buildings and addresses using MapWithAI, and the address data matches up 1:1 with my import. In this case, conflation returns almost no conflicts.
The majority of the working region is unmapped and most batches have not needed conflation, just QA validation on road names and spatial location.
Great to hear that you are making progress and thanks for the detailed report. Was it one of my scripts that messed up “Forest St Ext”? I can try and fix it for the next Virginia address import.
Yes, it looks like there needs to be some handling for multiple suffixes when the name ends in EXT. The script didn’t seem to treat “ST” as a suffix since it wasn’t at the end of the string. Should be pretty simple to implement. Thanks!
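For anyone hitting the same case, here is a minimal sketch of the idea described above. It is not the actual script from this thread: the function name, the lookup tables, and the abbreviation lists are all hypothetical and deliberately tiny. It peels off a trailing modifier such as EXT first, then treats whatever token is now last as the suffix candidate, so “ST” is only read as “Saint” when it is not in suffix position.

```python
# Hypothetical lookup tables; a real script would need far fuller lists.
SUFFIXES = {"ST": "Street", "RD": "Road", "AVE": "Avenue", "DR": "Drive", "LN": "Lane"}
TRAILING_MODIFIERS = {"EXT": "Extension"}
NON_SUFFIX_WORDS = {"ST": "Saint"}  # meaning of ST when it is not the suffix


def expand_street_name(name):
    """Expand abbreviations, allowing a suffix followed by a trailing modifier."""
    tokens = name.upper().replace(".", "").split()
    tail = []
    # Step 1: peel off a trailing modifier such as EXT, if present.
    if len(tokens) > 1 and tokens[-1] in TRAILING_MODIFIERS:
        tail.insert(0, TRAILING_MODIFIERS[tokens.pop()])
    # Step 2: the token that is now last is the suffix candidate.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tail.insert(0, SUFFIXES[tokens.pop()])
    # Step 3: remaining tokens are the street's base name.
    body = [NON_SUFFIX_WORDS.get(t, t.title()) for t in tokens]
    return " ".join(body + tail)


print(expand_street_name("Forest St Ext"))
print(expand_street_name("St Johns Rd"))
```

With these tables, “Forest St Ext” comes out as “Forest Street Extension”, while a leading “St” as in “St Johns Rd” is still expanded to “Saint”.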
As of today, November 7th, the import is complete except for clean-up. All 400 batches have been imported into OSM.
Some conclusions based on working with the data:
Campbell and Bedford counties’ data were the easiest to work with, and node location was nearly always dead-on against VGIN’s aerial imagery.
Amherst County had less care put into the data, and frequently the nodes needed to be aligned to the closest house on the parcel. I’ve done my best to catch as many as I can, but I can guarantee there are plenty of misaligned nodes still. Unfortunately I have no way to reliably find these, as the state-wide building footprint dataset for Amherst is of poor quality.
Many improvements to the road network were made. TIGER had a variety of missing, incorrectly placed, and misspelled names, which were corrected using the address data as the primary source. If there was ambiguity, the name in question was moved to alt_name. Geometry changes were kept to a minimum unless a road was significantly misaligned.
To wrap up the import, a few clean-up tasks remain:
clean up unit numbers where multiple units are listed into ranges if possible
find nodes such as churches in close proximity to address nodes and merge them. Early in the import I was conflating only against existing addresses; only later did I realize I could also conflate against common amenities (like GNIS-added churches), which saves having to scan for them visually
use the Geofabrik OSM Inspector to identify and correct any other issues
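The unit-number clean-up in the first task could be sketched roughly as below. This is only an illustration, not the workflow actually used: the function name is hypothetical, and joining the results with semicolons is an assumption about how the values would be written in addr:unit. Consecutive numeric units are collapsed into ranges; non-numeric units such as “A” are left as-is.

```python
def collapse_units(units):
    """Collapse consecutive numeric unit labels into ranges (hypothetical helper).

    Assumption: leading zeros are not significant, so "01" is treated as 1.
    """
    numeric = sorted({int(u) for u in units if u.isdecimal()})
    other = sorted({u for u in units if not u.isdecimal()})
    parts = []
    i = 0
    while i < len(numeric):
        start = end = numeric[i]
        # Extend the run while the next number is exactly one higher.
        while i + 1 < len(numeric) and numeric[i + 1] == end + 1:
            i += 1
            end = numeric[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    # Assumption: multiple values are separated with semicolons.
    return ";".join(parts + other)


print(collapse_units(["1", "2", "3", "5"]))
```

Any collapsed value would still need a manual check, since a gap in the numbering may be a real missing unit or simply missing data.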
Finally, I will re-use this thread and Wiki page when an address import is planned for Appomattox County, the fourth county in the Lynchburg MSA which was omitted from this first run due to its lower population and non-adjacency to Lynchburg itself. Thanks again to those who helped out.