Proposed Import of Rockbridge County, VA address points


I am planning the import of address points in Rockbridge County, VA, and I seek community approval for this import.

Wiki page: Rockbridge County VA Address Import

I won’t duplicate the details here from the wiki, but to summarize, the goal of this import is to improve OSM so it can be used reliably for routing within the area. While the focus of the import will be address points, road names will be corrected or added as needed.

I welcome comments.


First, thanks for following the rules: consulting the community, making a wiki page, and adding your proposal to the import catalog.

Since the intention is to be capable of providing accurate address data to first responders, preference is given to leaving an address point rather than removing it when it is uncertain whether an addressable object exists on the ground.

If you are not reasonably certain that an address exists at the indicated location I would not add it to OSM. If you detect such addresses you might want to contact the appropriate authority and ask for clarification.

If I understand correctly, you are breaking the file into parts by zip code. Those might contain a large number of addresses (assuming five digit zip codes), and if any manual review is needed it may take a long time to process each file properly. Have you considered breaking down by street + zip?

Are you intending to move each address point to be on top of the “main building” associated with the address as it appears in ortho imagery? Many times address points are placed at the center of the corresponding parcel, which could be some distance from the “main building” or where the main activity takes place. This is particularly true in rural residential areas where the lot sizes are large. It seems that if you are going to conflate existing buildings that do not already have addresses with the imported address points you will have to do this.

Address Coordinates are using EPSG:3968

I don’t think you can tell a CSV file what projection the coordinates are in, so JOSM is going to assume WGS84 — but I haven’t tested this. You might want to use QGIS to re-project the data.

Can you provide us a copy of the data after it has been converted to the final format before import?

The data from the state only has FULLADDR. How do you plan on parsing into addr:housenumber, addr:street, etc?

I am not seeing zip codes in the shapefile I just downloaded now (unless it is somehow encoded in SITEADDID).

The data is in all upper case. How are you planning to convert it to the proper case for OSM?


One more thought: you should expand abbreviations, e.g. RD → Road.

I know there are different schools of thought on it, but I would get rid of the addr:state tag since it is fairly redundant and easy to calculate for applications that need it. I would also add: be careful with the Mc- and Mac- street names, which can’t simply be title-cased from the upper case the data appears to come in. That’s pretty easy to fix with a regex or something similar.
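A minimal sketch of that regex idea in Python, not anyone's actual import script. It assumes the names arrive fully upper-case, one street name per string. The `MAC_NAMES` lookup is a hypothetical hand-reviewed list, since "Mac" cannot be told apart by pattern alone (compare MacArthur with Macon).

```python
import re

# Hypothetical review list: "Mac" names cannot be detected reliably by pattern
# (compare MacArthur with Macon or Mackey), so they are listed explicitly.
MAC_NAMES = {"Macarthur": "MacArthur", "Macdonald": "MacDonald"}

# "Mc" followed by a lowercase letter at the start of a word.
_MC_PREFIX = re.compile(r"^Mc([a-z])")


def fix_word(word: str) -> str:
    """Title-case one word, restoring the inner capital of Mc/Mac names."""
    titled = word.capitalize()
    if titled in MAC_NAMES:
        return MAC_NAMES[titled]
    return _MC_PREFIX.sub(lambda m: "Mc" + m.group(1).upper(), titled)


def title_case_street(name: str) -> str:
    """Convert an upper-case street name to title case, word by word."""
    return " ".join(fix_word(w) for w in name.split())
```

Under these assumptions, `title_case_street("MCCORMICK FARM RD")` gives `McCormick Farm Rd`, while `MACON ST` is left as `Macon St`. Names with apostrophes or hyphens are not handled and would still need a manual look.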

The address data around Rockbridge is also available on the ESRI National Address Database feed. You may consider using that instead as it will already have name expansion etc taken care of for you.

As your goal is to have something accurate for first responders, you’ll very much want to do something with a bit more heavy review. Moving addresses to buildings (instead of driveways, parcel centroids, etc.) will almost certainly be doing them a solid. We’re about 70% done with the greater Phoenix metro using that data source and moving things as necessary. I estimate my average rate is ~3,000-5,000 addresses an hour. You can see the project here: OSMUS Tasking Manager

I am very happy to create a project on the OSMUS task manager if you think that is helpful.

I strongly recommend installing the MapWithAI plugin as it has a wonderful address validator rule: “addr:street name not found nearby”. It’s incredibly good at catching typos/errors in both the current roadway data AND the address data source. A fantastic cross check.

I like having addr:state but won’t fight about it. But please don’t add addr:country! :grin:

First, thank you for the detailed response.

‘Reasonably certain’ to me means a structure visible on VBMP imagery, or a location in an area of new housing development where a structure may not yet be seen on imagery. Sometimes the structure is what appears to be an old abandoned farmhouse standing by itself. I do not include addresses that are sometimes present for a lot of empty land, usually an open field of farmland. In areas of new housing developments, where it is obvious from aerial imagery that construction is active, I include addresses even if the structure is not present on imagery. If this is questionable, I’m open to discussion.

I am breaking it up by zip code. This is sometimes a large set of data; what I have done previously is split the CSV file into files of 500 addresses each.
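For anyone wanting to script the 500-per-file split, here is a rough sketch using only the Python standard library. The output file names (`part_001.csv` and so on) are made up for illustration, and it assumes the source CSV has a single header row.

```python
import csv
from itertools import islice


def chunked(rows, size=500):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def split_csv(src_path, size=500, prefix="part"):
    """Write part_001.csv, part_002.csv, ... each repeating the header row."""
    written = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for number, chunk in enumerate(chunked(reader, size), start=1):
            out_path = f"{prefix}_{number:03d}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(out_path)
    return written
```

Rows are split in file order, so sorting the CSV by street beforehand would keep each street's addresses in the same or neighboring files.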

I place the address nodes on what seems to be the main structure or object (tower site, water tower, residence). If a structure or POI exists, the address is conflated with the pre-existing object.

When I add the CSV as a layer in JOSM, it sometimes asks me to define the projection used.

Yes, I can provide a Dropbox link for the CSV files. (I am still working on the road names.)

The data from the state does provide all the addr:* tags. It seems the shapefile downloadable from VGIN is missing that data for some unknown reason. If you download the text file, all the data is there. A side note: I submitted the VGIN database to ESRI, who kindly added it as a feature layer, but unfortunately it is missing the addr:postcode and addr:city tags, presumably because of this problem. Last year, the downloadable shapefile included all these tags.

I imported the text file into QGIS. I’m not the best with the software, and I’m sure there is a more efficient way, but I added a second layer with the county boundary. Then, I extracted the address points within that boundary (as opposed to splitting the entire state by county, which takes forever). I then exported the entire county as a CSV file.

I parse all the text using Excel: expanding N → North and using the PROPER function to capitalize road names. (I manually edited the McM* and MacM* type names to add the mid-word capital letter.) I concatenate the column of RD, ST type abbreviations with ‘!’. Then I use the find and replace function to find each ‘!’-marked abbreviation and replace it, for example ST! → Street, TPKE! → Turnpike, etc. Finally, I concatenate the columns containing the street prefix (North, East, etc.), street name, and street type.
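The same steps could be scripted roughly as below. This is only a sketch: the two lookup tables are small, hypothetical samples rather than a complete list, and the three arguments stand in for whatever the prefix, name, and type columns are actually called in the export. Because each column is looked up on its own, the ‘!’ marker is not needed to keep ST (Street) from matching inside a name like ST CLAIR.

```python
# Partial, illustrative lookup tables; a real import would need fuller lists.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {
    "RD": "Road",
    "ST": "Street",
    "LN": "Lane",
    "DR": "Drive",
    "TPKE": "Turnpike",
}


def build_street(prefix: str, name: str, street_type: str) -> str:
    """Assemble an addr:street value from three upper-case source columns."""
    prefix = prefix.strip()
    street_type = street_type.strip()
    parts = [
        # Unknown values fall through title-cased so they can be spotted later.
        DIRECTIONS.get(prefix.upper(), prefix.title()),
        name.strip().title(),
        STREET_TYPES.get(street_type.upper(), street_type.title()),
    ]
    # Skip empty columns so no double spaces appear in the result.
    return " ".join(part for part in parts if part)
```

For example, `build_street("N", "MAIN", "ST")` returns `North Main Street`. Mc/Mac names and other special capitalization are not handled here and would still need the manual pass described above.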

Then, using filters, I filter by postcode and copy and paste those results into a blank Excel file, which is saved as CSV for import into JOSM.
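The filter-and-copy step could also be done in one pass with a short script. The sketch below assumes the rows have been read with `csv.DictReader` and that the postcode column is named `addr:postcode`; both the column name and the `<postcode>.csv` output naming are assumptions for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_by_postcode(rows, key="addr:postcode"):
    """Group row dicts by postcode; rows with no postcode go under 'unknown'."""
    groups = defaultdict(list)
    for row in rows:
        postcode = (row.get(key) or "").strip() or "unknown"
        groups[postcode].append(row)
    return dict(groups)


def write_groups(groups, out_dir="."):
    """Write one <postcode>.csv per group, each with its own header row."""
    out_dir = Path(out_dir)
    for postcode, rows in groups.items():
        path = out_dir / f"{postcode}.csv"
        with open(path, "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
```

Collecting rows with a blank postcode under `unknown` makes them easy to review separately instead of silently dropping them.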


I have been including addr:state, but what you say makes sense, I’ll remove it. Should the addr:county tag be kept?

Unfortunately, on my last import of Augusta County, VA, I messed up the Mc and Mac road names, and they do not contain the second capital letter. Maybe I can figure out an Overpass query to fix it. I’m not skilled in regex or Overpass, but after several hours yesterday I did figure out a query to find any address not within 80 meters of a way, and made that a MapRoulette challenge.

I wonder if this forum has a spell check feature I can turn on?

My understanding of the ESRI NAD database is that it is a USDOT resource. I think, but have not compared, that the VGIN address points are more accurate, as each locality (which maintains its own GIS system) submits quarterly updates to VGIN, so it is regularly updated.

I do have the MapWithAI plugin in JOSM, and the street name validation is very useful for these imports. I end up adding a layer in JOSM with the VGIN road centerlines, and refer to that for accurate road data. Quite frequently, a road name is misspelled from the TIGER imports, or a small lane is missing.

I see you like the addr:state tag. I previously commented here that I would remove it. I would be interested to hear the arguments (both sides) on whether or not to include it. I presume data consumers would like it included, while map users find it extra.

Thank you for the OSMUS tasking manager offer! Let me think on this some. It may serve a better purpose for adding driveways for houses that are not visible from the road, which is easy to do with VBMP imagery by anyone in JOSM or RapID. I would want to review the USDOT data, comparing to VGIN, before saying go on that. I’ve messaged ESRI about the missing data in the VGIN layer they’ve added, but I don’t think it’s been acted on as of yet.


Additionally, since there are opinions both ways in this discussion of the addr:state tag, I would like to learn both arguments. Would you mind explaining or pointing me to a discussion on this topic?

I will check the text file out. Interesting that the shapefile doesn’t contain the same information.

I would highly recommend using a script to do this so that it is repeatable. For example, suppose you go through all of these steps, but discover you need to make some adjustments and start over; you now run the risk of missing one of these many steps. Others have done address imports and published their scripts. Check the wiki.

I would include addr:state. It is a standard part of an address in the US.

As for addr:county: no, this is not a standard part of a US address.

If the addressing authority assigns addresses to vacant lots, I think it is ok to include them. What I would not include are addresses that do not make any sense and are likely the result of errors.

500 seems like a large number given that you will be moving each node to be on top of the main object in overhead imagery, but if it works for you I guess it is fine. Are all addresses in the 500 geographically clustered? Working area by area is useful as inconsistencies stand out, such as if you have an address associated with a street that is far away from that area (although QA tools should spot those as well). I still think working street by street is a good approach.
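One way to surface that kind of inconsistency before opening JOSM is sketched below, as an illustration rather than a replacement for the QA tools. It assumes the coordinates are already WGS84 latitude and longitude. The column names (`postcode`, `street`, `lat`, `lon`) and the 5 km threshold are hypothetical, and rural roads can legitimately run longer than that. Each address is compared with the median position of all addresses sharing its street and postcode.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def far_from_street(rows, max_m=5000.0):
    """Return rows lying more than max_m from their street's median position."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["postcode"], row["street"])].append(row)

    flagged = []
    for members in groups.values():
        mid_lat = median(float(r["lat"]) for r in members)
        mid_lon = median(float(r["lon"]) for r in members)
        for r in members:
            distance = haversine_m(float(r["lat"]), float(r["lon"]), mid_lat, mid_lon)
            if distance > max_m:
                flagged.append(r)
    return flagged
```

A street with a single address is never flagged, because its median is the point itself, so this only helps where several addresses share a street name.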


What you’re saying about a script sounds very nice. I enjoy conflating addresses and correcting position in JOSM, but the data processing is, well, painstaking.

I’ve done some brief searching, and I’ll study some more. I’m looking through the Import Tools page. If you have any recommendations, I would find that very helpful.

I will look more later, but if you go to the import catalog and search for address it should lead you to some examples. If you need help modifying the code I could potentially help.

As @tekim notes, addr:state is part of the standard US postal address so I tend to include it. It’s also how most local folks would expect their address to look so there’s a consistency argument.

The contrast to this is the “is_in” tag which is mostly a relic from when reverse geo-coding wasn’t as robust as it is today. It’s much easier to ask “what city/county/state is this node in” than it was 10 years ago.

Here’s an Overpass query that can help patch up Augusta County. Feel free to play with it and let me know if you have any questions.

In the US, when addressing a letter to be sent to another address in the US, it is formatted:

123 Main St
Auburn, AL 36830

That is my understanding of why addr:state is preserved in the US. I’ve never written USA on a domestic letter in my life, and I never put addr:country on addresses in the US.


I don’t feel super strongly and the others say keep it, so that’s fine. Maybe the “leave it off” school of thought is a school of one! My thinking is that it’s just tag clutter; the addr:state tag is just as geocode-able as addr:country.

Thank you all for all the help so far. @watmildon Thanks for the query, I fixed a few already.

@tekim I’ve found someone’s code on GitHub, although I have no idea how to implement it. I found ogr2osm, and the OSM wiki page for it says “Using a mechanism known as ‘translation’, the script also allows the conversion of source tags and meta-data into properly formatted OpenStreetMap tags”, which sounds like it might be helpful to me.

Which led me down the path of installing Python, attempting to install ogr2osm (failed because I didn’t have GDAL), attempting to install GDAL (failed because I didn’t have C++), and now installing C++ with Visual Studio. We will see how I get along.


I don’t feel strongly either regarding keeping it. It is certainly not “a hill to die on.” There might be some edge cases where addr:state doesn’t match physical state, but they would be very rare if they happen at all.


I feel your pain. Unfortunately (in regards to this situation), I only have Linux, so I can’t provide any guidance as to how to get all of that running on Windows.

ogr2ogr is a great tool, and could be used here I think, but since you are just working with csv files, you probably only need Python. I will try and put something together, and then you can expand it with your logic which you have already outlined for us.


Ok, I’ve got ogr2osm running, now trying to figure out translation files.

I do have a shapefile, that I created with QGIS from the downloaded text file of address data points.

I guess I can use whichever file type is easier to process.

There are a few instances of this, in an area I’m not working with right now. Near Bergton, VA, are some houses located in West Virginia, but their mailboxes are some distance from the house in VA. So these West Virginia properties have a VA address. Definitely a rarity.