Proposed Import of Rockbridge County, VA address points

Ran the script on my three counties of interest, some things of note:

  • My counties have significantly more addresses (this makes sense, Rockbridge is significantly more rural than the Lynchburg suburb counties). Amherst at 16,314, Bedford at 42,429, and Campbell at 38,130 addresses. JOSM is still able to load this number of nodes so no issue there.
  • Running the script on Bedford County caused an error because street_suffixes was not defined. I just copied the table from street_prefixes and it ran without issue.
  • In Amherst County, H* is used as an abbreviation for Highway. Similarly E* for Extension and T* for Trail. I assume these are edge cases but should be pretty easy to write in. EDIT: there are more of these letter* abbreviations than I thought. B* for Bend, P* for Plaza, R* for Row AND for Ridge… augh. I’ll have to see if there’s a lookup table for this. However, some of the letters are recycled and ambiguous, so there’s no good solution except for cross-referencing other data sources.

Otherwise it looks great, hacked together as it may be. I don’t have a lot of Python experience but I can easily modify it for my counties. I will still wait for the Rockbridge import to be done as a trial.

There was consensus earlier in this discussion to remove the addr:county=* tag. But what about addresses that have addr:city=Waynesboro, some being within the city limits, and others in Augusta County? They all use the same PO, but would vote at different places, use different schools, etc. The same goes for Staunton, and, I think, Lexington in Rockbridge County.

This might be a community question: should this be A) disregarded as unimportant (one can determine whether an address is in the county or the city from the boundaries), or B) selectively include the addr:county=* when addr:city=* is split by an administrative (or might be magisterial, idk) boundary?

edit: I’m finding some tags like addr:city=Waynesboro;Augusta County and addr:city=Lyndhurst;Augusta County. I’m certain this is not the correct way (all Lyndhurst addresses are in Augusta County anyway), and am removing the “;Augusta County”. I’m not sure how that came to be, but I think it may have been a bad edit on my part in the past.
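For anyone who wants to script that cleanup rather than do it by hand, here is a minimal sketch. The function name and the idea of passing in a set of county names are my own assumptions for illustration; it only works on a plain dict of tags and does not touch OSM itself.

```python
def strip_county_from_city(tags, county_names):
    """Return a copy of `tags` with county names removed from addr:city.

    `tags` is a plain dict of OSM tags; `county_names` is a set of strings
    such as {"Augusta County"}. Both are assumptions made for this sketch.
    """
    cleaned = dict(tags)
    city = tags.get("addr:city")
    if city is None or ";" not in city:
        return cleaned  # nothing to clean

    parts = [part.strip() for part in city.split(";")]
    kept = [part for part in parts if part not in county_names]
    if kept:
        # Only rewrite the tag when something other than the county remains.
        cleaned["addr:city"] = ";".join(kept)
    return cleaned


example = {"addr:city": "Waynesboro;Augusta County", "addr:housenumber": "29"}
print(strip_county_from_city(example, {"Augusta County"}))
```

Any value that still contains a “;” after this would need a manual look.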

I am of the opinion addr:county should only be used if the addr:city field does not exist.

Virginia is a special case in the USA as legally our independent cities are not part of any county. But for addressing, I do not believe it is any different than a point being inside/outside a city boundary within other states. I know the USPS addressing likes to bleed across city limit lines there, too.

Glad things are going well. I have been off the grid most of the day.

addr:city=ABC does not mean the address is actually inside the city limits of ABC, ABC is just the city the USPS has assigned to that address. The address could be in an unincorporated area, or even inside another city.

I double checked and the Python program does not produce these tags.

More later…

OK, sounds good. We will continue without the addr:county=* as agreed.

@tekim, if these bad tags were caused by my bad edits, it most certainly was NOT your Python script. Sorry, I was not trying to imply that.

Something to consider that may not have been brought up yet: What are your plans when addr:street does not match the nearby road name (case, hyphenation, etc.)? Will you edit the address node, edit the street, or let there be a data discrepancy? I ran into this a little bit when working on addresses from RapiD for Lynchburg.

Example A: addr:street=McDonald Lane, street name=Mcdonald Lane
Example B: addr:street=Crane Hill Road, street name=Cranehill Road
Example C: (a real one I saw in the Amherst dataset) addr:street=C & O Lane, street name=Chesapeake and Ohio Lane

I assume most roads are still untouched since the TIGER import. I lean more towards correcting the road in this situation, as the address data is more up to date and comes from VGIN rather than TIGER. Best practice is probably to go out there on the ground and verify what the street sign itself says, but that’s out of scope for an import.
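To separate the pure formatting differences (Examples A and B) from genuinely different names (Example C), a rough comparison like the following could help triage. This is only a sketch: the function names and the three labels are made up for illustration, and it ignores abbreviation expansion entirely.

```python
import re


def loose_key(name):
    """Lowercase the name and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def classify_mismatch(addr_street, road_name):
    """Label a pair as 'match', 'format-only', or 'different'."""
    if addr_street == road_name:
        return "match"
    if loose_key(addr_street) == loose_key(road_name):
        # Same letters, so only case, spacing, or punctuation differs.
        return "format-only"
    return "different"


pairs = [
    ("McDonald Lane", "Mcdonald Lane"),
    ("Crane Hill Road", "Cranehill Road"),
    ("C & O Lane", "Chesapeake and Ohio Lane"),
]
for addr_street, road_name in pairs:
    print(addr_street, "|", road_name, "->", classify_mismatch(addr_street, road_name))
```

The "format-only" pairs are the ones where street-level imagery is most likely to settle the question quickly; the "different" ones probably need a closer look.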

I run into that a lot. Dice’s Spring Road (Dices), Tinkling Spring Road (Springs), C&W (C and W): I could go on with actual examples from Augusta County.

The problem is multifaceted. First, the county GIS alone has discrepancies (which are then reflected in VGIN). Next, VGIN road centerlines don’t have apostrophes (maybe the locality doesn’t submit them). If it’s the spelling I’m after, usually VGIN RCL has the answer. Just not always for symbols.

I always try to make the street name and the addr:street of all addresses match. My first preference is Mapillary for viewing road signs. If that doesn’t work, I typically pick whichever format is in the majority.

We did this as part of the San José building and address import, at least for the cases we couldn’t verify using street-level imagery. If I remember correctly, we used QGIS to come up with an optimized route between problem spots, though I don’t recall if we jotted down the process anywhere. It ended up being a mix of TIGER being right, the city being right, both, or neither.

We were also able to spot obvious errors without even checking imagery. For example, there were often names like “Business Driveway” and “… Apartment Entrance”. In a similar import in Cincinnati, there was a major street where the data had consistently used the abbreviation for “Road” where it should’ve been “Pike” by common knowledge. These were things that I just patched up in JOSM while working on an individual cell in the tasking manager.

While I agree with mapping on the ground, as has been said here, it’s not always feasible (in the present short term).

My thought is it’s better to have the addresses on the map, all matching with a matching road name, than having whatever road is there from the TIGER import with no addresses. If a solid attempt at using authoritative data is not successful in finding the correctly formatted name, it’s probably pretty close. My preference is to have the data there. When the opportunity comes, and it is verified by a visit, it’s a pretty easy fix.

Basically while I agree we want to have it right, I don’t want perfectionism to get in the way of practical usefulness. I do that in other areas of my life sometimes.

I agree with that sentiment. In that case I’d probably leave mismatching names as stated in the source. JOSM’s validator will yell at you, but consider that a breadcrumb for someone to find the discrepancy later on. Another way to leave a breadcrumb would be to add any disagreeing names to the roads as alt_name. That would be more difficult to automate though.
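For what it’s worth, the tag-level part of the alt_name idea is simple; the hard part is matching each address to the right road, which this sketch does not attempt. The function name is hypothetical and it only operates on a plain dict of the road’s tags.

```python
def add_alt_name(road_tags, addr_street):
    """Return a copy of the road's tags with addr_street recorded as alt_name
    when it disagrees with the road's name. The name tag itself is untouched."""
    updated = dict(road_tags)
    name = road_tags.get("name")
    if not name or name == addr_street:
        return updated  # nothing to record

    # alt_name may already hold several semicolon-separated values.
    existing = [v.strip() for v in road_tags.get("alt_name", "").split(";") if v.strip()]
    if addr_street not in existing:
        existing.append(addr_street)
    updated["alt_name"] = ";".join(existing)
    return updated


print(add_alt_name({"highway": "residential", "name": "Cranehill Road"}, "Crane Hill Road"))
```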

Here’s my experience doing multiple large address additions. YMMV.

It’s very possible to sort out a huge percentage of mismatched addr:street= and roadway name= tags using street level imagery alone.

For mismatches, the number of roadways that are clearly human error typos was higher than I’d expected.

I include hyphens and apostrophes where it makes sense and will match the two.

If I cannot sort it out, I add the address and leave the roadway alone. Other validation tools (ex: OSM Inspector) will keep reminding us that someone should go check.

No apology needed. I didn’t think you were implying that. I am quite willing to admit that I might make mistakes, so I had to check.

The question now becomes how exactly did that get into the data, and what else might have gotten into the data? I understand that you added fixme and name fields, but that shouldn’t have changed anything in the addr:city field.

@OptikalCrow,

My plan is to generalize the tool so that you can just enter the county in which you are interested on the command line. I will also incorporate all of the USPS abbreviations provided by @pmfox (thanks!), and try to account for all of the exceptions (e.g. McCormick vs. Mccormick).
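As a rough idea of how the “Mc” exception could be handled, here is a sketch. It is not the current code in the program; the function name and the overrides parameter are assumptions for illustration. A blanket “Mc” rule will occasionally be wrong (the sign may really say Mcdonald), which is why there is a per-word override table.

```python
import re


def title_case_street(name, overrides=None):
    """Title-case an all-caps street name, capitalizing the letter after 'Mc'.

    `overrides` maps an upper-case word to the exact spelling to use,
    for names where the general rule gives the wrong answer.
    """
    overrides = overrides or {}
    words = []
    for word in name.split():
        if word.upper() in overrides:
            words.append(overrides[word.upper()])
            continue
        word = word.capitalize()  # "MCCORMICK" -> "Mccormick"
        # "Mccormick" -> "McCormick"
        word = re.sub(r"^Mc([a-z])", lambda m: "Mc" + m.group(1).upper(), word)
        words.append(word)
    return " ".join(words)


print(title_case_street("MCCORMICK"))
print(title_case_street("MCDONALD", overrides={"MCDONALD": "Mcdonald"}))
```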

@pmfox , @OptikalCrow
Perhaps we should enter these issues that relate to the Python program on its GitHub project.

I will look into it. I am not intentionally removing hyphens.

Not sure what we can do about that, other than to address manually at the time of import. Thinking…

Currently the rule N->North is only applied to the STREET_PREFIX and STREET_SUFFIX. I am not sure how this happened, but I will look into it.

Might want to put individual abbreviations in the regex for addr:street, as perhaps some form legitimate parts of the name?

This is an issue with the source data. “STREET_NAME”=“AND W”, “STREET_PREFIX”=“N”

e.g.

ADDPTKEY=29583217061
ADDRNUM=29
ESN=3
FIPS=51015
FULLADDR=29 N AND W LN
FULLNAME=N AND W LN
LASTUPDATE=4/17/2023 0:00:00
LAT=37.955318929000043
LONG=-79.163319932999968
MSAG_COMMUNITY=RAPHINE
MUNICIPALITY=Augusta County
OID=-1
PO_NAME=RAPHINE
PSAP=7085
SITEADDID=51015000030900.000000000000000
STATE=VA
STREET_NAME=AND W
STREET_PREFIX=N
STREET_TYPE=LN
USNGCOORD=17SPC6136002448
ZIP_5=24472
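To show why a record like this comes out wrong even when the rules behave as intended, here is a simplified sketch of field-by-field translation. It is not the actual program; the tables and the function name are illustrative assumptions, and only the field names come from the source data.

```python
# Illustrative lookup tables; the real program's tables are larger.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {"LN": "Lane", "RD": "Road"}


def build_addr_street(record):
    """Assemble addr:street from the separate source fields.

    Directions are expanded only in STREET_PREFIX, never inside STREET_NAME.
    """
    parts = []
    prefix = record.get("STREET_PREFIX", "").strip()
    if prefix:
        parts.append(DIRECTIONS.get(prefix, prefix.title()))
    parts.append(record.get("STREET_NAME", "").strip().title())
    street_type = record.get("STREET_TYPE", "").strip()
    if street_type:
        parts.append(STREET_TYPES.get(street_type, street_type.title()))
    return " ".join(part for part in parts if part)


record = {"STREET_PREFIX": "N", "STREET_NAME": "AND W", "STREET_TYPE": "LN"}
print(build_addr_street(record))
```

Because the source splits “N AND W” into a prefix of “N” and a name of “AND W”, the “N” is treated as a direction and expanded, even though it is really part of the street name.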

BTW, I am not seeing a “STREET_P_1” column/field in the source data, only “STREET_PREFIX”

This is also an issue with the source data. In the source data “STREET_NAME”=“C BO G” for the three addresses in Augusta County that got translated as “addr:street”=“C Bo G Lane”. Incidentally, the corresponding street in OSM has name=“C Bo G Lane”.

e.g.

ADDPTKEY=30275234787
ADDRNUM=40
ESN=3
FIPS=51015
FULLADDR=40 C BO G LN
FULLNAME=C BO G LN
LASTUPDATE=4/17/2023 0:00:00
LAT=38.115030580000052
LONG=-79.154685908999966
MSAG_COMMUNITY=STAUNTON
MUNICIPALITY=Augusta County
OID=-1
PO_NAME=STAUNTON
PSAP=7085
SITEADDID=51015000016008.000000000000000
STATE=VA
STREET_NAME=C BO G
STREET_TYPE=LN
USNGCOORD=17SPC6176720185
ZIP_5=24401

BTW, it looks like addresses have already been imported for Augusta County and that import translated the addr:street as “C Bo G Lane” as well.

Been busy of late, sorry for leaving you guys hanging. I’ll have much more time this weekend.

@Minh_Nguyen & @watmildon, on the topic of having matching formatting for addr:street and street name, when there are discrepancies in the data: What has become clear to me is, if I make them all match when I am unable to verify, no one in the future will smell a problem. If I leave a little problem, like leaving street name Dices Spring Road, while the addresses would have Dice’s Spring Road, that would signal to future mappers that something needs review, even without a fixme tag.

I want to check into this some more. My hunch is that during either my initial import a year ago, or my update a few months ago, the addr:city column somehow got the county name in it. Then, using the conflation plugin, it merged the two tags. Still seems a little funky to me, because there were relatively few instances of this compared to the size of my import edits, so I don’t think it was a mass error in the import data.

Be cool if they had data in the right columns. Thanks for checking.

I’m not sure where I got the street_P_1 name; perhaps it was from one of my earlier ChatGPT-edited files.

No problem. We all have limited time.

I didn’t realize this issue concerned data already in OSM. Nevertheless, I searched the Augusta County area in OSM and didn’t find any examples where the addr:city tag contained “;” (I used Overpass attic data in case you had already fixed it). Could you provide more details as to exactly where this was found?

The quality of the data in some of the other counties is much worse, e.g. some addresses contain “(dup)”, apparently to indicate that the address is a duplicate. Why not just fix it?

@pmfox I took a look at the file rockbridge_updated_w_names.csv from your Dropbox account. All looks good as far as I could tell. Obviously there will be some tweaks you will need to make during the actual import, for example to make the address match the associated street name (if possible). Also, most of the points appear to be on top of the main building associated with each address, which is great. I don’t feel there is a need to review the split files.

@OptikalCrow I am working on revising the program. There will actually be three programs: one to split by county, one to do the translation, and a third to do a “summary” for QC purposes. Each will accept command line parameters so that you can run it on any Virginia county you want. I also hope to move all of the translation rules to a separate parameter file so people don’t have to edit the Python code.
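Roughly, the command line for the translation step might look like the sketch below. The argument names (county, --rules) and the default file name are placeholders I made up for illustration, not the final interface.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical translation program."""
    parser = argparse.ArgumentParser(
        description="Translate VGIN address points for one county into OSM tags."
    )
    # Hypothetical positional argument: the county to process.
    parser.add_argument("county", help="county name, e.g. Bedford")
    # Hypothetical option: where the translation rules live.
    parser.add_argument(
        "--rules",
        default="translation_rules.json",
        help="parameter file holding the translation rules",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would translate {args.county} using {args.rules}")
```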

I have an idea as to how to handle the street names that have a “R*” (e.g.) to represent the street type. We can download the roads from OSM for the county, and if, for example, those roads have a “Oak Ridge”, but no other “Oak R…”, then we know that the addresses should be “Oak Ridge” and not “Oak Road” (or Row, Run, etc.).
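A minimal sketch of that idea, assuming the OSM road names for the county have already been downloaded into a set of strings. The candidate table is illustrative and incomplete, and how the letter abbreviation actually appears in the source data is an assumption here.

```python
# Illustrative candidates per letter, taken from the examples in this thread.
CANDIDATE_TYPES = {
    "R": ["Ridge", "Road", "Row", "Run"],
    "H": ["Highway"],
    "E": ["Extension"],
    "T": ["Trail"],
    "B": ["Bend"],
    "P": ["Plaza"],
}


def resolve_street_type(base_name, letter, osm_road_names, candidates=CANDIDATE_TYPES):
    """Return the full street name if exactly one candidate exists in OSM.

    Returns None when no candidate or more than one candidate matches,
    so that ambiguous cases are left for manual review.
    """
    known = {name.lower() for name in osm_road_names}
    matches = [
        f"{base_name} {street_type}"
        for street_type in candidates.get(letter, [])
        if f"{base_name} {street_type}".lower() in known
    ]
    return matches[0] if len(matches) == 1 else None


roads = {"Oak Ridge", "Main Street"}
print(resolve_street_type("Oak", "R", roads))
```

If the county has both an “Oak Ridge” and an “Oak Road”, this returns None and the address would still need to be checked against another source.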

Ok, I found some:

It looks like the error was not from my import, after all. The error happened on V2, where somebody was doing a task from the MapRoulette challenge I created.

I’m inexperienced with MapRoulette, but I don’t think my challenge would in any way cause these bad tags to be created without the editor’s knowledge.
It seems a common theme in all the examples is user hwierzbicki. Maybe I should message him.

@tekim thanks for reviewing the data.