Proposed Import of Rockbridge County, VA address points

Ok, I’m stopping for today. It will be a few days until I can pick this back up. I’ve done several conversions with ogr2ogr, ogr2osm, and gdal_translate.

I’m impressed by how quickly it processes the files, so much quicker than QGIS.

I think where I’m at is figuring out how to make a translation file. I understand this file to be the current format I need, but I need to modify it to match my data. Maybe if I can find enough examples online, I will be able to come up with something functional.

Here are the fields, and what I need to do to them:
  • ADDRNUM - Reliable; addr:housenumber=*
  • ADDRNUMSUF - Probably should be addr:unit=*
  • UNITTYPE - Abbreviations expanded, concatenated with the unit ID for addr:unit=*
  • UNITID - Concatenated with the above UNITTYPE. Some of these fields have the UNITTYPE abbreviation in front of the unit ID instead of in the UNITTYPE field.
  • STREET_P_1 - Direction prefix (N=North, etc.)
  • STREET_NAM - Street name, reliable, but the capitalization needs fixing. 3RD is correct when listed that way and should not be changed to ‘Third’: 3RD → 3rd.
  • STREET_TYP - Expand abbreviations using the USPS abbreviations list
  • PLACENAME - name=* & fixme=* for manual review
  • PO_NAME - addr:city=*
  • STATE - addr:state=*
  • ZIP_5 - addr:postcode=*
  • Any node with an addr:unit=* tag should have a fixme=* tag added. Often there are multiple nodes placed on a building; one will not have the unit ID, and the others will. Often the nodes with addr:unit=* are deleted.
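For reference, the mapping in the list above could be sketched in Python roughly like this. It assumes each record is a dict keyed by the field names listed (for example as produced by csv.DictReader on the CSV export). The lookup tables are small hypothetical subsets, the fixme wording is made up, and ADDRNUMSUF is left out because where it belongs is still an open question in the list.

```python
# Hypothetical subsets of the lookup tables; the full street-type list would
# come from the USPS street suffix abbreviations (Publication 28).
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West",
              "NE": "Northeast", "NW": "Northwest",
              "SE": "Southeast", "SW": "Southwest"}
STREET_TYPES = {"LN": "Lane", "RD": "Road", "DR": "Drive", "CT": "Court",
                "ST": "Street", "AVE": "Avenue", "CIR": "Circle"}
UNIT_TYPES = {"APT": "Apartment", "STE": "Suite", "UNIT": "Unit",
              "OFC": "Office", "LOT": "Lot", "TRLR": "Trailer"}


def field(rec, key):
    """Return a stripped string for a field, treating missing/None as ''."""
    return (rec.get(key) or "").strip()


def fix_case(text):
    """Title-case words, but keep ordinals such as 3RD as 3rd (not 'Third')."""
    return " ".join(w.lower() if w[0].isdigit() else w.capitalize()
                    for w in text.split())


def build_unit(unit_type, unit_id):
    """Expand the unit type and join it to the unit ID, e.g. 'Suite 101'."""
    unit_type, unit_id = unit_type.upper(), unit_id.upper()
    if not unit_type:
        # Some records carry the abbreviation inside UNITID ("APT 5", "APT5").
        for abbr in sorted(UNIT_TYPES, key=len, reverse=True):
            if unit_id.startswith(abbr):
                unit_type, unit_id = abbr, unit_id[len(abbr):].strip()
                break
    designator = UNIT_TYPES.get(unit_type, unit_type.title())
    return " ".join(p for p in (designator, unit_id) if p)


def record_to_tags(rec):
    """Turn one source record into a dict of OSM address tags."""
    prefix = field(rec, "STREET_P_1").upper()
    street_type = field(rec, "STREET_TYP").upper()
    street = " ".join(p for p in (
        DIRECTIONS.get(prefix, prefix.title()),
        fix_case(field(rec, "STREET_NAM")),
        STREET_TYPES.get(street_type, street_type.title()),
    ) if p)
    tags = {
        "addr:housenumber": field(rec, "ADDRNUM"),
        "addr:street": street,
        "addr:city": fix_case(field(rec, "PO_NAME")),
        "addr:state": field(rec, "STATE"),
        "addr:postcode": field(rec, "ZIP_5"),
    }
    fixmes = []
    unit = build_unit(field(rec, "UNITTYPE"), field(rec, "UNITID"))
    if unit:
        tags["addr:unit"] = unit
        fixmes.append("unit address: check for duplicate nodes on this building")
    place = field(rec, "PLACENAME")
    if place:
        tags["name"] = fix_case(place)
        fixmes.append("review name taken from PLACENAME")
    if fixmes:
        tags["fixme"] = "; ".join(fixmes)
    # Drop empty values so blank source fields don't become empty tags.
    return {k: v for k, v in tags.items() if v}
```

Unknown abbreviations are title-cased rather than dropped, so anything missing from the tables still shows up in the output where it can be spotted during review.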

Here is a link to the CSV export, and a direct download link for the VGIN TXT file.

I’ll post updates here as I make progress.

I don’t know anything about translation files but wanted to say I’m impressed with the progress!

Thank you. I just like tinkering.


I was reminded today of the various populated areas that span two or more states (e.g., Texarkana), a good example of why including addr:state can be valuable.

A couple of additional observations about the source data:

  • It appears that the STREET_NAME itself may contain abbreviations rather than just the STREET_PREFIX and STREET_TYPE. For example, in some cases Mountain is abbreviated Mtn (but in other cases it is spelled out).
  • In some cases the STREET_NAME seems to be possessive but is missing the apostrophe. For example, “JACOBS LADDER”: I am not sure whether there should be an apostrophe. I guess during the manual step you can check how the name of the associated street is spelled.
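One way to deal with abbreviations inside STREET_NAME is to count every distinct word in the names, pick out the abbreviated ones by eye, and expand them from a table. This Python sketch assumes the names are still uppercase at that point so the normal capitalization step can run afterwards; the two-entry table is hypothetical and would be built from what the word counts actually turn up. The apostrophe question can't really be automated, so it stays a manual check.

```python
from collections import Counter

# Hypothetical table of abbreviations seen inside STREET_NAME; it would be
# filled in by reviewing the word counts below against the real data.
NAME_WORDS = {"MTN": "MOUNTAIN", "CRK": "CREEK"}


def expand_name_words(street_name):
    """Expand abbreviated words within a street name, leaving others as-is."""
    return " ".join(NAME_WORDS.get(w, w) for w in street_name.upper().split())


def distinct_words(street_names):
    """Count each word across all names, to spot abbreviations worth adding."""
    counts = Counter()
    for name in street_names:
        counts.update(name.upper().split())
    return counts
```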

If a word in the STREET_NAME starts with MC, can we assume that the third letter should always be uppercase, or do we have to handle every case individually, e.g. McDonald, McCrown, etc.?


Excellent observation and attention to detail. Many expansions skip this for fear of “guessing too much” but I think it ends up being cleaner looking. I ended up handling the few cases around me somewhat ad-hoc. My giant switch statement of cases is here.
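A rule-plus-exceptions version of that kind of switch statement might look like the Python sketch below. It assumes that any word starting with MC followed by a letter is an Mc- name, and that anything genuinely ambiguous (such as Mac names versus ordinary words like Macon) goes into a hand-maintained override table; the single override entry is hypothetical.

```python
# Hypothetical per-word overrides for cases a rule can't decide, e.g. "Mac"
# names (MacDonald) versus ordinary words that start with MAC (Macon).
OVERRIDES = {"MACDONALD": "MacDonald"}


def cap_word(word):
    """Capitalize one word, treating a leading MC as the Mc- prefix."""
    upper = word.upper()
    if upper in OVERRIDES:
        return OVERRIDES[upper]
    if upper.startswith("MC") and len(upper) > 2 and upper[2].isalpha():
        return "Mc" + upper[2:].capitalize()  # MCCLURE -> McClure
    return upper.capitalize()


def cap_name(name):
    """Capitalize every word of a street name."""
    return " ".join(cap_word(w) for w in name.split())
```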

As for possessive apostrophes, I add them manually when I see opportunities that are super clear but skip otherwise.

I have some results! I used ChatGPT to format and filter my data (some may call it cheating; I call it using my resources). You can read over my entire chat here. Scroll all the way down to the last reply for a good summary of what I did, and the prompt I’ll try next time to make it quicker. I’m probably close to my usage limit of 50 prompts every 3 hours right now.

Here is the final CSV file on Dropbox.

I’ll be adding this to the wiki as well. I need to do some more work to break up the file into manageable portions. Maybe I’ll ask ChatGPT to sort by road name, then split it into blocks of 250 addresses.
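Sorting by road name and cutting the result into 250-address files only takes a few lines of Python with the standard csv module, and it gives the same result every time. This sketch assumes the final CSV has addr:street and addr:housenumber columns; the output file naming (prefix_001.csv and so on) is made up. House numbers are sorted numerically first, so 20 comes before 143 and 143A follows 143.

```python
import csv
import re


def house_key(number):
    """Sort key: numeric part of the house number first, then any suffix."""
    m = re.match(r"(\d*)(.*)", number)
    return (int(m.group(1)) if m.group(1) else 0, m.group(2))


def split_rows(rows, size=250):
    """Sort rows by street, then house number, and cut them into blocks."""
    rows = sorted(rows, key=lambda r: ((r.get("addr:street") or ""),
                                       house_key(r.get("addr:housenumber") or "")))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def write_blocks(in_path, out_prefix, size=250):
    """Write each block to its own CSV file with the original header."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    for n, block in enumerate(split_rows(rows, size), start=1):
        with open(f"{out_prefix}_{n:03d}.csv", "w", newline="",
                  encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(block)
```

Usage would be something like write_blocks("all_addresses.csv", "rockbridge_block"), with both names standing in for whatever the real files are called.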

In JOSM, after conflating the address points, I use the review plugin with a little AutoHotkey script, so I can approve an edit and go to the next one with a single button press.

Let me know if you see any trouble in this data.

Interesting idea to have ChatGPT do the work.

Street names: MCClure should probably be McClure. This applies to other names that start with MC too.

The file has a number of cases where addr:street is blank, although as far as I can tell the street is not blank in the source file.

“B And W Cabins Court” should probably be “B and W Cabins Court”.

“Hunter Hill Ext Road” should probably be “Hunter Hill Extension Road”. This applies to other cases where EXT appears in the data.
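Both of these can be handled in the capitalization step: keep small words like "and" lowercase unless they start the name, and expand EXT wherever it appears inside the name. A minimal Python sketch, where the minor-word list and the expansion table are hypothetical and would need checking against the real names:

```python
# Hypothetical lists; they would be checked against the actual street names.
MINOR_WORDS = {"and", "the", "of"}
EXPANSIONS = {"EXT": "Extension"}


def tidy_name(name):
    """Capitalize a name, expanding EXT and lowercasing non-initial minor words."""
    words = []
    for i, word in enumerate(name.split()):
        upper = word.upper()
        if upper in EXPANSIONS:
            words.append(EXPANSIONS[upper])
        elif i > 0 and upper.lower() in MINOR_WORDS:
            words.append(upper.lower())
        else:
            words.append(upper.capitalize())
    return " ".join(words)
```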

I will continue examining later.

Thank you, @tekim. My apologies for not examining the results in more detail prior to posting. I understand the problems you found and definitely need to fix them. I’ll do some more work and post results.

@pmfox No problem. Btw, since I am interested in adding addresses in my area, I started writing a Python script to do these types of conversions (borrowing from the work of others). It is nearly ready. I can send it to you if you are interested.

I am most certainly interested. While ChatGPT is useful, it isn’t deterministic: the same prompt with the same info doesn’t always get the same results. I guess sometimes that’s a feature and other times a flaw. A Python script would not be like that.

OK, I did some more work on it. Here is the Dropbox folder containing a file with all the data, plus smaller files broken up into manageable tasks.

Question for the community:
The wiki says:

As of June 2021, it is far more common to omit the designator and only include the identifier (typically a letter or number) in addr:unit=*.

For example, “Apartment 5” would become simply “5”.

I know the wiki is descriptive rather than prescriptive; what does the community think is best practice now?
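To make the two styles concrete, the wiki-described style amounts to something like the following Python sketch, which turns "Apartment 5" into "5" while leaving bare identifiers and designator-only values (such as "Office") alone. The designator list is a hypothetical subset, and this only illustrates the transformation; it doesn't settle which style to use.

```python
import re

# Hypothetical subset of expanded unit designators.
DESIGNATORS = ("Apartment", "Suite", "Unit", "Office", "Lot", "Trailer")
PATTERN = re.compile(r"(?:%s)\s+(\S.*)" % "|".join(DESIGNATORS), re.IGNORECASE)


def strip_designator(unit):
    """'Apartment 5' -> '5'; values without a following identifier are kept."""
    m = PATTERN.fullmatch(unit.strip())
    return m.group(1) if m else unit.strip()
```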

I would like to know this as well. For context, the data has both styles in it, probably half have the designator in front of the unit identifier, and the other half does not.

For a commercial building with suites, I think retaining the “Suite” designator is desirable. 2654 Anyplace Way, Suite 101, Sometown, VA 99999 seems better than 2654 Anyplace Way, 101 (or #101), Sometown, VA 99999.

However, in the case of a duplex house, it might be written out as 2654A (or 2654 A) Anyplace Way, Sometown, VA 99999.

In the latter case, address line 2 (present on many forms where one enters an address) would not be used; the whole address would be entered on address line 1.

My personal preference would be to leave the designator in place when provided by the data source. However, I do believe there needs to be consistency within a building: all units in one building must be formatted the same way.

Look forward to hearing.

Much improved! Good work. I will try to find some time to look at this more in depth tomorrow, but here are a couple of things I noticed:

  • The addr:street “Street Andrews Drive” should probably be “Saint Andrews Drive”
  • The addr:street “Off The Beaten Path” should probably be “Off the Beaten Path”. This is how the associated street is named in OSM.
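Both of these could be caught automatically. The Python sketch below assumes the raw STREET_NAME for the first case is something like ST ANDREWS (with ST in the name rather than in STREET_TYPE), so a leading ST in the name is read as Saint; and it assumes a list of existing OSM street names is available (for example from an Overpass export, which is not shown here), so an address street that matches one ignoring case adopts the OSM spelling.

```python
def expand_leading_saint(street_name):
    """Treat ST at the start of STREET_NAME as Saint, not Street."""
    words = street_name.split()
    if words and words[0].upper().rstrip(".") == "ST":
        words[0] = "SAINT"
    return " ".join(words)


def match_osm_name(street, osm_names):
    """Adopt the spelling of an existing OSM street that matches ignoring case."""
    by_lower = {n.lower(): n for n in osm_names}
    return by_lower.get(street.lower(), street)
```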

Almost all of the addr:unit values that I have seen around the US have just the identifying portion (number or letters). Clicking through the values on the US taginfo page for this key seems to show that stripping values down to just the identifier (the number/letter) is typical. That said, values like “Unit A” and “Apartment A” do show up in non-trivial quantities.

There’s also ~600 items with “addr:unit=Apartment” which probably warrants a follow-up. I’ll add that to my list.

  • It looks like any time “AND” appeared in the street name in the original data, it got changed to “&”. For the cases I tested, USPS says that “and” is correct. I wouldn’t say that USPS is the ultimate authority on this, but given that the source data from VA and USPS agree, “and” is probably correct.
  • The unit (OFF/Office) was removed from “481 Steeles Fort Road, Office, Raphine, VA 24472”. According to USPS, an address can have a unit type of “Office” without a following number. The original data contained “OFF” for the unit, which isn’t the official abbreviation for “Office” (that is “OFC”).
  • The unit (STO) was removed from “481 Steeles Fort Road, STO, Raphine, VA 24472”. “STO” is probably an error that needs to be replaced by something else, perhaps “Apartment B14” based on the units around it. For now, leaving “STO” and adding a fixme=* tag is probably the way to go. If you are local, you might visit the complex, or call the local authorities or even the apartment management. Note that “STO” is not an accepted abbreviation according to USPS. Perhaps it means “Stop” (as in Mail Stop), but that would require a following number/letter identifier, which this doesn’t have.
  • It seems that if the original data contained “APT” immediately followed by a number or letter (no space), “APT” was not expanded to “Apartment”.
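The unit cases in this list could be handled by a small parser like the following Python sketch. The designator tables are a hypothetical subset of the USPS list, treating OFF as an alias for Office is an assumption based on the example above, and the fixme texts are placeholders. APT5 is split the same way as APT 5, a designator with no identifier (such as a bare "Apartment") is flagged, and unrecognized values like STO are kept as-is and flagged rather than dropped.

```python
import re

# Hypothetical subset of USPS secondary unit designators that need an ID.
NEEDS_ID = {"APT": "Apartment", "STE": "Suite", "UNIT": "Unit", "LOT": "Lot"}
# Designators that may stand alone; "OFF" is assumed to be a local alias.
STANDALONE = {"OFC": "Office", "OFF": "Office"}


def parse_unit(raw):
    """Return (addr:unit value, fixme text or None) for a raw unit string."""
    text = raw.strip().upper()
    if not text:
        return "", None
    # Leading letters are the candidate designator; "APT5" -> ("APT", "5").
    m = re.fullmatch(r"([A-Z]+)\s*(.*)", text)
    abbr, rest = (m.group(1), m.group(2)) if m else ("", text)
    if abbr in NEEDS_ID:
        if rest:
            return f"{NEEDS_ID[abbr]} {rest}", None
        return NEEDS_ID[abbr], "unit designator without an identifier"
    if abbr in STANDALONE:
        return " ".join(p for p in (STANDALONE[abbr], rest) if p), None
    if re.fullmatch(r"[A-Z]?\d+[A-Z]?|[A-Z]", text):
        return text, None  # bare identifier such as "101", "B14" or "A"
    return raw.strip(), "unrecognized unit value, please verify"
```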

More later…

This record in the input data:

ADDPTKEY=3912193935
ADDRNUM=143
ADDRNUMSUF=A
ESN=312
FIPS=51163
FULLADDR=143A GRAVEL LN
FULLNAME=GRAVEL LN
LASTUPDATE=3/16/2023 0:00:00
LAT=37.747397316000040
LONG=-79.455601039999976
MSAG_COMMUNITY=LEXINGTON
MUNICIPALITY=Rockbridge County
OID=-1
PO_NAME=LEXINGTON
PSAP=7194
SITEADDID=51163000013166.000000000000000
STATE=VA
STATUS=CURRENT
STREET_NAME=GRAVEL
STREET_TYPE=LN
USNGCOORD=17SPB3606278911
ZIP_5=24450

Was translated to:

addr:city=Lexington
addr:housenumber=143
addr:postcode=24450
addr:state=VA
addr:street=Gravel Lane
addr:unit=A

With all of that data it may not be obvious what happened, but I wanted to make sure that everyone had the full context. The ADDRNUMSUF in the original record became addr:unit in the translated record, when it should have been appended to ADDRNUM to create addr:housenumber. This is supported by the fact that FULLADDR=143A GRAVEL LN in the original record.
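A Python sketch of the fix, plus a cross-check against FULLADDR. The check assumes, based on this one record, that FULLADDR always starts with the house number and suffix written together without a space (143A); records where that doesn't hold would simply fail the check and could be reviewed by hand.

```python
def housenumber(rec):
    """Append ADDRNUMSUF to ADDRNUM, e.g. 143 + A -> 143A."""
    number = (rec.get("ADDRNUM") or "").strip()
    suffix = (rec.get("ADDRNUMSUF") or "").strip()
    return number + suffix


def matches_fulladdr(rec):
    """Check that the built house number is the first token of FULLADDR."""
    full = (rec.get("FULLADDR") or "").split()
    return bool(full) and full[0] == housenumber(rec)
```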

This record in the original source data:

ADDPTKEY=16760208188
ADDRNUM=3875
ESN=307
FIPS=51163
FULLADDR=3875 I-64 & I-81
FULLNAME=I-64 & I-81
LASTUPDATE=3/16/2023 0:00:00
LAT=37.875692131000051
LONG=-79.309467771999948
MSAG_COMMUNITY=FAIRFIELD
MUNICIPALITY=Rockbridge County
OID=-1
PO_NAME=FAIRFIELD
PSAP=7194
SITEADDID=51163000000032.000000000000000
STATE=VA
STATUS=CURRENT
STREET_NAME=I-64 & I-81
USNGCOORD=17SPB4868093369
ZIP_5=24435

Does not appear in the output (1_Complete data in one file.csv). I am guessing that ChatGPT didn’t like the fact that STREET_NAME was actually the intersection of two highways.
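Dropped records and blank streets are both easy to check for mechanically once the conversion is done. This Python sketch assumes the output CSV carries the source ADDPTKEY through as a column (the current output may not, so it would need to be added); it lists source keys that never made it into the output and output rows whose addr:street is empty.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by its header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def find_problems(source_rows, output_rows, key="ADDPTKEY"):
    """Report source keys missing from the output and outputs with no street."""
    output_keys = {r.get(key) for r in output_rows}
    missing = [r.get(key) for r in source_rows if r.get(key) not in output_keys]
    blank_street = [r.get(key) for r in output_rows
                    if not (r.get("addr:street") or "").strip()]
    return missing, blank_street
```

Usage would be find_problems(read_rows(source_csv), read_rows(output_csv)), with the two paths pointing at the source export and the converted file.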