Cleanup and normalization of GNIS imports to only use gnis:feature_id for the id tag

It was a long time ago, but I did attempt to harmonise GNIS nodes from the GNIS import with NHD data for the Upper Colorado. The NHD data was often much better located and on the appropriate feature, so there may be more that can be done with some of this data whilst tidying up. IIRC I tried to move the original GNIS ID onto the NHD feature (mainly reservoirs, dams and similar objects), but I think it was fairly ad hoc.

I’ve been seeing this a lot with the “dam” feature class. A node from an import and a very nearby way with the same duplicated information. Always more cleaning to do. I’ll add some more notes for the next wave of scrutiny.

NHD:GNIS_ID is now also eliminated.

I’m working through tiger:PLACENS, but it’s much slower because a lot of boundaries have conflicting entries (in either gnis:feature_id or gnis:id). I’m having to do some homework to separate the GNIS Populated Place ID (which goes on a label node) from the boundary’s Civil or Census identifier.

There will need to be a bunch of scanning and cleanup for this as well but, again, I hope this makes it a bit easier in the future to keep things tidy.

There are about 14k tiger:PLACENS tags. Would it make sense to document some of the steps needed for the clean up and make it a group effort?

Fortunately, it’s not unmanageable. There were only ~60 in all of TX that had conflicts. I did not check whether the others were correctly placed on the boundary or the label node (or add labels where necessary); that will need to be a bigger project and definitely a group effort.

Nationwide boundary tagging cleanup would be super helpful, and we should build some tools for it.

The work continues apace. The current status:

  • Both NHD variants for GNIS id are now eliminated
  • About half of the tiger:PLACENS tags have been consolidated
  • About 25% of the gnis:id tags have been consolidated
  • ~30 items remain with more than 1 id tag where the tags disagree (almost all water features in AZ)
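
For anyone who wants to hunt for the "more than 1 id tag where the tags disagree" case themselves, here is a minimal sketch of the check. It is not the tooling used for this cleanup. The key list only contains synonyms named in this thread, and the helper names (`normalize_gnis_id`, `conflicting_gnis_ids`) are made up for illustration. It assumes each tag holds a single ID and treats values that differ only by leading zeros as equal. On boundaries a disagreement can be legitimate (Populated Place vs. Civil or Census identifier), so a hit is a prompt for review, not an error.

```python
# Keys named in this thread as holding a GNIS id (illustrative, not exhaustive).
GNIS_ID_KEYS = ("gnis:feature_id", "gnis:id", "ref:gnis", "NHD:GNIS_ID", "tiger:PLACENS")


def normalize_gnis_id(value):
    """Strip whitespace and leading zeros so '0012345' and '12345' compare equal."""
    stripped = value.strip().lstrip("0")
    return stripped or "0"


def conflicting_gnis_ids(tags):
    """Return {key: normalized id} if the id tags on one object disagree, else {}."""
    found = {key: normalize_gnis_id(tags[key]) for key in GNIS_ID_KEYS if key in tags}
    if len(set(found.values())) > 1:
        return found
    return {}
```

Calling `conflicting_gnis_ids(element["tags"])` on each element of an Overpass result would give a review list of objects whose ID tags do not agree.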

I should be able to make good progress through tiger:PLACENS and gnis:id this week. It’s somewhat slow because I cannot help but fix up various validator issues like “broken boundary relation”, “role validation issue” etc. As you all know, it can be tough to remain focused!

One funny secondary cleanup story. One extremely common issue is that a post_office node and a place node, imported from GNIS at the exact same location, got merged together at some point. Some of those had mismatched IDs in different tags, but some lingered with one ID or the other while still marked as both place and post_office. I’ve corrected all of the remaining ones and extracted and located the various missing entities.

Off the top of my head, states left: AZ, CO, OH, DE, MD. Then some stragglers… e.g. there’s a bit of OR that needs some fiddling.

Finishing up CA was a huge pain. JOSM really doesn’t like to load and work with 2.5+ million items.

I have also scoured the various gnis* tags in taginfo and have cleaned up most of the obvious ones (GNIS_ID, GNIS_NAME, gnis_name, GNIS_Name etc).

“tiger:PLACENS” is now consolidated and removed from the wiki page.

“gnis:id” is now consolidated and removed from the wiki page.

Current to dos:

  • Get this line in JOSM changed, then I can nuke ref:gnis.
  • Adjust this regex to “^[1-9][0-9]*$” in iD so that leading zeroes are not acceptable in gnis:feature_id tags
  • Add the synonyms to the deprecated tags file so they are less likely to get reintroduced
  • Fixup the ~300k items with a “gnis:feature_id” tag that have one or more leading zeros
  • Figure out if I want to tackle the ~8000 objects with gnis:feature_id but with no name=
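
As a rough illustration of the regex and leading-zero items in the list above, the following sketch applies the quoted pattern “^[1-9][0-9]*$” in Python. The function names are hypothetical and this is not the iD or JOSM code. Values that still fail the pattern after stripping zeros (empty, all zeros, multiple values, stray letters) are returned as `None` so they can be looked at by hand.

```python
import re

# Pattern quoted in the to-do list: digits only, first digit must not be zero.
FEATURE_ID_RE = re.compile(r"^[1-9][0-9]*$")


def is_valid_feature_id(value):
    """True if value is a gnis:feature_id with no leading zero and no stray characters."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return FEATURE_ID_RE.fullmatch(value) is not None


def strip_leading_zeros(value):
    """Return value without leading zeros, or None if the result still is not valid."""
    candidate = value.lstrip("0")
    return candidate if is_valid_feature_id(candidate) else None
```

For example, `strip_leading_zeros("000123")` gives `"123"`, while `"0000"` or `"12;34"` come back as `None` for manual review.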

I took a look at a couple of them in my area, and a high percentage of those IDs no longer exist in GNIS. Is there a way for you to check this more easily than one by one? For the leftovers we can use a MapRoulette Challenge to fix them.

Yeah. We have some tooling that can interact with GNIS data (web and the national file) and query overpass etc. It will need some work to do a scan of this type but it’s achievable.
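
To give a flavour of what such a scan could look like (a sketch only, not the actual tooling mentioned above): build an Overpass QL query for objects that have gnis:feature_id but no name, load the set of IDs from a pipe-delimited GNIS text file, and split the Overpass elements by whether their ID is present. The function names, the `feature_id` column name and the per-state area filter are assumptions for illustration, so check the header row of the file you actually download. It also assumes one ID per tag. An ID missing from the file is only a candidate for review, not proof that the tag should go.

```python
import csv
import io


def overpass_query_unnamed(state_code):
    """Build an Overpass QL query for objects with gnis:feature_id but no name in one state."""
    return (
        "[out:json][timeout:180];\n"
        f'area["ISO3166-2"="US-{state_code}"]->.a;\n'
        'nwr["gnis:feature_id"][!"name"](area.a);\n'
        "out tags;"
    )


def load_known_ids(gnis_text, id_column="feature_id"):
    """Collect ids from pipe-delimited GNIS text; the column name is an assumption."""
    reader = csv.DictReader(io.StringIO(gnis_text), delimiter="|")
    return {row[id_column].strip() for row in reader if row.get(id_column)}


def split_by_existence(elements, known_ids):
    """Split Overpass JSON elements into (existing, missing) lists of (type, id, gnis id)."""
    existing, missing = [], []
    for element in elements:
        feature_id = element.get("tags", {}).get("gnis:feature_id", "").strip()
        bucket = existing if feature_id in known_ids else missing
        bucket.append((element["type"], element["id"], feature_id))
    return existing, missing
```

The `missing` list is the kind of output that could feed a MapRoulette challenge once someone has sanity-checked it.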

I patched up a couple dozen last night in WA and it was a mix of removed entries needing to be deleted out of the db and some wonky edits that caused the name to be stripped.

For example, this edit removed every name=Pond tag, but there are plenty of places that are legitimately named Pond. Easy mistake to make. Changeset: 128049858 | OpenStreetMap

There are now zero instances of a “gnis:feature_id” tag with a leading zero.

“ref:gnis” is now consolidated and removed from the wiki page.

I found that most objects in my area with gnis:feature_id but without name do exist, but the tag should be moved to another object. For example, a school where the name is on the amenity=school area around the grounds but the gnis id is on the unnamed building. Or a park or cemetery that was a node then changed to an area but the gnis tag was not moved.

I also frequently found that the OSM object had been moved to a (multipolygon) relation. Still, I think it makes sense to first remove all IDs that no longer exist in GNIS and then work on fixing the remaining issues manually. At least I don’t want to put effort into fixing a link that no longer exists :wink:

By the way, have you checked whether the IDs of your schools still exist? All the schools I found in my area had an ID that no longer exists in GNIS.

There’s actually an important distinction to be made here. Some of the GNIS feature classes (e.g. School) are archived. The records still exist and are readily available in the archived data set provided by USGS, but they haven’t been updated since 2021 at the latest.

Records in the archived classes are often much more out of date than that, though, because USGS didn’t do a good job of maintaining the data. So, you can still find these records and use them for reference. But you need to verify everything in them against more reliable current sources.

The only case where I would suggest deleting the gnis:feature_id tag is if the feature in question no longer exists, or if the current feature present in the real world is no longer the same thing as the corresponding GNIS record (e.g. a church no longer used as a place of worship).

For reference:

There are other data formats available from the current GNIS download page.

I recently used the archived class data to add missing fire_stations in Hawaii as part of our disaster mapping. There’s lots of junk so definitely be wary. Fortunately, it also has tons of stuff that is very easily seen on aerial and street side imagery once it gets you to the approximate location.