Cleanup and normalization of GNIS imports to only use gnis:feature_id for the id tag

Yeah. We have some tooling that can interact with GNIS data (the website and the national file) and query Overpass, etc. It will need some work to do a scan of this type, but it’s achievable.

I patched up a couple dozen last night in WA. It was a mix of removed entries that needed to be deleted from the database and some wonky edits that had stripped the name.

For example, this edit removed every name=Pond tag, but there are plenty of places that legitimately are named Pond. Easy mistake to make. (Changeset 128049858)

There are now zero instances of a “gnis:feature_id” tag with a leading zero.

“ref:gnis” is now consolidated and removed from the wiki page.
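For anyone doing similar cleanup on their own extracts, the normalization described above boils down to a small tag transformation. This is a minimal sketch, not the tooling that was actually used for this cleanup: it assumes tags arrive as a plain Python dict, folds ref:gnis into gnis:feature_id, strips leading zeros, and refuses to guess when the two tags disagree. The function name is hypothetical.

```python
from typing import Dict


def normalize_gnis_tags(tags: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of an OSM tag dict with the GNIS id normalized.

    Hypothetical helper: folds the legacy ref:gnis key into
    gnis:feature_id and strips leading zeros from the value.
    """
    out = dict(tags)
    legacy = out.pop("ref:gnis", None)
    current = out.get("gnis:feature_id")

    if legacy is not None:
        if current is not None and current.lstrip("0") != legacy.lstrip("0"):
            # Two different ids on one object: leave it for a human to review.
            raise ValueError(f"conflicting GNIS ids: {current!r} vs {legacy!r}")
        if current is None:
            current = legacy

    if current is not None:
        out["gnis:feature_id"] = current.lstrip("0")
    return out


# Example with made-up tag values:
print(normalize_gnis_tags({"natural": "water", "ref:gnis": "0123456"}))
```

Raising on a conflict is a deliberate choice in this sketch: an object carrying two different IDs usually points to a merge or a mis-tag that needs human judgment rather than an automated fix.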

I found that most objects in my area with gnis:feature_id but without name do exist, but the tag should be moved to another object. For example, a school where the name is on the amenity=school area around the grounds but the gnis id is on the unnamed building. Or a park or cemetery that was a node then changed to an area but the gnis tag was not moved.
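To find candidates like these in bulk, one option is an Overpass query for objects that have gnis:feature_id but no name tag. The sketch below only builds the query text (for example, to paste into Overpass turbo). The helper name, the use of a state’s ISO3166-2 code to select the search area, and the output mode are all assumptions, and every hit still needs manual review, since the fix is often to move the tag to the named object rather than delete it.

```python
def nameless_gnis_query(iso3166_2: str, timeout: int = 180) -> str:
    """Build Overpass QL that finds objects with gnis:feature_id but no name.

    Hypothetical helper; iso3166_2 is a subdivision code such as "US-WA",
    used to pick the state boundary as the search area.
    """
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["ISO3166-2"="{iso3166_2}"]->.state;\n'
        # [!"name"] keeps only elements that have no name tag at all.
        'nwr["gnis:feature_id"][!"name"](area.state);\n'
        # "center" adds a representative point for ways and relations.
        "out center;\n"
    )


print(nameless_gnis_query("US-WA"))
```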

I also frequently found that the OSM object had been moved to a (multipolygon) relation. That said, I think it makes sense to first remove all IDs that no longer exist in GNIS and then work on fixing the remaining issues manually. At least, I don’t want to put effort into fixing a link that no longer exists ;)

By the way, have you checked whether the IDs of your schools still exist? All the schools I found in my area had an ID that no longer exists in GNIS.

There’s actually an important distinction to be made here. Some of the GNIS feature classes (e.g. School) are archived. The records still exist and are readily available in the archived data set provided by USGS, but they haven’t been updated since 2021 at the latest.

Records in the archived classes are often much more out of date than that, though, because USGS didn’t do a good job of maintaining the data. So, you can still find these records and use them for reference. But you need to verify everything in them against more reliable current sources.

The only case where I would suggest deleting the gnis:feature_id tag is if the feature in question no longer exists, or if the current feature present in the real world is no longer the same thing as the corresponding GNIS record (e.g. a church no longer used as a place of worship).
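Given the distinction between maintained and archived classes, a scan for “dead” IDs should check both data sets before flagging anything. A minimal sketch, assuming you have already loaded the feature IDs from the current download and from the archived-class download into two sets of strings (file loading is left out because the download formats differ); the names are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class GnisStatus(Enum):
    CURRENT = "current"    # found among the maintained classes
    ARCHIVED = "archived"  # found only in the archived-class data
    MISSING = "missing"    # in neither set: a candidate for manual review


def classify_ids(
    osm_ids: Iterable[str],
    current_ids: Set[str],
    archived_ids: Set[str],
) -> Dict[str, GnisStatus]:
    """Sort gnis:feature_id values from OSM by where they appear in GNIS."""
    result: Dict[str, GnisStatus] = {}
    for fid in osm_ids:
        if fid in current_ids:
            result[fid] = GnisStatus.CURRENT
        elif fid in archived_ids:
            result[fid] = GnisStatus.ARCHIVED
        else:
            result[fid] = GnisStatus.MISSING
    return result
```

An ARCHIVED result is not a reason to delete the tag. Only MISSING IDs are candidates for follow-up, and even then the tag should go only if the feature itself is gone or is no longer the same thing as the GNIS record.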

For reference:

There are other data formats available from the current GNIS download page.

I recently used the archived class data to add missing fire stations in Hawaii as part of our disaster mapping. There’s lots of junk, so definitely be wary. Fortunately, it also has tons of stuff that is very easy to see on aerial and street-level imagery once it gets you to the approximate location.

has feature ID 1623616, but searching for it in the Geographic Names Information System search gives no results. As I understand it, this is the “official” search function. If there are no results, what’s the benefit of having the link in OSM? How is an ordinary mapper supposed to verify this data? They won’t, and you will continually end up in a situation where no one cares about that link, because the impression is that this data is useless.

I can understand using archived or outdated data as a starting point for mapping something based on other sources. But that is entirely unrelated to the question of whether links to unmaintained data should be kept.

If it’s an archived class and the object is still that feature, it’s still that number. Any mapper can find it in the links Kai has above. Your argument seems to be that any dataset ID that isn’t actively maintained has no value in OSM, which I disagree with. If nothing else, it makes activities (like the fire stations above) easier for mappers to work through.

To be more precise, this is one of the official search engines for the maintained classes only. The USGS no longer maintains a convenient search engine for all of GNIS, so we’ve hooked up iD to the tool that’s user-friendly and supports the majority of features. An ordinary mapper can still download the ZIP file of CSVs of archived features. (It isn’t very large by modern standards.) There are also third-party search tools, an official linked data distribution, and even a tile layer that you can load into iD, all of which include the archived classes.

The main use of GNIS feature IDs is to serve as a citation, a starting point for research. I have often used GNIS feature IDs of post offices (one of the archived classes) to restore a name that an overzealous mapper turned into a generic “United States Post Office” based on a validator suggestion. While I’m at it, I can also use the archived record’s coordinates, which are more up to date than what we imported into OSM, and often its street address too. This would be very difficult to obtain without the gnis:feature_id tag on the post office node.

That said, if the relative inaccessibility of these archived classes matters to you, the best course of action would be to link the feature to a Wikidata item which in turn references GNIS (or create such an item if it doesn’t already exist). The GNIS feature ID automatically qualifies the item for inclusion under Wikidata’s notability guidelines.

I couldn’t have said it better.

What’s going on behind the scenes with the USGS GNIS search web site is that USGS has the ability to flag individual GNIS records to be visible/invisible on that web site.

I have actually had discussions with USGS about whether certain records should be visible based on the existence, non-existence, or classification of the features. And in some cases they have changed the status of the records based on those discussions. Those records still exist and the Feature IDs are still valid.

There was one case that involved some uncertainty about whether a particular feature is a well (and thus a man made feature that should be archived) or a spring (and thus a natural feature that should be maintained). So far, GNIS has decided that the feature is a well. But subsequent evaluation by a qualified geologist might determine that there is a natural spring at the location and that the man made improvements were merely for the collection of surface water. In which case, GNIS would make the record visible again and update the class and location (which we thought was incorrect). And we’d get to talk to them about renaming it to remove a slur from the name.

For now, it’s mapped as a well in OSM with the assigned gnis:feature_id.

There are two links: one is the USGS site, and the other one requires an account.

To be frank, I would either delete these tags once an object has been mapped with local knowledge, since the name in the archived GNIS data will always be older, or simply ignore the tag and leave it wherever it is (which is the current situation).

Why? Because the effort of maintaining it is not worth it.

By the way, here is the GNIS record for 1623616, which is outdated according to the data in OSM:

1623616|San Marino Golf Club|Locale|MI|26|Oakland|125|422917N|0832458W|42.4880901|-83.4160453|||||265|869|Northville|04/14/1980|
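For anyone who wants to pull fields out of records like this one programmatically, here is a small parser sketch. It assumes the legacy 20-column pipe-delimited national file layout (ID, name, class, state, county, DMS and decimal coordinates, elevation, map name, creation and edit dates). The column positions are inferred from that layout and this one record, so check them against the header of whatever file you actually download; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class GnisRecord:
    feature_id: int
    name: str
    feature_class: str
    state: str
    county: str
    lat: float
    lon: float
    map_name: str
    date_created: Optional[date]


def parse_national_file_line(line: str) -> GnisRecord:
    """Parse one pipe-delimited line, assuming the legacy 20-column layout."""
    f = line.rstrip("\r\n").split("|")
    if len(f) != 20:
        raise ValueError(f"expected 20 fields, got {len(f)}")
    created = datetime.strptime(f[18], "%m/%d/%Y").date() if f[18] else None
    return GnisRecord(
        feature_id=int(f[0]),
        name=f[1],
        feature_class=f[2],
        state=f[3],        # f[4] is assumed to be the numeric state code
        county=f[5],       # f[6] is assumed to be the numeric county code
        lat=float(f[9]),   # f[7]/f[8] hold the same point in DMS
        lon=float(f[10]),
        map_name=f[17],    # assumed: name of the USGS topo map
        date_created=created,
    )


line = "1623616|San Marino Golf Club|Locale|MI|26|Oakland|125|422917N|0832458W|42.4880901|-83.4160453|||||265|869|Northville|04/14/1980|"
rec = parse_national_file_line(line)
print(rec.name, rec.date_created)
```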

It feels pretty strange to trust any database with survey dates going back to the ’80s.

You’re probably referring to this ArcGIS application. That was a temporary tool while they were still setting up the more permanent one. I’ve removed the link from the wiki.

GNIS-LD is another official distribution of GNIS that includes the archived classes. For example, here’s a post office. iD links to the main search engine because this one is full of Semantic Web jargon that would probably confuse the average mapper.

GNIS is the federal government’s official gazetteer. There’s value simply in asserting that a feature in OSM is the same thing that a record in this gazetteer refers to, even if some fields of that record are no longer accurate. If every field differs because of real-world changes and the GNIS record now refers to something only at the same location by sheer coincidence, that would be a solid reason to delete the tag at the same time you delete the wikipedia or wikidata tag.

This is a personal opinion. Those that are doing most of the heavy lifting seem to be OK with it.

GNIS isn’t a surveying program; it has never sent anyone out in the field. Instead, it ingests facts from various government programs and publications. One of these partnerships is The National Map Corps, which enlists local government agencies and lay volunteers to help keep GNIS up to date – including archived feature classes like schools and cemeteries. In my area of interest, I see edits to fire stations as recently as last Sunday.

In Indiana, a state agency comprehensively updated GNIS coverage of public safety facilities several years ago with high-quality edits. Even though these feature classes have since been archived, I have found them to be the best available resource for getting OSM’s coverage of fire/police/ambulance stations into shape. It isn’t perfect, but no external data source is perfect.

You’ve found a bit of an edge case. GNIS has a separate record for both the San Marino Golf Club and the Farmington Hills Golf Club. My understanding is that the San Marino Golf Club merged into the larger Farmington Hills Golf Club. But the mapper who sorted this out apparently didn’t feel comfortable deleting the San Marino Golf Club’s GNIS tag, so they left it on the multipolygon’s outer way. Unless locals still refer to the older section by its old name, the gnis:feature_id tag should be deleted, changed to disused:gnis:feature_id, or moved to OpenHistoricalMap. For bonus points, you could reach out to the USGS about getting feature 1623616 deleted from GNIS.

Duplicates sometimes occur, just as in OSM. It’s good to get those cleaned up. Fortunately, as far as I can tell, this situation doesn’t generalize to the rest of the 1.5 million gnis:feature_id tags in OSM.

A number of “hamlet”-type names were recently (around late August 2023) added within San Joaquin County, CA. As a longtime resident of the county, someone who has worked in the GIS field within the county for over 23 years, and the current GIS Program Manager for San Joaquin County, I can attest that these names are utter nonsense. Local residents do not use them in any way to describe the areas depicted, and they only clutter the map with meaningless (nay, erroneous) place names. Occasionally they happen to match nearby road names, but even then they are pointless as supposed place names. More often, these names are of entirely unknown origin. A good example is “Urgon” (GNIS ID 252821): I defy you to find a single actual resident of the area who has ANY idea what “area” that name supposedly represents. There were dozens of other such names created as part of that “batch” (here are a few more near “Urgon”: Armstrong, Dougherty, East Side, Guild, Kettleman, Pearson, Peltier, Pope, Villinger, Woodlake, Youngstown), and I suspect the entire changeset should be reverted.

That’s definitely common and I encourage you to do whatever cleanup you think is reasonable in this case. GNIS has a reasonable amount of clutter and really does need sifting to get all the good stuff. We made a MapRoulette task for the area I live in just to get a sense of how useful the place info was. Added some, removed some… Overall somewhat mixed. Correlating with current census data is likely better for many things.

These hamlets were imported from GNIS in 2007. The timestamp you’re looking at is merely the timestamp when the node was last modified for technical reasons (as discussed in this thread).

GNIS has an entry for Urgon that cites “Phase I”. Phase I was the initial effort in the ’70s and ’80s to populate GNIS by manually inputting what appeared on USGS topo maps at the time. But if you search for the Urgon node and look at the USGS topo layer, it shows a label for “Urgon” next to railroad tracks and what seems to be some rail-related facilities.

This was most likely the label of a railroad station, railyard, or railroad junction. The editors who copied data off topo maps often mistook railroad stations for hamlets because the topo maps apply a similar typeface to both kinds of features. Here’s one in New York that propagated to most online maps via GNIS:

https://twitter.com/saranrapjs/status/1158524372837179403

You could delete the node, since it refers to something historical that would belong in OpenHistoricalMap rather than OpenStreetMap. But there’s a reasonable chance that someone will come along later, thinking Urgon is a real place that OSM failed to incorporate from GNIS. To prevent that incorrect data from being reinserted, you can retag it with abandoned:railway=station and not:place=hamlet, removing the place=hamlet tag. Sometimes Wikidata winds up with the wrong information too and also needs to be updated, but fortunately not in this case.
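The retagging suggested above amounts to a small tag transformation. A sketch, assuming tags are held in a plain dict; the function name is hypothetical, and it keeps every other tag (such as name and gnis:feature_id) as it is.

```python
from typing import Dict


def retag_rail_hamlet(tags: Dict[str, str]) -> Dict[str, str]:
    """Retag a GNIS 'hamlet' that was really a railroad station label.

    Hypothetical helper: drops place=hamlet, adds abandoned:railway=station
    and not:place=hamlet, and keeps all other tags unchanged.
    """
    if tags.get("place") != "hamlet":
        raise ValueError("expected a place=hamlet node")
    out = {k: v for k, v in tags.items() if k != "place"}
    out["abandoned:railway"] = "station"
    out["not:place"] = "hamlet"  # signals that it should not be re-added as a hamlet
    return out


print(retag_rail_hamlet({"place": "hamlet", "name": "Urgon", "gnis:feature_id": "252821"}))
```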

Also, you can contact the USGS to get this entry removed or reclassified. As the county’s GIS program manager, you might get priority over an ordinary resident writing in. This would benefit not only OSM but any other map or gazetteer that uses GNIS as a source.

I recently added a section to the USGS GNIS page in the OSM wiki about the issues with some of the Populated Place records in GNIS.

It’s not just train stations that were incorrectly transcribed as Populated Place records in GNIS. I have seen Mexican land grants converted to Populated Place records, as well as natural features like notable palm groves.

This is a problem that affects both OSM and Wikipedia, and Wikipedia has a write up about the issue as well. The challenge is that it’s not easy to prove that a name never referred to a human settlement, or if it did, that the name is no longer in current usage. There have been some attempts on separate occasions to clean up these “hamlet” nodes in OSM, but the cleanup takes some historical research and local knowledge.

The Populated Place class of features in GNIS is still currently maintained, which means USGS is actively working on correcting these records. If you do contact them about some of the Populated Place records around San Joaquin County, you might include historical information to confirm that the location was never a “named community with a permanent human population.”

All, thanks for the replies. I’d be inclined to adopt Minh’s “most likely was rail” interpretation, though I didn’t follow the retag suggestions. I did delete a number of such nodes (changeset 141179081) with comments to the effect that auto-import from older/suspect GNIS entries was likely the source. If someone wants to reinstate, so be it - I won’t roulette it. I’ve already begun contact with USGS (because, yes indeed, this same odd data has begun showing up elsewhere already), but will have to wait and see how effective that is. The topo quads in our area have all sorts of “odd” labels, but it’s an uphill battle to correct them.

Issue opened in JOSM to remove ref:gnis: #23177 (Remove ref:gnis (org change to gnis:feature_id) in the waterway tagging options)

PR in id-tagging-schema to disallow leading zeros in gnis:feature_id fields: “Disallow leading zeros in gnis:feature_id” by watmildon (openstreetmap/id-tagging-schema pull request #1007 on GitHub)
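The rule the PR targets (digits only, no leading zero) is easy to apply in other QA scripts as well. This is a minimal Python sketch of that rule, not the code from PR #1007; the names are hypothetical.

```python
import re

# One or more ASCII digits, and the first digit must not be 0.
_GNIS_FEATURE_ID = re.compile(r"[1-9][0-9]*")


def is_valid_gnis_feature_id(value: str) -> bool:
    """Check a single gnis:feature_id value (sketch of the rule, not the PR code)."""
    return _GNIS_FEATURE_ID.fullmatch(value) is not None


for v in ["1623616", "01623616", "252821", ""]:
    print(v, is_valid_gnis_feature_id(v))
```

Using [0-9] rather than \d keeps non-ASCII digits from slipping through, since Python’s \d matches any Unicode decimal digit.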
