Mass remove `gnis:created` and similar tags? [final version presented]

It would be very noisy due to perfectly legitimate OSM edits. However, if the tool can flag cases where the GNIS feature has moved since the import but the OSM feature has not, that would be much more useful. We would need to track down a copy of GNIS as of 2009. It sounds like @Kai_Johnson might already be working on something like this?

Another possibility would be to compare OSM to The National Map Corps, which has been the source of quite a few of the improvements to GNIS over the years. I think TNMC tracks edit history, unlike the usual GNIS distributions.


Yeah I’m not sure. I don’t mean to stop this process, just eliminate an edge-case, if you will.

Looking at another example, way/289927171, it’s a building that used to be a church and is now office space. So the gnis:feature_id got removed when the name was changed.

Different example, node/358638088, they took the gnis:feature_id off but left the gnis:created date…

What about stripping the tags but adding something like a note or fixme saying “This object had GNIS tags but no feature ID, confirm if there is a record in GNIS that applies here”?
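A minimal sketch of that idea, assuming an element's tags are available as a plain Python dict. The `LEFTOVER_KEYS` set and the function name are hypothetical and only for illustration; the real edit would decide which keys to cover.

```python
# Hypothetical set of leftover import tags to strip; adjust to whatever the
# actual edit would cover.
LEFTOVER_KEYS = {"gnis:created", "gnis:county_id", "gnis:state_id"}

FIXME_TEXT = (
    "This object had GNIS tags but no feature ID, confirm if there is a "
    "record in GNIS that applies here."
)


def strip_leftover_gnis_tags(tags):
    """Return a new tag dict; the input dict is not modified.

    Only acts when leftover GNIS tags are present and gnis:feature_id is not.
    """
    leftovers = LEFTOVER_KEYS & tags.keys()
    if not leftovers or "gnis:feature_id" in tags:
        return dict(tags)
    cleaned = {k: v for k, v in tags.items() if k not in LEFTOVER_KEYS}
    # Append to an existing fixme instead of overwriting it.
    existing = cleaned.get("fixme")
    cleaned["fixme"] = f"{existing}; {FIXME_TEXT}" if existing else FIXME_TEXT
    return cleaned
```

Elements that still carry `gnis:feature_id` pass through unchanged, so the fixme only lands on the half-tagged objects described above.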

I’m just spouting ideas. Thank you for working on this.


In the infinite timeline of synchronizing GNIS/TNMC and OSM, even things we strip tags off of will eventually be found and re-tagged. I don’t worry too much about various edge cases being mangled, and half-formed objects are even harder to audit in my experience.

The software Kai runs (we jokingly call it RecoGNISer) can already spit out an extremely large set of work items. It can produce them at a much higher rate than we have editors to review them. The trick is finding the most impactful (or just interesting!) things to get our eyes on.


This was exactly the premise that got me started on the project that @watmildon mentioned.

I’ve looked at comparing OSM versions to GNIS versions and that turns out to not be very productive because OSM data changes for all sorts of reasons. It’s more productive to compare current GNIS records to current OSM data. That’s what my tool does.
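To make the "current to current" comparison concrete, here is a toy sketch. It is not the tool's actual logic, which presumably also considers location and feature type. It assumes GNIS records are a dict of feature ID to name and OSM elements are plain tag dicts; `compare_current` is a hypothetical name.

```python
def compare_current(gnis_records, osm_elements):
    """Compare current GNIS records against current OSM tags.

    gnis_records: dict mapping feature ID (str) to the GNIS name.
    osm_elements: list of tag dicts, one per OSM element.
    Returns (missing, name_differs): GNIS IDs with no OSM element, and GNIS
    IDs whose matching OSM elements all carry a different name.
    """
    # Index OSM elements by their gnis:feature_id tag.
    osm_by_id = {}
    for tags in osm_elements:
        fid = tags.get("gnis:feature_id")
        if fid:
            osm_by_id.setdefault(fid, []).append(tags)

    missing, name_differs = [], []
    for fid, gnis_name in gnis_records.items():
        matches = osm_by_id.get(fid)
        if not matches:
            missing.append(fid)
        elif all(t.get("name") != gnis_name for t in matches):
            name_differs.append(fid)
    return missing, name_differs
```

Neither list is an error report on its own; each entry is just a candidate for human review.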

Here’s an example of the output as collaborative MapRoulette tasks for Mono County, California. The tool can also output OSM Change XML files for bulk edits, or a table of raw matching results for more manual verification.

As @watmildon noted, there’s easily more output to review than people to review it. The challenge is finding ways to get people engaged in working on these updates.


I feel I should add that there are some objects that have only a gnis:ftype tag and no other tags to describe what they are.

I found some random, seemingly discardable ways in the area I’m editing and found in the history that they are underground pipelines.

Way History: 48255676 | OpenStreetMap

Way History: 48255677 | OpenStreetMap

Way History: 48255672 | OpenStreetMap
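A small sketch of how such objects could be flagged for a history check instead of being treated as junk, assuming tags are a plain dict. The function name and the sample tag values in the usage note are made up for illustration.

```python
def only_gnis_tags(tags):
    """True if the element has at least one tag and every key is gnis:*."""
    return bool(tags) and all(key.startswith("gnis:") for key in tags)
```

For example, `only_gnis_tags({"gnis:ftype": "Pipeline"})` is true, while an element that also has a descriptive key such as `man_made` is not flagged.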


These came from NHD rather than directly from GNIS. There is an ongoing effort to align the key for a GNIS feature ID across the various imports, but nothing about the feature code/type key:

NHD was not one import but many, imported watershed by watershed. Some of them had very poor tagging. I guess this one slipped under the radar.


I found Key:gnis:ftype - OpenStreetMap Wiki and Key:gnis:fcode - OpenStreetMap Wiki

These are typoed keys, actually imported from NHD.

I will not remove them unless people want otherwise.

(I am still looking through GNIS tags in my spare time)


Another group of tags: I plan to skip them, though looking at them may make sense. It seems the “two GNIS IDs apply here” problem has been solved in many different ways; I’m not sure whether there is an established solution.

        "alt_name:gnis:feature_id", # https://overpass-turbo.eu/s/1JzS
        "alt_gnis:feature_id", # https://www.openstreetmap.org/relation/7132203 https://www.openstreetmap.org/relation/274921
        "gnis:id_2", "gnis:id_1", # why does https://www.openstreetmap.org/node/150952282 have both? I will create a note if no one investigates this
        "gnis:feature_id_1",
        "gnis:feature_id_2",
        "gnis:feature_id_alt",
        "gnis:feature_id",
        "gnis:feature_id2",

The correct way to do it is to use semicolon separated values for the gnis:feature_id key.
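As a rough illustration of that convention, the sketch below folds the variant keys listed earlier in the thread into a single semicolon-separated `gnis:feature_id` value. It assumes tags are a plain dict; `VARIANT_KEYS` and `merge_feature_ids` are hypothetical names. A mechanical merge like this only fixes the key, so someone would still need to confirm that every ID really belongs on the element.

```python
# Variant keys taken from the list posted earlier in this thread.
VARIANT_KEYS = [
    "gnis:feature_id_1", "gnis:feature_id_2", "gnis:feature_id_alt",
    "gnis:feature_id2", "gnis:id_1", "gnis:id_2",
    "alt_gnis:feature_id", "alt_name:gnis:feature_id",
]


def merge_feature_ids(tags):
    """Return a new tag dict with all IDs under gnis:feature_id."""
    ids = []
    for key in ["gnis:feature_id"] + VARIANT_KEYS:
        # A value may itself already be a semicolon-separated list.
        for part in tags.get(key, "").split(";"):
            part = part.strip()
            if part and part not in ids:
                ids.append(part)
    merged = {k: v for k, v in tags.items() if k not in VARIANT_KEYS}
    if ids:
        merged["gnis:feature_id"] = ";".join(ids)
    return merged
```

Duplicate IDs are dropped and the original order is kept, so the value already under `gnis:feature_id` stays first.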

It is extremely rare that a single element in OSM should have two valid GNIS Feature IDs, but there are some common cases where mappers cause this:

  • Two features that are distinct have been incorrectly conflated. This happens frequently when people incorrectly combine the Populated Place, Civil (boundary), and Census (boundary) records from GNIS into a single element in OSM.

  • Radio and TV transmitters are often conflated with the tower structure on which they are located, and this results in several GNIS Feature IDs being assigned to a single element in OSM. In general, we need a better way to map these features that fits with the One Feature, One Element guideline for OSM.

  • Sometimes GNIS incorrectly has duplicate records for the same feature. This is better handled by reporting the duplication to USGS so that one record can be deleted rather than putting both GNIS Feature IDs on a single element in OSM.

I’ve sometimes seen multiple feature IDs for a long waterway, one per county. I figured that was by design, but should I be reporting situations like that?

That seems odd. I don’t think I’ve come across anything quite like that. I would definitely report it, but I would check first to see if some of the IDs have already been retired.


It looks like many of these cases are instances where the “official” name for the feature in GNIS differs from the name that the mapper decided was the proper name for OSM. And then the mapper put the GNIS name under the alt_name key and prefixed the gnis tags.

There are many legitimate cases where real world features can have more than one name, and it’s easy enough for GNIS to disagree with other sources about which name is the primary name (although GNIS does record alternate names for features). If the name in GNIS is not the name on local signage or not the name in most common use, I would put the GNIS name under the alt_name or official_name key.

But the gnis:feature_id tag doesn’t need a prefix in that case. The ID is associated with the feature, not with the name. It’s still the right feature with the right ID, even if there’s a disagreement about which name is best.

Some examples:

This Wikidata constraint violation report shows the magnitude of the problem, though some of it is the result of overconflation on Wikidata’s part.

The two river examples are definitely duplicate GNIS records. In both cases, one record covers a short distance of the length covered by another record, so the spatial extents clearly overlap. I would report those as duplicates.

In fact, I have a working list of corrections to report to USGS, and I can take care of those two if you like.

The Wikidata constraint violations are clearly problems in Wikidata, but they’re not necessarily issues with GNIS records. For example, the first item on the list, Hagåtña - Wikidata is an entry for a village, but the two associated GNIS Feature IDs are for distinct nearby beaches (one “West” and one “East”).

I think the correct GNIS Feature ID for that Wikidata entry would likely be 1389443 for the village boundary and not 1797402 for the municipal boundary, but I don’t know enough about the administrative hierarchy in Guam to sort that out.

[Edit]

The second item in Wikidata’s constraint violation report is an example of conflating Populated Place and Civil records in GNIS. That’s Maplewood - Wikidata. We should have a place node and boundary relation in OSM to keep those two features separate. I don’t know how Wikidata properly handles that case, though.

[2nd Edit]

Dedham - Wikidata conflates all three: Populated Place, Civil, and Census (i.e. CDP).


Sure, I’d appreciate it.

Ideally, Wikidata would also have two separate items, one for the human settlement and the other for the township. But since that’s a lot of work for what most other sources conflate, this error is commonly resolved by adding a subject has role (P2868) qualifier to both statements, as seen in the item for the neighboring township of West Orange (Q932601).


I did notice that in the Wikidata entry for Dedham. But then it still shows up in the constraint violation report.

The qualifiers are well-established as an exception to the constraint, as documented on the property’s talk page. I’m unsure why the report ignores the constraint’s separator (P4155).

This QLever query finds 557 violations of this constraint, taking the separator into account. The set includes a high concentration of towns in Maine and New Hampshire.

Maybe if there are multiple GNIS Feature ID (P590) values then all the values have to have the subject has role (P2868) qualifier?

I noticed that one of the Feature IDs for Dedham did not have the qualifier.
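A tiny sketch of that guessed rule, purely to make it testable. It is not confirmed to be how the constraint report behaves. It assumes a simplified data model in which each GNIS Feature ID (P590) statement is an `(id, qualifiers)` pair; the function name is hypothetical.

```python
def ids_missing_role(statements):
    """Return P590 values lacking a P2868 qualifier on a multi-value item.

    statements: list of (gnis_id, qualifiers) pairs for one Wikidata item,
    where qualifiers is a dict keyed by property ID.
    """
    if len(statements) < 2:
        return []  # a single value needs no qualifier under this rule
    return [gid for gid, quals in statements if "P2868" not in quals]
```

Under this rule, an item with several IDs passes only when the returned list is empty.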


+1, makes sense (though as it is not a removal, it will not be handled as part of this bot edit; if this tagging irritates anyone, I would encourage them to fix it)

oh definitely!

I thought I might just fix the tags, but based on the first element I looked at, there may be some more complex issues.

Relation: Glade Creek Reservoir (3881986) | OpenStreetMap was tagged with alt_name=Beckley Water Supply Number One Lake and alt_name:gnis:feature_id=1559275 but that feature is actually 2.5 miles to the west (and apparently no longer present). The correct ID for Glade Creek Reservoir is 1539425.

I’ll take a look at the other features with the alt_name:gnis:feature_id tag later.

The discussion here prompted me to send another set of updates to USGS. I’ve included Middle Fork Salt Creek and Middle Branch Shade River in that list. It usually takes a month or so for USGS to reply, but when they do I’ll go back and update the features with the correct information.


I fixed all the features with alt_name:gnis:feature_id. I think a lot of what happened was that old imported reservoir and dam nodes were merged with the wrong features.
