Proposal to import missing summits in Alaska

I just finished mapping a number of GNIS records for summits in the other 49 States and US Territories. With a couple of rare exceptions, these were all records for features that were not present in OSM. I was looking at doing the same for Alaska, but the total of 1700+ records is more than the rest of the entire US and more than I’m prepared to map by hand. Instead, I’m proposing to import these records without manual review.

The method I used for the other States was to use automated conflation to find GNIS records that were not in OSM. Then I reviewed each one by hand and made corrections using aerial imagery, USGS Topo, and 3DEP. That works fine when it’s 25 nodes at a time. The vast majority of records needed either a minor adjustment of the location or no correction at all. But there were rare cases where corrections were needed, and importing the data without reviewing it would mean these corrections wouldn’t be made. In general, the corrections were:

  • Tagging - GNIS classes are very broad and include things that are mapped in different ways in OSM. In the other States, I used key words in the names to identify some of the variations in tagging, then reviewed local topography to make corrections.
  • Location - GNIS coordinates are frequently off by a small distance relative to USGS Topo and 3DEP. Some rare outliers are off by a substantial distance. In the other States, I checked and corrected each location against aerial imagery, USGS Topo and 3DEP.
  • Name - GNIS has rare typos which can often be corrected by checking USGS Topo.
  • Duplicates - GNIS occasionally has duplicate records for the same feature. These can be identified by either manual review or automated searches.
  • Categorization - GNIS records are rarely miscategorized. When it does happen, it’s usually man-made features being miscategorized as natural features.

For Alaska, I am proposing to import the data without the manual reviews. That would mean using key words in the names to determine tags (e.g., “Mountains” with an S implies natural=mountain_range). And I plan to do some automated checks to remove duplicates. But I will not be checking each feature against local topography, and I will not be correcting typos or miscategorizations. So, there would be some inaccuracy and a small number of errors in the import. The tradeoff would be adding some 1700+ features that are currently missing from OSM.
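The keyword-based tagging described above could look roughly like the following. This is only a sketch, not the script used for the import: the rule table and the function name `tags_for_name` are hypothetical, and only the "Mountains" and "Hill" mappings are taken from this thread. The `natural=peak` fallback is an assumption.

```python
import re

# Hypothetical rule table. Only the "Mountains" -> natural=mountain_range and
# "Hill" -> natural=hill mappings come from the discussion; order matters,
# because the first matching rule wins.
RULES = [
    (re.compile(r"\bMountains\b"), {"natural": "mountain_range"}),
    (re.compile(r"\bHill\b"), {"natural": "hill"}),
]

# Assumed fallback for any GNIS Summit record that matches no keyword.
DEFAULT_TAGS = {"natural": "peak"}


def tags_for_name(name):
    """Return OSM tags chosen from keywords in a GNIS feature name."""
    for pattern, tags in RULES:
        if pattern.search(name):
            return dict(tags)
    return dict(DEFAULT_TAGS)


if __name__ == "__main__":
    for example in ("Sugar Hill", "Example Mountains", "Tooth Mountain"):
        print(example, tags_for_name(example))
```

Word-boundary matching keeps "Mountain" (singular) from triggering the "Mountains" rule. A name-only rule cannot tell whether the local topography agrees with the name, which is the part the manual review used to cover.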

I have written up the import proposal at B1tw153/Missing Alaskan Peaks Import Plan - OpenStreetMap Wiki and I’d be interested in any comments people may have on the proposal, or opinions for or against the import.

Did you also correct the ele=* tag? These were automatically generated from the National Elevation Dataset (NED), and are often very wrong for summits. This is partly due to the horizontal error you mentioned, as well as the coarseness of the NED. 3DEP has a much higher resolution than the old NED.

Unless it is obviously a typo, I would use the GNIS. The U.S. Board on Geographic Names is renaming many features that are offensive (but not necessarily obviously so to an outsider). For example, “Mt Evans” in Colorado was renamed “Mt Blue Sky” just recently.

My experience has been that there are a lot of duplicates, particularly when a mountain is depicted on two different adjacent topo maps. This also seems to happen when a mountain is on the border between two states.

This has not been my experience.

The good news is the linked plan has no mention of adding elevation data from GNIS (B1tw153/Missing Alaskan Peaks Import Plan - OpenStreetMap Wiki).

You are correct. So is the proposal to add summits without ele=*?

The elevation data is no longer present in the GNIS data files, so it won’t be included in this import. But I didn’t add it when editing by hand either. Sometimes you can get it from a benchmark in USGS Topo, sometimes not.

When I was working through the other 49 States, I found a total of five duplicate records. For Alaska, I intend to review records with duplicate names before importing them. That should identify most of the possible duplicates so that I can resolve them before the import.
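A sketch of how records with duplicate names might be pulled out for review. The record layout (dicts with `feature_id` and `name` keys) and the function name `group_by_name` are assumptions made for illustration, not the format of the actual working files.

```python
from collections import defaultdict


def group_by_name(records):
    """Group records by normalized name and keep only names that repeat.

    `records` is assumed to be an iterable of dicts with "feature_id" and
    "name" keys (a hypothetical layout).
    """
    groups = defaultdict(list)
    for rec in records:
        # Collapse whitespace and ignore case so trivial variants still match.
        key = " ".join(rec["name"].split()).casefold()
        groups[key].append(rec)
    return {name: recs for name, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    sample = [
        {"feature_id": 1, "name": "Example Peak"},
        {"feature_id": 2, "name": "example  peak"},
        {"feature_id": 3, "name": "Other Hill"},
    ]
    for name, recs in group_by_name(sample).items():
        print(name, [r["feature_id"] for r in recs])
```

A shared name only flags a candidate. Two distinct summits can legitimately carry the same name, so each group would still need a look (for example, at the distance between the records) before anything is dropped.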

In the other 49 States, I found a total of three records that were miscategorized: one building, one grotto, and one cape.

I did not find any typos while working on the other 49 States. However, I have seen it happen with GNIS records in other cases.

Given that there were approximately 1000 Summits added in the other 49 States, I think this level of errors is relatively low.

Perhaps GNIS has been improved since I last looked at it, or the GNIS features missing from OSM are of superior quality for some reason. Do we know that the quality of GNIS in the rest of the nation is representative of Alaska?

I did take a look at your .osc file, and even though you indicated that features whose name included “Hill” would be tagged as natural=hill, there are 355 features whose name contains “Hill” that have been tagged natural=peak. There is also one feature whose name contains “Mountains” (plural) tagged as natural=peak, even though the plan calls for such features to be tagged as natural=mountain_range.

Regarding your .geojson file, I assume that this should be the same as the .osc file, except in the WGS84 datum. Is that correct? Is this the actual file that will be imported into OSM? The only valid OSM tag I am seeing in this file is name=*. All of the other tags have been concatenated into the other_tags=* tag.
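If the .geojson was exported through GDAL/ogr2ogr (an assumption; the thread does not say how it was produced), the `other_tags` value would be in the hstore-like `"key"=>"value"` form that GDAL's OSM driver writes, and it can be unpacked again. A minimal sketch, with the helper name `parse_other_tags` being hypothetical:

```python
import re

# Matches one "key"=>"value" pair, allowing backslash-escaped characters
# inside the quotes. Assumes the hstore-like layout described above.
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def _unescape(text):
    """Drop the backslash from escaped characters such as \\" and \\\\."""
    return re.sub(r"\\(.)", r"\1", text)


def parse_other_tags(other_tags):
    """Split a concatenated other_tags string into a dict of OSM tags."""
    if not other_tags:
        return {}
    return {
        _unescape(m.group(1)): _unescape(m.group(2))
        for m in PAIR.finditer(other_tags)
    }


if __name__ == "__main__":
    raw = '"gnis:feature_id"=>"1406691","natural"=>"peak"'
    print(parse_other_tags(raw))
```

This only recovers the tags for inspection. It does not address whether the file is the one intended for upload.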

Regarding ele=*, if you wish to add this, I can write a JOSM script that will query 3DEP and populate it. I am not sure of the status of 3DEP in Alaska however.

So, I live in Rhode Island, which has no mountains, unless you count the state’s landfill, which is often incorrectly rumoured to be the state’s highest point.

We have many, many GNIS-imported natural=peak objects and I would say the vast majority of them are for minor hills that are so vanishingly insignificant that locals generally don’t know that they have a name. I treat these objects generally on an “okay, if the federal government says so, sure, that minor bump is ‘Mike’s Hill’ or whatever” basis. And I’m sure they come from very old maps or historical records but a lot of that data seems so faded by time and history that I’m not sure what to make of it.

I have taken a look at some specific features in the provided .osc file.

613 meters from likely true location based on 3DEP contours
gnis:feature_id=1406691
name=Mugum Peak

In water, and 278 meters from likely true location based on 3DEP contours
gnis:feature_id=1896821
name=Sugar Hill

In Canada, and 140 meters from possible true location based on 3DEP contours
gnis:feature_id=1414831
name=Snow Top

65 meters from likely true location based on 3DEP contours
gnis:feature_id=1410949
name=Tik Hill

Misclassified, not really a peak
gnis:feature_id=1413438
name=Little Kobuk Sand Dunes

Feature really represents two peaks, and is mislocated by 142 meters from closest of the two.
gnis:feature_id=1404480
name=Kashwitna Knobs

Feature misclassified, refers to name of a bend in the river. On the off chance that it does refer to a peak, it is 355 meters from the nearest high point.
gnis:feature_id=1398923
name=Big Bend

In the water, 258 meters to nearest bit of land
gnis:feature_id=1895983
name=Nugget Hill

Yes. The .osc file is solely the result of the automated conflation step. I haven’t updated the tags in that file, so everything is natural=peak.

That was supposed to be the idea, but apparently I uploaded the wrong file. I have corrected that and uploaded the right file with the correct tags.

That’s pretty cool! I’d love to know how that works.

Adding elevation to the summits where that is missing would be good. I know the USGS benchmark elevations are taken using the best available sources so they’re generally pretty good. But it takes some manual effort to find them and convert them to meters. And I don’t know enough about the quality of 3DEP data to be able to say how it would compare.

But that’s definitely something we could look into, either as part of the import or as follow up afterwards.

There are certainly some errors in the data set, and if you try hard enough, you can find them. From what I’ve seen, the vast majority of the positions are within a few meters of where they should be, so I don’t think the selected examples you’ve cited are representative of the quality of the data set.

That said, I am not proposing to go through all the peaks one by one to reposition anything that’s not in the right spot.

This would be corrected when the names are searched for possible alternate tagging.

This would also be corrected when the names are searched for possible alternate tagging.

I am really not trying hard. I just loaded the file in JOSM, selected all, added to the todo list, and started stepping through. I am seeing about a 25% error rate, not counting those things that are off by 30 meters or less. I suspect that what I have reported is representative of the quality of this data.

Here is another one:
In water and 16 km from nearest land:
gnis:feature_id=1419652
name=Southwest Peak

I am not sure I understand, are you saying that this really is a peak, or that there is information that is part of GNIS that says that it really is a bend in the river?

I didn’t see an error rate that high in the nodes I looked at, but if you can come up with a way of measuring the errors that is statistically valid, that might be interesting.
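One common way to put bounds on an error rate is to review a random sample and compute a confidence interval for the proportion of bad nodes. The sketch below uses the Wilson score interval. The choice of that interval, the function name `wilson_interval`, and the counts in the usage example are all assumptions for illustration, not measurements from this data set.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion.

    errors: number of sampled nodes judged to be wrong
    n:      number of nodes in the random sample
    z:      normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need 0 <= errors <= n and n > 0")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical counts: 5 problem nodes found in a random sample of 25.
    low, high = wilson_interval(5, 25)
    print(f"error rate plausibly between {low:.1%} and {high:.1%}")
```

The interval is only meaningful if the sample is drawn at random from the whole set and "error" is defined before looking (for example, an offset above some agreed distance). With small samples the interval stays wide, which is itself useful to know when comparing two people's impressions.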

Neither, actually. I’ve noted this as a record where GNIS might be incorrect, put it on a list of things to ask them about, and removed it from my working data set so that it won’t be included in the import.

With the right setup, I think this is not that hard. Turn on a USGS topo layer in JOSM, load the peak nodes into the TODO plugin, and then you can do this very rapidly. If the node is in the right spot, using the keyboard shortcuts you could easily validate 1-2 per second as you go “next, next, next” and then every so often you’ll have to stop to move a node or drop it from the data set.

This script determines the elevation for an individual selected node:

We could modify it to do so for all nodes where natural=peak/hill and there is no ele=* tag (and actually add an ele=* tag).
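The batch version could be structured along these lines. This is not the JOSM script mentioned above, just a Python sketch of the selection logic: the node layout (dicts with `lat`, `lon`, `tags`) is assumed, and `lookup_elevation` is a hypothetical stand-in for whatever actually queries 3DEP.

```python
def add_missing_elevations(nodes, lookup_elevation):
    """Add ele=* to natural=peak/hill nodes that lack it.

    nodes:            list of dicts with "lat", "lon" and "tags" keys
                      (an assumed layout, not a JOSM or OSM API structure)
    lookup_elevation: hypothetical callable (lat, lon) -> meters or None,
                      standing in for a 3DEP query
    Returns the number of nodes that were updated.
    """
    updated = 0
    for node in nodes:
        tags = node["tags"]
        if tags.get("natural") not in ("peak", "hill"):
            continue  # only summits are in scope
        if "ele" in tags:
            continue  # never overwrite an existing elevation
        elevation = lookup_elevation(node["lat"], node["lon"])
        if elevation is None:
            continue  # no coverage at this location, leave untagged
        tags["ele"] = f"{elevation:.0f}"  # whole meters
        updated += 1
    return updated


if __name__ == "__main__":
    demo = [{"lat": 61.0, "lon": -150.0, "tags": {"natural": "peak"}}]
    print(add_missing_elevations(demo, lambda lat, lon: 1234.4), demo)
```

Passing the lookup in as a function keeps the tagging logic testable without network access, and leaves open which elevation source is used. Because the elevation is sampled at the node's position, a mislocated node would get the elevation of the wrong spot, so this step makes most sense after positions have been checked.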

I agree! The one tweak I would make would be to use the contours from 3DEP in addition. But yes, this whole effort should be just a few hours. Alternatively, make it a MapRoulette challenge and we all can pitch in.

Be my guest! I’ve just done that for the other 49 States and I’m not offering to do it for Alaska.

3DEP hillshades show this location as being on land. Granted, it is close to the shore, but I wouldn’t rule out a small coastal hill with this name.

It would take me about 15 minutes to make this into a MapRoulette challenge, but my experience is that if I do that, I end up being the only one who works on it. If there’s really some interest in that, I’d be happy to go that direction instead of doing an import. But we need to get a few people to chip in.

For what it’s worth, while we’ve been chatting, I’ve been going through the data for the first pass of QA. I have removed 16 nodes that were duplicates, including Southwest Peak (1419652) that @tekim mentioned above. I have also looked over the coastline and removed 9 nodes that were in the water, including Nugget Hill (1895983) mentioned above, plus one that was not obviously at a place where there could be a summit.

For a set of 1700 records, that still looks like a fairly low error rate to me. I can’t guarantee that I caught everything, but the data set is better now that we don’t have those 26 nodes.

I understand there’s some concern about whether the nodes are reasonably close to where they should be. If there’s still some interest in proceeding with the import (as opposed to someone else volunteering to go through the entire data set by hand), I’ll go ahead and manually review a randomly selected set of 25 nodes and measure the offset to their ideal positions as @tekim has done above. I’ll collect data on that so we can get some statistics on the mean and standard deviation of the offsets.
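For reference, the sampling and the offset statistics could be computed as sketched below. The function names, the seed, and the input layout are assumptions for illustration. Distances use the haversine formula on a spherical Earth, which is accurate enough for offsets of meters to kilometers.

```python
import math
import random
import statistics

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def pick_sample(feature_ids, k=25, seed=None):
    """Draw k feature IDs at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(feature_ids), k)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def offset_summary(pairs):
    """Mean and sample standard deviation of offsets, in meters.

    pairs: list of ((gnis_lat, gnis_lon), (reviewed_lat, reviewed_lon));
           needs at least two pairs for the standard deviation.
    """
    offsets = [haversine_m(*gnis, *reviewed) for gnis, reviewed in pairs]
    return statistics.mean(offsets), statistics.stdev(offsets)


if __name__ == "__main__":
    ids = range(1000)  # placeholder IDs, not real GNIS feature IDs
    print(pick_sample(ids, k=5, seed=42))
```

Fixing the seed makes the selection reproducible, so others can confirm the sample was not hand-picked. Since a few large outliers can dominate the mean, reporting the median alongside it may also be worthwhile.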

Here’s the list of nodes randomly selected for QA, if you’d like to take a look at them yourselves.

1411120 Tooth Mountain
1406326 Midnight Mountain
1407507 Old Snowy
1411766 Mount Warbelow
1402475 Gap Mountain
1420911 Mount Case
1895616 Mattress Hill
1854182 The Apocalypse
1402046 Figure Four Mountain
1896915 Tebay Mountain
1409341 Sharp Mountain
1399064 Birch Hill
1403744 Iknutak Mountain
1398458 Atuk Mountain
1414592 Serrated Peak
2581565 Qaaga
1407747 Panorama Mountain
1401818 Mount Elusive
1405502 Little Mountain
1398807 Beaver Mountain
1405988 Marshall Mountain
1400650 Cony Mountain
1412814 Cloudy Mountain
1895722 Mission Hill
1894308 Faith Hill

Can you share an .osm file of the remaining data set?

Sure! This file has all the previous changes and the results of reviewing the 25 nodes above where 5 nodes were repositioned.

I kept the gnis:reviewed=no tags on 3 nodes where there wasn’t enough information to confirm that they were correct. Because of that, I’m thinking about omitting either the natural=hill nodes west of 180° or all nodes west of 180°. The data over there doesn’t look as good and JOSM doesn’t have USGS Topo tiles in that area to verify against.

Edit: LMK if that Google Drive link isn’t working. It looks fine from here but seems to redirect to a login when I post it.