Proposal to import missing summits in Alaska

If it’s not clear by now, I’ve parked the plans for this import.

Different people have different ideas of what the acceptable level of accuracy of data in OSM should be, and that’s fine. But in the context of this import, it seems that only a manual review of each and every feature to be added to OSM would be acceptable.

@ZeLonewolf or @tekim, if you have specific criteria that you would use to judge whether data to be imported is acceptable, I would be interested in your opinions. But as far as I can tell from this thread, only manually reviewed data would be acceptable.

Oh, I hand-reviewed the data set; I would have just hit the upload button if it were just me. Maybe I missed a few, but I felt pretty good that it was close enough.

Lol! That’s easy to say after you’ve hand-reviewed the entire data set and made corrections!

But would you have uploaded it before that?

No, I would want to sanity check things to make sure I have confidence that the data is sane. I think 100% is an unreasonable threshold on a large data set, but if it’s small enough to be manually reviewed, I don’t see why you wouldn’t do that (and since I bragged about how easy it was, it pretty much meant I had to put my money where my mouth is). If it’s huge, I’d want to sample enough points at random to gain confidence that the data set makes sense.
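
As a rough illustration (the file name and sample size below are made up, not anything from this import), drawing such a sample could be as simple as:

```python
import json
import random

SAMPLE_SIZE = 100  # placeholder; pick enough points to be convincing

# Load the candidate import data (hypothetical file name).
with open("alaska_summits_candidate.geojson") as f:
    features = json.load(f)["features"]

# Draw a reproducible random sample for manual review.
random.seed(42)
sample = random.sample(features, min(SAMPLE_SIZE, len(features)))

for feat in sample:
    name = feat["properties"].get("name", "(unnamed)")
    lon, lat = feat["geometry"]["coordinates"]
    print(f"{name}: {lat:.5f}, {lon:.5f}")
```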

So, this was the absolute smallest data set I could have considered for an import. But there are similar data sets that could be 10x or 100x the size.

If you were reviewing a larger data set, how much would you sample and what would your criteria be for determining that the data set was suitable?

I don’t know what happened, but it seems in many cases the original dataset was better than the reviewed dataset.

The National Map Accuracy Standard for 1:24k maps is 12.2 meters with 95 percent confidence (I said 90 before, but was mistaken). Let’s double that and round up to 30 meters (because it is Alaska). USGS has a document on how to calculate it; I will try to find it and post.

Different people have different standards and different techniques for mapping. And that’s OK as long as we can discuss it and agree on things without getting into fights about it.

That’s a pretty high bar. Maybe USGS has that as a goal for new measurements, but I can say for sure that their existing data isn’t that good. If it were, I wouldn’t have a running spreadsheet of corrections for them. Haha!

I agree. OSM has always benefitted from iterative improvement. Mapping with rough accuracy as a first pass has always been accepted. As time goes on, the accuracy gets improved. For some of these peaks maybe that won’t happen for a long time. If that’s the case it will be because no mapper cared enough about the exact location of peaks in a very remote place, and that is ok.

  1. Where do you suggest the bar be placed then? 45 meters?
  2. Can we at least agree on the methodology? Here is the USGS document: https://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/part3/chapter3 See appendix 3-B for example calculations; a rough sketch of the calculation follows this list. We do need to determine the sample size. The example uses 25 for a topo quad; I have seen a case where ASPRS used 300.
  3. I don’t think 30 meters is too high a bar. For example, Maxar claims “5 m CE90” accuracy for their satellite imagery, and the USGS has stated in a conversation at a conference that the horizontal accuracy of 3DEP is “sub-meter”.
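
To make the methodology concrete, here is a minimal sketch of the NSSDA horizontal-accuracy statistic from that FGDC document, assuming you already have per-point easting/northing offsets in meters between the import data and a reference such as 3DEP (the offsets below are made-up numbers):

```python
import math

# Hypothetical per-point offsets (meters) between imported peak positions
# and reference positions (e.g. 3DEP), as (dx, dy) pairs.
offsets = [(4.1, -2.3), (11.8, 6.0), (-7.5, 3.2), (0.9, -1.1), (22.4, -15.7)]

n = len(offsets)
rmse_x = math.sqrt(sum(dx**2 for dx, dy in offsets) / n)
rmse_y = math.sqrt(sum(dy**2 for dx, dy in offsets) / n)
rmse_r = math.sqrt(rmse_x**2 + rmse_y**2)

# NSSDA horizontal accuracy at the 95% confidence level
# (FGDC-STD-007.3, assuming rmse_x and rmse_y are roughly equal).
accuracy_r = 1.7308 * rmse_r

print(f"RMSE_r = {rmse_r:.1f} m, NSSDA horizontal accuracy (95%) = {accuracy_r:.1f} m")
```

Whatever threshold we settle on (30 meters, 45 meters, or something else), comparing that single number against it gives the go/no-go answer.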

In my opinion, bad data is worse than no data. Having no data may motivate another mapper to devote the time and effort to add more accurate data, while having the data in OSM may suggest “it’s done, don’t worry about it”, and may mislead a user. In any event, my view is we should have a high bar for imports. Keep in mind that without review, this import would have introduced peaks that are not even located on land.

We should be able to agree on what is closer to likely truth, and what isn’t. 3DEP is going to be a better source for the position of peaks than the old topo maps in most cases, which may be misregistered, as I think you have pointed out. Note that 3DEP contours are now available in JOSM. I did start working on a plugin to do a contrast stretch of the actual 3DEP data (not the hillshade), but haven’t made a lot of progress. Without the contrast stretch the screen just appears grey.
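
The stretch itself is the easy part; here is a minimal sketch of a percentile-based linear stretch with numpy (the percentile bounds and nodata value are assumptions, and this is the general technique, not the JOSM plugin code):

```python
import numpy as np

def contrast_stretch(dem: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0,
                     nodata: float = -9999.0) -> np.ndarray:
    """Linearly stretch elevation values to 0-255 for display."""
    valid = dem[dem != nodata]
    lo, hi = np.percentile(valid, [low_pct, high_pct])
    stretched = np.clip((dem - lo) / (hi - lo), 0.0, 1.0) * 255.0
    return stretched.astype(np.uint8)

# Example: a synthetic 3x3 elevation tile (meters).
tile = np.array([[120.0, 150.0, 180.0],
                 [200.0, 260.0, 310.0],
                 [330.0, 400.0, 455.0]])
print(contrast_stretch(tile))
```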

I agree in some cases but not in others. Incorrect tagging that can go unnoticed is worse than a lack of tagging. For example, lots of roads tagged with incorrect surface, speed limit, and access values would be worse than if those attributes were missing. On the other hand, if the roads were correctly tagged but the geometry was inaccurate (while still being topologically correct), then I consider that an improvement over no roads at all.

I don’t believe I or anyone else has suggested importing this data without review. In fact, a review has been done, although it apparently didn’t meet your standard.


Precision guesswork.

That is also my assessment. Since I originally complained that this data set was small enough for manual review, and got called out for that, I performed that manual review, admittedly quickly and going primarily off USGS charts and some guessing about what GNIS intended or which of multiple peaks counts as the true peak, and that sort of thing. But what I did is the level of review I’d do to satisfy myself.

I think that the easiest way forward is for @tekim to simply complete the review, make any adjustments he deems necessary, and then press the upload button.

This was my original assumption when I thought about importing the data without manually reviewing each of the elements. We are talking about remote peaks in Alaska, after all.

That’s stretching a bit. These things were caught and removed in the QA process which was always part of the import plan. I’d also like to point out that there was a subset of the data which did not pass the basic QA tests and would have been omitted if the import had proceeded.

This import is the best source we’re going to find, and no one is going to map all Alaskan summits based on local knowledge. At some point the data is just “good enough,” and therefore I agree with @ezekielf. Mapping in OSM is an iterative process, and it’s not at all like we only have one shot to get it right.


It may be the best source for names, but it is probably not the best source for locations. For locations of peaks, the best source we have is probably 3DEP.

But the question shouldn’t be whether it is “best”, but rather whether it is “best and acceptable.”

You are probably correct, but someone may wish to individually review the GNIS data against 3DEP and make the necessary adjustments.

My understanding was that the data was “ready to go”, with the exception of a small amount of sampling. If this wasn’t the case, I should have held off on my review of the data until it was ready.

That was the original proposal if I understand correctly - that is not to individually review the peaks.

I have suggested some standards, but other than people saying that the “bar is too high”, no one has come up with a concrete standard. I would hope that we could at least come up with a metric to use (I suggested one from the USFS), and then have a debate as to the threshold. I have suggested 30 meters, or perhaps even 45 meters, but even if it is something else, at least we would have a concrete go/no-go criterion. What threshold must be exceeded before the data would be considered not suitable for OSM import?

My review wasn’t rigorous in the sense that I didn’t keep exact stats (it was rigorous in determining horizontal position), but of the ~100 features I initially looked at, about 25 were in error by more than 30 meters, with many being hundreds of meters in error, and one being at least 14 km off (and in the ocean). Note that the question to ask is not “are 25 errors out of over a thousand features acceptable”, but rather “are 25 errors out of 100 acceptable”, as I only looked at 100.
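
To put that in perspective, here is a rough back-of-the-envelope sketch (my own addition, not part of the FGDC methodology, and it assumes the 100 features were an approximately random sample) using a Wilson score interval for the error rate implied by 25 errors out of 100:

```python
import math

def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

low, high = wilson_interval(errors=25, n=100)
print(f"Estimated error rate: 25% (95% CI roughly {low:.0%} to {high:.0%})")
```

In other words, even accounting for the fact that only 100 features were checked, the overall share of peaks off by more than 30 meters is unlikely to be a fluke of the sample.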

I think OSM should be able to do at least as well as the USGS’s claimed accuracy for 1:24K maps (even if they themselves don’t seem to meet it); after all, much of our mapping is probably at the 1:5K scale or larger (i.e. more detailed, more accurate). To give the benefit of the doubt, I have suggested 2x or 3x the threshold.

I can set up an MR challenge and can do a significant part of it. @Kai_Johnson can you send me what you consider your “best” file at this point?

Whoops, that should have been USGS, not USFS. Sorry.

The import template has instructions to post the source data set as well as the final import data. You were looking at the source data. Sorry if that wasn’t clear, but I did mention that I was still working on QA. I wasn’t going to post a final set of data unless there was some agreement to proceed.

Just for clarity, I was not proposing to manually review each individual peak for the import. I did propose to manually review a subset of the data to confirm that it was reasonably good and to remove anything that obviously shouldn’t be imported (like duplicates or peaks in the ocean).
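
For the kinds of obvious problems mentioned above, here is a rough sketch of what a duplicate check could look like (the 100 m threshold and the coordinates are placeholders, not what was actually used in the QA):

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_duplicates(peaks, threshold_m=100.0):
    """Return index pairs of peaks closer together than threshold_m (naive O(n^2) check)."""
    flagged = []
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            (lon1, lat1), (lon2, lat2) = peaks[i], peaks[j]
            if haversine_m(lon1, lat1, lon2, lat2) < threshold_m:
                flagged.append((i, j))
    return flagged

# Example: the second and third points are suspiciously close together.
peaks = [(-150.0, 62.0), (-151.2, 63.5), (-151.2005, 63.5003)]
print(flag_duplicates(peaks))
```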

The tool I used for conflation also builds MR challenges. I’ll set one up for you.

I presume this will be a “cooperative challenge”[0]? Can we batch ~10 to ~20 peaks per group (not sure of the term MR uses for this)? There is a bit of overhead with using MR, and grouping peaks together would spread that out. Can those peaks be grouped spatially?

Sorry for making that assumption. I should have paid closer attention.

It would be a cooperative challenge.

Batching peaks into spatially clustered groups would be really nice! I suppose that’s possible in theory, but sadly, the tool can’t do that and it would take some major architectural changes to get there. So, it’s all one-by-one for now.
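
For what it’s worth, the grouping itself isn’t conceptually hard; here is a rough sketch (the grid size and group size are arbitrary assumptions) that buckets peaks into grid cells and then splits each cell into groups:

```python
from collections import defaultdict

def batch_peaks(peaks, cell_deg=0.5, max_group=20):
    """Group (lon, lat) points by grid cell, then split cells into small batches."""
    cells = defaultdict(list)
    for lon, lat in peaks:
        cells[(int(lon // cell_deg), int(lat // cell_deg))].append((lon, lat))
    groups = []
    for members in cells.values():
        for i in range(0, len(members), max_group):
            groups.append(members[i:i + max_group])
    return groups

# Example with a handful of made-up coordinates in Alaska.
peaks = [(-149.9, 61.2), (-149.8, 61.3), (-151.0, 63.1), (-150.9, 63.0)]
print(batch_peaks(peaks))
```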

I am working on an idea…
