Proposal to connect NHD waterways in Oklahoma with a mechanical edit, requesting feedback

I have something I’ve come across that I’d like to ask about fixing: issues with NHD-imported waterways in Oklahoma from 2015.

The issue is: there are waterways throughout Oklahoma where the ways don’t actually connect and just end in nodes at the same (or nearly the same) coordinates, usually, but not always, at the point of entering or exiting a body of water. All of the impacted ways are tagged NHD:FType=ArtificialPath or NHD:FType=Connector, but have no waterway tag themselves. One example: way/138067071. This way, and the two stream ways north and south of it, do not connect at all.

My proposed solution is to: merge nodes where doing so would cause the ends of existing waterways and “ArtificialPaths” to become connected. If there is also a node that would connect the waterways to the edge of a body of water, then that node is included in the merge also. If the ArtificialPath is already connected to, say, a waterway=stream, but not connected to a body of water that has a node right there (meaning someone probably already connected the waterways and left the natural=water unconnected on purpose), then no action is taken.

Then, set waterway=stream for all ArtificialPaths and Connectors when they have no existing waterway or natural tags. These are mostly, but not all, inside of existing waterbodies. They’re all connected to (or will become connected to) each other or existing streams. There are instances connecting to rivers but none of them are part of the river.
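The tagging rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the plain-dict representation of tags are hypothetical, and only the tag keys and values come from the post.

```python
# Hypothetical helper: the tag keys/values are from the post, the names are not.
NHD_TYPES = {"ArtificialPath", "Connector"}


def tags_after_retag(tags: dict) -> dict:
    """Return a copy of a way's tags, adding waterway=stream when the rule applies."""
    is_nhd_path = tags.get("NHD:FType") in NHD_TYPES
    already_classified = "waterway" in tags or "natural" in tags
    if is_nhd_path and not already_classified:
        return {**tags, "waterway": "stream"}
    # Anything else is returned unchanged (as a copy).
    return dict(tags)
```

A way that already carries a waterway or natural tag is left as it is, which matches the "no existing waterway or natural tags" condition.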

Here is the Overpass query I used to find these. The bounding box is around Oklahoma. WARNING: this query returns 233MB of data.

[out:json][timeout:1800][bbox:33.616646,-103.004057,37.0022,-94.432823];
(way["NHD:FType"="ArtificialPath"];way["NHD:FType"="Connector"];);
node(w)->.set;
node(around.set:0.05);
(way(bn);rel(bw););
(._;>;);
out body;

Then, I wrote a program to parse the results. For each ArtificialPath, it looks only at the first and last node. For each of those, it takes the nodes within 0.05 meters and their parent ways and relations, and decides whether they should be merged. It merges the nodes if doing so would connect the ArtificialPath to one of: another ArtificialPath, a Connector, or one of these waterways: stream/river/canal/ditch/drain. Otherwise no node merging happens.
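For readers who want to see the decision spelled out, here is a minimal Python sketch of the merge rule as described in this post. It is not the actual program: the Node class, plan_merge, and the way data is held in memory are all assumptions made for illustration, and it only looks at tags on parent ways (water bodies mapped as relations are ignored here).

```python
from dataclasses import dataclass

LINEAR_WATERWAYS = {"stream", "river", "canal", "ditch", "drain"}
NHD_TYPES = {"ArtificialPath", "Connector"}


@dataclass
class Node:
    """Hypothetical in-memory node: its id plus {way_id: tags} for each parent way."""
    id: int
    ways: dict


def is_connectable(tags: dict) -> bool:
    """True for ways that count as a waterway connection under the rule."""
    return tags.get("NHD:FType") in NHD_TYPES or tags.get("waterway") in LINEAR_WATERWAYS


def links_to_waterway(node: Node, path_id: int) -> bool:
    """True if the node belongs to a connectable way other than the path itself."""
    return any(is_connectable(t) for wid, t in node.ways.items() if wid != path_id)


def plan_merge(end_node: Node, path_id: int, nearby: list) -> list:
    """Return ids of nearby nodes to merge into end_node, or [] for no action."""
    # Already joined to a waterway at this end: assume it was left that way on purpose.
    if links_to_waterway(end_node, path_id):
        return []
    linking = {n.id for n in nearby if links_to_waterway(n, path_id)}
    if not linking:
        return []  # merging would not create a waterway connection
    # A waterway connection will exist, so also include water-body edge nodes.
    edges = {
        n.id
        for n in nearby
        if any(t.get("natural") == "water" for t in n.ways.values())
    }
    return sorted(linking | edges)
```

In this sketch, `nearby` is assumed to already hold only the nodes within 0.05 meters of the end node, for example as returned by the `around` filter in the query.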

Example

Here’s an example using a smaller area. It returns about 20MB of data.

[overpass turbo]

And a zipped file containing the results, both before and after my proposed changes. (6MB, the two .osm files inside are both 35MB)
oklahoma_sample.zip

Summary

If committed, the changes I’ve put together are:

  • 19,567 ways modified due to waterway=stream being added to them. 33,678 ways modified in total due to having one of their nodes replaced due to a merge.
  • 15,068 node merges involving 2-5 nodes each (about 70% of merges are of 3 nodes and 30% are of only 2 nodes). 26,196 nodes to delete. 100% of deleted nodes have no tags at all.

The full Overpass results without any changes. (30.6MB zipped, 448 MB uncompressed)
oklah_before.zip

The same, with all changes applied. (30.7MB zipped, 449 MB uncompressed)
oklah_after.zip

And the changes with as many unmodified objects as possible purged to reduce the file size. (12.2MB zipped, 176 MB uncompressed)
oklah_after_smaller.zip

Does this make sense? Do you agree with connecting waterways to natural=water edges when the nodes are all on top of each other? And do you agree with leaving it alone when someone already connected the waterways and left the lake edge unconnected?

And, can you confirm that creating a separate account for this and filling out a page like this is the way I should go, if I can/should do this at all?

I haven’t done this before, so I really appreciate everybody’s time. Thank you for any and all feedback!

Waterwaymap.org map of the whole area:

An NHD import in south central Oregon had similar properties and I took a few evenings to download and merge out all the disconnected waterway confetti. My method just used the JOSM validator. It has a check that will offer to merge nodes from same-typed ways that are on top of one another. That import must have adjusted ArtificialPath and Connector to real waterway tags already, as I didn’t have to take care of that part. IMO this kind of effort is simply completing work from the original import. I would love to see it completed.

I was curious about why you needed to look at nearby nodes and not simply nodes directly on top of one another. Looking at a totally random patch, it’s filled with stuff like this:

I don’t think I have a good way to understand why the water= nodes are all very slightly offset from the waterway= nodes, but it definitely seems like a nice bit of cleanup.

As to the mechanics of the edit… By the letter of the law you should probably have a separate account and doc it on the wiki. This work doesn’t look quite big enough to hit the new Rate Limits, but having an import account with raised limits is useful if you’re likely to push more than 100,000 changes at once.

I don’t have a specific opinion on your particular mechanical edit (which seems reasonable, but…). I do have experience with an import (that I didn’t do) of waterways into my county in California where waterways didn’t always conjoin streams together — sounds very similar.

Bridging my experience with this, @watmildon’s with JOSM’s Validator plug-in and your proposed import, I discovered that if I “touched” a waterway (ever-so-slightly moved a node, say), THEN the Validator would notice that the stream (nodes) were not connected and would flag them. It was tedious, but manually, I was able to finish conjoining streams to their “parent” waterway (a larger stream, river, or the ocean as my county is coastal).

That’s my feedback, I hope it helps you! Happy to dialog or answer further, though, let’s see what other feedback “trickles” — ha! — in here.

Thank you both! The validator is interesting. If I just set the 19k ArtificialPaths to waterway=stream and run the validator, it does find 596 instances of “waterway duplicated nodes”. Unfortunately for me the “fix” option is disabled due to it not considering the area fully downloaded, due to my using Overpass to load the data. I am making sure everything connected to the nodes-to-be-merged is loaded, but JOSM doesn’t know that.

The validator does find 30,206 instances of “Way end node near other way”. Divided by two, that’s about the 15,086 merges I was doing. If I cheat and manually add a tag to my osm file to make it think the whole area is downloaded, it enables the ‘fix’ option, but only for the 596 “waterway duplicated nodes”. I guess the nodes have to all be part of modified ways for it to notice them? That’s like what you mentioned, stevea.

Of the 15k node merges I did in the file I linked in the original post, about 4.5k are for nodes at identical coordinates. Then about 1.5k are less than 2 cm apart, 2.7k between 2 and 3 cm, 2.8k between 3 and 4 cm, and 3.5k between 4 and 5 cm. After all of that, the validator finds 44 instances of “Way end node near other way”. Some of them are false positives, like a connecting waterway that wasn’t included in my query. But some appear to be waterways that do need connecting; they’re just more than 5 cm apart. These I’d fix manually before uploading.
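A breakdown like the one above can be reproduced with a short distance calculation. The following is a sketch under assumptions, not the code actually used: it uses the haversine formula on a spherical Earth, and the function names and bucket labels are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical approximation


def distance_cm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points, in centimeters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a))) * 100


def bucket(d_cm: float) -> str:
    """Assign a merge distance to one of the ranges used in the breakdown."""
    if d_cm == 0:
        return "identical"
    if d_cm < 2:
        return "<2 cm"
    if d_cm < 3:
        return "2-3 cm"
    if d_cm < 4:
        return "3-4 cm"
    if d_cm <= 5:
        return "4-5 cm"
    return ">5 cm"
```

Feeding each merged pair through `bucket(distance_cm(...))` and counting the labels with `collections.Counter` would give a table of the same shape. Note that OSM stores coordinates to 7 decimal places, so one step in latitude is a little over 1 cm.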

I’m going to keep looking at the data some more, and work on writing up the wiki article I’ll need to go forward.

Thank you!

I created the wiki page: Mechanical Edits/AutoMatt/Connecting NHD Waterways in Oklahoma - OpenStreetMap Wiki

If I’m confident in the changes, can I start making them? I don’t mean to jump ahead, if that’s the case.

Thank you!

I think it’s fine if you get started. I would be surprised if you could do anything we couldn’t figure out and get reverted back to a reasonable state. Definitely let us know how it goes or if there are snags/things we can assist with.

Great, thank you!

Duplicate data layer, convert to GPX, download data along GPX?

Oooo that’s interesting. I suspect you’ll still end up with something enormous as these features span an entire state but worth a try!

one option is to try doing it in sections, rather than everything at once

Unfortunately the validator only catches nodes directly on top of one another, not ones that are veeeery slightly not on top of one another.

Matt, downloading from OSM’s data “along an edge,” whether it is horizontal (sorta easy if “just a latitude line”) or vertical (sorta easy if “just a longitude line”), can be and is done in circumstances like this (an import plus a validator check being run).

It can get challenging with a GPX or an irregular boundary / edge (like a river, a county / state / country border along a river or coastline…). Downloading those is piece by piece: not too much that you choke your editor, not so wispy-thin little that you miss something. “The Goldilocks amount” comes with a bit of practice. I have confidence you can do it, but do know it can be slow and tedious. I find when I’m doing these (less and less, but I’ve done a number of them over the years) that what is rewarding is getting through it all without mistakes and “it looks right.” (And IS right, to the extent you can, should and do QA your work or somebody else helping you does). A pretty “high bar standard” in OSM is when at least two people have “looked over each other’s shoulders and checked each other that the work is both complete and correct as specified.” OSM likes that, as it paves solid road for this process to be repeated in the future as updates in the real world arrive (and they do). This is what I (sometimes) mean when I say that “wiki chases data chases wiki chases data,” but in this case there is some wiki involved as an import proposal. So far, so good.

It might seem slow going, but you are listening, learning, doing a lot of correct things here (if not everything, but I am not in your head and can’t detect how you learn). I like the progress I see, so keep up the good work! This is how our data really improve with imports: when we listen, learn, follow the community guidelines, get some hand-holding at a place like here when there are questions, the answers arrive, the learning continues and all the pieces come together.

That is really interesting, thank you! I haven’t interacted with this feature of JOSM before.

Thank you, I really appreciate all of this!

This is disputed; see “Should river lines be mapped through lakes, estuaries, gulfs, and other large water bodies?”

Thank you skyper, that is fascinating!

At a minimum, all of the ways I’m touching will still be tagged as ArtificialPaths, so they could still be deleted in the future if that is what is decided. One issue with this particular import, is in some places the ArtificialPath is sticking out of the body of water before meeting a stream. A solution could be to connect the existing streams to the body of water and delete the ArtificialPath. But since these ways already exist, I thought it safer to connect and re-tag them instead.

But if I’m wrong, re-tagging or deleting the ways inside water bodies will still be possible.

There’s no consensus for removing such waterways OR for discouraging folks from adding them.

For the United States, it is absolutely common for small water bodies that have a river/stream that goes “through” them also have corresponding waterway= ways. There’s contention about specific cases for large bodies and how best to tag things in various cases but I don’t think this specific work goes against any pattern common to where it is being done.

I did it! Or, I did what I intended to do: the ways are now connected. Here are the changesets.

Thank you all for your patience with me. I hope I didn’t go too far with my activity.

I also created a river relation for fun.

One thing I didn’t do was node merges to just connect waterways to instances of natural=water, when there wasn’t another waterway to connect to also. Maybe there’s more connections to make? For instance: way/80114906 runs between two small ponds, but connects to nothing. Its ends are 4.3cm and 1.8cm away from nodes belonging to the ponds.

When you’re merging waterways with “ponds,” yeah, I’d join nodes that are mere cm apart. This is what I did, as I mentioned I’m coastal, as many streams (eventually) drain to the Pacific ocean. These kinds of questions are good ones, as we are “saying out loud” that it is a good idea for the geometry to follow the geography where warranted. In the case of “waterway to pond” or “waterway to ocean,” (coastline, actually) yeah, join 'em; it is warranted.

It helps that “flows into this” and “drains that” are real things humans say about waterways; they justify joins like these. One thing in OSM that is what I’ll call “allowed and encouraged” is to “make data smarter” like this where we are smart enough as humans to do so. In this place, you are likely nodding your head yes that you, too, can give yourself permission to be smart like this, as you and we are together.

Thanks for asking: join 'em.

I noticed that the northwest portion of the confetti blob is still very confetti looking. Is that expected? I may have time to poke around next week if you haven’t yet.