I’ve come across a problem that I’d like to ask about fixing: issues with waterways in Oklahoma that were imported from NHD in 2015.
The issue is: there are waterways throughout Oklahoma where the ways don’t actually connect. Instead, each way ends in its own node, and those separate nodes sit at the same coordinates. This happens usually, but not always, at the point of entering or exiting a body of water. All of the impacted ways are tagged NHD:FType=ArtificialPath or NHD:FType=Connector, but have no waterway tag themselves. One example: way/138067071. This way, and the two stream ways north and south of it, do not connect at all.
My proposed solution is to merge nodes where doing so would cause the ends of existing waterways and “ArtificialPaths” to become connected. If there is also a node at that spot that would connect the waterways to the edge of a body of water, then that node is included in the merge as well. If the ArtificialPath is already connected to, say, a waterway=stream, but not connected to a body of water that has a node right there (meaning someone probably already connected the waterways and left the natural=water unconnected on purpose), then no action is taken.
Then, set waterway=stream for all ArtificialPaths and Connectors when they have no existing waterway or natural tags. These are mostly, but not all, inside of existing waterbodies. They’re all connected to (or will become connected to) each other or existing streams. There are instances connecting to rivers but none of them are part of the river.
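To make the tagging rule above concrete, here is a minimal Python sketch of it. This is not my actual program; the function name should_add_stream_tag and the plain dict of tags are just assumptions for illustration.

```python
# Hypothetical helper illustrating the tagging rule; not the actual program.
NHD_TYPES = {"ArtificialPath", "Connector"}


def should_add_stream_tag(tags: dict) -> bool:
    """Return True if a way with these tags would get waterway=stream."""
    # Only NHD ArtificialPath / Connector ways are candidates.
    if tags.get("NHD:FType") not in NHD_TYPES:
        return False
    # Leave the way alone if it already has a waterway or natural tag.
    return "waterway" not in tags and "natural" not in tags
```

So a way tagged only NHD:FType=Connector would get waterway=stream, while one that someone already tagged waterway=canal or natural=water would be left as it is.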
Here is the Overpass query I used to find these. The bounding box is around Oklahoma. WARNING: this query returns 233MB of data.
[out:json][timeout:1800][bbox:33.616646,-103.004057,37.0022,-94.432823];
(way["NHD:FType"="ArtificialPath"];way["NHD:FType"="Connector"];);
node(w)->.set;
node(around.set:0.05);
(way(bn);rel(bw););
(._;>;);
out body;
Then I wrote a program to parse the results. For each ArtificialPath, it looks only at the first and last node. For each of those end nodes, it takes the nodes within 0.05 meters, along with their parent ways and relations, and decides whether they should be merged. It merges the nodes if doing so would cause the ArtificialPath to become connected to one of: another ArtificialPath, a Connector, or one of these waterways: stream/river/canal/ditch/drain. Otherwise no node merging happens.
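Here is a rough Python sketch of the merge decision described above, in case it helps the discussion. It is a simplified illustration, not my actual program: the data layout (nodes as id -> (lat, lon), ways as id -> {"nodes": [...], "tags": {...}}) and all function names are assumptions, relations are ignored, and the distance uses a simple equirectangular approximation.

```python
import math

LINEAR_WATERWAYS = {"stream", "river", "canal", "ditch", "drain"}
NHD_TYPES = {"ArtificialPath", "Connector"}
MERGE_RADIUS_M = 0.05  # same radius as the Overpass "around" filter


def distance_m(a, b):
    """Approximate distance in meters between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000 * math.hypot(x, y)


def is_linear_water(tags):
    """True for ways that a merge is allowed to connect to."""
    return (tags.get("NHD:FType") in NHD_TYPES
            or tags.get("waterway") in LINEAR_WATERWAYS)


def nodes_to_merge(way_id, endpoint, nodes, ways):
    """Return the node ids to merge at `endpoint`, or an empty set."""
    # Index: node id -> ids of ways using that node.
    # (Rebuilt on every call to keep the sketch short.)
    parents = {}
    for wid, way in ways.items():
        for nid in way["nodes"]:
            parents.setdefault(nid, set()).add(wid)

    # Other nodes sitting (almost) exactly on top of the end node.
    nearby = {nid for nid, pos in nodes.items()
              if nid != endpoint
              and distance_m(pos, nodes[endpoint]) <= MERGE_RADIUS_M}

    # Linear waterways that already share the end node itself.
    already = {wid for wid in parents.get(endpoint, set())
               if wid != way_id and is_linear_water(ways[wid]["tags"])}

    # Linear waterways that would become connected by merging.
    gained = {wid for nid in nearby for wid in parents.get(nid, set())
              if wid != way_id and wid not in already
              and is_linear_water(ways[wid]["tags"])}

    # Merge everything at this spot (including a water-edge node) only
    # if a new waterway connection is gained; otherwise do nothing.
    return nearby | {endpoint} if gained else set()
```

With this rule, an unconnected ArtificialPath end that sits on top of a stream node and a lake-edge node gives a three-node merge, while an end that already shares its node with a stream and only has a lake-edge node nearby is left alone.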
Example
Here’s an example using a smaller area. It returns about 20MB of data.
And a zipped file containing the results, both before and after my proposed changes. (6MB, the two .osm files inside are both 35MB)
oklahoma_sample.zip
Summary
If committed, the changes I’ve put together are:
- 19,567 ways modified due to waterway=stream being added to them. 33,678 ways modified in total due to having one of their nodes replaced due to a merge.
- 15,068 node merges involving 2-5 nodes each (about 70% of merges are of 3 nodes and 30% are of only 2 nodes). 26,196 nodes to delete. 100% of deleted nodes have no tags at all.
The full Overpass results without any changes. (30.6MB zipped, 448 MB uncompressed)
oklah_before.zip
The same, with all changes applied. (30.7MB zipped, 449 MB uncompressed)
oklah_after.zip
And the changes with as many unmodified objects as possible purged to reduce the file size. (12.2MB zipped, 176 MB uncompressed)
oklah_after_smaller.zip
Does this make sense? Do you agree with connecting waterways to natural=water edges when the nodes are all on top of each other? And do you agree with leaving it alone when someone already connected the waterways and left the lake edge unconnected?
Also, can you confirm whether I can or should do this at all, and if so, whether creating a separate account for it and filling out a page like this is the way I should go?
I haven’t done this before, so I really appreciate everybody’s time. Thank you for any and all feedback!
Waterwaymap.org map of the whole area: