Why is Northumberland bordering England?

Hello, I am visually impaired and could do with some sighted help with a problem I have.

I have downloaded the relation for England id=“58447”

and the relation for Northumberland id=“88066”

I have written a program on my desktop that attempts to work out if relations border each other or are contained within each other. It generally works well but I am seeing an oddity where Northumberland is showing as bordering England and not being within England which I know it is.

The Northumberland relation turns out to be made up of 9 polygons. 7 of these, according to my program, are inside England. That is good. One polygon touches at a single node and one polygon is outside the England polygon. I am guessing one of the nine polygons is the mainland of Northumberland and the other eight are islands. But please is anyone able to put some eyes on this and explain why 7 of the polygons are inside England and one just touches and one is outside? I am struggling to find an approach to work this out myself.

Of course, it could be there is a bug in my program but it works so well in other areas I suspect there is something curious about the data I’m not catering for.

Many thanks for any help.

All coordinates in OSM are stored as fixed point numbers with 7 decimal places - that is to say the value is multiplied by 10e6 as you suggest.

Hi. I think I have found the cause of my problem. Floating point arithmetic. Of all nodes tested from the Northumbrian polygon only one fails to be inside the England polygon.

Node “5749887687” has a latitude of 55.6979774 but when I convert this to a floating point variable in my application it has the value 55.697977399999999. And I am guessing this puts it on the wrong side of the England polygon.

What I do with this information I’m not quite sure. I am a little worried it means I should move to integer arithmetic and multiply all my numbers by 10^6 before storing them as an integer. but this is a significant code change.

Question: What is the correct level of precision to work to?

The page on OSM precision suggest 6 decimal places is about as good as it gets. My problematic node above has a lat of 7dpi so changing that to 6 might do the trick also.

Quick tip: If you’re implementing the ray-casting algorithm for your point-in-polygon tests, pay close attention to the edge case where the ray passes through a vertex. This can lead to double-counting the crossing (since the vertex belongs to two segments), which will give you an incorrect result. Common ways to handle this is to count vertex-crossings as “one-half” a crossing, or to only count the crossing of the higher (or lower) vertex (in which case you then also have to handle the case where the ray is collinear with a segment).

For your use case, your point-in-polygon test should determine not just “inside” or “outside”, but “inside”, “outside” or “on boundary.”

As others have mentioned, the use of floating-point coordinates introduces the frustrating challenge of geometric robustness. One way to improve robustness is to use fixed-point coordinates. Natively, OSM coordinates are stored with seven digits of precision (100-nanodegree increments, with a resolution of about 1 centimeter). Coordinates of this precision fit nicely into 32-bit signed integers. However, for mathematical operations, you will still need to use doubles since you may otherwise risk overflow errors.

If you’re reconsidering use of a geometric library, GEOS is a solid choice for C++. Easy to build (CMake) and link, and comes with all the standard spatial-relational operations.

The obvious thing I can think of is that Northumberland is not only within England, but shares a boundary. With Scotland in this case

Do you get a similar issue with Shropshire for example?

Cheers Phil

Of the loops detected in JOSM it appears that:

  • 6 loops are islands made of a single outer way each.
  • 1 loop is a pair of nearby islands with two ways encircling both (presumably these merge at low tide)
  • 1 loop is a loop of four ways that surrounds most of the Farne Islands
  • 1 loop consisting of 186 ways that comprises most of Northumberland.

The big loop shares 17 ways with the England boundary (and Scotland). I don’t see an obvious candidate for the polygon that shares just a single node with the England boundary as the islands all seem well clear of the maritime and land borders.

Thanks for these numbers. When I look at my data I get:

The England relation has 412 ways

My version of the Northumberland relation which I downloaded last week has only 150 ways compared to the 186 ways you report but my first polygon has 138 ways which matches with the remaining 8 polygons being made up of 12 ways as you explain.

I worked out that my result that one polygon joins with the England relation at one node is a red herring. It was almost certainly caused by the way that joins to a shared way. This would have indeed joined at a single (end) node and my algorithm bails out at this point since it knows there is a border now so no need to look further.

I have shown to myself that there are 17 shared ways which again matches what you quoted.

Yet my main Northumberland polygon and the soul England polygon aren’t been captured as NL is inside England. All these numbers make a lot of sense and I now know where to focus my attention to hunt down the problem. Thanks for your info.

Can I also ask… are the island polygons inside the England polygon because the England relation/polygon includes the maritime border region?

And to answer Phil. My algorithm is trying to establish which admin level boundary relations either border each other or contain each other. So the output I’m after is that England and Northumberland border Scotland, that Scotland borders England and Northumberland, and that Northumberland is within England. For some reason though I’m getting the result that Northumberland borders England. Borders Scotland too so it ain’t completely broken :slight_smile:

Hi, I believe I need some more sighted help with my attempt to programatically get my algorithm to correctly determine that Northumberland is inside England.

To recap, I am comparing the England relation id=“58447” and the county of Northumberland relation id=“88066”. Northumberland is in the northeast of England. The England relation makes up one polygon and the Northumberland relation makes up 9 polygons. One main polygon and 8 islands.

My test is to go through every node of Northumberland and do a point in polygon test against the single England polygon.

I find that when testing node “5749887687” a projection out to the east passes through the England polygon twice which suggests it originates outside of the England polygon. This is surely wrong as all of Northumberland is inside England.

This node is shared with a node on the England polygon and I treat this as an intersection. Note that I only count it once.

Then when I am testing this node against:

On England way “28421645”

On segment made of nodes “312210301” and “312210308”

I get another positive intersection result.

I have to go on the basis that both of these results are correct.

So is there a near-miss occurring and I am failing to detect my eastward projection from my node passing through the England polygon a third time? This is what I need the sighted help for.

If anyone is able to take a look and either let me know if my first two positive intersections are wrong or the node that I am just missing - that would be brilliant.

Many thanks for any help.

For the purpose of this test, I think you have to treat the case where a node is shared by the two boundaries as a special case.

What is puzzling is that you only got this result on this particular node.

Some more thoughts,

If you consider a point that is part of the boundary (I am using boundary here in the geometry sense, not necessarily the political sense) of a polygon to be “in” the polygon, then if two area/polygon objects share an OSM a node it is inside both of them. Therefore the first thing you might do in your algorithm is to test whether both area/polygon objects share the node.

There are some cases that the ray-casting algorithm may not deal with correctly. A discussion of some different approaches that I found helpful is here.

Thanks Mike for that link. That looks well worth a deep read.

For now though I’m not trying to optimize my algorithm though this will need doing. First I am just trying to get the right result.

The reason why it only got one result on a shared way is I was focusing on just one test node in the Northumbrian main polygon. So it is totally right that it only finds one shared node with the England polygon. I am focusing on this node because it gives me an even count on my ray cast which seems wrong.

I was expecting the shared node result. Or at least, not surprised by it. Then there is the second intersection nearly a degree away to the east. Inspecting the numbers, this seems a perfectly reasonable intersection. Are you able to visually verify it is correct?

And if it is, is there then a third intersection I am just missing perhaps due to floating point problems?

Just to reiterate what has already been said, many times “weird” results in these kind calculations are due to numerical issues because you are working with numbers that share leading digits and even simple operations massively loose precision.

Try translating the polygons to 0,0 (pick a common node and subtract its coordinate values from all the involved vertices) and see if things work better.

Yes, I was able to verify a second intersection.

Without coding for a lot of exceptions, I don’t think the ray-casting algorithm is going to give you correct results in all cases. The discussions around this algorithm online seem to deal with computer graphics, where it doesn’t really matter if a point that is slightly outside of the polygon is classified as being inside it (or vis-versa).

Can you try:

This is part of the Northumberland boundary, but not the England boundary. It is inside England, and you should get just one intersection.

Is there a reason you are developing this from scratch as opposed to using one of the open source GIS libraries available for your analysis?

@SimonPoole You are correct. Calculations should be done in as high a precision as possible/practical (double precision floating point e.g.), even if the end result is rounded to a much lower precision. Also, the calculations need to be ordered to preserve as much precision as possible.

But even if all of that is done, one is still going to run into precision/tolerance issues as all numbers stored digitally have less than infinite precision. I can find illustrative examples if anyone is interested.

Having said all of that about precision, @Chessel 's issue has to do with the fact that the node chosen for the test is a member of both boundaries, and the ray-casting algorithm doesn’t work in cases like that (at least not without coding an exception).

One further complicating factor is that what appears to be a straight line in WGS84 may not be in a projected coordinate system.

Hi Mike,

Thanks for confirming the second intersection. Did you see any near-misses?

I have tried hard to code for the exceptions. For example, I don’t count an intersection with the first node of the segment but only beyond the first node up to and including the second node of a segment. And I exclude horizontal segments. So I don’t think the problem I am seeing is due to the node being shared between the two polygons as I have coded for this. I hand-crafted test data creating a number of edge cases so I could code these to work. This whole thread could hinge on whether my code is actually any good or I am up against data I haven’t catered for.

On the suggested Northumbrian node “2418742499”

I get a single intersection with the England polygon on way “28421645”

between nodes “312210401” and “312210407”

It is a straightforward intersection through the segment. It isn’t level with either England node.

Out of the 29852 nodes in the main Northumbrian polygon I get:

27200 nodes which get classed as inside the England polygon

2652 which are classed as outside of the England polygon I wonder if I should output these to a file and see if I can identify a pattern.

Why am I developing from scratch? Well, I am not aware of any accessible GIS applications. They all naturally enough display a map but do not let the user control a position on the map with cursor keys and tell you what is at or near that position via a screen reader.

I download data from OSM using Overpass and I can read the XML results I get in a text editor which works very easily with a screen reader . All the pre-processed data I find available is Shapefile format which I can’t view.

But you said open source GIS libraries. TBH, I haven’t looked that hard at libraries. Having been googling for half an hour just now Boost looks useful along with PostGIS. I have been playing around with OSM data mostly for fun but this Northumberland problem is making me think I am wrong and perhaps I should spend time looking at these options.

There are no near misses.

Good, the basics of your algorithm are working!

I would guess that those are all cases where the node is part of both boundaries. The interesting thing is that a quick query in JOSM shows that there are 3075 nodes shared between England’s boundary and Northumbria’s boundary. Probably because of the twists and turns of the boundaries some produce the expected answer.

Incidentally, you have to decide what is the answer you want here. By the above logic (node shared by two boundaries is inside both), some parts of Carlisle would be inside Northumbria (because they share a boundary) - which isn’t the case (I am in the US, so not completely sure, but hopefully you get the idea). This is why we distinguish between the interior of a polygon and its boundary. See DE-9IM.

GEOS is a C++ library
Shapely is a Python wrapper for GEOS
GDAL/OGR is another option (there is a Python wrapper for it)
Java Topology Suite

Mike, I think the penny has just dropped. I think I understand what you’ve been trying to tell me. It is a shared node problem.

On my original troublesome Northumbrian node, “5749887687” it scores 2 in terms of number of intersections. First because it is on a shared way with the England polygon, and then again because the ray out to the right hits (quite possibly another shared way) on the east coast of England though it is cutting nicely through a segment line.

If I ignored my original node then I would get a result that I am inside England.

Your statement “I have to decide what answer I want” is what made me realise. And I am not quite sure of the rule I need to apply. Possibly it is to not count a positive intersection when my test node is the same as one of the segment nodes.

Oh dear. This is making my head ache and I feel like I’m back to the drawing board. Perhaps I’ll go read that page you linked to about point in polygon testing.

Well, on the bright side, I now know the flaw in my algorithm and can try and do something about it.

Many thanks for your help and time Mike. I’ll also look at those libraries you linked to.

That is wonderful. Sometimes I struggle to explain things clearly, so I am very glad that you understood what I was trying to say.

That would work for this specific example, but if there was a shared node on the eastern side of the boundaries it would give you the opposite answer as you wouldn’t get any intersections and that would indicate that you were outside of England (note I don’t think for these two OSM boundaries there are any such nodes, but it is possible generally).

I definitely recommend using a premade solution if possible. There are a lot more “edge cases”, as well as the issue with numeric precision that @SimonPoole raised.

However, you are still going to have the issue of how to treat points that are on the boundary (aka edge) of a polygon (a point can either be outside of a polygon, in the interior of it, or on its boundary/edge - and those libraries should allow you to determine specifically which is the case). Perhaps for your application, it doesn’t matter if a point on the boundary/edge of a polygon is considered inside the polygon, outside the polygon, or if it varies from point to point.

Hi Mike,

I think I can use the info I have now to determine if two polygons have any of the following relationships:

Do not touch //All points from P1 are outside of P2

They overlap //Some points in P1 will be inside P2 and some outside P2

they border //Some P1 nodes will be shared but all unshared nodes will be outside of P2

Inside and border //There is a mixture of P1 nodes that are shared and are inside P2

Totally inside //No shared nodes and all N1 nodes are inside P2

If the test for shared nodes is done first it can then skip the ray test and this should avoid the problem of the test node being on a west vertex or an east vertex delema.

I have quite a bit of code to change but I’m confident this will work.

Thanks again for your time and patience.

Thanks for all those points about ray casting. I hopefully already have the edge case where the intersection is on a vertex covered. I am counting an intersection except when this occurs exactly on the first node of the segment. And I ignore horizontal colinear segments. I also capture an “on boundary” starting position. And now that I treat a shared node as always inside then things are going much better though not yet perfect.

I did try using 64-bit integers which I think would avoid overflow problems when multiplying two integer coordinates together but I have reverted to double precision for now.

I’ll take a look at GEOS. Thanks.