Understanding Puerto Rico way data

I retrieved the map data for Puerto Rico with the following Overpass QL query.

[out:json];
rel["ISO3166-1"=PR];
out geom;

I receive a response with 3 ways in it, which is curious because PR is just 2 islands. The first 2 ways share the same start and end points. I am stitching all the ways together programmatically, so this pair looks to me like an internal boundary, but it isn’t. The first way is pretty small and the second is much larger. In my algorithm, if I see ways with common start/end points I just delete them as internal boundaries. This causes me to lose the large island entirely.

When I import the JSON into QGIS I get a perfectly fine PR. When I export from QGIS I’m left with two ways, which is correct.

Obviously my algorithm for stitching ways together is faulty. It generally works well, but in a few cases like this it fails. What am I doing wrong? What do I do with two linestrings with common start/end points? Is it true that parallel ways can’t happen in a single relation? What’s the logic for stitching ways together? I need to be able to do this programmatically. I don’t want to have to use QGIS, or any other external utility, to fix it.

Data returned edited for brevity.

  "members": [
    {
      "type": "way",
      "ref": 993017109,
      "role": "outer",
      "geometry": [
         { "lat": 18.4892101, "lon": -65.1825535 },
         { "lat": 18.3982670, "lon": -65.1579180 },
         { "lat": 18.3963140, "lon": -65.1575230 },
         { "lat": 18.3951220, "lon": -65.1575620 },
         { "lat": 18.3873420, "lon": -65.1572550 },
         { "lat": 18.3756450, "lon": -65.1571510 },
         { "lat": 18.3577910, "lon": -65.1575560 },
         { "lat": 18.3247310, "lon": -65.1538630 },
         { "lat": 18.3240090, "lon": -65.1541110 },
         { "lat": 18.2976090, "lon": -65.1611410 },
         { "lat": 18.2756240, "lon": -65.1667670 },
         { "lat": 18.2716390, "lon": -65.1678180 },
         { "lat": 18.2667310, "lon": -65.1689530 },
         { "lat": 18.2635880, "lon": -65.1694210 },
         { "lat": 18.1952780, "lon": -65.1713500 },
         { "lat": 18.1605520, "lon": -65.1373200 },
         { "lat": 18.1349650, "lon": -65.1120110 },
         { "lat": 18.1330236, "lon": -65.1100908 }
      ]
    },
    {
      "type": "way",
      "ref": 318735633,
      "role": "outer",
      "geometry": [
         { "lat": 18.4892101, "lon": -65.1825535 },
         { "lat": 18.4951936, "lon": -65.2042554 },
         { "lat": 18.4981530, "lon": -65.2271751 },
         { "lat": 18.4978821, "lon": -65.2503020 },

 <snip about 700 lines>

         { "lat": 18.1071516, "lon": -65.1126043 },
         { "lat": 18.1074777, "lon": -65.1125419 },
         { "lat": 18.1330236, "lon": -65.1100908 }
      ]
    }
  ]

Third way omitted entirely as irrelevant.

That is the correct course of action, and should not cause damage. Unlike Shapefiles etc., “winding order” has no significance with regard to forming a polygon in OSM.
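
To illustrate, here is a minimal sketch of joining two ways whichever way round they were drawn, assuming each way is a list of (lat, lon) tuples. The function name join_ways is made up for this example and does not come from any library. Since direction carries no meaning, a start-to-start or end-to-end match is handled by reversing one way.

```python
def join_ways(a, b):
    """Join two ways (lists of (lat, lon) tuples) that share an endpoint.

    Returns the combined point list with the shared point kept once,
    or None if the two ways have no endpoint in common.
    """
    if a[-1] == b[0]:       # end-to-start: already aligned
        return a + b[1:]
    if a[-1] == b[-1]:      # end-to-end: append b reversed
        return a + b[-2::-1]
    if a[0] == b[-1]:       # start-to-end: b simply goes first
        return b + a[1:]
    if a[0] == b[0]:        # start-to-start: reversed b goes first
        return b[::-1] + a[1:]
    return None
```

Note that two ways sharing both endpoints, as in the Puerto Rico example, come out of this as one closed loop rather than as something to delete.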

However, I again urge you to make use of one of the existing software libraries to do this. These have been tested through thousands of uses and should handle edge cases. If you don’t want to do that, many of them are open source, so you can examine their source code to get an idea of how to implement an algorithm.

For debugging it might help to put (._;>;); on its own line just above out geom in Overpass Turbo. This will make all the ways individually selectable as well as showing the relation as a whole. Obviously there is a limit to how big a relation you can show in a browser without slowing it down too much to be usable. Outer rings are often split either for shared boundaries (as appears to be the case here) or just to keep to a reasonable number of nodes per way (very common for large boundaries).

The boundary of the big island is a combination of the “Puerto Rico - USVI Median Line” and way 4422604. Because the island polygon is precisely two ways, with common start/end points, it looks like an internal boundary. So I guess the logic has to be “a way is an internal boundary if it shares start/end points with another way in a DIFFERENT relation”?

The boundary relations themselves state whether ways are outer or inner ways in their role. In the example given this has been JSON-ified to "role": "outer". To my mind if you are piecing ways together to form a boundary loop and the first and last nodes in your concatenated-loop-portion are the same then you know you have a closed loop rather than anything internal (which would have an inner role).
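
The idea above could be sketched roughly as follows. This is an untested-in-the-wild illustration, not how any particular library does it. It assumes the JSON layout shown in the question ("members", "type", "role", "geometry", "lat", "lon"), and the names ways_by_role and assemble_rings are made up for this example. It compares coordinates exactly, which assumes shared endpoints come from the same node; matching on node ids would be more robust. The greedy matching can also pick the wrong branch where rings touch at a point.

```python
def ways_by_role(relation):
    """Group way members of one relation by role, as lists of (lat, lon)."""
    groups = {}
    for m in relation["members"]:
        if m["type"] == "way" and "geometry" in m:
            pts = [(p["lat"], p["lon"]) for p in m["geometry"]]
            groups.setdefault(m.get("role", ""), []).append(pts)
    return groups


def assemble_rings(ways):
    """Greedily stitch ways into closed rings.

    Returns (rings, leftovers). A ring is a point list whose first and
    last points are equal; leftovers are chains that never closed.
    """
    pending = [list(w) for w in ways if len(w) >= 2]
    rings, leftovers = [], []
    while pending:
        chain = pending.pop()
        extended = True
        # Keep attaching ways until the chain closes or nothing fits.
        while chain[0] != chain[-1] and extended:
            extended = False
            for i, way in enumerate(pending):
                if way[0] == chain[-1]:        # fits at the end as-is
                    chain = chain + way[1:]
                elif way[-1] == chain[-1]:     # fits at the end reversed
                    chain = chain + way[-2::-1]
                elif way[-1] == chain[0]:      # fits at the start as-is
                    chain = way[:-1] + chain
                elif way[0] == chain[0]:       # fits at the start reversed
                    chain = way[:0:-1] + chain
                else:
                    continue
                del pending[i]
                extended = True
                break
        (rings if chain[0] == chain[-1] else leftovers).append(chain)
    return rings, leftovers
```

Usage would be something like rings, leftovers = assemble_rings(ways_by_role(relation)["outer"]), run separately for the "inner" ways. Nothing gets deleted for sharing endpoints with another way; two outer ways with the same start and end points simply close into one loop.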

In the similar OSM multipolygon relations touching inner loops are permitted but I doubt this would be considered valid for a boundary. I don’t think it’s acceptable for outer ways to touch in either case.

I say all this with the absolute certainty of someone who’s never had to implement this so corrections are welcome.

The (._;>;); bit was helpful. At least I can see the individual ways. I’m still not having much luck though. I join up the ways, but I always have a few that are start-to-start or end-to-end. I don’t understand what this means or how it happens. I just reverse one of the ways and join them. This may be causing damage, but I don’t know what else to do.

When I do all my joining, I end up with lots of polygons, but most of them are inner. I’m not sure why this is. I don’t exploit the role field as I’m not too sure how this works. What if two coincident ways have a different role?

I need to do a lot more experimenting.

When you gave me the advice about (._;>;); it worked brilliantly. I now need this trick again and it’s not working for me - I don’t get the individual ways showing. My query is as below. What have I done wrong?

[out:json];
rel["ISO3166-1"=PR];
(._;>;);
out geom;