Why does a fraction of a millimeter make such a massive difference in the result of my Overpass query?

Edit (current status): I’ve found a workaround, but I’m still puzzled by the underlying issue and would like to know (a) why the query uses so much RAM, i.e. what’s going on under the hood, and (b) whether there are ways to make what I’m doing more efficient, RAM-wise and/or time-wise.
Original post below:
Hi all, I’m brand-new to OSM and to Overpass queries, so I’m wondering if there’s something I’m missing here.
I’m trying to use Overpass Turbo to find the nearest point on land to arbitrary latlong coordinates. The method I’m using (which I realize is pretty clunky and doesn’t scale, but it’s a start) is to query for natural points around the selected coordinates, increasing the radius if there are no results and decreasing it if there are too many. An example query looks like this:

nw
  [natural]
  (around:1200901.1386,23.59847826,-131.3540059);
/*added by auto repair*/
(._;>;);
/*end of auto repair*/
out;

Now, why on Earth am I using four decimal places of precision for a value measured in meters, you may ask. Because for this particular set of coordinates, even that precision isn’t enough to find an intermediate value between “no results” and “running out of RAM”: a radius of 1200901.1386 causes the query to run out of RAM, whereas 1200901.1385 returns no results.
This initially made me suspect that there might be some very complicated land feature at that distance from the chosen coordinates. But two things argue against that (at least from my perspective as a novice). First, as far as I can tell, there’s no land within that distance of the given coordinates (and WikiNearby doesn’t suggest any interesting undersea features either). Second, the same problem seems to arise when I remove the “natural” restriction, except at a slightly different distance. Granted, I haven’t tested that query to four decimal places of precision, but between 1182 and 1183 km it goes from 69 nodes and 3 ways (the Pacific garbage patch, two lines of longitude, two nuclear waste dumps, and two nuclear explosion sites) to running out of RAM at 2048 MB.
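(For reference, the untagged variant looked roughly like this, with 1182 km being the largest radius that still returned results:)

nw
  (around:1182000,23.59847826,-131.3540059);
(._;>;);
out;
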
This is the 19th set of random coordinates with which I’ve done this, and it’s the first time I’ve run into this problem.
Does anyone know what might be causing this (and hopefully how to fix it)? Many thanks.

Look up maxsize & timeout.

Is it essential to use around? bbox is more efficient.

What are you expecting to find within that area? My results indicate it’s devoid of natural features.

Thanks for the reply. What I was expecting to find in that particular radius was indeed nothing, which was why I was confused by the query hitting the RAM limit.
I’m aware of maxsize and timeout; what I’m wondering is why the query hits the RAM limit when there’s nothing there for it to capture. I’ve since confirmed this by increasing maxsize, which allows the query to complete and reveals that the result I had assumed was too large is in fact still empty; the query just somehow uses large amounts of RAM along the way.
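For reference, the workaround is just a maxsize setting at the top of the query; the value is given in bytes, and the 4 GB below is purely illustrative:

[maxsize:4294967296];
nw
  [natural]
  (around:1200901.1386,23.59847826,-131.3540059);
(._;>;);
out;
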
Though I have this workaround (increasing maxsize), I’d still be interested to know (a) why a query that returns nothing uses up so much RAM, (b) why it does so at such a specific threshold, and (c) whether there’s a way to reduce the amount of RAM it uses.
I did read on the wiki that bbox is more efficient, but the reason I’m using around is that I’m interested in finding which points are closest to a given basepoint, not which points lie within a given rectangle. Is there a way to use bbox to streamline the query while still ultimately achieving the goal of finding the closest points? (For that matter, is there a feasible way of automating the search for an appropriate radius, rather than trying various radii one at a time?)
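One thought I haven’t tried yet: if I do keep probing radii one at a time, each probe could at least skip downloading geometry by asking only for a count, assuming I understand out count correctly:

nw
  [natural]
  (around:1200901.1386,23.59847826,-131.3540059);
out count;
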
Incidentally, the wiki suggested that Overpass Turbo puts a hard RAM cap at 2 GB (which is why it took me longer to try increasing maxsize than it would have otherwise):

Important notice: Recently, a new mechanism was introduced to abort queries exceeding 2 GB of memory. The exact size of this limit is still under discussion and might change over time. If you experience error messages like “runtime error: Query run out of memory using about 2048 MB of RAM.”, be sure to read this thread as well.

(The linked thread appears to have fallen prey to link rot.)

Yet you never mentioned them.

You never mentioned that.

Then why write the routine?
You sound like another time waster, so I’m out.

No need to be rude, especially since you apparently haven’t read parts of my post. If you don’t feel like reading everything I’ve written, then please just say so instead of acting like I’m arguing in bad faith.
For one thing, I mentioned in the second sentence of my post what I was trying to do:

I’m trying to use Overpass Turbo to find the nearest point on land to arbitrary latlong coordinates

If you continued reading that paragraph, you would see that there’s a reason for me to be making the query. If you’re interested in finding out why, I invite you to read the original post. If you find it confusing, please ask for clarification, rather than accusing me of being a “time waster.”
Perhaps I should have mentioned maxsize and timeout in my initial post, except that timeout struck me (and still strikes me) as irrelevant, given that my problem involved hitting a RAM limit; and, based on the wiki passage I quoted above, I had been under the impression that I couldn’t use maxsize to raise that limit. I also learned more between writing the post and receiving your reply, which is why my reply mentions a workaround I hadn’t yet discovered when I made the post.
I’m not sure why I’m going to this level of effort to defend myself when, from where I stand, it seems quite likely you won’t even read much of this. I guess it gets my dander up when someone throws unjustified mud at me.

  1. No matter what the limit is, at some point it will be triggered; in any case where you have limits, you can dig down and find the exact threshold, like you did here.

  2. You can try a tool better suited to this job, or do semi-manual analysis in QGIS, and see what the outcome would be. Maybe there is some horrific natural object that would be found? Or some other complex geometry?

Though (1) seems more likely to me: there is some limit, and this specific query is over it. Free public services need to be quite aggressive in limiting overuse. Personally, I am kind of amazed that Overpass works as well as it does.

Personally, I have found around quite likely to end with the query being killed unless it is used on a tiny area. But that is just a general impression.

Thanks for the comment! Upon further investigation, it is apparently possible to use maxsize to go past the 2 GB limit on Overpass Turbo, and it does look like (1) and not (2): when given enough memory to complete, the query still comes up empty. At this point, what I’m wondering is how the service performs the search. What goes on “under the hood” that causes a search to take up that much memory when it isn’t actually finding anything? I had assumed it would only need as much memory as it took to store the objects it actually retrieved.
Correspondingly, I’m wondering if there’s any way to reduce the memory required. It does seem that around takes up quite a bit of memory, and I’m having to construct my radii carefully to minimize the chances of the query being killed.

You can convert your circle into an equivalent BBOX:

[bbox:12.748765168002006,-143.0938377516025,34.433239671297464,-119.6141740483975];
nw[natural];
(._;>;);
out;

But the nearest thing this returns is 1568 km away from your point.

EDIT: Turns out you can mix a BBOX with your original query and it works:

[bbox:12.748765168002006,-143.0938377516025,34.433239671297464,-119.6141740483975];
nw[natural](around:1200901.1386,23.59847826,-131.3540059);
(._;>;);
out;

There was a recent discussion about lat/lon accuracy in OSM, which stores coordinates to 7 decimal places. Several people computed that as a precision of about 11 mm (one degree of latitude is roughly 111 km, so the seventh decimal place corresponds to about 11 mm). Fractions of a millimeter would therefore be doubtful.

The software is Open Source, so you are welcome to figure out as much as you want about the why.

The around feature is not really geared for distances beyond 10 km.

What most likely happened is that at some point the strategy of the database switches between

  • filter all objects within the geography for having this tag
  • filter all objects with this tag globally for whether they are in the geography

The software then figures out that there are too many objects globally with this tag and quits due to expected overuse of memory.

Thanks very much! I’m marking this as the solution, because the only way an answer could be more complete would be if someone personally volunteered to hold my hand through a thousand lines of code. As it is, that’s my cross to bear. If I muster the strength to get to the bottom of it, I’ll edit with more detail.
Incidentally, further experimentation has led me to believe that it’s not actually capturing a bunch of objects and overloading memory that way, because I constructed a query where it’s explicitly operating on an empty set, and it still runs out of RAM. My hazy impressions from a first pass over the code suggest that it might be doing something recursive over the geography, but I could be mistaken. It does seem like the issue won’t be fixed by limiting the number of objects it’s working with.
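(I don’t have the exact query to hand, but a construction along these lines is one way to force an empty input set: filter on a tag key that can’t match anything, then apply around to that named set.)

nw[natural][this_key_should_not_exist]->.empty;
nw.empty(around:1200901.1386,23.59847826,-131.3540059);
out;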
