Overpass query. Absence of [maxsize] returns significantly smaller results

maki1990 · February 1, 2023, 1:59pm

I have two overpass queries.

node(33.68336,-117.89466,34.14946,-117.03498);
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
(._;>;);
out;

The query above returns an osm.xml file that is 167.306 kb big.

[out:xml][maxsize:2000000000];
(
	node(33.68336,-117.89466,34.14946,-117.03498);  
	way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
	(._;>;);
);
out;

The second query returns a file that is 618.994 kb big. Why does the second query return a significantly bigger result? Does the first query not give me the full dataset? Is there a way to get the same result with both queries? (The absence of [maxsize] sometimes leads to an error…)

mmd · February 1, 2023, 4:16pm

The second query wraps everything in a union (the parentheses in lines 2 and 6), which means that all nodes in your bounding box will be included in the result, not only the ones belonging to your highway=motorway, … ways.

As a result of that, the second query will return much more data.
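
For illustration, here is a sketch of the second query with the outer union removed, keeping the [out:xml] and [maxsize] settings as they were. Without the enclosing parentheses, each statement replaces the default set instead of adding to the overall result, so the bounding-box nodes only serve as input for the (bn) filter:

```
[out:xml][maxsize:2000000000];
// All nodes in the bounding box; they only feed the (bn) filter below.
node(33.68336,-117.89466,34.14946,-117.03498);
// Ways of the listed highway classes that use at least one of those nodes.
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
// The selected ways plus every node they reference (recurse down).
(._;>;);
out;
```

This should select the same elements as the first query; only the settings line differs.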

utilisateur · February 1, 2023, 5:34pm

You might also want to check the maxsize documentation.

It’s not about the size of the results, but about the RAM needed to run the query. It will crash if you get to that limit.

Regards.

mmd · February 1, 2023, 5:36pm

Well, what you’re really getting is an error message that the query has exceeded the permitted memory limit for a query. It will not “crash” as you wrote.

maki1990 · February 2, 2023, 9:29am

Ah, I adapted an example that I found in the docs, not knowing that the brackets would be interpreted as a union operator. It makes sense then. So all I need to do is to omit the brackets to get all nodes that are part of a way in that bounding box, right? Thank you so much for your help!

maki1990 · February 2, 2023, 9:32am

Yes, I know. I ran into this error when I selected a bounding box that was too big. So I added the maxsize parameter AND the brackets, not knowing what the brackets do. There is no way to know in advance how big the download will be, right? It would be great if I could have a progress bar in my application for the map download.

utilisateur · February 11, 2023, 12:10pm

Indeed, sorry for the simplistic wording.

Overpass-turbo does have a nice waiting dialog, with alerts when the download might be too big for your browser. Of course, you can only know the amount of data that will be returned once the query has run. But it seems possible to know the data size before downloading it.

Be aware, though, that download speed depends on a lot of factors, but the size should never be that big, so the download should not take long.

The run time of the query, on the other hand, depends on the complexity of the query (a regexp, for example) and the size of the area to search (or bounding box). It can get really long if the query is a bit complex.

Best regards.

mmd · February 11, 2023, 9:14pm

The data is extracted on the fly, so in general it’s not possible to know upfront how large the download will be.

The popup in overpass turbo is shown only after downloading (some) data, to avoid killing the browser by showing too many objects.

utilisateur · February 19, 2023, 1:13pm

I’ve checked my network console, and in fact overpass-turbo does download all the data, and then asks if it should try to render it.

As a side note, gzip compression seems to work really well on geojson data, 21MB transferred for 146MB uncompressed (and the dialog says “approx 200MB”).

The timing is interesting as well, 13 seconds waiting (server-side computing), and 15 seconds downloading, for a really simple query, just getting all highways on a region-wide bounding box.

I kind of remember another (earlier) warning, but I can’t trigger it now, so it’s probably just my imagination…

Regards.

drolbr · February 20, 2023, 7:26am

There is no request in the log files (of 2023-02-18 or 2023-02-19, not looking further back) that matches your description.

Thus, a couple of performance tips: the delay on the server side comes from the requirement to order the nodes by ascending ids. I.e. the server needs to collect the full result first before outputting it, because the lowest node ids might be in the north-eastern corner, which would otherwise be sent out last. This is because beginner tools in the past required OSM XML to be ordered that way.

The server can respond faster if you can use out qt, or omit the explicit nodes altogether by using out geom and dropping the (._;>;) part.
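
As a sketch of the second suggestion, the query below drops the (._;>;) recursion and uses out geom instead. It assumes the [out:xml][maxsize:…] settings line from the question is kept, and that the consuming application can read coordinates from the way members rather than from separate node elements:

```
[out:xml][maxsize:2000000000];
// Bounding-box nodes, used only as input for the (bn) filter.
node(33.68336,-117.89466,34.14946,-117.03498);
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
// Each way is printed with the coordinates of its nodes attached.
out geom;
```

For the first suggestion, the original query stays as it is and only the last line changes from out; to out qt;. Note that with out geom the nodes no longer appear as elements of their own, so any tags on them are not part of the output.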