Stuck between a rock and a hard place! Our app needs to find the closest N businesses that match specific tags, and in the results we need the full set of tags for each business. Here is the query string we are currently using (note: we have local instances of Nominatim and Overpass set up and working smoothly):
data=[out:json];(nwr(around:50000,24.5557,-81.78259)["lgbtq"~"primary|friendly"];);out skel center qt;
We use the same query for different apps, each looking for specific tags. I’m looking for advice on a modified query strategy that meets our requirements: find the closest N businesses around a lat/lng, with full tags in the results. Here are some drawbacks of the current approach:
The ‘skel’ output mode is faster, but it does not include tags in the result.
The ‘body’ output mode includes the full tags, but it is unacceptably slower.
The ‘ids_only’ mode (‘out ids’ in Overpass QL) is much faster, but apparently the Nominatim API does not provide a way to get full tags for those IDs.
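One variant that may be worth measuring: besides ‘skel’ and ‘body’, Overpass QL has a ‘tags’ verbosity level, which prints IDs and tags but leaves out coordinates and node/member lists. For nodes, ‘body’ is already small (ID, lat/lon and tags), so the extra weight of ‘body’ mostly comes from ways and relations. The sketch below splits the output accordingly. It only assembles the query string; build_around_query is a hypothetical helper, it assumes your Overpass version accepts ‘center’ together with ‘tags’, and whether it actually beats ‘out body center’ on your instance is something to test.

```python
def build_around_query(radius_m, lat, lon, key, value_regex):
    """Hypothetical helper: Overpass QL for elements within radius_m of (lat, lon)
    whose `key` tag matches value_regex, with tags and one point per element."""
    return (
        "[out:json];"
        f'nwr(around:{radius_m},{lat},{lon})["{key}"~"{value_regex}"]->.hits;'
        # Nodes: 'body' is only id, lat/lon and tags, since nodes have no node lists.
        "node.hits;out body qt;"
        # Ways/relations: 'tags' omits node and member lists; 'center' adds one point.
        "(way.hits;relation.hits;);out tags center qt;"
    )

query = build_around_query(50000, 24.5557, -81.78259, "lgbtq", "primary|friendly")
print(query)
```

The ‘qt’ sort is kept here because Overpass documents it as faster than the default ID ordering.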
The ‘ids_only’ approach was the strategy we thought would be the most performant: get matching IDs using Overpass, then get the full data using Nominatim. However, Nominatim’s ‘extratags’ parameter does not return the full set of tags that we need. How can we get full tags by ID using either Overpass or Nominatim?
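On the Overpass side, one option is a two-step flow that stays within Overpass. Note that ‘out ids’ returns no coordinates, so a distance sort needs at least ‘skel center’ output. You could sort that skeleton result by distance, then fetch tags only for the N winners by ID. The sketch below builds that second query. It assumes an Overpass version that supports the multi-ID ‘id:’ filter (e.g. node(id:1,2,3)); on an older instance the same can be written as one node(123);-style statement per ID. build_tags_by_id_query is a hypothetical helper name.

```python
def build_tags_by_id_query(node_ids=(), way_ids=(), relation_ids=()):
    """Hypothetical helper: Overpass QL returning the tags of specific OSM elements.

    Coordinates are not requested; the first, geometry-only pass is assumed
    to have supplied them already.
    """
    statements = []
    for osm_type, ids in (("node", node_ids), ("way", way_ids), ("relation", relation_ids)):
        if ids:
            # e.g. "node(id:1,2);" selects several elements of one type at once
            statements.append(f"{osm_type}(id:{','.join(str(i) for i in ids)});")
    if not statements:
        raise ValueError("at least one ID is required")
    # Union of the per-type selections, printed with IDs and tags only
    return "[out:json];(" + "".join(statements) + ");out tags;"

print(build_tags_by_id_query(node_ids=[1, 2], way_ids=[3]))
```

Each returned element carries its type and ID, so the tags can be matched back to the distance-sorted list by (type, id).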
To find the closest N places, we calculate the distance to each result, sort by distance, and take the top N. Side question: do the OSM APIs provide any option to sort by distance? The ‘qt’ (quadtile) sort only orders results roughly by location, which is not granular enough for distance sorting. We have not yet checked the performance impact of removing that option, since it provides no benefit for our more precise needs.
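For reference, the client-side step described above can use a haversine distance plus a partial sort, so only N results are kept instead of fully sorting everything. This is a sketch assuming Overpass JSON output, where nodes carry lat/lon directly and ways/relations requested with ‘center’ carry a center object. haversine_m and closest_n are hypothetical names, and the spherical Earth model is an approximation (well under one percent error), which should be fine for ranking.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical model)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def closest_n(elements, lat, lon, n):
    """Return the n Overpass JSON elements nearest to (lat, lon).

    Nodes have top-level lat/lon; ways/relations output with 'center' have
    element["center"]. Elements with neither are skipped.
    """
    scored = []
    for el in elements:
        if "lat" in el and "lon" in el:
            point = (el["lat"], el["lon"])
        elif "center" in el:
            point = (el["center"]["lat"], el["center"]["lon"])
        else:
            continue
        scored.append((haversine_m(lat, lon, point[0], point[1]), el))
    # Partial sort: keeps only the n smallest distances
    return [el for _, el in heapq.nsmallest(n, scored, key=lambda pair: pair[0])]
```

Usage would be closest_n(response_json["elements"], 24.5557, -81.78259, 10), where response_json is the parsed Overpass response.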
If you got this far, thanks for reading my rather long post! Any advice would be most helpful.
You can also convert POIs to centroids using osmconvert and filter for a wide range of POIs with osmfilter. Such a data set will probably give you what you want and can be used to populate a Nominatim-style database (although you still need geocoding to find the point around which you want to retrieve data). I pulled out all the world’s retail establishments this way a few years ago, and it worked very well for my analysis: that was around 10 million POIs at the time.
Hmmm… centroids… hmmm, that may be an option a bit further down the pike, as it sounds like a pretty serious investment. But I’m curious: what kind of performance gains did you see from precalculating POIs with similar requirements?
There is too much noise in the channel to give any discrete performance numbers for the ‘around’ and ‘bbox’ alternatives. Based on earlier comments, I assume bbox is faster and puts less load on the server, but I have not yet been able to get clean measurements to prove it. Either way, it is not markedly faster.
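For a like-for-like comparison, the bbox needs to enclose the whole ‘around’ circle. Results in the box corners then fall outside the radius and get dropped by the client-side distance check you already do; since a square has about 4/π (roughly 1.27) times the area of its inscribed circle, expect somewhat more raw results. A rough sketch of the conversion, assuming a spherical Earth and a 1% safety margin. radius_to_bbox and build_bbox_query are hypothetical helpers, the approximation breaks down near the poles and across the 180° meridian, and Overpass expects the bbox as (south, west, north, east).

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)

def radius_to_bbox(lat, lon, radius_m, margin=1.01):
    """(south, west, north, east) box enclosing a circle of radius_m around (lat, lon).

    Small-circle approximation padded by `margin`; not valid near the poles
    or where the box would cross the 180th meridian.
    """
    angular = radius_m * margin / EARTH_RADIUS_M  # arc length in radians
    dlat = math.degrees(angular)
    # A degree of longitude shrinks with cos(latitude), so widen the box accordingly
    dlon = math.degrees(angular / math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)

def build_bbox_query(bbox, key, value_regex):
    """Same filter and output mode as the current query, but with a bbox instead of around."""
    s, w, n, e = bbox
    return (f'[out:json];nwr["{key}"~"{value_regex}"]'
            f"({s:.6f},{w:.6f},{n:.6f},{e:.6f});out skel center qt;")

bbox = radius_to_bbox(24.5557, -81.78259, 50000)
print(build_bbox_query(bbox, "lgbtq", "primary|friendly"))
```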
I have minutely updates running, but may back off to a daily after-hours update to take some load off the server. Response time variance is pretty high, swinging wildly from 5 s to 60 s. There appears to be some sort of server-side caching in play, as a repeated request does run faster. Now if only I could prime that infinitely large cache with searches from various key parts of the planet, I might feel a bit better about application response time.
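Given swings like that, and the apparent caching, it may help to time the first (likely cold) request separately from repeats of the same request (likely warm), and compare medians rather than single runs. A minimal sketch using only the standard library; time_requests is a hypothetical helper, and fn stands for whatever call your app already makes to the local Overpass instance.

```python
import statistics
import time

def time_requests(fn, repeats=5):
    """Call fn() `repeats` times; report the first timing separately from the rest.

    The first call is treated as cold-cache, the later ones as warm-cache.
    """
    if repeats < 2:
        raise ValueError("need at least 2 repeats to separate cold from warm")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    warm = timings[1:]
    return {
        "cold_s": timings[0],
        "warm_median_s": statistics.median(warm),
        "warm_max_s": max(warm),
    }
```

Running the ‘around’ and ‘bbox’ variants alternately in the same session, each for a fresh location, should keep server load roughly comparable between them.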
Thanks for providing some alternatives and additional context. Just barely under the curtain at this point.