Understanding and complying with Nominatim Usage Policy

Hello,

I am a research engineer supporting another researcher at my university. The project I am assisting her with is open, peer-reviewed science; to make it very clear, there is no commercial purpose to this project.

We have received a dataset with coordinates in a given bounded area. These coordinates are the only geospatial information the dataset contains, so I’d like to enrich it with city districts, suburbs, municipal regions, etc., so that it can be filtered for visualization purposes, e.g. ‘show all points within district 5’. Therefore, I would like to use Nominatim reverse geocoding to find the overarching regions each coordinate pair belongs to. I’ve already confirmed that a query of “https://nominatim.openstreetmap.org/reverse?lat=yyy&lon=xxx” gives me exactly what I need.
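
For concreteness, this is roughly the kind of call I have been testing (a minimal sketch, not a final script; the User-Agent string is a placeholder, and which address keys come back will vary by location):

import time
import requests

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"
# Placeholder identification; the usage policy asks for a User-Agent that identifies the application.
HEADERS = {"User-Agent": "university-research-enrichment/0.1 (contact: my-work-email@example.org)"}

def reverse_geocode(lat, lon):
    """Fetch the address breakdown for one coordinate pair."""
    params = {"lat": lat, "lon": lon, "format": "jsonv2", "addressdetails": 1}
    resp = requests.get(NOMINATIM_REVERSE, params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json().get("address", {})

# One test point in central Stockholm; keys such as "suburb" or "city_district"
# may or may not be present depending on the location.
address = reverse_geocode(59.3293, 18.0686)
print(address)
time.sleep(1.0)  # the spacing I had in mind: at most one request per second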

I have read the Nominatim Usage Policy and the rules seem clear enough, though I am unsure where exactly the limit lies. The dataset contains ~800k sets of coordinates. I would only need to get a response for each point once, at which point the task is done and dealt with – the dataset has been enriched. Spacing out each request to once every second works out to roughly 800,000 seconds, i.e. a bit over 9 days of constant running, which is not an issue for me. But I wonder if I would run afoul of the policy, specifically the clause concerning ‘Unacceptable Use – Systematic queries’. Technically, I would not be doing an exhaustive grid search of an entire area – I’d only be requesting data for coordinates I am actually submitting, and I would know before making each request that it is meaningful.

If this is considered unacceptable use, I’m not going to argue, but I would appreciate hearing of alternatives. Setting up a local database is not an option given the stated system requirements, but I wonder if there’s some way to do a one-time export of the underlying data that would make up such a local database. Or does the Weekly Planet XML file provide Nominatim reverse geocoding data?

Thanks.

I can’t say if it is “unacceptable” use but it certainly does sound like a waste of resources. Could you share what exactly the “bounded area” is so that we can provide a realistic solution rather than theoretical examples?

This exact thing exists for Photon, an easy-to-install geocoder broadly based on Nominatim data, and with ready-to-download data dumps: GitHub - komoot/photon: an open source geocoder for openstreetmap data

If I had that requirement I probably wouldn’t use anything like Nominatim at all - just whack the bits of the dataset that you’re interested in, plus some admin or similar areas, into a spatial database and you’ll be able to answer “return all X within Y” questions with regular spatial queries.
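
For example, something along these lines (an untested sketch; the file names, CRS handling and the "name" column are just assumptions about what your data might look like):

import geopandas as gpd

points = gpd.read_file("points.geojson")        # the coordinate dataset
areas = gpd.read_file("admin_areas.geojson")    # admin or similar areas
areas = areas.to_crs(points.crs)                # make sure both use the same CRS

# "return all X within Y": every point that falls inside one particular area
one_area = areas[areas["name"] == "Some District"]
hits = gpd.sjoin(points, one_area, how="inner", predicate="within")
print(len(hits), "points inside that area")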

I can. The entire area is the Stockholm Municipality. The points are bounded within the actual boundary of the area, and not just within the smallest bounding box, so I wouldn’t be requesting meaningless points, e.g. in the water in the north-eastern corner of the box. I’ll also add that some points could potentially overlap, so if they fall within a reasonable distance threshold of each other, I’d skip making that request. As of writing, I cannot rightly say how many of the points overlap.

Thank you for the suggestion! It looks promising for sure, since I was able to verify that it gives passable results like the actual Nominatim API does. However, its terms of use are even vaguer than Nominatim’s. As with the question in this thread, I wouldn’t want to overstep.

The point is that you can host Photon trivially yourself and then you don’t have to worry about rate limits or other terms.
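
Once it is running, a reverse query looks roughly like this (assuming the default port 2322 and the /reverse endpoint; check the Photon README for your actual setup):

import requests

resp = requests.get(
    "http://localhost:2322/reverse",
    params={"lat": 59.3293, "lon": 18.0686},
    timeout=10,
)
resp.raise_for_status()
features = resp.json().get("features", [])
if features:
    # Photon returns GeoJSON; the interesting fields sit in "properties"
    print(features[0]["properties"])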

If Nominatim works for your purposes, simply set up a local instance and import an extract (might as well do the whole country); this should easily work in a VM on a reasonably sized laptop.

Alternatively, use Photon as has been suggested, though for reverse geocoding it is probably less suitable.

PS: just for completeness’ sake: you will need to take licensing into account for the merged dataset.

Just to make it clear (and I’m not sure if you’re actually contesting the idea): the need to extend the dataset with these additional columns is not up for discussion. For many reasons unrelated to the topic, whatever application ends up doing this filtering cannot be hooked up to external processes, whether accessible online through an API or running locally on the host platform. The data needs to be prepared this way before being loaded into the application.

But if you were just talking about preparing the data using spatial queries: the thought had crossed my mind, as I explored OSM, that I could export the boundary polygons defining the areas and then run a spatial query based on them. The limited time I have spent on the idea suggests the following:

  • The boundary seems to be made up of smaller parts, and exporting the boundary from OSM gives me references to these smaller parts rather than the points that make up the polygon. These smaller parts may in turn consist of even smaller parts. Parsing through them recursively to get a simple list of points is doable, but requires some additional work.
    • It is entirely possible I’ve missed something as I’ve explored, so smarter ideas are welcome.
  • Navigating the web today is a mess – if there is a free spatial-query API out there that can handle ~800k queries to enrich my dataset in a similar fashion to the Nominatim or Photon APIs (i.e. simply cURL a URL with lat and lon, so I wouldn’t need to set anything up locally), I’d welcome the recommendation.

Thank you all for the replies and suggestions. Many of the suggested solutions seem to point to setting things up locally. While I’m not entirely opposed to that idea, I’d rather skip the workload and the hours of trial and error if there is already a resource available whose terms of use I would not violate by using it the way I intend. I suppose I will end up doing it locally if no clarification or better alternative presents itself.

I think what many of us are trying to say is this: You’re prepared to invest some work in writing code that iterates through your data points, issues API calls, and enriches the data with the API call results. Doing the enriching locally would be marginally more work for you, but it would be vastly more sustainable in terms of our project resources.

It won’t kill us if you run queries for 10 days but what if the next researcher does it the same way, and the next, and the next?

It would be a much nicer success story for us if, at the end of your project, you could proudly say: “I have downloaded OSM data and enriched a couple million data points and here’s how I did it so other researchers can do the same” - compared to “I’ve occupied OSM servers for 10 days but if I had had 10 times the amount of data this would not have been feasible”.

Here’s an easy method to download all administrative boundaries in Stockholm county as GeoJSON:

curl -o boundaries.geojson \
   -g https://postpass.geofabrik.de/api/0.2/interpreter   \
   --data-urlencode "data=
    SELECT boundary.*
    FROM postpass_polygon boundary, postpass_polygon stockholm
    WHERE boundary.tags->>'boundary'='administrative'
    AND st_contains(stockholm.geom, boundary.geom)
    AND stockholm.osm_type='R'
    AND stockholm.osm_id=54391"

You can also run an equivalent query in Overpass Turbo: overpass turbo

From there, it would be really easy to load your point file and these boundaries into a spatial database (or maybe even QGIS can do the job) and add the admin information to each point.
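
As a rough sketch of that last step (illustrative only; the file names, the point columns and the boundary properties are assumptions, so adjust them to whatever your data actually contains):

import geopandas as gpd
import pandas as pd

boundaries = gpd.read_file("boundaries.geojson")   # output of the curl command above
raw = pd.read_csv("points.csv")                    # assumed columns: "lat", "lon"
points = gpd.GeoDataFrame(
    raw,
    geometry=gpd.points_from_xy(raw["lon"], raw["lat"]),
    crs="EPSG:4326",
)

# Attach the boundaries each point falls inside. Nested admin levels mean a point
# can match several boundaries, i.e. one output row per containing boundary.
enriched = gpd.sjoin(points, boundaries, how="left", predicate="within")
enriched.drop(columns="geometry").to_csv("points_enriched.csv", index=False)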

For these amounts of data I recommend running a local instance.

The easiest way would be to spin up a Docker instance:
https://hub.docker.com/r/mediagis/nominatim/

I reverse geocoded 650k POIs and that took about half an hour.
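
Once the container is up, the calls look the same as against the public API, just pointed at localhost and without any need for rate limiting (port 8080 here is an assumption based on the image docs; use whatever port you mapped):

import requests

resp = requests.get(
    "http://localhost:8080/reverse",
    params={"lat": 59.3293, "lon": 18.0686, "format": "jsonv2", "addressdetails": 1},
    timeout=10,
)
print(resp.json().get("address", {}))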

Yes, I am prepared to spend some time coding the script which would end up enriching the dataset, provided that I have the necessary components in a nice, neat file on my device. I am very proficient at that part, and I know in advance that I’d spend no more than a few hours doing that.

I am far less proficient at setting up systems like these. The usual course of events involves a lot of head scratching and ends with a lot of headaches. Could I use the practice, and would a local system be beneficial for similar tasks in the future, whether for me or for colleagues? Maybe, but as of writing I’m having a hard time justifying it to myself and to the researcher I’m collaborating with. This is my problem, of course – not yours.

I recognise the validity of the point you’re making. If that is the verdict, then I will respect it, as I said I would.

I appreciate the suggested cURL command, even if it grabs everything from Stockholm County rather than the Municipality. I should be able to tweak it myself or simply exclude parts of the result manually. From what I can see of the resulting file, it should suffice.

I’ll mark this as resolved, as my question has been answered. And I apologise if I came across as obstinate – it was not my intention.

Have you seen anyone offering Nominatim with a prefilled global database?

Sadly, I am pretty sure that my computer cannot really do this for now.

(maybe the next one will be capable enough)

I don’t know of any prefilled Nominatim downloads, but the Photon download for Sweden is here: Index of /public/extracts/by-country-code/se/

@Suminro use ID 398021 for the Stockholm Municipality.

And Index of /public/ seems to contain global data.

Already done, and I just finished the spatial query for enriching the dataset. :slight_smile:
Once again, thank you for the discussion, answers, and suggestions.

