How to replicate results from OSM's Nominatim with a local server?

I’m trying to set up a nominatim instance and getting different results from the openstreetmap instance. Can anyone suggest why?

I’m doing a search for “620 S Cherokee Ln, 95240” which is a random grocery store in Lodi, CA. I have also tried this with a structured query separating out all available fields.

On my server, I get:

[
  {
    "place_id": 16081205,
    "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
    "osm_type": "way",
    "osm_id": 181942349,
    "lat": "44.8109956",
    "lon": "-123.0543064",
    "category": "highway",
    "type": "residential",
    "place_rank": 26,
    "importance": 0.0533620332646677,
    "addresstype": "road",
    "name": "Cherokee Trail Lane South",
    "display_name": "Cherokee Trail Lane South, Marion County, Oregon, United States",
    "boundingbox": [
      "44.8107879",
      "44.8112034",
      "-123.0547995",
      "-123.0538133"
    ]
  }
]

That’s in Oregon!

The same query on openstreetmap gives:

[
  {
    "place_id": 297161517,
    "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
    "osm_type": "way",
    "osm_id": 185568298,
    "lat": "38.1236628",
    "lon": "-121.2604261",
    "category": "highway",
    "type": "secondary",
    "place_rank": 26,
    "importance": 0.053388300144960675,
    "addresstype": "road",
    "name": "Cherokee Lane",
    "display_name": "Cherokee Lane, Lodi, San Joaquin County, California, 95240, United States",
    "boundingbox": [
      "38.1163212",
      "38.1310081",
      "-121.2605032",
      "-121.2602337"
    ]
  },
  {
    "place_id": 297729931,
    "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
    "osm_type": "way",
    "osm_id": 1314522145,
    "lat": "38.1331119",
    "lon": "-121.2605946",
    "category": "highway",
    "type": "secondary",
    "place_rank": 26,
    "importance": 0.053388300144960675,
    "addresstype": "road",
    "name": "Cherokee Lane",
    "display_name": "Cherokee Lane, Lodi, San Joaquin County, California, 94240, United States",
    "boundingbox": [
      "38.1310081",
      "38.1352160",
      "-121.2606770",
      "-121.2605032"
    ]
  }
]

which, although not perfectly accurate as to coordinates, is at least the right street.

I have imported the US extract from geofabrik with the “full” import style, the US postcodes data set, the Wikipedia importance rankings, and the US TIGER housenumber data. What could I be missing?

Do you apply minutely updates?

It looks like postcodes are missing in your installation. Are you sure the import has properly picked up the US postcode data set? You can check by running the query SELECT count(*) FROM location_postcode WHERE country_code = 'us'; it should return close to 40000.

Yes, I think the US postcodes are properly imported:

nominatim=> select count(*) from location_postcode where country_code = 'us';
 count
-------
 38829
(1 row)

What does /lookup?osm_ids=W185568298&addressdetails=1&format=geocodejson return on your server?

{
  "type": "FeatureCollection",
  "geocoding": {
    "version": "0.1.0",
    "attribution": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
    "licence": "ODbL"
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "geocoding": {
          "place_id": 11140183,
          "osm_type": "way",
          "osm_id": 185568298,
          "osm_key": "highway",
          "osm_value": "secondary",
          "type": "street",
          "label": "Cherokee Lane, Lodi, San Joaquin County, California, 95240, United States",
          "name": "Cherokee Lane",
          "postcode": "95240",
          "city": "Lodi",
          "county": "San Joaquin County",
          "state": "California",
          "country": "United States",
          "country_code": "us",
          "admin": {
            "level8": "Lodi",
            "level6": "San Joaquin County",
            "level4": "California"
          }
        }
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          -121.2604261,
          38.1236628
        ]
      }
    }
  ]
}

Turns out, the mystery here isn’t why your server returns the wrong address but why osm.org returns the right one.

OSM has the street only as “Cherokee Lane” but your search is for “S Cherokee Lane”. At the moment, Nominatim can handle a directional prefix that is missing from the query, but not one that is present in the query and absent from OSM. If you search for “620 Cherokee Ln, 95240”, you should get a good result on your server.

Support for directional prefixes is somewhere on the todo list but currently somewhat hampered by the fact that tagging in OSM is not very consistent.
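For a bulk job, dropping the directional before querying can be automated. Here is a minimal sketch in Python, assuming the directional is a single token directly after the house number. The function name strip_directional_prefix and both token lists are made up for illustration and are not part of Nominatim. The guard on street types is there so that streets actually named after a direction (“South St”) are left alone; it is a heuristic and will not catch every case.

```python
import re

# Directional tokens to drop. This list is an assumption; extend as needed.
_DIRECTIONALS = r"(?:North|South|East|West|NE|NW|SE|SW|N|S|E|W)"

# If the next word is a street type, the "directional" is the street name itself.
_STREET_TYPES = r"(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Dr|Drive|Ln|Lane)"

# House number, one directional token (optional trailing dot), then the street name.
_PREFIX_RE = re.compile(
    rf"^(\s*\d+\w*)\s+{_DIRECTIONALS}\.?\s+(?!{_STREET_TYPES}\b)(?=\S)",
    re.IGNORECASE,
)


def strip_directional_prefix(address: str) -> str:
    """Remove a directional prefix that directly follows the house number."""
    return _PREFIX_RE.sub(r"\1 ", address, count=1)


if __name__ == "__main__":
    print(strip_directional_prefix("620 S Cherokee Ln, 95240"))
```

Addresses that do not match the pattern are returned unchanged, so it should be safe to run over the whole input before sending queries.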

That IS a mystery. How does it end up handling it?

I have to geocode like 9 million addresses and was hoping to throw some of them at nominatim to handle, so I can’t be hand editing each one. Being a little off would be fine, but returning a result from a different state when given the correct ZIP code is not. Why does it return a result in Oregon? Is it not taking the ZIP into account?

This also happens when I add “Lodi, CA” into the query, and when I do all fields as a structured search.

If you are interested in the details: at one point there was a house number “620 S” in the database. It’s probably long gone by now but because Nominatim on osm.org continuously updates, it has retained an entry saying “620 S” could be a house number. That in turn makes Nominatim come up with the following interpretation of the query:

  • house number: 620 S
  • street: Cherokee Ln
  • postcode: 95240

The search is then lenient in that house numbers may be ignored, because a lot of them are still missing in OSM. So it will return the Cherokee Lane in Lodi as a result.

Your database is derived from newer data which doesn’t have the ‘620 S’ house number, so Nominatim never gets to the specific interpretation that yields the right result. It is really completely coincidental that this specific address worked. These kinds of glitches are something you have to live with when working with actively changing data like OSM, I’m afraid.

Why does it return a result in Oregon? Is it not taking the ZIP into account?

Because “Cherokee Trail Lane South” has all the parts that “S Cherokee Ln” has (albeit in a different order) and both postcode and housenumber matching are optional.

Your search is underspecified, which makes it difficult for Nominatim to interpret, especially with housenumbers and postcodes being only semi-reliable in OSM. You could at least add the state to the search. Or if that is not possible, do some postprocessing that checks that the postcode matches for at least the first two digits. That should give you a decent proximity check in the US. If your data allows for it, then pre-processing the data to get rid of the directional prefix is likely to improve search quality in the US as well.
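The postcode check could look roughly like this. It is a sketch in Python, assuming the search was made with addressdetails=1 so that each result carries an "address" object with a "postcode" key. The function name postcode_prefix_matches is hypothetical, and treating a result without a postcode as a mismatch is a choice made for this sketch, not something Nominatim prescribes.

```python
def postcode_prefix_matches(query_zip: str, result: dict, digits: int = 2) -> bool:
    """Check that a result's postcode shares its leading digits with the query ZIP.

    Results without a postcode cannot be verified and are treated as a mismatch.
    """
    found = (result.get("address") or {}).get("postcode")
    if not found:
        return False
    return found.strip()[:digits] == query_zip.strip()[:digits]


if __name__ == "__main__":
    # Illustrative result fragments; only the fields used by the check are shown.
    lodi = {"name": "Cherokee Lane", "address": {"postcode": "95240"}}
    no_postcode = {"name": "Cherokee Trail Lane South", "address": {}}

    for candidate in (lodi, no_postcode):
        print(candidate["name"], postcode_prefix_matches("95240", candidate))
```

Results that fail the check can be set aside for a retry with a different query form or for another geocoder, rather than being accepted as they are.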