Need help importing custom data into Nominatim

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="libosmium/2.20.0">
  <node id="123456789" version="1" timestamp="2024-09-18T23:11:44Z" user="me" lat="45.1234" lon="8.1234">
    <tag k="attr:housenumber" v="12345"/>
    <tag k="attr:street" v="Dummy Street Name"/>
    <tag k="attr:postcode" v="123456"/>
    <tag k="attr:city" v="Dummy City Name"/>
    <tag k="attr:state" v="CA"/>
    <tag k="attr:country" v="United States"/>
    <tag k="attr:country_code" v="us"/>
  </node>
</osm>

I’m trying to import the above XML (with real values) into my Nominatim instance. I import the file with nominatim add-data --file test.xml, and the output includes Processed 1 nodes in 0s - 1/s.

However, when I run the indexing step, all ranks show 0/0 and the address I’m trying to import never makes it into the placex table.

I’m using pyosmium (writer = osmium.SimpleWriter('filename.xml')) to generate the XML file. A user will make a request with

{
        "housenumber": str,
        "street": str,
        "postcode": str,
        "city": str,
        "state": str
}

and the python script I have will generate the above XML.

The address tags in OSM have a prefix addr:*, not attr:*.
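
One way to avoid this kind of prefix typo is to build the tag keys in a single place. Below is a minimal sketch, assuming the request arrives as a plain dict with the fields listed in the question. The helper name `to_addr_tags` and the `ADDRESS_FIELDS` tuple are hypothetical and not part of pyosmium or Nominatim.

```python
# Hypothetical helper: map request fields to OSM address tags.
# Only the "addr:" prefix comes from OSM tagging conventions; the field
# names are the ones from the request body shown in the question.
ADDRESS_FIELDS = ("housenumber", "street", "postcode", "city", "state")


def to_addr_tags(request: dict) -> dict:
    """Return {"addr:<field>": value} for every known, non-empty field."""
    tags = {}
    for field in ADDRESS_FIELDS:
        value = request.get(field)
        if value is None or str(value).strip() == "":
            continue  # skip missing fields instead of writing empty tags
        tags[f"addr:{field}"] = str(value).strip()
    return tags


if __name__ == "__main__":
    print(to_addr_tags({"housenumber": 12345, "street": "Dummy Street Name"}))
```

The resulting dict would then still have to be handed to whatever writes the node; how your pyosmium version expects tags to be passed is worth checking against its documentation.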

Thank you. I’ve fixed that issue, bumped the version number, and saw the data index this time. However, querying /search on the nominatim server still returns nothing.

I did originally import all the US data from Geofabrik and can run queries against it, but I’m unable to add custom data.

(side note: I used your previous response here to this question as my basis before coming here.)

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="libosmium/2.20.0">
  <node id="123456789999" version="1" timestamp="2024-09-20T00:42:46Z" user="osm_imports" lat="0" lon="0">
    <tag k="addr:housenumber" v="12345"/>
    <tag k="addr:road" v="Some Road Name"/>
    <tag k="addr:postcode" v="12345"/>
    <tag k="addr:city" v="City Name"/>
    <tag k="addr:state" v="TN"/>
  </node>
</osm>

Quick update:
I have the following code, using Osmium 4.0.0, to generate an XML file with the appropriate tags.

In short, when a user makes a request to a FastAPI route, there’s a Pydantic model that gets populated with the incoming request body. All references to data are that Pydantic model.

For now, you can see that I’m hardcoding a few of the values and have commented out category and type as errors were being thrown with the writer.add_node(...) when they were uncommented.

This will generate the XML file that I’ve shown in my previous reply.

Currently I see Processed 1 nodes in 0s - 1/s with the command nominatim add-data --file {filename.xml}, and I also get Done 1/1 on Rank 30 when nominatim index is run. However, I am unable to query any of the data that is included in the file. Search and Lookup both return nothing.
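
For reference, the two commands mentioned above can be chained from Python. This is only a sketch: it assumes the `nominatim` executable is on the PATH and that the script runs from the Nominatim project directory. The function names `build_import_commands` and `run_import` are hypothetical.

```python
import subprocess


def build_import_commands(xml_path: str) -> list:
    """Return the two CLI invocations described above, in order."""
    return [
        ["nominatim", "add-data", "--file", xml_path],
        ["nominatim", "index"],
    ]


def run_import(xml_path: str, runner=subprocess.run) -> None:
    """Run add-data, then index; stop at the first failing command."""
    for command in build_import_commands(xml_path):
        # check=True makes subprocess.run raise CalledProcessError
        # when the command exits with a non-zero status.
        runner(command, check=True)
```

Passing the runner as a parameter is just a convenience so the sequence can be exercised without a Nominatim installation.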

Ideally, it would be preferred to add the data programmatically without having to generate an XML file. Is there no way to import data into Nominatim without having to use a file?

My Nominatim version is 4.4.0

from datetime import datetime, timezone
from uuid import uuid4

import osmium


class NewEntryActions:

    @classmethod
    async def create_new_entry(cls, data):
        filename = f'{uuid4()}.osm.xml'

        with osmium.SimpleWriter(filename) as writer:
            if isinstance(data, list):
                for entry in data:
                    writer.add_node(await cls.populate_entry(entry))
            else:
                writer.add_node(await cls.populate_entry(data))

        return {
            "status": 200,
            "filename": filename
        }


    @classmethod
    async def populate_entry(cls, data):
        return osmium.osm.mutable.Node(
            timestamp = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ'),
            id = 123456789999,
            uid = 1234,
            changeset = 7563,
            user = data.user,
            version = data.version,    # default 1
            tags = cls.set_tags(data),
            location = (data.lat, data.lon),    # default (0, 0)
            visible = True,
            # category = data.category,   # default "place"
            # type = data.addresstype    #default "apartment"   
        )

    @staticmethod
    def set_tags(data):
        tagList = []
        for key, value in data.dict().items():
            if key in ['road', 'housenumber', 'city', 'state', 'postcode', 'country']:
                tagList.append(osmium.osm.Tag(f"addr:{key}", str(value)))
        return tagList

Nominatim is first and foremost a geocoder for OSM data. That comes with a couple of assumptions. The most relevant one here is that Nominatim saves not only address points but also the cities and streets in its database. It then relies on this to save space: instead of storing the full address information on each address point, it takes most of that information from the attached street.

This has some consequences for sneaking in additional external data. One of them is that when you have a street address (btw, the tag is addr:street, not addr:road), there must already be a street object with the same name in the database. If there isn’t one, Nominatim will simply find the closest street and use that as the reference. The result is that your housenumber cannot be found under the addr:street you put in. You can, to some extent, work around that by using addr:place instead of addr:street, but that would be a very evil hack, and there are no guarantees that it doesn’t have strange side effects.
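
To make the workaround concrete, here is a rough sketch of the tag rewrite it implies, assuming the tags are held in a plain dict before being written out. The function name `street_to_place` is hypothetical, and the caveat above about side effects applies in full.

```python
def street_to_place(tags: dict) -> dict:
    """Return a copy of tags with addr:street renamed to addr:place."""
    converted = dict(tags)  # work on a copy, never mutate the caller's dict
    if "addr:street" in converted:
        # Move the street name under addr:place; all other tags stay as-is.
        converted["addr:place"] = converted.pop("addr:street")
    return converted
```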

The question about supporting external data sets in Nominatim comes up now and then, and I’m not opposed to making that possible. However, it is not one of the core needs of the OSM mapping community (of which there are many) and is therefore unlikely to ever make it to the top of the TODO list unless somebody pays for the development.


Hi Ryan, I am currently looking into using my own custom data with Nominatim. Do you have any updates since your last attempt? Were you able to find a workaround?

Thanks

Hey StudentHere, I “kinda got a solution”, and by that I mean it’s more of a band-aid that will fail some day far, far in the future (I’ll touch on that further down).

A very high level overview:

I built a FastAPI route that accepts a Pydantic object as a body param:

    housenumber: Union[str, int]
    street: str
    postcode: Union[str, int]
    city: str
    state: str
    lat: float
    lon: float
    country: Optional[str] = Field(default="US")
    country_code: Optional[str] = Field(default="us")
    addresstype: Optional[str] = Field(default="residential")
    buildingtype: Optional[str] = Field(default="apartment")
    visible: Optional[bool] = Field(default=True)
    category: Optional[str] = Field(default="place")

    def create_full_str(self):
        return f"{self.housenumber} {self.street}, {self.state}, {self.country} {self.postcode}"

From there, I do a lookup in Nominatim to see if it already exists. If it does, return the existing addr.
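
A lookup like this could be issued against the /search endpoint with structured parameters. The sketch below only builds the request URL with the standard library. The base URL is an assumption (a local instance), and the function name `build_search_url` is hypothetical; which parameters your Nominatim version accepts should be checked against its API documentation.

```python
from urllib.parse import urlencode


def build_search_url(base_url, housenumber, street, city, state, postcode):
    """Build a structured /search URL for one address."""
    params = {
        "street": f"{housenumber} {street}",  # house number plus street name
        "city": city,
        "state": state,
        "postalcode": postcode,
        "format": "jsonv2",
        "addressdetails": 1,  # ask for the address breakdown in the response
        "limit": 1,
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local instance; replace with your own server address.
    print(build_search_url("http://localhost:8080", 12345, "Some Road Name",
                           "City Name", "TN", "12345"))
```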

If that addr doesn’t exist:

  1. I create a new Node AND place (as suggested by Lonvia above). Not every object in OSM is a node, but I found that Nodes work best for this. Programmatically, a Node can be created like so:
import osmium

with osmium.SimpleWriter(filename) as writer:
    newEntry = ...  # placeholder: do all my processing and formatting here
    writer.add_node(newEntry)
    newPlace = ...  # placeholder: copy newEntry and change 'addr:street' to 'addr:place'
    writer.add_node(newPlace)
  2. Nominatim requires a lat/lon to be attached to every Node, so I look up the full address. Once again, if the lookup returns everything (street number, street name, city, state, zip), I return that object (before creating the newEntry). If it returns partial data, I take that lat/lon and place it in the newEntry. If it returns nothing, I strip the street number and search again with just the street name, city, state, and zip. This can be VERY expensive, as some addresses require up to 3 lookups. If nothing is found after the 3rd, I reject the address altogether.

  3. Take the bounding box, do some maths in a different library, and get the lat/lon coordinates. (This isn’t super accurate and can be off by 1/4 mile or so, but it’s more than accurate enough for my needs.)

  4. From there, all the newEntry objects are added as XML entries into a file. I take the current timestamp, entry_id = round(time.time() * 1000), and set that as the ID. The generated number is large enough to satisfy Nominatim and prevent a collision with pre-existing entries, BUT many years down the road there will be collisions. (This is the failure I mentioned above.)

  5. I upload the newly generated XML file to S3. My EC2 instance has a cronjob that watches for new files in the bucket, grabs each file, automatically runs nominatim add-data --file <FILE> and nominatim index, then deletes the file.
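
The fallback lookup described in the steps above can be sketched roughly as follows. This is not the code from the linked repository. The `search` callable stands in for whatever performs the Nominatim query and is assumed to return a dict with `lat` and `lon` or None. The third query (city, state, zip only) is an assumption, since the post does not say what the third lookup uses. All names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def candidate_queries(addr: dict) -> list:
    """Return up to three queries, from most to least specific."""
    tail = f"{addr['city']}, {addr['state']} {addr['postcode']}"
    return [
        f"{addr['housenumber']} {addr['street']}, {tail}",  # full address
        f"{addr['street']}, {tail}",                        # street number stripped
        tail,                                               # assumed third fallback
    ]


def resolve_location(
    addr: dict, search: Callable[[str], Optional[dict]]
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from the first query that yields a hit, else None."""
    for query in candidate_queries(addr):
        hit = search(query)
        if hit is not None:
            # Convert explicitly in case the service returns strings.
            return float(hit["lat"]), float(hit["lon"])
    return None  # nothing found after the last lookup: reject the address
```

Each failed query costs one more request, which is where the expense mentioned in the steps comes from.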

Once again, I want to stress that this isn’t an end-all-be-all solution to the question of “how does one add custom data”, but it’s a temporary band-aid that will some day no longer be a “solution”.

Lastly, here’s a quick dump of the code I wrote (with edits) that I’m allowed to share: GitHub - rgreen1207/Nominatim-Entries

Edits:
fixed some typos and fixed the formatting of the code blocks


Much appreciated! Thank you