Need help importing custom data into Nominatim

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="libosmium/2.20.0">
  <node id="123456789" version="1" timestamp="2024-09-18T23:11:44Z" user="me" lat="45.1234" lon="8.1234">
    <tag k="attr:housenumber" v="12345"/>
    <tag k="attr:street" v="Dummy Street Name"/>
    <tag k="attr:postcode" v="123456"/>
    <tag k="attr:city" v="Dummy City Name"/>
    <tag k="attr:state" v="CA"/>
    <tag k="attr:country" v="United States"/>
    <tag k="attr:country_code" v="us"/>
  </node>
</osm>

I'm trying to import the above XML (with real values) into my Nominatim instance. I import the file with nominatim add-data --file test.xml and see Processed 1 nodes in 0s - 1/s in the output.

However, when I run the indexing step, all ranks show 0/0 and the address I'm trying to import never makes it into the placex table.

I'm using pyosmium (writer = osmium.SimpleWriter('filename.xml')) to generate the XML file. A user makes a request with a body of the form

{
        "housenumber": str,
        "street": str,
        "postcode": str,
        "city": str,
        "state": str
}

and my Python script generates the above XML from it.

The address tags in OSM have a prefix addr:*, not attr:*.
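
A quick pre-import check can catch this kind of typo. The following is a minimal sketch using only the Python standard library; the helper name unexpected_tag_keys and the KNOWN_ADDR_KEYS list are illustrative assumptions, not part of Nominatim or pyosmium, and the key list is not exhaustive.

```python
import xml.etree.ElementTree as ET

# Common OSM address keys. Illustrative only, not an exhaustive list.
KNOWN_ADDR_KEYS = {
    "addr:housenumber", "addr:street", "addr:place",
    "addr:postcode", "addr:city", "addr:state", "addr:country",
}

def unexpected_tag_keys(osm_xml: str) -> list[str]:
    """Return every tag key in the OSM XML that is not in KNOWN_ADDR_KEYS."""
    root = ET.fromstring(osm_xml)
    keys = [tag.get("k") for tag in root.iter("tag")]
    return [k for k in keys if k not in KNOWN_ADDR_KEYS]

# Hypothetical sample with one misspelled prefix (attr: instead of addr:).
sample = """<osm version="0.6">
  <node id="1" version="1" lat="45.1234" lon="8.1234">
    <tag k="attr:housenumber" v="12345"/>
    <tag k="addr:street" v="Dummy Street Name"/>
  </node>
</osm>"""

print(unexpected_tag_keys(sample))
```

An empty list means every key in the file is one of the listed address keys; anything else is worth a second look before running add-data.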

Thank you. I've fixed that issue, bumped the version number, and saw the data get indexed this time. However, querying /search on the Nominatim server still returns nothing.

I originally imported all the US data from Geofabrik and can run queries against it, but I'm unable to add custom data.

(side note: I used your previous response here to this question as my basis before coming here.)

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="libosmium/2.20.0">
  <node id="123456789999" version="1" timestamp="2024-09-20T00:42:46Z" user="osm_imports" lat="0" lon="0">
    <tag k="addr:housenumber" v="12345"/>
    <tag k="addr:road" v="Some Road Name"/>
    <tag k="addr:postcode" v="12345"/>
    <tag k="addr:city" v="City Name"/>
    <tag k="addr:state" v="TN"/>
  </node>
</osm>

Quick update:
I have the following code, using Osmium 4.0.0, to generate an XML file with the appropriate tags.

In short, when a user makes a request to a FastAPI route, there’s a Pydantic model that gets populated with the incoming request body. All references to data are that Pydantic model.

For now, I'm hardcoding a few of the values, and I have commented out category and type because writer.add_node(...) threw errors when they were uncommented.

This will generate the XML file that I’ve shown in my previous reply.

Currently I see Processed 1 nodes in 0s - 1/s from the command nominatim add-data --file {filename.xml}, and I also get Done 1/1 on Rank 30 when nominatim index is run. However, I am unable to query any of the data included in the file. Search and Lookup both return nothing.

Ideally, I would add the data programmatically without having to generate an XML file. Is there a way to import data into Nominatim without using a file?

My Nominatim version is 4.4.0

from datetime import datetime, timezone
from uuid import uuid4

import osmium

class NewEntryActions:

    @classmethod
    async def create_new_entry(cls, data):
        filename = f'{uuid4()}.osm.xml'

        with osmium.SimpleWriter(filename) as writer:
            if isinstance(data, list):
                for entry in data:
                    writer.add_node(await cls.populate_entry(entry))
            else:
                writer.add_node(await cls.populate_entry(data))

        return {
            "status": 200,
            "filename": filename
        }


    @classmethod
    async def populate_entry(cls, data):
        return osmium.osm.mutable.Node(
            timestamp = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ'),
            id = 123456789999,
            uid = 1234,
            changeset = 7563,
            user = data.user,
            version = data.version,    # default 1
            tags = cls.set_tags(data),
            location = (data.lat, data.lon),    # default (0, 0)
            visible = True,
            # category = data.category,    # default "place"
            # type = data.addresstype    # default "apartment"
        )

    @staticmethod
    def set_tags(data):
        tagList = []
        for key, value in data.dict().items():
            if key in ['road', 'housenumber', 'city', 'state', 'postcode', 'country']:
                tagList.append(osmium.osm.Tag(f"addr:{key}", str(value)))
        return tagList

Nominatim is first and foremost a geocoder for OSM data. That comes with a couple of assumptions. The most relevant one here is that Nominatim not only saves address points but also has the information for cities and streets in its database. It relies on this to save space: instead of storing the full address on each address point, it takes most of the information from the street the point is attached to.

This has some consequences for sneaking in additional external data. One of them is that when you have a street address (btw, the tag is addr:street, not addr:road), there must already be a street object with the same name in the database. If there isn't one, Nominatim will simply find the closest street and use that as the reference. The result is that your housenumber cannot be found under the addr:street you put in. You can, to some extent, work around that by using addr:place instead of addr:street, but that would be a very evil hack and there is no guarantee that it doesn't have strange side effects.
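
To make the tag naming concrete, here is a minimal sketch of how the request fields from the question could be mapped to OSM address keys. The names FIELD_TO_OSM_KEY and build_addr_tags and the use_place switch are hypothetical and exist only for illustration. The sketch only builds a plain dict; as far as I know pyosmium's mutable Node accepts a dict for tags, but treat that as an assumption and check it against your pyosmium version.

```python
# Maps the request field names used in the question to OSM address keys.
# "road" becomes "addr:street", following the correction above.
FIELD_TO_OSM_KEY = {
    "housenumber": "addr:housenumber",
    "road": "addr:street",
    "postcode": "addr:postcode",
    "city": "addr:city",
    "state": "addr:state",
    "country": "addr:country",
}

def build_addr_tags(entry: dict, use_place: bool = False) -> dict:
    """Build a dict of addr:* tags from a request-like dict.

    With use_place=True the road name is written to addr:place instead of
    addr:street. That is the workaround described above, with the same
    caveat about possible side effects.
    """
    tags = {}
    for field, osm_key in FIELD_TO_OSM_KEY.items():
        value = entry.get(field)
        if value is None or value == "":
            continue  # skip fields the request did not provide
        if field == "road" and use_place:
            osm_key = "addr:place"
        tags[osm_key] = str(value)
    return tags

# Hypothetical request body, using the placeholder values from the thread.
request = {
    "housenumber": "12345",
    "road": "Some Road Name",
    "postcode": "12345",
    "city": "City Name",
    "state": "TN",
}
print(build_addr_tags(request))
print(build_addr_tags(request, use_place=True))
```

Even with correct keys, the addr:street variant only helps if a street with that exact name already exists near the node's coordinates in the database.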

The question of supporting external data sets in Nominatim comes up now and then, and I'm not opposed to making that possible. However, it is not one of the core needs of the OSM mapping community (of which there are many) and is therefore unlikely ever to reach the top of the TODO list unless somebody pays for the development.