New natural language search interface for OSM (public demo + open source)

Hi everyone,

My name is Lynn, and together with my team at Deutsche Welle I develop tools that make journalists’ lives easier. Geolocation is an essential part of verifying videos and images. OSM is a treasure trove for this work, but many journalists find it challenging to use.

To address that, I’ve been working on an open-source tool called SPOT that makes it possible to search OpenStreetMap using natural language scene descriptions, e.g. “find an Italian restaurant next to a bus station in Paris”. Instead of learning Overpass QL, you just describe what you’re looking for. SPOT translates the description into an app-specific OSM query and shows the results on an interactive map, with Google Street View and other external services integrated so that you can check the location directly.

The tool is not only open source, but we also have a public beta for anybody who wants to test it:

:point_right: Demo: https://www.findthatspot.io/
:point_right: Code: https://github.com/dw-innovation/kid2-spot (a collection of repositories for the Spot application)

The project started as part of research for investigative journalists, but I think that the use cases go far beyond that and that people using OSM for a variety of other tasks could also benefit from it.

Since OSM already powers so many amazing tools, I’d love to hear your thoughts:

  • Do you know of any existing OSM projects where this kind of search interface could be useful (e.g. osm.org search, tag explorers, other community tools)?
  • Would anyone be interested in collaborating, whether on code contributions, reviewing or extending the OSM tag bundles (a tag-to-natural-language mapping) we created, or helping us explore integration options?
  • More generally: what do you think of the approach? Are there obvious gaps or directions we should explore?

I would be really happy to connect with others who are excited about making OSM more accessible to non-technical users.

Cheers,
Lynn

(PS: if you’re curious about the technical details, we just published a paper describing the system and evaluation, though certain information is already outdated: https://aclanthology.org/2025.acl-demo.8.pdf)

Why is there a login/account required?

I see this being useful on osm.org as the current search engine is really picky about object names - e.g. if you type “McDonalds” (without an apostrophe) it will only show you objects named McDonalds, and no McDonald’s restaurants.

Fantastic idea! I tried three queries to stress test your system (and I haven’t looked at the details of the technical implementation to understand what should actually be possible and what shouldn’t).

“Pubs that are open where kids are allowed”

I didn’t get any results. “That are open” was interpreted as a reference to outdoor seating, not to opening hours, and the phrase “kids are allowed” made the search look for objects dual-tagged as pubs and as playgrounds instead of the min_age tag. There seems to be no concept of the current time so no way of checking opening_hours against it.
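Checking opening_hours against the current time could be sketched roughly as below. This is a minimal illustration and not how SPOT works: the function name `is_open` is made up, and it only understands simple rules of the form `Mo-Fr 10:00-22:00; Sa 12:00-23:00`. The real opening_hours syntax is much richer (public holidays, `24/7`, `off`, ranges past midnight), so a proper parser would be needed in practice.

```python
from datetime import datetime

# OSM day abbreviations, ordered so that the index matches datetime.weekday() (Monday = 0).
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def is_open(opening_hours: str, when: datetime) -> bool:
    """Return True if `when` falls inside one of the simple rules in `opening_hours`.

    Assumption: every rule looks like 'Mo-Fr 10:00-22:00', 'Sa 12:00-23:00'
    or 'Sa,Su 12:00-23:00'. Anything more complex is out of scope here.
    """
    today = DAYS[when.weekday()]
    now = when.strftime("%H:%M")  # zero-padded, so string comparison works
    for rule in opening_hours.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        days_part, hours_part = rule.split(" ", 1)
        if "-" in days_part:
            # Day range such as Mo-Fr (no wrap-around like Sa-Mo in this sketch).
            first, last = days_part.split("-")
            days = DAYS[DAYS.index(first):DAYS.index(last) + 1]
        else:
            # Single day or comma-separated list such as Sa,Su.
            days = days_part.split(",")
        if today not in days:
            continue
        start, end = hours_part.strip().split("-")
        if start <= now < end:
            return True
    return False


rules = "Mo-Fr 10:00-22:00; Sa 12:00-23:00"
print(is_open(rules, datetime.now()))
```

A "pubs that are open" query would then mean filtering amenity=pub results with a check like this at query time, which requires the system to know the current local time of the search area.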

“Playgrounds with climbing walls”

This requires a spatial query (playground=climbingwall inside a leisure=playground) which didn’t work. Instead it found a climbing wall that isn’t inside a playground at all, but in a sports centre, because it went looking for things that had both a playground=* tag (in this case playground=climbingwall but it would have matched any) and a sport=climbing tag. It didn’t find any of the climbing walls inside playgrounds.
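The difference between the two interpretations can be illustrated with a small sketch. Everything here is made up for illustration (the `Feature` class, the toy coordinates, the helper names) and is not SPOT's implementation; the containment check is a plain ray-casting test rather than a real spatial index.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A toy OSM-like feature: tags plus either a point or a polygon (hypothetical structure)."""
    name: str
    tags: dict
    point: tuple = None                           # (x, y) for node-like features
    polygon: list = field(default_factory=list)   # [(x, y), ...] for area-like features


def point_in_polygon(point, polygon):
    """Ray casting: toggle `inside` for each polygon edge crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dual_tag_match(features):
    """The behaviour described in the post: one object with both a playground=* and a sport=climbing tag."""
    return [f.name for f in features
            if "playground" in f.tags and f.tags.get("sport") == "climbing"]


def containment_match(features):
    """The intended reading: a playground=climbingwall feature located inside a leisure=playground area."""
    playgrounds = [f for f in features if f.tags.get("leisure") == "playground" and f.polygon]
    walls = [f for f in features if f.tags.get("playground") == "climbingwall" and f.point]
    return [w.name for w in walls
            if any(point_in_polygon(w.point, p.polygon) for p in playgrounds)]


features = [
    Feature("park playground", {"leisure": "playground"},
            polygon=[(0, 0), (10, 0), (10, 10), (0, 10)]),
    Feature("wall in playground", {"playground": "climbingwall"}, point=(5, 5)),
    Feature("wall in sports centre", {"playground": "climbingwall", "sport": "climbing"},
            point=(50, 50)),
]

print(dual_tag_match(features))
print(containment_match(features))
```

With this toy data the dual-tag reading only finds the sports centre wall, while the containment reading only finds the wall that actually sits inside the playground.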

“An Italian restaurant near a tram stop”

This just worked perfectly.

I know this isn’t your original use case, but it would be amazing to get to a point where queries like this just work in an OSM-based mobile app or on osm.org. That would really show off how rich our data is, and further incentivise mappers to record some of this data.

There are no McDonald’s restaurants because there is a global fast food chain which prevents people from opening a McDonald’s restaurant via trademark protection.

Dear all, thank you so much for your replies! I will try to answer your questions to the best of my abilities.

Why is there a login/account required?

There are two reasons why we decided to require people to log in to use the demo:

  1. To get some rough anonymised user info (for example, how many individual users we have per day). Aside from statistics, the info on how each user interacts with the system is used for improvement via benchmarking, error tracking and manual analyses of user prompts.
  2. The main purpose is to act as a minor deterrent against abuse, just to make it a tiny bit more difficult (people could of course use throwaway accounts, but we figured it is better than nothing).

I see this being useful on osm.org as the current search engine is really picky about object names - e.g. if you type “McDonalds” (without an apostrophe) it will only show you objects named McDonalds, and no McDonald’s restaurants.

We actually discussed this case, but I think we are not handling it yet either. A simple solution would be to strip any trailing s or 's, and to search for the remaining partial string “McDonald”.
Thank you very much for bringing this issue up! I will make sure we include this in a future update!
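As a rough sketch of that idea (not the code SPOT uses; `normalize_name` and `names_match` are hypothetical helper names), both the query and the candidate name can be reduced to a comparison key before matching:

```python
import re


def normalize_name(name: str) -> str:
    """Reduce a name to a comparison key: lowercase, no apostrophes, no trailing 's'."""
    key = name.casefold()
    # Drop straight and typographic apostrophes, so "McDonald's" and "McDonald’s" agree.
    key = re.sub(r"['’`]", "", key)
    # Collapse runs of whitespace.
    key = " ".join(key.split())
    # Strip one trailing "s" (possessive or plural), but leave very short names alone.
    if len(key) > 3 and key.endswith("s"):
        key = key[:-1]
    return key


def names_match(query: str, candidate: str) -> bool:
    """Partial match: the normalized query must occur inside the normalized candidate."""
    return normalize_name(query) in normalize_name(candidate)


print(normalize_name("McDonalds"))
print(names_match("McDonalds", "McDonald’s"))
```

The obvious trade-off is over-matching: any name that merely ends in "s" loses its last letter as well, so this kind of key is better suited to ranking candidates than to exact filtering.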

Do you know how I could get in touch with someone working on the OSM search engine?

I know this isn’t your original use case, but would be amazing to get to a point where queries like this just work in an OSM-based mobile app or on osm.org. That would really show off how rich our data is, and further incentivise mappers to record some of this data.

These are all excellent suggestions! It would indeed be great if we could extend the functionality to be able to handle all of these examples.

We currently do not have tags for opening hours or kid friendliness, since these tags are not high priority for location verification. We are pushing our little model quite a bit already, and wanted to focus on robustness for this specific use case.
We should be able to deal with the climbing wall example, but we simply did not know about this tag. We all started as novices in OSM, so the tag list we created is certainly not complete. I will make sure to add this tag for future releases, but there are still many more tags missing I’m afraid.

You can have a look at the tag bundles we created to check what the tool should be able to do (we are always happy about suggestions): https://github.com/dw-innovation/kid2-spot-datageneration/blob/main/datageneration/data/Spot_primary_keys_bundles.xlsx

It would be great if at some point the tool could handle any OSM tag with high robustness, but we are a tiny team with me doing the bulk of the model and tag database work, so getting there is quite a mammoth task.

Do you know how I could get in touch with someone working on the OSM search engine?

You mean Nominatim? That’s @lonvia .
Nominatim can search for addresses and names (New York, McDonalds, …).

For searches that put different elements in relation to one another or query certain tags on elements (playgrounds with climbing walls, wheelchair-accessible toilets), the classical engine would be Overpass. However, nowadays I’d suggest you have a look at QLever as the more standardized solution - you can query OSM data with SPARQL and thus, of course, also cross-reference it with Wikidata.

If it’s not already based on that.
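For readers who have not used Overpass, a relational query of the kind mentioned above ("an Italian restaurant near a tram stop") might look roughly like the Overpass QL built below. This is only a sketch: the helper `build_near_query`, the 100 m default radius and the choice to identify the area by name alone are assumptions for illustration, and a bare name such as "Paris" can match more than one area.

```python
def build_near_query(area_name, target_filter, anchor_filter, radius_m=100):
    """Build an Overpass QL query for target features within radius_m metres of anchor features.

    The anchors are restricted to the named area; the targets are then selected
    with the `around` filter relative to the stored anchor set.
    """
    return f"""[out:json][timeout:60];
area["name"="{area_name}"]->.searchArea;
nwr{anchor_filter}(area.searchArea)->.anchors;
nwr{target_filter}(around.anchors:{radius_m});
out center;"""


query = build_near_query(
    "Paris",
    target_filter='["amenity"="restaurant"]["cuisine"="italian"]',
    anchor_filter='["railway"="tram_stop"]',
)
print(query)
```

The printed query can be pasted into overpass turbo to try it out. Writing this by hand requires knowing both the query language and the right tags, which is the gap a natural language interface tries to close.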

<shameless plug>
Apart from the standard Overpass approach, queries like this can also be done in PostgreSQL on the “Postpass” service, eg overpass turbo - no natural language of course :wink:
</shameless plug>

So basically, you are describing what you can do today by asking ChatGPT to generate an overpass query.

Except open source and not run by a USA corporation?

Almost certainly it would be an open source wrapper that calls one of those corporate LLMs that you don’t like. Or, an open source model run on hilariously underpowered hardware for an LLM.

I don’t think anyone has yet said “this is awesome”, so: this is awesome.

Not sure what model this uses, but when I search for an ‘outdoor gym’ it uses the wrong tags.

Even though the Wiki is pretty explicit about it:

https://wiki.openstreetmap.org/wiki/Tag:leisure%3Dfitness_station

Thanks for all the feedback once again!

nowadays I’d suggest you have a look at QLever as the more standardized solution - you can query OSM data with SPARQL and thus, of course, also cross-reference it with Wikidata.

Thanks a bunch! We actually do use Nominatim for area names, but I don’t think I tried it with (brand) names yet. I gotta have a look if it works better than our current approach.

Thanks a lot for the QLever suggestion! I will make sure to check it out. One bottleneck in terms of speed is the OSM PostgreSQL search, so it would be great to find a more efficient solution than what we developed.

Apart from the standard Overpass approach, queries like this can also be done in PostgreSQL on the “Postpass” service, eg overpass turbo

Same as with QLever. I will have a look at the project and see if it could fit! We decided against using the Overpass API because some things we envisioned could not be realised there, so setting up our own was the faster way, but it would be even better if there was a more efficient solution that works for our use case.

So basically, you are describing what you can do today by asking ChatGPT to generate an overpass query.

I don’t want to go into too much detail about the benefits of training your own specialised model compared to relying on the large companies, but there are many. For example: you free yourself from dependence on the performance of GPT (what do you do if something does not work, since it’s not perfect?); you are not affected by changes due to updates; you can fine-tune the model to your very specific niche needs; you can host the tool on local servers where you have control over data privacy; the open-source approach allows for collaboration rather than reliance on large corporations; and your carbon footprint is lower, since you can use a smaller, more specialised model. This is just off the top of my head.
While you can get a decent Overpass draft from GPT, it does not perform equally well on all the factors that are important to us.
You are right about one thing, though: while we didn’t just set up a wrapper around GPT (check out the paper I shared if you want more details), we do rely on “hilariously underpowered hardware” to run our little Mistral 24B (a Hugging Face 24GB Nvidia A10G GPU). This is due to our limited budget, but having to work with limited hardware is why I am especially happy about the performance of the current beta version.

Not sure what model this uses, but when I search for an ‘outdoor gym’ it uses the wrong tags.

Thanks for the feedback! We did not include the tag “leisure=fitness_station” yet, but I will make sure to add it in a future release and map it to the phrase “outdoor gym”.
If you ever run into a similar issue, feel free to take a look at the list of tags we use. The issue is likely that we simply did not cover the tag, or that the phrase you used to describe it differs too much from the phrases we came up with: https://github.com/dw-innovation/kid2-spot-datageneration/blob/main/datageneration/data/Spot_primary_keys_bundles.xlsx
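To make the two failure modes concrete (tag not covered, or phrase too different), here is a minimal sketch of a phrase-to-tag lookup. The bundle contents, the function names and the 0.8 similarity cutoff are invented for illustration and do not reflect the actual spreadsheet or the model; the fuzzy step uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical tag bundles: each OSM tag is listed with phrases that should map to it.
TAG_BUNDLES = {
    "leisure=fitness_station": ["outdoor gym", "fitness station", "outdoor fitness equipment"],
    "shop=hardware": ["hardware store", "tools"],
    "amenity=restaurant": ["restaurant"],
}


def build_index(bundles):
    """Invert the bundles into a phrase -> [tags] dictionary with lowercase keys."""
    index = {}
    for tag, phrases in bundles.items():
        for phrase in phrases:
            index.setdefault(phrase.lower(), []).append(tag)
    return index


def lookup(phrase, index, cutoff=0.8):
    """Return the tags for a phrase: exact match first, then the closest known phrase above the cutoff."""
    key = " ".join(phrase.lower().split())
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else []


INDEX = build_index(TAG_BUNDLES)
print(lookup("outdoor gym", INDEX))
print(lookup("Outdoor gyms", INDEX))
print(lookup("spaceport", INDEX))
```

A phrase that is neither listed nor close to a listed one returns an empty list, which is the "tag not covered" case; extending a bundle with more phrases widens what the lookup accepts.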

I don’t think anyone has yet said “this is awesome”, so: this is awesome.

Thank you so much! I really appreciate the kind words! :heart:

I would like to add: I tried to create something with ChatGPT for Overpass three times and it did not work, or at most “kinda”. So I think this sounds like a cool project, and I can see how it would be a useful tool to have in the toolbox.

Regarding matching to wrong OSM key/values, @Lynn-SPOT do you source OSM tags <> natural language names only from the wiki or also from the iD editor presets?

The latter is a very good, structured and localized source of exactly that (and more). Here’s a Kotlin multiplatform library I wrote to search through this like a dictionary:

```kotlin
val matches = dictionary.getByTerm("Bank", languages = listOf("de"))
// matches[0].tags == mapOf("amenity" to "bank")
// matches[1].tags == mapOf("amenity" to "bench") - most likely
```

The iD editor presets also include terms and aliases for map features, e.g. “tools” for shop=hardware, so that maybe a query like “shops around Hamburg central station that sell tools” will yield good results.

I wonder why there aren’t any additional terms that would normally be used to disambiguate, and that the tool could ask about when a term is unclear. For example, the term “Sitzbank” is not ambiguous, and another word could provide context, as is done in dictionaries, e.g. “finance”.

Unless they existed before the fast food chain appeared?

Most pubs allow children until about 21:00, it’s not something we specifically map because it’s just the law, common usage.

Obviously children have to be accompanied by an adult.

No “Log in with OpenStreetMap” :pleading_face::pleading_face::folded_hands:
