I think OSM's search system needs to be improved

Thanks for doing that. I hadn’t expected it to get that many replies, or I would have just started a new topic myself. Anyway, I’ve copied the comment from my original post below, since I think it relates directly to this discussion.

“You can improve Nominatim all day to consider dots and gaps in its search results, but it’s still going to give subpar results unless consistent, across-the-board standards for how to use the name tag are followed. That’s especially true for abbreviations, but it’s really a more general problem of inconsistency, which is particularly bad when it comes to ‘brand’ tagging.”
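To make the “dots and gaps” point concrete, here is a minimal Python sketch built around a hypothetical `normalize_name` helper (not Nominatim’s actual normalization). It case-folds a name, turns dots into spaces, and glues runs of single letters back together as initials. That makes “K.F.C.”, “K F C” and “KFC” compare equal, but no amount of this kind of cleanup will connect “KFC” to “Kentucky Fried Chicken”, which is why consistent tagging still matters.

```python
def normalize_name(name):
    """Hypothetical normalizer: case-fold, drop dots, merge single-letter initials."""
    # Dots become spaces, so "K.F.C." and "K F C" yield the same tokens.
    tokens = name.casefold().replace(".", " ").split()
    merged = []
    initials = ""
    for tok in tokens:
        if len(tok) == 1 and tok.isalpha():
            initials += tok  # collect a run of initials such as "k", "f", "c"
            continue
        if initials:
            merged.append(initials)
            initials = ""
        merged.append(tok)
    if initials:
        merged.append(initials)
    return " ".join(merged)


for raw in ["K.F.C.", "K F C", "KFC", "Dr. Martin Luther King Jr."]:
    print(repr(raw), "->", repr(normalize_name(raw)))
```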

I don’t think anyone in this discussion, at least, expects Nominatim to give the same results as Google, but it should at least work as intended, and in this particular case it doesn’t seem to. I also don’t think we should accept subpar search results just because openstreetmap.org isn’t meant to be an “end-user” map. Otherwise, just don’t have a search function in the first place 🤷‍♂️ Personally, I’ve used it for plenty of things that weren’t specifically “end-user” related, so it’s not as if the only people who would benefit from it working well are Google Maps aficionados or whatever 🙄

Nominatim has a shortcoming in some countries where streets are named after people’s full names but common usage often omits the given names. Unless we tag short names (or alt_names) “for Nominatim”, the streets aren’t found. Nobody (literally) would expect a search form to require typing the complete name.

On the other hand, there have been lots of improvements, and Nominatim now does things it couldn’t do some time ago. The naming problem can also be seen as a tagging problem, because there are some rare cases where the given names should not be omitted, to avoid confusion.
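One way to picture the “omitted given names” case: treat a query as matching a name when all of the query’s words appear in the name in the same order. The Python sketch below assumes naive whitespace tokenization and uses hypothetical helper names (`tokens`, `words_in_order`); it is not how Nominatim matches. As noted above, a rule this loose would also fire in the rare cases where the given names matter, so it would have to be paired with ranking or tagging.

```python
def tokens(name):
    """Case-insensitive, whitespace-separated words (a deliberately naive tokenizer)."""
    return name.casefold().split()


def words_in_order(query, name):
    """True if every query word occurs in `name`, in the same relative order."""
    remaining = iter(tokens(name))
    # `word in remaining` advances the iterator past the match, which enforces order.
    return all(word in remaining for word in tokens(query))


print(words_in_order("Via Cavour", "Via Camillo Benso conte di Cavour"))  # True
print(words_in_order("Cavour Via", "Via Camillo Benso conte di Cavour"))  # False
```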

Specifically, Photon and Pelias implement find-as-you-type (autocompleting) search, as well as some tolerance for typos. These features can be viewed as elements of fuzzy search, but fuzzy search is a very large problem space. For example, it can refer to Google’s (sometimes infuriating) behavior of substituting what it thinks is a synonym. Or it could mean accepting a search term that “sounds like” the real name, based on Soundex, Caverphone, or the various CJK transcription schemes. I’ve even heard of geocoders treating certain typos as more likely than others based on the proximity of keys on a specific keyboard layout.
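As an illustration of “sounds like” matching, here is a minimal Python sketch of classic American Soundex, one of the schemes named above. It only shows the technique; I’m not claiming Photon, Pelias or Nominatim use it. Note its limits: it is tuned to English spelling, ignores non-ASCII letters, and never forgives a wrong first letter.

```python
# Standard American Soundex digit groups; vowels (and y) get no digit.
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character American Soundex code of `word` ("" if no letters)."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal digits
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so equal digits around it are both kept
    return code.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```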

Before we get too far ahead of ourselves: partial string matching would be a noticeable usability improvement over requiring a more or less exact match in one of the name tags, but it would take some care to avoid burying current good results under worse ones. Nominatim appears to do some partial string matching, but the following issue remains open, probably because there are a lot of false positives and false negatives:

Martin Luther King illustrates both this problem and the reverse: in some cases the street is named Dr. Martin Luther King Jr. Boulevard but people know it as King Boulevard or MLK Boulevard; in other cases, the street is named Martin Luther King Street but people know it as Martin Luther King Jr. Street. It isn’t a big deal to special-case a few world-famous figures like Dr. King, but some things are named “King” after a slaveowner, while others may be named after Martin Luther, which trips up a pure edit-distance algorithm. The name:etymology:wikidata tag can help solve this problem more generally.
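To see how pure edit distance gets this backwards, here is a standard dynamic-programming Levenshtein distance in Python applied to the examples above. A hypothetical “Martin Luther Street” named after the reformer is only 5 edits from “Martin Luther King Street”, while “MLK Boulevard” is at least 23 edits from “Dr. Martin Luther King Jr. Boulevard”, because their lengths alone differ by 23 characters.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("Martin Luther Street", "Martin Luther King Street"))  # 5
print(levenshtein("MLK Boulevard", "Dr. Martin Luther King Jr. Boulevard"))
```

That is the gap name:etymology:wikidata can fill: it records who a feature is named after, independent of how the name is spelled.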

Yes, or, e.g., “Via Camillo Benso conte di Cavour”, which is more commonly called “Via Cavour”.

It could be easier than we think. As far as I know, the PostgreSQL database used by OSM already supports fuzzy search using trigrams.
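For anyone unfamiliar with trigrams: PostgreSQL’s pg_trgm extension splits each word into three-character sequences (padded with spaces) and scores two strings by the share of trigrams they have in common; its `similarity()` function and `%` operator are built on this. The Python below is a rough approximation written from the pg_trgm documentation, not its actual code, so edge cases (punctuation, multibyte text) may differ, and it says nothing about how Nominatim would integrate it.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction: per word, lower-cased, padded."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):  # alphanumeric runs only
        padded = "  " + word + " "  # two spaces before, one after, as in pg_trgm
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))                  # ['  c', ' ca', 'at ', 'cat']
print(round(similarity("Cavour", "Cavur"), 3))  # 0.444
```

At about 0.44, the typo “Cavur” would pass pg_trgm’s default similarity threshold of 0.3, which also hints at how easily such matching produces false positives.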

The codebase is on GitHub at osm-search/Nominatim (“Open Source search based on OpenStreetMap data”), so if you believe it’s easy, I’m sure the maintainers would be delighted to receive a high-quality pull request!

I know you’re just trying to encourage @amadvance to do a pull request, but your comment comes off as rather patronizing, or as if you’re trying to call @amadvance’s bluff that it’s easy to fix by linking to GitHub and telling him to do it when you know he probably won’t. Even if that’s not what you’re doing, people should be able to have an opinion about something without being told to fix the problem themselves if they think it’s so easy. Really, the same goes for your last comment. No one thinks computers are easy, or that OSM isn’t made by people applying their talents to make things better, or whatever. This issue isn’t going to get resolved by subtly taunting commenters or by aspirational hand-waving.

No, I’m not trying to call anyone’s bluff. I’m saying where the code is. @amadvance looks like he knows more than I do on the subject, because I couldn’t have told you that the database used by Nominatim supports trigrams (I don’t even know what trigrams are). It sounds like he might be able to make a helpful contribution so I was pointing him to the right location to do so. Assume good faith.

I don’t disagree with that, but you could have written the message in a way that sounded less like “If you think it’s so easy, then do it yourself 🙄” and more like “It sounds like this is something you know a lot about. We really appreciate new contributors; if you want to work on this, here’s where you can.” Honestly, though, I probably wouldn’t have said anything about it, except that you’ve treated people the same way in other discussions before and it has never come across as genuine.

I think I did that when I said “I know you’re just trying to encourage @amadvance to do a pull request.” I hope you’d agree that we should all be able to accept feedback on how we approach others without defaulting to assuming that the person giving the feedback is acting in bad faith. At the end of the day, my feedback about your approach is pretty milquetoast. I know I’ve been dragged through the dirt pretty harshly myself for far less, and I assure you that wasn’t the intent behind my comment.

Implementing some cool stuff only to find out that maintainers don’t consider it a good fit for the project may cause some frustration on all sides.

For any sort of non-trivial change, start with a discussion first: osm-search/Nominatim Discussions on GitHub

Let’s be sure everyone understands the problem space in sufficient detail. It’s a good opportunity to discuss pros and cons of the proposed solution, its implications on the implementation, new dependencies, performance, hardware requirements, etc.

I guess there isn’t a forum category specifically for Nominatim/search issues, huh? If not, it might be worth creating one for conversations like this, where the discussion starts out too general to justify a GitHub issue. I’m sure there are plenty of people out there who have questions or problems about Nominatim and/or searching on the website but know nothing about GitHub and don’t want to use it.

That is a discussion forum solely about Nominatim, in the Nominatim repository.

You can’t really have general discussions about it there, can you? If I just want to lament the poor state of the style to random OSM users, I’m not going to open an issue about it on their issue tracker.

It’s completely okay to discuss search and/or Nominatim here. I (with my Nominatim maintainer hat on) follow these discussions and will answer/comment if concrete questions come up. This includes questions like ‘why do I get these strange results?’ and ‘how do I tag so Nominatim understands it?’.

So far, the discussion in this thread has mostly revolved around the more general topic of ‘search sucks’. There is not much to say about that, except that I’m aware of the problems. Alas, each of the points you mentioned (Korean word boundaries, KFC vs. Kentucky Fried Chicken, spelling correction) needs a different solution that takes a couple of months of engineering work to complete. If you are interested in the details, GitHub Discussions is indeed a better place to ask.

In the meantime, please continue. Your points are noted, even when not commented on.
