I think OSM's search system needs to be improved

Ok. Noted. I wasn’t sure. May mmd forgive my pitbull defence of the peace of the channel.

Yup. I’m absolutely not telling Nominatim or other developers what to do. It’s not my role. The EWG is the only entity that can take a formal position for OSMF on the value of a proposal from any source, including this channel. And EWG can use that opinion to negotiate with a developer and secure funding etc.

The Engineering Working Group (EWG) is charged with:

  • Handling software development paid for by the OSMF
  • Putting out calls for proposals on tasks of interest, and accepting proposals on other tasks
  • Offering a platform for coordination of software development efforts across the OSM ecosystem
  • Managing OSM’s participation in software mentorship programs

Further, I believe FOSS developers should be paid a fair fee for commissioned work; I personally wouldn’t refer to paying a fair fee as “throwing money at the problem”. Right?

It really depends on the ratio of time and cost to benefit, which I’d guess is not so great given the complexity of the problem :man_shrugging:

Even superficial research on the topic of “fuzzy” search on OSM data would have immediately turned up Photon, https://photon.komoot.io/ (GitHub: komoot/photon, an open source geocoder for OpenStreetMap data), which, by the way, is maintained by the Nominatim maintainer and uses Nominatim data as input. That would have taken a lot less time than it took to craft underhanded slurs (implying, for example, that the current code is not written to “professional” standards).

There are, however, a number of issues with deploying it on openstreetmap.org, some technical (language support) and some strategic (osm.org is not intended as an “enduser” map site). Naturally, expecting the same results as a product that can use your complete search history and has received hundreds of millions of dollars of investment over the years is misguided.

I think I should have added a “trigger” warning for the “throwing money at the problem” bit. It was mostly a direct response to this section:

“Old” and “papers written in the last century” somehow imply that the issue has already been solved, and that all it takes is to secure some funding and have someone implement it.

My response was an attempt to offer a different perspective on this idea, to think more about the big picture, and in particular about the long-term implications. Most importantly, it was meant to get those ideas reviewed by the subject matter experts early on.

By the way, EWG made the same mistake initially and involved project maintainers only towards the end of their proposal process. I guess everyone is still learning here.

:shield: There have been some flagged posts on this thread. I’m happy to see that several people have made an effort to calm things down by clarifying their intention in later posts, so there seems to be no immediate need for moderator action. There has been some unnecessarily triggering language on all sides, though, so please keep things civil going forward.

Also, I’ve split off the KFC naming debate.

Thanks for doing that. I hadn’t originally thought it would get that many replies, or I would have just started a new topic. That said, I’ve copied my comment from my original post below, since I think it relates directly to this discussion.

“You can improve Nominatim all day to consider dots and gaps in its search results, but it’s still going to give subpar results unless consistent, across-the-board standards for how to use the name tag are followed. This is especially true of abbreviations, but it’s really a more general problem of inconsistency, which is particularly bad when it comes to “brand” tagging.”
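
To make the “dots and gaps” part concrete, here is a minimal sketch of folding names to a comparison key so that “KFC”, “K.F.C.” and “K F C” compare equal. The function name `normalize_name` is hypothetical and this is not how Nominatim normalizes names; it assumes Latin-script input, since dropping combining marks would damage scripts where those marks carry meaning.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical comparison key: case-folded, accent-free, and with
    dots, spaces and other punctuation removed."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    # Keeping only letters and digits drops dots, spaces, punctuation and
    # the combining marks produced by the decomposition above.
    return "".join(ch for ch in decomposed if ch.isalnum())


variants = ["KFC", "K.F.C.", "K F C", "k.f.c"]
print({normalize_name(v) for v in variants})  # {'kfc'}
```

A key like this only papers over punctuation and spacing; it cannot reconcile an abbreviation with its expansion, which is why consistent tagging still matters.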

I don’t think anyone in this discussion, at least, is expecting Nominatim to give the same results as Google, but it should at least work as intended, and in this particular case it doesn’t seem to. I don’t think we should accept subpar search results just because openstreetmap.org isn’t intended to be an “enduser” map either. Otherwise, just don’t have a search function in the first place :man_shrugging: Personally, I’ve used it for plenty of things that weren’t specifically “enduser” related. So it’s not like the only people who would benefit from it working well are Google Maps aficionados or whatever :roll_eyes:

There is a shortcoming of Nominatim that affects some countries where streets are named after the full names of people but common usage often omits the given names. Unless we tag short names (or alt_names) “for Nominatim”, the streets aren’t found. Literally nobody would expect a search form to require typing the complete name.
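
One way to avoid tagging short names “for Nominatim” would be to generate surname-only variants at search time. The sketch below is a hypothetical heuristic, not anything Nominatim does: it assumes the first word is a known street type (the `STREET_TYPES` set is illustrative) and that the last word is the surname. It fails on names like “Martin Luther King Jr. Street”.

```python
def short_street_variants(name: str, street_types: set[str]) -> list[str]:
    """Hypothetical heuristic: for '<type> <given names...> <surname>',
    also offer '<type> <surname>' and '<surname>' as search variants."""
    words = name.split()
    # Need at least a street type, one given name and a surname.
    if len(words) < 3 or words[0].casefold() not in street_types:
        return []
    street_type, surname = words[0], words[-1]
    return [f"{street_type} {surname}", surname]


STREET_TYPES = {"via", "rua", "calle", "avenida"}  # illustrative only
print(short_street_variants("Via Camillo Benso conte di Cavour", STREET_TYPES))
```

Because there are rare cases where the given names should not be dropped, variants like these fit better as lower-ranked search alternatives than as tags in the data.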

On the other hand, there have been lots of improvements, and Nominatim now does things it couldn’t do some time ago. The problem with the names can also be seen as a tagging problem, because there are some rare cases where the given names should not be omitted, to avoid confusion.

Specifically, Photon and Pelias implement find-as-you-type or autocompleting search, as well as some tolerance for typos. These features can be viewed as elements of fuzzy search, but it’s a very large problem space. For example, fuzzy search can refer to Google’s (sometimes infuriating) behavior of substituting what it thinks is a synonym. Or it could mean accepting a search term that “sounds like” the real name, based on Soundex, Caverphone, or the various CJK transcription schemes. I’ve even heard of geocoders considering certain typos more than others based on the proximity of keys on a specific keyboard layout.
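
As an illustration of the “sounds like” kind of fuzzy matching, here is a sketch of classic American Soundex. It was designed for English surnames, so it is a poor fit for Italian, CJK or many other names in OSM; this sketch also simply drops non-ASCII letters.

```python
# American Soundex digit groups; vowels, y, h and w get no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        # Skip uncoded letters and repeats of the previous digit.
        if digit and digit != previous:
            code += digit
        if ch not in "hw":
            # A vowel (or y) separates repeated digits; h and w do not.
            previous = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))
```

Both names map to the same code, which is exactly the point of phonetic matching, and also why it produces many false positives when used alone.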

Before we get too far ahead of ourselves: partial string matching would be a noticeable usability improvement over requiring a more or less exact match in one of the name tags, but it would take some care to avoid burying current good results under less relevant ones. Nominatim appears to do some partial string matching, but the following issue remains open, probably because there are a lot of false positives and false negatives:
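
A sketch of what “not burying good results” could mean: rank partial matches in tiers so exact matches always come first. The tiers and the length tie-break are assumptions for illustration, not Nominatim’s actual ranking.

```python
def rank_matches(query: str, names: list[str]) -> list[str]:
    """Hypothetical tiered ranking: exact > word prefix > substring."""
    q = query.casefold().strip()
    scored: list[tuple[int, int, str]] = []
    for name in names:
        n = name.casefold()
        if n == q:
            tier = 0  # exact match
        elif n.startswith(q) or any(w.startswith(q) for w in n.split()):
            tier = 1  # the name or one of its words starts with the query
        elif q in n:
            tier = 2  # substring somewhere inside a word
        else:
            continue  # no match at all
        # Within a tier, prefer shorter names (fewer unmatched characters).
        scored.append((tier, len(name), name))
    scored.sort()
    return [name for _, _, name in scored]


names = ["King Street", "Kingsway", "Martin Luther King Jr. Boulevard",
         "Parking Lane", "King"]
print(rank_matches("king", names))
```

Even this simple scheme shows the trade-off: “Parking Lane” is a false positive that stays at the bottom, while “Kingsway” outranks “King Street” only because it is shorter.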

Martin Luther King illustrates both this problem and the reverse: in some cases the street is named Dr. Martin Luther King Jr. Boulevard but people know it as King Boulevard or MLK Boulevard; in other cases, the street is named Martin Luther King Street but people know it as Martin Luther King Jr. Street. It isn’t a big deal to special-case a few world-famous figures like Dr. King, but some things have been named “King” after a slaveowner, while other things may be named after Martin Luther, tripping up a pure edit distance algorithm. name:etymology:wikidata can help to solve this problem more generally.
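
For reference, a pure edit distance is the classic Levenshtein distance below. It shows the problem described above: “King Boulevard” is 22 edits away from “Dr. Martin Luther King Jr. Boulevard”, while “Martin Luther Street” (possibly named after a different person) is only 5 edits from “Martin Luther King Street”.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row over the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


print(levenshtein("King Boulevard", "Dr. Martin Luther King Jr. Boulevard"))
print(levenshtein("Martin Luther Street", "Martin Luther King Street"))
```

The keyboard-proximity idea mentioned earlier would replace the constant substitution cost with a per-key-pair cost; nothing in the distance itself knows about name:etymology:wikidata.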

Yes, or, e.g., “Via Camillo Benso conte di Cavour”, which is more commonly spoken as “Via Cavour”.

It could be easier than we think. As far as I know, the PostgreSQL database used by OSM already supports fuzzy search using trigrams.
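
For anyone wondering what trigrams are: PostgreSQL’s pg_trgm extension (with its `similarity()` function and `%` operator) compares strings by the sets of three-character sequences they contain. The sketch below mimics the behaviour described in the pg_trgm documentation (lowercase, split on non-alphanumerics, pad each word with two leading spaces and one trailing space, then take shared trigrams over all distinct trigrams). It is an approximation for illustration and says nothing about whether Nominatim uses pg_trgm.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set in the style of pg_trgm: 'cat' -> '  c', ' ca', 'cat', 'at '."""
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


for candidate in ["Via Cavour", "Via Camillo Benso conte di Cavour", "Via Roma"]:
    print(candidate, round(similarity("via cavur", candidate), 2))
```

Note that a long official name shares few trigrams relative to its size, so trigram similarity alone would not solve the “Via Cavour” problem.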

The codebase is on GitHub at osm-search/Nominatim (“Open Source search based on OpenStreetMap data”), so if you believe it’s easy, I’m sure the maintainers would be delighted to receive a high-quality pull request!

I know you’re just trying to encourage @amadvance to do a pull request, but your comment comes off as rather patronizing, or like you’re trying to call @amadvance’s bluff that it’s easy to fix by linking to GitHub and telling him to do it when you know he’s probably not going to. Even if that’s not what you’re doing, though, people should be able to have an opinion about something without being told to fix the problem themselves if they think it’s so easy. Really, the same goes for your last comment. No one thinks computers are easy or that OSM isn’t made by people applying their talents to make things better or whatever. This issue isn’t going to get resolved by subtly taunting commenters or by aspirational hand-waving.

No, I’m not trying to call anyone’s bluff. I’m saying where the code is. @amadvance looks like he knows more than I do on the subject, because I couldn’t have told you that the database used by Nominatim supports trigrams (I don’t even know what trigrams are). It sounds like he might be able to make a helpful contribution so I was pointing him to the right location to do so. Assume good faith.

I don’t disagree with that, but it doesn’t mean you couldn’t have written the message in a way that sounds less like “If you think it’s so easy then do it yourself :roll_eyes:” and more like “It sounds like this is something you know a lot about. We really appreciate new contributors; if you want to work on this, here’s where you can.” Honestly, though, I probably wouldn’t have said anything about it except that you’ve treated people the same way in other discussions before, and it’s never come off as genuine.

I think I acknowledged that when I said “I know you’re just trying to encourage @amadvance to do a pull request.” I hope you’d agree that we should all be able to accept feedback on how we approach others without assuming by default that the person giving the feedback is acting in bad faith. At the end of the day, my feedback about your approach is pretty milquetoast. I know I’ve been dragged through the dirt pretty harshly myself for much less, and I assure you that wasn’t the intent behind my comment.

Implementing some cool stuff only to find out that maintainers don’t consider it a good fit for the project may cause some frustration on all sides.

For any sort of non-trivial change, start with a discussion first, in the Discussions section of osm-search/Nominatim on GitHub.

Let’s be sure everyone understands the problem space in sufficient detail. It’s a good opportunity to discuss pros and cons of the proposed solution, its implications on the implementation, new dependencies, performance, hardware requirements, etc.

I guess there isn’t a Nominatim/search-specific forum category, huh? If not, it might be worth creating one for conversations like this, where the discussion starts out too general to justify a GitHub issue. I’m sure there are plenty of people out there who have questions or problems about Nominatim and/or search on the website but know nothing about GitHub and don’t want to use it.

That is a discussion forum solely about Nominatim, in the Nominatim repository.