I think OSM's search system needs to be improved

OK, noted. I wasn’t sure. May mmd forgive my pitbull defence of the peace of the channel.

1 Like

Yup. I’m absolutely not telling Nominatim or other developers what to do. It’s not my role. The EWG is the only entity that can take a formal position for OSMF on the value of a proposal from any source, including this channel. And EWG can use that opinion to negotiate with a developer and secure funding etc.

“The Engineering Working Group (EWG) is charged with

  • Handling software development paid for by the OSMF
  • Putting out calls for proposals on tasks of interest, and accepting proposals on other tasks
  • Offering a platform for coordination of software development efforts across the OSM ecosystem
  • Managing OSM’s participation in software mentorship programs”

Further, I believe FOSS developers should be paid a fair fee for commissioned work - I personally wouldn’t refer to paying a fair fee as “throwing money at the problem”. Right?

1 Like

It really depends on the ratio of time and cost to benefit, which I’d guess is not so great given the complexity of the problem :man_shrugging:

Even superficial research on the topic of “fuzzy” search on OSM data would have immediately turned up https://photon.komoot.io/ and komoot/photon on GitHub, an open source geocoder for OpenStreetMap data (which btw is maintained by the Nominatim maintainer and uses Nominatim data as input). That would have taken a lot less time than it took to craft underhanded slurs (implying that the current code is not written to “professional” standards, for example).

There are, however, a number of issues with deploying it on openstreetmap.org, some technical (language support), some strategic (osm.org is not intended as an “enduser” map site). Naturally, expecting the same results as a product that can use your complete search history and has received hundreds of millions of dollars in investment over the years is misguided.

5 Likes

I think I should have added a “trigger” warning for the “throwing money at the problem” bit. It was mostly a direct response to this section:

“old” and “papers written in the last century” somehow implies that the issue has already been solved, and all it takes is to secure some funding and have it implemented by someone.

My response was an attempt to offer a different perspective to this idea, and think more about the big picture, and in particular the long term implications. And most importantly, have those ideas reviewed by the subject matter experts early on.

By the way, EWG made the same mistake initially and involved project maintainers only towards the end of their proposal process. I guess everyone is still learning here.

5 Likes

:shield: There have been some flagged posts on this thread. I’m happy to see that several people have made an effort to calm things down by clarifying their intention in later posts, so there seems to be no immediate need for moderator action. There has been some unnecessarily triggering language on all sides, though, so please keep things civil going forward.

Also, I’ve split off the KFC naming debate.

5 Likes

Thanks for doing that. I hadn’t originally thought it would get that many replies or I would have just started a new topic. Although, I’ve copied my comment from my original post below since I think it directly relates to this discussion.

“You can improve Nominatim all day to consider dots and gaps in its search results, but it’s still going to give subpar results unless consistent, across the board standards for how to use the name tag are followed. Especially when it comes to abbreviations, but it’s really a more general problem of inconsistency, which is particularly bad when it comes to “brand” tagging.”

1 Like

I don’t think anyone in this discussion at least is expecting Nominatim to give the same results as Google, but it should at least work as intended and in this particular case it doesn’t seem to. I don’t think we should just accept subpar search results because the intent of openstreetmap.org isn’t to be an “enduser” map either. Otherwise just don’t have a search function in the first place :man_shrugging: Personally, I’ve used it for plenty of things that weren’t specifically “enduser” related. So it’s not like the only people that would benefit from it working well are Google Maps aficionados or whatever :roll_eyes:

There is a shortcoming of Nominatim we have in some countries where streets are named after the full names of people but common usage often omits the given names. Unless we tag short names (or alt_names) “for Nominatim”, the streets aren’t found. Nobody (literally) would expect from a search form that they had to type the complete name.

On the other hand, there have been lots of improvements, and Nominatim does things now that it couldn’t do some time ago. The problem with the names can also be seen as a tagging problem, because there are some rare cases where the given names should not be omitted to avoid confusion.

Specifically, Photon and Pelias implement find-as-you-type or autocompleting search, as well as some tolerance for typos. These features can be viewed as elements of fuzzy search, but it’s a very large problem space. For example, fuzzy search can refer to Google’s (sometimes infuriating) behavior of substituting what it thinks is a synonym. Or it could mean accepting a search term that “sounds like” the real name, based on Soundex, Caverphone, or the various CJK transcription schemes. I’ve even heard of geocoders considering certain typos more than others based on the proximity of keys on a specific keyboard layout.
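To make the “sounds like” idea concrete, here is a minimal Python sketch of classic American Soundex. It only illustrates the general technique: it is not how Photon, Pelias, or Nominatim work, the function name `soundex` is just a label chosen here, and it assumes plain ASCII names (anything else is treated like a vowel).

```python
# Consonant groups of classic American Soundex; vowels, y, h and w get no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], 1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a 4-character Soundex key, or "" if the name has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    # Pad with zeros and cut to one letter plus three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


# Robert and Rupert share the key R163, so a search for one finds the other.
print(soundex("Robert"), soundex("Rupert"))
```

The obvious limitation is that Soundex was designed around English surnames, which is part of why the problem space gets so large once other languages and scripts are involved.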

Before we get too far ahead of ourselves, partial string matching would be a noticeable usability improvement over requiring more or less an exact match in one of the name tags, but it would require some care to avoid burying current good results in less good ones. Nominatim appears to do some partial string matching, but the following issue remains open probably because there are a lot of false positives and false negatives:

Martin Luther King illustrates both this problem and the reverse: in some cases the street is named Dr. Martin Luther King Jr. Boulevard but people know it as King Boulevard or MLK Boulevard; in other cases, the street is named Martin Luther King Street but people know it as Martin Luther King Jr. Street. It isn’t a big deal to special-case a few world-famous figures like Dr. King, but some things have been named “King” after a slaveowner, while other things may be named after Martin Luther, tripping up a pure edit distance algorithm. name:etymology:wikidata can help to solve this problem more generally.
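A small Python sketch of plain Levenshtein edit distance shows why it trips up here. The street names are made-up illustrations based on the examples above, and nothing here is taken from Nominatim’s code.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn a into b (classic dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


# A street named after a different person is only 5 edits away...
print(levenshtein("Martin Luther Street", "Martin Luther King Street"))  # 5
# ...while the common short form of the right street is 22 edits away.
print(levenshtein("King Boulevard", "Dr. Martin Luther King Jr. Boulevard"))  # 22
```

On character distance alone, the wrong street looks like a much better match than the right one, which is why some extra signal such as name:etymology:wikidata is needed.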

Yes, or e.g. “Via Camillo Benso conte di Cavour”, which is more commonly spoken as “Via Cavour”.

It could be easier than we think. As far as I know, the PostgreSQL database used by OSM already supports fuzzy search using trigrams.
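For anyone unfamiliar with the term: a trigram is a run of three consecutive characters, and two strings are scored by how many trigrams they share. In PostgreSQL this is provided by the pg_trgm extension and its `similarity()` function. The Python sketch below only approximates that behaviour (lowercase, pad each word with spaces, compare trigram sets). The function names are illustrative, and whether the extension is enabled or suitable for Nominatim’s database is an assumption, not something established in this thread.

```python
import re


def trigrams(text: str) -> set[str]:
    """Set of 3-character windows over each lowercased word, with every
    word padded by two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, between 0.0 and 1.0."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


query = "Via Cavour"
candidates = ["Via Camillo Benso conte di Cavour", "Via Cavallotti", "Via Roma"]
# Rank candidate names by how similar they are to the query.
for name in sorted(candidates, key=lambda n: similarity(query, n), reverse=True):
    print(f"{similarity(query, name):.2f}  {name}")
```

Every trigram of the short form appears in the full name, but the score is still diluted by the extra words, so trigram matching alone would not settle the ranking questions raised above.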

1 Like

The codebase is at osm-search/Nominatim on GitHub (open source search based on OpenStreetMap data), so if you believe it’s easy, I’m sure the maintainers would be delighted to receive a high-quality pull request!

2 Likes

I know you’re just trying to encourage @amadvance to do a pull request, but your comment comes off as rather patronizing. Or like you’re trying to call @amadvance’s bluff that it’s easy to fix by linking to GitHub and telling him to do it then when you know he’s probably not going to. Even if that’s not what you’re doing though, people should be able to have an opinion about something without being told that they should fix the problem themselves if they think it’s so easy to deal with. Really, the same goes for your last comment. No one thinks computers are easy or that OSM isn’t made by people applying their talents to make things better or whatever. This issue isn’t going to get resolved by subtly taunting commenters or doing aspirational hand-waving.

No, I’m not trying to call anyone’s bluff. I’m saying where the code is. @amadvance looks like he knows more than I do on the subject, because I couldn’t have told you that the database used by Nominatim supports trigrams (I don’t even know what trigrams are). It sounds like he might be able to make a helpful contribution so I was pointing him to the right location to do so. Assume good faith.

2 Likes

I don’t disagree with that, but it doesn’t mean you couldn’t have written the message in a way that sounds less like you were saying “If you think it’s so easy then do it yourself :roll_eyes:” and more like “It sounds like this is something you know a lot about. We really appreciate new contributors; if you want to work on this, here’s where you can.” Honestly though, I probably wouldn’t have said anything about it except you’ve treated people the same way in other discussions before and it’s never come off as genuine.

I think I did that when I wrote “I know you’re just trying to encourage @amadvance to do a pull request.” I hope you’d agree that we should all be able to accept feedback on how we are approaching others without defaulting to assuming the person giving the feedback is acting in bad faith. At the end of the day my feedback about your approach is pretty milquetoast. I know I’ve been pretty harshly dragged through the dirt myself for way less, and I assure you that wasn’t the intent behind my comment.

Implementing some cool stuff only to find out that maintainers don’t consider it a good fit for the project may cause some frustration on all sides.

For any sort of non-trivial change, start with a discussion first: osm-search/Nominatim · Discussions · GitHub

Let’s be sure everyone understands the problem space in sufficient detail. It’s a good opportunity to discuss pros and cons of the proposed solution, its implications on the implementation, new dependencies, performance, hardware requirements, etc.

5 Likes

I guess there isn’t a forum category specific to Nominatim/searching issues, huh? If not, it might be worth creating one for conversations like this where the discussion starts out being too general to justify a GitHub issue. I’m sure there’s plenty of people out there who have questions or problems about Nominatim and/or search on the website but know nothing about GitHub and don’t want to use it.

That is a discussion forum solely about Nominatim, in the Nominatim repository.