Tuesday, July 28, 2009

Relevance problems oversimplified

A few days ago there was an article on CMS Watch about relevance ranking.

Grant Ingersoll of Lucid Imagination reacted to that post:

I couldn’t agree more with Theresa Regli’s excellent discussion of relevance, especially the point to be “skeptical” of why results are the way they are. This is definitely true for search application developers too. The problem is, if you’re using a proprietary vendor, the only thing you can do about such skepticism is bang your head against the wall. For Apache Lucene and Solr, on the other hand, understanding why something scored the way it did is as simple as making an API call (Lucene) or adding &debugQuery (Solr) to your input, and you get, in full unadulterated glory, every last detail about why a particular document scored a particular way for a given query. Furthermore, if that doesn’t satisfy you, just pop open the source code or ask us on the mailing list!
I must react to the technical oversimplification of this relevancy problem. Of course there are options in several search engines that explain why a document received a high relevance score for a given query. Solr / Lucene obviously has excellent functions to show this, but even Autonomy IDOL displays links, scores, terms and weights that explain it as well.
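For readers who have not seen the explain features Grant mentions, here is a minimal sketch against the Lucene 2.x-era API of how a developer can ask the searcher why a hit scored the way it did. The index path and field name are placeholders, not anything from the original post. The Solr equivalent is simply appending &debugQuery=true to a normal query request, e.g. http://localhost:8983/solr/select?q=relevance+ranking&debugQuery=true.

// Minimal sketch (Lucene 2.x-era API): print the score explanation for each hit.
// "/path/to/index" and the "content" field are placeholders for illustration.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class ExplainScores {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("content", new StandardAnalyzer())
                .parse("relevance ranking");

        TopDocs hits = searcher.search(query, 10);
        for (ScoreDoc hit : hits.scoreDocs) {
            // Explanation breaks the score down into tf, idf, norms, boosts, etc.
            Explanation explanation = searcher.explain(query, hit.doc);
            System.out.println(explanation.toString());
        }
        searcher.close();
    }
}

The point of the sketch is only that this level of transparency is aimed at developers reading score breakdowns, which is exactly the distinction raised in the comments below.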

Lucid must be careful not to react to every case about search and relevance ranking with the "open source is open, so you can build your own search engine" attitude. Let's be realistic about the fact that implementing search is neither cheap nor transparent. That's exactly why Lucid entered the business.

2 comments:

Daniel Tunkelang said

I didn't see Grant's post, but I'm surprised that he'd put forth such an oversimplified view. I corresponded with him while writing a book on faceted search, and I found him to be civil and respectful, despite my association with one of those proprietary vendors.

What is interesting to me is that we have such different notions of what it means for relevance ranking to be transparent. To me, it means result ordering (and filtering) that is explainable to users, not just visible to application developers. Granted, this is the tip of the iceberg of a philosophical debate between system-centric and user-centric approaches to search.

Grant Ingersoll said

I guess I was more reacting to the fact that I hear a lot of application developers/customers coming into Solr/Lucene from other places saying they want to know more about why a result is the way it is and that they want to have more control over it, including, sometimes, the ability to add their own scoring code.

I also don't think Lucid (or I) reacts to every case with "open source is open so you can build your own..." even if we did this time, but we are, obviously, big believers in the power of open source and what it has to offer. Still, point taken.

You are also right, search is not cheap and transparent, but it needn't be excessively expensive and murky, either.