Friday, 19 March 2010

What some people think of Enterprise search...

Is sometimes not what you think they mean.

Read the article "Google Dominates Enterprise Level Search" (http://www.marketingpilgrim.com/2010/03/google-dominates-enterprise-level-search.html).

No... you have not been sleeping for 3 years while Google penetrated the enterprise. It is just an incorrect use of the term "Enterprise search". The author means "the use of web search from within the enterprise to fulfil a business information need".

Here is the comment I posted on the site in response to the article:
Hi,

When I saw the title of this article I was surprised. How can it be that, as an expert on enterprise search, I missed this disruptive news?

On the other hand, I thought this must be a Google-sponsored article, because Google is only just starting its search business in the enterprise.

Reading the article, it became clear that you don't use the concept "Enterprise search" the way we search experts do. Let me explain.

Enterprise search is all about making the information that is stored in databases, CMSs, DMSs, file shares, etc. WITHIN the ENTERPRISE accessible for SEARCHING and data discovery. Also see the definition on Wikipedia: http://en.wikipedia.org/wiki/Enterprise_search.
Players in this field are, for instance, Autonomy, Exalead, FAST (Microsoft), Attivio, Coveo and Endeca.
Google is big in internet search but is still building its enterprise search proposition and market share. The Google Search Appliance (GSA) lacks some important functional aspects that are needed in the enterprise search environment. Yahoo simply doesn't exist in the enterprise domain, and neither does Bing.

What you mean in your article is the share that web search engines like Google, Yahoo and Bing have when people search for business-related information. When someone in a company, while doing his or her work, searches for information on the internet, you call that enterprise search.

You should call it business use of web search engines.

Wednesday, 17 March 2010

Blending "real-time" results into the result list

Today I had a meeting with a client and an external party that is going to develop the internal and external websites (intranet and internet) for them.

The subject of the meeting was the relation that the (re)development of the websites has with "search" functionality. We tried to form a vision of how the sites could evolve with the use of search technology.

Of course the issue of relevancy came up, and I explained that relevancy is subjective and influenced, or even driven, by the context of the user and even by the context of the information itself (the semantics of information, but it would be off-topic to get into that in this post).

One aspect of relevancy, "freshness" or "new information", triggered the person from the web design / web building company to refer to Google and the integration of real-time results into their search results. He presented this as if current information came up automatically when searching for something, and as if this proved that the freshness of information was taken into account when calculating the relevance of all the results.

I beg to differ. When I look at the integration of real-time information in search results, Google has made a separate section to show results from real-time sources like Twitter. This is the only way to do this, because there is a difference between the "freshness" of a result and the "information value" of other results. You cannot mix 140-character messages about, for instance, Islam with a profound description of what Islam is all about.

Just watch this video and you will see what I mean. The Google interface serves 2 information needs by "clustering" the results in a "real-time"-section and a "contents / value based"-section.
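To make the idea of two separate sections concrete, here is a minimal sketch in Python. It is not how Google implements this; the `Result` fields, the `REALTIME_SOURCES` set and the one-hour `max_age` are assumptions chosen purely for illustration. The point is only that each section is ordered by its own criterion (recency versus content-based score) and that the two lists are never merged into one ranking.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Result:
    title: str
    source: str          # hypothetical source label, e.g. "twitter" or "web"
    score: float         # content-based relevance score from the engine
    published: datetime


# Assumption: which sources count as "real-time" is a configuration choice.
REALTIME_SOURCES = {"twitter"}


def build_result_page(results, now, max_age=timedelta(hours=1)):
    """Split results into a real-time section and a content-based section."""
    realtime = [
        r for r in results
        if r.source in REALTIME_SOURCES and now - r.published <= max_age
    ]
    content = [r for r in results if r.source not in REALTIME_SOURCES]
    # Each section is ordered by its own criterion; the two are never mixed.
    realtime.sort(key=lambda r: r.published, reverse=True)
    content.sort(key=lambda r: r.score, reverse=True)
    return {"real_time": realtime, "content": content}
```

With this split, a very recent tweet with a low content score still shows up at the top of the real-time section, while an older in-depth article keeps its place in the content section.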

Monday, 15 March 2010

Search initiative Newssift is no more

Just read an article about the Financial Times pulling the plug on its advanced search feature: Newssift.
About one year ago the interface got some pretty positive reactions, due to the fact that this kind of user interface was fairly new on a public website.
Powered by Endeca, the functionality made heavy use of facets. All categories, subjects, etc. were built from metadata in the content, where the metadata values are used to generate the lists of choices.

It's a shame that FT has no logging or other results showing how the service was used and whether the users liked or disliked this new way of searching FT content.

As a search professional I am always on the lookout for experiences with non-conventional search interfaces. Could this experiment have been more successful if it had been guided and analyzed?
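As an illustration of how facets can be generated from metadata values, here is a minimal Python sketch. It is not Endeca's API or the Newssift implementation; the document structure and the field names (`sector`, `topic`) are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def build_facets(documents, facet_fields):
    """Count how often each metadata value occurs, per facet field."""
    facets = defaultdict(Counter)
    for doc in documents:
        for field in facet_fields:
            value = doc.get(field)
            if value is None:
                continue
            # A field may hold one value or a list of values.
            values = value if isinstance(value, list) else [value]
            facets[field].update(values)
    # Most frequent values first: these become the lists of choices.
    return {field: counts.most_common() for field, counts in facets.items()}


def refine(documents, field, value):
    """Keep only the documents whose metadata matches the chosen facet value."""
    def matches(doc):
        v = doc.get(field)
        return value in v if isinstance(v, list) else v == value
    return [doc for doc in documents if matches(doc)]
```

Calling `build_facets` on a result set gives the choices to display next to the results, and `refine` narrows the result set once the user clicks one of them.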

See: http://paidcontent.co.uk/article/419-financial-times-shuts-semantic-search-service-newsift/

Thursday, 11 March 2010

Excellent clarification on semantic search

Today I received an update on one of the discussions on semantic search on LinkedIn. Charlie Hull put up an excellent example of how semantic search works. It depends on the capabilities of the search technology that is used in a specific situation, but also on the fact that a search application has to engage in a dialog with the user to assess his meaning or context. The reason is that most users use just 1 to 3 words to formulate a query. There's not much you can do with such a query on the first try. But... the search application has to pick up on those keywords and try to make something out of them.
The next step is to try to ask the user what he means.
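A toy sketch in Python of what such a dialog step could look like. The `SENSES` dictionary is a hypothetical, hand-made sense inventory, and the function names are mine; a real system would draw its senses from a semantic network rather than a hard-coded table.

```python
# Hypothetical, hand-made sense inventory; a real system would draw on
# a semantic network instead.
SENSES = {
    "jaguar": ["animal", "car brand"],
    "java": ["programming language", "island", "coffee"],
}


def clarification_options(query):
    """Return the ambiguous query terms with the senses the user can pick from."""
    options = {}
    for term in query.lower().split():
        senses = SENSES.get(term, [])
        if len(senses) > 1:
            options[term] = senses
    return options


def apply_choice(query, choices):
    """Annotate each query term with the sense the user selected, if any."""
    return [(term, choices.get(term)) for term in query.lower().split()]
```

For a query like "Jaguar speed", `clarification_options` would offer the senses of "jaguar" to the user, and `apply_choice` attaches the selected sense to the query before it is sent to the engine.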

Semantic search technology - does it actually exist?

Started by Charlie Hull

At Expert System we have been building semantic search systems for 20 years. Here is what we learned in serving hundreds of corporate customers. A semantic search system must establish and store the CONTEXT of content. Then you need an interface to choose the CONTEXT you would like so a match can be made.

Establishing CONTEXT means the following processes must be followed: 1) word morphology (e.g. stems), 2) word roles (e.g. nouns, verbs, etc.), 3) word logic (e.g. subject-verb-object reduction) and 4) sense disambiguation (e.g. assignment of a definition for each word based on the best fit from the available alternatives and in the context of the rest of the sentence(s)). All 4 of these methods require the use of a semantic network that is both broad (it covers the majority of the language to be used) and deep (it has many ways in which words relate to one another).

With the above approach you will reach a precision ("accuracy") and recall ("completeness") in search beyond the 80% mark. With further customization a 90% mark is easily achieved. Systems that rely on statistical / heuristic methods typically fall far short of these benchmarks. This is true since statistical / heuristic methods cannot fully establish logic and disambiguation.

Finally the interface must be constructed in a way that allows the user to tell the system what CONTEXT the query is in. Full natural language questions using the above methods can do this automatically. But the reality is we live in a 1-3 query word world. So allowing the user to select the word sense of one or more of the query words gives the system much more "to chew on" and is not generally an intrusion for the user. Similar user interface interactions include showing categorical, domain, people, places, organization outcomes from a search which are clickable, showing lists of semantic triples (subject-predicate-object) from which to choose, etc. All of these are at most 1-2 more clicks than a normal keyword based search but improve the experience immensely.

Such interfaces also allow what we call a 3-step walk-through search, where step 1 is about precision (less of a list), step 2 is an expansion of concepts (to include things that are related but that you did not know about), and step 3 is another step of precision. This "ratcheting" effect therefore begins to bring into the Enterprise Search function other important aspects of corporate work like discovery, exploration and analysis. http://www.expertsystem.net By Brooke Aker
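The comment quoted above cites precision and recall percentages. As a reminder of what those two measures compute, here is a minimal Python sketch using the standard definitions; the document IDs in the usage note are made up, and nothing here reproduces how Expert System arrived at its figures.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a query returns 5 documents of which 4 are relevant, and 8 relevant documents exist in total, precision is 4/5 and recall is 4/8.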