Tuesday 23 February 2010

Query-time JOIN operator

Everyone who is active in the information access business knows that it is sometimes necessary to combine the data from two result sets into one.

Example:
The main search focuses on finding information within a document. A document can be related to many other data types, such as geographical data or authors. Authors can have metadata of their own, like age, hobbies etc.

Now let's say you want to find documents that contain the keyword "snow" and that are written by authors who have the hobby "skydiving", or you want to show the hobby of the author of a book in the result list.

To run this search in a search engine that cannot combine the two types of information, you have two options:
1. Flatten the data. This means that all the information that can be related to a document must be indexed along with that document. The fact that author X has the hobby skydiving must then be stored with every document whose author is X, even though we already know it. This can make the index grow dramatically.
2. If you want to show information from another record set, make one extra query for each result in your main result set (documents and authors) to find the hobby of the author.
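The two workarounds above can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular search engine works; the sample records and the names `flatten`, `find_author` and `search_with_followups` are made up for the example.

```python
# Hypothetical sample data: two separate record sets.
documents = [
    {"id": 1, "title": "Snow in the Alps", "text": "fresh snow fell overnight", "author_id": "a1"},
    {"id": 2, "title": "City guide", "text": "museums and cafes", "author_id": "a2"},
    {"id": 3, "title": "Winter flying", "text": "landing on snow", "author_id": "a1"},
]
authors = {
    "a1": {"name": "X", "hobbies": ["skydiving"]},
    "a2": {"name": "Y", "hobbies": ["chess"]},
}


# Option 1: flatten. Every document carries a copy of its author's metadata.
def flatten(documents, authors):
    flat = []
    for doc in documents:
        author = authors[doc["author_id"]]
        flat.append({
            **doc,
            "author_name": author["name"],
            "author_hobbies": list(author["hobbies"]),
        })
    return flat


flat_index = flatten(documents, authors)
# The hobby of author X is now stored once per document written by X.
copies_of_x = sum(1 for d in flat_index if d["author_name"] == "X")


# Option 2: one extra lookup per result in the main result set.
lookup_count = 0


def find_author(author_id):
    """Stands in for a separate query against the author record set."""
    global lookup_count
    lookup_count += 1
    return authors[author_id]


def search_with_followups(keyword):
    hits = [d for d in documents if keyword in d["text"]]
    return [(d["title"], find_author(d["author_id"])["hobbies"]) for d in hits]


results = search_with_followups("snow")
```

With this toy data the flattened index holds the metadata of author X twice, and the second approach issues one follow-up lookup per hit. Both costs grow with the size of the data and of the result list.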

In this day and age we are trying to make information more accessible and useful by showing the relations that search results have with other types of information, so that users get more insight.
Especially within BI applications this functionality is needed, because the types of systems that have to be connected are very diverse and the data cannot always be flattened, either for reasons of diversity or just because this would mean that lots of information has to be duplicated...

The query-time JOIN function is a very powerful way to make this possible.
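As a rough idea of what a query-time join does, here is a minimal Python sketch of the "snow" and "skydiving" example. It is not the API of Attivio, Exalead or any other product; it simply keeps documents and authors as separate record sets and combines them on a shared key when the query runs. The data, the field names and the function `query_time_join` are assumptions for illustration.

```python
# Hypothetical sample data: documents and authors are indexed separately.
documents = [
    {"id": 1, "title": "Snow in the Alps", "text": "fresh snow fell overnight", "author_id": "a1"},
    {"id": 2, "title": "City guide", "text": "museums and cafes", "author_id": "a2"},
    {"id": 3, "title": "Winter flying", "text": "landing on snow", "author_id": "a1"},
    {"id": 4, "title": "Chess in winter", "text": "snow outside, chess inside", "author_id": "a2"},
]
authors = [
    {"author_id": "a1", "name": "X", "hobbies": ["skydiving"]},
    {"author_id": "a2", "name": "Y", "hobbies": ["chess"]},
]


def query_time_join(documents, authors, keyword, hobby):
    """Join documents to authors on author_id while the query runs."""
    # Filter the author side once and key it by the join field.
    matching_authors = {
        a["author_id"]: a for a in authors if hobby in a["hobbies"]
    }
    joined = []
    for doc in documents:
        # Keep a document only if it matches the keyword AND its author matched.
        if keyword in doc["text"] and doc["author_id"] in matching_authors:
            author = matching_authors[doc["author_id"]]
            joined.append({
                "title": doc["title"],
                "author": author["name"],
                "hobbies": author["hobbies"],
            })
    return joined


hits = query_time_join(documents, authors, "snow", "skydiving")
```

The hobby of each author is stored only once, and the author metadata is attached to the matching documents in a single pass instead of one extra query per result.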

Not many search vendors have this function in their product. I know that Attivio and Exalead are capable of doing this.
