Wednesday, October 13, 2010

An explanation of the phenomenon of "stop words"

I have often explained what "stop words" are in the context of a search environment and why using a stop word list is important.
Here is my latest, very short explanation:

Our language contains many words that add no real meaning to the content of a sentence, yet are used frequently. Clear examples are the Dutch articles "de", "het" and "een" ("the" and "a").

For a search query these words are meaningless, and they weigh down the index. If you search for "de hond" ("the dog"), you are obviously interested in the noun "hond".

It also applies to verbs that add no meaning, such as "worden" ("to be") and "zullen" ("shall").

A search engine tries to distinguish between relevant and irrelevant information within its indexes, so that at query time it can use the degree to which a search term is relevant for a particular document.
That outcome in turn determines whether certain documents end up higher or lower in the result list.

Stop words are removed from the content during indexing. Words that appear on the stop word list are also removed from the query itself.
It is therefore impossible to search for words on the stop word list, because those words simply do not occur in the index!

By building a good, domain-specific stop word list, the relevance of the information found will increase and search speed will improve.
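The mechanism can be sketched in a few lines (a toy illustration, not any particular engine's implementation; the word list is made up and far smaller than a real domain-specific one). The same filter runs at index time and at query time:

```python
# Toy stop word filtering, applied identically at index time and query time.
STOPWORDS = {"de", "het", "een", "worden", "zullen"}  # real lists are much larger

def remove_stopwords(text):
    """Return the terms of `text` that are not stop words."""
    return [term for term in text.lower().split() if term not in STOPWORDS]

# Query time: "de hond" is reduced to the meaningful noun only.
print(remove_stopwords("de hond"))                   # ['hond']
# Index time: only meaningful terms end up in the index.
print(remove_stopwords("het zoeken naar een hond"))  # ['zoeken', 'naar', 'hond']
```

Because the filter is applied on both sides, a term on the stop word list can never match anything afterwards, which is exactly why a query for a stop word returns nothing.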

Friday, October 1, 2010

The distinguishing power of faceted search is fading

Thanks to a visitor of a "round table" session of search vendors, we can enjoy the remarks of the big search engine players such as Autonomy, Google and Endeca:

"I was at a search vendor round table today...". The article was also discussed on the Beyond Search blog.

In earlier years, the mere fact that a search engine offered faceted search was a reason to choose that solution over others.

The article mainly revolves around the question: "why should customers pay a lot of money for a technology that open source solutions (Solr) offer for free?"

None of the commercial search software vendors was able to give the audience a good argument for the difference...

For buyers of a search solution this matters a great deal. For this aspect it makes little difference which vendor is chosen on the basis of functional requirements. Search is indeed becoming more and more of a "commodity".

Of course, the choice of a search solution should be judged on more than this aspect alone, but it does show that "special features" are becoming less of a differentiator.

Thursday, September 30, 2010

The search function in SharePoint falls short

A report on FierceContentManagement today shows that users are not satisfied with the search functions in SharePoint.

Stephen Arnold cites this research on his weblog in "Fierce criticism of SharePoint".

I share this criticism. It is exactly why both Google and Autonomy see SharePoint users as potential customers. The IDOL platform and the Google Search Appliance have proven that they achieve better results when a search solution is implemented on top of SharePoint.

SharePoint is (at the moment) a good solution for content management and (limited) document management. Search, however, is better left to other engines that specialize in it.

Friday, August 6, 2010

Disclaimers and footers in e-mail threaten findability

When I received an e-mail from a (government) client just now, my eye fell on the extra data and the disclaimer below the message.

This extra data (I deliberately do not call it information) takes up more space than the actual content of the mail, which is informative.

It is very common for e-mails from large organizations to contain endless disclaimers and commercial messages, but this is a real threat to the findability of information.

In the example of the client I referred to, the word "identiteitsbewijs" (identity document) appears in every e-mail from every employee of that organization. When I search for that word in Gmail, for example, the results are flooded with messages from that client's employees, even though those messages are not relevant at all.
Now imagine employees' e-mail in an enterprise environment being included in a search index or in an information platform running on a search engine.
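One partial mitigation is to strip known boilerplate before the text reaches the index. A minimal sketch, assuming the disclaimer always begins with a fixed marker line (the markers below are made up for illustration):

```python
# Cut off everything from a known disclaimer/footer marker onwards
# before handing the message body to the indexer.
DISCLAIMER_MARKERS = [
    "-- ",            # conventional signature separator
    "Disclaimer:",    # hypothetical fixed marker used by the sender
]

def strip_boilerplate(body):
    for marker in DISCLAIMER_MARKERS:
        pos = body.find("\n" + marker)
        if pos != -1:
            body = body[:pos]
    return body.rstrip()

mail = "Please send the document.\nDisclaimer: This e-mail is confidential..."
print(strip_boilerplate(mail))  # 'Please send the document.'
```

This only works when the boilerplate is predictable, which is exactly why a structural solution on the sending side would be better.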

I propose that adding disclaimers and other extra information to e-mail messages should from now on be done in the way that is so characteristic of the digital world: use a link to a text somewhere on a publicly accessible website.

Thursday, August 5, 2010

Search is no longer just search: the definition of Enterprise Search

A very good article on the AIIM site, titled "What is Enterprise Search".

Search is not just search anymore, and the analyst company Gartner has in recent years been using the term "Information access technology" to include and expand on what they previously called "enterprise search technology". They use the term information access to include a collection of technologies to help you find information, such as:

* enterprise search;
* content classification, categorization and clustering;
* fact and entity extraction;
* taxonomy creation and management;
* information presentation (for example visualization).

This is a useful expansion of the problem set, but we should keep in mind that many of the tools around extraction, classification, and categorization remain supplementary to the essential professional task of organizing information.

They have set up an "Information Organization & Access (IOA) Certificate program" to do justice to the many facets involved in search.

The image below illustrates how the various subsystems related to search fit together:

The good thing is that they do not place the problem of searching and finding solely with search technology, but also involve the management and the quality of the information throughout the chain.

In "zoekprojecten" is één de eerste zaken die wij doen, het uitvoeren van een data-analyse waarbij we (naast een hadmatige analyse) de ruwe content indexeren. De uitkomsten van zoekvragen op die ruwe content leveren zeer waardevolle inzichten op de kwaliteit van de data. Kwaliteit van de content is een essentiele factor bij het toegankelijk maken van informatie.

Monday, August 2, 2010

Searching with a tilde (~) on Google

A new(?) Google operator is the tilde (~). By placing this character in front of a search term, you search not only for that word but also for synonyms and words with a similar meaning:

If you use Google to navigate the Internet, this just might be the coolest thing you read today: There’s a simple operator that lets you search for a word and all of its synonyms. If you place a tilde (~) before the word or phrase you’re searching with no spaces between the tilde and its associated word, you’ll conduct a search for the word, its synonyms, and terms with alternate endings.

GSA supports OpenSearch and Twitter integration

I just took a look at the Google Enterprise Labs site. There appear to be two interesting new features for the Google Search Appliance.

Support for OpenSearch
The OpenSearch protocol describes a standard way to query a search engine and to receive its results. It allows search applications to use search engines without knowing the specific syntax of each engine and of its results.
Internet Explorer and Firefox use the OpenSearch protocol for the search box in those browsers. This makes it possible to add a search engine to the list of engines, which usually consists of just Google and Bing.
Because the Search Appliance is used within organizations to make corporate data searchable, this news is particularly relevant for enterprises.
It is now possible to search your intranet, or any other source indexed with the GSA, directly from your browser!
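For illustration, such an OpenSearch description document is a small XML file along the lines of the sketch below (the host name and short name are placeholders, not an actual GSA configuration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Intranet</ShortName>
  <Description>Search the company intranet</Description>
  <!-- {searchTerms} is replaced by the browser with the user's query -->
  <Url type="text/html"
       template="http://gsa.example.com/search?q={searchTerms}"/>
</OpenSearchDescription>
```

A browser that discovers this descriptor can add the engine to its search box and substitute the query into the URL template itself.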

Related Twitter Results
Real-time information is becoming ever more important in making a search complete. Imagine being able to run a query within your company that finds all information about a subject on your intranet, file systems, etc., while at the same time showing what is happening around that subject right now.
The Google Search Appliance now has an extension that searches Twitter for tweets matching the query you enter.
You get results from the information inside your company, plus the information that is available on Twitter at this very moment!

Friday, July 30, 2010

The intranet is changing. Don't forget to change the search engine with it

An article about the intranet changing from a communication and storage tool into an information platform and collaboration tool: http://thenextweb.com/socialmedia/2010/07/14/the-intranet-is-dead-long-live-the-intranet.

There is a lot of praise for social media and collaborative tools such as blogs, bookmarking (del.icio.us), real-time document creation (Google Docs) and hybrids of these (Google Wave), etc.

If the changing intranet is to stand comparison with what is possible on the internet, an excellent search engine is essential. Because, let's be honest: what are all those Web 2.0 tools worth if you cannot find the information the way you can via Google?

The difference between the "old" intranet and the "new" intranet also lies in no longer having all information in a single system. You use a specific application for a specific purpose; not every function has to be supported by one system.

A good search solution is necessary to keep all that information within reach in an integrated way.

IA-Search is going dutch!

I have decided to continue this site in Dutch. The reason for this is simple. My main focus is the Netherlands and I think I can reach more people in Holland when I start writing in Dutch.

Wednesday, April 21, 2010

Apache Lucene gives birth to triplets... and...

The Lucid Imagination blog today posted about the exciting new thing in Lucene, namely "triplets". The only thing I miss in this blog post is any actual information behind the headline "Apache Lucene gives birth to triplets!".

The post is little more than an ode to the Apache Lucene project: how fantastic the ongoing development of Lucene is and, of course... Solr...

Come on, LucidImagineers... you can do better than that... give us some insight into the meaning of this exciting new feature.

Source: http://www.lucidimagination.com/blog/2010/04/21/news-flash-apache-lucene-gives-birth-to-triplets/

Friday, March 19, 2010

What some people think of enterprise search...

Is sometimes not what you think they mean.

Read the Article "Google Dominates Enterprise Level Search" (http://www.marketingpilgrim.com/2010/03/google-dominates-enterprise-level-search.html).

No... you have not been sleeping for three years while Google penetrated the enterprise. It is just a wrong use of the words "enterprise search". The author means "the use of web search from within the enterprise to fulfil a business information need".

Here is my comment on the Article I posted on the site:
Hi,

When I saw the title of this article I was surprised. How could it be that I, as an expert on enterprise search, had missed this disruptive news?

On the other hand I thought this must be a Google Sponsored Article because Google is just starting its search business in the enterprise.

Reading the article, it became clear that you don't use the concept "enterprise search" as we search experts do. Let me explain.

Enterprise search is all about making the information that is stored in databases, CMSs, DMSs, file shares etc. WITHIN the ENTERPRISE accessible for SEARCHING and data discovery. Also see the definition on Wikipedia: http://en.wikipedia.org/wiki/Enterprise_search.
Players in this field are for instance Autonomy, Exalead, FAST (Microsoft), Attivio, Coveo and Endeca.
Google is big in internet search but is still building its enterprise search proposition and market share. The Google Search Appliance (GSA) lacks some important functional aspects that are needed in the enterprise search environment. Yahoo simply doesn't exist in the enterprise domain, and neither does Bing.

What you mean in your article is the share that web search engines like Google, Yahoo and Bing have when searching business related information. When someone in a company while doing his or her work, searches for information on the internet, you call that enterprise search.

You should call it business-use of web search engines.

Wednesday, March 17, 2010

Blending "Real-time" results in resultlist

Today I had a meeting with a client and an external party that is going to develop the internal and external websites (intranet and internet) for them.

Subject of the meeting was the relation that the (re)development of the websites have with "search" functionality. We tried to get a vision on how the sites could evolve with the use of search technology.

Of course the issue of relevancy came up, and I explained that relevancy is subjective and influenced, or even driven, by the context of the user and even by the context of the information itself (the semantics of the information, but it would be off-topic to get into that in this post).

One aspect of relevancy, "freshness" or "new information", triggered the person from the web design / web building company to refer to Google and its integration of real-time results into search results. He presented this as if current information came up automatically when searching for something, and as if this proved that the freshness of information was taken into account when calculating the relevance of all the results.

I beg to differ. When I look at the integration of real-time information in search results, Google has made a separate section to show results from real-time sources like Twitter. This is the only way to do it, because there is a difference between the "freshness" of a result and the "information value" of other results. You cannot mix 140-character messages about, for instance, Islam with a profound description of what Islam is all about.

Just watch this video and you will see what I mean. The Google interface serves two information needs by "clustering" the results into a "real-time" section and a "content / value based" section.

Monday, March 15, 2010

Search initiative Newssift is no more

I just read an article about the Financial Times pulling the plug on its advanced search offering: Newssift.
About a year ago the interface got some pretty positive reactions, because this kind of user interface was fairly new on a public website.
Powered by Endeca, the functionality made heavy use of facets. All categories, subjects etc. were built from metadata in the content, whose values were used to generate the lists of choices.

It's a shame that the FT has not shared any logging or other data on how the service was used and on whether users liked or disliked this new way of searching FT content.

As a search professional I am always on the lookout for experiences with non-conventional search interfaces. Could this experiment have been more successful had it been guided and analyzed?

See: http://paidcontent.co.uk/article/419-financial-times-shuts-semantic-search-service-newsift/

Thursday, March 11, 2010

Excellent clarification on semantic search

Today I received an update on one of the discussions about semantic search on LinkedIn. Charlie Hull put up an excellent example of how semantic search works. It has to do with the capabilities of the search technology used in a specific situation, but also with the fact that a search application has to engage in a dialog with the user to assess his meaning or context. Most users use only one to three words to formulate a query, and there is not much you can do with such a query on the first try. But the search application has to pick up on those keywords and try to make something out of them.
The next step is to ask the user what he means.

Semantic search technology - does it actually exist? 33 comments »

Started by Charlie Hull

At Expert System we have been building semantic search systems for 20 years. Here is what we learned in serving 100's of corporate customers. A semantic search system must establish and store the CONTEXT of content. Then you need an interface to choose the CONTEXT you would like so a match can be made.

Establishing CONTEXT means the following processes must be followed: 1) word morphology (e.g. stems), 2) word roles (e.g. nouns, verbs, etc.), 3) word logic (e.g. subject - verb - object reduction) and 4) sense disambiguation (e.g. assignment of a definition for each word based on the best fit from available alternatives and in the context of the rest of the sentence(s)). All 4 of these methods require the use of a semantic network that is both broad - covers the majority of the language to be used - and deep - has many ways in which words relate to one another.

With the above approach will you reach a precision ("accuracy") and recall ("completeness") in search beyond the 80% mark. With further customization a 90% mark is easily achieved. Systems that rely on statistical / heuristic methods typically fall far short of these benchmarks. This is true since statistical / heuristic methods cannot fully establish logic and disambiguation.

Finally the interface must be constructed in a way that allows the user to tell the system what CONTEXT the query is in. Full natural language questions using the above methods can do this automatically. But the reality is we live in a 1-3 query word world. So allowing the user to select the word sense of one or more of the query words gives the system much more "to chew on" and is not generally an intrusion for the user. Similar user interface interactions include showing categorical, domain, people, places, organization outcomes from a search which are clickable, showing lists of semantic triples (subject-predicate-object) from which to choose, etc. All of these are at most 1-2 more clicks than a normal keyword based search but improve the experience immensely.

Such interfaces also allow what we call a 3-step walk through search where step 1 is about precision - less of a list, step 2 is an expansion of concepts - to include things related but that you did not know about, and step 3 another step of precision. This "ratcheting" effect therefore begins to bring into the Enterprise Search function other important aspect of corporate work like discovery, exploration and analysis. http://www.expertsystem.net By Brooke Aker

Thursday, February 25, 2010

Semantic Search Engine: Inbeta... as in "not Alpha"?

Through a discussion on LinkedIn about real-life examples of semantic search, I was pointed to the existence of Inbeta. A company with a curious name, because it suggests the company has "beta" status. I can't imagine what that says about their offerings.
But now for the offerings of the company. On their product page they have many products listed.
Of course the first one caught my eye, because a "semantic search engine" is something everybody dreams of. Imagine a search engine that gives you insight and context regarding the user's query in relation to the information at hand, and perhaps also to external resources, by using the semantic relations between pieces of information...

But wait... Before you think I found the holy grail of search, the sentence

"Natural Language: user will not need to search for keywords anymore, our Semantic Search understands the aim of every search query and suggests results that are relevant, thus increasing cross-selling and saving customer care costs"

put my feet back on the ground.

This proposition of using natural language as query input and returning relevant results based on the combination of words that most likely exists in the available search index, is something that has been around for years. Autonomy has marketed that concept under the name Meaning Based Computing. It all revolves around terms and their weights within documents, in relation to the words in the entire index (the corpus), and matching the queried words against these calculated figures.
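That weighting idea is usually illustrated with the classic tf-idf formulation (a textbook sketch, not Autonomy's actual algorithm): a term weighs more when it is frequent in a document but rare in the corpus as a whole.

```python
import math

# Textbook tf-idf: term frequency in a document, discounted by how
# common the term is across the whole corpus.
corpus = [
    "the islam is a religion".split(),
    "the snow falls in winter".split(),
    "the islam and its history".split(),
]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# "islam" is distinctive for the first document; "the" occurs everywhere
# and therefore carries no weight at all.
print(tf_idf("islam", corpus[0], corpus) > tf_idf("the", corpus[0], corpus))  # True
```

Real engines refine this in many ways, but the principle of weighting terms against the corpus is the same.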

For a serious search engine I regard this technique as almost a must-have.

But, back to the semantic side of this... Where is it?

If you want a good example of what semantics can do within a search application, take a look at http://www.freebase.com/view/en/barack_obama.

It has everything to do with the context of the concepts that can be derived from a query. People have roles and jobs, names can be linked to artists, historical data etc.

Tuesday, February 23, 2010

Query-time JOIN operator

Everyone who is active in the information access business knows that it is sometimes necessary to combine the data from two result sets into one.

Example:
The main search focuses on finding information within a document. A document can have relations with many other data types, such as geographical data or authors. Authors can have metadata of their own, like age, hobbies, etc.

Now let's say you want to find documents that contain the keyword "snow" and that are written by authors who have the hobby "skydiving", or you want to show the author's hobby next to each book in the result list.

To do this in a search engine that has no way of combining the two types of information, you have two options:
1. Flatten the data. By this we mean that all the information related to a document is indexed along with that document. The fact that author X has the hobby skydiving must then be stored with every instance where the author is X, even though we already know it. This can lead to a dramatically expanding index.
2. If you want to show information from another record set, run one extra query for each result in your main result set (documents plus authors) to find the hobby of each author.
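The difference can be sketched with hypothetical in-memory data (an illustration of the concept, not any vendor's API):

```python
# Documents reference authors by key; author metadata lives in one place.
documents = [
    {"id": 1, "text": "fresh snow in the alps", "author": "X"},
    {"id": 2, "text": "snow report for skiers", "author": "Y"},
]
authors = {
    "X": {"age": 34, "hobby": "skydiving"},
    "Y": {"age": 51, "hobby": "chess"},
}

# Option 1 (flattening): author metadata is copied into every document,
# duplicating what we already know.
flattened = [{**doc, **authors[doc["author"]]} for doc in documents]

# Query-time join: match on the document, then filter on the joined record.
def search_with_join(keyword, hobby):
    return [
        doc for doc in documents
        if keyword in doc["text"] and authors[doc["author"]]["hobby"] == hobby
    ]

print(search_with_join("snow", "skydiving"))  # only document 1 matches
```

With a join, the hobby is stored once per author; with flattening, it is stored once per document by that author.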

In this day and age we are trying to make information more accessible and useful by showing the relations that search results have with other types of information, so that users gain more insight.
Especially within BI applications this functionality is needed, because the types of systems that have to be connected are very diverse, and the data cannot always be flattened, for reasons of diversity or simply because lots of information would have to be duplicated...

A query-time JOIN function is very powerful in making this possible.

Not many search vendors have this function in their product. I know that Attivio and Exalead are capable of it.

Friday, February 12, 2010

Misconception on Google's revenues

In the article Is Google moving too far (from search) too fast? there is a misconception:

That expansion has some analysts wondering whether Google is in danger of losing focus on what made it such a profitable company, even as those same analysts say it can't rely on search as its only avenue for making money. Right now, Google relies on search for 95% of its revenue, according to Karsten Weide, an analyst with IDC.


Google does not rely on search for 95% of its revenue. It relies on the advertisements (AdWords / AdSense) around its search product on the internet. Google does not sell its search service; it sells space for advertisements.
Of course Google is trying to get into the enterprise, where it will make money with its search product, the GSA. But for now, those sales are just a fraction of the rest.

Wednesday, February 10, 2010

Autonomy acquires MicroLink

Yesterday the guys from New Idea Engineering came with the rumour of a "large search vendor" buying a "partner" that does big business within the intelligence industry...

Today the Guardian released the news that Autonomy has acquired MicroLink.

MicroLink is a highly valued Autonomy partner that only two years ago was "partner of the year" thanks to its large revenues from selling IDOL licenses. MicroLink is the partner that implements Autonomy (IDOL) software within American intelligence organizations.
Big business for Autonomy and strategically important.

The question about the acquisition is: "Why, why, why?"

This move is not in line with the acquisitions of Interwoven and Zantaz. Those companies / technologies added complete suites of functionality with which to penetrate markets beyond enterprise search alone.

Could it be that Autonomy wants more influence over the implementations at those important clients? Or could it be that MicroLink owns some brilliant technological invention?

Tuesday, February 9, 2010

Lucid Imagination launches new website

A year after Lucid Imagination launched its activities, the site gets a face-lift.

The company has made a name for itself in marketing, implementing and supporting open source software based on Solr/Lucene.

For what it's worth... the new site is less "techie". The former homepage had a lot of content derived from blogs and Twitter accounts. They have cut most of those feeds. A good choice in my opinion, because all the discussions and "RE:" messages on the homepage didn't add any value to it.

The company is clearly positioning itself as a separate entity with its own face and stories, apart from the Lucene and Solr community. It is now focusing more on the business value of search-based solutions and has added new content to the site to prove it.

Rumours from NIE

Today I found a blog post in my feeds from New Idea Engineering (one of my favorite search consulting companies) stating that they heard of a search vendor that wants to acquire a consulting firm for its marketing channel.
This is an example of how search vendors (like Autonomy) are broadening their activities beyond making and selling dedicated software. The vendors are starting to realize that the search proposition alone is not going to win them more market share.
They need to start upselling and incorporate more knowledge of their customers and of the added value that partners provide.

Tuesday, January 26, 2010

To search... or not to search

There is no time like the New Year to rethink everything and take a step back in an attempt to see the proverbial forest from the trees. Often this comes to me in the form of wondering where the words we use come from.

The verb "search" comes through the Old French circare, meaning to "go about, wander, traverse," from the Latin circus or circle - A very fitting description indeed. The term comes to be known in the early fourteenth century and would exactly describe the process of looking for something or someone. An individual actually had to go about and wander around looking for what they sought.

Contrast this with the expectations placed on search engines today. Users expect the engine to know, immediately upon asking and often from a query of less than two words, where the exact piece of content they are looking for is. Any wandering or traversing that is required seems to frustrate the user immediately. The desire is that the most relevant document is returned in the top results every time. Very little wandering or going about is expected of the user.

In reality the user is not performing a search but instructing the search engine to do so - yet we say "I am going to search for X." We query but the engine does all the "going about, wandering and traversing." This usage is very telling - the engine and the process of its searching has become an extension of the user. The expectation is that the engine, being a natural extension of themselves, knows their every desire and what they consider important.

In light of this we should not be surprised at users' constant complaints about their search experience. Yet the industry keeps churning out more and more algorithms that focus on natural language processing, semantic search and other content-focused approaches. Vendors seem to neglect relevance methods that actually focus on understanding the user, purchasers of enterprise engines keep pushing back on deploying them, and both fall for the newest vendor jargon year after year.

In this coming year I do not doubt we will see some very interesting technologies brought to market. They will undoubtedly allow us to find experts, tag results, star them and move them around, share them and socialize them - but will they seek to understand what is relevant to an individual searcher? Search profiles on an individual level do exist in some engines, but they usually remain fixed and static - ignoring context and behavior altogether.

I am putting in an early request - all I want for Christmas is my enterprise search engine to pay attention to me this year.

Monday, January 11, 2010

Search User Interface and User Experience

Just found a webpage that I want to share with you: http://www.searchtools.com/info/user-interface.html 

Just one page but with a treasure of information about designs and patterns made for search.

IT has not enabled counter-terrorism

The article on The Standard today about IT failing to help intelligence agencies discover relations between pieces of information and "connect the dots" pretty much amazed me.

As we all (at least us information access professionals) know, the intelligence agencies in the US (and in other countries) use search engine software intensively.
The homepage of Autonomy clearly states that the Department of Homeland Security uses its Intelligent Data Operating Layer (IDOL). I think it's safe to presume that the agencies do not use Autonomy software exclusively; home-grown solutions are of course combined with best-of-breed search engines and discovery tools.

Clearly the companies implementing the IT solutions have not done their work well, or the major search vendors are promising more than they can live up to.

From experience I know that a top search engine can be rendered useless by a bad implementation. On the other hand, I must say that the vendors almost always paint an oversimplified picture when it comes to getting the most out of their software.


The field of search and information discovery is a challenging one. One thing I hope is that the experiences and insights gained from past events are used to find the holes and the causes for not seeing connections in advance. That kind of knowledge can help everyone build better solutions and better information processes to feed the systems.

I know a company that can help intelligence agencies getting the most out of their search and discovery environment ;-)

Lucid Imagination releases LucidWorks for Solr 1.4

Finally, a couple of months after the release of the Solr 1.4 distribution by Apache, Lucid Imagination – a company that delivers commercial-level support for the open source Solr engine – has released its certified version of the Solr 1.4 Enterprise Search Server.

This distribution contains a comprehensive manual with lots of information on how to setup and use the search server.

The documentation is outstanding in quality and completeness. But that’s not all.

Lucid Imagination has succeeded in wrapping the Solr software in a user-friendly installer that gets the search environment up and running in no time.

What is missing at this moment is a "ready to run" indexing solution for a file system containing binary documents like PDFs and Office-type documents.
When they succeed in packaging that type of solution, combined with a ready-to-run HTTP crawling environment, Lucid Imagination has the potential to compete with the "big" and expensive enterprise search providers like Autonomy, Microsoft / FAST and Exalead.


Open source search is starting to claim its share of the big enterprise search pie, but it lacks the "click and run" experience that other software vendors offer.

The complexity of getting a Solr solution up and running with real-life data sources is what is holding back large-scale adoption. It is still too much of a toolbox.

Thursday, January 7, 2010

Missing vendors in article on Network Computing

In the article “Best and worst of times for enterprise search” the author, Paul Korzeniowski, mentions some vendors of enterprise search / eDiscovery solutions.

The list of vendors misses some very important players though, such as Autonomy, Attivio and Microsoft / FAST.

I would advise the author to be more thorough next time.