This weekend I started exploring the book "Solr 1.4 Enterprise Search Server".
Although Solr 1.4 has not actually been released at the time of writing, the latest nightly build already provides nearly all of its functionality and is very stable.
In some of the coming posts on this blog I will share my experiences reading the book and working through the examples. I will also compare Solr with some of the other search engines I use in my work, such as Autonomy, Exalead and the Google Search Appliance.
The e-book is accompanied by example code, downloadable from the PACKT website. The example code includes the Solr build that was used while writing the book, so every example in the book should work.
The example code contains a fully populated and configured Solr / Lucene instance holding several hundred thousand records pulled from the MusicBrainz database. To use this data you need a development / test environment with enough disk space and RAM.
One negative remark about this dataset is that it is mostly "structured database data": many fields, each holding a small amount of data.
The Enterprise Search environments I come across mostly hold large amounts of unstructured information: documents from file systems, document management systems (DMS) and content management systems (CMS).
Of course database offloading is a hot topic in BI / enterprise search land, but most of the information that has to be searched and found comes from unstructured documents.
Maybe the choice of database data says something about where Solr / Lucene is actually used in the real world.
It would be nice to work with a more representative data set; that would make the examples more useful.
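To make the difference between the two kinds of data concrete, here is a minimal sketch that builds a Solr XML update message (the `<add><doc><field name="...">` format Solr 1.4 accepts on its update handler) for two records: a "database-style" record with many short fields, and an "enterprise document" with a few metadata fields plus one long body field. The field names (`type`, `name`, `country`, `begin_year`, `title`, `source`, `author`, `body`) and all values are hypothetical illustrations; they are not taken from the book's MusicBrainz schema, and they would have to be declared in your own `schema.xml`.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(docs):
    """Build a Solr XML update message: <add><doc><field name=...>...</field></doc></add>.

    Each doc is a dict mapping field name -> value, or a list of values
    for a multi-valued field (one <field> element is emitted per value).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical "database-style" record: many short, structured fields.
artist_record = {
    "id": "artist:1",
    "type": "artist",
    "name": "Some Artist",
    "country": "NL",
    "begin_year": 1980,
}

# Hypothetical "enterprise document": a few metadata fields plus one long,
# unstructured body (e.g. text extracted from a file in a DMS).
dms_document = {
    "id": "doc:42",
    "title": "Project plan",
    "source": "dms",
    "author": ["J. Jansen", "P. de Vries"],
    "body": "Lorem ipsum dolor sit amet. " * 200,
}

payload = solr_add_xml([artist_record, dms_document])

# Compare how much text each record carries per field.
for doc in (artist_record, dms_document):
    total_chars = sum(len(str(v)) for v in doc.values())
    print(doc["id"], len(doc), "fields,", total_chars, "characters")
```

With the Solr 1.4 example configuration, a payload like this would typically be POSTed with `Content-Type: text/xml` to the update handler (for instance `http://localhost:8983/solr/update`, assuming the default Jetty port), followed by a `<commit/>` message before the documents become searchable.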
News, backgrounds and opinions on the subject of Information Access and Enterprise Search, brought to you in short articles, clippings and links to interesting sites related to the subject.
Tuesday, 13 October 2009
1 comment:
I wouldn't read too much into the dataset chosen for the book. I've seen/used Lucene/Solr for plenty of unstructured text, ranging in size from a few hundred words of unstructured text to book length.
I'd say most use cases of any search system consist of a bunch of metadata fields accompanied by 1-5 unstructured fields, but, of course, YMMV.