Information Access and (Enterprise) Search / Zoeken en Vinden: 2009

Saturday, 12 December 2009

OpenPipeline dead?

A year ago Dieselpoint ("search and navigation technology") was promoting its open source project OpenPipeline. When I read about the project's promising intentions, I was enthusiastic about the initiative.

OpenPipeline intended to break open the closed source solutions for connecting to different information sources and file formats. That would tear down the walls around vendors like Autonomy, who protect the technology that enables them to extract data from a variety of sources and file formats.

Now, a few months later, there is no more information coming from this project.

I wonder why the project is now inactive.
- Was it a marketing trick by Dieselpoint to get attention?
- Has Dieselpoint come to the conclusion that the connector problem cannot be solved so easily?
- Has Dieselpoint decided that it should not be so open after all?
- Is the community not willing enough to participate?

Tuesday, 1 December 2009

Google Wave and Microsoft SharePoint... NOT alike

Today I read an article with the title "Can Google succeed outside of search?"

One piece of information got my attention a little bit more:

But Google keeps pushing: Its still-in-beta Google Wave promises to attack one of Microsoft's most beloved products, its SharePoint collaboration software.

To my knowledge, and from my experience with Google Wave, Wave is an online community platform and not so much a collaboration platform. Yes, people can talk to each other, send messages, maybe even share a document through a link, but that is too simple to be compared with SharePoint.

The review of Google Wave vs. Microsoft SharePoint on the Bamboo team blog makes the difference even clearer:

First of all, the question, "Is Google Wave a SharePoint Killer" is a completely artificial, nonsense argument dreamed up by either the Google PR team and/or the media to generate buzz. Wave is an email client (very tempted to say "consumer email client")... an exciting and innovative one, but it's not a collaboration platform... at least if your definition of collaboration is more sophisticated than sharing pictures from a boating trip. Wave doesn't compete with SharePoint, it competes with Outlook.

That being said, it has occurred to me recently that one of the weaknesses of SharePoint is its failure to account for the fact that email is and will continue to be the fundamental central connective tissue to all collaborative activies. As you might imagine, Bamboo has one of the most sophisticated, tricked-out SharePoint portals in the world, and yet even here, people continue to exchange documents in email and communicate issues in email that *should* be written into a wiki or discussion board. Why? Convenience. So long as it is more convenient to fire off an email than to upload a document or navigate to a message board, that is what people will do. Wave does address this problem head on, and I think introduces a new paradigm that Outlook will have to adopt, quickly.

But an enterprise collaboration platform? Hardly. There is no sign of document management, content management, workflow, data integration, dashboarding etc. etc. If Google truly believes that Wave is a platform for enterprise collaboration, it only serves as proof that they really don't understand the business. Maybe if they fully integrated Wave, Google Sites, Google Apps, Google Gears, Google Calendar in a comprehensive platform... then you might have something. But Microsoft has a big head start in lashing together all of these disparate capabilities and a huge installed base of customers to drive innovation based on real world feedback.

I sometimes wonder whether the "reporters" who send this kind of information out into the world simply lack insight, or whether they deliberately say things that are just not correct. Double-check your sources and trust your own knowledge.

Monday, 23 November 2009

phpQuery: using Google as an information source

For a project I have to get search results from Google and follow the links on the result page to index those valuable resources.
Google doesn't supply an XML interface to its search results.

Bing (www.bing.com) and Yahoo (www.yahoo.com, through an API key) do. As an aside to the subject of this blog, I find this monopolistic behaviour of Google repulsive... They index all OUR content and resources, make money off it through advertising, and then they drop the "open source" character of their activities. Once our content enters their databases, the content is theirs!
Start using Bing more often!

Because Google does not support an XML response interface, I had to find a way to "crawl" our own content.

I found the "phpQuery" framework, ironically hosted on Google Code.

What I wanted to do was get all the relevant links from the search results page and follow those links to make the information available within the context of the search application.

I want to share the basic code that I came up with for getting the links from a web page using phpQuery:

<?php
require_once('phpQuery/phpQuery.php');

// The URL of the webpage to fetch
//$link = "http://www.emidconsult.com";
$link = "http://www.google.nl/search?q=multiple+sclerose";

// Get the whole HTML contents; file_get_contents() returns false on failure
$page = file_get_contents($link);
if ($page === false) {
    die("Could not fetch $link\n");
}
// Put the HTML content in a phpQuery document
phpQuery::newDocumentHTML($page, 'utf-8');

// Get the [body] contents out of the phpQuery object
//$body = pq('body');
//print $body;

// Iterate through every link (a element) in the webpage
foreach (pq('a') as $ref) {
    // Get the link target (cast to string: an anchor may have no href)
    $href = (string) pq($ref)->attr('href');
    // Keep only links that contain an absolute (external) URL ...
    $isExternal = strpos($href, 'http://') !== false;
    // ... and that do not refer to Google's own search commands.
    // strpos() returns 0 for a match at the start, so compare against false.
    $isGoogleSearch = strpos($href, '/search?q') !== false;

    if ($isExternal && !$isGoogleSearch) {
        echo $href . "\n";
        // Now do some things with the href...
    }
}
?>

Thursday, 19 November 2009

Google and making multimedia searchable

Today I saw the announcement "Google adds automatic captions to YouTube". I really like this addition to the possibilities of YouTube.

But as we all know, Google gets gigantic exposure for everything it does.

This feature seems a neat thing that makes it easier for consumers to upload their videos to YouTube regardless of language, eliminating the need to manually add captions or descriptions to their videos: Google converts the speech to text automatically...

As I am more involved in information access and enterprise search, I place this news in another context: the possibility of making video and audio searchable.

Within the enterprise, making video and audio searchable is getting more attention. Autonomy as well as Exalead are also focusing on this part of enterprise search.

To see the capabilities of Exalead, you can visit their labs at http://labs.exalead.com/experiments/voxalead.html.

Autonomy is also very active in the media indexing market. They have solutions like Virage that do the same thing. I find it very disappointing that Autonomy has no demo of their capabilities.

Bottom line?
Google is doing the same thing as the large search technology providers. The "public" will say that Google is "ahead of the pack". The advantage Google has is a large community that follows everything it does and spreads the word fast.
Other search providers like Autonomy and Exalead can do, and are doing, the same thing that Google is doing now (and maybe do it better), but they are not in a position to reach the audience that Google can.

Tuesday, 3 November 2009

Coveo into free enterprise search?

Short Post:
Today I received an email from KMWorld with an advertisement by Coveo.

They are releasing a free version of their enterprise search suite under the name "Expresso".

While they are not the only ones (OmniFind Yahoo! Edition and Microsoft Search Server Express do the same), I still think that wrapping something like "enterprise search" as a free gift is not a good idea.
As we all know, enterprise search is more than a software solution. Of course you can plug in a file system, but that will not solve most of your information needs.

Still it is something to play with.


Monday, 2 November 2009

Microsoft's FAST and SharePoint integration (SharePoint 2010)

Stephen Arnold has written up an excellent analysis of the information that was given about the coming release of "Fast Search Server 2010 for SharePoint".

Third, there is a reminder to me that SharePoint is a work in progress. I think someone told me that it is the next operating system from Microsoft. The Fast component will be called Fast Search Server 2010 for SharePoint. Now the story gets interesting. Here’s what’s coming:

He then sums up some aspects of the features:

A content processing pipeline.
Metadata extraction.
Structured data search.
Visual search.
Advanced linguistics.
Best bets.
Development platform.
Customization.

Microsoft still has a lot of work to do, as there were no live demos or examples that showed these features in action.

Wednesday, 21 October 2009

Solr 1.4 Enterprise Search Server Book review

Today I came across a blog post about the book "Solr 1.4 Enterprise Search Server". As I mentioned earlier, I am in the process of reading, dicing and slicing the book.
I will post updates on my journey through the examples and my findings, but I thought it would be nice to share a short review of the book in the meantime:

http://happygiraffe.net/blog/2009/10/20/book-review-solr-1-4-enterprise-search-server/

This review says more about the structure and topics of the book, whereas I will give more information on my experiences while using the book.

Google GSA gets smart (and smarter automatically)

Google announced some updates for generation 6 of the GSA. They added functionality that automatically boosts the relevance of documents / terms when the link is clicked in the search results.

From InformationWeek:

Chief among the additions is a new algorithm called Self-Learning Scorer, which analyzes employee clicks and behavior to improve the relevance of search results.
If, for example, most employees searching for a given term click on the third search result, the GSA will place that term higher on the search results page for future searches.
On the Internet, Google gets highly relevant search results in part because of its PageRank algorithm, which analyzes Web links and treats them as votes for relevance. On corporate intranets, a rich link structure is absent, which makes search relevance harder.
The Self-Learning Scorer system should help compensate for the absence of Web links inside corporate firewalls.
[...]
"In other words, the GSA is moving upstream, while keeping the price low, and continuing to disrupt the confused search market," says Feldman in the report.

Google makes good use of "User Generated Content", in this case the fact that a clicked result is the "proof" of the document being relevant for that particular search.

It is good to see that the difference between internet search and enterprise search is becoming clearer.
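
The click-feedback idea described above can be sketched in a few lines of Python. This is only a toy illustration of "clicks as votes", not Google's Self-Learning Scorer, whose internals are not public. The class name `ClickBoostRanker`, the `boost_weight` parameter and the scoring formula are all assumptions made for this example.

```python
from collections import defaultdict


class ClickBoostRanker:
    """Toy click-feedback re-ranker; NOT Google's Self-Learning Scorer."""

    def __init__(self, boost_weight=1.0):
        # boost_weight is a hypothetical tuning knob: how strongly clicks count.
        self.boost_weight = boost_weight
        # clicks[query][doc_id] -> number of clicks on doc_id for that query
        self.clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, doc_id):
        """Register that a user clicked doc_id in the results for query."""
        self.clicks[query][doc_id] += 1

    def rerank(self, query, scored_docs):
        """Re-order (doc_id, base_score) pairs using the click history."""
        query_clicks = self.clicks.get(query, {})
        total = sum(query_clicks.values())

        def boosted(item):
            doc_id, base_score = item
            # Share of all clicks for this query that went to this document.
            share = query_clicks.get(doc_id, 0) / total if total else 0.0
            return base_score * (1.0 + self.boost_weight * share)

        return sorted(scored_docs, key=boosted, reverse=True)


# Example: the third result gets most of the clicks and moves to the top.
ranker = ClickBoostRanker()
engine_results = [("doc-a", 3.0), ("doc-b", 2.5), ("doc-c", 2.4)]
for _ in range(8):
    ranker.record_click("holiday policy", "doc-c")
for _ in range(2):
    ranker.record_click("holiday policy", "doc-a")
reranked = ranker.rerank("holiday policy", engine_results)
```

With this click history the order becomes doc-c, doc-a, doc-b, while a query without any clicks keeps the engine's original order.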

Tuesday, 20 October 2009

New technology vs. old pattern matching?

Just yesterday I blogged about the article by Stephen Arnold in which he refers to new players and technology like Exalead in relation to the "old" players with their technology like Autonomy.

A moment ago I received an e-mail alert from Reuters with the title "Autonomy Celebrates a Decade of Automatically Detecting and Acting on Patterns in Information".

This press release by Autonomy describes the state-of-the-art solutions that they have built upon pattern matching techniques:

Autonomy brings advanced technology to executives looking to gain competitive
advantage by reinventing how they leverage information through a pattern-based
strategy.  The company has hundreds of patents in complex and advanced
technology in the areas of pattern matching, pattern recognition,
probabilistic analysis, clustering, visualization, eduction, time analysis,
and sentiment analysis.  Unlike legacy systems that require complex modelling,
manual programming and data integration efforts, Autonomy's patterns
technology automatically understands all forms of information including social
media, audio and video and databases.

Is this a coincidence? Based on the examples in the press release, I am still impressed by the way Autonomy operates.

Monday, 19 October 2009

Tactical choices for a FAST ESP / SharePoint investment

A smart analysis of the possibilities and choices you have when faced with an upgrade or a change in requirements for your "old" Fast ESP (OEM) environment. The relationship with SharePoint is an interesting one.

From: http://arnoldit.com/wordpress/2009/10/19/reflections-on-sharepoint-and-search/

What can customers in a SharePoint environment do with a Fast ESP legacy system? I know what I would do. I would ask that the Microsoft SharePoint engineers find a certified third party who can hook the Fast ESP system into the SharePoint 10 system once these products become available in November 2009. If these experts cannot do the job within the time and budget limits of the organization, I would get a newer system. I know I would look at high profile modern systems such as Exalead’s, and I would check out relative newcomers like Gaviri. In fact, I would do some proofs of concept and pick the best system for my needs. I know I would not consider the older systems that are on the market; for example, the BASIS technology or the Bayesian systems. I want 64 bit, smart systems, not the pains of the past.

I find it very interesting how Stephen Arnold favours the technology of Exalead over the "old" players on the market like Recommind and Autonomy, which have based their "magic" on Bayesian probabilistic pattern recognition.

Wednesday, 14 October 2009

More Google browser specific choices

From http://www.washingtonpost.com/wp-dyn/content/article/2009/10/14/AR2009101400362.html

Our tipster says that he's only seeing the new ads in the developer version
of Chrome, but I'm seeing them as well in Safari, though some TechCrunch staff
aren't seeing them in any browser. Google is always switching up ad placement
and formats in various bucket tests, some of which are browser-specific, so the
inconsistency isn't surprising.

It seems that Google has some favorites when it comes to testing ;-).

Database technology vs. Search technology

Today I read this article on Beyond Search about struggling with large datasets in a database.

Search technology can scale very easily with products like Autonomy IDOL and Exalead.
Oracle RDBMS is good (the best?) at managing structured data but is not well suited to handling heavy transaction loads AND broad queries at the same time.

Using search technology in conjunction with database technology simplifies managing an Oracle environment. Much time is spent optimizing databases to serve both transactions and queries.
The path needed for updating and inserting records is totally different from the path that must be followed to find information. Database administrators spend many hours on this subject.

My advice: invest in optimizing the RDBMS to handle transactions and VERY specific questions on detailed data. Invest in a good, scalable solution based on search technology to handle the queries.
With this "database offloading" strategy you will be able to let your dataset grow and still manage it, as well as make it usable for all processes and users in your company that need the data.
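
To make the "database offloading" split concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the RDBMS and a plain Python dictionary as a stand-in for a real search engine. The class `OffloadedStore` and its method names are hypothetical, chosen only for this illustration.

```python
import sqlite3
from collections import defaultdict


class OffloadedStore:
    """Writes go to the database; free-text queries go to a separate index."""

    def __init__(self):
        # SQLite stands in for the transactional RDBMS (assumption).
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
        # Stand-in for a real search engine: term -> set of document ids.
        self.index = defaultdict(set)

    def insert(self, body):
        # Transactional path: the database stays the system of record.
        with self.db:
            cursor = self.db.execute("INSERT INTO docs (body) VALUES (?)", (body,))
        doc_id = cursor.lastrowid
        # Query path: feed the same record to the search index.
        for term in body.lower().split():
            self.index[term].add(doc_id)
        return doc_id

    def get(self, doc_id):
        # A VERY specific question: primary-key lookup, answered by the database.
        row = self.db.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
        return row[0] if row else None

    def search(self, query):
        # Broad query: answered by the index without touching the database.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(term, set()) for term in terms))
        return sorted(hits)


store = OffloadedStore()
first = store.insert("Oracle handles transactions")
second = store.insert("Search handles queries")
```

In this sketch `store.search("handles")` returns both ids from the index, while `store.get(first)` is the kind of detailed lookup that stays with the database. A production setup would feed the index asynchronously rather than inside the insert call.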

Tuesday, 13 October 2009

Experiences with Solr 1.4 Enterprise Search Server (Part 0)

This weekend I started exploring the book "Solr 1.4 Enterprise Search Server".

Although Solr 1.4 is not actually released at the moment of writing, the latest nightly build does provide nearly all functionality and is very stable.

In some of the coming posts on this blog I will share my experiences with the reading of the book and working through the examples. I will also make comparisons with some of the other search engines that I use in my work, like Autonomy, Exalead and the Google Search Appliance.

The e-book is accompanied by example code, downloadable from the Packt website. That example code contains the Solr version that was used while writing the book, so every example in the book should work.
The example code contains a fully filled and configured Solr / Lucene instance. This instance consists of some hundreds of thousands of records pulled from the MusicBrainz database. To use this data you must have a development / test environment with enough disk space and RAM.

One negative remark about this dataset is that it is mostly "structured database data": a lot of fields with small amounts of data.

The enterprise search environments that I stumble upon mostly hold lots of unstructured information from documents in file systems, DMS and CMS systems.
Of course database offloading is a hot topic in BI / enterprise search land, but most information that has to be searched and found comes from unstructured documents.

Maybe the fact that database data was chosen says something about the field of operation of Solr / Lucene in the real world.

It would be nice if I could work with a more representative data set. This would make the examples more useful.

Information Access is all about the interfaces

I just read this post on Beyond Search about the difference between Google/Apple interfaces and other interfaces in the enterprise that give you access to information. The image is below.

This funny little comic states what I have always said when talking about interfaces that must help people find information. They must be simple, yet powerful.

In my opinion it is necessary to start a dialogue based on the initial search. Compare it with a store where you ask the salesperson "I want a gift". Based on that query, search engines would respond with "0 results". A "gift" is not concrete enough.

The sales person would start asking you questions: "For what occasion?", "Is it for a woman or a man?", etc.

So: Start simple and guide the searching person through the possible answers.

Visualisation of Enterprise Search ROI with Google GSA

The Dutch company VLC has a great little application on their website that lets you interactively determine the ROI you will have when investing in a Google Search Appliance (GSA). They make use of the following factors:

Documents
Employees
% of knowledge workers
Cost per employee

Below is the text from the VLC website (translated from Dutch):

Calculate the benefits of a good search solution

A simple formula calculates how much money can be saved by making information properly searchable with an enterprise search solution.

# employees × (average annual salary / 2,080 hours) × (minutes saved / employee) = € savings per year.

The application below performs this calculation. Drag the sliders back and forth along the bars to match your organisation's figures. This shows how quickly a large productivity saving can be achieved by deploying a good search solution.
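
The formula above can be worked through as a small Python function. This is a sketch under stated assumptions, not VLC's actual calculator: the formula does not say over which period the minutes are saved, so the example assumes minutes saved per working day and 260 working days per year (2,080 hours / 8 hours per day). The function and parameter names are made up for this illustration.

```python
WORK_HOURS_PER_YEAR = 2080  # from the formula: 52 weeks x 40 hours


def annual_search_savings(employees, avg_annual_salary, minutes_saved_per_day,
                          working_days_per_year=260):
    """Estimate yearly savings, in the same currency as the salary."""
    # Cost of one working hour of an average employee.
    hourly_cost = avg_annual_salary / WORK_HOURS_PER_YEAR
    # Assumption: minutes are saved per working day, converted to hours per year.
    hours_saved_per_year = minutes_saved_per_day * working_days_per_year / 60
    return employees * hourly_cost * hours_saved_per_year


# 100 employees, EUR 41,600 average salary, 12 minutes saved per working day.
savings = annual_search_savings(100, 41_600, 12)
```

With these inputs the hourly cost is EUR 20 and each employee saves 52 hours per year, so the estimate comes to EUR 104,000 per year. The "% of knowledge workers" factor from the VLC application could be applied by multiplying the number of employees by that fraction first.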

Friday, 9 October 2009

Going from one monopolist to another

http://blog.seattlepi.com/microsoft/archives/180794.asp

Google is offering office productivity tools in the cloud through Google Apps. To my knowledge they are the only one with such a rich online solution.

As more and more people use Google Apps, the suite will become a new standard. That will attract even more users, because it is easier to exchange files in the same format.

And what about the "openness" of the Google Apps document format? Sure, you can save a document in another format and then open it in another program (let's say Word). But once you have edited that document in Word, you cannot save it back as a Google Apps document without trouble. You can upload the file as a new document, but that's not what you want. The layout of the document will also be messed up...
I say: when you choose Google Apps you are bound to it, as is the case with any other office suite.

When Google has the biggest user base in the world, will it not be the next monopolist in the market?

Consider this:
The fact that all your data is on their servers is even creepier.

With the offline Microsoft Office solution your data is still safe in your own network.

At the moment we use Google Apps only for "work in progress" and notes that must be shared or collaborated on.
The final documents and reports we still create and save with Microsoft Office (and OpenOffice, although that causes compatibility issues). The files are kept safely under our own control.

Thursday, 8 October 2009

In the field of defying standards, Google is following Microsoft

I have said it before, and I will say it again: Google is growing into the next Microsoft. Driven by shareholder value, the company is no longer the underdog. They are starting to set their own standards. Public opinion is changing...

By implementing currently non-standard features on their homepage, Google are sending out a strong message on what they believe the new standard features should be, and not coincidently, it is the features that their own browser implements and supports. This is not the first time Google has sent a wrecking-ball into the standards process. Google Gears was launched long before Chrome as a way to implement proposed HTML5 standards, such as offline caching, into browsers (see my NextGenWeb series from last year). It was born out of frustration for the slow and beurocratic standardization process - something that Google couldn't afford to wait for as their web applications could not advance further without a non-aligned platform to build them on.
A large part of the anti-trust case against Microsoft was that with combined desktop, browser and server market dominance the company could abuse that position to make the web a Microsoft web by implementing Microsoft-only features. Google are using their dominance to force an issue that has been stalled for far too long - but the difference is that they are using their force for potentially a greater good (I hope). The theoretical Microsoft web would have been "this website only supports Internet Explorer", whereas with Google so far we have "this website is a lot better, and has sexy buttons, if you use Chrome (which btw is open source)".

source: http://www.washingtonpost.com/wp-dyn/content/article/2009/10/08/AR2009100800192.html

Wednesday, 30 September 2009

Is Autonomy a Microsoft Target?

Something must be going wrong with the Fast implementation.

Talk is swirling around the largest listed company in the Cambridge cluster of high tech companies, Autonomy Corporation plc, that it is an imminent target of a takeover offer from Microsoft.
Talk in financial markets, (reported here by Reuters), is that the Seattle giant may make a bid at around 2,800p a share, which would value Autonomy at about £6.7 billion, about 75% above its current share price of 1513p and market capitalisation of £3.9 billion.
Autonomy's share price has been up between 3.5% and 5% in Friday morning trading.

Source: http://www.siliconfenbusiness.com/articles/Is-Autonomy-a-Microsoft-target-at-north-of-6-billion/734

Tuesday, 22 September 2009

Google and Meta tags...

Today the whole blogosphere was buzzing about the fact that Google does not take into account the keywords that website owners put in HTML meta tags to describe their content.

Even Beyond Search has quacked about it.

Google reacted to this news very quickly, stating that its enterprise solution does use this kind of meta tag.

The reason for not using user-generated meta tags in the ranking mechanism of Google web search is simple... they are susceptible to manipulation.
Google web search relies on the popularity of web pages and websites. Furthermore, Google relies on the volume of data, not per se its quality, to find useful and recent subjects that determine the relevance of content for users.

As we all know, the volume and popularity factors are not very useful in an enterprise environment. There, it revolves around finding one specific document or relevant information that has been revised or tagged by other humans.

In the enterprise, metadata is simply necessary to categorize content and give it context.

Still, we must make sure that this kind of news does not make enterprise users even more "dumb", expecting to find relevant results within their company while they don't invest any effort in upgrading the quality of the data...
They all want it "Google-like" but in this case they do not understand what that means.

Wednesday, 16 September 2009

Google Fast Flip is essentially Google News in a different format

Today Google announced a new Labs experiment: Google Fast Flip.

Google already has an outlet for news, namely Google News.

In my opinion Fast Flip is the same news outlet, only in a different interface. An intuitive interface I must add.
But the Google News interface already has a "newspaper-like" shape. To scan the headlines I like Google News better. This also has to do with the fact that I can add subjects and content myself.
Furthermore, I can arrange the front page the way I like it.

What's your opinion? Just another funny interface the guys from Google are playing with, or is it really revolutionary?

Tuesday, 15 September 2009

Reasons for choosing Solr on All for Good are twisted

Today I read a blog post on the Lucid Imagination blog about the fact that a Google employee chose Solr as the search engine for the All for Good site.
The blog post cited part of the testimonial on the All for Good site, and I have to disagree with the reason given for choosing Solr.

The problem they had was:

One of the top concerns we’ve been hearing from nonprofit organizations who list
volunteer opportunities on All for Good is that their opportunities aren’t
updated on the site as frequently as they need. This happens because All for
Good doesn’t directly receive volunteer opportunities from nonprofits – we crawl
feeds from partners like VolunteerMatch and Idealist just like Google web search
crawls web pages. Crawlers don’t immediately update, they take time to find new
information.

The solution stated:

Today, we’re rolling out improvements to All for Good that will help solve this
problem and improve search quality for users. The biggest change, which you
won’t see directly, is that our search engine is now powered by SOLR, an
incredible open source project that will allow us to provide higher quality and
more up-to-date opportunities. Nonprofits should start seeing their
opportunities indexed faster, and users should see more relevant and complete
results.

Now... why do I disagree with the way the choice for Solr is argued?

It is the claim that using Solr solves their problem of "latency". Remember, the biggest problem was that the indexed information was not up to date.
Solr doesn't solve that. Solr is just a service around Lucene. Solr doesn't take care of the crawling part of the problem.

We experts on search applications and information access solutions know that it is the combination of crawling frequency, the accessibility of the source to be indexed (RSS, web, document repositories, databases etc.) and the preprocessing of those diverse formats that determines the speed of the indexing and thereby of the search process.

In this case Nutch will probably take care of the crawling part, so the frequency with which updates are processed relies on the speed of that part of the solution, not on the fact that Solr is used...

Thursday, 3 September 2009

Solr next new thing for European Parliaments

According to some representatives of European parliaments (UK, Europe, Belgium), Solr is the next new thing in search solutions for their websites and intranets.

Today I was at a workshop focused on the topic of search. The workshop was organized by the Dutch Parliament and there were representatives of 8 parliaments of other European countries (Belgium, UK, Denmark, Norway, Sweden, Finland, Austria, Israel).

What struck me there was the real interest in Solr as a solution for their search needs. I know that Solr is really up and coming, but it still has a "techy" ring to it... or so I thought.

It seems that Solr is on its way to becoming a standard in government land. Faster than I had thought it would be.

Driven by the need for open source software and lower license costs, many organisations are willing to try out Solr. They accept the absence of vendor support (there is no vendor) and are investing in their own knowledge or that of implementation partners.

One reason for this "gamble" was very sensible in my opinion:

The field of search is really evolving. Not only from a technical point of view but also from a vendor and marketing angle.
We choose Solr because it is cheap, it is still being developed very fast, and our need for functionality is simple now. By the time we want more functionality, the product will probably have it. It therefore grows with our needs...
If after 3 years it turns out that we have made the wrong decision, we still haven't thrown away much money. We can then always switch.

I must agree with the choice. Why pay EUR 100.000,00 for a product that promises everything while you are using only part of it?
Of course Solr is free but the implementation will cost you money. But even if you spend EUR 50.000,00 on customising Solr, it still costs only half as much.

Seems like the way to go to me...

Wednesday, 26 August 2009

LexisNexis expanding their services

Being driven into a corner...
LexisNexis is forced to look for other ways to keep up revenue. Users want cheaper solutions and are starting to turn to free content that is often published on the web, instead of paying for it through the subscription services of LexisNexis.

Now LexisNexis offers an integrated search solution that not only searches the LexisNexis content but also has the capability to plug into local sources making it an enterprise search solution.

The new offering integrates Lexis Search Advantage® content and services
accessed through lexis.com with MindServer™ Search, Recommind’s enterprise search
platform. It provides a one-stop destination combining access to documents
and information from both a firm’s internal sources as well as trusted
LexisNexis® content, delivering search results that are more complete, efficient
and actionable.

The search engine is delivered by Recommind.

Friday, 21 August 2009

Enterprise Search – It’s all about the interfaces

Now that search is becoming a commodity – at least the core search engine is – it is possible to focus more on presenting the relevant information in a user-friendly and usable way.

Tools like tagging, rating, recommendations etc. must be incorporated into the interface without cluttering the information.

Companies like Vivisimo and Exalead understand this change, as illustrated by the publications on their websites and by the way they distribute their solutions. They have some pretty neat interfaces that come out of the box.

Autonomy on the other hand still has no usable GUI for their IDOL product, and the interface that the Google GSA comes with is too simple.

The interesting thing about Solr is its open source character. Because the community loves this, they are developing interfaces and frameworks for use with Solr. Examples are Blacklight and VuFind.

Tuesday, 18 August 2009

Book Solr 1.4 Enterprise Search Server is finished

Just received a Tweet from lucene_solr stating that the book about implementing the forthcoming Solr 1.4 as an enterprise search server, is finished. The authors expect the shipping to be within 2 weeks.

I pre-ordered the book several weeks ago and now am awaiting the delivery!!!!

Technorati tags: solr

Monday, 3 August 2009

Fuss about Google search options

Over the past few weeks I have seen so many news and blog articles about the "Search options" that Google has rolled out on their image search application. They added this feature a couple of months ago for their regular web search too.
They are following the example of Bing, which has had this from the start.

I don't see the big news here because "search options" are a standard feature of any enterprise search solution (Autonomy, Solr, Exalead). If you don't have faceted / parametric search then the search solution is useless.
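For readers who have never worked with faceted / parametric search, the idea can be sketched in a few lines. This is only a minimal illustration with a made-up result set; the field names (`type`, `department`) and the helper functions are my own assumptions and not the API of any of the products mentioned above.

```python
from collections import Counter

# Hypothetical result set: each hit is a dict of field values.
hits = [
    {"title": "Annual report 2008", "type": "pdf", "department": "finance"},
    {"title": "Budget memo", "type": "doc", "department": "finance"},
    {"title": "Search roadmap", "type": "pdf", "department": "it"},
]


def facet_counts(hits, field):
    """Count how many hits carry each value of the given field."""
    return Counter(hit[field] for hit in hits if field in hit)


def refine(hits, field, value):
    """Narrow the result set to hits whose field equals the chosen value."""
    return [hit for hit in hits if hit.get(field) == value]


# The counts are what a "search options" sidebar would display;
# clicking an option corresponds to calling refine().
type_facet = facet_counts(hits, "type")
finance_hits = refine(hits, "department", "finance")
```

A real engine computes these counts inside the index instead of looping over the hits, but the user-facing behaviour is the same: counts per value, and one click to narrow down.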

The fact that so many websites see this feature as news has opened my eyes to the fact that so few people are used to real and usable search solutions at work. This strikes me as odd, because how are they able to find the info within their organisation then?

No wonder that the enterprise search / information access market has an estimated value of billions of dollars.

Friday, 31 July 2009

Google and Games?

Stephen Arnold just posted an article that is a write-up of one of his talks about the Google and the search market.

This time the angle is Google as an application and gaming platform.

First, Google is a platform, and it offers a range of software development kits, application programming interfaces, and “sandbox” toys. The idea is that a developer with online basic programming skills can use the Google platform. At the other end of the spectrum, a professional developer or a company focused on game development can create applications that run on the Google platform.

And then something about the way Google operates:

Second, I think it is important to recognize that Google moves in small, incremental steps. The company does this in order to avoid alerting competitors to its broader strategy and to minimize antitrust actions. Nevertheless, you should plan on allocating your time based on how the Google market shapes up. This means that delay in learning how to code for Google is a bad idea. Among the technologies to learn are SketchUp (Google’s drawing program), Android (the visible part of the Google operating system), Wave (collaborative spaces), and Google Apps and OneBox APIs. These functions are, at a minimum, the way in which to obtain the Googley expertise you need.

I think this is another one of Stephen’s brilliant takes on what’s happening at Google’s HQ.

For me it is again proof that the Google is into nearly everything and that’s why it is bound to lose focus and will get more and more (negative) attention.

Technorati tags: google,platform,cloudcomputing,applicationprovider,games

Thursday, 30 July 2009

Unified Search for Firefox

This add-on gives me insight into how search results from Google compare to the results that Bing comes up with.

Unified Search is a Firefox add-on that lets you not only compare Google and Bing, but also search Wikipedia, Delicious and Wolfram.

It seems that every day the relevancy of Bing is getting better. Better, of course, if the results that Google comes up with are the “standard”. But how do I know if the results that Google serves are the best?

I have come to trust Google so much that I no longer question it, which frightens me. Am I so blinded by the Google?

That is exactly why I am using Bing side by side with Google. I don’t want to blindly trust something when I don’t know if there’s something better to trust.

Technorati tags: google,bing

Why is everybody picking on Google

I said it a while ago and now Google is negatively in the news again. The Google is getting too big.

In one of my previous posts I stated that whenever a company or person gets too much influence, it’s only a natural reaction of people to oppose that force.

On the other hand, when a company gets too big, it’s harder to keep focusing on the goals and the primary activities that support those goals.

It’s just a matter of time until the Google breaks up (forced or not) into smaller manageable business parts:

Office productivity (Apps)
Advertising
Enterprise / Internet Search technology
Cloud computing

Does anyone see other business units?

Technorati tags: google,business,public

Wednesday, 29 July 2009

Twitter has evolved into… a search engine

Twitter started as an internet equivalent of a short message service. The concept of being able to follow other people and to be followed by people that want to know what you’re up to has gotten millions of users to start twittering.

Twitter got in the news when media started to realize that real-time breaking stories were being spread through the messages of Twitter users.

Now Twitter has started turning this “real-time news search” into its new “reason for living”.

Of course they have an advantage over Google and Bing in being able to index, enrich and explore their own data.

Also read “It’s Time To Start Thinking Of Twitter As A Search Engine”.

Technorati tags: twitter

Yahoo search coming to an end?

Just read the post on Beyond Search about Yahoo giving search to Bing…

Will this really be the end of Yahoo as the first real Internet search engine?

Luckily Yahoo has some other great offerings like the news portal, e-mail and other productivity tools.

Technorati tags: yahoo,google,microsoft,bing,advertising

Tuesday, 28 July 2009

Relevance problems oversimplified

A few days ago there was an article on CMS Watch about relevance ranking.

Grant Ingersoll of Lucid Imagination reacted to that post:

I couldn’t agree more with Theresa Regli’s excellent discussion of relevance, especially the point to be “skeptical” of why results are the way they are. This is definitely true for search application developers too. The problem is, if you’re using a proprietary vendor, the only thing you can do about such skepticism is bang your head against the wall. For Apache Lucene and Solr, on the other hand, understanding why something scored the way it did is as simple as making an API call (Lucene) or adding &debugQuery (Solr) to your input and you get, in full unadulterated glory every last detail about why a particular document scored a particular way for a given query. Furthermore, if that doesn’t satisfy you, just pop open the source code or ask us or ask on the mailing list!

I must react to the technical oversimplification of this relevancy problem. Of course there are options in several search engines that explain why a document has a high relevance score in relation to the query. Solr / Lucene obviously has excellent functions to show it, but I must say that even Autonomy IDOL displays links, scores, terms and weights that explain this as well.

Lucid must be careful not to react to every case about search and relevancy ranking with the "Open source is open so you can build your own search engine" attitude. Let's be realistic about the fact that implementing Search is neither cheap nor transparent. That's exactly why Lucid entered the business.
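To be fair to Grant, the Solr part of his remark really is a one-parameter change. Below is a rough sketch of what adding `debugQuery` to a request looks like. The base URL is an assumption (a local Solr instance with the default select handler) and `build_debug_query_url` is just a helper name I made up for illustration; the snippet only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumption: a Solr instance reachable at this hypothetical base URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(user_query, rows=10):
    """Build a Solr select URL that asks for score explanations."""
    params = {
        "q": user_query,
        "rows": rows,
        # debugQuery makes Solr add a "debug" section to the response,
        # including a per-document explanation of the score.
        "debugQuery": "true",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_debug_query_url("enterprise search")
```

Getting the explanation is the easy part. Reading it, and deciding what to change in the schema or the boosts as a result, is where the real implementation effort goes.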

Thursday, 9 July 2009

Chrome OS... The Google trap snaps shut

There has been much fuss about the announcement of the Google Chrome OS over the past few days.

I must say that I am intrigued by the concept myself. My first reaction was like "Of course, after the Google Apps, Chrome Browser and Android this is the next step".

Google is revealing itself more and more to be like the other established players such as Microsoft, IBM, Oracle etc. They are actually building a stack of software and platforms that locks in any company that is standardising on their software.

Google is more and more becoming a "normal" IT-company that sells software and tries to bind customers to their solutions.
There are 3 aspects to this issue:

Google is still a company that gets its revenue from advertising
Google is starting to dominate (monopoly??) some markets (advertising)
Google is known for giving away free services

I wonder where this is going... Google is not the David fighting Goliath with small stones anymore. They are a David on steroids trying to get Achilles and Goliath at the same time.

Wednesday, 1 July 2009

Boost for Open Source Search through availability of connectors

Just read the blog post on Beyond Search about the fact that ISYS is letting its connector technology be resold by Lucid Imagination.

According to Eric Gries, CEO of Lucid Imagination, "The combination of ISYS File Readers and Lucene/Solr offers a best-of-breed solution for enterprises that do not want to be held back by lack of functionality and the high costs associated with proprietary search products from infrastructure and platform vendors."

It is clear that Enterprise Search is all about the connectors, as stated in many articles and blog posts earlier. The lack of connectors for Solr/Lucene has, up until now, held back the large-scale adoption of the open source platform.

I think this is very good news.

Tuesday, 2 June 2009

Google announces the 6.0 version of the GSA, Search experts are not impressed

Today Google announced the release of version 6.0 of their Search Appliance.
The main improvements are:

Better scalability: the box can now hold up to 30 million documents and can be connected to more appliances to hold billions of documents.
Flexible architecture: it is easier to connect more appliances.
Social search capabilities: users can add a page to a result.

Some excerpts from other sources:

http://www.itworld.com/software/68712/google-search-appliance-now-can-index-billions-documents

In addition, GSA 6.0 provides end users with social search capabilities, such as the ability for them to add results.

"Google has matured in its thinking about how enterprises implement search," said Rebecca Wettemann, an analyst with Nucleus Research.

Enhancing the Search Appliance in areas like scale, customization and integration will resonate positively in the enterprise market, where Google's main challenge is being perceived primarily as a consumer-focused company, Wettemann said.

While the Search Appliance's functionality isn't still as sophisticated as that of high-end products from Autonomy and Microsoft's Fast Search, it has moved upstream with this latest round of enhancements, said IDC analyst Susan Feldman.

http://www.forbes.com/2009/06/02/google-enterprise-search-technology-cio-network-google.html

Forrester Research analyst Matthew Brown says the product improvements may be as dramatic as Google says they are. But taken in context, Google is still trying to play catch-up to a slew of start-ups and giants like Oracle, he says.
"I'm just kind of unimpressed with what they're announcing," says Brown. "They're coming to market so late, with requirements that were established years and years ago."

Thursday, 28 May 2009

federated search redefined... more like distributed indexing

Through LinkedIn I was pointed to a blog about a new kind of search, or another way that the term "federated search" could be defined.

“The triumph of the distributed Web.” He said the aggregate power of distributed human activity will trump centralized control. His main point was that Google, and other search engines that analyze the Web and links, are much less useful than a (theoretical) search engine that knows not what people have linked to (as Google does), but rather what pages are open on people’s browsers at the moment that people are searching. “All the problems of search would be solved if search relevance was ranked by what browsers were displaying,” he said.

source: http://federatedsearchblog.com/2009/05/27/a-new-paradigm-for-federated-search/

I like the idea but I am confused about the fact that they are calling this "federated search".

In the enterprise search world we define "Federated search" as the distribution of a search action over two or more search environments. The "distributed" engines deliver results and those results are aggregated by the centralized search engine and presented to the user.
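The definition above can be sketched roughly as follows. The two "engines" are hypothetical stand-ins that return canned results, and sorting purely on score is a simplifying assumption: in practice the scores of different engines are not directly comparable and need some form of normalisation before merging.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two separate search environments.
def search_intranet(query):
    return [{"title": "Intranet: " + query + " policy", "score": 0.9, "source": "intranet"}]


def search_document_store(query):
    return [
        {"title": "DMS: " + query + " handbook", "score": 0.7, "source": "dms"},
        {"title": "DMS: " + query + " minutes", "score": 0.4, "source": "dms"},
    ]


def federated_search(query, engines):
    """Send one query to every engine and aggregate the results centrally."""
    # Fan out: each engine handles the same query in its own thread.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    # Aggregate: flatten the per-engine lists and present one ranked list.
    merged = [hit for results in result_lists for hit in results]
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


results = federated_search("travel", [search_intranet, search_document_store])
```

Note that nothing is indexed centrally here: the query travels to the engines at search time, which is exactly the difference with the indexing approach described in the quoted blog.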

I think it would be more appropriate to name the mentioned method of indexing "distributed indexing" or "federated indexing".

Friday, 15 May 2009

Enterprise search will never be able to search the enterprise

http://dominicfallows.co.uk/2009/05/14/it-is-getting-too-clever-by-half/

Users of enterprise search technologies report that the enterprise will never be searchable – there is just too much private (and valuable) information sitting on PCs and in obscure systems that use strange data formats. Service-oriented architectures typically fail to provide services and degrade into expensive mechanisms for providing limited interoperability between systems. And so on.
I am sure that there is some organisation somewhere that lives up to the glossy images we find in supplier brochures – where everything is under control, and senior management look on approvingly as some smart, attractive, thirty-something professionals adjust a few parameters in the business performance management system. It's just that I've never come across such a thing – and neither will you.

I must disagree with the author on this. Of course it is a challenge to make all data in an organization searchable. With the current indexing and search tools like Autonomy or Google, most problems and obscure data formats can be handled. Our experience is that there are very few organizations that have a vision in that field.

Thursday, 14 May 2009

Google add a page to a result

I didn't know that this feature existed.
You can add a page that you missed in the results to the result page of a specific query.

Another brilliant idea from the boys of Google. This way you collect "best results" with the direct input from (active) users.
A similar functionality is the "SearchWiki", which lets you add comments to pages in the results that are returned by a query.

Monday, 23 March 2009

Rediscovery of Yahoo Search

Because of all the default settings in IE (Microsoft Live Search) and Firefox (Google Search) it is easy to forget that there is another good search engine that has been around from the "beginning of the WWW as we know it": Yahoo! search.
The strange thing is that what Microsoft does with IE and Live Search, Mozilla is doing with Firefox and Google. Of course you can change your default search provider, but not everybody knows that or wants to do it.

Because of this blog about the good search suggestions that Yahoo! gives, I started to rediscover Yahoo!.
It has a nice clean interface and the results are much more relevant than those from Live Search. The added "Search assist" is very powerful and fast.

For some time now I have wanted to use something other than the all-seeing Google. Don't get me wrong: I have nothing against Google, I just think it is good to look at something else instead of using something because everyone is using it.

Monday, 16 February 2009

Drawbacks of Flex and Adobe AIR apps with regard to manageability and accessibility

Below is a piece about the drawbacks of a "fat client" solution such as Adobe AIR. Lean and mean web solutions still seem to be preferred because of update problems. I still believe in connecting more closely to the desktop activities of users....

See: http://www.cmswatch.com/Trends/1492-The-case-against-Flex-based-application-UIs?source=RSS

We're starting to see more vendors coming out with Flex-based user interfaces, sometimes extending them as full-blown desktop applications using the AIR runtime. For example, Documentum's D6 Web Publisher comes with a standalone Flex interface for certain tasks.

To me, turning to Flex for a content management interface is a cop-out. It creates nice demoware for the vendor, but long-term problems for you. I can see why Flex is alluring for vendors: maintaining consistent, cross-browser compatibility (especially with AJAX) is hard and expensive. But why does that mean that you the customer must give up the simplicity and supportability of a native browser-based interface?

Flex is essentially another semi-thick client akin to Java applets (or ActiveX controls). Let's review why the business world didn't like applets for application user interfaces when they were pervasive within enterprise web applications earlier this decade.

They almost always violate web accessibility guidelines. Sure, many thin-client application interfaces are not compliant either, but at least you have the opportunity to make them compliant -- and many platforms (e.g., Plone) do so.

They create support nightmares. Things like automatic updates, license-checking, and incompatibilities with the underlying virtual machine lead to many a help-desk call. As my colleague Kas notes, with AIR instead of applets you are just replacing Sun's virtual machine with Adobe's. The potential for trouble remains.

They are prone to performance problems. Flex applications are prone to the same memory leaks and CPU spikes that bedeviled applets for years. To be sure, I've seen some fat and ugly JavaScript-based interfaces too, but at least everyone can debug those openly. We've also had customers tell us that their Flex interfaces are unusually chatty on their networks.

You can't easily modify them. The vendor sets a unified interface and all their customers have to live with it. That might work for a one-dimensional tool, like a Twitter client, but what about more heterogeneous, multidimensional environments?

To my mind, this last problem is really crushing. I recently participated in a Web CMS vendor demo for a consulting client, where the vendor unveiled their Flex-based webpage-builder interface to initial nods of approval. But then we asked the vendor to continue along the scripted scenarios, and the headshaking started becoming more horizontal. The way some elements were laid out didn't make sense to the customer team, and the empty placeholder blocks had confusing iconography and signaling.

Tuesday, 6 January 2009

Learning to search

Mostly we are insulting ordinary users by saying that they are simply using a search engine wrong because they do not specify the right search query for their needs...

Microsoft has developed a program that can help users in getting their search done right. The post on msdn.com on "how well do we use our search engines?" tells the story about a study on getting the search right by playing a game.

Do play the game... it is fun.

Technorati tags: search,usability

Google Apps is getting more and more users

According to this blog post by Google, they have more than 3 million users at educational organizations.

It is not so surprising that Google gets so much attention from this user base. Students nowadays grew up with one company they know best. It is the company that delivers the only internet search engine they know. The company that delivers the e-mail application they use. The company that lets them navigate on their mobile phone. Read the post on Classroom collaboration to see what I mean.

Computing in the clouds is so easy and Google delivers.

Just as a complete generation grew up with the products of Microsoft, a complete generation is now growing up with Google. I see no big difference. It is just a matter of time before Google gets the same negative public attention that Microsoft got. The reason is simple...

When a company gets too "big" or has too much influence, the natural "big brother" reaction of the public takes control....