Monday, June 6, 2011

Text Search Engine

6-June-2011
Given a huge text file or a collection of documents, how would you search for a word in that collection?
Lucene is an open source library that indexes text for efficient information retrieval. Here is the project home page. Lucene works on text, so it can index the words of a document in any format once the text has been extracted, which makes it easy to adopt in web applications. The story behind Lucene is interesting to read. The name Lucene comes from the middle name of the wife of Doug Cutting, the creator of the software.
Lucene was originally implemented in Java, and there are ports to other languages, such as CLucene in C++. Lucene can be used for information retrieval on small sites.
The Qt library includes a port of Lucene to Qt C++ among its third-party (3rdparty) libraries. We can experiment with the Qt code for information retrieval on small documents.
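
To make the idea of "indexing text" concrete, here is a small sketch in Python of an inverted index, the kind of data structure that libraries like Lucene are built around. This is not Lucene's API or its actual implementation. The function names (tokenize, build_index, search) and the sample documents are made up for illustration, and the tokenizer is a deliberately naive assumption that keeps only lowercase letters and digits.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    # Start from the documents matching the first word, then intersect
    # with the matches for each remaining word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample collection, only for demonstration.
documents = {
    "doc1": "Lucene is an open source library",
    "doc2": "CLucene is a C++ port of Lucene",
    "doc3": "Qt is a C++ library",
}
index = build_index(documents)
```

With the index built once, a call such as search(index, "lucene library") only looks up the two words and intersects their document sets, instead of scanning every document again. A real engine adds much more on top of this (ranking, stemming, on-disk storage), so treat the sketch as a starting point for experiments only.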