Sunday, December 4, 2011

Terminologies used in "Full Text Search"

Recently my friend Sonali and I completed a white paper on full text search engines, with a detailed case study of SQL Server 2008 full text search and Lucene.net.

The important thing in understanding full text search is being familiar with the terminology used in the process. In the paragraphs below I briefly explain what full text search is, and then cover the various terms used in the process.
This information is collected from various websites, including MSDN and the official Apache Lucene.net website.

Full Text Search
In text retrieval, full text search refers to a technique for searching a computer-stored document or database. In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.
Full text search is often divided into two tasks: indexing and searching.
The indexing stage scans the text of all the documents and builds a list of search terms, often called an index.
When a query arrives, either programmatically or as a result of a user request, the full-text engine accesses the sorted and optimized word index to identify which documents contain the requested term(s). The engine creates a list of documents that qualify, typically provided as a list of pointers into the main document index.
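To make the two tasks concrete, here is a minimal Python sketch of indexing and searching. It is only an illustration under my own assumptions, not how SQL Server or Lucene.net work internally: the function names (tokenize, build_index, search), the sample documents, and the rule that every query term must match are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase the text and split it into runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    # Indexing stage: map each term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Searching stage: return ids of documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
documents = {
    1: "Full text search examines every word in a document.",
    2: "An index maps each search term to matching documents.",
    3: "Lucene.net builds a full text index.",
}

index = build_index(documents)
print(search(index, "full text"))  # [1, 3]
print(search(index, "index"))      # [2, 3]
```

The list returned by search plays the role of the "list of pointers" mentioned above: it holds document ids, not the documents themselves.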

General terms used in full text search engines...