Difference between revisions of "Full Text Index"

From Gcube Wiki

Revision as of 14:50, 31 May 2007

Introduction

The Full Text Index is responsible for providing quick full text data retrieval capabilities in the DILIGENT environment.

Implementation Overview

Services

The Full Text Index is implemented through three services, all of which follow the Factory pattern:

  • The FullTextIndexManagement Service represents an index manager. There is a one-to-one relationship between an Index and a Management instance, and their life-cycles are closely related: an Index is created by creating an instance (resource) of the FullTextIndexManagement Service, and an Index is removed by terminating the corresponding FullTextIndexManagement resource. The FullTextIndexManagement Service should be seen as an interface for managing the life-cycle and properties of an Index, but it is not responsible for feeding or querying its Index. In addition, a FullTextIndexManagement Service resource does not store the content of its Index locally, but contains references to content stored in the Content Management Service.
  • The FullTextIndexBatchUpdater Service is responsible for feeding an Index. One FullTextIndexBatchUpdater Service resource can only update a single Index, but one Index can be updated by multiple FullTextIndexBatchUpdater Service resources. Feeding is accomplished by instantiating a FullTextIndexBatchUpdater Service resource with the EPR of the FullTextIndexManagement resource connected to the Index to update, and connecting the updater resource to a ResultSet containing the content to be fed to the Index.
  • The FullTextIndexLookup Service is responsible for creating a local copy of an Index, and exposing interfaces for querying the Index and creating statistics for it. One FullTextIndexLookup Service resource can only replicate and look up a single Index, but one Index can be replicated by any number of FullTextIndexLookup Service resources. Updates to the Index are propagated to all FullTextIndexLookup Service resources replicating that Index.
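
The cardinalities described in the list above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names, method names and the substring-based query are hypothetical and are not the actual service interfaces, and plain Python objects stand in for the WebService resources and EPRs.

```python
# Hypothetical in-memory model of the three roles; NOT the real service API.

class IndexManagement:
    """Stands in for one FullTextIndexManagement resource (one per Index)."""

    def __init__(self, index_id):
        self.index_id = index_id
        self.content_refs = []  # references only; the content itself lives elsewhere
        self.lookups = []       # replicas that must receive every update


class BatchUpdater:
    """Stands in for one FullTextIndexBatchUpdater resource."""

    def __init__(self, management):
        # An updater is bound to exactly one Index for its whole life.
        self.management = management

    def feed(self, rows):
        """Feed (doc_id, text) pairs and propagate them to all replicas."""
        for doc_id, text in rows:
            self.management.content_refs.append(doc_id)
            for lookup in self.management.lookups:
                lookup.local_copy[doc_id] = text


class Lookup:
    """Stands in for one FullTextIndexLookup resource (a local replica)."""

    def __init__(self, management):
        self.local_copy = {}
        management.lookups.append(self)  # register to receive updates

    def query(self, term):
        """Naive case-insensitive substring match, for illustration only."""
        needle = term.lower()
        return sorted(
            doc_id
            for doc_id, text in self.local_copy.items()
            if needle in text.lower()
        )
```

In this model, several BatchUpdater and Lookup objects can share one IndexManagement object, while each of them refers to exactly one Index, which mirrors the relationships described above.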

It is important to note that the three services do not have to reside on the same server; they are only connected through WebService calls and the DILIGENT CMS. The following illustration shows the information flow and responsibilities for the different services used to implement the Full Text Index:

(illustration will be improved shortly...)

			 ________________________________
			|				 |
			|				 |
			|				 |
			|				 |
			|    So Pretty Index Design...   |
			|				 |
			|				 |
			|				 |
			|________________________________|

RowSet

The content to be fed into an Index must be served as a ResultSet (ResultSet Framework) containing XML documents conforming to the ROWSET schema. This is a very simple schema, declaring that a document (ROW element) should contain any number of FIELD elements, each with a name attribute and the text to be indexed for that field. The following is a simple but valid ROWSET containing two documents:

<ROWSET>
    <ROW id="doc1">
        <FIELD name="title">How to create an Index</FIELD>
        <FIELD name="contents">Just read the WIKI</FIELD>
    </ROW>
    <ROW id="doc2">
        <FIELD name="title">How to create a Nation</FIELD>
        <FIELD name="contents">Talk to the UN</FIELD>
        <FIELD name="references">un.org</FIELD>
    </ROW>
</ROWSET>
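
A ROWSET like the one above can be produced and read back with any XML library. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function names build_rowset and parse_rowset are hypothetical helpers introduced here for illustration, and wrapping the result in a ResultSet is not covered.

```python
import xml.etree.ElementTree as ET


def build_rowset(documents):
    """Build a ROWSET string.

    documents maps a document id to a list of (field_name, text) pairs.
    """
    rowset = ET.Element("ROWSET")
    for doc_id, fields in documents.items():
        row = ET.SubElement(rowset, "ROW", {"id": doc_id})
        for name, text in fields:
            field = ET.SubElement(row, "FIELD", {"name": name})
            field.text = text
    return ET.tostring(rowset, encoding="unicode")


def parse_rowset(xml_text):
    """Parse a ROWSET string back into {doc_id: [(field_name, text), ...]}."""
    root = ET.fromstring(xml_text)
    if root.tag != "ROWSET":
        raise ValueError("root element must be ROWSET")
    result = {}
    for row in root.findall("ROW"):
        fields = []
        for field in row.findall("FIELD"):
            name = field.get("name")
            if name is None:
                raise ValueError("FIELD element is missing its name attribute")
            fields.append((name, field.text or ""))
        result[row.get("id")] = fields
    return result


if __name__ == "__main__":
    docs = {
        "doc1": [("title", "How to create an Index"),
                 ("contents", "Just read the WIKI")],
        "doc2": [("title", "How to create a Nation"),
                 ("contents", "Talk to the UN"),
                 ("references", "un.org")],
    }
    print(build_rowset(docs))
```

Building the document through an XML library rather than string concatenation ensures that special characters in field text, such as & or <, are escaped correctly.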