Difference between revisions of "Geographical - Spatial Index"

From Gcube Wiki
Jump to: navigation, search
(Packaging plugins)
(Query language)
Line 86: Line 86:
  
 
===Query language===
 
===Query language===
 +
A query is specified through a SearchPolygon object, containing the points of the vertices of the query region, an optional RankingRequest object and an optional list of RefinementRequest objects. A RankingRequest object contains the String ID of the RankEvaluator to use, along with an optional String array of arguments to be used by the specified RankEvaluator. Similarly, the RefinementRequest contains the String ID of the Refiner to use, along with an optional String array of arguments to be used by the specified Refiner
 +
 +
+ how to specify a rectangle
  
 
==Dependencies==
 
==Dependencies==

Revision as of 15:08, 8 June 2007

Services

The geo index is implemented through three services, in the same manner as the full text index. They are all implemented according to the Factory pattern:

  • The GeoIndexManagement Service represents an index manager. There is a one to one relationship between an Index and a Management instance, and their life-cycles are closely related; an Index is created by creating an instance (resource) of GeoIndexManagement Service, and an index is removed by terminating the corresponding GeoIndexManagement resource. The GeoIndexManagement Service should be seen as an interface for managing the life-cycle and properties of an Index, but it is not responsible for feeding or querying its index. In addition, a GeoIndexManagement Service resource does not store the content of its Index locally, but contains references to content stored in Content Management Service.
  • The GeoIndexBatchUpdater Service is responsible for feeding an Index. One GeoIndexBatchUpdater Service resource can only update a single Index, but one Index can be updated by multiple GeoIndexBatchUpdater Service resources. Feeding is accomplished by instantiating a GeoIndexBatchUpdater Service resources with the EPR of the GeoIndexManagement resource connected to the Index to update, and connecting the updater resource to a ResultSet containing the content to be fed to the Index.
  • The GeoIndexLookup Service is responsible for creating a local copy of an index, and exposing interfaces for querying and creating statistics for the index. One GeoIndexLookup Service resource can only replicate and lookup a single instance, but one Index can be replicated by any number of GeoIndexLookup Service resources. Updates to the Index will be propagated to all GeoIndexLookup Service resources replicating that Index.

It is important to note that none of the three services have to reside on the same node; they are only connected through WebService calls and the DILIGENT CMS. The following illustration shows the information flow and responsibilities for the different services used to implement the Geo Index:

(illustration will be improved shortly... )

			 ________________________________
			|				 |
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|    So Pretty Index Design...   |
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|________________________________|

RowSet

The content to be fed into a Geo Index, must be served as a ResultSet containing XML documents conforming to the GeoROWSET schema. This is a very simple schema, declaring that an object (ROW element) should containan id, start and end X coordinates (x1-mandatory and x2-set to equal x1 if not provided) as well as start and end Y coordinates (y1-mandatory and y2-set to equal y1 if not provided). In addition, and of any number of FIELD elements containing a name attribute and information to be stored and perhaps used for refinement of a query or ranking of results. As opposed to the ROWSETs used for fulltext indices, all rows in a GeoROWSET must contain all fields specified in the IndexType. The following is a simple but valid GeoROWSET containing two objects:

<ROWSET>
    <ROW id="doc1" x1="4321" y1="1234">
        <FIELD name="StartTime">2001-05-27T14:35:25.523</FIELD>
        <FIELD name="EndTime">2001-05-27T14:38:03.764</FIELD>
    </ROW>
    <ROW id="doc1" x1="1337" x2="4123" y1="1337" y2="6534">
        <FIELD name="StartTime">2001-06-27</FIELD>
        <FIELD name="EndTime">2001-07-27</FIELD>
    </ROW>
</ROWSET>

GeoIndexType

Which fields should be present in the RowSet, and how these fields are to be handled by the Geo Index is specified through a GeoIndexType; an XML document conforming to the GeoIndexType schema. Which GeoIndexType to use for a specific GeoIndex instance, is specified by supplying a GeoIndexType ID during initialization of the GeoIndexManagement resource. A GeoIndexType contains a field list which contains all the fields which should be stored in order to be presented in the query results or used for refinement. The following is a possible IndexType for the type of ROWSET shown above:

    <index-type>
        <field-list>
            <field name="StartTime">
                <type>date</type>
                <return>yes</return>
            </field>
            <field name="EndTime">
                <type>date</type>
                <return>yes</return>
            </field>
        </field-list>
    </index-type>

Fields present in the ROWSET but not in the IndexType will be skipped. Fields present in the IndexType but not in a ROW in the ROWSET will cause an exception. The two elements under each "field" element are used to define that field should be handled. The meaning and expected content of each of them is explained bellow:

  • type specifies the data type of the field. Accepted values are:
    • SHORT - A number fitting into a Java "short"
    • INT - A number fitting into a Java "short"
    • LONG - A number fitting into a Java "short"
    • DATE - A date in the format yyyy-MM-dd'T'HH:mm:ss.s where only yyyy is mandatory
    • FLOAT - A decimal number fitting into a Java "float"
    • DOUBLE - A decimal number fitting into a Java "double"
    • STRING - A string with a maximum length of 40 (or so...)
  • return specifies whether the field should be returned in the results from a query. "yes" and "no" are the only accepted values.

Plugin Framework

As explained in the GeoIndexType section, which fields a GeoIndex instance should contain can be dynamically specified through a GeoIndexType provided during GeoIndexManagement initialization. However, since new GeoIndexTypes can be added at any time with any number of new fields, there is no way for the GeoIndex itself to know how to use the information in such fields in any meaningful manner when processing a query; a static generic algorithm for processing such information would drastically limit the usefulness of the information. In order to allow for dynamic introduction of field evaluation algorithms capable of handling the dynamic nature of IndexTypes, a plugin framework was introduced. The framework allows for the creation of GeoIndexType-specific evaluators handling ranking and refinement.

Ranking

The results of a query are sorted according to their rank, and their ranks are also returned to the caller. A RankEvaluator plugin is used to determine the rank of objects. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to return a meaningful rank of each object.

Refinement

The GeoIndex uses TwoStep processing in order to process a query. Firstly, a very efficient filtering step will all possible hits (along with some false hits) using the minimal bouning rectangle (mbr) of the query region. Then, a more costly refinement step will use additional object and query information in order to eliminate all the false hits. While the filtering step is handled internally in the index, the refinement step is handled by a refiner plugin. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to determine whether an object is whithin a query or not.

Creating a Rank Evaluator

(partial sorting allows for heavy evaluators) SpanSizeRanker

Creating a Refiner

SpanSizeRefiner

Packaging plugins

Will be filled out shortly

Query language

A query is specified through a SearchPolygon object, containing the points of the vertices of the query region, an optional RankingRequest object and an optional list of RefinementRequest objects. A RankingRequest object contains the String ID of the RankEvaluator to use, along with an optional String array of arguments to be used by the specified RankEvaluator. Similarly, the RefinementRequest contains the String ID of the Refiner to use, along with an optional String array of arguments to be used by the specified Refiner

+ how to specify a rectangle

Dependencies

Will be filled out shortly

Usage Example

Create a Management Resource

Create a Updater Resource and start feeding

Create a Lookup resource and perform a query