Difference between revisions of "Geographical - Spatial Index"

From Gcube Wiki
Jump to: navigation, search
(Plugin Framework)
(Creating a Rank Evaluator)
Line 78: Line 78:
  
 
====Creating a Rank Evaluator====
 
====Creating a Rank Evaluator====
 +
A RankEvaluator plugin has to extend the abstract class org.diligentproject.indexservice.geo.ranking.RankEvaluator which contains three abstract methods:
 +
 +
*abstract public boolean isIndexTypeCompatible(GeoIndexType indexType) -- should be able to determine whether this plugin can be used by an index conforming to the GeoIndexType argument
 +
*abstract public double rank(Object entry) -- the method that calculates the rank of an entry.
 +
*abstract public void setArguments(String args[]) -- a method called during the initiation of the RankEvaluator plugin, providing the plugin with any arguments provided in the code. All arguments are given as Strings, and it's up to the plugin to parse the string into the datatype needed by the plugin. (will be handled by initialize() in the future)
 +
 +
In addition, the RankEvaluator abstract class implements 3 methods
 +
*public void setQuery(Polygon q) -- is called during the initiation of the RankEvaluator plugin, and stores the query polygon and bounding box. (will be handled by initialize() in the future)
 +
*public void setGeoIndexType(GeoIndexType indexType) -- is called during the initiation of the RankEvaluator plugin and stores the GeoIndexType. (will be handled by initialize() in the future)
 +
*protected Object getDataField(String field, Data data) -- a class used to retrieve a the contents of a specific GeoIndexType field from a org.geotools.index.Data object conforming to the GeoIndexType used by the plugin.
 +
 +
 +
Ok, simple enough... So let's create a RankEvaluator plugin. We'll assume that for a certain use case, entries which span over a long period of time are of less interest than objects with span over a short period of time. Since we're dealing with TimeSpans, we'll assume that the data stored in the index will have a "StartTime" field and an "EndTime" field, in accordance with the [[Geographical/Spatial Index#GeoIndexType|GeoIndexType]] shown earlier.
 +
 +
 +
 
(partial sorting allows for heavy evaluators)
 
(partial sorting allows for heavy evaluators)
 
SpanSizeRanker
 
SpanSizeRanker

Revision as of 13:18, 19 June 2007

Services

The geo index is implemented through three services, in the same manner as the full text index. They are all implemented according to the Factory pattern:

  • The GeoIndexManagement Service represents an index manager. There is a one to one relationship between an Index and a Management instance, and their life-cycles are closely related; an Index is created by creating an instance (resource) of GeoIndexManagement Service, and an index is removed by terminating the corresponding GeoIndexManagement resource. The GeoIndexManagement Service should be seen as an interface for managing the life-cycle and properties of an Index, but it is not responsible for feeding or querying its index. In addition, a GeoIndexManagement Service resource does not store the content of its Index locally, but contains references to content stored in Content Management Service.
  • The GeoIndexBatchUpdater Service is responsible for feeding an Index. One GeoIndexBatchUpdater Service resource can only update a single Index, but one Index can be updated by multiple GeoIndexBatchUpdater Service resources. Feeding is accomplished by instantiating a GeoIndexBatchUpdater Service resources with the EPR of the GeoIndexManagement resource connected to the Index to update, and connecting the updater resource to a ResultSet containing the content to be fed to the Index.
  • The GeoIndexLookup Service is responsible for creating a local copy of an index, and exposing interfaces for querying and creating statistics for the index. One GeoIndexLookup Service resource can only replicate and lookup a single instance, but one Index can be replicated by any number of GeoIndexLookup Service resources. Updates to the Index will be propagated to all GeoIndexLookup Service resources replicating that Index.

It is important to note that none of the three services have to reside on the same node; they are only connected through WebService calls and the DILIGENT CMS. The following illustration shows the information flow and responsibilities for the different services used to implement the Geo Index:

(illustration will be improved shortly... )

			 ________________________________
			|				 |
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|    So Pretty Index Design...   |
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘•∘|
			|________________________________|

RowSet

The content to be fed into a Geo Index, must be served as a ResultSet containing XML documents conforming to the GeoROWSET schema. This is a very simple schema, declaring that an object (ROW element) should containan id, start and end X coordinates (x1-mandatory and x2-set to equal x1 if not provided) as well as start and end Y coordinates (y1-mandatory and y2-set to equal y1 if not provided). In addition, and of any number of FIELD elements containing a name attribute and information to be stored and perhaps used for refinement of a query or ranking of results. As opposed to the ROWSETs used for fulltext indices, all rows in a GeoROWSET must contain all fields specified in the IndexType. The following is a simple but valid GeoROWSET containing two objects:

<ROWSET>
    <ROW id="doc1" x1="4321" y1="1234">
        <FIELD name="StartTime">2001-05-27T14:35:25.523</FIELD>
        <FIELD name="EndTime">2001-05-27T14:38:03.764</FIELD>
    </ROW>
    <ROW id="doc1" x1="1337" x2="4123" y1="1337" y2="6534">
        <FIELD name="StartTime">2001-06-27</FIELD>
        <FIELD name="EndTime">2001-07-27</FIELD>
    </ROW>
</ROWSET>

GeoIndexType

Which fields should be present in the RowSet, and how these fields are to be handled by the Geo Index is specified through a GeoIndexType; an XML document conforming to the GeoIndexType schema. Which GeoIndexType to use for a specific GeoIndex instance, is specified by supplying a GeoIndexType ID during initialization of the GeoIndexManagement resource. A GeoIndexType contains a field list which contains all the fields which should be stored in order to be presented in the query results or used for refinement. The following is a possible IndexType for the type of ROWSET shown above:

    <index-type>
        <field-list>
            <field name="StartTime">
                <type>date</type>
                <return>yes</return>
            </field>
            <field name="EndTime">
                <type>date</type>
                <return>yes</return>
            </field>
        </field-list>
    </index-type>

Fields present in the ROWSET but not in the IndexType will be skipped. Fields present in the IndexType but not in a ROW in the ROWSET will cause an exception. The two elements under each "field" element are used to define that field should be handled. The meaning and expected content of each of them is explained bellow:

  • type specifies the data type of the field. Accepted values are:
    • SHORT - A number fitting into a Java "short"
    • INT - A number fitting into a Java "short"
    • LONG - A number fitting into a Java "short"
    • DATE - A date in the format yyyy-MM-dd'T'HH:mm:ss.s where only yyyy is mandatory
    • FLOAT - A decimal number fitting into a Java "float"
    • DOUBLE - A decimal number fitting into a Java "double"
    • STRING - A string with a maximum length of 40 (or so...)
  • return specifies whether the field should be returned in the results from a query. "yes" and "no" are the only accepted values.

Plugin Framework

As explained in the GeoIndexType section, which fields a GeoIndex instance should contain can be dynamically specified through a GeoIndexType provided during GeoIndexManagement initialization. However, since new GeoIndexTypes can be added at any time with any number of new fields, there is no way for the GeoIndex itself to know how to use the information in such fields in any meaningful manner when processing a query; a static generic algorithm for processing such information would drastically limit the usefulness of the information. In order to allow for dynamic introduction of field evaluation algorithms capable of handling the dynamic nature of IndexTypes, a plugin framework was introduced. The framework allows for the creation of GeoIndexType-specific evaluators handling ranking and refinement.

DIS plugin information...

Ranking

The results of a query are sorted according to their rank, and their ranks are also returned to the caller. A RankEvaluator plugin is used to determine the rank of objects. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to return a meaningful rank of each object.

Refinement

The GeoIndex uses TwoStep processing in order to process a query. Firstly, a very efficient filtering step will all possible hits (along with some false hits) using the minimal bouning rectangle (mbr) of the query region. Then, a more costly refinement step will use additional object and query information in order to eliminate all the false hits. While the filtering step is handled internally in the index, the refinement step is handled by a refiner plugin. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to determine whether an object is whithin a query or not.

Creating a Rank Evaluator

A RankEvaluator plugin has to extend the abstract class org.diligentproject.indexservice.geo.ranking.RankEvaluator which contains three abstract methods:

  • abstract public boolean isIndexTypeCompatible(GeoIndexType indexType) -- should be able to determine whether this plugin can be used by an index conforming to the GeoIndexType argument
  • abstract public double rank(Object entry) -- the method that calculates the rank of an entry.
  • abstract public void setArguments(String args[]) -- a method called during the initiation of the RankEvaluator plugin, providing the plugin with any arguments provided in the code. All arguments are given as Strings, and it's up to the plugin to parse the string into the datatype needed by the plugin. (will be handled by initialize() in the future)

In addition, the RankEvaluator abstract class implements 3 methods

  • public void setQuery(Polygon q) -- is called during the initiation of the RankEvaluator plugin, and stores the query polygon and bounding box. (will be handled by initialize() in the future)
  • public void setGeoIndexType(GeoIndexType indexType) -- is called during the initiation of the RankEvaluator plugin and stores the GeoIndexType. (will be handled by initialize() in the future)
  • protected Object getDataField(String field, Data data) -- a class used to retrieve a the contents of a specific GeoIndexType field from a org.geotools.index.Data object conforming to the GeoIndexType used by the plugin.


Ok, simple enough... So let's create a RankEvaluator plugin. We'll assume that for a certain use case, entries which span over a long period of time are of less interest than objects with span over a short period of time. Since we're dealing with TimeSpans, we'll assume that the data stored in the index will have a "StartTime" field and an "EndTime" field, in accordance with the GeoIndexType shown earlier.


(partial sorting allows for heavy evaluators) SpanSizeRanker

Creating a Refiner

SpanSizeRefiner

Packaging plugins

Will be filled out shortly

loading of plugins

Query language

A query is specified through a SearchPolygon object, containing the points of the vertices of the query region, an optional RankingRequest object and an optional list of RefinementRequest objects. A RankingRequest object contains the String ID of the RankEvaluator to use, along with an optional String array of arguments to be used by the specified RankEvaluator. Similarly, the RefinementRequest contains the String ID of the Refiner to use, along with an optional String array of arguments to be used by the specified Refiner

+ how to specify a rectangle

Dependencies

Will be filled out shortly

Usage Example

Create a Management Resource

Create a Updater Resource and start feeding

Create a Lookup resource and perform a query