Geographical - Spatial Index

From Gcube Wiki
Revision as of 13:22, 8 June 2007 by Msibeko (Talk | contribs) (RowSet)

Services

The geo index is implemented through three services, in the same manner as the full text index. They are all implemented according to the Factory pattern:

  • The GeoIndexManagement Service represents an index manager. There is a one-to-one relationship between an Index and a Management instance, and their life-cycles are closely related: an Index is created by creating an instance (resource) of the GeoIndexManagement Service, and an Index is removed by terminating the corresponding GeoIndexManagement resource. The GeoIndexManagement Service should be seen as an interface for managing the life-cycle and properties of an Index, but it is not responsible for feeding or querying its Index. In addition, a GeoIndexManagement Service resource does not store the content of its Index locally, but contains references to content stored in the Content Management Service.
  • The GeoIndexBatchUpdater Service is responsible for feeding an Index. One GeoIndexBatchUpdater Service resource can only update a single Index, but one Index can be updated by multiple GeoIndexBatchUpdater Service resources. Feeding is accomplished by instantiating a GeoIndexBatchUpdater Service resource with the EPR of the GeoIndexManagement resource connected to the Index to update, and connecting the updater resource to a ResultSet containing the content to be fed to the Index.
  • The GeoIndexLookup Service is responsible for creating a local copy of an Index, and for exposing interfaces for querying the Index and creating statistics for it. One GeoIndexLookup Service resource can only replicate and look up a single Index, but one Index can be replicated by any number of GeoIndexLookup Service resources. Updates to the Index will be propagated to all GeoIndexLookup Service resources replicating that Index.
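
The relationships between the three services can be illustrated with a toy in-memory model. This is only a sketch of the cardinalities and the update propagation described above, not the gCube API: the class and method names are hypothetical, plain object references stand in for EPRs, and a Python list stands in for a ResultSet.

```python
class GeoIndexManagement:
    """Hypothetical stand-in for one management resource (exactly one per Index)."""

    def __init__(self, index_id):
        self.index_id = index_id
        self.content_refs = []  # references to content, not the content itself
        self.lookups = []       # lookup resources replicating this Index
        self.terminated = False

    def terminate(self):
        # Terminating the management resource removes the Index.
        self.terminated = True


class GeoIndexBatchUpdater:
    """Feeds exactly one Index; several updaters may share the same Index."""

    def __init__(self, management):
        self.management = management  # stands in for the management EPR

    def feed(self, result_set):
        if self.management.terminated:
            raise RuntimeError("the Index has been removed")
        refs = list(result_set)
        self.management.content_refs.extend(refs)
        # Updates are propagated to every lookup replicating the Index.
        for lookup in self.management.lookups:
            lookup.replica.extend(refs)


class GeoIndexLookup:
    """Keeps a local copy of exactly one Index and answers queries from it."""

    def __init__(self, management):
        self.management = management
        self.replica = list(management.content_refs)  # initial local copy
        management.lookups.append(self)


management = GeoIndexManagement("geo-index-1")
updater_a = GeoIndexBatchUpdater(management)
updater_b = GeoIndexBatchUpdater(management)
early_lookup = GeoIndexLookup(management)
updater_a.feed(["row-1"])
updater_b.feed(["row-2"])
late_lookup = GeoIndexLookup(management)
```

In this model both lookups end up with the same replica, regardless of whether they were created before or after the updates were fed.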

It is important to note that none of the three services have to reside on the same node; they are only connected through WebService calls and the DILIGENT CMS. The following illustration shows the information flow and responsibilities for the different services used to implement the Geo Index:

(Illustration placeholder: the diagram of the information flow between the three services will be improved shortly.)

RowSet

The content to be fed into a Geo Index must be served as a ResultSet containing XML documents conforming to the GeoROWSET schema. This is a very simple schema, declaring that an object (ROW element) should contain an id, start and end X coordinates (x1 is mandatory; x2 is set equal to x1 if not provided) and start and end Y coordinates (y1 is mandatory; y2 is set equal to y1 if not provided). In addition, a ROW may contain any number of FIELD elements, each with a name attribute and a value to be stored and possibly used to refine a query. As opposed to the ROWSETs used for full text indices, all rows in a GeoROWSET must contain all fields specified in the IndexType. The following is a simple but valid GeoROWSET containing two objects:

<ROWSET>
    <ROW id="doc1" x1="4321" y1="1234">
        <FIELD name="StartTime">2001-05-27T14:35:25.523</FIELD>
        <FIELD name="EndTime">2001-05-27T14:38:03.764</FIELD>
    </ROW>
    <ROW id="doc2" x1="1337" x2="4123" y1="1337" y2="6534">
        <FIELD name="StartTime">2001-06-27</FIELD>
        <FIELD name="EndTime">2001-07-27</FIELD>
    </ROW>
</ROWSET>
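
The defaulting rule for x2 and y2 can be sketched with a small reader. This is a minimal illustration and not part of the Geo Index code base: the function name `parse_georowset` is hypothetical, and parsing the coordinates as floats is an assumption, since the numeric type of the coordinates is not stated here.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the GeoROWSET above; the first ROW omits x2 and y2.
SAMPLE = """<ROWSET>
    <ROW id="doc1" x1="4321" y1="1234">
        <FIELD name="StartTime">2001-05-27T14:35:25.523</FIELD>
    </ROW>
    <ROW id="doc2" x1="1337" x2="4123" y1="1337" y2="6534">
        <FIELD name="StartTime">2001-06-27</FIELD>
    </ROW>
</ROWSET>"""


def parse_georowset(xml_text):
    """Return one dict per ROW, filling in x2 and y2 when they are omitted."""
    rows = []
    for row in ET.fromstring(xml_text).findall("ROW"):
        row_id, x1, y1 = row.get("id"), row.get("x1"), row.get("y1")
        if row_id is None or x1 is None or y1 is None:
            raise ValueError("a ROW requires id, x1 and y1")
        rows.append({
            "id": row_id,
            "x1": float(x1),
            "x2": float(row.get("x2", x1)),  # x2 defaults to x1
            "y1": float(y1),
            "y2": float(row.get("y2", y1)),  # y2 defaults to y1
            "fields": {f.get("name"): f.text or "" for f in row.findall("FIELD")},
        })
    return rows


rows = parse_georowset(SAMPLE)
```

With this reading, a ROW that gives only x1 and y1 describes a single point, while a ROW that gives all four coordinates describes a bounding box.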

GeoIndexType

Which fields should be present in the RowSet, and how these fields are to be handled by the Geo Index, is specified through a GeoIndexType: an XML document conforming to the GeoIndexType schema. A GeoIndexType contains a field list naming all the fields that should be stored in order to be presented in the query results or used for refinement. The following is a possible IndexType for the type of ROWSET shown above:

    <index-type>
        <field-list>
            <field name="StartTime">
                <type>date</type>
                <return>yes</return>
            </field>
            <field name="EndTime">
                <type>date</type>
                <return>yes</return>
            </field>
        </field-list>
    </index-type>
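
For illustration, an IndexType document like the one above could be read into a simple lookup table as follows. This is a sketch only: `parse_index_type` is a hypothetical helper, and upper-casing the type name is an assumption made here because the case sensitivity of the type values is not specified.

```python
import xml.etree.ElementTree as ET

INDEX_TYPE = """<index-type>
    <field-list>
        <field name="StartTime">
            <type>date</type>
            <return>yes</return>
        </field>
        <field name="EndTime">
            <type>date</type>
            <return>yes</return>
        </field>
    </field-list>
</index-type>"""


def parse_index_type(xml_text):
    """Map each field name to its declared type and its return flag."""
    fields = {}
    for field in ET.fromstring(xml_text).iter("field"):
        returned = field.findtext("return", "").strip()
        if returned not in ("yes", "no"):
            raise ValueError("return must be 'yes' or 'no'")
        fields[field.get("name")] = {
            # Assumption: type names are treated as case-insensitive.
            "type": field.findtext("type", "").strip().upper(),
            "return": returned == "yes",
        }
    return fields


index_type = parse_index_type(INDEX_TYPE)
```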

Fields present in the ROWSET but not in the IndexType will be skipped. Fields present in the IndexType but not in a ROW in the ROWSET will cause an exception. The two elements under each "field" element define how that field should be handled. The meaning and expected content of each of them is explained below:

  • type specifies the data type of the field. Accepted values are:
    • SHORT - A number fitting into a Java "short"
    • INT - A number fitting into a Java "int"
    • LONG - A number fitting into a Java "long"
    • DATE - A date in the format yyyy-MM-dd'T'HH:mm:ss.s where only yyyy is mandatory
    • FLOAT - A decimal number fitting into a Java "float"
    • DOUBLE - A decimal number fitting into a Java "double"
    • STRING - A string with a maximum length of approximately 40 characters
  • return specifies whether the field should be returned in the results from a query. "yes" and "no" are the only accepted values.
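
The field rules above can be sketched as a small validator. It is an illustration under stated assumptions, not the Geo Index implementation: the helper names are hypothetical, the string limit is taken as exactly 40, and the date pattern assumes that the optional parts of a date can only be dropped from the right (so "2001" and "2001-06-27" are accepted, but a time without a date is not).

```python
import math
import re

INT_BITS = {"SHORT": 16, "INT": 32, "LONG": 64}  # widths of the Java integer types
MAX_JAVA_FLOAT = 3.4028235e38                    # largest finite Java float
MAX_STRING_LENGTH = 40                           # assumption: the limit is exactly 40
# Assumption: only the year is mandatory and later parts are dropped from the right.
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2}(\.\d+)?)?)?)?)?)?")


def value_fits(type_name, text):
    """Return True if the text is acceptable for the given field type."""
    type_name = type_name.upper()
    if type_name in INT_BITS:
        try:
            value = int(text)
        except ValueError:
            return False
        bits = INT_BITS[type_name]
        return -2 ** (bits - 1) <= value < 2 ** (bits - 1)
    if type_name in ("FLOAT", "DOUBLE"):
        try:
            value = float(text)
        except ValueError:
            return False
        if not math.isfinite(value):
            return False
        return type_name == "DOUBLE" or abs(value) <= MAX_JAVA_FLOAT
    if type_name == "DATE":
        return DATE_RE.fullmatch(text) is not None
    if type_name == "STRING":
        return len(text) <= MAX_STRING_LENGTH
    raise ValueError(f"unknown field type: {type_name}")


def select_fields(row_fields, index_type):
    """Keep only the fields named in the IndexType; fail on missing or bad ones."""
    selected = {}
    for name, type_name in index_type.items():
        if name not in row_fields:
            raise KeyError(f"ROW is missing field {name!r}")
        if not value_fits(type_name, row_fields[name]):
            raise ValueError(f"field {name!r} is not a valid {type_name}")
        selected[name] = row_fields[name]
    return selected  # fields not named in the IndexType are skipped
```

For example, `select_fields({"StartTime": "2001-06-27", "Note": "x"}, {"StartTime": "DATE"})` keeps only StartTime, while an empty row checked against the same IndexType raises an exception.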