Geographical - Spatial Index

From Gcube Wiki
Revision as of 20:22, 27 November 2007 by Msibeko (Talk | contribs) (Create a Management Resource)


Services

The geo index is implemented through three services, in the same manner as the full text index. They are all implemented according to the Factory pattern:

  • The GeoIndexManagement Service represents an index manager. There is a one to one relationship between an Index and a Management instance, and their life-cycles are closely related; an Index is created by creating an instance (resource) of GeoIndexManagement Service, and an index is removed by terminating the corresponding GeoIndexManagement resource. The GeoIndexManagement Service should be seen as an interface for managing the life-cycle and properties of an Index, but it is not responsible for feeding or querying its index. In addition, a GeoIndexManagement Service resource does not store the content of its Index locally, but contains references to content stored in Content Management Service.
  • The GeoIndexBatchUpdater Service is responsible for feeding an Index. One GeoIndexBatchUpdater Service resource can only update a single Index, but one Index can be updated by multiple GeoIndexBatchUpdater Service resources. Feeding is accomplished by instantiating a GeoIndexBatchUpdater Service resource with the EPR of the GeoIndexManagement resource connected to the Index to update, and connecting the updater resource to a ResultSet containing the content to be fed to the Index.
  • The GeoIndexLookup Service is responsible for creating a local copy of an Index, and exposing interfaces for querying and creating statistics for the Index. One GeoIndexLookup Service resource can only replicate and look up a single Index, but one Index can be replicated by any number of GeoIndexLookup Service resources. Updates to the Index will be propagated to all GeoIndexLookup Service resources replicating that Index.

It is important to note that none of the three services have to reside on the same node; they are only connected through WebService calls and the DILIGENT CMS. The following illustration shows the information flow and responsibilities for the different services used to implement the Geo Index:

[Illustration: information flow between the Geo Index services]

RowSet

The content to be fed into a Geo Index must be served as a ResultSet containing XML documents conforming to the GeoROWSET schema. This is a very simple schema, declaring that an object (ROW element) should contain an id, start and end X coordinates (x1 is mandatory; x2 is set equal to x1 if not provided) as well as start and end Y coordinates (y1 is mandatory; y2 is set equal to y1 if not provided). In addition, a ROW can contain any number of FIELD elements, each with a name attribute and information to be stored and perhaps used for refinement of a query or ranking of results. As opposed to the ROWSETs used for fulltext indices, all rows in a GeoROWSET must contain all fields specified in the IndexType. The following is a simple but valid GeoROWSET containing two objects:

<ROWSET>
    <ROW id="doc1" x1="4321" y1="1234">
        <FIELD name="StartTime">2001-05-27T14:35:25.523</FIELD>
        <FIELD name="EndTime">2001-05-27T14:38:03.764</FIELD>
    </ROW>
    <ROW id="doc2" x1="1337" x2="4123" y1="1337" y2="6534">
        <FIELD name="StartTime">2001-06-27</FIELD>
        <FIELD name="EndTime">2001-07-27</FIELD>
    </ROW>
</ROWSET>
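As a rough illustration of the schema described above, the following sketch assembles a GeoROWSET as a plain string. The class GeoRowSetSketch and its methods are hypothetical helpers invented for this example; they are not part of the Geo Index or ResultSet APIs, and a real feeder would most likely use a proper XML library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the Geo Index API): builds GeoROWSET XML as a string.
class GeoRowSetSketch {

    // Escapes the characters that are unsafe in XML attribute values and text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds one ROW element. x2 and y2 may be null; the attributes are then
    // omitted, and the index is expected to treat them as equal to x1 and y1.
    static String row(String id, long x1, Long x2, long y1, Long y2,
                      Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("    <ROW id=\"").append(escape(id)).append("\" x1=\"").append(x1).append('"');
        if (x2 != null) sb.append(" x2=\"").append(x2).append('"');
        sb.append(" y1=\"").append(y1).append('"');
        if (y2 != null) sb.append(" y2=\"").append(y2).append('"');
        sb.append(">\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("        <FIELD name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</FIELD>\n");
        }
        sb.append("    </ROW>\n");
        return sb.toString();
    }

    // Wraps already built ROW elements in a ROWSET element.
    static String rowSet(String... rows) {
        StringBuilder sb = new StringBuilder("<ROWSET>\n");
        for (String r : rows) sb.append(r);
        return sb.append("</ROWSET>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("StartTime", "2001-05-27T14:35:25.523");
        fields.put("EndTime", "2001-05-27T14:38:03.764");
        // A point object: only x1 and y1 are given.
        System.out.println(rowSet(row("doc1", 4321, null, 1234, null, fields)));
    }
}
```

Every ROW built this way should carry the same set of FIELD names, since all rows must contain all fields of the IndexType.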

GeoIndexType

Which fields should be present in the RowSet, and how these fields are to be handled by the Geo Index, is specified through a GeoIndexType: an XML document conforming to the GeoIndexType schema. Which GeoIndexType to use for a specific GeoIndex instance is specified by supplying a GeoIndexType ID during initialization of the GeoIndexManagement resource. A GeoIndexType contains a field list with all the fields that should be stored in order to be presented in the query results or used for refinement. The following is a possible IndexType for the type of ROWSET shown above:

    <index-type>
        <field-list>
            <field name="StartTime">
                <type>date</type>
                <return>yes</return>
            </field>
            <field name="EndTime">
                <type>date</type>
                <return>yes</return>
            </field>
        </field-list>
    </index-type>

Fields present in the ROWSET but not in the IndexType will be skipped. Fields present in the IndexType but not in a ROW in the ROWSET will cause an exception. The two elements under each "field" element are used to define how that field should be handled. The meaning and expected content of each of them is explained below:

  • type specifies the data type of the field. Accepted values are:
    • SHORT - A number fitting into a Java "short"
    • INT - A number fitting into a Java "int"
    • LONG - A number fitting into a Java "long"
    • DATE - A date in the format yyyy-MM-dd'T'HH:mm:ss.s where only yyyy is mandatory
    • FLOAT - A decimal number fitting into a Java "float"
    • DOUBLE - A decimal number fitting into a Java "double"
    • STRING - A string with a maximum length of roughly 40 characters
  • return specifies whether the field should be returned in the results from a query. "yes" and "no" are the only accepted values.
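The field rules above (extra ROW fields are skipped, missing ones cause an exception, and numeric values must fit the declared Java type) can be sketched as follows. This is only an illustration under assumptions: the class IndexTypeFieldSketch and its methods are invented for this example and do not reflect how the Geo Index implements the check. DATE is left out because its parsing details are not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the field rules; not the Geo Index implementation.
class IndexTypeFieldSketch {

    // Parses a field value according to its declared type name.
    // Values that do not fit the Java type raise a NumberFormatException.
    static Object parse(String type, String value) {
        switch (type.toUpperCase(Locale.ROOT)) {
            case "SHORT":  return Short.parseShort(value);
            case "INT":    return Integer.parseInt(value);
            case "LONG":   return Long.parseLong(value);
            case "FLOAT":  return Float.parseFloat(value);
            case "DOUBLE": return Double.parseDouble(value);
            case "STRING": return value;
            default:
                throw new IllegalArgumentException("Type not handled in this sketch: " + type);
        }
    }

    // indexType maps field name to type name; rowFields maps field name to raw text.
    // ROW fields that the index type does not declare are silently skipped;
    // a declared field that the ROW lacks causes an exception.
    static Map<String, Object> select(Map<String, String> indexType,
                                      Map<String, String> rowFields) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : indexType.entrySet()) {
            String raw = rowFields.get(field.getKey());
            if (raw == null) {
                throw new IllegalArgumentException("ROW is missing field: " + field.getKey());
            }
            kept.put(field.getKey(), parse(field.getValue(), raw));
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> indexType = new LinkedHashMap<>();
        indexType.put("Count", "int");
        indexType.put("Label", "string");

        Map<String, String> row = new LinkedHashMap<>();
        row.put("Count", "42");
        row.put("Label", "harbour");
        row.put("Extra", "not declared, so skipped");

        System.out.println(select(indexType, row));
    }
}
```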

Plugin Framework

As explained in the GeoIndexType section, which fields a GeoIndex instance should contain can be dynamically specified through a GeoIndexType provided during GeoIndexManagement initialization. However, since new GeoIndexTypes can be added at any time with any number of new fields, there is no way for the GeoIndex itself to know how to use the information in such fields in any meaningful manner when processing a query; a static generic algorithm for processing such information would drastically limit the usefulness of the information. In order to allow for dynamic introduction of field evaluation algorithms capable of handling the dynamic nature of IndexTypes, a plugin framework was introduced. The framework allows for the creation of GeoIndexType-specific evaluators handling ranking and refinement.

DIS plugin information...

Ranking

The results of a query are sorted according to their rank, and their ranks are also returned to the caller. A RankEvaluator plugin is used to determine the rank of objects. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to return a meaningful rank of each object.

Refinement

The GeoIndex uses TwoStep processing in order to process a query. Firstly, a very efficient filtering step will retrieve all possible hits (along with some false hits) using the minimal bounding rectangle (MBR) of the query region. Then, a more costly refinement step will use additional object and query information in order to eliminate all the false hits. While the filtering step is handled internally in the index, the refinement step is handled by a refiner plugin. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to determine whether an object is within a query or not.
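The two-step idea can be sketched in a few lines of plain Java. This is a simplified illustration, not the GeoIndex code: the class TwoStepSketch, its Rect type and the query method are invented for this example, and the filter is a linear scan where the real index uses an R-Tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical filter-and-refine sketch; names are not taken from the GeoIndex.
class TwoStepSketch {

    // Axis-aligned rectangle, used both for object bounds and for the query MBR.
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    // Minimal bounding rectangle of a polygon given as {x, y} vertex pairs.
    static Rect mbr(double[][] vertices) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] v : vertices) {
            minX = Math.min(minX, v[0]); maxX = Math.max(maxX, v[0]);
            minY = Math.min(minY, v[1]); maxY = Math.max(maxY, v[1]);
        }
        return new Rect(minX, minY, maxX, maxY);
    }

    // Step 1: cheap MBR test keeps all true hits plus some false hits.
    // Step 2: the costlier refiner predicate removes the false hits.
    static List<Rect> query(List<Rect> objects, double[][] polygon, Predicate<Rect> refiner) {
        Rect queryMbr = mbr(polygon);
        List<Rect> hits = new ArrayList<>();
        for (Rect candidate : objects) {
            if (!candidate.intersects(queryMbr)) continue; // filtered out cheaply
            if (refiner.test(candidate)) hits.add(candidate);
        }
        return hits;
    }

    public static void main(String[] args) {
        double[][] triangle = {{0, 0}, {10, 0}, {0, 10}};
        List<Rect> objects = Arrays.asList(
                new Rect(1, 1, 2, 2),      // inside the triangle
                new Rect(8, 8, 9, 9),      // inside the MBR only: a false hit
                new Rect(20, 20, 21, 21)); // outside the MBR
        // Refiner for this triangle: the upper-right corner must satisfy x + y <= 10.
        List<Rect> hits = query(objects, triangle, r -> r.maxX + r.maxY <= 10);
        System.out.println(hits.size() + " object(s) left after refinement");
    }
}
```

The refiner predicate is only evaluated for candidates that survive the MBR test, which is what keeps the expensive step affordable.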

Creating a Rank Evaluator

A RankEvaluator plugin has to extend the abstract class org.diligentproject.indexservice.geo.ranking.RankEvaluator which contains three abstract methods:

  • abstract public void initialize(String args[]) -- a method called during the initiation of the RankEvaluator plugin, providing the plugin with any arguments provided in the code. All arguments are given as Strings, and it's up to the plugin to parse the string into the datatype needed by the plugin.
  • abstract public boolean isIndexTypeCompatible(GeoIndexType indexType) -- should be able to determine whether this plugin can be used by an index conforming to the GeoIndexType argument
  • abstract public double rank(Object entry) -- the method that calculates the rank of an entry.


In addition, the RankEvaluator abstract class implements two other methods worth noting:

  • final public void init(Polygon polygon, InclusionType containmentMethod, GeoIndexType indexType, String args[]) -- initializes the protected variables Polygon polygon, Envelope envelope, InclusionType containmentMethod and GeoIndexType indexType, before calling initialize() using the last argument. This means that all four protected variables are available in the initialize() method.
  • protected Object getDataField(String field, Data data) -- a method used to retrieve the contents of a specific GeoIndexType field from an org.geotools.index.Data object conforming to the GeoIndexType used by the plugin.


Ok, simple enough... So let's create a RankEvaluator plugin. We'll assume that for a certain use case, entries which span over a long period of time are of less interest than entries which span over a short period of time. Since we're dealing with time spans, we'll assume that the data stored in the index will have a "StartTime" field and an "EndTime" field, in accordance with the GeoIndexType created earlier.

The first thing we need to do, is to create a class which extends RankEvaluator:

package org.mojito.ranking;
import org.diligentproject.indexservice.geo.ranking.RankEvaluator;

public class SpanSizeRanker extends RankEvaluator{
    
}

Next, we'll implement the isIndexTypeCompatible method. To do this, we need a way to determine whether the fields we need are present in the GeoIndexType argument. Luckily, GeoIndexType contains a method called containsField which expects the String name and GeoIndexField.DataType (date, double, float, int, long, short or string) type of the field in question as arguments. In addition, we'll implement the initialize() method, which we'll leave empty as the plugin we are creating doesn't need to handle any arguments.

package org.mojito.ranking;

import org.diligentproject.indexservice.common.GeoIndexField;
import org.diligentproject.indexservice.common.GeoIndexType;
import org.diligentproject.indexservice.geo.ranking.RankEvaluator;

public class SpanSizeRanker extends RankEvaluator{
    public void initialize(String[] args) {}

    public boolean isIndexTypeCompatible(GeoIndexType indexType) {
        return indexType.containsField("StartTime", GeoIndexField.DataType.DATE) && 
                indexType.containsField("EndTime", GeoIndexField.DataType.DATE);
    }    
}

Last, but not least, we need to implement the rank() method. This is of course the method which calculates a rank for an entry, based on the query polygon, any extra arguments and the different fields of the entry. In our implementation, we'll simply calculate the time span and divide 1 by this number (plus one, to avoid dividing by zero) in order to get a quick and dirty rank. Keep in mind that this method is not called for all the entries resulting from the R-Tree filtering step, but only for a subset roughly fitting the ResultSet page size. This means that somewhat computationally heavy operations can be performed (if needed) without drastically lowering response time. Please also note how the getDataField() method is used in order to retrieve the evaluated fields from the entry data, and how the result is cast to Long (even though we are dealing with dates). The reason for this is that the GeoIndex internally represents a date as a long containing the number of seconds from the Epoch. If we wanted to evaluate the Minimal Bounding Rectangle (MBR) of the entries, we could access them through entry.getBounds().

package org.mojito.ranking;

import org.diligentproject.indexservice.common.GeoIndexField;
import org.diligentproject.indexservice.common.GeoIndexType;
import org.diligentproject.indexservice.geo.ranking.RankEvaluator;
import org.geotools.index.Data;
import org.geotools.index.rtree.Entry;


public class SpanSizeRanker extends RankEvaluator{
    public void initialize(String[] args) {}

    public boolean isIndexTypeCompatible(GeoIndexType indexType) {
        return indexType.containsField("StartTime", GeoIndexField.DataType.DATE) && 
                indexType.containsField("EndTime", GeoIndexField.DataType.DATE);
    }
    
    public double rank(Object obj){
        Entry entry = (Entry)obj;
        Data data = (Data)entry.getData();
        Long entryStartTime = (Long) this.getDataField("StartTime", data);
        Long entryEndTime = (Long) this.getDataField("EndTime", data);
        long spanSize = entryEndTime - entryStartTime;
        
        //1.0 forces floating-point division; 1/(spanSize + 1) on longs would truncate to 0
        return 1.0/(spanSize + 1);
    }
    
}


And there we are! Our first working RankEvaluator plugin.
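The rank formula can also be tried in isolation, without any GeoIndex classes. The sketch below is a stand-alone illustration: SpanRankSketch is a hypothetical class, and taking the absolute value of the span is an added assumption that guards against entries whose start and end times are swapped.

```java
// Hypothetical stand-alone version of the span-based rank formula.
class SpanRankSketch {

    // Returns a rank in (0, 1]; shorter spans give higher ranks.
    // The literal 1.0 forces floating-point division. With the integer literal 1,
    // the division would be done on longs and truncate to 0 for every span >= 1.
    static double rank(long startTime, long endTime) {
        long span = Math.abs(endTime - startTime);
        return 1.0 / (span + 1);
    }

    public static void main(String[] args) {
        System.out.println(rank(0, 0)); // zero-length span gets the highest rank
        System.out.println(rank(0, 1));
        System.out.println(rank(0, 3));
    }
}
```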

Creating a Refiner

A Refiner plugin has to extend the abstract class org.diligentproject.indexservice.geo.refinement.Refiner which contains three abstract methods:

  • abstract public void initialize(String args[]) -- a method called during the initialization of the Refiner plugin, providing the plugin with any arguments provided in the code. All arguments are given as Strings, and it's up to the plugin to parse the string into the datatype needed by the plugin.
  • abstract public boolean isIndexTypeCompatible(GeoIndexType indexType) -- should be able to determine whether this plugin can be used by an index conforming to the GeoIndexType argument
  • abstract public List<Entry> refine(List<Entry> entries) -- the method responsible for refining a list of results.


In addition, the Refiner abstract class implements two other methods worth noting:

  • final public void init(Polygon polygon, InclusionType containmentMethod, GeoIndexType indexType, String args[]) -- initializes the protected variables Polygon polygon, Envelope envelope, InclusionType containmentMethod and GeoIndexType indexType, before calling the abstract initialize() using the last argument. This means that all four protected variables are available in the initialize() method.
  • protected Object getDataField(String field, Data data) -- a method used to retrieve the contents of a specific GeoIndexType field from an org.geotools.index.Data object conforming to the GeoIndexType used by the plugin.


Quite similar to the RankEvaluator, isn't it? So let's create a Refiner plugin to go with the previously created RankEvaluator. We'll still assume that the data stored in the index will have a "StartTime" field and an "EndTime" field, in accordance with the GeoIndexType created earlier. The "shorter is better" notion from the RankEvaluator example still holds true, and we want to create a plugin which refines a query by removing all objects which span over a time longer than a maxSpanSize value, avoiding those ridiculous everlasting objects... The maxSpanSize value will be provided to the plugin as an initialization argument.

The first thing we need to do, is to create a class which extends Refiner:

package org.mojito.refinement;
import org.diligentproject.indexservice.geo.refinement.Refiner;

public class SpanSizeRefiner extends Refiner{
    
}

The isIndexTypeCompatible method is implemented in a similar manner as for the SpanSizeRanker. However in this plugin we have to pay closer attention to the initialize() function, since we expect the maxSpanSize to be given as an argument. Since maxSpanSize is the only argument, the String array argument of initialize(String[] args) will contain a single element which will be a String representation of the maxSpanSize. In order for this value to be usable, we will parse it to a long, which will represent the maxSpanSize in milliseconds.

package org.mojito.refinement;

import org.diligentproject.indexservice.common.GeoIndexField;
import org.diligentproject.indexservice.common.GeoIndexType;
import org.diligentproject.indexservice.geo.refinement.Refiner;

public class SpanSizeRefiner extends Refiner {
        private long maxSpanSize;

        public void initialize(String[] args) {
            this.maxSpanSize = Long.parseLong(args[0]);
        }
        
        public boolean isIndexTypeCompatible(GeoIndexType indexType) {
            return indexType.containsField("StartTime", GeoIndexField.DataType.DATE) && 
                    indexType.containsField("EndTime", GeoIndexField.DataType.DATE);
        } 
}
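Since the argument arrives as a String supplied by the caller, a plugin may want to validate it before use. The following is a minimal sketch of such a check, assuming that a missing, non-numeric or negative maxSpanSize should be rejected; RefinerArgsSketch and parseMaxSpanSize are hypothetical names, not part of the Refiner API.

```java
// Hypothetical helper showing defensive parsing of the single refiner argument.
class RefinerArgsSketch {

    // Expects exactly one argument holding a non-negative whole number.
    static long parseMaxSpanSize(String[] args) {
        if (args == null || args.length != 1) {
            throw new IllegalArgumentException("Expected exactly one argument: maxSpanSize");
        }
        long value;
        try {
            value = Long.parseLong(args[0].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("maxSpanSize is not a whole number: " + args[0], e);
        }
        if (value < 0) {
            throw new IllegalArgumentException("maxSpanSize must not be negative: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseMaxSpanSize(new String[] {"100000"}));
    }
}
```

A plugin could call such a helper from initialize(String[] args) and store the result in its maxSpanSize field.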

And once again we've saved the best, or at least the most important, for last: the refine() implementation is where we decide how to refine the query results. It takes a list of Entry objects as an argument, and is expected to return a similar (though usually smaller) list of Entry objects as a result. As with the RankEvaluator, the synchronization with the ResultSet page size allows for quite computationally heavy operations; however, we have little use for that in this example. We will simply calculate the time span of each entry in the argument List and compare it to the maxSpanSize value. If it is smaller or equal, we'll add it to the results List.

package org.mojito.refinement;

import java.util.ArrayList;
import java.util.List;

import org.diligentproject.indexservice.common.GeoIndexField;
import org.diligentproject.indexservice.common.GeoIndexType;
import org.diligentproject.indexservice.geo.refinement.Refiner;
import org.geotools.index.Data;
import org.geotools.index.rtree.Entry;


public class SpanSizeRefiner extends Refiner {
        private long maxSpanSize;

        public void initialize(String[] args) {
            this.maxSpanSize = Long.parseLong(args[0]);
        }
        
        public boolean isIndexTypeCompatible(GeoIndexType indexType) {
            return indexType.containsField("StartTime", GeoIndexField.DataType.DATE) && 
                    indexType.containsField("EndTime", GeoIndexField.DataType.DATE);
        }
        
        public List<Entry> refine(List<Entry> entries){
            ArrayList<Entry> returnList = new ArrayList<Entry>();
            Data data;
            Long entryStartTime = null, entryEndTime = null;
            

            for(Entry entry : entries){
                    data = (Data)entry.getData();
                    entryStartTime = (Long) this.getDataField("StartTime", data);
                    entryEndTime = (Long) this.getDataField("EndTime", data);
                    
                    if (entryEndTime < entryStartTime){
                        long temp = entryEndTime;
                        entryEndTime = entryStartTime;
                        entryStartTime = temp; 
                    }
                    if (entryEndTime - entryStartTime <= maxSpanSize){
                        returnList.add(entry);
                    }
            }
            return returnList;
        }
    }

And that's all there is to it! We have created our first Refinement plugin, capable of getting rid of those annoying long-lived objects.

Packaging plugins

Will be filled out shortly

loading of plugins

Query language

A query is specified through a SearchPolygon object, containing the vertices of the query region, an optional RankingRequest object and an optional list of RefinementRequest objects. A RankingRequest object contains the String ID of the RankEvaluator to use, along with an optional String array of arguments to be used by the specified RankEvaluator. Similarly, a RefinementRequest contains the String ID of the Refiner to use, along with an optional String array of arguments to be used by the specified Refiner.

+ how to specify a rectangle

Dependencies

Will be filled out shortly

Usage Example

Create a Management Resource

//indexType, indexID and collectionID are assumed to be Strings declared by the caller

//Get the factory portType
String geoManagementFactoryURI = "http://some.domain.no:8080/wsrf/services/diligentproject/index/GeoIndexManagementFactoryService";
GeoIndexManagementFactoryServiceAddressingLocator geoManagementFactoryLocator = new GeoIndexManagementFactoryServiceAddressingLocator();

EndpointReferenceType geoManagementFactoryEPR = new EndpointReferenceType();
geoManagementFactoryEPR.setAddress(new Address(geoManagementFactoryURI));
//The port type name is inferred from the locator method name
GeoIndexManagementFactoryPortType geoManagementFactory = geoManagementFactoryLocator
             .getGeoIndexManagementFactoryPortTypePort(geoManagementFactoryEPR);

//Create management resource and get endpoint reference of WS-Resource.
org.diligentproject.indexservice.geoindexmanagement.stubs.CreateResource managementCreateArguments =
                           new org.diligentproject.indexservice.geoindexmanagement.stubs.CreateResource();
managementCreateArguments.setIndexTypeName(new URI(
                            "http://www.diligentproject.org/index/type/" + indexType));

managementCreateArguments.setIndexTypeID(indexType);//Optional (only needed if not provided in RS)
managementCreateArguments.setIndexID(indexID);//Optional (should usually not be set, and the service will create the ID)
managementCreateArguments.setCollectionID(new String[] {collectionID});
managementCreateArguments.setGeographicalSystem("WGS_1984");
managementCreateArguments.setUnitOfMeasurement("DD");
managementCreateArguments.setNumberOfDecimals(4);

org.diligentproject.indexservice.geoindexmanagement.stubs.CreateResourceResponse geoManagementCreateResponse = 
                           geoManagementFactory.createResource(managementCreateArguments);
EndpointReferenceType geoManagementInstanceEPR = geoManagementCreateResponse.getEndpointReference();
//Keep the ID actually assigned by the service (indexID is already declared above)
indexID = geoManagementCreateResponse.getIndexID();

Create an Updater Resource and start feeding

EndpointReferenceType geoUpdaterFactoryEPR = null;
EndpointReferenceType geoUpdaterInstanceEPR = null;
GeoIndexUpdaterFactoryPortType geoUpdaterFactory = null;
GeoIndexUpdaterPortType geoUpdaterInstance = null;
GeoIndexUpdaterServiceAddressingLocator geoUpdaterInstanceLocator = new GeoIndexUpdaterServiceAddressingLocator();
GeoIndexUpdaterFactoryServiceAddressingLocator updaterFactoryLocator = new GeoIndexUpdaterFactoryServiceAddressingLocator();

//Get the factory portType
String geoUpdaterFactoryURI = "http://some.domain.no:8080/wsrf/services/diligentproject/index/GeoIndexUpdaterFactoryService"; //could be on any node
geoUpdaterFactoryEPR = new EndpointReferenceType();
geoUpdaterFactoryEPR.setAddress(new Address(geoUpdaterFactoryURI));
geoUpdaterFactory = updaterFactoryLocator
                      .getGeoIndexUpdaterFactoryPortTypePort(geoUpdaterFactoryEPR);


//Create updater resource and get endpoint reference of WS-Resource
org.diligentproject.indexservice.geoindexupdater.stubs.CreateResource geoUpdaterCreateArguments =
                                              new org.diligentproject.indexservice.geoindexupdater.stubs.CreateResource();

geoUpdaterCreateArguments.setMainIndexID(indexID);
                        

//Now let's insert some data into the index... Firstly, get the updater EPR.
org.diligentproject.indexservice.geoindexupdater.stubs.CreateResourceResponse geoUpdaterCreateResponse = geoUpdaterFactory
                                       .createResource(geoUpdaterCreateArguments);
geoUpdaterInstanceEPR = geoUpdaterCreateResponse.getEndpointReference();


//Get updater instance PortType
geoUpdaterInstance = geoUpdaterInstanceLocator.getGeoIndexUpdaterPortTypePort(geoUpdaterInstanceEPR);
 

//read the EPR of the ResultSet containing the ROWSETs to feed into the index                        
BufferedReader in = new BufferedReader(new FileReader(eprFile));
String line;
String resultSetLocator = "";
while((line = in.readLine())!=null){
    resultSetLocator += line;
}
                       
//Tell the updater to start gathering data from the ResultSet
geoUpdaterInstance.process(resultSetLocator);

Create a Lookup resource and perform a query

//Let's put it on another node for fun...
String geoLookupFactoryURI = "http://another.domain.no:8080/wsrf/services/diligentproject/index/GeoIndexLookupFactoryService";
EndpointReferenceType geoLookupFactoryEPR = null;
EndpointReferenceType geoLookupEPR = null;
GeoIndexLookupFactoryServiceAddressingLocator geoFactoryLocator = new GeoIndexLookupFactoryServiceAddressingLocator();
GeoIndexLookupServiceAddressingLocator geoLookupInstanceLocator = new GeoIndexLookupServiceAddressingLocator();
GeoIndexLookupFactoryPortType geoIndexLookupFactory = null;
GeoIndexLookupPortType geoIndexLookupInstance = null;

//Get factory portType
geoLookupFactoryEPR = new EndpointReferenceType();
geoLookupFactoryEPR.setAddress(new Address(geoLookupFactoryURI));
geoIndexLookupFactory = geoFactoryLocator.getGeoIndexLookupFactoryPortTypePort(geoLookupFactoryEPR);

//Create resource and get endpoint reference of WS-Resource
org.diligentproject.indexservice.geoindexlookup.stubs.CreateResource geoLookupCreateResourceArguments = 
						new org.diligentproject.indexservice.geoindexlookup.stubs.CreateResource();
org.diligentproject.indexservice.geoindexlookup.stubs.CreateResourceResponse geoLookupCreateResponse = null;
 
geoLookupCreateResourceArguments.setMainIndexID(indexID);    
geoLookupCreateResponse = geoIndexLookupFactory.createResource(geoLookupCreateResourceArguments);
geoLookupEPR = geoLookupCreateResponse.getEndpointReference(); 

//Get instance PortType
geoIndexLookupInstance = geoLookupInstanceLocator.getGeoIndexLookupPortTypePort(geoLookupEPR);

//Start creating the query
SearchPolygon search = new SearchPolygon();

Point[] vertices = new Point[] {new Point(-100, 11), new Point(-100, -100),
                                    new Point(100, -100), new Point(100, 11)};

//A request to rank by the ranker created in the previous example
RankingRequest ranker = new RankingRequest(new String[]{}, "SpanSizeRanker");

//A request to use the refiner created in the previous example. 
//Please make note of the refiner argument in the String array.
RefinementRequest refinement = new RefinementRequest(new String[]{"100000"}, "SpanSizeRefiner");

//Perform the query
search.setVertices(vertices);
search.setRanker(ranker);
search.setRefinementList(new RefinementRequest[]{refinement});
search.setInclusion(InclusionType.contains);
String resultEpr = geoIndexLookupInstance.search(search);

//Print the results to screen. (refer to the ResultSet Framework page for a more detailed explanation)
RSXMLReader reader=null;
ResultElementBase[] results;

try{
    //create a reader for the ResultSet we created
    reader = RSXMLReader.getRSXMLReader(new RSLocator(resultEpr)); 

    //Print each part of the RS to std.out
    System.out.println("<Results>");
    do{
        System.out.println("    <Part>");
        if (reader.getNumberOfResults() > 0){
            results = reader.getResults(ResultElementGeneric.class);
            for(int i = 0; i < results.length; i++ ){
                System.out.println("        "+results[i].toXML());
            }
        }
        System.out.println("    </Part>");
        if(!reader.getNextPart()){
            break;
        }
    }
    while(true);
    System.out.println("</Results>");
}
catch(Exception e){
    e.printStackTrace();
}

--Msibeko 15:49, 20 June 2007 (EEST)