Difference between revisions of "IS-Collector"

Revision as of 18:22, 12 April 2011

Role

The IS-InformationCollector is a gCube service in charge of aggregating the information published by the gCube services belonging an infrastructure or a subset of it (this depends on the infrastructure configuration).

Major features of the service are:

storage, indexing, and management of gCube Resources profiles
storage, indexing, and management of instances' states in the form of WS-ResourceProperties documents
storage, indexing, and management of well-formed XML documents
full XQuery 1.0 support over the collected information
remote management of the underlying XMLStorage

It plays a crucial role on an infrastructure, since it provides to clients a continuously updated picture of the infrastructure and its state. Responsiveness is also a key point of the service: many queries are sent to the InformationCollector during online operations and any delay introduces by the query processing is immediately reflected on any aspect of the system.

Historically, the service was conceived as an aggregator service, able to create Aggregator Sinks that query remote Aggregator Sources (in particular QueryAggregatorSources, as those created by the IS-Publisher) to harvest resource properties.

Version 3.0 features a new WS-DAIX compliant interface proving a more general approach to the feeding phase.

Documents

As wrote, IS-InformationCollector is capable to handle three class of documents. Two of them, gCube Resource profiles and instance's states, have a specific semantic in the infrastructure since:

gCube Resource profiles are the manifestation of gCube Resource and allow interested client to discover such resources
instance's states, in the form of WS-ResourceProperties document, are views on the state of instance of gCube services

Because of this semantic, the two classes of resource require some extra-information for their correct management. In particular, the to-be-stored documents has to come along with a metadata record reporting information about the publisher and the resource lifetime. This is an example of such a record:

<?xml version="1.0" encoding="UTF-8"?>
<Metadata>
  <Type>Profile|InstanceState</Type>
  <Source>http://...

 <TimeToLive>600</TimeToLive>
 <GroupKey>MyGroupKey</GroupKey>
 <EntryKey>MyEntryKey</EntryKey>
 <Key>MyKey</Key>
 <Namespace> </Namespace>
 <PublicationMode>push|pull<7PublicationMode>

</Metadata> </source>

This metadata record gives the service about:

the type of the resource (Profile or Instance State)
the URL of the publisher service (Source)
the lifetime (TimeToLive) of the resource, which is added to the time the service receives the document (this is valid only for instance states published in push mode)
the service group resource, identified by the GroupKey and EntryKey element (this is valid only for instance states published through the Sink port-type)
the namespace of the WSDL in which the resource was defined (this is valid only for instance states)
the publication mode, push if the resource is pushed at each change, pull if it is periodically sent to the service (this is valid only for instance states)

Then, the source document is wrapped with this information and indexed.

For instance, if a client wants to store the following document as gCube Resource:

<Resource xmlns="" version="0.4.x">
      <ID>3bb6e850-94d2-11df-8d06-8e825c7c7b8d</ID>
      <Type>MetadataCollection</Type>
      <Scopes>
        <Scope>/d4science.research-infrastructures.eu/FARM/FCPPS</Scope>
      </Scopes>
      <Profile>
        <Description/>
        <Name>Annotations_Collection</Name>
        <IsUserCollection value="true"/>
        <IsIndexable value="true"/>
        <IsEditable value="true"/>
        <CreationTime>2010-07-21T15:14:13+01:00</CreationTime>
        <Creator>unknown</Creator>
        <NumberOfMembers>0</NumberOfMembers>
        <LastUpdateTime>2010-07-21T15:14:29+01:00</LastUpdateTime>
        <PreviousUpdateTime>2010-07-21T15:14:29+01:00</PreviousUpdateTime>
        <LastModifier>unknown</LastModifier>
        <OID>5ec18cb0-94d2-11df-b077-8e28f9a7444e</OID>
        <RelatedCollection>
          <CollectionID>d14baeb0-f162-11dd-96f7-b87cc0f0b075</CollectionID>
          <SecondaryRole>is-annotated-by</SecondaryRole>
        </RelatedCollection>
        <MetadataFormat>
          <SchemaURI>http://gcube.org/AnnotationFrontEnd/AFEAnnotationV2</SchemaURI>
          <Language>en</Language>
          <Name>Annotations</Name>
        </MetadataFormat>
      </Profile>
    </Resource>

Once the Collector receives both the documents (via the XMLCollectionAccess portType), it produces and indexes the following integrated document:

<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <ID>3bb6e850-94d2-11df-8d06-8e825c7c7b8d</ID>
  <Source>http://source

 <SourceKey>MyKey</SourceKey>
 <CompleteSourceKey/>
 <EntryKey>MyEntryKey</EntryKey>
 <GroupKey>MyGroupKey</GroupKey>
 <TerminationTime>1288103198680</TerminationTime>
 <TerminationTimeHuman>Tue Oct 26 15:26:38 GMT+01:00 2010</TerminationTimeHuman>
 <LastUpdateMs>1288102598680</LastUpdateMs>
 <LastUpdateHuman>Tue Oct 26 15:16:38 GMT+01:00 2010</LastUpdateHuman>
 
   <Resource xmlns="" version="0.4.x">
     <ID>3bb6e850-94d2-11df-8d06-8e825c7c7b8d</ID>
     <Type>MetadataCollection</Type>
     <Scopes>
       <Scope>/d4science.research-infrastructures.eu/FARM/FCPPS</Scope>
     </Scopes>
     <Profile>
       <Description/>
       <Name>Annotations_Collection</Name>
       <IsUserCollection value="true"/>
       <IsIndexable value="true"/>
       <IsEditable value="true"/>
       <CreationTime>2010-07-21T15:14:13+01:00</CreationTime>
       <Creator>unknown</Creator>
       <NumberOfMembers>0</NumberOfMembers>
       <LastUpdateTime>2010-07-21T15:14:29+01:00</LastUpdateTime>
       <PreviousUpdateTime>2010-07-21T15:14:29+01:00</PreviousUpdateTime>
       <LastModifier>unknown</LastModifier>
       <OID>5ec18cb0-94d2-11df-b077-8e28f9a7444e</OID>
       <RelatedCollection>
         <CollectionID>d14baeb0-f162-11dd-96f7-b87cc0f0b075</CollectionID>
         <SecondaryRole>is-annotated-by</SecondaryRole>
       </RelatedCollection>
       <MetadataFormat>
         <SchemaURI>http://gcube.org/AnnotationFrontEnd/AFEAnnotationV2</SchemaURI>
         <Language>en</Language>
         <Name>Annotations</Name>
       </MetadataFormat>
     </Profile>
   </Resource>

</Document> </source>

This allows to:

have more enriched queries based also on the publisher's data
manage the termination time of the resource by relying on the time machine of the node hosting the Collector instance (the Termination Time in seconds is added to the last update time in order to obtain an absolute expiration time of the document), by avoiding problems due to different timezones in the infrastructure.

About the target collection in which the document will be stored:

the collection is automatically derived (and created, if needed) starting from the document type and the information included in the document itself
the collection name reported in the invocation to the AddDocuments operation is therefore ignored

About the metadata record, do note:

the Type may assume the following two values:

Profile for a gCube Resource
InstanceState for a piece of state of a gCube service instance

the TerminationTime's value must be expressed in seconds and it will be computed by the time the Collector receives the document

If a document is not stored with an associated metadata record, it is treated as a WS-DAIX resource.

Generic DAIX resources

[TBC]

XML Indexing

The IS-InformationCollector uses an embedded instance of eXist 1.2 to index XML data according to the XML data model and offers efficient, index-based XQuery processing. The documents are stored in collections following this structure:

gcube://
|
|- Profiles
|    |-<type1>
|    |-<type2>
|    |- ..
|    |-<typeN>
|
|- Properties
|
|- <User defined collection(s)>
|    |- <User defined sub-collection(s)>

Profiles and Properties collections are reserved to store respectively gCube resource's profiles and instance's states. Under the Profiles collection, a sub-collection with the resource type name is created whenever a new type is detected in a to-be-stored profile. Moreover, a client may define his own structure of collection and sub-collections and store, index and query there his documents via the XMLCollectionAccess port-type. This hierarchy of collections and sub-collections must be kept in mind when constructing a query expression to execute via the XMLQueryAccess.

Periodic exports (as zipped archives) of the XML database content are performed for backup purposes. The behavior of this activity can be configured in the JNDI file. The following section of the JNDI file shows the parameters that may be used to define where, when and how many backups are managed:

<service name="gcube/informationsystem/collector">
 
    <!-- ... -->
 
	<environment name="backupDir" value="existICBackups" type="java.lang.String"
			override="false" />
 
	<environment name="maxBackups" value="10" type="java.lang.String"
			override="false" />
 
	<environment name="scheduledBackupInHours" value="12"
			type="java.lang.String" override="false" />
</service>

In this example, backups are done every 12 hours and stored under the $HOME/.gcore/... /existICBackups folder. 10 backups are maintained and when a new one is available, the oldest one is discarded. If an absolute path is indicated as backupDir, the backups are not stored under the gCore persistent folder.

Design

The functionalities delivered by the service are logically organized across 5 port-types:

Figure 1. IS-Collector port-types

where:

XMLCollectionAccess, Sink and SinkEntry are dedicated to the publishing phase
XMLQueryAccess allows to execute XQuery expression over the instance's content
XMLStorageAcess is used to remotely manage the XML Storage underlying the service instance

XMLCollectionAccess portType

[TBD]

XQueryAccess portType

Querying the data

The XQueryExecute operation exposed by the XQueryAccess portType to query the IS-Collector can be invoked as follows:

package org.gcube.informationsystem.collector.testsuite;
 
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.net.URL;
import java.rmi.RemoteException;
 
import org.gcube.common.core.contexts.GCUBERemotePortTypeContext;
import org.gcube.common.core.scope.GCUBEScope;
import org.gcube.common.core.utils.logging.GCUBEClientLog;
import org.gcube.informationsystem.collector.stubs.XQueryAccessPortType;
import org.gcube.informationsystem.collector.stubs.XQueryExecuteRequest;
import org.gcube.informationsystem.collector.stubs.XQueryExecuteResponse;
import org.gcube.informationsystem.collector.stubs.XQueryFaultType;
import org.gcube.informationsystem.collector.stubs.service.XQueryAccessServiceLocator;
 
/**
 * Tester for <em>XQueryExecute</em> operation of the <em>gcube/informationsystem/collector/XQueryAccess</em> portType
 * 
 * @author Manuele Simi (ISTI-CNR)
 * 
 */
public class XQueryExecuteTester {    
 
    private static GCUBEClientLog logger = new GCUBEClientLog(XQueryExecuteTester.class);
    /**
     * @param args 
     *  <ul>
     *   <li> IC host
     *   <li> IC port
     *   <li> Caller Scope
     *   <li> File including the XQuery to submit
     *  </ul>
     */
    public static void main(String[] args) {
 
	if (args.length != 4) {
	    logger.fatal("Usage: XQueryExecuteTester <host> <port> <Scope> <XQueryExpressionFile>" );
	    return;
	}
	String portTypeURI = "http://"+args[0]+":"+ args[1]+"/wsrf/services/gcube/informationsystem/collector/XQueryAccess";
 
	XQueryAccessPortType port = null;
	try {
	    port = new XQueryAccessServiceLocator().getXQueryAccessPortTypePort(new URL(portTypeURI));
	    port = GCUBERemotePortTypeContext.getProxy(port, GCUBEScope.getScope(args[2]));
	} catch (Exception e) {
	    logger.error(e);
	}
 
	XQueryExecuteRequest request = new XQueryExecuteRequest();
	request.setXQueryExpression(readQuery(args[3]));
	try {
	    logger.info("Submitting query in scope " + GCUBEScope.getScope(args[2]).getName() + "....");
	    XQueryExecuteResponse response = port.XQueryExecute(request);
	    logger.info("Number of returned records: " + response.getSize());
	    logger.info("Dataset: \n" + response.getDataset());
 
	} catch (XQueryFaultType e) {
	    logger.error("XQuery Fault Error received", e);	    	    
	} catch (RemoteException e) {
	    logger.error(e);	    
	}
 
    }
    /** reads the input fileaname*/
    private static String readQuery(final String filename) {
	String queryString = null;
	try {
	    BufferedReader input =  new BufferedReader(new FileReader(filename));
	    StringBuilder contents = new StringBuilder();
	    String line;
	    while (( line = input.readLine()) != null){
	          contents.append(line);
	          contents.append(System.getProperty("line.separator"));
	    }
	    input.close();
	    queryString = contents.toString();
	} catch (FileNotFoundException e1) {
	    logger.fatal("invalid file: " + filename);
	} catch (IOException e) {
	    logger.fatal("an error occurred when reading " + filename);
	}
	return queryString;
 
    }
 
}

Sample queries

...

Sink portType

Registering a new resource

A sample registration file:

<ServiceGroupRegistrationParameters
    xmlns:sgc="http://mds.globus.org/servicegroup/client"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
    xmlns:agg="http://mds.globus.org/aggregator/types"
    xmlns="http://mds.globus.org/servicegroup/client">    
 
    <!-- Specifies that the registration will be renewed every 60 seconds -->
    <RefreshIntervalSecs>60</RefreshIntervalSecs>    
 
    <!-- <Content> specifies registration specific information -->
    <Content xsi:type="agg:AggregatorContent" xmlns:agg="http://mds.globus.org/aggregator/types">        
        <agg:AggregatorConfig>
            <agg:GetMultipleResourcePropertiesPollType 
                xmlns:metadatamanager="http://gcube-system.org/namespaces/metadatamanagement/metadatamanager">
                <!-- Polling interval -->
                <agg:PollIntervalMillis>60000</agg:PollIntervalMillis>
                <!-- Resource names-->
                <agg:ResourcePropertyNames>metadatamanager:MetadataCollectionID</agg:ResourcePropertyNames>
                <agg:ResourcePropertyNames>metadatamanager:LastUpdateTime</agg:ResourcePropertyNames>                                               
            </agg:GetMultipleResourcePropertiesPollType>
        </agg:AggregatorConfig>        
       <agg:AggregatorData/>        
    </Content>
 
</ServiceGroupRegistrationParameters>

A sample Registration class:

import org.globus.wsrf.encoding.ObjectDeserializer;
import org.globus.wsrf.impl.servicegroup.client.ServiceGroupRegistrationClient;
import org.globus.wsrf.impl.servicegroup.client.ServiceGroupRegistrationClientCallback;
import org.globus.wsrf.utils.XmlUtils;
import org.globus.mds.servicegroup.client.ServiceGroupRegistrationParameters;
 
import org.apache.axis.message.addressing.EndpointReferenceType;
 
import commonj.timers.Timer;
 
 
/**
 * Registration class for Aggregator Resources
 * 
 * @author Manuele Simi (ISTI-CNR)
 * 
 */
public class Registration implements ServiceGroupRegistrationClientCallback {
 
 private final EndpointReferenceType source; // the SOURCE endpoint
 
 private final EndpointReferenceType sink; // the SINK endpoint, e.g. http://dlib27.isti.cnr.it:8000/wsrf/services/gcube/informationsystem/collector/Sink
 
 private String registrationFile = "...";// the content of the registration file
 
 public void register() {
	byte[] bArray =  this.registrationFile.getBytes();
	ByteArrayInputStream bais = new ByteArrayInputStream(bArray);
	Document doc = XmlUtils.newDocument(bais);
 
	 // setting parameters for this registration
	ServiceGroupRegistrationParameters params = (ServiceGroupRegistrationParameters) ObjectDeserializer.toObject(doc.getDocumentElement(),
		    ServiceGroupRegistrationParameters.class);
	params.setRegistrantEPR(this.source);
	params.setServiceGroupEPR(this.sink);
 
	 // create a serviceGroupRegistration Client for registration
	ServiceGroupRegistrationClient client = new ServiceGroupRegistrationClient();	    
 
	 // subscribe to receive registration status messages
	client.setClientCallback(this);
	 /*
	  * The client creates a SinkEntry in the appropriate target WS-ServiceGroup. Then, it periodically attempts to renew WS-ResourceLifetime lifetime extension on the
	  * SinkEntry. If the client detects that the SinkEntry no longer available, it will create a new one.
	  */	 
	 Timer timer = client.register(params, 1000);
 
     }
 
     /**
     * {@inheritDoc}
     * @see org.globus.wsrf.impl.servicegroup.client.ServiceGroupRegistrationClientCallback#setRegistrationStatus(
     *    org.globus.mds.servicegroup.client.ServiceGroupRegistrationParameters, boolean, boolean,java.lang.Exception)
     */
    public boolean setRegistrationStatus(ServiceGroupRegistrationParameters regParams, boolean wasSuccessful, boolean wasRenewalAttempt, Exception exception) {
 
      if (wasSuccessful) {
	    if (wasRenewalAttempt) {
		logger.trace("Renewal of the existing registration completed successfully");
	    } else {
		logger.trace("New registration completed successfully");
	    }
	} else {
            logger.error("Registration failed");
	}
    }
}

Sample Usage

TBP

Test-suite

The IS-Collector comes with a test-suite package allowing to test its administration functionalities (mainly the XMLStorageAccess portType).The test-suite is completely independent and does not require any other gCube package, except than a local gCore installation. The package is composed by a set of classes, sample configuration files and scripts ready to be executed.

|-lib
|--org.gcube.informationsystem.collector.testsuite.jar
|
|-backup.sh
|-restore.sh
|-shutdown.sh
|-connect.sh

Each script allows to test a different service's operation or group of operations logically related. In the following, an explanation of each script and its usage is provided.

Storing new Documents

Storing a new gCube Document

Invoke the AddDocument.sh tester with the following arguments:

addDocument.sh <host> <port> <scope> <ProfileID> <filename> Profile

Storing a new Instance State Document

Invoke the addDocument.sh tester with the following arguments:

addDocument.sh <host> <port> <scope> <StateID> <filename> InstanceState

Storing a new DAIX Resource Document

Invoke the AddDocument.sh tester with the following arguments:

addDAIXDocument.sh <host> <port> <scope> <ProfileID> <filename> <targetCollectionName>

Retrieving Documents

Getting a gCube Document

Invoke the getDocument.sh tester with the following arguments:

getDocumentsTester.sh <host> <port> <scope> gcube://Profiles/<ProfileType> <ProfileID>

Getting an Instance State Document

Invoke the getDocument.sh tester with the following arguments:

getDocument.sh <host> <port> <scope> gcube://Properties <StateID>

Getting an DAIX Resource Document

Invoke the getDocument.sh tester with the following arguments:

getDocument.sh <host> <port> <scope> gcube://<collectionName> <ResourceID>

Managing the XML Storage

Requesting the XML Storage backup

The backup.sh script invokes the Backup operation of the XMLStorageAccess portType and requests an immediate backup of the XML Storage. It has to be executed as follows:

./backup.sh <IS-Collector host> <IS-Collector port> <scope>

Restoring the latest XML Storage backup

The restore.sh script invokes the Restore operation of the XMLStorageAccess portType and requests the restore of the latest backup of the XML Storage available. It has to be executed as follows:

./restore.sh <IS-Collector host> <IS-Collector port> <scope>

Shutting down the XML Storage

The shutdown.sh script invokes the Shutdown operation of the XMLStorageAccess portType and requests the shutdown of any connection with the XML Storage available. This could be helpful to call before to stop the container on the node in order to have a gently shutdown of the instance. It has to be executed as follows:

./shutdown.sh <IS-Collector host> <IS-Collector port> <scope>

Reconnecting the XML Storage

The connect.sh script invokes the Connect operation of the XMLStorageAccess portType and requests to reopen the needed connections to the XML Storage previously closed with a Shutdown call. It has to be executed as follows:

./coonect.sh <IS-Collector host> <IS-Collector port> <scope>

Difference between revisions of "IS-Collector"

Revision as of 18:22, 12 April 2011

Contents

Role

Documents

Generic DAIX resources

XML Indexing

Design

XMLCollectionAccess portType

XQueryAccess portType

Querying the data

Sample queries

Sink portType

Registering a new resource

Sample Usage

Test-suite

Storing new Documents

Retrieving Documents

Managing the XML Storage

Navigation menu

Views

Personal tools

gCube Wiki

gCube features

gCube documentation

Integration and Distribution

Search

Tools