Content Source Description

From Gcube Wiki
Revision as of 18:02, 21 February 2007

Introduction

The Content Source Description (CSD) is a digital library service that supports the execution of content-based queries against a number of content sources (such as collections) that are associated with DILIGENT indices.
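One way that descriptions can support queries over several sources is by estimating which sources are worth querying. The Java sketch below illustrates this idea under stated assumptions. `SourceRanking` and `SourceDescription` are hypothetical names. Scoring a source by the fraction of its documents that contain each query term is an illustrative choice, not the CSD service's documented method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: orders content sources for a query using per-source
// term statistics like those a CSD description exposes. The score (sum of the
// fraction of a source's documents that contain each query term) is an
// illustrative assumption, not the CSD service's own selection method.
class SourceRanking {

    // Simplified, hypothetical view of one source's description.
    static final class SourceDescription {
        final String sourceURI;
        final int numberOfDocuments;
        final Map<String, Integer> docFrequency; // term -> documents containing it

        SourceDescription(String sourceURI, int numberOfDocuments, Map<String, Integer> docFrequency) {
            this.sourceURI = sourceURI;
            this.numberOfDocuments = numberOfDocuments;
            this.docFrequency = docFrequency;
        }

        // Sum of docFrequency / numberOfDocuments over the query terms;
        // terms missing from the description contribute 0.
        double score(String[] queryTerms) {
            if (numberOfDocuments <= 0) return 0.0;
            double total = 0.0;
            for (String term : queryTerms) {
                Integer df = docFrequency.get(term);
                if (df != null) total += (double) df / numberOfDocuments;
            }
            return total;
        }
    }

    // Returns the source URIs ordered from best to worst score.
    static List<String> rank(List<SourceDescription> sources, String[] queryTerms) {
        List<SourceDescription> sorted = new ArrayList<>(sources);
        sorted.sort(Comparator.comparingDouble((SourceDescription s) -> s.score(queryTerms)).reversed());
        List<String> uris = new ArrayList<>();
        for (SourceDescription s : sorted) uris.add(s.sourceURI);
        return uris;
    }
}
```

Sources with equal scores keep their input order, because `List.sort` is stable.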

Implementation Overview

Among the many possible ways of implementing a content source description service, the reference CSD service represents text sources as term histograms. A histogram contains the most representative words and phrases of a content source (e.g. a content collection) together with statistical information about them. To obtain these statistics, the reference CSD service interacts with index services. It derives statistical information from the full-text DILIGENT indices of internal sources. It also subscribes for notifications when these indices change (notifications will be available in the beta release of the project).
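The histogram idea can be sketched in a few lines of Java. The following is a hypothetical `TermHistogram` class, not part of the CSD code base. It makes the following assumptions:

- Document frequency means the number of documents that contain a term.
- Source frequency means the term's total number of occurrences in the source.
- Tokens are lower-cased runs of letters and digits.
- "Most representative" can be approximated by highest document frequency.

Note that the reference service derives its statistics from DILIGENT indices rather than by re-tokenizing documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a term histogram. Assumptions: documents are plain
// strings, tokens are lower-cased runs of letters/digits, "document frequency"
// is the number of documents containing a term and "source frequency" is the
// term's total number of occurrences in the source.
class TermHistogram {
    private final Map<String, Integer> docFrequency = new HashMap<>();
    private final Map<String, Integer> sourceFrequency = new HashMap<>();
    private int numberOfDocuments = 0;

    void addDocument(String text) {
        numberOfDocuments++;
        Set<String> seenInThisDocument = new HashSet<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split yields "" for leading separators
            sourceFrequency.merge(token, 1, Integer::sum);
            // count each term at most once per document
            if (seenInThisDocument.add(token)) docFrequency.merge(token, 1, Integer::sum);
        }
    }

    int getNumberOfDocuments() { return numberOfDocuments; }
    int getDocFrequency(String term) { return docFrequency.getOrDefault(term, 0); }
    int getSourceFrequency(String term) { return sourceFrequency.getOrDefault(term, 0); }

    // The n terms with the highest document frequency, ties broken alphabetically.
    List<String> topTerms(int n) {
        List<String> terms = new ArrayList<>(docFrequency.keySet());
        terms.sort(Comparator.comparing((String t) -> docFrequency.get(t))
                .reversed()
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }
}
```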

The CSD service operates on a number of underlying component packages that provide a coarse-grained division of functionality:

  • Core: This package groups components responsible for generating and exposing content source descriptions.
  • Handlers: This package contains a range of handlers that encapsulate atomic and possibly stateful processes within the Content Source Description service, used primarily during initialisation and update. These include the atomic tasks of generating descriptions and publishing them once they have been generated.
  • Notification: This package groups components that are responsible for monitoring external changes which are relevant to content source descriptions and for reflecting those changes onto the related descriptions in accordance with their update policies.
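The division of labour between the three packages can be illustrated with a small, self-contained Java sketch. All types here are hypothetical stand-ins chosen for the example: `Description`, `DescriptionHandler`, `GenerateHandler`, `PublishHandler`, `IndexChangeListener`, and `UpdatePolicy`. They only mirror the roles described above:

- Atomic handlers run in sequence for generation and publication.
- A notification component re-runs them when a relevant index changes and the update policy allows it.

```java
import java.util.List;

// Hypothetical sketch of the package roles described above; none of these
// types are the CSD's real classes.
class HandlerSketch {

    // Core: a minimal stand-in for a content source description.
    static final class Description {
        final String sourceURI;
        boolean generated;
        boolean published;
        int version;
        Description(String sourceURI) { this.sourceURI = sourceURI; }
    }

    // Handlers: one atomic step applied to a description.
    interface DescriptionHandler {
        void handle(Description d);
    }

    // Generation step: (re)builds the description and bumps its version.
    static final class GenerateHandler implements DescriptionHandler {
        public void handle(Description d) {
            d.generated = true;
            d.published = false; // a fresh version still has to be published
            d.version++;
        }
    }

    // Publication step: only valid after generation.
    static final class PublishHandler implements DescriptionHandler {
        public void handle(Description d) {
            if (!d.generated) throw new IllegalStateException("publish before generate");
            d.published = true;
        }
    }

    // Runs the handlers in order; an exception in one step stops the chain.
    static void run(Description d, List<DescriptionHandler> chain) {
        for (DescriptionHandler h : chain) h.handle(d);
    }

    // Notification: hypothetical update policies for a description.
    enum UpdatePolicy { ON_CHANGE, MANUAL }

    // Reacts to an index change by re-running the chain if the policy allows.
    static final class IndexChangeListener {
        private final Description description;
        private final UpdatePolicy policy;
        private final List<DescriptionHandler> chain;

        IndexChangeListener(Description description, UpdatePolicy policy, List<DescriptionHandler> chain) {
            this.description = description;
            this.policy = policy;
            this.chain = chain;
        }

        void indexChanged(String changedSourceURI) {
            if (policy == UpdatePolicy.ON_CHANGE && changedSourceURI.equals(description.sourceURI)) {
                run(description, chain);
            }
        }
    }
}
```

Keeping each step atomic means a failure in publication leaves a generated but unpublished description, which a later run can complete.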

Dependencies

  • Java JDK 1.5
  • WS-Core
  • DiligentProvider
  • KXML (version 2.3.0)
  • Contentmanagement
  • DIRCommons library
  • Indexservice Generatorservice
  • Indexservice Lookupservice
  • DISHL client
  • DISIP


Usage Example

Creating Descriptor

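// Axis WS-Addressing types used by the WSRF client stubs
import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;
// DescriptionFactoryServiceAddressingLocator, DescriptionFactoryPortType,
// DescriptorServiceAddressingLocator, DescriptorPortType, DParams and EPR
// come from the CSD service's generated client stubs.

// the host where the CSD service runs
String myhost = "bob";

// the ID of the collection from which a description is going to be built
String sourceURI = "ARTE_ArtiDellaMemoria";

// locate the DescriptionFactory service
String factoryURI = "http://" + myhost + ":8080/wsrf/services/diligentproject/CSDservice/DescriptionFactory";
EndpointReferenceType endpoint = new EndpointReferenceType();
endpoint.setAddress(new Address(factoryURI));
DescriptionFactoryServiceAddressingLocator factoryLocator = new DescriptionFactoryServiceAddressingLocator();
DescriptorServiceAddressingLocator descriptorLocator = new DescriptorServiceAddressingLocator();
DescriptionFactoryPortType factory = factoryLocator.getDescriptionFactoryPortTypePort(endpoint);

// ask the factory to create a description resource for the source
DParams params = new DParams();
params.setSourceURI(sourceURI);
EPR eprWrapper = factory.createResource(params);
EndpointReferenceType epr = eprWrapper.getEndpointReference();

// obtain a client for the newly created Descriptor resource
DescriptorPortType descriptor = descriptorLocator.getDescriptorPortTypePort(epr);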
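The factory address in the example follows a fixed pattern. A hypothetical helper such as `CsdAddresses.factoryURI` keeps that pattern in one place. The path is copied from the example above. Treating the port as a parameter is an assumption about how the service may be deployed; the example itself uses 8080.

```java
// Hypothetical helper (not part of the CSD stubs): builds the
// DescriptionFactory address for a given host and port.
class CsdAddresses {
    // path copied from the usage example
    static final String FACTORY_PATH = "/wsrf/services/diligentproject/CSDservice/DescriptionFactory";

    static String factoryURI(String host, int port) {
        if (host == null || host.isEmpty()) throw new IllegalArgumentException("host must not be empty");
        if (port < 1 || port > 65535) throw new IllegalArgumentException("invalid port: " + port);
        return "http://" + host + ":" + port + FACTORY_PATH;
    }
}
```

With this helper, the address could be set as `endpoint.setAddress(new Address(CsdAddresses.factoryURI(myhost, 8080)));`.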


Reading Terms and Term Statistics from Description (Histogram)

import java.text.DateFormat;
import java.util.Calendar;
// VoidType, StringArray, HistogramTermArray and HistogramTerm come from the
// CSD service's generated client stubs.

// descriptorLocator and epr as obtained in "Creating Descriptor"
DescriptorPortType descriptor = descriptorLocator.getDescriptorPortTypePort(epr);

// read the properties of the description
String descriptionType = descriptor.getDescriptionType(new VoidType());
String persistentID = descriptor.getPersistentID(new VoidType());
String descriptionLocalURI = descriptor.getLocalURI(new VoidType());
String sourceURI = descriptor.getSourceURI(new VoidType());
Calendar lastModificationDate = descriptor.getLastModificationDate(new VoidType());
int numberOfDocuments = descriptor.getNumberOfDocuments(new VoidType());

// request statistics for specific terms
StringArray request = new StringArray();
String[] names = {"edizione", "document", "doesnotexist"};
request.setItems(names);
HistogramTermArray response = descriptor.getTerms(request);
HistogramTerm[] terms = response.getItems();

System.out.println("Description type:" + descriptionType);
System.out.println("Source URI:" + sourceURI);
System.out.println("NumberOfDocuments:" + numberOfDocuments);
System.out.println("Persistent ID:" + persistentID);
System.out.println("Local URI:" + descriptionLocalURI);
System.out.println("Last modification date:" + (lastModificationDate != null
    ? DateFormat.getDateInstance().format(lastModificationDate.getTime())
    : "unknown"));

// print terms and statistical information
if (terms != null) {
  System.out.println("Terms:");
  for (HistogramTerm term : terms) {
    System.out.println("   term=" + term.getName()
        + " (document frequency=" + term.getDocFrequency()
        + ", source frequency=" + term.getSourceFrequency() + ")");
  }
}
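The request above includes `doesnotexist`, which is presumably absent from the histogram. This page does not state how the service reports such a term: it might omit it from the response, or return it with zero frequencies. The hypothetical `TermReport` sketch below therefore treats both cases the same way. `TermStat` is a stand-in for the generated `HistogramTerm` class, introduced only so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: indexes returned term statistics by name and lists the
// requested names that were either not returned or returned with zero
// document frequency. TermStat stands in for the generated HistogramTerm.
class TermReport {
    static final class TermStat {
        final String name;
        final long docFrequency;
        final long sourceFrequency;

        TermStat(String name, long docFrequency, long sourceFrequency) {
            this.name = name;
            this.docFrequency = docFrequency;
            this.sourceFrequency = sourceFrequency;
        }
    }

    // Maps term name -> statistics, keeping response order; null yields an empty map.
    static Map<String, TermStat> byName(TermStat[] terms) {
        Map<String, TermStat> map = new LinkedHashMap<>();
        if (terms != null) {
            for (TermStat t : terms) map.put(t.name, t);
        }
        return map;
    }

    // Requested names that are absent from the response or have no occurrences.
    static List<String> missing(String[] requested, TermStat[] terms) {
        Map<String, TermStat> map = byName(terms);
        List<String> result = new ArrayList<>();
        for (String name : requested) {
            TermStat t = map.get(name);
            if (t == null || t.docFrequency == 0) result.add(name);
        }
        return result;
    }
}
```

To feed it real results, each `HistogramTerm` could be copied with `new TermReport.TermStat(term.getName(), term.getDocFrequency(), term.getSourceFrequency())`, assuming those getters return integral values.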