Content Source Description

From Gcube Wiki
Revision as of 12:58, 1 March 2007 by Ralf (Talk | contribs) (Reading Terms and Term Statistics from Description (Histogram))

Introduction

The Content Source Description (CSD) service is a digital library service that supports the execution of content-based queries against a number of content sources (such as collections) that are associated with DILIGENT indices.

Implementation Overview

Of the many possible ways to implement a content source description service, the reference CSD service provided here represents text sources as term histograms. A histogram contains the most representative words and phrases of a content source (i.e. a content collection) together with statistical information about them. To obtain these statistics, the reference CSD service interacts with the index services: it derives statistical information from the full-text DILIGENT indices of internal sources and subscribes for notifications in case these indices change (notifications will be available in the beta release of the project).
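To make the histogram idea concrete, the following is a minimal sketch of how per-term statistics could be collected from plain-text documents. It is not the service's implementation: the class TermHistogramSketch and its TermStats type are hypothetical, the tokenisation is deliberately naive, and the reading of "source frequency" as the total number of occurrences across the source is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative stand-in only; not the HistogramTerm stub of the CSD service.
class TermHistogramSketch {

    /** Statistics kept for one term. */
    static final class TermStats {
        int docFrequency;     // number of documents that contain the term
        int sourceFrequency;  // total occurrences in the whole source (assumed meaning)
    }

    /** Builds a term histogram from the plain-text documents of one source. */
    static Map<String, TermStats> build(List<String> documents) {
        Map<String, TermStats> histogram = new TreeMap<String, TermStats>();
        for (String document : documents) {
            // terms already counted towards docFrequency for this document
            Set<String> seenInThisDocument = new HashSet<String>();
            for (String token : document.toLowerCase().split("\\W+")) {
                if (token.length() == 0) {
                    continue; // split() may yield a leading empty token
                }
                TermStats stats = histogram.get(token);
                if (stats == null) {
                    stats = new TermStats();
                    histogram.put(token, stats);
                }
                stats.sourceFrequency++;
                if (seenInThisDocument.add(token)) {
                    stats.docFrequency++;
                }
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        Map<String, TermStats> histogram = build(Arrays.asList(
            "prima edizione del document",
            "seconda edizione, edizione rivista"));
        for (Map.Entry<String, TermStats> entry : histogram.entrySet()) {
            System.out.println(entry.getKey()
                + " (document frequency=" + entry.getValue().docFrequency
                + ", source frequency=" + entry.getValue().sourceFrequency + ")");
        }
    }
}
```

A real histogram would keep only the most representative terms, for example by dropping stop words and truncating to the highest-ranked entries; the sketch keeps every term for brevity.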

The CSD service operates on a number of underlying component packages that provide a coarse-grained division of functionality:

  • Core: This package groups components responsible for generating and exposing content source descriptions.
  • Handlers: This package contains a range of handlers that specify the atomic and possibly stateful processes used within the Content Source Description service, primarily during initialisation and update. These include the atomic tasks of generating descriptions and publishing them once they have been generated.
  • Notification: This package groups components that are responsible for monitoring external changes which are relevant to content source descriptions and for reflecting those changes onto the related descriptions in accordance with their update policies.
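The interplay between change notifications and update policies can be pictured with the sketch below. Every name in it is hypothetical (UpdatePolicySketch, the two policy values, ChangeMonitor, and the collection id "OtherCollection"); the page does not list the policies the service actually supports, so this only illustrates the general pattern of applying a per-description policy when an index changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these types belong to the CSD service.
class UpdatePolicySketch {

    /** Assumed policies; the real service may define different ones. */
    enum UpdatePolicy { IMMEDIATE, DEFERRED }

    /** Minimal stand-in for a content source description. */
    static final class Description {
        final String sourceURI;
        final UpdatePolicy policy;
        boolean stale = false; // true while a change is known but not yet applied
        int generations = 1;   // how often the description has been generated

        Description(String sourceURI, UpdatePolicy policy) {
            this.sourceURI = sourceURI;
            this.policy = policy;
        }

        void regenerate() {
            generations++;
            stale = false;
        }
    }

    /** Receives change notifications and applies each description's policy. */
    static final class ChangeMonitor {
        private final List<Description> descriptions = new ArrayList<Description>();

        void register(Description description) {
            descriptions.add(description);
        }

        /** Called when the index of the given source has changed. */
        void indexChanged(String sourceURI) {
            for (Description description : descriptions) {
                if (!description.sourceURI.equals(sourceURI)) {
                    continue; // change concerns a different source
                }
                if (description.policy == UpdatePolicy.IMMEDIATE) {
                    description.regenerate();
                } else {
                    description.stale = true; // regenerate later
                }
            }
        }
    }

    public static void main(String[] args) {
        ChangeMonitor monitor = new ChangeMonitor();
        Description eager = new Description("ARTE_ArtiDellaMemoria", UpdatePolicy.IMMEDIATE);
        Description lazy = new Description("OtherCollection", UpdatePolicy.DEFERRED);
        monitor.register(eager);
        monitor.register(lazy);
        monitor.indexChanged("ARTE_ArtiDellaMemoria");
        monitor.indexChanged("OtherCollection");
        System.out.println("eager generations=" + eager.generations + ", stale=" + eager.stale);
        System.out.println("lazy generations=" + lazy.generations + ", stale=" + lazy.stale);
    }
}
```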

Dependencies

  • Java JDK 1.5
  • WS-Core
  • DiligentProvider
  • KXML (version 2.3.0)
  • Contentmanagement
  • DIRCommons library
  • Indexservice Generatorservice
  • Indexservice Lookupservice
  • DISHL client
  • DISIP


Usage Example

Creating Descriptor

// necessary imports
import org.diligentproject.CSDservice.impl.core.stubs.DescriptorPortType;
import org.diligentproject.CSDservice.impl.core.stubs.EPR;
import org.diligentproject.CSDservice.impl.core.stubs.DParams;
import org.diligentproject.CSDservice.impl.core.stubs.service.DescriptorServiceAddressingLocator;
import org.diligentproject.CSDservice.impl.core.stubs.service.DescriptionFactoryServiceAddressingLocator;
// assumption: the factory port type stub lives in the same package as DescriptorPortType
import org.diligentproject.CSDservice.impl.core.stubs.DescriptionFactoryPortType;
import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;

// the host where the CSD service runs
String myhost = "bob";

// the id of the collection from which a description is going to be built
String sourceURI = "ARTE_ArtiDellaMemoria";

// endpoint reference of the description factory service
String factoryURI = "http://" + myhost + ":8080/wsrf/services/diligentproject/CSDservice/DescriptionFactory";
EndpointReferenceType endpoint = new EndpointReferenceType();
endpoint.setAddress(new Address(factoryURI));

// locators for the factory service and for the descriptor resources it creates
DescriptionFactoryServiceAddressingLocator factoryLocator = new DescriptionFactoryServiceAddressingLocator();
DescriptorServiceAddressingLocator descriptorLocator = new DescriptorServiceAddressingLocator();
DescriptionFactoryPortType factory = factoryLocator.getDescriptionFactoryPortTypePort(endpoint);

// ask the factory to create a description resource for the source
DParams params = new DParams();
params.setSourceURI(sourceURI);
EPR eprWrapper = factory.createResource(params);

// obtain a port to the newly created descriptor resource
EndpointReferenceType epr = eprWrapper.getEndpointReference();
DescriptorPortType descriptor = descriptorLocator.getDescriptorPortTypePort(epr);
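The factory address in the example is assembled by string concatenation. As a small optional refinement, the sketch below builds the same address through java.net.URI so that a malformed host name is rejected before any remote call is attempted. The helper class FactoryUriSketch and its method are hypothetical; only the service path and the port 8080 are taken from the example above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; not part of the CSD service or its stubs.
class FactoryUriSketch {

    // service path as used in the example above
    static final String FACTORY_PATH =
        "/wsrf/services/diligentproject/CSDservice/DescriptionFactory";

    /** Returns the factory address for the given host and port. */
    static String factoryUri(String host, int port) {
        try {
            // the multi-argument constructor assembles and then parses the URI
            return new URI("http", null, host, port, FACTORY_PATH, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid CSD host or port: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(factoryUri("bob", 8080));
    }
}
```

The returned string can be passed to new Address(...) exactly as factoryURI is in the example.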


Reading Terms and Term Statistics from Description (Histogram)

// necessary imports
import java.text.DateFormat;
import java.util.Calendar;
import org.diligentproject.CSDservice.impl.core.stubs.DescriptorPortType;
import org.diligentproject.CSDservice.impl.core.stubs.StringArray;
import org.diligentproject.CSDservice.impl.core.stubs.VoidType;
import org.diligentproject.CSDservice.impl.histograms.stubs.HistogramTerm;
import org.diligentproject.CSDservice.impl.histograms.stubs.HistogramTermArray;

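// descriptorLocator and epr are obtained as shown in "Creating Descriptor"
DescriptorPortType descriptor = descriptorLocator.getDescriptorPortTypePort(epr);

// read the general properties of the description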
String descriptionType = descriptor.getDescriptionType(new VoidType());
String persistentID = descriptor.getPersistentID(new VoidType());
String descriptionLocalURI = descriptor.getLocalURI(new VoidType());
String sourceURI = descriptor.getSourceURI(new VoidType());
Calendar lastModificationDate = descriptor.getLastModificationDate(new VoidType());
int numberOfDocuments = descriptor.getNumberOfDocuments(new VoidType());
		
StringArray request = new StringArray();
String[] names={"edizione", "document", "doesnotexist"};
request.setItems(names);
HistogramTermArray response = descriptor.getTerms(request);
HistogramTerm[] terms = response.getItems();
		
System.out.println("Description type: " + descriptionType);
System.out.println("Source URI: " + sourceURI);
System.out.println("Number of documents: " + numberOfDocuments);
System.out.println("Persistent ID: " + persistentID);
System.out.println("Local URI: " + descriptionLocalURI);
System.out.println("Last modification date: " + DateFormat.getDateInstance().format(lastModificationDate.getTime()));

// print terms and statistical information
if (terms != null) {
  System.out.println("Terms:");
  for (HistogramTerm term : terms) {
    System.out.println("   term=" + term.getName()
      + " (document frequency=" + term.getDocFrequency()
      + ", source frequency=" + term.getSourceFrequency() + ")");
  }
}
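The values read above can be combined on the client side. As one possible illustration, the sketch below relates a term's document frequency to the number of documents in the source, giving the fraction of documents that contain the term. The class TermCoverageSketch is hypothetical and the calculation is not something the service itself provides; the numbers in main are made up for the demonstration.

```java
// Hypothetical client-side helper; not part of the CSD service.
class TermCoverageSketch {

    /** Fraction of the source's documents that contain the term, between 0 and 1. */
    static double coverage(int docFrequency, int numberOfDocuments) {
        if (numberOfDocuments <= 0) {
            return 0.0; // empty or unknown source: avoid division by zero
        }
        return (double) docFrequency / numberOfDocuments;
    }

    public static void main(String[] args) {
        // made-up values standing in for term.getDocFrequency()
        // and descriptor.getNumberOfDocuments(...)
        int docFrequency = 3;
        int numberOfDocuments = 12;
        System.out.println("coverage=" + coverage(docFrequency, numberOfDocuments));
    }
}
```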


Accessing Resource Properties