GCube Document Library (2.0)

From Gcube Wiki

Revision as of 15:26, 9 February 2011

The gCube Document Library (gDL) is a client library for storing, updating, deleting and retrieving document descriptions in a gCube infrastructure.

The gDL is a high-level component of the subsystem of gCube Information Services and it interacts with lower-level components of the subsystem to support document management processes within the infrastructure:

  • the gCube Document Model (gDM) defines the basic notion of document and the gCube Model Library (gML) implements that notion into objects;
  • the objects of the gML can be exchanged in the infrastructure as edge-labelled trees, and the Content Manager Library (CML) can model such trees as objects and dispatch them to the read and write operations of the Content Manager (CM) service;
  • the CM implements these operations by translating trees to and from the content models of diverse repository back-ends.

The gDL builds on the gML and the CML to implement a local interface of CRUD operations that lift those of the CM to the domain of documents, efficiently and effectively.

Preliminaries

The core functionality of the gDL lies in its operations to read and write documents. The operations trigger interactions with remote services and the movement of potentially large volumes of data across the infrastructure. This may have a non-trivial and combined impact on the responsiveness of clients and the overall load of the infrastructure. The operations have been designed to minimise this impact. In particular:

  • when reading, clients can qualify the documents that are relevant to their queries, and indeed which properties of relevant documents should actually be retrieved. These retrieval directives are captured in the gDL by the notion of document projections.
  • when reading and writing, clients can move large numbers of documents across the infrastructure. The gDL streams these I/O movements so as to make efficient use of local and remote resources. It then defines facilities with which clients can conveniently consume input streams, produce output streams, and more generally filter one stream into another regardless of their origin. The facilities are collected into the stream DSL, an embedded domain-specific language for stream processing.
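The idea of filtering one stream into another, regardless of origin, can be illustrated with a minimal sketch in plain Java. The class `PipeSketch` and the method `pipe` are hypothetical names chosen for this illustration and are not part of the stream DSL; the sketch only assumes that a stream can be consumed through a standard `java.util.Iterator`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical illustration only: these names do not belong to the gDL stream DSL.
public class PipeSketch {

    // Wraps an input stream so that each element is transformed only when it is
    // consumed. Nothing is buffered, so the input may be arbitrarily large and may
    // come from a local collection or from a remote result set alike.
    static <A, B> Iterator<B> pipe(Iterator<A> input, Function<A, B> filter) {
        return new Iterator<B>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public B next() {
                return filter.apply(input.next());
            }
        };
    }

    public static void main(String[] args) {
        // A local stand-in for a stream of document identifiers.
        Iterator<String> ids = Arrays.asList("doc-1", "doc-2").iterator();

        // Derive a second stream from the first, one element at a time.
        Iterator<String> upper = pipe(ids, String::toUpperCase);
        while (upper.hasNext()) {
            System.out.println(upper.next());
        }
    }
}
```

Because the wrapper pulls from its input on demand, the client never holds more than one element in memory at a time, which is the property that makes streaming attractive for large document sets.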

Understanding document projections and the stream DSL is key to reading and writing documents effectively. We discuss these preliminary concepts first, and then consider their use as inputs and outputs of the operations of the gDL.

Projections

A projection is a set of constraints over the properties of documents in the gDM. It can be used to match documents, i.e. identify documents whose properties satisfy the constraints of the projection.

Projections and matching are used in the read operations of the gDL:

  • as a means to characterise relevant documents (projections as types);
  • as a means to specify what parts of relevant documents should be retrieved (projections as retrieval directives).

The constraints of a projection accordingly take two forms:

  • include constraints apply to properties that must be matched and retrieved;
  • filter constraints apply to properties that must be matched but not retrieved.

As a simple example of the implications, a projection may define an include constraint over the name of metadata elements and a filter constraint over the time of their last update. It may then be used to:

  • characterise documents with metadata elements that match both constraints;
  • retrieve, from those documents, only the names of matching metadata elements, excluding any other document property, including inner elements and their properties.
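The difference between include and filter constraints can be made concrete with a small sketch in plain Java. It is not the gDL projection API: the class `ProjectionSketch`, the `Constraint` type, the `project` method, and the property names `name`, `lastUpdate` and `bytestream` are all hypothetical, and a metadata element is modelled simply as a map of property names to values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: none of these names belong to the gDL API.
public class ProjectionSketch {

    enum Mode { INCLUDE, FILTER }

    // A constraint over one named property of a metadata element.
    static final class Constraint {
        final String property;
        final Mode mode;
        final Predicate<Object> condition;

        Constraint(String property, Mode mode, Predicate<Object> condition) {
            this.property = property;
            this.mode = mode;
            this.condition = condition;
        }
    }

    // Returns the retrieved part of the element, or null if the element
    // fails to satisfy any constraint of the projection.
    static Map<String, Object> project(Map<String, Object> element, List<Constraint> projection) {
        Map<String, Object> retrieved = new LinkedHashMap<>();
        for (Constraint c : projection) {
            Object value = element.get(c.property);
            if (value == null || !c.condition.test(value)) {
                return null; // no match: the element is not relevant
            }
            if (c.mode == Mode.INCLUDE) {
                retrieved.put(c.property, value); // matched and retrieved
            }
            // FILTER constraints are matched but contribute nothing to the result.
        }
        return retrieved;
    }

    public static void main(String[] args) {
        List<Constraint> projection = new ArrayList<>();
        projection.add(new Constraint("name", Mode.INCLUDE, v -> true));
        projection.add(new Constraint("lastUpdate", Mode.FILTER, v -> (Long) v > 1000L));

        Map<String, Object> element = new LinkedHashMap<>();
        element.put("name", "dublin-core");
        element.put("lastUpdate", 2000L);
        element.put("bytestream", "...");

        System.out.println(project(element, projection)); // prints {name=dublin-core}
    }
}
```

In this sketch the element matches both constraints, but only `name` is returned: `lastUpdate` is checked and then dropped, and `bytestream` is never considered because no constraint mentions it.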

Simple Projections

Advanced Projections

Streams

Local and Remote Iterators

Stream Language

Pipes and Filters

Grouping and Unfolding

Operations

Reading Documents

Adding Documents

Updating Documents

Deleting Documents

Views

Transient Views

Persistent Views

Creating Views

Discovering Views

Using Views

Advanced Topics

Caches

Buffers