Difference between revisions of "GCube Document Library (2.0)"


Revision as of 11:13, 14 February 2011

The gCube Document Library (gDL) is a client library for storing, updating, deleting and retrieving document descriptions in a gCube infrastructure.

The gDL is a high-level component of the subsystem of gCube Information Services and it interacts with lower-level components of the subsystem to support document management processes within the infrastructure:

  • the gCube Document Model (gDM) defines the basic notion of document and the gCube Model Library (gML) implements that notion into objects;
  • the objects of the gML can be exchanged in the infrastructure as edge-labelled trees, and the Content Manager Library (CML) can model such trees as objects and dispatch them to the read and write operations of the Content Manager (CM) service;
  • the CM implements these operations by translating trees to and from the content models of diverse repository back-ends.

The gDL builds on the gML and the CML to implement a local interface of CRUD operations that lift those of the CM to the domain of documents, efficiently and effectively.


The core functionality of the gDL lies in its operations to read and write documents. The operations trigger interactions with remote services and the movement of potentially large volumes of data across the infrastructure. This may have a non-trivial and combined impact on the responsiveness of clients and the overall load of the infrastructure. The operations have been designed to minimise this impact. In particular:

  • when reading, clients can qualify the documents that are relevant to their queries, and indeed what properties of relevant documents should be actually retrieved. These retrieval directives are captured in the gDL by the notion of document projections.
  • when reading and writing, clients can move large numbers of documents across the infrastructure. The gDL streams these I/O movements so as to make efficient use of local and remote resources. It then defines facilities with which clients can conveniently consume input streams, produce output streams, and more generally filter one stream into another regardless of their origin. The facilities are collected into the stream DSL, an embedded domain-specific language for stream processing.

Understanding document projections and the stream DSL is key to reading and writing documents effectively. We discuss these preliminary concepts first, and then consider their use as input and outputs of the operations of the gDL.


A projection is a set of constraints over the properties of documents in the gDM. It can be used to match documents, i.e. identify documents whose properties satisfy the constraints of the projection.
Projections and matching are used in the read operations of the gDL:

  • as a means to characterise relevant documents (projections as types);
  • as a means to specify what parts of relevant documents should be retrieved (projections as retrieval directives).

The constraints of a projection take accordingly two forms:

  • include constraints apply to properties that must be matched and retrieved;
  • filter constraints apply to properties that must be matched but not retrieved.

note: in both cases, the constraints take the form of 'predicates' of the Content Manager Library (CML). The projection itself converts into a complex predicate which is amenable to processing by the Content Manager service in the execution of retrieval operations. In this sense, projections are a key part of the document-oriented layer that the gDL defines over lower-level components of the service subsystem for content management.

As a simple example, a projection may define an include constraint over the name of metadata elements and a filter constraint over the time of their last update.
It may then be used to:

  • characterise documents with metadata elements that match both constraints;
  • retrieve, of those documents, only the names of matching metadata elements, excluding any other document property, including other inner elements and their properties.

All projections in the gDL have the Projection interface, which can be used in element-generic computations to access their constraints. To build projections, however, clients deal with one of the following implementations of the interface:

  • DocumentProjection
  • MetadataProjection
  • AnnotationProjection
  • PartProjection
  • AlternativeProjection

A further implementation of the interface:

  • PropertyProjection

allows clients to express constraints on the generic properties of any of the elements of the gDM.

Simple Projections

Clients create projections with the factory methods of the Projections companion class (a static import improves legibility and is recommended):

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document();
MetadataProjection mp = metadata();
AnnotationProjection annp = annotation();
PartProjection pp = part();
AlternativeProjection altp = alternative();

The projections above do not specify any include or filter constraints on the elements of the corresponding type. For example, dp matches all documents, regardless of their properties, inner elements, and properties of their inner elements. Similarly, mp matches all metadata elements of any document, regardless of their properties, and pp matches all the parts of any document, regardless of their properties. Thus the factory methods of the Projections class return empty projections.

Clients may add include constraints to a projection with the method with() declared by all projection classes. For document projections, for example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().with(NAME);

With the above, the client adds the simplest form of constraint, an existence constraint that requires the target elements to have given properties, here the document to have a name. Since this is an include constraint, the client is expressing an interest only in this property, regardless of the existence and values of other properties. Used as a parameter in the read operations of the gDL, this projection is translated into a directive to retrieve only the names of documents that have one.

note: properties are conveniently represented by constants in the Projections class. The constants are not strings, however, but dedicated Property objects that are specific to the type of projection. Trying to use properties that are undefined for the type of elements targeted by the projection is illegal and the error is detected statically.
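
The static detection described in the note can be obtained with generics. The following is a minimal sketch of one possible technique and makes no claim about how the gDL implements it: the types TypedProperty, DocTag, MetaTag and SketchProjection, as well as the constant SCHEMA_NAME, are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Marker types for the kinds of element a property can belong to (hypothetical).
interface DocTag {}
interface MetaTag {}

// A property constant tagged with the element type it is defined for.
final class TypedProperty<E> {
    final String name;
    TypedProperty(String name) { this.name = name; }
}

// A projection accepts only properties tagged with its own element type.
class SketchProjection<E> {
    private final List<String> included = new ArrayList<>();

    @SafeVarargs
    final SketchProjection<E> with(TypedProperty<E>... properties) {
        for (TypedProperty<E> p : properties) included.add(p.name);
        return this; // builder style: return the projection itself
    }

    List<String> included() { return included; }
}

class TypedPropertyDemo {
    static final TypedProperty<DocTag> NAME = new TypedProperty<>("name");
    static final TypedProperty<DocTag> LANGUAGE = new TypedProperty<>("language");
    static final TypedProperty<MetaTag> SCHEMA_NAME = new TypedProperty<>("schemaName");

    static SketchProjection<DocTag> document() { return new SketchProjection<>(); }

    public static void main(String[] args) {
        SketchProjection<DocTag> dp = document().with(NAME, LANGUAGE);
        System.out.println(dp.included());
        // document().with(SCHEMA_NAME); // rejected by the compiler: wrong element type
    }
}
```

Because document() fixes the element type of the projection, passing a property defined for another element type is a type error rather than a runtime failure.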

Existence constraints may be expressed at once on multiple properties, e.g.:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().with(NAME,LANGUAGE,BYTESTREAM);

Besides include constraints, clients may specify filter constraints with the method where() on projections, e.g.:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().where(NAME,LANGUAGE);

Now, the client still requires documents to have a name and a language but he retains an interest in the other properties of matching documents. Used as a parameter in the read operations of the gDL, this projection is translated into a directive to retrieve all the properties of documents with a name and a language.

Include and filter constraints can be combined, and the projection classes follow a builder pattern to add readability to the combinations. In particular, with() and where() return the very projection on which they are invoked. They may then be used as follows:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().with(NAME,SCHEMA_URI)
                                  .where(BYTESTREAM);

Here, the client requires documents to have a name and embed a bytestream that conforms to a schema, but he has an interest in processing only document names and schema URIs (e.g. for display purposes). Used as a parameter in the read operations of the gDL, this projection retrieves the requested information but avoids the transmission of bytestreams.
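
The interplay of include and filter constraints can be illustrated with a toy model. This is only a sketch under stated assumptions, not the gDL API: documents are reduced to maps from property names to values, ToyProjection is a hypothetical class, and the rule that a projection without include constraints retrieves the whole document is inferred from the examples above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model: a document is a map from property names to values.
class ToyProjection {
    private final Set<String> include = new LinkedHashSet<>();
    private final Set<String> filter = new LinkedHashSet<>();

    ToyProjection with(String... properties) {   // include constraints
        for (String p : properties) include.add(p);
        return this;
    }

    ToyProjection where(String... properties) {  // filter constraints
        for (String p : properties) filter.add(p);
        return this;
    }

    // A document matches if it has every included and every filtered property.
    boolean matches(Map<String, String> doc) {
        return doc.keySet().containsAll(include) && doc.keySet().containsAll(filter);
    }

    // Only included properties are retrieved; with no include constraints,
    // the whole document is retrieved (assumption inferred from the text).
    Map<String, String> retrieve(Map<String, String> doc) {
        if (include.isEmpty()) return new LinkedHashMap<>(doc);
        Map<String, String> result = new LinkedHashMap<>();
        for (String p : include) result.put(p, doc.get(p));
        return result;
    }
}

class ToyProjectionDemo {
    static Map<String, String> sampleDocument() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "report");
        doc.put("schemaUri", "http://example.org/schema");
        doc.put("bytestream", "...");
        return doc;
    }

    public static void main(String[] args) {
        ToyProjection p = new ToyProjection().with("name", "schemaUri").where("bytestream");
        Map<String, String> doc = sampleDocument();
        if (p.matches(doc)) System.out.println(p.retrieve(doc).keySet());
    }
}
```

In this model the bytestream takes part in matching but is never copied into the result, which mirrors the saving in transmission described above.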

Optional Modifiers

Moving now beyond the simple existence of properties, another common requirement is to indicate the optionality of properties. Clients may wish to include certain properties, or equivalently filter by certain properties, if and only if these actually exist. In this case, clients can use the method opt() of the Projections class as a constraint modifier, as this example illustrates:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().with(NAME,opt(SCHEMA_URI))
                                  .where(BYTESTREAM);

This projection differs from the previous one only because of the optionality constraint on the existence of a schema for the document's bytestream. Used as a parameter in the read operations of the gDL, this projection retrieves the names of all documents that include a bytestream, and also their schema URI if they happen to have one.

A common use of optional modifiers is with bytestreams, which clients may wish either to find included in the document or else referred to with a URL:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().with(opt(BYTESTREAM),opt(URL));

Used as a parameter in the read operations of the gDL, this projection retrieves at most the bytestream and its URL for those documents that have both, only one of the two if the other is missing, and nothing at all if they are both missing.
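
The effect of optional include constraints on retrieval can be sketched in the same toy setting, with documents reduced to maps. The class OptionalIncludeSketch and its method are hypothetical and only illustrate the behaviour described above: required properties decide the match, optional ones are retrieved when present and silently skipped otherwise.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class OptionalIncludeSketch {
    // Returns the retrieved properties, or empty if a required property is missing.
    static Optional<Map<String, String>> retrieve(Map<String, String> doc,
                                                  List<String> required,
                                                  List<String> optional) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String p : required) {
            if (!doc.containsKey(p)) return Optional.empty(); // no match
            result.put(p, doc.get(p));
        }
        for (String p : optional) {
            if (doc.containsKey(p)) result.put(p, doc.get(p)); // kept only if present
        }
        return Optional.of(result);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "report");
        doc.put("url", "ftp://example.org/report");
        // Mirrors with(opt(BYTESTREAM), opt(URL)): nothing is required.
        System.out.println(retrieve(doc, List.of(), List.of("bytestream", "url")));
    }
}
```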

note: The API allows optional modifiers in filter constraints too, but their application is rather pointless in this context (they will never exclude elements from retrieval).

Deep Projections

In the examples above, we have considered existence constraints on simple element properties. The examples generalise easily to repeated structured properties, such as generic properties for all elements and inner element properties for documents.

Consider the following example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
DocumentProjection dp = document().with(PART, opt(METADATA), PROPERTY);

Here the client adds three include constraints to the projection, all three for the existence of repeated properties. Documents that match this projection have at least one part, at least one generic property, and zero or more metadata elements. Used as a parameter in the read operations of the gDL, this projection retrieves all the parts and all the generic properties of documents that have at least one of each, as well as all of their metadata elements if they happen to have some.

Repeated properties such as generic properties and inner elements are also structured, i.e. have properties of their own. Clients that wish to constrain those properties too can use deep projections, i.e. embed within the projection of a given type one or more projections built for the structured properties of elements of that type. The following example illustrates the concept for metadata elements:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
MetadataProjection mp = metadata().with(LANGUAGE).where(BYTESTREAM);
DocumentProjection dp = document().with(NAME, PART)
                                  .with(METADATA, mp);

The first projection constrains the existence of language and bytestream for metadata elements. The second projection constrains the existence of name and parts for documents, as well as the existence of metadata elements that match the constraints of the first projection. The usual implications of include constraints and filter constraints apply. Used as a parameter in the read operations of the gDL, this projection retrieves the name, parts, and metadata elements of documents that have a name, at least one part, and at least one metadata element that includes a bytestream. For the metadata elements, in particular, it retrieves only the language property.

Note that optionality constraints apply to deep projections as well as they apply to flat projections, as the following example shows:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
MetadataProjection mp = metadata().with(LANGUAGE).where(BYTESTREAM);
DocumentProjection dp = document().with(NAME, PART)

This projection differs from the previous one only because the existence of metadata elements that match the inner projection is optional. Documents that have a name and at least one part match the outer projection even if they have no metadata elements that match the inner projection (or no metadata elements at all).

Projections over Generic Properties

Generic properties are repeated and structured properties common to all elements. As for other properties with these characteristics, clients may wish to build deep projections that constrain their inner properties. For this purpose, the class Projections includes a dedicated factory method property(), as well as specialised methods to express constraints. The following example illustrates the approach:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
PropertyProjection pp = property().withKey("somekey").with(PROPERTY_TYPE);
DocumentProjection dp = document().with(NAME, PART)
                                  .with(PROPERTY, pp);

Here, the client creates a document projection and embeds in it an inner projection that constrains its generic properties. The inner projection uses the method with() to add an include constraint for the existence of a type for the generic property, as usual. It also adds an include constraint to specify an exact value for the key of a generic property of interest. This relies on a method withKey() which is specific to projections over generic properties of elements. The reason for this specific construct is that, differently from other constrainable properties of elements, the key of a generic property serves as its identifier.

For the rest, property projections behave like other projections (e.g. can be used with optional modifiers). Used as a parameter in the read operations of the gDL, the projection above matches documents with a name, at least one part, and a property with key somekey and some type.

Advanced Projections

In more advanced forms of projections, clients may wish to specify constraints on properties other than mere existence. In these cases, they can use overloads of with() and where() that take as parameters Predicates that capture the desired constraints. As mentioned above, predicates are defined in the CML and gDL clients need to become acquainted with the range of available predicates and how to build them.

note: Deep projections already make use of this customisability. When clients embed a projection into another, they constrain the corresponding structured property with the predicate into which the inner projection translates.

Commonly, clients may wish to constrain the value of a property, as in the following example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
...
DocumentProjection p = document().with(LANGUAGE,text(is("it")));

Here the client uses the predicate text(is("it")) to constrain the language of documents to match the ISO 639 code for the Italian language. As documented in the CML, the client builds the predicate with the static methods of the Predicates and Constraints classes, which he previously imports.

note: in building predicate expressions with the API of the CML, clients take responsibility for associating properties with predicates that are compatible with their type. In the example above, the language of an element is a textual property and thus only text()-based predicates can successfully match it. The gDL relinquishes the ability to ensure the correct construction of projections so as to allow clients to use the full expressiveness of the predicate language of the CML.

The type of constraints that can be expressed on properties is thus bound by the expressiveness of the predicate language of the CML. We include here another example to illustrate some of the possibilities:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
Calendar from = ...
Calendar to = ....
DocumentProjection p = document().with(URL,uri(matches("^ftp.*")));

This projection is matched by documents that have been created at some point in between two dates, and with a bytestream available at some ftp server. Used as a parameter in the read operations of the gDL, the projection would retrieve only the URL of (the bytestream of) matching documents.


In some of its operations, the gDL relies on streams to model, process, and transfer inputs and outputs of potentially large size. Streams may consist of document descriptions, document identifiers, document updates, and more generally the outcomes of operations that take in turn large-scale inputs. Streamed processing makes efficient use of both local and remote resources, from local memory to network bandwidth, promoting the overall responsiveness of clients and services through reduced latencies.

Clients that make use of these operations will need to route streams towards and across the operations of the gDL, converting across stream interfaces and application logic in the process. As a common example, a client may need to route a remote result set of document identifiers to the read operations of the gDL, process the descriptions of the returned documents so as to update some of their properties, then feed the modified document descriptions to the write operations of the gDL so as to update them within the system, and finally inspect the outcomes of the updates so as to report or otherwise handle the failures that may have occurred in the process.

Throughout the workflow, it is important that the client remains within the paradigm of streamed processing, avoiding the accumulation of data in memory in all cases but where strictly induced by processing requirements. Document identifiers will be streaming from the remote location of the original result set as document descriptions will be flowing back from yet another remote location; updated document descriptions will be leaving towards the same remote location as failures will be steadily coming back for handling.
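
The idea of processing elements as they flow, without accumulating them, can be sketched with plain java.util.Iterator. The class LazyMapIterator below is a hypothetical illustration and not part of the gDL: it transforms each element only when the consumer asks for it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Wraps an iterator and transforms each element only when it is requested,
// so that no intermediate collection is ever built.
class LazyMapIterator<A, B> implements Iterator<B> {
    private final Iterator<A> source;
    private final Function<A, B> step;

    LazyMapIterator(Iterator<A> source, Function<A, B> step) {
        this.source = source;
        this.step = step;
    }

    @Override public boolean hasNext() { return source.hasNext(); }
    @Override public B next() { return step.apply(source.next()); }
}

class LazyMapDemo {
    public static void main(String[] args) {
        Iterator<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3").iterator();
        // Hypothetical processing step: turn an identifier into a description.
        Iterator<String> descriptions =
            new LazyMapIterator<String, String>(ids, id -> "description of " + id);
        while (descriptions.hasNext()) System.out.println(descriptions.next());
    }
}
```

At any time only the element being processed is held in memory, regardless of the length of the underlying stream.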

Stream processing raises significant opportunities for clients, as well as non-trivial challenges. In recognition of the difficulties, the gDL includes a set of general-purpose facilities for stream processing that simplify the tasks of converting, filtering, transforming, or otherwise processing streams. These facilities are available as an embedded, domain-specific language, the Stream DSL.

Standard and Remote Iterators

As all the sentences of the Stream DSL take and return streams, we begin by looking at how streams are represented in the language.

Streams have the interface of iterators, i.e. they yield elements on demand and are typically consumed within loops. There are two such interfaces:

  • Iterator<T>, the standard Java interface for iterations.
  • RemoteIterator<T>, a variation over Iterator<T> which makes explicit the remote origin of the stream.

In particular, a RemoteIterator differs from a standard Iterator in two respects:

    • the method next() may throw a checked Exception. This reflects the fact that iterating over the stream involves fallible I/O operations;
    • there is a method locator() that returns a reference to the remote stream as a plain String with implementation-specific syntax.
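
The two differences can be made concrete with a sketch. The interface below is a hypothetical stand-in inferred from the description above, not the gDL's actual declaration (for instance, whether hasNext() may also fail is left open here), and FlakyRemoteIterator is an in-memory substitute for a remote stream.

```java
import java.io.IOException;

// Hypothetical shape of the interface, inferred from the description above.
interface RemoteIterator<T> {
    boolean hasNext();
    T next() throws Exception;   // iteration may fail on I/O
    String locator();            // implementation-specific reference to the stream
}

// In-memory stand-in for a remote stream that fails on its second element.
class FlakyRemoteIterator implements RemoteIterator<String> {
    private int position = 0;

    public boolean hasNext() { return position < 3; }

    public String next() throws Exception {
        position++;
        if (position == 2) throw new IOException("element " + position + " unavailable");
        return "element " + position;
    }

    public String locator() { return "sketch://flaky"; }
}

class RemoteConsumptionDemo {
    // The client decides at the point of consumption what to do with failures;
    // here it skips the failing element and counts the failure.
    static int consume(RemoteIterator<String> stream) {
        int failures = 0;
        while (stream.hasNext()) {
            try {
                System.out.println(stream.next());
            } catch (Exception e) {
                failures++;
                System.err.println("skipping: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        consume(new FlakyRemoteIterator());
    }
}
```

Since next() declares a checked exception, the compiler forces the consuming loop to state a failure handling policy explicitly.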

Locators aside, the key difference between the two interfaces is in their assumptions about the possibility of failures during iteration. A standard Iterator does not present failures to its clients other than for requests made past the end of the stream (an unchecked NoSuchElementException). This may be because failures do not occur at all, e.g. the iteration is over an in-memory collection; it may also be because failures can occur but the iterator knows how to handle them. In this sense, Iterator<T> may well be defined over external, even remote collections, but it assumes that all failure handling policies are responsibilities of its implementations. In contrast, RemoteIterator<T> makes it clear that failures are likely to occur and that clients are expected to deal with them.

The operations of the gDL make use of both interfaces:

  • when they take streams in input they expect them as standard Iterators;
  • when they return streams in output they provide them as RemoteIterators.

This choice emphasises two points:

  • streams that are provided by clients are of unknown origin, whereas those provided by the library originate in remote services of the gCube Content Management infrastructure;
  • all fault handling policies are in the hands of clients, where they should be. When they provide an Iterator to the library, they will have embedded a fault handling policy in its implementation. When they receive a RemoteIterator from the library, they will apply a fault handling policy at the point of stream consumption.

Stream Conversions and Fault Handlers

The sentences of the stream DSL begin with 'verbs', which can be statically imported from the Streams class:

import static org.gcube.contentmanagement.gcubedocumentlibrary.streams.dsl.Streams.*;

The verb convert introduces the simplest sentences, those that convert between Iterators and RemoteIterators. The following example shows the conversion of an Iterator into a RemoteIterator:

import static org.gcube.contentmanagement.gcubedocumentlibrary.streams.dsl.Streams.*;
Iterator<SomeType> it = ...
RemoteIterator<SomeType> rit = convert(it);

The result is a RemoteIterator that promises to return failures but never does. The implementation is just a wrapper around the standard Iterator which returns it.toString() as the locator of the underlying collection.
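
A wrapper of this kind can be sketched in a few lines. The code below illustrates the idea and is not the gDL's implementation: RemoteIterator is a hypothetical stand-in for the library interface and IteratorWrapper is an invented name.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the gDL interface described in the text.
interface RemoteIterator<T> {
    boolean hasNext();
    T next() throws Exception;
    String locator();
}

// Adapter that presents a standard Iterator as a RemoteIterator.
class IteratorWrapper<T> implements RemoteIterator<T> {
    private final Iterator<T> inner;

    IteratorWrapper(Iterator<T> inner) { this.inner = inner; }

    public boolean hasNext() { return inner.hasNext(); }
    public T next() { return inner.next(); }             // declares no checked exception
    public String locator() { return inner.toString(); } // locator of the underlying collection
}

class IteratorWrapperDemo {
    public static void main(String[] args) throws Exception {
        Iterator<String> it = Arrays.asList("a", "b").iterator();
        RemoteIterator<String> rit = new IteratorWrapper<>(it);
        while (rit.hasNext()) System.out.println(rit.next());
        System.out.println(rit.locator());
    }
}
```

Clients that hold the result as a RemoteIterator must still handle the checked exception declared by the interface, even though this implementation never raises it.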

Converting a RemoteIterator to an Iterator is more interesting because it requires the encapsulation of a fault handling policy. The following example shows the possibilities:

import static org.gcube.contentmanagement.gcubedocumentlibrary.streams.dsl.Streams.*;
RemoteIterator<SomeType> rit = ...
//iterator will ignore any fault raised by the remote iterator
Iterator<SomeType> it1 = convert(rit).with(IGNORE_POLICY); 
//iterator will stop at the first fault raised by the remote iterator
Iterator<SomeType> it2 = convert(rit).with(FAILFAST_POLICY); 
//iterator will handle fault as specified by given policy
FaultPolicy policy = new FaultPolicy() {...}; 
Iterator<SomeType> it3 = convert(rit).with(policy);

In this example, the clause with introduces the fault handling policy to encapsulate in the resulting Iterator. Two common policies are predefined and can be named directly, as shown for it1 and it2 above:

  • IGNORE_POLICY: any faults raised by the RemoteIterator are discarded by the resulting Iterator, which will ensure that hasNext() and next() behave as if they had not occurred;
  • FAILFAST_POLICY: the first fault raised by the RemoteIterator halts the resulting Iterator, which will ensure that hasNext() and next() behave as if the stream had reached its natural end.

Custom policies can be defined by implementing the interface FaultPolicy:

public interface FaultPolicy ... {
	boolean onFault(Exception e, int count);
}

In onFault(), clients are passed the fault raised by the RemoteIterator, as well as the count of faults raised so far during the iteration (this will be greater than 1 only if the policy has tolerated some previous faults during the iteration). Clients apply the policy and return true if the fault should be tolerated and the iteration should continue, false if they instead wish the iteration to stop. Here's an example of a fault handling policy that tolerates only the first error and uses two aliases for the boolean values to improve the legibility of the policy (CONTINUE and STOP, also defined in the Streams class and statically imported):

import static org.gcube.contentmanagement.gcubedocumentlibrary.streams.dsl.Streams.*;
FaultPolicy policy = new FaultPolicy() {
       public boolean onFault(Exception e, int count) {
             if (count == 1) {
                   // ... dealing with fault ...
                   return CONTINUE;
             }
             return STOP;
       }
};
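
How a policy may be encapsulated in a standard Iterator can be sketched as follows. This is one possible approach under stated assumptions, not the gDL's implementation: RemoteIterator and FaultPolicy are hypothetical stand-ins for the library interfaces, PolicyIterator is an invented name, and the remote stream is assumed to move past a failing element and to remain usable after a fault.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-ins for the gDL types described in the text.
interface RemoteIterator<T> {
    boolean hasNext();
    T next() throws Exception;
    String locator();
}

interface FaultPolicy {
    boolean onFault(Exception e, int count); // true: continue, false: stop
}

// Presents a RemoteIterator as a standard Iterator, consulting the policy on each fault.
class PolicyIterator<T> implements Iterator<T> {
    private final RemoteIterator<T> remote;
    private final FaultPolicy policy;
    private T lookahead;
    private boolean ready = false;   // true when lookahead holds an unconsumed element
    private boolean halted = false;  // true once the policy has stopped the iteration
    private int faults = 0;

    PolicyIterator(RemoteIterator<T> remote, FaultPolicy policy) {
        this.remote = remote;
        this.policy = policy;
    }

    @Override public boolean hasNext() {
        // Fetch ahead so that faults are resolved before next() is called.
        while (!ready && !halted && remote.hasNext()) {
            try {
                lookahead = remote.next();
                ready = true;
            } catch (Exception e) {
                faults++;
                if (!policy.onFault(e, faults)) halted = true;
            }
        }
        return ready;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        T result = lookahead;
        lookahead = null;
        return result;
    }
}

class PolicyIteratorDemo {
    // In-memory stand-in for a remote stream of 1..4 that fails on even positions.
    static RemoteIterator<Integer> flaky() {
        return new RemoteIterator<Integer>() {
            private int position = 0;
            public boolean hasNext() { return position < 4; }
            public Integer next() throws Exception {
                position++;
                if (position % 2 == 0) throw new Exception("fault at " + position);
                return position;
            }
            public String locator() { return "sketch://flaky"; }
        };
    }

    static int count(Iterator<Integer> it) {
        int n = 0;
        while (it.hasNext()) { it.next(); n++; }
        return n;
    }

    public static void main(String[] args) {
        FaultPolicy ignore = (e, n) -> true;     // plays the role of IGNORE_POLICY
        FaultPolicy failFast = (e, n) -> false;  // plays the role of FAILFAST_POLICY
        System.out.println(count(new PolicyIterator<>(flaky(), ignore)));
        System.out.println(count(new PolicyIterator<>(flaky(), failFast)));
    }
}
```

The look-ahead in hasNext() is what lets the resulting Iterator behave as if a discarded fault had never occurred, or as if the stream had ended when the policy stops the iteration.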

Finally, we note that the IGNORE_POLICY is considered the default policy and that clients can avoid naming it by using the clause withDefaults().

import static org.gcube.contentmanagement.gcubedocumentlibrary.streams.dsl.Streams.*;
RemoteIterator<SomeType> rit = ...
//iterator will handle faults with the default policy: IGNORE_POLICY
Iterator<SomeType> it = convert(rit).withDefaults();

Pipes and Filters

Grouping and Unfolding


Reading Documents

Adding Documents

Updating Documents

Deleting Documents


Transient Views

Persistent Views

Creating Views

Discovering Views

Using Views

Advanced Topics

