Content Manager: Content Model

From Gcube Wiki

Architectural considerations aside, the most distinctive element in the design of the Content Manager is its content model. Rather than settle for a fixed set of document structures, the service adopts a generic structure that can act as a 'carrier' for an arbitrary number of concrete document models. In particular, the service deals with edge-labelled and node-attributed trees, the gDoc trees.

The expectation here is that producers (service plugins) and consumers (service clients) will settle on concrete document models and exchange gDoc trees of the agreed shape. The agreement may be bilateral or involve any number of parties, and it may apply to the entire document or to distinguished parts of it (e.g. document metadata, annotations, raw content packaging, etc.). For maximum decoupling between consumers and producers, the agreement may reflect system-wide conventions and result in canonical tree forms.

gDoc Trees

A gDoc tree has the following properties:

  • its nodes may have an identifier and a number of uniquely named attributes;
  • its edges have a label;
  • its leaf nodes may have a value;
  • its root may identify the collection of the corresponding document.

In particular:

  • identifiers, attributes, and leaves have text values;
  • attribute names and labels may be qualified with a namespace.
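As a rough illustration of the properties above, the following Java sketch models them with plain classes. All class names (SketchNode, SketchLeaf, SketchEdge, SketchInnerNode, SketchDoc) are hypothetical and chosen only for this example; they are not the classes of the service, whose own model is described in the gDoc API section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical sketch of the gDoc tree properties; not the Content Manager's own classes.
abstract class SketchNode {
    final String id;                                              // optional identifier (may be null)
    final Map<QName, String> attributes = new LinkedHashMap<>();  // uniquely named, text-valued

    SketchNode(String id) { this.id = id; }
}

final class SketchLeaf extends SketchNode {
    final String value;                                           // only leaves carry a value

    SketchLeaf(String id, String value) { super(id); this.value = value; }
}

final class SketchEdge {
    final QName label;                                            // may be namespace-qualified
    final SketchNode target;

    SketchEdge(QName label, SketchNode target) { this.label = label; this.target = target; }
}

class SketchInnerNode extends SketchNode {
    final List<SketchEdge> edges = new ArrayList<>();

    SketchInnerNode(String id) { super(id); }

    /** Adds a labelled edge and returns this node, so that calls can be chained. */
    SketchInnerNode add(QName label, SketchNode target) {
        edges.add(new SketchEdge(label, target));
        return this;
    }
}

final class SketchDoc extends SketchInnerNode {
    final String collectionId;                                    // the root may name its collection

    SketchDoc(String collectionId, String id) { super(id); this.collectionId = collectionId; }
}
```

Using a map keyed by QName is one simple way to keep attribute names unique: setting an attribute twice overwrites the earlier value.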

The figure below uses a graphical representation to show an example of a gDoc tree.


A sample gDoc tree


gDoc trees serialise to XML documents for exchange over the network. In particular:


For example, the gDoc tree above serialises as:

A sample gDoc tree XML serialisation

note: gDoc trees inherit constraints from their XML serialisation. In particular, edge labels, attribute names, attribute values, and leaf values are regulated by the XML specification.
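For example, the XML 1.0 specification restricts which characters may appear in character data, so a leaf or attribute value that contains, say, a NUL character cannot be serialised as-is. A hypothetical helper (XmlChars, not part of the gDoc API) that checks a value against the XML 1.0 Char production might look as follows:

```java
// Hypothetical helper, not part of the gDoc API. It checks text against the
// XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
final class XmlChars {
    private XmlChars() {}

    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point of the text is a legal XML 1.0 character. */
    static boolean isSerialisable(String text) {
        // codePoints() reports unpaired surrogates as values in [0xD800, 0xDFFF],
        // which fall outside the ranges above and are therefore rejected.
        return text.codePoints().allMatch(XmlChars::isXmlChar);
    }
}
```

Names (edge labels, attribute names) are subject to further rules, such as the XML Name production, which this sketch does not cover.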

gDoc API

The XML serialisation of gDoc trees is 'natural', in that it does not employ dedicated element structures to represent nodes, edges, attributes, etc. This streamlines its manipulation with standard XML technologies (e.g. XPath, XSLT, XQuery, DOM, SAX) and does not inhibit object binding technologies (e.g. JAXB, XStream).
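Because the serialisation is 'natural', generic XML tools apply directly. The sketch below runs XPath queries with the JDK's javax.xml.xpath API over a hypothetical serialisation in which edges become child elements named after their labels and leaf values become text content. This layout, and the omission of namespaces, are assumptions for illustration only; the actual gDoc serialisation may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NaturalXPathSketch {
    private NaturalXPathSketch() {}

    // Hypothetical, namespace-free serialisation of a small gDoc tree
    // (edges as elements named after their labels, leaf values as text).
    static final String SAMPLE = "<gdoc><a>2</a><a>7</a><b><c>true</c></b></gdoc>";

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Evaluates an XPath expression that yields a number. */
    static double number(Document doc, String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot evaluate " + expression, e);
        }
    }
}
```

For instance, count(/gdoc/a) counts the a-edges of the root and sum(/gdoc/a) adds up their leaf values, with no knowledge of a dedicated tree vocabulary.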

As a native option, however, the service defines a bespoke object model and API for gDoc trees which offer:

  • dedicated support for tree processing requirements associated with the use of the service;
  • transparencies and optimisations for tree storage, construction, deconstruction, and input/output.

While the model is available to service clients, it also forms the basis of the interface between the service and its plugins. For this reason, its main features are overviewed here while its client-oriented features are discussed later on.

As the figure below illustrates, the model is defined in org.gcube.contentmanagement.contentmanager.stubs.model.trees in terms of the following components:

  • Node: an abstract base for nodes with an identifier, a state, and a map of QName-ed attributes.
  • State: an inner enumeration of Node for node states.
  • Edge: a QName-ed edge to a target Node.
  • InnerNode: a Node with a list of outgoing Edges.
  • Leaf: a Node with a value.
  • GDoc: an InnerNode with a collection identifier.
  • Nodes: a collection of static utilities to generate Nodes and Edges.
  • Bindings: a collection of static utilities to serialise and deserialise Nodes to and from DOM trees and/or character streams.
  • NodeView: a base class for JAXB bindings to Nodes.
  • GDocView: a NodeView for JAXB bindings to GDoc nodes.


The ...model.trees package

The model API is illustrated by example in the rest of this Section. The full list of methods and their signatures can be found in the code documentation.

Building Trees

The first and obvious way to create gDoc trees is with the constructors of the concrete node classes (GDoc, InnerNode, Leaf). As a first example, the following code illustrates the creation of a tree with an attributed root and two leaf nodes:

import java.util.Date;
import javax.xml.namespace.QName;
import org.gcube.contentmanagement.contentmanager.stubs.model.trees.*;
...
 
GDoc doc = new GDoc("someid");
doc.setAttribute(new QName("x"), "1");
doc.setAttribute(new QName("someNS","y"), new Date().toString());
doc.collectionID("...");
 
Leaf leaf1 = new Leaf(null,"2"); //no identifier
Leaf leaf2 = new Leaf(null,"true");
 
Edge e1 = new Edge(new QName("a"),leaf1);
Edge e2 = new Edge(new QName("someNS","b"),leaf2);
 
doc.add(e1,e2);

While already more convenient than cross-language and format-oriented tree APIs (e.g. DOM), step-by-step construction is verbose, even in the case of small trees. For a first degree of improvement, the node classes offer rich suites of constructors and setter overloads that allow for more 'in-lined' tree constructions and absorb the creation of QNames:

GDoc doc2 = new GDoc("someid",
		new Edge("a", new Leaf(null,"2")),
		new Edge("someNS","b", new Leaf(null,"true")));
 
doc2.setAttribute("x", "1");
doc2.setAttribute("someNS","y", new Date().toString());
doc2.collectionID("somecollID");

For additional convenience, the Nodes class defines a large number of generators, i.e. factory methods that can be statically imported and then composed into an 'embedded expression language' for gDoc trees:

import static org.gcube.contentmanagement.contentmanager.stubs.model.trees.Nodes.*;
...
 
GDoc doc3 = attr(
		gdoc("somecollID","someid",
			e("a",2), 
			e("someNS","b",true)),
		a("x",1), a("someNS","y",new Date()));

Here, gdoc, attr, e, and a are examples of node, attribute, and edge generators. Besides allowing fully in-lined tree expressions without the use of the new operator, the generators offer transparent QName creation and transparent object-to-string conversion (cf. the int, boolean, and Date values in the example above). Transparent date conversion is particularly important here, as it ensures adherence to XML serialisation standards that Java does not adopt natively (e.g. Date.toString() does not produce an XML Schema dateTime literal). See the code documentation for the full list of available generators, as well as for the additional examples that follow:
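The date conversion matters because Date.toString() produces a fixed English format such as "Thu Jan 01 00:00:00 UTC 1970", rendered in the JVM's default time zone, rather than an XML Schema dateTime literal. The hypothetical helper below (LiteralSketch.toLiteral, not the Nodes implementation) shows one way to obtain such literals with java.time; the format actually used by the generators may differ, e.g. in time-zone handling.

```java
import java.util.Date;

// Hypothetical helper illustrating object-to-string conversion for leaf and
// attribute values; it is not the implementation used by the Nodes generators.
final class LiteralSketch {
    private LiteralSketch() {}

    static String toLiteral(Object value) {
        if (value instanceof Date) {
            // Instant.toString() yields ISO 8601 in UTC, e.g. 1970-01-01T00:00:00Z,
            // which is a valid xs:dateTime lexical form.
            return ((Date) value).toInstant().toString();
        }
        // ints, booleans, etc.: String.valueOf gives "42", "true", ...
        return String.valueOf(value);
    }
}
```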

doc = gdoc();
doc = gdoc("someid");
doc = gdoc("collectionid","someid");
 
doc = gdoc("1",
          e("a", n("2",                 //n() => inner node generator
                   e("b",l("3",0)))));
 
doc = gdoc(                             //no identifier
          e("a", attr(
                   n("2",
                     e("b",l("3",0)),   //l() => explicit leaf generator, used here to assign an identifier
                     e("a",l("4",0))),
                   a("foo","0"))));
 
doc = attr(gdoc("1",
                e("a",l("2",5)),
                e("b",attr(
                        n("3",e("c",4)),
                        a("foo",0))),
                e("c",5)),
           a("x",0));
 
doc = attr(gdoc("1",
                e("a", n("2",
                         e("b",n("$2")))),
                e("a",n("a1",
                        e("c",n(
                                e("d","..."),
                                e("d",attr(     //l() => explicit leaf generator, used here to assign attributes
                                          l("<xml>..</xml>"),
                                          a("w",".."))))))),
                e("b",attr(
                        n("1:/2"),
                        a("w","...")))),
           a("x","http://org.acme:8080"),a("y","<a>...</a>"));

The literal construction of trees is particularly convenient during testing, though it also composes well with programmatic construction in production code:

Edge edge = ....
InnerNode node = ....;
 
attr(
   node.add(e("before","..."), edge, e("after","...")),
   a("newattr","..."));

note: the node classes override equals for equivalence-based comparisons, hashCode for their correct use as keys within hash-based data structures, and toString for convenience of debugging.
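Equivalence-based equality means that two separately built nodes with the same content compare equal and collapse to a single entry in a hash-based collection. The hypothetical value class below (LeafValue, not the actual Leaf class) shows the usual pattern, with hashCode kept consistent with equals:

```java
import java.util.Objects;

// Hypothetical value class illustrating equivalence-based equals/hashCode;
// it is not the actual Leaf implementation.
final class LeafValue {
    final String id;     // may be null
    final String value;  // may be null

    LeafValue(String id, String value) { this.id = id; this.value = value; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LeafValue)) return false;
        LeafValue other = (LeafValue) o;
        // compare content, not object identity; null-safe
        return Objects.equals(id, other.id) && Objects.equals(value, other.value);
    }

    // Must agree with equals: equivalent leaves produce the same hash code.
    @Override public int hashCode() { return Objects.hash(id, value); }

    @Override public String toString() { return "Leaf(" + id + "," + value + ")"; }
}
```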

Serialising and Deserialising Trees

The Bindings class offers static facilities to convert between native models of gDoc trees and XML-based models. Two XML-based representations are supported natively, DOM trees and character streams, from which other XML-based representations can be produced using standard platform facilities (e.g. TrAX):

  • Bindings.toElement(GDoc) converts native models of gDoc trees into equivalent DOM models.
  • Bindings.fromElement(Element) converts DOM models of gDoc trees into equivalent native models.
  • Bindings.toXML(GDoc, Writer, boolean?) converts the native model into XML document streams, optionally excluding document declarations.
  • Bindings.fromXML(Reader) converts XML document streams into gDoc trees.

note: DOM conversions of native models are implemented directly, as they are most commonly required for interactions with the Content Manager service. Stream conversions are instead derived from DOM conversions via TrAX, at an additional processing cost.

note: conversions from native models to XML-based models assign the conventional name http://gcube-system.org/namespaces/contentmanagement/gdoc:gdoc (cf. the Bindings.GDOC_NS and Bindings.GDOC_NAME constants) to the document element. Conversely, conversions from XML-based representations to native models discard the name of the document element.
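A DOM-to-stream step of the kind described in these notes can be sketched with the JDK's TrAX API: an identity Transformer copies a DOM element to a character stream, optionally omitting the XML declaration. The class and method names below (TraxSketch, sampleElement, toXml) are hypothetical, and the sample element is built by hand with the namespace and local name given above; this is not the Bindings implementation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class TraxSketch {
    private TraxSketch() {}

    // Document element name, as stated in the note above.
    static final String GDOC_NS = "http://gcube-system.org/namespaces/contentmanagement/gdoc";
    static final String GDOC_NAME = "gdoc";

    /** Builds a tiny DOM whose document element follows the naming convention above. */
    static Element sampleElement() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(GDOC_NS, "gdoc:" + GDOC_NAME);
            Element a = doc.createElement("a");   // hypothetical edge "a" ending in leaf "2"
            a.setTextContent("2");
            root.appendChild(a);
            doc.appendChild(root);
            return root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** DOM-to-stream conversion through the TrAX identity transformer. */
    static String toXml(Element element, boolean omitDeclaration) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(element), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The extra Transformer pass is the kind of additional processing cost that the first note attributes to stream conversions.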

Here is a usage example, which shows that equivalence of native models is preserved under round-trip conversions.

import static org.gcube.contentmanagement.contentmanager.stubs.model.trees.Bindings.*;
import java.io.StringReader;
import java.io.StringWriter;
...
GDoc doc = ....
 
//DOM conversion
GDoc doc2 = fromElement(toElement(doc));
assert doc.equals(doc2); //true!
 
//stream conversion (fromXML() reads from a Reader)
StringWriter w = new StringWriter();
toXML(doc,w);
GDoc doc3 = fromXML(new StringReader(w.toString()));
assert doc.equals(doc3);   //true!

note: due to the treatment of root element names, equivalence of XML-based representations is not necessarily preserved after round-trip conversion. It is preserved only if the XML-based representations have been previously produced with the conversion routines.

note: in all the conversions above, null attribute and leaf values are serialised using a special constant (exposed programmatically as Node.NULL).

note: the conversions are also available for arbitrary nodes, not only for roots (cf. Bindings.nodeToElement(Node, QName?), Bindings.nodeFromElement(Element), Bindings.nodeToXML(Node, Writer, QName), and Bindings.nodeFromXML(Reader)).

Consuming Trees

The gDoc API offers simple means of procedural tree navigation. For declarative queries, clients can convert the model into an XML-based representation and leverage platform standards and popular offerings (e.g. XPath, XQuery, or XSLT implementations). If required, query outputs can then be converted back into native models and processed with the gDoc API.

The Node class defines methods to expose the state common to all nodes of a gDoc tree:

  • id(): returns the identifier.
  • parent(): returns the parent.
  • ancestors(): returns the list of all nodes from the parent to the root.
  • ancestorsAndSelf(): behaves like ancestors(), but the returned list includes, and starts with, the recipient node.
  • attributes(): returns a copy of the attributes, indexed by name.
  • attribute(QName): returns the value of an attribute with the given name (or fails).
  • hasAttribute(QName): checks for the existence of an attribute with a given name.


note: identifiers can only be set at node creation time. Attributes can be added, modified, and removed at any point (cf. Node.state(Node.State), setAttribute(QName,String), removeAttribute(QName)).

note: for convenience, all methods that take QNames are overloaded to accept local names as well as (namespace,local name) pairs.

note: invoking methods that take node types is simplified by statically importing the class constants defined in the Nodes class (cf. Nodes.N for InnerNode.class and Nodes.L for Leaf.class).
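The documented behaviour of ancestors() and ancestorsAndSelf() amounts to following parent links up to the root. A minimal sketch with a hypothetical ParentLinkedNode class (not the actual Node class):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with a parent pointer; it sketches the documented behaviour
// of ancestors() and ancestorsAndSelf() and is not the actual Node implementation.
final class ParentLinkedNode {
    final String id;
    final ParentLinkedNode parent;  // null for the root

    ParentLinkedNode(String id, ParentLinkedNode parent) { this.id = id; this.parent = parent; }

    /** All nodes from the parent up to the root, nearest first. */
    List<ParentLinkedNode> ancestors() {
        List<ParentLinkedNode> result = new ArrayList<>();
        for (ParentLinkedNode n = parent; n != null; n = n.parent) {
            result.add(n);
        }
        return result;
    }

    /** Like ancestors(), but the list starts with this node. */
    List<ParentLinkedNode> ancestorsAndSelf() {
        List<ParentLinkedNode> result = new ArrayList<>();
        result.add(this);
        result.addAll(ancestors());
        return result;
    }
}
```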


The Leaf class adds methods to read and set the value (cf. value(), value(String)).

The InnerNode class adds methods to navigate along edges and/or identifiers:

  • children(): returns the list of children.
  • children(QName): returns the list of children under edges whose label matches a given label.
  • <T extends Node> children(Class<T>): returns the list of children of a given node type.
  • <T extends Node> children(Class<T>, QName): returns the list of children of a given node type under edges whose label matches a given label.
  • child(QName): returns the child under an edge whose label matches a given label (or fails if there is no such child or more than one).
  • <T extends Node> child(Class<T>, QName): returns the child of a given node type under an edge whose label matches a given label (or fails if there is no such child or more than one).
  • descendants(QName*): returns the list of descendants that can be reached by following edges whose labels match the given labels.
  • <T extends Node> descendants(Class<T>,QName*): returns the list of descendants of a given type that can be reached by following edges whose labels match the given labels.
  • edges(): returns the list of all the edges.
  • edges(QName): returns the list of edges whose label matches a given label.
  • edge(QName): returns the edge whose label matches a given label (or fails if there is no such edge or more than one).
  • hasEdge(QName): checks for the existence of an edge whose label matches a given label.
  • labels(): returns the list of all edge labels.
  • labels(QName): returns the list of labels that match a given label.


note: edges can be added or removed at any time (cf. add(Edge*), removeEdge(Edge*), removeEdge(QName)).

note: all matches on qualified names are based on arbitrary regular expressions, both on the namespace and the local name of the label.

note: as above, methods that take QNames have overloads that accept local names and, where appropriate, overloads that accept (namespace,local name) pairs.

note: as above, invoking methods that take node types is simplified by statically importing the corresponding constants in Nodes (cf. Nodes.N, Nodes.L).
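Regular-expression matching on qualified names can be sketched as two independent matches, one on the namespace and one on the local part. The helper below (LabelMatch.matches) is hypothetical; in particular, it assumes full-string matches and treats an unqualified pattern as matching only unqualified labels, which may not be how the service behaves.

```java
import java.util.regex.Pattern;
import javax.xml.namespace.QName;

// Hypothetical sketch of regex-based label matching on qualified names;
// the service's actual matching rules may differ.
final class LabelMatch {
    private LabelMatch() {}

    /** True if both parts of the label fully match the corresponding regex in the pattern. */
    static boolean matches(QName pattern, QName label) {
        // new QName("a") has the empty namespace "", which only matches an empty namespace here
        return Pattern.matches(pattern.getNamespaceURI(), label.getNamespaceURI())
            && Pattern.matches(pattern.getLocalPart(), label.getLocalPart());
    }
}
```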


The GDoc class adds methods to read and set the collection identifier (cf. collectionID(), collectionID(String)).

Finally, the Edge class exposes its label and target (cf. label(), target()).

The following example illustrates some of the supported idioms; check the code documentation for detailed information about method signatures:

import static org.gcube.contentmanagement.contentmanager.stubs.model.trees.Nodes.*;
 
GDoc doc = attr(gdoc("1",
		e("a",l("2",5)),
		e("b",attr(
				n("3",e("c",4)),
			  a("foo",0))),
		e("c",5)),
           a("x",0));
 
//typed children
String val = doc.child(L,"a").value();
 
//typed descendant
String val2 = doc.descendant(N,"3").child(L,"c").value();
 
for (InnerNode node : doc.children(N))
	for (QName l : node.labels()) {
		//process label
	}
 
for (Node d : doc.descendants("b","e"))
	for (Edge siblingEdge : d.parent().edges())
		if (siblingEdge.target()!=d) {
			//process sibling of descendant
		}

Binding Trees

Clients that expect gDoc trees of a given form may wish to bind them to objects. To streamline JAXB object bindings, the package org.gcube.contentmanagement.contentmanager.stubs.model.views includes two base classes for node and document bindings to XML serialisations of gDoc trees:

  • NodeView is a base class for node bindings. The view binds and exposes the identifier of the node as well as the URL of the node, if one exists (cf. getID(), getURL()). Node URLs are discussed later.
  • GDocView extends NodeView as a base class for document bindings. The view binds and exposes the collection identifier of the root node (cf. getCollID()), in addition to what is already bound and exposed by its superclass.

Clients can extend these classes and the corresponding bindings, as the following example illustrates:

import java.util.Date;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
 
@XmlRootElement(name=Bindings.GDOC_NAME,namespace=Bindings.GDOC_NS)
class MyDocView extends GDocView {
	@XmlElement(namespace="http://acme.org") int i;
	@XmlElement(namespace="http://acme.org") MyDocComponent c;
}
 
class MyDocComponent extends NodeView {
	@XmlElement Date d;
}

MyDocView and MyDocComponent are toy examples of user-defined views over gDoc trees and tree nodes, and they should be familiar to JAXB users.

MyDocView extends GDocView and uses JAXB annotations to specify the qualified name of the document elements to which it will be bound. Here we have chosen a name that aligns with the serialisations produced by the Bindings class, as shown above, but different names may be specified if the binding targets serialisations produced through different means. MyDocView then declares two fields in its own namespace, an integer field and a MyDocComponent field, both of which are bound to XML elements. MyDocComponent extends NodeView, declares a single Date field, and uses JAXB annotations to bind it to an XML element. In both classes, we have chosen simple JAXB annotations. For example, we have assumed that the gDoc trees to be bound have labels that match the field names. The full range of JAXB facilities is of course available to customise bindings to less aligned trees.

Suppose now that MyDocView is to be bound to the gDoc tree below. We use the generators of the gDoc API to denote the tree, but this is just for convenience of exposition; the tree may have been generated through any suitable means.

String NS = "http://acme.org";   //the namespace of MyDocView's fields
 
GDoc doc = gdoc("collID","123",
                   e(NS,"i",3),
                   e(NS,"c", n("789",
                              e("d",l("1",new Date())),
                              e("b",l("2",15)))),
                   e(NS,"d",new Date()), 
                   e(NS,"b",n("456")));

Clearly, the tree contains a subset that matches the binding expectations of the classes above. As with all JAXB clients, the binding would require steps similar to the following:

JAXBContext context = JAXBContext.newInstance(MyDocView.class);
...
 
//assuming a DOM binding to the tree has already occurred (other JAXB inputs could have been used instead, e.g. character streams)
Element docElement = ....
 
//bind
MyDocView mv = (MyDocView) context.createUnmarshaller().unmarshal(docElement);
 
...mv.getID()...
...mv.getCollID()...
...mv.getURL()...
...mv.i...
...mv.c...
...mv.c.getID()...
...mv.c.getURL()...
...mv.c.d...
 
//serialise (again to DOM)
Document dom = ....;
Marshaller m = context.createMarshaller();
m.marshal(mv,dom);

gDoc Predicates

The gDoc model is untyped, in that neither the topology of trees nor the values of their attributes or leaves are subject to constraints (besides those dictated by the XML serialisation). Types are reintroduced later, under the view that they can be projected onto gDoc trees at the point of consumption.

Type projections serve two main purposes in the context of the Content Manager:

  • to validate the content of gDoc trees.
The main use case for validation is at the point of content ingestion through the write operations of the Content Manager. In particular, a plugin may project a type on incoming gDoc trees, with a view to rejecting those that fail the projection.
  • to identify the data of interest within gDoc trees.
The main use case for content identification is at the point of content retrieval through the read operations of the Content Manager. Through the service, in particular, a client may ask plugins to return only the portion of the data that satisfies the projection, and to discard the rest. Content pruning reduces bandwidth consumption and delivers content to clients in forms that are optimal for their own object bindings.

Accordingly, support for type projections requires:

  • a language of tree types with which clients and plugins can capture the required shape and content of gDoc trees.
  • the ability to project such types over gDoc trees with both validation and pruning semantics.

XML schema languages are natural candidates for tree types. However, they also introduce complexity, both conceptually and in terms of tooling, which is not required when working with the subset of XML that corresponds to the gDoc model. Just as importantly, schema languages are strongly associated with validation, and we are not aware of implementations that use them for document pruning (or indeed content extraction).

Accordingly, the tree API includes a native language of tree types, the gDoc predicates, as well as support for projecting them over content for validation and pruning purposes. gDoc predicates, in particular, can be used to constrain:

  • the topology of gDoc trees, including the labels and cardinality of edges (e.g. the existence of at least one edge whose label matches a given label).
  • the values of leaves, so that they conform to the textual literals of a range of atomic types (e.g. numbers or boolean values) or satisfy some type-specific predicate.

note: Support for predicates on attributes is forthcoming.

Predicate API

gDoc predicates are defined in the packages org.gcube.contentmanagement.contentmanager.stubs.model.predicates and org.gcube.contentmanagement.contentmanager.stubs.model.constraints, the main components of which are the following:

  • Predicate: the interface of all node predicates, defines match(Node) and prune(Node) methods for validation-based and pruning-based projection semantics.
    • AnyPredicate: a Predicate that specifies no constraints on nodes, i.e. matches any node and prunes nothing from it.
    • TreePredicate: a Predicate that specifies a list of EdgePredicates on inner nodes.
    • LeafPredicate: a Predicate that specifies a Constraint on the value of leaf nodes.
      • Bool: a LeafPredicate that specifies a boolean Constraint on the value of leaf nodes.
      • Num: a LeafPredicate that specifies a numeric Constraint on the value of leaf nodes.
      • Text: a LeafPredicate that specifies a textual Constraint on the value of leaf nodes.
      • Date: a LeafPredicate that specifies a date Constraint on the value of leaf nodes.
      • URI: a LeafPredicate that specifies a URI Constraint on the value of leaf nodes.
      • ID: a LeafPredicate on the identifier of an inner node.
  • EdgePredicate: a predicate that specifies a node Predicate on the targets of edges whose labels match a given label.
      • One: an EdgePredicate that asserts the existence of exactly one edge whose label matches a given label and whose target matches a given predicate.
      • Opt: an EdgePredicate that asserts the existence of zero or one edges whose labels match a given label and whose targets match a given predicate.
      • AtLeast: an EdgePredicate that asserts the existence of one or more edges whose labels match a given label and whose targets match a given predicate.
      • Many: an EdgePredicate that asserts the existence of zero or more edges whose labels match a given label and whose targets match a given predicate.
      • Only: an EdgePredicate that asserts that all the edges whose labels match a given label match also a given predicate.
  • Predicates: factory methods for an expression language of tree predicates.
  • Constraint: the interface of all constraints over values of leaf nodes.
    • Same: the Constraint that is satisfied by values that are equivalent to a given value.
    • Match: the Constraint that is satisfied by values that match a given regular expression.
    • More: the Constraint that is satisfied by values that are numbers strictly greater than a given number.
    • Less: the Constraint that is satisfied by values that are numbers strictly smaller than a given number.
    • Before: the Constraint that is satisfied by values that are earlier dates than a given date.
    • After: the Constraint that is satisfied by values that are later dates than a given date.
    • Not: the Constraint that is satisfied by values that do not satisfy another Constraint.
    • Either: the Constraint that is satisfied by values that satisfy at least one of a number of other Constraints.
    • All: the Constraint that is satisfied by values that satisfy all of a number of other Constraints.
  • Constraints: factory methods for an expression language of Constraints.
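Leaf-value constraints compose like ordinary boolean predicates. The sketch below models them as java.util.function.Predicate<String> values. The method names echo generators used in this section (is, more, less, not, either, all), and matches stands in for the Match constraint, but the class (ConstraintSketch) and its details, such as parsing numbers as doubles, are assumptions rather than the Constraints API.

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of leaf-value constraints as composable string predicates;
// the real Constraint classes are richer (and JAXB-bindable).
final class ConstraintSketch {
    private ConstraintSketch() {}

    /** Same: satisfied by values equal to the expected one. */
    static Predicate<String> is(String expected) { return v -> expected.equals(v); }

    /** Match: satisfied by values that fully match a regular expression. */
    static Predicate<String> matches(String regex) {
        Pattern p = Pattern.compile(regex);
        return v -> v != null && p.matcher(v).matches();
    }

    /** More: satisfied by values that parse as numbers strictly greater than n. */
    static Predicate<String> more(double n) { return v -> asNumber(v) != null && asNumber(v) > n; }

    /** Less: satisfied by values that parse as numbers strictly smaller than n. */
    static Predicate<String> less(double n) { return v -> asNumber(v) != null && asNumber(v) < n; }

    static Predicate<String> not(Predicate<String> c) { return c.negate(); }

    /** Either: at least one of the constraints holds. */
    @SafeVarargs
    static Predicate<String> either(Predicate<String>... cs) {
        return v -> Arrays.stream(cs).anyMatch(c -> c.test(v));
    }

    /** All: every constraint holds. */
    @SafeVarargs
    static Predicate<String> all(Predicate<String>... cs) {
        return v -> Arrays.stream(cs).allMatch(c -> c.test(v));
    }

    private static Double asNumber(String v) {
        try { return v == null ? null : Double.valueOf(v.trim()); }
        catch (NumberFormatException e) { return null; }
    }
}
```

With this encoding, all(more(5), less(10)) accepts "7" and rejects "11" and "notanumber"; non-numeric values simply fail numeric constraints rather than raising errors.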


Predicate and Constraints packages

Building Predicates

As with gDoc trees, gDoc predicates may be built with classic constructor-based idioms and/or with predicate generators, a collection of factory methods in the Predicates and Constraints classes that can be statically imported and then composed into a pseudo expression language for gDoc predicates. We concentrate here on predicate generators as the preferred way to build gDoc predicates. See the code documentation for the constructors available in the predicate and constraint classes.

Consider this first example:

import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
 
Predicate p = tree(
               one("a",
                   num(more(6))));

Here, tree() generates a TreePredicate which characterises trees which satisfy a single EdgePredicate. This latter predicate requires that the trees have exactly one outgoing edge with label a and with a leaf target. This leaf must in turn satisfy a Num predicate, i.e. its text value must represent a number and this number must satisfy a More constraint, which requires it to be greater than 6. In summary, we are characterising trees with a single a-edge that ends in a leaf with a number greater than 6.

The following example showcases a range of other predicates and constraints:

import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
 
Predicate p = tree(
                one("a",any()),
                one("b",text(either(is("abc"),is("efg")))),
                atleast("c",bool(is(true))),
                opt("d",tree()),
                many("e",date(future())),
                one("f", uri(matches("^http.*"))),
                many("g", num(all(more(5),less(10)))),
                one("h", text()),
                one("j",text(not(is("somestring")))),
                one("k",id("12345",tree())),
                only("l", num()));

Here, the predicate characterises trees with:

  • a single a-edge that ends in any type of node, inner node or leaf. (any() is a generator of AnyPredicates);
  • a single b-edge that ends in a leaf whose value is either one of two strings;
  • one or more c-edges that end in leaves with a boolean value of true;
  • zero or one d-edges that end in inner nodes;
  • zero or more e-edges that end in leaves whose values are dates in the future;
  • a single f-edge that ends in a leaf whose value is an absolute http URI;
  • zero or more g-edges that end in leaves whose values are numbers between 5 and 10;
  • a single h-edge that ends in a leaf, not characterised further;
  • a single j-edge that ends in a leaf whose value differs from a given string;
  • a single k-edge that ends in an inner node with an identifier of 12345;
  • zero or more l-edges, all of which end in leaves with numeric values.
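The cardinalities asserted by One, Opt, AtLeast, and Many reduce to bounds on the number of edges whose labels and targets both match, while Only compares that number with the number of edges whose labels match. A hypothetical encoding (EdgeCardinality, not part of the predicates package):

```java
// Hypothetical encoding of edge-predicate cardinalities; n is the number of
// edges whose label and target both match.
enum EdgeCardinality {
    ONE(1, 1), OPT(0, 1), AT_LEAST(1, Integer.MAX_VALUE), MANY(0, Integer.MAX_VALUE);

    final int min;
    final int max;

    EdgeCardinality(int min, int max) { this.min = min; this.max = max; }

    boolean accepts(int n) { return n >= min && n <= max; }

    /** Only: every edge whose label matches must also have a matching target. */
    static boolean only(int labelMatches, int labelAndTargetMatches) {
        return labelMatches == labelAndTargetMatches;
    }
}
```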

note: predicates can nest recursively to match the structure of trees.

note: the edge predicates above use plain strings to match edge labels, but they may more generally use qualified names with regular expressions on both the namespace and the local part. For example, the following predicate:

Predicate p = tree(
                 atleast("^part.*",tree(
                    one(".*acme.org$",".*",num()))));
characterises trees that have one or more edges whose labels begin with part and whose targets are inner nodes with exactly one edge in a namespace that ends with acme.org, which in turn ends in a leaf with a numeric value.

For a full list of available predicate and constraint generators, see the code documentation of the Predicates and Constraints classes.

Matching and Pruning

A gDoc predicate can be projected over a gDoc tree using the methods match() and prune(), common to all Predicates. The first indicates whether the tree satisfies the predicate; the second prunes the tree of all the nodes that are not matched by the predicate.

Consider this simple example:

import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
 
Date d = new Date();
 
GDoc doc = gdoc(
         e("a",-1),
         e("a",1),
         e("a",2),
         e("b","..."),
         e("b",n(
             e("b1","..."))),
         e("c",n(
             e("c1",d),
             e("c2","..."))),
         e("d","..."));
 
Predicate p = tree(
          many("a",num(more(0))),
          atleast("b",tree()),
          one("c", tree(
                          one("c1",date()))));
 
assert p.matches(doc)==true;
 
GDoc pruned = gdoc(
              e("a",1),
              e("a",2),
              e("b",n(
                 e("b1","..."))),
              e("c",n(
                   e("c1",d))));
 
assert pruned.equals(p.prune(doc));

Here, it is easy to see that the gDoc tree satisfies the predicate. Accordingly, match() returns true and prune() successfully reduces the tree to include only the paths from the root that are directly described by the predicate (i.e. a tree equivalent to pruned in the example). If the tree had had a second c-edge, for example, match() would have returned false and prune() would have failed with an exception. Likewise, if the tree had had no b-edges, or if its b-edges had all ended in leaves, or under any of a number of alternative assumptions, the outcome would have been equally negative.

note: match() never fails, as a gDoc tree either satisfies the predicate or it does not. In contrast, prune() fails whenever the tree does not match the predicate. In other words, prune() subsumes match() and reacts to mismatches with a failure.

note: Edge predicates are applied in order and each predicate can match any edge that has not been matched by previous predicates.
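To make the interplay of ordering, cardinality, match(), and prune() concrete, here is a toy sketch over a flat tree whose edges all end in leaves. Everything in it (FlatPruneSketch, FlatEdge, EdgePred) is hypothetical: each edge predicate consumes the not-yet-matched edges whose label matches its regular expression and whose value passes its test, the number of consumed edges must fall within the predicate's bounds, and prune() keeps only consumed edges. Edges whose label matches but whose value fails the test are simply left unmatched, as with the a-edge holding -1 in the example above. The real predicates also recurse into inner nodes and support conditions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy, hypothetical sketch of ordered match/prune over a flat tree of leaf edges.
final class FlatPruneSketch {
    private FlatPruneSketch() {}

    static final class FlatEdge {
        final String label;
        final String value;
        FlatEdge(String label, String value) { this.label = label; this.value = value; }
    }

    static final class EdgePred {
        final String labelRegex;
        final Predicate<String> valueTest;
        final int min;
        final int max;
        EdgePred(String labelRegex, Predicate<String> valueTest, int min, int max) {
            this.labelRegex = labelRegex;
            this.valueTest = valueTest;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns the matched edges in document order, or throws if some predicate is not satisfied. */
    static List<FlatEdge> prune(List<FlatEdge> edges, List<EdgePred> preds) {
        boolean[] consumed = new boolean[edges.size()];
        for (EdgePred p : preds) {                        // edge predicates apply in order
            int count = 0;
            for (int i = 0; i < edges.size(); i++) {
                FlatEdge e = edges.get(i);
                // a predicate only sees edges not matched by earlier predicates
                if (!consumed[i] && e.label.matches(p.labelRegex) && p.valueTest.test(e.value)) {
                    consumed[i] = true;
                    count++;
                }
            }
            if (count < p.min || count > p.max) {
                throw new IllegalArgumentException("tree does not match predicate on " + p.labelRegex);
            }
        }
        List<FlatEdge> kept = new ArrayList<>();
        for (int i = 0; i < edges.size(); i++) {
            if (consumed[i]) {
                kept.add(edges.get(i));
            }
        }
        return kept;
    }

    /** Never fails: reports whether prune() would succeed. */
    static boolean match(List<FlatEdge> edges, List<EdgePred> preds) {
        try {
            prune(edges, preds);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Applied to a-edges holding -1, 1, and 2 with a many-style predicate (bounds 0 and Integer.MAX_VALUE) requiring positive numbers, prune() keeps the edges holding 1 and 2, mirroring the a-edges of pruned in the example above.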

In the projection above, all the constraints applied to the data of interest, i.e. the parts of the tree that were not to be pruned. Often, however, the requirement is to constrain some parts of a tree while retaining only others. For this, some edge predicates can be marked as conditions, as shown in the following example:

Date d = new Date();
 
GDoc doc = gdoc(
         e("a",-1),
         e("a",1),
         e("a",2),
         e("b","..."),
         e("b",n(
             e("b1","..."))),
         e("c",n(
             e("c1",d),
             e("c2","..."))),
         e("d","..."));
 
Predicate p = tree(
          many("a",num(more(0))),
          cond(atleast("b",tree())),
          one("c", tree(
                     one("c1",date()))));
 
GDoc pruned = gdoc(
              e("a",1),
              e("a",2),
              e("c",n(
                   e("c1",d))));
 
assert pruned.equals(p.prune(doc));

Here the predicate includes a condition on b-edges. At prune time, the condition is used to match the tree but it does not imply that matching edges ought to be preserved.

note: condition predicates do not alter the semantics of match(), only prune().


Another common requirement for pruning projections is to preserve all the children of a given node, as long as some children satisfy some constraints. As an example, consider the following document:

GDoc doc = gdoc(
             e("a",n(
                 e("b",-1),
                 e("c","..."),
                 e("d","..."),
                 e("e","..."))),
              e("a",n(
                 e("b","notanumber"),
                 e("c","..."),
                 e("d","..."),
                  e("e","..."))));

One may wish to prune this document so as to retain only the a-nodes whose b-child contains a number. One can then use regular expressions and AnyPredicates to preserve all the children of a-nodes that do not need to be characterised explicitly, as shown below:

p = tree(many("a",
           tree(
             one("b",num()),tail()))); 
 
GDoc pruned = gdoc(
             e("a",n(
                 e("b",-1),
                 e("c","..."),
                 e("d","..."),
                 e("e","..."))));
 
p.prune(doc);
 
assertEquals(pruned, doc);

Here, Predicates.tail() is a factory method for the common Edge predicate many(".*",any()), which achieves the desired result.

Finally, note that cond() and tail() can be combined to discard all the children of a given node. The following example illustrates:

p = tree(many("a",
   tree(
      cond(tail()))));
GDoc pruned = gdoc(e("a",n()));
 
p.prune(doc);
 
assertEquals(pruned, doc);

note: The factory method Predicates.cut() is available as a shortcut for cond(tail()).

Serialising and Deserialising Predicates

The predicate and constraint classes are ready for JAXB bindings to XML (i.e. they contain appropriate JAXB annotations). In addition, the Predicates class encapsulates a JAXB context and exposes javax.xml.bind.Marshallers and javax.xml.bind.Unmarshallers ready for client use (cf. getMarshaller(), getUnmarshaller()).

For example, a client that works with character streams may operate as follows:

import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
 
//serialise predicate
Predicate p1 = ...;
Writer w = ...
getMarshaller().marshal(p1,w);
 
//deserialise predicate
Reader r =...
Predicate  p2 = (Predicate) getUnmarshaller().unmarshal(r);

note: clients who use the Content Management libraries do not need to worry explicitly about conversions to and from DOM representations of predicates; the libraries perform these conversions on their behalf.

The following is an example of predicate serialisation (namespaces are omitted for simplicity):


Sample Predicate XML Representation


The full schemas of gDoc tree predicates and constraints are available here.