Difference between revisions of "The Tree Library"
m (moved The Trees Library to The Tree Library) |
(→XML Bindings) |
||
(11 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | The '''Tree Library''' implements a model of ''edge-labelled trees'', and related tools. It is a shared dependency for the components of a subsystem that provides data access under this model, including the Tree Manager services, the [[The Tree Manager Framework|Tree Manager Framework]], the [[The Tree Manager Library|Tree Manager Library]] and its clients. | ||
− | + | The library is available in our Maven repositories with the following coordinates: | |
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | <groupId>org.gcube.data.access</groupId> | ||
+ | <artifactId>trees</artifactId> | ||
+ | <version>...</version> | ||
+ | |||
+ | </source> | ||
+ | |||
+ | Source code and documentation are also available in the repositories as secondary artefacts with classifiers <code>sources</code> and <code>javadoc</code>, respectively: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | <groupId>org.gcube.data.access</groupId> | ||
+ | <artifactId>trees</artifactId> | ||
+ | <classifier>sources</code> | ||
+ | <version>...</version> | ||
+ | |||
+ | <groupId>org.gcube.data.access</groupId> | ||
+ | <artifactId>trees</artifactId> | ||
+ | <classifier>javadoc</code> | ||
+ | <version>...</version> | ||
+ | |||
+ | </source> | ||
+ | |||
+ | |||
+ | = Trees = | ||
+ | |||
+ | The library implements a model of ''node-attributed'', ''leaf-valued'' , ''edge-labelled'', and ''directed trees''. In particular: | ||
+ | |||
+ | * nodes may have an identifier and a number of uniquely named attributes; | ||
+ | * edges have a label; | ||
+ | * leaves may have a value; | ||
+ | * roots may identify the source of origin. | ||
+ | |||
+ | Moreover: | ||
+ | |||
+ | * identifiers, attributes, and leaves have text values; | ||
+ | * attribute names and labels may be namespaced. | ||
+ | |||
+ | |||
+ | [[Image:tree-model.png|center]] | ||
+ | |||
+ | |||
+ | The model is implemented in code by the following classes: | ||
+ | |||
+ | [[Image:tree-class-hierarchy.png|center]] | ||
+ | |||
+ | Nodes are either <code>InnerNode</code>s or <code>Leaf</code>s. <code>InnerNode</code>s have zero or more <code>Edge</code>s, each of which has a label and target <code>Node</code>. Attributes and identifiers are common to all <code>Node</code>s, while only <code>Leaf</code>s have values. <code>Tree</code>s are <code>InnerNode</code>s with source identifiers, and serve as tree roots. | ||
+ | |||
+ | Instances of ''model classes'' are mutable, in that we can: | ||
+ | |||
+ | * add and remove attributes and edges; | ||
+ | * change the value of attributes and leaves; | ||
+ | * change source identifiers; | ||
+ | |||
+ | Note however that node identifiers and edge labels are assigned at creation time and cannot be changed thereafter. | ||
+ | |||
+ | We can also: | ||
+ | |||
+ | * compare instances for equivalence (instances override <code>equals()</code>); | ||
+ | * use them as keys in hash-based structures (instances override <code>hashcode()</code>); | ||
+ | * publish their state for debugging purposes (instances override <code>toString()</code>). | ||
+ | |||
+ | ==Building Trees== | ||
+ | |||
+ | We can construct trees in a standard object-oriented fashion, using any of the overloaded constructors common to all model classes (including ''copy constructors''), e.g: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | Tree tree = new Tree("rootId"); | ||
+ | tree.setAttribute(new QName("x"),"..."); | ||
+ | tree.setAttribute(new QName("http://foo.org","y"), "..."); | ||
+ | tree.setSourceID("sourceId"); | ||
+ | Leaf leaf1 = new Leaf("id1"); | ||
+ | leaf1.value("..."); | ||
+ | Leaf leaf2 = new Leaf("..."); | ||
+ | leaf2.value("true"); | ||
+ | Edge e1 = new Edge(new QName("a"),leaf1); //unqualified | ||
+ | Edge e2 = new Edge(new QName("http://bar.org","b"),leaf2); | ||
+ | tree.add(e1,e2) | ||
+ | </source> | ||
+ | |||
+ | Overloaded constructors simplify object creation, e.g.: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | Tree tree = new Tree("rootId", | ||
+ | new Edge("a", new Leaf("id1","...")), | ||
+ | new Edge("http://bar.org","b", new Leaf("id2","..."))); | ||
+ | |||
+ | tree.setAttribute("x","..."); | ||
+ | tree.setAttribute("http://foo.org","y","..."); | ||
+ | tree.setSourceID("sourceId"); | ||
+ | |||
+ | </source> | ||
+ | |||
+ | We find details on individual constructors and mutators in the in Javadoc documentation. | ||
+ | |||
+ | Alternatively, we can adopt a more functional style, importing the static builder methods of the <code>Nodes</code> class and composing them in inlined tree expressions, e.g: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | import static org.gcube.data.trees.data.Nodes.*; | ||
+ | … | ||
+ | Tree tree = attr(t("sourceId","rootId", | ||
+ | e("a","..."), | ||
+ | e(q("http://bar.org","b"),"...")), | ||
+ | a("x",1),a(q("http://foo.org","y"),"...")); | ||
+ | |||
+ | </source> | ||
+ | |||
+ | The functional style is far less verbose, and is particularly suited for testing. <code>t()</code>, <code>a()</code>, <code>e()</code>, and <code>q()</code> are examples of <code>Node</code>, attribute, <code>Edge</code> and <code>QName</code> builders, respectively. We use them to avoid the <code>new</code> operator for those types. They also perform object-to-string conversions. | ||
+ | |||
+ | We show some further sample expressions below, while details on all the available builders are in the Javadoc documentation of the <code>Nodes</code> class: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | //child-less root: t() is a tree builder | ||
+ | Tree tree = t(); | ||
+ | |||
+ | //with identifier | ||
+ | tree = t("id1"); | ||
+ | |||
+ | //with identifier and provenance | ||
+ | tree = t("source","id1"); | ||
+ | |||
+ | //with a simple child: e() is an edge builder | ||
+ | tree = t(e("a",1)); | ||
+ | |||
+ | //with a qualified edge label: q() is a QName builder | ||
+ | tree = t(e("http://acme.org","a",1)); | ||
+ | |||
+ | //with a second, identified simple child: l() is a leaf builder | ||
+ | tree = t( | ||
+ | e("a",1), | ||
+ | e("b",l("id2","foo"))); | ||
+ | |||
+ | //with a complex child: n() is an inner node builder | ||
+ | tree = t( | ||
+ | e("a", | ||
+ | n( | ||
+ | e("b",1)))); | ||
+ | |||
+ | //with attributes: a() is an attribute builder, attr() is attribute-decorator for nodes | ||
+ | tree = attr( t(....), | ||
+ | a("a1","3"), | ||
+ | a(q("http://acme.org","a2"),"foo")); | ||
+ | |||
+ | //with an attributed child | ||
+ | tree = t( | ||
+ | e("a", | ||
+ | attr(l("foo"),a("b1","3"),a("b2","bar")))); | ||
+ | |||
+ | </source> | ||
+ | |||
+ | We can of course mix object-oriented and functional style. | ||
+ | |||
+ | ==Inspecting and Traversing Trees== | ||
+ | |||
+ | Given any <code>Node</code>, we can: | ||
+ | |||
+ | * inspect its identifier (<code>id()</code>); | ||
+ | * inspect its attributes( <code>attributes()</code>, <code>attribute(QName)</code>, <code>hasAtttribute(QName)</code>); | ||
+ | * inspect its ancestors (<code>ancestors()</code>, <code>ancestorAndSelf()</code>) | ||
+ | * move to its parent <code>Node</code>, if any exists ( <code>parent()</code>); | ||
+ | |||
+ | '''note''': for convenience, all methods that take <code>QName</code>s are overloaded to accept unqualified names, as well as (namespace,local name) pairs. This also applies to <code>QName</code>-based methods discussed below. | ||
+ | |||
+ | Given an <code>InnerNode</code>s we can also: | ||
+ | |||
+ | * inspect its children (<code>children()</code>,<code>children(QName)</code>, <code>children(Class<? extends Node>), <code>children(Class<? extends Node,QName>)</code>); | ||
+ | * move to its children (<code>child(QName)</code>, <code>child(Class<? extends Node>,QName)</code>); | ||
+ | * inspects its descendants along paths of edge labels (<code>descendants()</code>, <code>descendants(QName...)</code>, <code>descendants(Class<? extends Node>,QName...)</code>); | ||
+ | * move to its descendants along paths of node identifiers (<code>descendant(String...)</code>, <code>descendant(Class<? extends Node>, String)</code>); | ||
+ | * inspect its edges (<code>edges()</code>, <code>edges(QName)</code>); | ||
+ | * move along its edges (<code>edge(QName)</code>, <code>hasEdge(QName)</code>); | ||
+ | * inspect the labels of its edges (<code>labels()</code>, <code>labels</code>). | ||
+ | |||
+ | '''note''': methods that return non-<code>List</code> values fail with an <code>IllegalStateException</code> whenever the input is ambiguous, e.g. <code>child(QName)</code> fails if there is more than one edge with the given label; | ||
+ | |||
+ | '''note''': methods that take <code>Class<? extends Node></code> parameters filter the output to contain only instances of the input class, i.e. only <code>InnerNode</code>s or <code>Leaf</code>s). For fluency, the constants <code>Nodes#N</code> and <code>Nodes#L</code> can be used in lieu of, respectively, <code>InnerNode.class</code>, <code>Leaf.class</code>, e.g.: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | import static org.gcube.data.trees.data.Nodes.*; | ||
+ | |||
+ | ... | ||
+ | |||
+ | Node n = ... | ||
+ | List<InnerNode> complexChildren = n.children(N); | ||
+ | Leaf leaf = n.child(N,"a").child(N,"b").child(L,"c"); | ||
+ | |||
+ | </source> | ||
+ | |||
+ | '''note''': all methods that take <code>QName</code>s accept regular expressions, both on the namespace and the local name of the label, e.g.: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | import static org.gcube.data.trees.data.Nodes.*; | ||
+ | |||
+ | ... | ||
+ | |||
+ | Node n = ... | ||
+ | List<Node> bChildren = n.children("b.*"); | ||
+ | |||
+ | </source> | ||
+ | |||
+ | Finally, we can: | ||
+ | |||
+ | * inspect the value of a <code>Leaf</code> (<code>value()</code>); | ||
+ | * inspect the provenance of a <code>Tree</code> (<code>sourceId()</code>); | ||
+ | * inspect that label of an <code>Edge</code> (<code>label()</code>); | ||
+ | * move to the target <code>Node</code> of an <code>Edge</code> (<code>target()</code>). | ||
+ | |||
+ | Some further examples: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | import static org.gcube.data.trees.data.Nodes.*; | ||
+ | |||
+ | Tree tree = attr(t("1", | ||
+ | e("a",l("2",5)), | ||
+ | e("b",attr( | ||
+ | n("3",e("c",4)), | ||
+ | a("foo",0))), | ||
+ | e("c",5)), | ||
+ | a("x",0)); | ||
+ | |||
+ | //typed children | ||
+ | String val = doc.child(L,"a").value(); | ||
+ | |||
+ | //typed descendant | ||
+ | String val2 = tree.descendant(N,"3").child(L,"c").value(); | ||
+ | |||
+ | for (InnerNode node : tree.children(N)) | ||
+ | for (QName l : node.labels()) | ||
+ | //process label | ||
+ | |||
+ | for (Node n : tree.descendants("b","e")) | ||
+ | for (Edge siblingEdge : n.parent().edges()) | ||
+ | if (siblingEdge.target()!=n) | ||
+ | //process sibling of descendant | ||
+ | |||
+ | </source> | ||
+ | |||
+ | ==XML Bindings== | ||
+ | |||
+ | Trees have straightforward XML representations. Nodes become elements and are named after incoming edges, attributes become element attributes. | ||
+ | Moreover: | ||
+ | |||
+ | * the root element is arbitrarily named; | ||
+ | * the content of <code>Leaf</code> elements are <code>Leaf</code> values; | ||
+ | * the content of <code>InnerNode</code> elements is the representation of <code>InnerNode</code>'s children; | ||
+ | * node identifiers become attributes called <code>http://gcube-system.org/namespaces/data/trees:id</code>; | ||
+ | * source identifiers become attributes called <code>http://gcube-system.org/namespaces/data/trees:sourceId</code>. | ||
+ | |||
+ | Thus one the sample trees discussed above: | ||
+ | |||
+ | <source lang="java5"> | ||
+ | |||
+ | attr(t("sourceId","rootId", | ||
+ | e("a","..."), | ||
+ | e(q("http://bar.org","b"),"...")), | ||
+ | a("x",1),a(q("http://foo.org","y"),"...")); | ||
+ | |||
+ | </source> | ||
+ | |||
+ | can be represented in XML as follows: | ||
+ | |||
+ | <source lang="xml"> | ||
+ | |||
+ | <t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" | ||
+ | xmlns:foo="http://foo.org" | ||
+ | t:id="rootId" t:sourceId="sourceId" | ||
+ | x="1" foo:y="..."> | ||
+ | <a>...</a> | ||
+ | <bar:b xmlns:bar="http://bar.org">...</bar:b> | ||
+ | </t:root> | ||
+ | |||
+ | </source> | ||
+ | |||
+ | Note that there is no explicit trace in the representation of node or edge concepts. In this sense, it is a "natural" representation, rather than a "serialisation". This means that we can easily parse and manipulate trees with any XML parsing tool. | ||
+ | |||
+ | =Patterns= |
Latest revision as of 11:22, 22 December 2012
The Tree Library implements a model of edge-labelled trees, and related tools. It is a shared dependency for the components of a subsystem that provides data access under this model, including the Tree Manager services, the Tree Manager Framework, the Tree Manager Library and its clients.
The library is available in our Maven repositories with the following coordinates:
<groupId>org.gcube.data.access</groupId> <artifactId>trees</artifactId> <version>...</version>
Source code and documentation are also available in the repositories as secondary artefacts with classifiers sources
and javadoc
, respectively:
<groupId>org.gcube.data.access</groupId> <artifactId>trees</artifactId> <classifier>sources</code> <version>...</version> <groupId>org.gcube.data.access</groupId> <artifactId>trees</artifactId> <classifier>javadoc</code> <version>...</version>
Trees
The library implements a model of node-attributed, leaf-valued , edge-labelled, and directed trees. In particular:
- nodes may have an identifier and a number of uniquely named attributes;
- edges have a label;
- leaves may have a value;
- roots may identify the source of origin.
Moreover:
- identifiers, attributes, and leaves have text values;
- attribute names and labels may be namespaced.
The model is implemented in code by the following classes:
Nodes are either InnerNode
s or Leaf
s. InnerNode
s have zero or more Edge
s, each of which has a label and target Node
. Attributes and identifiers are common to all Node
s, while only Leaf
s have values. Tree
s are InnerNode
s with source identifiers, and serve as tree roots.
Instances of model classes are mutable, in that we can:
- add and remove attributes and edges;
- change the value of attributes and leaves;
- change source identifiers;
Note however that node identifiers and edge labels are assigned at creation time and cannot be changed thereafter.
We can also:
- compare instances for equivalence (instances override
equals()
); - use them as keys in hash-based structures (instances override
hashcode()
); - publish their state for debugging purposes (instances override
toString()
).
Building Trees
We can construct trees in a standard object-oriented fashion, using any of the overloaded constructors common to all model classes (including copy constructors), e.g:
Tree tree = new Tree("rootId"); tree.setAttribute(new QName("x"),"..."); tree.setAttribute(new QName("http://foo.org","y"), "..."); tree.setSourceID("sourceId"); Leaf leaf1 = new Leaf("id1"); leaf1.value("..."); Leaf leaf2 = new Leaf("..."); leaf2.value("true"); Edge e1 = new Edge(new QName("a"),leaf1); //unqualified Edge e2 = new Edge(new QName("http://bar.org","b"),leaf2); tree.add(e1,e2)
Overloaded constructors simplify object creation, e.g.:
Tree tree = new Tree("rootId", new Edge("a", new Leaf("id1","...")), new Edge("http://bar.org","b", new Leaf("id2","..."))); tree.setAttribute("x","..."); tree.setAttribute("http://foo.org","y","..."); tree.setSourceID("sourceId");
We find details on individual constructors and mutators in the in Javadoc documentation.
Alternatively, we can adopt a more functional style, importing the static builder methods of the Nodes
class and composing them in inlined tree expressions, e.g:
import static org.gcube.data.trees.data.Nodes.*; … Tree tree = attr(t("sourceId","rootId", e("a","..."), e(q("http://bar.org","b"),"...")), a("x",1),a(q("http://foo.org","y"),"..."));
The functional style is far less verbose, and is particularly suited for testing. t()
, a()
, e()
, and q()
are examples of Node
, attribute, Edge
and QName
builders, respectively. We use them to avoid the new
operator for those types. They also perform object-to-string conversions.
We show some further sample expressions below, while details on all the available builders are in the Javadoc documentation of the Nodes
class:
//child-less root: t() is a tree builder Tree tree = t(); //with identifier tree = t("id1"); //with identifier and provenance tree = t("source","id1"); //with a simple child: e() is an edge builder tree = t(e("a",1)); //with a qualified edge label: q() is a QName builder tree = t(e("http://acme.org","a",1)); //with a second, identified simple child: l() is a leaf builder tree = t( e("a",1), e("b",l("id2","foo"))); //with a complex child: n() is an inner node builder tree = t( e("a", n( e("b",1)))); //with attributes: a() is an attribute builder, attr() is attribute-decorator for nodes tree = attr( t(....), a("a1","3"), a(q("http://acme.org","a2"),"foo")); //with an attributed child tree = t( e("a", attr(l("foo"),a("b1","3"),a("b2","bar"))));
We can of course mix object-oriented and functional style.
Inspecting and Traversing Trees
Given any Node
, we can:
- inspect its identifier (
id()
); - inspect its attributes(
attributes()
,attribute(QName)
,hasAtttribute(QName)
); - inspect its ancestors (
ancestors()
,ancestorAndSelf()
) - move to its parent
Node
, if any exists (parent()
);
note: for convenience, all methods that take QName
s are overloaded to accept unqualified names, as well as (namespace,local name) pairs. This also applies to QName
-based methods discussed below.
Given an InnerNode
s we can also:
- inspect its children (
children()
,children(QName)
,children(Class<? extends Node>), <code>children(Class<? extends Node,QName>)
); - move to its children (
child(QName)
,child(Class<? extends Node>,QName)
); - inspects its descendants along paths of edge labels (
descendants()
,descendants(QName...)
,descendants(Class<? extends Node>,QName...)
); - move to its descendants along paths of node identifiers (
descendant(String...)
,descendant(Class<? extends Node>, String)
); - inspect its edges (
edges()
,edges(QName)
); - move along its edges (
edge(QName)
,hasEdge(QName)
); - inspect the labels of its edges (
labels()
,labels
).
note: methods that return non-List
values fail with an IllegalStateException
whenever the input is ambiguous, e.g. child(QName)
fails if there is more than one edge with the given label;
note: methods that take Class<? extends Node>
parameters filter the output to contain only instances of the input class, i.e. only InnerNode
s or Leaf
s). For fluency, the constants Nodes#N
and Nodes#L
can be used in lieu of, respectively, InnerNode.class
, Leaf.class
, e.g.:
import static org.gcube.data.trees.data.Nodes.*; ... Node n = ... List<InnerNode> complexChildren = n.children(N); Leaf leaf = n.child(N,"a").child(N,"b").child(L,"c");
note: all methods that take QName
s accept regular expressions, both on the namespace and the local name of the label, e.g.:
import static org.gcube.data.trees.data.Nodes.*; ... Node n = ... List<Node> bChildren = n.children("b.*");
Finally, we can:
- inspect the value of a
Leaf
(value()
); - inspect the provenance of a
Tree
(sourceId()
); - inspect that label of an
Edge
(label()
); - move to the target
Node
of anEdge
(target()
).
Some further examples:
import static org.gcube.data.trees.data.Nodes.*; Tree tree = attr(t("1", e("a",l("2",5)), e("b",attr( n("3",e("c",4)), a("foo",0))), e("c",5)), a("x",0)); //typed children String val = doc.child(L,"a").value(); //typed descendant String val2 = tree.descendant(N,"3").child(L,"c").value(); for (InnerNode node : tree.children(N)) for (QName l : node.labels()) //process label for (Node n : tree.descendants("b","e")) for (Edge siblingEdge : n.parent().edges()) if (siblingEdge.target()!=n) //process sibling of descendant
XML Bindings
Trees have straightforward XML representations. Nodes become elements and are named after incoming edges, attributes become element attributes. Moreover:
- the root element is arbitrarily named;
- the content of
Leaf
elements areLeaf
values; - the content of
InnerNode
elements is the representation ofInnerNode
's children; - node identifiers become attributes called
http://gcube-system.org/namespaces/data/trees:id
; - source identifiers become attributes called
http://gcube-system.org/namespaces/data/trees:sourceId
.
Thus one the sample trees discussed above:
attr(t("sourceId","rootId", e("a","..."), e(q("http://bar.org","b"),"...")), a("x",1),a(q("http://foo.org","y"),"..."));
can be represented in XML as follows:
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" xmlns:foo="http://foo.org" t:id="rootId" t:sourceId="sourceId" x="1" foo:y="..."> <a>...</a> <bar:b xmlns:bar="http://bar.org">...</bar:b> </t:root>
Note that there is no explicit trace in the representation of node or edge concepts. In this sense, it is a "natural" representation, rather than a "serialisation". This means that we can easily parse and manipulate trees with any XML parsing tool.