The Tree Manager Framework
The Tree Manager service may be called to persist or retrieve edge-labelled trees, either one at a time or many at once. Either way, the data is not necessarily stored locally to the service, or as trees. Instead, the data is most often held remotely, it is autonomously managed, and it is exposed by other services in a variety of forms and through different APIs.
The main value proposition of the service is that, in many cases and for many purposes, this variety of data sources may be ignored and the data uniformly accessed under a sufficiently general API and data model. This uniformity defines a basis for interoperability between service clients and data sources. It enables generic clients to implement cross-domain functions - including data indexing, transformation, discovery, transfer, browsing, viewing, etc. - over a single data model, against a single API, and with a consistent set of tools. Similar advantages can be granted to less generic clients that implement domain-specific or application-specific functions, provided that consensus is achieved around conventional uses of the tree model.
Within the service, uniformity is achieved with two-way transformations from the API and tree model of the service to those of the underlying data sources. Transformations are implemented in plugins, libraries developed in autonomy from the service so as to extend its capabilities at runtime. Service and plugins interact through a protocol defined by a set of local interfaces which, collectively, serve as a framework for plugin development.
The framework is packaged and distributed as a stand-alone library, the tree-manager-framework, and serves as a dependency for both service and plugins.
The library and all its transitive dependencies are available in our Maven repositories. Plugins that are managed with Maven can resolve them with a single dependency declaration:
<dependency>
    <groupId>org.gcube.data.access</groupId>
    <artifactId>tree-manager-framework</artifactId>
    <version>...</version>
    <scope>compile</scope>
</dependency>
In what follows, we address the plugin developer and describe the framework in detail, illustrating also design options and best practices for plugin development.
Overview
We start by overviewing the key components of the framework, their role in the design of a plugin, and their relationships.
The service and the plugin interact in order to notify each other of the occurrence of certain events:
- the service observes events that relate to its clients, first and foremost their requests. These events translate into actions which the plugin must perform on data sources;
- the plugin may observe events that relate to the data sources, first and foremost changes to their state. These events need to be reported to the service.
In essence, the framework defines the interfaces through which all the relevant events may be notified.
The most important events are client requests, which can be of one of the following types:
- bind requests, where clients ask the plugin to "connect" to given sources. These clients know about the plugin and include in the request all the information that the plugin needs in order to establish a binding. The service delivers the request to a SourceBinder provided by the plugin and expects back one Source instance for each bound source. The plugin configures Sources with information extracted or derived from the request, and with other components that the service needs to access in future read and write requests. Thereafter, the service manages the Sources on behalf of the plugin.

- read requests, where clients want to retrieve data from sources that have been previously bound to the plugin, specifying either one or more identifiers to resolve (lookup requests) or a pattern to match (query requests). These clients may not be aware of the plugin, having only discovered that the service gives read access to sources of interest. The service resolves Sources from requests and then relays the requests to the associated SourceReaders, expecting trees back. It is the job of the plugin to translate the requests for the API of the bound sources and to transform the results returned by the sources into trees.

- write requests, where clients want to store data in sources that have been previously bound to the plugin, either new data (add requests) or changes to existing data, including deletions (update requests). These clients typically know about the plugin and what type of trees it expects to add or update. The service resolves Sources from requests and then relays the requests to the associated SourceWriters. Again, it is the job of the plugin to translate the requests for the API of the bound sources.

Besides relaying client requests, the service also notifies the plugin of key events in the lifetime of its source bindings. It does so by invoking event-specific callbacks of a SourceLifecycle associated with the Sources. As we shall see, lifetime events include binding initialisation, reconfiguration, passivation, resumption, and termination.

These are all the events that the service observes and passes on to the plugin. Other events may be observed directly by the plugin, including changes in the properties or status of bound sources. These events are predefined as SourceEvent instances and the plugin reports them to SourceNotifiers that the service itself configures on Sources. The service also registers its own SourceConsumers with SourceNotifiers to receive event notifications. If useful for its own design, the plugin may also implement its own SourceEvents and SourceConsumers.

All the key components of the plugin are introduced to the service through an implementation of the Plugin interface. From it, the service obtains the SourceBinder and, from the binder, the Sources, their SourceLifecycles, their SourceReaders and their SourceWriters. In addition, the Plugin implementation exposes descriptive information about the plugin that the service publishes in the infrastructure and uses in order to manage the plugin. For increased control over its own lifecycle, the plugin may implement the PluginLifecycle interface, which extends Plugin with callbacks invoked by the service when it loads and unloads the plugin.
To bootstrap the process of component discovery and find Plugin implementations, the service uses the ServiceLoader mechanism defined by the language. Accordingly, the plugin includes a file META-INF/services/org.gcube.data.tmf.api.Plugin in its Jar, where the file contains a single line with the qualified name of its Plugin or PluginLifecycle implementation.
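The discovery step can be illustrated with the standard java.util.ServiceLoader API. The sketch below is not the service's actual code: the Plugin interface is a one-method stand-in for org.gcube.data.tmf.api.Plugin, and PluginDiscovery is a hypothetical helper introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Stand-in for org.gcube.data.tmf.api.Plugin, reduced to one method (assumption).
interface Plugin {
    String name();
}

// Hypothetical helper: shows how implementations can be found on the classpath.
class PluginDiscovery {

    // Returns every implementation of 'type' that is declared in a
    // META-INF/services/<qualified name of type> file on the classpath.
    static <T> List<T> discover(Class<T> type) {
        List<T> found = new ArrayList<>();
        for (T implementation : ServiceLoader.load(type)) {
            found.add(implementation);
        }
        return found;
    }
}
```

If no provider file is found on the classpath, the returned list is simply empty, which is why a missing or misnamed file typically results in the plugin being silently ignored.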
This completes our quick overview of the main interfaces and classes provided by the framework. Note that, besides implementing the interfaces that define the interaction protocol with the service, the plugin is free to design and develop against the framework using any technology that seems appropriate to the task.
In the rest of this guide we look at the key components of the framework in more detail.
Key Design Issues
The framework has been designed to support a wide range of plugins. There are indeed many degrees of freedom in plugin design:
- what sources can it bind to? The plugin may:
  - be dedicated to specific data sources (source-specific plugin);
  - target open-ended classes of data sources which publish data through standard APIs and in standard models (source-generic plugin).
- what kind of trees does it accept and/or return? All plugins transform to trees and/or from trees, but the structure and intended semantics of those trees may vary substantially across plugins:
  - the plugin may be fully general and transform a data model which is as general-purpose as the tree model (type-generic plugin). A plugin for data sources that expose arbitrary instances of XML data or RDF data through some standard API, for example, falls within this category. In this case, the meaning and shape of the trees may be unconstrained in principle, or it may be constrained only at the point of binding to specific data sources;
  - the plugin may be extremely specific and transform a concrete data model into trees with well-defined structures, i.e. abiding by a statically defined set of constraints on edge labels and leaf values (type-specific plugin). In this case, the tree model of the service is used as a general-purpose carrier for the original data model. The plugin documentation will include the definition of its tree type, anything ranging from narrative to formal XML Schema definitions. The definition may be specific to the plugin or else reflect wider consensus;
  - most plugins will always work with a single tree type, but this is not a constraint imposed by the framework. The plugin may support transformations into a number of tree types and allow binding clients to indicate in their bind requests the type they desire sources to be bound with (multi-type plugin). The plugin may embed the transformations, or else take a more dynamic approach and define a framework for transformers which are discoverable on the classpath;
  - the plugin may support a single transformation which outputs trees that can be assigned multiple types, from more generic to more specific types, e.g. a generic RDF type as well as a more specific type associated with some RDF schema.
- what requests does it support?
  - all plugins must accept at least one form of bind request, but a plugin may support many so as to cater for different types of bindings, or to support reconfiguration of a previous binding;
  - most plugins bind a single source per bind request, but some may bind many at once for some requests;
  - most plugins support read requests but do not support write requests, typically because the bound sources are static, or grant write access only to privileged clients. In principle at least, the converse may apply and a plugin may grant only write access to the sources. Overall, a plugin may support one of the following access modes: read-only, write-only, or read-write.
- what functional and QoS limitations does it have? Rarely will the API and tree model of the service prove functionally equivalent to those of the bound sources. Even if the plugin restricts its support to a particular access mode, e.g. read-only, it may not be able to support all the requests associated with that mode, or to support them all efficiently. For example, its bound sources may:
  - offer no lookup API because they do not mandate or regulate the existence of identifiers in the data;
  - offer no query API, or else support (the equivalent of) a subset of the patterns that clients may indicate in query requests;
  - not allow the plugin to retrieve, add, or update many trees at once;
  - not support updates, or not support partial updates, or not support deletions.
- in some cases, the plugin may be able to compensate for differences, typically at the cost of a reduction in QoS. For example, the plugin may:
  - be configured at binding time with queries that model lookups, differently for different bound sources;
  - partially transform patterns and then do local matches on the results returned by sources (2-phase match);
  - fetch the data first and then apply the updates locally, if the bound sources do not support partial updates.
- in other cases, such as when bound sources do not support deletions, the plugin has no other obvious option but to fail requests.
Answering the questions above fixes some of the free variables in plugin design and helps to characterise it ahead of implementation. Collectively, the answers define a "profile" for the plugin and the presentation of this profile should have a central role in its documentation.
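As an illustration of the 2-phase match mentioned among the compensation strategies above, the sketch below assumes a hypothetical source that can filter its items only by author. The plugin delegates that part of the query to the source and applies the remaining constraint locally. Item, RemoteSource and TwoPhaseMatcher are illustrative names, and java.util.function.Predicate merely stands in for the pattern API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical item held by a source: an author and a title.
class Item {
    final String author;
    final String title;
    Item(String author, String title) { this.author = author; this.title = title; }
}

// Hypothetical source API: it can filter by author only.
class RemoteSource {
    private final List<Item> items;
    RemoteSource(List<Item> items) { this.items = items; }

    List<Item> findByAuthor(String author) {
        List<Item> result = new ArrayList<>();
        for (Item item : items) {
            if (item.author.equals(author)) result.add(item);
        }
        return result;
    }
}

class TwoPhaseMatcher {

    // Phase 1: delegate to the source the part of the query that it understands.
    // Phase 2: apply the residual constraint locally to the partial results.
    static List<Item> match(RemoteSource source, String author, Predicate<Item> residual) {
        List<Item> matches = new ArrayList<>();
        for (Item candidate : source.findByAuthor(author)) {
            if (residual.test(candidate)) matches.add(candidate);
        }
        return matches;
    }
}
```

The cost in QoS is visible in the loop: every item that passes the first phase is transferred to the plugin, including those that the second phase later discards.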
Plugin, PluginLifecycle, and Environment
A plugin implements the following methods of the Plugin interface:
- String name(): returns the name of the plugin. The service will publish it and its clients may use it to discover instances of the service which have been extended with the plugin;
- String description(): returns a brief description of the plugin. The service will publish it so that it can be inspected and displayed by a range of clients;
- List<Property> properties(): returns triples (name, value, description), all String-valued. The service will publish them and its clients may use them to identify instances of the service which have been extended with the plugin. The plugin decides what properties may be useful to clients for discovery, inspection, or display. For example, if the plugin is multi-type, it will probably list the types that it supports here. The implementation returns null or an empty list if it has no properties to publish;
- SourceBinder binder(): returns the plugin's implementation of the SourceBinder interface. The service will relay bind requests to it;
- List<String> requestSchemas(): returns the schemas of the bind requests that the plugin can process. The service will publish them to instruct binding clients on how to formulate their bind requests. There are GUIs within the system that use the schemas to generate forms for the interactive formulation of bind requests. The implementation may return null and decide to document its expectations elsewhere. If it does not return null, it is free to use any schema language of choice, though the existing GUIs expect XML Schema, which is thus the recommended language. Note that, in the common case in which the plugin models requests with Java classes and uses JAXB as the standard data binding solution, it can easily generate schemas directly in the implementation of the method, using JAXBContext.generateSchema();
- boolean isAnchored(): returns an indication of whether the plugin is anchored, i.e. stores data locally to the service. If true, the service will inhibit its internal replication schemes. In the common case in which the plugin targets remote data sources, the implementation will simply return false.
As mentioned above, a plugin that needs more control over its own lifetime can implement PluginLifecycle, which extends Plugin with the following callback methods:
- void init(Environment): invoked when the plugin is first loaded;
- void stop(Environment): invoked when the plugin is unloaded.
For example, the plugin may implement init() to start up a DI container such as Spring, Guice, or CDI.
Environment is implemented by the service to encapsulate access to the environment in which the plugin is deployed or undeployed. At the time of writing, it serves solely as a sandbox over the location of the file system which may be accessed by the plugin. Accordingly, it exposes only the following method:
- File file(path): returns a file with a given path relative to the storage location.
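A minimal Plugin implementation might look as follows. This is only a sketch: the Plugin, Property and SourceBinder types declared here are simplified stand-ins that mirror the method signatures listed above, not the framework's real declarations, and MyPlugin with its names and values is hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for the framework types (assumptions).
interface SourceBinder { }

class Property {
    final String name, value, description;
    Property(String name, String value, String description) {
        this.name = name; this.value = value; this.description = description;
    }
}

interface Plugin {
    String name();
    String description();
    List<Property> properties();
    SourceBinder binder();
    List<String> requestSchemas();
    boolean isAnchored();
}

// Hypothetical plugin for remote, read-only sources.
class MyPlugin implements Plugin {

    private final SourceBinder binder = new SourceBinder() { };

    @Override public String name() { return "my-plugin"; }

    @Override public String description() {
        return "Exposes the data of hypothetical remote sources as trees";
    }

    // a single property that advertises the tree type supported by the plugin
    @Override public List<Property> properties() {
        return Collections.singletonList(
            new Property("type", "my-tree-type", "the tree type produced by the plugin"));
    }

    @Override public SourceBinder binder() { return binder; }

    // null: expectations on bind requests are documented elsewhere
    @Override public List<String> requestSchemas() { return null; }

    // the data is held remotely, hence the plugin is not anchored
    @Override public boolean isAnchored() { return false; }
}
```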
SourceBinder
Whenever clients request bindings of data sources, the service consults the Plugin implementation discussed above and obtains a SourceBinder. It then invokes its single method:
List<Source> bind(Element)
The SourceBinder attempts the binding on the basis of the information found in the client request, and it returns a list of corresponding Sources. Note that:
- the service ignores the particular shape of the request and passes it to bind() as a DOM Element. The plugin may inspect the request with the DOM API, any other XML API, or by binding the request to some Java class, e.g. using JAXB;
- the plugin may accept a single type of request or many alternative types;
- the plugin must throw an InvalidRequestException if the request is unrecognised or otherwise invalid, and a generic Exception for any other problem that it may encounter in the execution of bind();
- in most cases, the request will result in the binding of a single data source, providing precise coordinates to identify it (e.g. an endpoint address). In some cases, however, the request may provide less precise information, and the plugin may identify and bind many data sources at once from it. This explains the List type of the return value.
The actual binding process may vary significantly across plugins. For many, it may be as simple as extracting the endpoint of some remote data access service from the request (and checking its availability). For others, it may require discovering such an endpoint through some registry. Yet for others it may be a complex process comprised of a number of local and remote actions.
Finally, it should be noted that:
- the service may not use all the Sources returned by the plugin. In particular, it will discard Sources that the plugin has already bound in previous invocations of bind() (this may occur if two bind requests target overlapping sets of data sources, or because they are identical requests issued by two autonomous clients, or because one request is aimed explicitly at the reconfiguration of sources already bound by the other). Whenever possible, the plugin should avoid side-effects or expensive work in bind(), such as network interactions. Rather, it should defer expensive work to SourceLifecycle.init(), as the service will make this callback only for the Sources that it effectively retains. The minimal amount of work that the plugin must do in bind() is to identify sources and set their SourceLifecycle. We discuss SourceLifecycle below.
- the service sets SourceNotifiers and Environments on Sources only after the SourceBinder returns them from bind(). Accordingly, if the plugin needs to access the file system or notify an event at binding time, it should do so in SourceLifecycle.init() rather than in bind(). This is a corollary of the recommendation made above, i.e. avoid actions with side-effects in bind().
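The recommendations above can be sketched with a binder that inspects the request through the DOM API and does no expensive work. The request format (a bind element with an endpoint attribute), MySource and MyBinder are hypothetical, and InvalidRequestException is declared locally as a stand-in for the framework's exception. In the framework, bind() returns a List of Source instances.

```java
import java.util.Collections;
import java.util.List;
import org.w3c.dom.Element;

// Stand-in for the framework's exception (assumption).
class InvalidRequestException extends Exception {
    InvalidRequestException(String message) { super(message); }
}

// Hypothetical source descriptor, identified by the endpoint it is bound to.
class MySource {
    final String id;
    final String endpoint;
    MySource(String id, String endpoint) { this.id = id; this.endpoint = endpoint; }
}

// Hypothetical binder for requests of the form <bind endpoint="..."/>.
class MyBinder {

    List<MySource> bind(Element request) throws InvalidRequestException {
        if (!"bind".equals(request.getTagName())) {
            throw new InvalidRequestException("unrecognised request: " + request.getTagName());
        }
        // getAttribute() returns an empty string when the attribute is missing
        String endpoint = request.getAttribute("endpoint");
        if (endpoint.isEmpty()) {
            throw new InvalidRequestException("missing endpoint attribute");
        }
        // no network interaction here: the source is only identified,
        // while availability checks are left to the lifecycle's init()
        return Collections.singletonList(new MySource(endpoint, endpoint));
    }
}
```

Using the endpoint as the source identifier is one simple way to make two identical bind requests yield Sources that the service can recognise as the same.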
Source
If Plugin implementations provide the service with information about the plugin, Sources provide it with information about the data sources that become bound to the plugin. They do so by implementing the following methods:
- String id(): returns the identifier of the source, which the service uses to tell sources apart;
- String name(): returns the descriptive name of the source, which the service publishes and clients may use to discover the source;
- String description(): returns a brief description of the source, which the service publishes for reporting purposes;
- List<Property> properties(): returns arbitrary properties of the bound source as triples (name, value, description), all String-valued. The service publishes them and its clients may use them to discover the source. Implementations must return null or an empty list if they have no properties to publish;
- List<QName> types(): returns all the tree types produced and/or accepted by the bound source, as discussed above. These are qualified names that characterise the edge labels and leaf values that the SourceReader produces and that the SourceWriter consumes. The service publishes the types and its clients may use them to discover sources that produce or consume data with expected properties;
- Calendar creationTime(): returns the time at which the source was created (the source, not the Source object). The service publishes this information, but implementations can return null if the source does not expose it;
- boolean isUser(): indicates whether the source ought to be marked as a user-level source or a system-level source. This is not a security option as such, and it does not imply any form of authorisation or query filtering. It is rather a marker that may be used by certain clients to exclude system sources from their processes. In the vast majority of cases, plugins will bind user-level sources. If appropriate, they may be configured by binding clients to bind system-level sources.
The Source properties exposed through the methods above are static in nature, in that the plugin sets them at source binding time. Others are instead dynamic, in that the plugin may update them during the lifetime of the binding:
- Calendar lastUpdate(): returns the time at which the source was last updated (the source, not the Source object). Implementations can return null if the source does not expose this information;
- Long cardinality(): returns the number of elements in the source. Implementations can return null if the data source does not expose this information and they cannot derive it.
The service publishes dynamic properties along with static properties, but it also associates them with topics for notification. Clients can subscribe for changes to the source and be notified when these changes occur. The plugin is responsible for changing these properties and for firing the corresponding event to the service, which then takes over and does the rest. We discuss later how the plugin can fire events.
Besides descriptive information, Sources must provide the service with other components that are logically associated with them:
- SourceLifecycle lifecycle(): returns the lifecycle of the source. The service invokes its methods to notify the occurrence of certain events in the source's lifetime;
- SourceReader reader(): returns the SourceReader of the source. The service invokes its methods to relay read requests to the plugin. Implementations can return null if the plugin does not support read requests. Note that in this case the plugin must support write requests;
- SourceWriter writer(): returns the SourceWriter of the source. The service invokes its methods to relay write requests to the plugin. Implementations can return null if the plugin does not support write requests. Note that in this case the plugin must support read requests.
If the plugin extends the default implementations of SourceLifecycle, SourceReader, or SourceWriter, the methods above can be overridden to restrict their output to more specific classes. This avoids casts in components that access the implementations through Sources, e.g.:
@Override
public MyReader reader() {
    return (MyReader) super.reader();
}
Next, Sources allow the service to set and then access its implementations of Environment and SourceNotifier:
Environment environment();
void setEnvironment(Environment);
SourceNotifier notifier();
void setNotifier(SourceNotifier);
We have discussed above how plugins can use the Environment to access the deployment context of the plugin. We discuss later how they can use the SourceNotifier to notify the service of events that relate to the source.
Note also that Sources may be passivated to disk by the service, as we discuss in more detail below. Source is indeed a Serializable interface, and the final requirement is that implementations honour that interface.
The framework provides an AbstractSource class that implements the interface partially. Source implementations can and should extend it to avoid plenty of boilerplate code (state variables, accessor methods, default values, implementations of equals(), hashCode(), and toString(), shutdown hooks, correct serialisation, etc.). AbstractSource also simplifies the management of dynamic properties, in that it automatically fires a change event whenever the plugin changes the time of last update of Sources.
At its simplest, a Source implementation may take the following form:
public class MySource extends AbstractSource {

    private static final long serialVersionUID = 1L;

    //your additional fields, if any

    public MySource(String id) {
        super(id);
    }

    @Override
    public List<QName> types() {
        //here factored-out in constants because fixed
        return Collections.singletonList(MyConstants.TYPE);
    }

    @Override
    public List<Property> properties() {
        //here factored-out in constants because fixed
        return MyConstants.PROPERTIES;
    }

    //your additional methods, if any
}
SourceLifecycle
The SourceLifecycle interface defines the following callbacks:
- void init(): called by the service during bind requests to initialise the Sources previously bound by the SourceBinder. As discussed above, this is the place to perform actions that are expensive or generate side-effects. If the plugin needs to perform remote interactions or has some tasks to schedule, this is where it should do so. It should also report any failure it encounters, so that the service can relay it to the binding client as the outcome of the request;
- void reconfigure(Element): called by the service during bind requests to reconfigure Sources previously bound by the plugin. As discussed above, this occurs when the SourceBinder returns a Source that it had already produced in previous bind requests. In this case, the service will use the old Source to relay the request and simply discard the one returned last by the SourceBinder. If the plugin does not support reconfiguration, it must throw an InvalidRequestException. If reconfiguration is supported but fails, the plugin must instead throw a generic Exception;
- void stop(): called by the service if it is shutting down, or if it is passivating Sources to storage to release some memory. If the plugin has scheduled tasks for the management of the Sources, this is a good time to gracefully stop them;
- void resume(): called by the service when Sources are revived from storage, either because the service has been restarted after a shutdown, or because the Sources had been passivated to release memory resources but are now needed by service clients. If the plugin has scheduled tasks for the management of the Sources, this is a good time to restart them. If the attempt fails, the plugin should throw the failure so that the service can relay it to clients;
- void terminate(): called by the service to signal that its clients no longer need to access Sources. If the plugin has some resources to release, this is the time to do it, typically after invoking stop() to gracefully stop any scheduled tasks that may be running.
Plugins that need to implement only a subset of the callbacks above can extend LifecycleAdapter and override only the callbacks of interest.
Note also that, like Source, SourceLifecycle is a Serializable interface. The implementation must honour that interface.
Finally, note that all the callbacks assume that SourceLifecycles have access to the associated Sources. Typically, implementations adopt the following pattern:
public class MyLifeCycle extends LifecycleAdapter {

    private static final long serialVersionUID = 1L;

    private final MySource source;

    //additional fields, if any...

    public MyLifeCycle(MySource source) {
        this.source = source;
    }

    //callbacks and additional methods, if any...
}
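To illustrate how the callbacks relate to scheduled tasks and to serialisation, the sketch below starts a periodic task in init() and resume(), and stops it in stop() and terminate(). It is a sketch under assumptions: LifecycleAdapter is redeclared here as a simplified stand-in with no-op callbacks, PollingLifecycle is hypothetical, and the scheduled task is a placeholder counter.

```java
import java.io.Serializable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the framework's LifecycleAdapter (assumption).
class LifecycleAdapter implements Serializable {
    private static final long serialVersionUID = 1L;
    public void init() throws Exception { }
    public void stop() { }
    public void resume() throws Exception { }
    public void terminate() { }
}

// Hypothetical lifecycle that polls its source at regular intervals.
class PollingLifecycle extends LifecycleAdapter {
    private static final long serialVersionUID = 1L;

    // executors are not serialisable: the field is transient and recreated on resume()
    private transient ScheduledExecutorService scheduler;
    final AtomicInteger polls = new AtomicInteger();

    @Override public void init() { start(); }

    @Override public void resume() { start(); }

    @Override public void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }

    @Override public void terminate() {
        stop();
        // any other resource held for the source would be released here
    }

    boolean isRunning() { return scheduler != null; }

    private void start() {
        if (scheduler != null) return;
        // daemon thread, so that a forgotten stop() does not keep the JVM alive
        scheduler = Executors.newSingleThreadScheduledExecutor(task -> {
            Thread thread = new Thread(task);
            thread.setDaemon(true);
            return thread;
        });
        // placeholder task: a real plugin would query the source, e.g. for its cardinality
        scheduler.scheduleAtFixedRate(polls::incrementAndGet, 0, 1, TimeUnit.SECONDS);
    }
}
```

Keeping the scheduler out of the serialised state means that a passivated lifecycle carries only its data, and resume() is the single place where runtime resources are rebuilt.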
SourceEvent, SourceNotifier, and SourceConsumer
SourceEvent is a tagging interface for objects that represent events that relate to data sources and that may only be observed by the plugin. In the interface, two such events are pre-defined as constants:
- SourceEvent.CHANGE: this event occurs in correspondence with a change to the dynamic properties of a Source, such as its cardinality or the time of its last update;
- SourceEvent.REMOVE: this event occurs when a data source is no longer available. Note that this is different from the event that occurs when clients indicate that access to the source is no longer needed (cf. SourceLifecycle.terminate()).
The plugin may have the means to observe these events, e.g. because the data source offers subscription mechanisms, or because it exposes its cardinality and the plugin polls it, or even because the plugin offers write-access to the source and thus observes directly when the source and its cardinality change.
In all these cases, the plugin should report events to the SourceNotifier that the service has set on the Sources, invoking its method:
void notify(SourceEvent);
Note again that, when Sources extend AbstractSource, changing their time of last update automatically fires SourceEvent.CHANGE events. Unless there are other reasons to notify events to the service, the plugin may never have to invoke notify() explicitly.
As already noted above, the service will inject a SourceNotifier in the plugin's Sources only after these are returned to it by SourceBinder.bind(). Any attempt to notify events prior to that moment will fail. For this reason, if the plugin needs to change dynamic properties at binding time, then it should do so in SourceLifecycle.init().
SourceNotifier has a second method that can be invoked to subscribe consumers for SourceEvent notifications:
void subscribe(SourceConsumer,SourceEvent...)
This method subscribes a SourceConsumer to one or more SourceEvents. Normally, plugins will not have to invoke it, as the service will subscribe its own SourceConsumers with the SourceNotifiers.
However, the plugin is free to use the available support for event notification within its own codebase. In this case, the plugin can define its own SourceEvents, and implement and subscribe its own SourceConsumers. These SourceConsumers must implement the single method:
void onEvent(SourceEvent...)
which is invoked by the SourceNotifier with one or more SourceEvents. Normally, the subscriber will receive single event notifications, but the first notification after subscription will carry the history of all the events previously notified by the SourceNotifier.
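Since a notification may carry more than one event, consumers are best written to handle batches. The sketch below shows plugin-defined events and a consumer that records what it receives. SourceEvent and SourceConsumer are redeclared as simplified stand-ins for the framework interfaces, while MyEvent and RecordingConsumer are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the framework interfaces (assumptions).
interface SourceEvent { }

interface SourceConsumer {
    void onEvent(SourceEvent... events);
}

// Plugin-defined events: an enum is a convenient way to implement a tagging interface.
enum MyEvent implements SourceEvent { REFRESH_STARTED, REFRESH_COMPLETED }

// Hypothetical consumer that keeps track of the events it is notified of.
class RecordingConsumer implements SourceConsumer {

    final List<SourceEvent> received = new ArrayList<>();
    final List<Integer> batchSizes = new ArrayList<>();

    @Override
    public void onEvent(SourceEvent... events) {
        // a notification may carry one event or a whole history of events
        batchSizes.add(events.length);
        for (SourceEvent event : events) {
            received.add(event);
        }
    }
}
```

For example, a consumer that is first notified of two past events and then of a single new one ends up with three recorded events, delivered in batches of sizes 2 and 1.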
Auxiliary APIs
All the previous interfaces provide a skeleton around the core functionality of the plugin, which is to transform the API and the tree model of the service to those of the bound sources. The task requires familiarity with three APIs defined outside the framework:
- the tree API, with which the plugin constructs and deconstructs the edge-labelled trees that it accepts in write requests and/or returns in read requests;
- the pattern API, with which the plugin constructs and deconstructs the patterns that characterise the trees returned by read requests. If the plugin supports such requests, it must ensure that it returns only trees that match given patterns, and in fact only the matching portions of those trees;
- the streams API, with which the plugin models the data streams that flow in and out of the plugin. Streams are used in read requests and write requests that take or return many data items at once, such as trees, tree identifiers, or even paths to tree nodes. The streams API models such data streams as instances of the Stream interface, a generalisation of the standard Java Iterator interface which reflects the remote nature of the data. Not all plugins need to implement stream-based operations from scratch, as the framework offers synthetic implementations for them. These implementations, however, are derived from those that work with one data item at a time, hence have very poor performance when the data source is remote. Plugins should use them only when native implementations are not an option because the bound sources do not offer any stream-based or paged bulk operation. When they do, the plugin should feed their transformed outputs into Streams. In a few cases, the plugin may need advanced facilities provided by the streams API, such as fluent idioms to convert, pre-process or post-process data streams.
Documentation on working with trees, tree patterns, and streams is available elsewhere, and we do not replicate it here. The tree API and the pattern API are packaged together in a trees library available in our Maven repositories. The streams API is packaged in a streams library also available in the same repositories. If the plugin uses Maven for build purposes, these libraries are already available on its classpath as indirect dependencies of the framework.
SourceReader
A plugin implements the SourceReader interface to provide a tree view of the data in a bound source. To begin with, a SourceReader implements the following "lookup" methods:
- Tree get(String,Pattern): returns a tree with a given identifier and pruned with a given Pattern. The implementation must throw an UnknownTreeException if the identifier does not identify a tree in the source, and an InvalidTreeException if a tree can be identified but does not match the Pattern;
- Stream<Tree> get(Stream<String>,Pattern): returns trees with given identifiers and pruned with a given Pattern. The implementation must throw a generic Exception if it cannot produce the stream at all, though it must simply not add to it whenever trees cannot be identified or do not match the Pattern.
In addition, a SourceReader implements the following "query" method:
- Stream<Tree> get(Pattern): returns trees pruned with a given Pattern. Again, the implementation must throw a generic Exception if it cannot produce the stream at all, though it must simply not add to it whenever trees do not match the Pattern.
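The different failure semantics of single and bulk lookups can be sketched over an in-memory source. Everything below is a toy model under stated assumptions: Tree is a flat map from edge labels to leaf values, Pattern is a set of required labels, java.util.Iterator stands in for the Stream interface, and the two exceptions are declared locally. None of these reflect the real tree, pattern, or streams APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-ins for the framework exceptions (assumptions).
class UnknownTreeException extends Exception {
    UnknownTreeException(String id) { super("unknown tree " + id); }
}
class InvalidTreeException extends Exception {
    InvalidTreeException(String id) { super("tree " + id + " does not match"); }
}

// Toy tree: an identifier plus a flat map from edge labels to leaf values.
class Tree {
    final String id;
    final Map<String, String> leaves;
    Tree(String id, Map<String, String> leaves) { this.id = id; this.leaves = leaves; }
}

// Toy pattern: the labels that a tree must have; pruning retains only those labels.
class Pattern {
    final Set<String> labels;
    Pattern(String... labels) { this.labels = new HashSet<>(Arrays.asList(labels)); }

    boolean matches(Tree tree) { return tree.leaves.keySet().containsAll(labels); }

    Tree prune(Tree tree) {
        Map<String, String> kept = new HashMap<>(tree.leaves);
        kept.keySet().retainAll(labels);
        return new Tree(tree.id, kept);
    }
}

// Hypothetical reader over an in-memory source.
class InMemoryReader {
    private final Map<String, Tree> trees = new LinkedHashMap<>();

    InMemoryReader(Tree... content) {
        for (Tree tree : content) trees.put(tree.id, tree);
    }

    // single lookup: failures are reported with exceptions
    Tree get(String id, Pattern pattern) throws UnknownTreeException, InvalidTreeException {
        Tree tree = trees.get(id);
        if (tree == null) throw new UnknownTreeException(id);
        if (!pattern.matches(tree)) throw new InvalidTreeException(id);
        return pattern.prune(tree);
    }

    // bulk lookup: unknown or non-matching trees are left out of the result
    Iterator<Tree> get(Iterator<String> ids, Pattern pattern) {
        List<Tree> result = new ArrayList<>();
        while (ids.hasNext()) {
            try {
                result.add(get(ids.next(), pattern));
            } catch (UnknownTreeException | InvalidTreeException skipped) {
                // by contract, the tree is simply not added to the output
            }
        }
        return result.iterator();
    }

    // query: all the trees that match the pattern, pruned
    Iterator<Tree> get(Pattern pattern) {
        List<Tree> result = new ArrayList<>();
        for (Tree tree : trees.values()) {
            if (pattern.matches(tree)) result.add(pattern.prune(tree));
        }
        return result.iterator();
    }
}
```

Deriving the bulk lookup from the single lookup is acceptable here only because the data is in memory. As noted above, with a remote source this approach performs poorly and a native bulk operation is preferable.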