The [[The Tree Manager|Tree Manager]] service may be called to store or retrieve edge-labelled trees. Either way, the data is not necessarily stored locally to service endpoints, nor is it stored as trees. Instead, the data is most often held in remote data sources, managed independently from the service, and exposed by other access services in a variety of forms and through different APIs.
  
The main value proposition of the service is that, in many cases and for many purposes, this variety of data sources may be ignored and the data uniformly accessed under a sufficiently general API and data model. This uniformity defines a basis for interoperability between service clients and data sources. It enables generic clients to implement cross-domain functions - including data indexing, transformation, discovery, transfer, browsing, viewing, etc. - over a single data model, against a single API, and with a consistent set of tools. Similar advantages can be granted to less generic clients that implement domain-specific or application-specific functions, provided that consensus is achieved around conventional uses of the tree model.
The service applies transformations from its own API and tree model to those of the underlying data sources. Transformations are implemented in ''plugins'', libraries developed independently of the service so as to extend its capabilities at runtime. Service and plugins interact through a protocol defined by a set of local interfaces which, collectively, define a ''framework'' for plugin development.
  
The framework is packaged and distributed as a stand-alone library, the <code>tree-manager-framework</code>, and serves as a dependency for both service and plugins.  
  
  
  
  
The library and all its transitive dependencies are available in our Maven [http://maven.research-infrastructures.eu/nexus repositories]. Plugins that are managed with Maven can resolve them with a single dependency declaration:
  
 
<source lang="xml">
<!-- the groupId and version are omitted here: check the repositories above for
     the coordinates of the latest release of tree-manager-framework -->
<dependency>
  <groupId>...</groupId>
  <artifactId>tree-manager-framework</artifactId>
  <version>...</version>
</dependency>
</source>
 
= Overview =
  
The service and its plugins interact in order to notify each other of the occurrence of certain events:
  
* the service observes events that relate to its clients, first and foremost their requests. These events translate into actions that plugins must perform on data sources;
  
* plugins may observe events that relate to data sources, first and foremost changes to their state. These events need to be reported to the service.
  
The framework defines the interfaces through which all the relevant events may be notified.

The most important events are client requests, which can be of one of the following types:

* '''bind request'''
:a client asks the service to "connect" to one or more data sources. The client targets a specific plugin and includes in the request all the information that the plugin needs in order to establish the bindings. The service delivers the request to a <code>SourceBinder</code> provided by the plugin, and it expects back one <code>Source</code> instance for each bound source. The plugin configures the <code>Source</code>s with information extracted or derived from the request.  Thereafter, the service manages the <code>Source</code>s on behalf of the plugin.
  
  


* '''read request'''
:a client asks the service to retrieve trees from a data source that has been previously bound to some plugin. The client may not be aware of the plugin, having only discovered that the service can read data from the target source. The service identifies a corresponding <code>Source</code> from the request and then relays the request to a <code>SourceReader</code> associated with the <code>Source</code>, expecting trees back. It is the job of the reader to translate the request for the API of the data source, and to transform the results returned by the source into trees.  
  
 
 


* '''write request'''
:a client asks the service to add or update trees in a data source that has been previously bound to some plugin. The client knows about the plugin and what type of trees it expects. The service identifies a corresponding <code>Source</code> from the request and then relays the request to a <code>SourceWriter</code> associated with the <code>Source</code>.  Again, it is the job of the writer to translate the request for the API of the target source, including transforming the input trees into the data structures that the source expects.  
 
 
  
  
  
Note that the plugin must provide a <code>SourceBinder</code> and at least one <code>SourceReader</code> or <code>SourceWriter</code>.
 
  
Besides relaying client requests, the service also notifies plugins of key events in the lifetime of their bindings. It does so by invoking event-specific callbacks of <code>SourceLifecycle</code>s associated with <code>Source</code>s. As [[#SourceLifecycle|we shall see]], lifetime events include the initialisation, reconfiguration, passivation, resumption, and termination of bindings.  
  
  
 
[[Image:tree-manager-framework-service-events.png|center]]
  
 
 
These are all the events that the service observes and passes on to plugins. Other events may be observed directly by plugins, including changes in the state of bound sources. These events are predefined <code>SourceEvent</code>s, and plugins report them to <code>SourceNotifier</code>s that the service itself associates with <code>Source</code>s. The service also registers its own <code>SourceConsumer</code>s with <code>SourceNotifier</code>s so as to receive event notifications.
  
  
 
[[Image:tree-manager-framework-plugin-events.png|center]]
  
 
 
All the key components of a plugin are introduced to the service through an implementation of the <code>Plugin</code> interface. From it, the service obtains <code>SourceBinder</code>s and, from the binders, bound <code>Source</code>s. From bound <code>Source</code>s, the service obtains  <code>SourceLifecycle</code>s, <code>SourceReader</code>s, and <code>SourceWriter</code>s.  

In addition, <code>Plugin</code> implementations expose descriptive information about plugins, which the service publishes in the infrastructure and uses in order to manage the plugins. For increased control over their own lifecycle, plugins may implement the <code>PluginLifecycle</code> interface, which extends <code>Plugin</code> with callbacks invoked by the service when it loads and unloads plugins.

To bootstrap the process of component discovery and find <code>Plugin</code> implementations, the service uses the standard <code>ServiceLoader</code> mechanism. Accordingly, plugins include a file <code>META-INF/services/org.gcube.data.tmf.api.Plugin</code> in their Jar distributions, where the file contains a single line with the qualified name of the <code>Plugin</code> or <code>PluginLifecycle</code> implementation which they provide.
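
For example, a plugin whose implementation is a hypothetical class <code>org.acme.tmf.MyPlugin</code> lists that name as the only line of the file. The following sketch shows the standard mechanism with which such implementations can then be discovered on the classpath; it illustrates the principle rather than the service's actual code:

<source lang="java">
// discovery of Plugin implementations via the standard ServiceLoader mechanism;
// org.acme.tmf.MyPlugin is a hypothetical plugin listed in
// META-INF/services/org.gcube.data.tmf.api.Plugin
import java.util.ServiceLoader;

import org.gcube.data.tmf.api.Plugin;

public class PluginDiscovery {

  public static void main(String[] args) {
    for (Plugin plugin : ServiceLoader.load(Plugin.class))
      System.out.println("discovered plugin: " + plugin.name());
  }
}
</source>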
  
 
  
 
[[Image:tree-manager-framework-pliugin-discovery.png|center]]
  
 
This completes our quick overview of the main interfaces and classes provided by the framework. Note at the outset that, besides implementing the interfaces that define the interaction protocol with the service, the plugin is free to design and develop against the framework using any technology that seems appropriate to the task.
 
  
= Design Plan =
  
The framework has been designed to support a wide range of plugins. The following questions characterise the design of a plugin and illustrate some key variations across designs:
  
* ''what sources can the plugin bind to?''
:all plugins bind and access data sources, but their knowledge of the sources may vary. In particular:
  
: a '''source-specific plugin''' targets a given data source, typically with a custom API and data model;
: a '''source-generic plugin''' targets an open-ended class of data sources with the same API and data model, typically a standard.
  
* ''what type of trees does it accept and/or return?''
:all plugins transform data to trees and/or from trees, but the structure and intended semantics of those trees, i.e. their '''tree type''', may vary substantially from plugin to plugin. In particular:
  
: a '''type-specific plugin''' transforms a concrete data model into a corresponding tree type defined by the plugin or else through broader consensus. Effectively, the plugin uses the tree model of the service as a general-purpose carrier for the target data model. The definition of the tree type may range from narrative documentation to formal XML Schemas. Source-specific plugins are typically also type-specific.
  
: a '''type-generic plugin''' transforms data to and from a model which is as general-purpose as the tree model. The tree type of the plugin may be entirely unconstrained, or it may be constrained at the point of binding to specific data sources. For example, a plugin may target generic XML or RDF repositories.
  
: a '''multi-type plugin''' transforms a concrete data model into a number of tree types, based on directives included in bind requests. The plugin may embed the required transformations, or take a more dynamic approach and define a framework that can be extended by an unbound number of transformers. The plugin may also use a single transformation but assign multiple types to its trees, from more generic to more specific types (e.g. a generic RDF type as well as a more specific type associated with some RDF schema).
  
* ''what requests does the plugin support?''
:all plugins must accept at least one form of bind request, but a plugin may support many so as to cater for different types of bindings, or to support reconfiguration of previous bindings. For example, most plugins:
  
:* bind a single source per request, but some may bind many sources with a single request;
:* support read requests but do not support write requests, typically because the bound sources are static or because they grant write access only to privileged clients. At least in principle, the converse may apply and a plugin may grant only write access to the sources. In general, a plugin may support one of the following access modes: ''read-only'', ''write-only'', or ''read-write''.
  
* ''what functional and QoS limitations does the plugin have?''
:rarely will the API and tree model of the service prove functionally equivalent to those of bound sources. Even if a plugin restricts its support to a particular access mode, e.g. read-only, it may not be able to support all the requests associated with that mode, or to support them all efficiently. For example, the bound sources:

:* may offer no lookup API because they do not mandate or regulate the existence of identifiers in the data;
:* may offer no query API, or support only (the equivalent of) a subset of the filters that clients may specify in query requests;
:* may not allow the plugin to retrieve, add, or update many trees at once;
:* may not support updates at all, may not support partial updates, or may not support deletions.

:In some cases, the plugin may be able to compensate for differences, typically at the cost of QoS reduction. For example:

:* if the bound sources do not support lookups, the plugin may be configured in bind requests with queries that simulate them;
:* if the bound sources support only some filters, the plugin may apply additional ones on the data returned by the sources;
:* if the bound sources do not support partial updates, the plugin may first fetch the data and then update it locally.

: In other cases, for example when bound sources do not support deletions, the plugin has no other obvious option but to reject client requests.

Answering the questions above fixes some of the free variables in plugin design and helps to characterise it ahead of implementation. Collectively, the answers define a "profile" for the plugin, and the presentation of this profile should have a central role in its documentation.


= Implementation Plan =


Moving from the design of a plugin to its implementation, our [[#Overview|overview]] shows that the framework expects the following components:

:* a [[#Plugin, PluginLifecycle, and Environment|<code>Plugin</code>]] implementation, which describes the plugin to the service;
:* a [[#SourceBinder|<code>SourceBinder</code>]] implementation, which binds data sources from client requests;
:* a [[#Source|<code>Source</code>]] implementation, which describes a bound source to the service;
:* a [[#SourceLifecycle|<code>SourceLifecycle</code>]] implementation, which defines actions triggered at key events in the management of a bound source;
:* a [[#SourceReader|<code>SourceReader</code>]] implementation and/or a [[#SourceWriter|<code>SourceWriter</code>]] implementation, which provide read and write access over a bound source.

as well as:

:* a classpath resource <code>META-INF/services/org.gcube.data.tmf.api.Plugin</code>, which allows the service to discover the plugin.

Many of the components above give only structure to the plugin and most are straightforward to implement. As we shall see, the framework includes partial implementations of many interfaces that further simplify the development of the plugin, by reducing boilerplate code or providing overridable defaults.

With this support, the complexity of the plugin concentrates in the implementation of <code>SourceReader</code>s and/or <code>SourceWriter</code>s, where it varies proportionally to the capabilities of the data sources and the sophistication required in transforming their access APIs and data models.

In the rest of this guide we look at each interface defined by the framework in more detail, discussing the specifics of its methods and providing advice on how to implement it.
  
 
= Plugin, PluginLifecycle, and Environment  =
  
The implementation of a plugin begins with the <code>Plugin</code> interface. The interface defines the following methods, which the service invokes to gather information about the plugin when it first loads it:
  
* <code>String name()</code>
:returns the name of the plugin. The service publishes this information so that its clients may find the service endpoints where the plugin is available.
  
* <code>String description()</code>
:returns a free-form description of the plugin. The service publishes this information so that it can be displayed to users through a range of interactive clients (e.g. monitoring tools);
  
* <code>List<Property> properties()</code>
:returns triples of (property name, property value, property description), all <code>String</code>-valued. The service publishes this information so that it can be displayed to users through a range of interactive clients (e.g. monitoring tools). The implementation must return <code>null</code> or an empty list if it has no properties to publish;
 
   
 
   
* <code>SourceBinder binder()</code>
: returns an implementation of the <code>SourceBinder</code> interface. The service invokes this method whenever it receives a bind request, as discussed [[#SourceBinder|below]]. Typically, <code>SourceBinder</code> implementations are stateless, and <code>binder()</code> always returns the same instance.
 
   
 
   
* <code>List<String> requestSchemas()</code>
:returns the schemas of the bind requests that the plugin can process. The service publishes this information to show how clients can use the plugin to bind data sources. There are interactive clients within the system that use the request schemas to generate forms for the interactive formulation of bind requests. The implementation may return <code>null</code> and document its expectations elsewhere. If it does not return <code>null</code>, it is free to use any schema language of choice, though XML Schema remains the recommended language. In the common case in which bind requests are bound to Java classes with JAXB, the plugin can generate schemas dynamically using <code>JAXBContext.generateSchema()</code>.
  
* <code>boolean isAnchored()</code>
:returns an indication of whether the plugin is ''anchored'', i.e. stores data locally to service endpoints and does not access remote data sources. If the implementation returns <code>true</code>, the service inhibits its internal replication schemes for the plugin. If the plugin accesses remote data sources, the implementation must return <code>false</code>.
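
Putting the methods together, a minimal <code>Plugin</code> implementation may look like the following sketch. The class, the <code>MyBinder</code> helper (shown [[#SourceBinder|below]]), and the property values are purely illustrative, and the exact package layout of the API and the constructor of <code>Property</code> are assumptions to be checked against the framework:

<source lang="java">
// a minimal sketch of a Plugin implementation; names and values are illustrative
import java.util.Arrays;
import java.util.List;

import org.gcube.data.tmf.api.Plugin;
import org.gcube.data.tmf.api.Property;
import org.gcube.data.tmf.api.SourceBinder;

public class MyPlugin implements Plugin {

  // binders are typically stateless, so a single instance can be returned every time
  private final SourceBinder binder = new MyBinder();

  @Override
  public String name() {
    return "acme-plugin";
  }

  @Override
  public String description() {
    return "Binds Acme repositories and exposes their records as trees.";
  }

  @Override
  public List<Property> properties() {
    // the Property constructor (name, value, description) is an assumption
    return Arrays.asList(new Property("tree-type", "acme/record", "the tree type produced by the plugin"));
  }

  @Override
  public SourceBinder binder() {
    return binder;
  }

  @Override
  public List<String> requestSchemas() {
    return null; // bind requests are documented elsewhere in this sketch
  }

  @Override
  public boolean isAnchored() {
    return false; // the plugin targets remote data sources
  }
}
</source>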
  
As mentioned [[#Overview|above]], a plugin that needs more control over its own lifetime can implement <code>PluginLifecycle</code>, which extends <code>Plugin</code> with the following methods:
  
* <code>void init(Environment)</code>
:is invoked by the service when the plugin is first loaded;
  
* <code>void stop(Environment)</code>
:is invoked by the service when the plugin is unloaded.
  
For example, the plugin may implement <code>init()</code> to start up a DI container such as Spring, Guice, or CDI.
  
<code>Environment</code> is implemented by the service to encapsulate access to the environment in which the plugin is deployed or undeployed. At the time of writing, it serves solely as a sandbox over the file system location which the plugin is allowed to read from and write to. Accordingly, <code>Environment</code> exposes only the following method:
  
* <code>File file(path)</code>
:returns a <code>File</code> with a given path relative to the storage location of the plugin. The plugin may then use the <code>File</code> to create new files or read existing files.
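
As a sketch of how the two interfaces fit together, the plugin above could opt into lifecycle callbacks as follows; the file name and the use made of the sandbox are purely illustrative:

<source lang="java">
// a sketch of optional lifecycle callbacks; MyPlugin is the sketch shown earlier
import java.io.File;

import org.gcube.data.tmf.api.Environment;
import org.gcube.data.tmf.api.PluginLifecycle;

public class MyLifecyclePlugin extends MyPlugin implements PluginLifecycle {

  private File cache;

  @Override
  public void init(Environment environment) {
    // the Environment sandboxes file access to the storage location of the plugin
    cache = environment.file("bindings.cache");
  }

  @Override
  public void stop(Environment environment) {
    // release resources acquired in init(), e.g. discard the cache file
    if (cache != null && cache.exists())
      cache.delete();
  }
}
</source>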
  
 
= SourceBinder =
  
When it needs to relay bind requests to a given plugin, the service accesses its <code>Plugin</code> implementation and asks for a <code>SourceBinder</code>, as discussed [[#Plugin, PluginLifecycle, and Environment|above]]. It then invokes the single method of the binder:
  
 
<code>List&lt;Source&gt; bind(Element)</code>
  
The method returns one or more <code>Source</code>s, each of which represents a successful binding with a data source. The binding process is driven by the DOM element in input, which captures the request received by the service. It is up to the binder to inspect the request with some XML API, including data binding APIs such as JAXB, and act upon it. If the binder does not recognise the request, or else finds it invalid, then it must throw an <code>InvalidRequestException</code>. The binder can throw a generic <code>Exception</code> for any other failure that occurs in the binding process, as the service will deal with it.

As to the binding process itself, note that:

:* the process may vary significantly across plugins. For many, it may be as simple as extracting from the request the addresses of service endpoints that provide access to existing data sources. For others, it may require discovering such addresses through a registry. Yet for others it may be a complex process comprised of a number of local and remote actions.

:* in most cases, the binder will bind a single data source, hence return a single <code>Source</code>. In some cases, however, the binder may bind many data sources from a single request, hence return multiple <code>Source</code>s.

Finally, note that:

:* the service discards all the <code>Source</code>s that the binder has returned from previous invocations of <code>bind()</code>. For this reason, the binder should avoid performing expensive work in <code>bind()</code>, e.g. engage in network interactions. As we discuss [[#SourceLifecycle|below]], the plugin should carry out this work in <code>SourceLifecycle.init()</code>, which the service invokes only for the <code>Source</code>s that it retains. The minimal amount of work that the binder should perform in <code>bind()</code> is to identify data sources and build corresponding <code>Source</code>s.

:* the service configures a number of objects on the <code>Source</code>s returned by the binder, including a [[#SourceEvent, SourceNotifier, and SourceConsumer|<code>SourceNotifier</code>]] and an [[#Plugin, PluginLifecycle, and Environment|<code>Environment</code>]]. As these objects are not yet available in <code>bind()</code>, the binder must not access the file system or fire events. This is a corollary of the previous recommendation, i.e. avoid side-effects in <code>bind()</code>.
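
The following sketch illustrates a binder that extracts an endpoint address from requests of a hypothetical form <code>&lt;request&gt;&lt;endpoint&gt;...&lt;/endpoint&gt;&lt;/request&gt;</code>. The API and exception packages, the exception constructor, and the <code>throws</code> clause are assumptions based on the discussion above, and <code>MySource</code> is the illustrative <code>Source</code> implementation sketched in the [[#Source|next section]]:

<source lang="java">
// a minimal sketch of a SourceBinder; the request format and MySource are illustrative
import java.util.Collections;
import java.util.List;

import org.gcube.data.tmf.api.Source;
import org.gcube.data.tmf.api.SourceBinder;
import org.gcube.data.tmf.api.exceptions.InvalidRequestException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MyBinder implements SourceBinder {

  @Override
  public List<Source> bind(Element request) throws Exception {

    // inspect the request with the DOM API (JAXB would work equally well)
    NodeList endpoints = request.getElementsByTagName("endpoint");

    if (endpoints.getLength() == 0)
      throw new InvalidRequestException("no endpoint in bind request"); // constructor arguments are illustrative

    String endpoint = endpoints.item(0).getTextContent();

    // identify the source and build a corresponding Source;
    // expensive work is deferred to SourceLifecycle.init()
    return Collections.<Source>singletonList(new MySource(endpoint));
  }
}
</source>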

= Source =

Plugins implement the <code>Source</code> interface to provide the service with information about the data sources bound by the [[#SourceBinder|<code>SourceBinder</code>]]. The service publishes this information on behalf of the plugin, and clients may use it to discover sources available for read and/or write access.
Specifically, the interface defines the following methods:

* <code>String id()</code>
:returns an identifier for the bound source.

* <code>String name()</code>
: returns a descriptive name for the bound source.

* <code>String description()</code>
: returns a brief description of the source.

* <code>List<Property> properties()</code>
:returns properties for the bound source as triples (property name, property value, property description), all <code>String</code>-valued. These properties mirror the properties returned by [[#Plugin, PluginLifecycle, and Environment|<code>Plugin.properties()</code>]], though they relate to a single source rather than the plugin. Implementations must return <code>null</code> or an empty list if they have no properties to publish.

* <code>List<QName> types()</code>
:returns all the [[#Design Plan|tree types]] produced and/or accepted by the plugin for the bound source.

* <code>Calendar creationTime()</code>
:returns the time at which the bound source was created. Note that this is the creation time of the bound source, not of the <code>Source</code> instance. Implementations must return <code>null</code> if the plugin has no means to obtain this information.

* <code>boolean isUser()</code>
:indicates whether the bound source is intended for general access. This is not a security option as such, and it does not imply any form of authorisation or query filtering. Rather, it's a marker that may be used by certain clients to exclude certain sources from their processes. Most plugins always bind user-level sources, hence return <code>true</code> systematically. If appropriate, the plugin can be designed to take hints from bind requests.

The callbacks above expose static information about bound sources, i.e. the plugin can set them at binding time. Others are instead dynamic, in that the plugin may update them during the lifetime of the source binding. The service publishes these properties along with the static properties, but it also allows clients to be notified of their changes. The plugin is responsible for observing and relaying these changes to the service, as we discuss [[#SourceEvent, SourceNotifier, and SourceConsumer|below]]. The dynamic properties are:

* <code>Calendar lastUpdate()</code>
:returns the time at which the bound source was last updated. Note that this is the last update time of the bound source, not of the <code>Source</code> instance. Implementations must return <code>null</code> if the plugin has no means to obtain this information.

* <code>Long cardinality()</code>
:returns the number of elements in the bound source. Again, implementations must return <code>null</code> if the plugin has no means to obtain this information.

Besides descriptive information, the service obtains from <code>Source</code>s other plugin components which are logically associated with the bound source. The relevant callbacks are:

* <code>SourceLifecycle lifecycle()</code>
:returns the [[#SourceLifecycle|<code>SourceLifecycle</code>]] associated with the bound source;

* <code>SourceReader reader()</code>
:returns the [[#SourceReader|<code>SourceReader</code>]] associated with the bound source. Implementations must return <code>null</code> if the plugin does not support read requests. Note that in this case the plugin must support write requests;

* <code>SourceWriter writer()</code>
:returns the [[#SourceWriter|<code>SourceWriter</code>]] associated with the bound source. Implementations must return <code>null</code> if the plugin does not support write requests. Note that in this case the plugin must support read requests.

Finally, note that:

:* <code>Source</code>s may be passivated to disk by the service, as we discuss in more detail [[#SourceLifecycle|below]]. For this reason, <code>Source</code> is a <code>Serializable</code> interface, and implementations must honour this contract.

:* the framework provides an <code>AbstractSource</code> class that implements the interface partially. <code>Source</code> implementations can and should extend it to avoid a great deal of boilerplate code (state variables, accessor methods, default values, implementations of <code>equals()</code>, <code>hashCode()</code>, and <code>toString()</code>, shutdown hooks, correct serialisation, etc.). <code>AbstractSource</code> also simplifies the management of dynamic properties, in that it automatically fires a change event whenever the plugin changes the time of last update of <code>Source</code>s.
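
As an illustration, the following sketch shows a read-only <code>Source</code> written directly against the interface, covering only the methods discussed above. In practice implementations should extend <code>AbstractSource</code>, not least because the real interface also carries the objects that the service injects after binding. All names, values, and package locations below are illustrative assumptions, and <code>MyLifecycle</code> and <code>MyReader</code> are sketched in later sections:

<source lang="java">
// a sketch of a read-only Source for a hypothetical Acme repository
import java.util.Calendar;
import java.util.Collections;
import java.util.List;

import javax.xml.namespace.QName;

import org.gcube.data.tmf.api.Property;
import org.gcube.data.tmf.api.Source;
import org.gcube.data.tmf.api.SourceLifecycle;
import org.gcube.data.tmf.api.SourceReader;
import org.gcube.data.tmf.api.SourceWriter;

public class MySource implements Source {

  private static final long serialVersionUID = 1L; // Source is Serializable

  private final String endpoint;

  public MySource(String endpoint) {
    this.endpoint = endpoint;
  }

  @Override public String id() { return "acme-" + endpoint; }
  @Override public String name() { return "Acme repository at " + endpoint; }
  @Override public String description() { return "An Acme repository exposed as a source of trees."; }
  @Override public List<Property> properties() { return Collections.emptyList(); }
  @Override public List<QName> types() { return Collections.singletonList(new QName("http://acme.org", "record")); }
  @Override public Calendar creationTime() { return null; } // not known for this source
  @Override public boolean isUser() { return true; }
  @Override public Calendar lastUpdate() { return null; }   // not known for this source
  @Override public Long cardinality() { return null; }      // not known for this source
  @Override public SourceLifecycle lifecycle() { return new MyLifecycle(); }
  @Override public SourceReader reader() { return new MyReader(endpoint); }
  @Override public SourceWriter writer() { return null; }   // the plugin is read-only
}
</source>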

=SourceLifecycle=

Plugins implement the <code>SourceLifecycle</code> interface to be called back at key points in the lifetime of their source bindings.
The interface defines the following callbacks:

* <code>void init()</code>
:the service calls this method on new bindings. As discussed [[#SourceBinder|above]], a plugin can use this method to carry out expensive initialisation processes or produce side-effects. If the plugin needs to engage in remote interactions or has some tasks to schedule, this is the place where it should do it. Failures thrown from this method fail bind requests.

* <code>void reconfigure(Element)</code>
:the service calls this method on existing bindings when clients attempt to rebind the same source. The plugin can use the bind request to reconfigure the existing binding. If the plugin does not support reconfiguration, the implementation must throw an <code>InvalidRequestException</code>. If reconfiguration is possible but fails, the implementation must throw a generic <code>Exception</code>.

* <code>void stop()</code>
:the service calls this method when it is shutting down, or when it is passivating the bindings to storage in order to release some memory. If the plugin schedules some tasks, this is where it should stop them.

* <code>void resume()</code>
:the service calls this method when it restarts after a previous shutdown, or when clients need a binding that was previously passivated by the service. If the plugin schedules some tasks, this is where it should restart them. If the binding cannot be resumed, the implementation must throw the failure so that the service can handle it.

* <code>void terminate()</code>
:the service calls this method when clients no longer need access to the bound source. If the plugin has some resources to release, this is where it should do it, typically after invoking <code>stop()</code> to gracefully stop any scheduled tasks that may still be running.

Note that:

* plugins that need to implement only a subset of the callbacks above can extend <code>LifecycleAdapter</code> and override only the callbacks of interest.

* like <code>Source</code>, <code>SourceLifecycle</code> is a <code>Serializable</code> interface. The implementation must honour this interface.
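
As a sketch, a lifecycle for the illustrative plugin of the previous sections might schedule a polling task on initialisation and stop it on passivation. The package of <code>LifecycleAdapter</code> and the polling logic are assumptions:

<source lang="java">
// a sketch of a lifecycle built on LifecycleAdapter; the polling task is illustrative
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.gcube.data.tmf.impl.LifecycleAdapter;

public class MyLifecycle extends LifecycleAdapter {

  private static final long serialVersionUID = 1L; // SourceLifecycle is Serializable

  private transient ScheduledExecutorService scheduler;

  @Override
  public void init() {
    // expensive work belongs here rather than in SourceBinder.bind(), e.g. a
    // periodic task that polls the bound source for changes
    scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(this::pollSource, 1, 10, TimeUnit.MINUTES);
  }

  @Override
  public void stop() {
    if (scheduler != null)
      scheduler.shutdownNow();
  }

  @Override
  public void resume() {
    init(); // re-create the transient scheduler after passivation or a restart
  }

  @Override
  public void terminate() {
    stop(); // nothing else to release in this sketch
  }

  private void pollSource() {
    // check the bound source and report changes to the service (see the next section)
  }
}
</source>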

= SourceEvent, SourceNotifier, and SourceConsumer=

During its lifetime, a plugin may have the means to observe events that relate to data sources, typically through subscription or polling mechanisms exposed by the sources. In this case, the plugin may report these events to the <code>SourceNotifier</code>s that the service configures on <code>Source</code>s at binding time.

The framework defines <code>SourceEvent</code>, a tagging interface for objects that represent events that relate to data sources and that may only be observed by the plugin. Two such events are pre-defined as constants of the interface:

* <code>SourceEvent.CHANGE</code>
:this event occurs when the [[#Source|dynamic properties]] of a <code>Source</code> change, such as its cardinality or the time of its last update.

* <code>SourceEvent.REMOVE</code>
:this event occurs when a bound source is no longer available. Note that this is different from the event that occurs when clients indicate that access to the source is no longer needed (cf. <code>SourceLifecycle.terminate()</code>).

If the plugin observes these events, it can report them to the service by invoking the following method of the <code>SourceNotifier</code>:

<code>void notify(SourceEvent);</code>

Note again that:

:* when <code>Source</code>s extend <code>AbstractSource</code>, changing their time of last update automatically fires <code>SourceEvent.CHANGE</code> events. Unless there are other reasons to notify events to the service, the plugin may never have to invoke <code>notify()</code> explicitly.

:* as [[#SourceBinder|already noted]], the service will configure <code>SourceNotifier</code>s on <code>Source</code>s only after these are returned by <code>SourceBinder.bind()</code>. Any attempt to notify events prior to that moment will fail. For this reason, if the plugin needs to change dynamic properties at binding time, then it should do so in <code>SourceLifecycle.init()</code>.
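
For completeness, the following sketch shows an explicit notification, e.g. as it could be issued by the polling task of the previous section. How the plugin reaches the injected <code>SourceNotifier</code> (here, through a constructor) and the API package names are illustrative assumptions:

<source lang="java">
// a sketch of explicit event notification; package names are assumed
import org.gcube.data.tmf.api.SourceEvent;
import org.gcube.data.tmf.api.SourceNotifier;

public class ChangeReporter {

  private final SourceNotifier notifier;

  public ChangeReporter(SourceNotifier notifier) {
    this.notifier = notifier; // the notifier that the service configured on the Source
  }

  // invoked, for example, when polling detects that the bound source has changed
  public void reportChange() {
    notifier.notify(SourceEvent.CHANGE);
  }
}
</source>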

<code>SourceNotifier</code> has a second method that can be invoked to subscribe consumers for <code>SourceEvent</code> notifications:

<code>void subscribe(SourceConsumer,SourceEvent...)</code>

This method subscribes a <code>SourceConsumer</code> to one or more <code>SourceEvent</code>s. Normally, plugins do not have to invoke it, as the service will subscribe its own <code>SourceConsumer</code>s with the <code>SourceNotifier</code>s. In other words, the common flow of events is from the plugin to the service.

However, the plugin is free to make internal use of the available support for event subscription and notification. In particular, the plugin can define its own <code>SourceEvent</code>s and implement and subscribe its own <code>SourceConsumer</code>s. In this case, <code>SourceConsumer</code>s must implement the single method:

<code>void onEvent(SourceEvent...)</code>

which is invoked by the <code>SourceNotifier</code> with one or more <code>SourceEvent</code>s. Normally, the subscriber will receive single event notifications, but the first notification after subscription will carry the history of all the events previously notified by the <code>SourceNotifier</code>.

= Auxiliary APIs =

All the previous interfaces provide a skeleton around the core functionality of the plugin, which is to transform the API and the tree model of the service to those of the bound sources. The task requires familiarity with three APIs defined outside the framework:

* the [[The Trees Library|tree API]]
:the plugin uses this API to construct and deconstruct the edge-labelled trees that it accepts in write requests and/or returns in read requests. The API offers a hierarchy of classes that model whole trees (<code>Tree</code>) as well as individual nodes (<code>Node</code>), fluent APIs to construct object graphs based on these classes, and various APIs to traverse them;

* the [[The Trees Library|pattern API]]
:the plugin uses this API to construct and deconstruct ''tree patterns'', i.e. sets of constraints that clients use in read requests to characterise the trees of interest, both in terms of topology and leaf values. The API offers a hierarchy of patterns (<code>Pattern</code>), methods to fluently construct patterns, as well as methods to ''match'' trees against patterns (i.e. verify that the trees satisfy the constraints, cf. <code>Pattern.match(Node)</code>) and to ''prune'' trees with patterns (i.e. retain only the nodes that have been explicitly constrained, cf. <code>Pattern.prune(Node)</code>). The plugin must ensure that it returns trees that have been pruned with the patterns provided by clients;

* the [[The Streams Library|stream API]]
:the plugin uses this API to model the data streams that flow in and out of the plugin. Streams are used in read requests and write requests that take or return many data items at once, such as trees, tree identifiers, or even paths to tree nodes. The streams API models such data streams as instances of the <code>Stream</code> interface, a generalisation of the standard Java <code>Iterator</code> interface which reflects the remote nature of the data. Not all plugins need to implement stream-based operations from scratch, as the framework offers synthetic implementations for them. These implementations, however, are derived from those that work with one data item at a time, hence have very poor performance when the data source is remote. Plugins should use them only when native implementations are not an option because the bound sources do not offer any stream-based or paged bulk operation. When they do, the plugin should feed their transformed outputs into <code>Stream</code>s. In a few cases, the plugin may need advanced facilities provided by the streams API, such as fluent idioms to convert, pre-process or post-process data streams.

Documentation on working with trees, tree patterns, and streams is available elsewhere, and we do not replicate it here. The tree API and the pattern API are packaged together in a <code>trees</code> library available in our Maven [http://maven.research-infrastructures.eu/nexus repositories]. The streams API is packaged in a <code>streams</code> library also available in the same repositories. If the plugin also uses Maven for build purposes, these libraries are already available in its classpath as indirect dependencies of the framework.

= SourceReader=

A plugin implements the <code>SourceReader</code> interface to process [[#Overview|read requests]]. The interface defines the following methods for "tree lookup":

* <code>Tree get(String,Pattern)</code>
:returns a tree with a given identifier and pruned with a given <code>Pattern</code>. The reader must throw an <code>UnknownTreeException</code> if the identifier does not identify a tree in the source, and an <code>InvalidTreeException</code> if a tree can be identified but does not match the pattern. The reader should report any other failure to the service, i.e. rethrow it.

* <code>Stream<Tree> get(Stream<String>,Pattern)</code>
:returns trees with given identifiers and pruned with a given <code>Pattern</code>. The reader must throw a generic <code>Exception</code> if it cannot produce the stream at all. It must otherwise handle lookup failures for individual trees as <code>get(String,Pattern)</code> does, inserting them into the stream.

In addition, a <code>SourceReader</code> implements the following "query" method:

* <code>Stream<Tree> get(Pattern)</code>
:returns trees pruned with a given <code>Pattern</code>. Again, the reader must throw a generic <code>Exception</code> if it cannot produce the stream at all, though it must simply omit from the stream any trees that do not match the pattern.

Finally, a <code>SourceReader</code> implements lookup methods for individual tree nodes:

* <code>Node getNode(Path)</code>
:returns a node from the <code>Path</code> of node identifiers that connects it to the root of a tree. The reader must throw an <code>UnknownPathException</code> if the path does not identify a tree node.

* <code>Stream<Node> getNodes(Stream<Path>)</code>
:returns nodes from the <code>Path</code>s of node identifiers that connect them to the root of trees. The reader must throw a generic <code>Exception</code> if it cannot produce the stream at all. It must otherwise handle lookup failures for individual paths as <code>getNode(Path)</code> does, inserting them into the stream.

Depending on the capabilities of the bound source, implementing some of the methods above may prove challenging or altogether impossible. For example, if the source offers only lookup capabilities, the reader may not be able to implement query methods. In this sense, notice that the reader is not forced to fully implement any of the methods above. In particular, it can:

* throw an <code>UnsupportedOperationException</code> for all requests to a given method, or
* throw an <code>UnsupportedRequestException</code> for certain requests of a given method.

When this is the case, the plugin should clearly report its limitations in its documentation.

Similarly, the plugin is not forced to implement all methods from scratch. The framework defines a partial implementation of <code>SourceReader</code>, <code>AbstractReader</code>, which the plugin can extend to obtain default implementations of certain methods, including:

* a default implementation of <code>get(Stream<String>,Pattern)</code>;
* a default implementation of <code>getNode(Path)</code>;
* a default implementation of <code>getNodes(Stream<Path>)</code>.

These defaults are derived from the implementation of <code>get(String,Pattern)</code> provided by the plugin. Note, however, that their performance is likely to be poor over remote sources, as <code>get(String,Pattern)</code> moves data one item at a time. For <code>getNode(Path)</code> the problem is marginal, but for stream-based methods the impact on performance is likely to be substantial. The default implementations should thus be considered as ''surrogates'' for real implementations, and the plugin should override them if and when a more direct mapping onto the capabilities of the bound sources exists.

When the reader ''does'' implement the methods above natively, the following issues arise:

* ''applying patterns''
:in some cases, the reader may be able to transform patterns in terms of the querying/filtering capabilities of the bound source. Often, it may be able to do so only partially, i.e. by extracting from the patterns the subset of constraints that it can transform. In this case, the reader would push this subset towards the source, transform the results into trees, and then prune the trees with the original pattern, so as to post-filter the data along the constraints that it could not transform.
:If the bound source offers no querying/filtering capabilities, then the reader must apply the pattern only locally, on the unfiltered results returned by the source. Note that the performance of <code>get(Pattern)</code> in this scenario can be severely compromised if the bound source is remote, as the reader would effectively transfer its entire contents over the network at each invocation of the method. The reader may then opt for not implementing this method at all, or for rejecting requests that use particularly 'inclusive' patterns (e.g. <code>Patterns.tree()</code> or <code>Patterns.any()</code>, which do not constrain trees at all).

* ''transforming data into trees''
:the reader is free to follow the approach and choose the technologies that seem most relevant to the purpose, in that the framework neither limits nor supports any particular choice. It is a good design practice to push the transformations outside the reader, particularly when the plugin supports multiple [[#Design Plan|tree types]], but also to simplify unit testing. The transformation may even be pushed outside the whole plugin and put in a separate library that may be reused in different contexts. For example, it may be reused in another plugin that binds sources through a different protocol but under the same data model. If the transformation works both ways (e.g. because the plugin supports write requests), it may also be reused on the client side, to revert from tree types to the original data models.

* ''streaming data''
:a reader that implements <code>get(Pattern)</code>, or that overrides the surrogate implementations of stream-based methods inherited from <code>AbstractReader</code>, must implement the <code>Stream</code> interface over the bulk transfer mechanisms offered by the bound source. In the common case in which the source uses a paging mechanism, the plugin can provide a 'look-ahead' <code>Stream</code> implementation that fetches a new page of data in <code>hasNext()</code> whenever <code>next()</code> has fully traversed the previously fetched page. Other transfer mechanisms may require more custom solutions.

In summary, the plugin can deliver a simple implementation of <code>SourceReader</code> by:

* implementing <code>get(String,Pattern)</code> and <code>get(Pattern)</code>, and
* inheriting surrogate implementations of all the other methods from <code>AbstractReader</code>.

Alternatively, the plugin may be able to deliver a more performant implementation of <code>SourceReader</code> by:

* inheriting from <code>AbstractReader</code> and
* overriding one or more surrogate implementations with native ones.

Of course, the plugin may be able to deliver native implementations of some methods and not others.
 +
 
 +
=SourceWriter=
  
* the service ignores the particular shape of the request and passes it to <code>bind()</code> as a DOM’s <code>Element</code>. The plugin may inspect the request with the DOM API, any other XML API, or by binding the request to some Java class, e.g. using JAXB;
+
A plugin implements the <code>SourceWriter</code> interface to process [[#Overview|write requests]]. Writers are rarely implemented by plugins that bind to remote sources, which typically offer read-only interfaces. Writers may be implemented instead by plugins that bind to local sources, so as to turn the service endpoint into a storage service for structured data. The [[The Tree Repository|Tree Repository]] is a primary example of this type of plugin.
  
* the plugin may accept a single type of request or many alternative types;
+
The <code>SourceWriter</code> interface defines the following methods to insert new data in the bound source:
  
* the plugin must throw an <code>InvalidRequestException</code> if the request is unrecognised or otherwise invalid, and a generic <code>Exception</code> for any other problem that it may encounter in the execution of <code>bind()</code>;
+
* <code>Tree add(Tree)</code>
 +
:inserts a tree in the bound source and returns the same tree as this has been inserted in the source.  With its signature, the method supports data sources with different insertion models:
 +
:* If the data is annotated at the point of insertion with identifiers, timestamps, versions and similar metadata, the writer can return these annotations back to the client;
 +
:* If instead the data is unmodified at the point of insertion, the writer can return <code>null</code> to the client so as to simulate a true "add-and-forget" model and avoid unnecessary data transfers.
 +
:Fire-and-forget insertions may also be desirable under the first model, when clients have no use for the annotations added to the data at the point of insertion. The plugin may support these clients if it allows them to specify directives in the input tree itself (e.g. special attributes on root nodes). The writer would recognise directives and return <code>null</code> to clients.
 +
:Regardless of the insertion model of the bound source, input trees may be invalid for insertion, e.g. miss required metadata, have metadata that it should not have (e.g. identifiers that should be assigned by the bound source), or be otherwise malformed with respect to insertion requirements. When this happens, the writer must throw an <code>InvalidTreeException</code>.
  
* in most cases, the request will result in the binding of a single data source, providing precise coordinates to identify it (e.g. an endpoint address). In some cases, however, the request may provide less pinpoint information, and the plugin may identify and bind at once many data sources from it. This explains the <code>List</code> type for the return value.
+
* <code>Stream<Tree> add(Stream<Tree>)</code>
 +
:inserts trees in the bound source and returns the outcomes in the same order, where the outcomes are those that <code>add(Tree)</code> would return for each input tree. In particular, the writer must model failures for individual trees as <code>add(Tree)</code> would, inserting them into the stream. It must instead throw a generic <code>Exception</code> if it cannot produce the stream at all.  
  
The actual binding process may vary significantly across plugins. For many, it may be as simple as extracting the endpoint of some remote data access service from the request (and checking its availability). For others, it may require discovering such an endpoint through some registry. Yet for others it may be a complex process comprised of a number of local and remote actions.
 
  
Finally, it should be noted that:
+
The <code>SourceWriter</code> interface also implements the following methods to change data already in the bound source:
  
* the service may not use all the <code>Source</code>s returned by the plugin. In particular, it will discard <code>Source</code>s that the plugin has already bound in previous invocations of <code>bind()</code> (this may occur if two bind requests target overlapping sets of data sources, or because they are identical requests issued from two autonomous clients, or because one request is aimed explicitly to the re-configuration of sources already bound by the other). Whenever possible, the plugin should avoid side-effects or expensive work in <code>bind()</code>, e.g. engage in network interactions. Rather, it should defer expensive work in <code>SourceLifecycle.init()</code>, as the service will make this callback only for <code>Source</code>s that it effectively retains. The minimal amount of work that the plugin must do in <code>bind()</code> is really to identify resources and setting their <code>SourceLifecycle</code>. We discuss <code>SourceLifecycle</code> [[#SourceLifecycle|below]].  
+
* <code>Tree update(Tree)</code>
 +
:updates a given tree in the bound source and returns the same tree as this has been updated in the source. Like with insertions, the signature of the method supports sources with diverse update models:
 +
:* If the bound source models updates in terms of replacement, the input tree may simply encode the new version of the data;
 +
:* if instead the bound source support in-place updates, the input tree may encode no more and no less than the exact changes to be applied to the existing data. The tree API supports in-place updates with the notion of a ''delta tree'', i.e. a special tree that encodes the changes applied to a given tree over time, i.e. contains only the nodes of the tree that have been added, modified, or deleted, marked with a corresponding attribute. The API can also compute the delta tree between a tree and another tree that represents its evolution at a given point in time (cf. <code>Node.delta(</code>)). Clients may thus compute the delta tree for a set of changes and invoke the service with it. The writer may parse delta tree to effect the changes or, more simply, revert to a replacement model of update: retrieve the data to be updated, transform it into a tree, and then use again the tree API to update it with the changes carried in the delta tree (cf. <code>Node.update(Node)</code>).
 +
:under both models, the input tree can carry the directive to delete existing data, rather than modify it.  
 +
:In all cases, the plugin must document the expectations of its writers over the input tree. Note that input tree must allow the writer to identify which data should be updated. If the target data cannot be identified (e.g. it no longer exists in the source), the writer must throw an <code>UnknownTreeException</code>. If the input tree does allow the writer to identify the target data but it does not meet expectations otherwise, then the writer must throw an <code>InvalidTreeException</code>.  
  
* the service sets <code>SourceNotifier</code>s and <code>Environment</code>s on <code>Source</code>s when the <code>SourceBinder</code> returns them from <code>bind()</code>. Accordingly, if the plugin needs to access the file system or notify an event at binding time, it should do so in <code>SourceLifecycle.init()</code> rather than in <code>bind()</code>. This is a corollary of the recommendation made above, i.e. avoid actions with side-effects in <code>bind()</code>.
+
* <code>Stream<Tree> update(Stream<Tree>)</code>
 +
:updates given trees in the bound source and returns the outcomes in the same order, where the outcomes are those that <code>update(Tree)</code> would return for each input tree. In particular, the writer must model failures for individual trees as <code>update(Tree)</code> would, inserting them into the stream. It must instead throw a generic <code>Exception</code> if it cannot produce the stream at all.

Latest revision as of 13:39, 11 July 2012


Overview

The service and its plugins interact in order to notify each other of certain events:

  • the service observes events that relate to its clients, first and foremost their requests. These events translate into actions that plugins must perform on data sources;
  • plugins may observe events that relate to data sources, first and foremost changes to their state. These events need to be reported to the service.

The framework defines the interfaces through which all the relevant events may be notified.

The most important events are client requests, which can be of one of the following types:

  • bind request
a client asks the service to "connect" to one or more data sources. The client targets a specific plugin and includes in the request all the information that the plugin needs in order to establish the bindings. The service delivers the request to a SourceBinder provided by the plugin, and it expects back one Source instance for each bound source. The plugin configures the Sources with information extracted or derived from the request. Thereafter, the service manages the Sources on behalf of the plugin.


Tree-manager-framework-bind-requests.png

  • read request
a client asks the service to retrieve trees from a data source that has been previously bound to some plugin. The client may not be aware of the plugin, having only discovered that the service can read data from the target source. The service identifies a corresponding Source from the request and then relays the request to a SourceReader associated with the Source, expecting trees back. It is the job of the reader to translate the request for the API of the data source, and to transform the results returned by the source into trees.

Tree-manager-framework-read-requests.png


  • write request
a client asks the service to add or update trees in a data source that has been previously bound to some plugin. The client knows about the plugin and what type of trees it expects. The service identifies a corresponding Source from the request and then relays the request to a SourceWriter associated with the Source. Again, it is the job of the writer to translate the request for the API of the target source, including transforming the input trees into the data structures that the source expects.

Tree-manager-framework-write-requests.png


Besides relaying client requests, the service also notifies plugins of key events in the lifetime of their bindings. It does so by invoking event-specific callbacks of SourceLifecycles associated with Sources. As we shall see, lifetime events include the initialisation, reconfiguration, passivation, resumption, and termination of bindings.


Tree-manager-framework-service-events.png


These are all the events that the service observes and passes on to plugins. Other events may be observed directly by plugins, including changes in the state of bound sources. These events are predefined SourceEvents, and plugins report them to SourceNotifiers that the service itself associates with Sources. The service also registers its own SourceConsumers with SourceNotifiers so as to receive event notifications.


Tree-manager-framework-plugin-events.png


 All the key components of a plugin are introduced to the service through an implementation of the Plugin interface. From it, the service obtains SourceBinders and, from the binders, bound Sources. From bound Sources, the service obtains SourceLifecycles, SourceReaders, and SourceWriters.

In addition, Plugin implementations expose descriptive information about the plugin, which the service publishes in the infrastructure and uses to manage the plugin. For increased control over their own lifecycle, plugins may implement the PluginLifecycle interface, which extends Plugin with callbacks invoked by the service when it loads and unloads plugins.

To bootstrap the process of component discovery and find Plugin implementations, the service uses the standard ServiceLoader mechanism. Accordingly, plugins include a file META-INF/services/org.gcube.data.tmf.api.Plugin in their Jar distributions, where the file contains a single line with the qualified name of the Plugin or PluginLifecycle implementation which they provide.


Tree-manager-framework-pliugin-discovery.png


Design Plan

The framework has been designed to support a wide range of plugins. The following questions characterise the design of a plugin and illustrate some key variations across designs:

  • what sources can the plugin bind to?
all plugins bind and access data sources, but their knowledge of the sources may vary. In particular:
a source-specific plugin targets a given data source, typically with a custom API and data model;
a source-generic plugin targets an open-ended class of data sources with the same API and data model, typically a standard.
  • what type of trees does it accept and/or return?
all plugins transform data to trees and/or from trees, but the structure and intended semantics of those trees, i.e. their tree type, may vary substantially from plugin to plugin. In particular:
a type-specific plugin transforms a concrete data model into a corresponding tree type, defined by the plugin or else through broader consensus. Effectively, the plugin uses the tree model of the service as a general-purpose carrier for the target data model. Source-specific plugins are typically also type-specific.
a type-generic plugin transforms data to and from a model which is as general-purpose as the tree model. The tree type of the plugin may be entirely unconstrained, or it may be constrained at the point of binding to specific data sources. For example, a plugin may target generic XML or RDF repositories.
a multi-type plugin transforms a concrete data model into a number of tree types, based on directives included in bind requests. The plugin may embed the required transformations, or take a more dynamic approach and define a framework that can be extended by an unbounded number of transformers. The plugin may also use a single transformation but assign multiple types to its trees, from more generic types to more specific types.
  • what requests does the plugin support?
all plugins must accept at least one form of bind request, but a plugin may support many so as to cater for different types of bindings, or to support reconfiguration of previous bindings. For example, most plugins:
  • bind a single source per request, but some may bind many sources with a single request.
  • support read requests but do not support write requests, typically because the bound sources are static or because they grant write access only to privileged clients. At least in principle, the converse may apply and a plugin may grant only write access to the sources. In general, a plugin may support one of the following access modes: read-only, write-only, or read-write.
  • what functional and QoS limitations does the plugin have?
rarely will the API and tree model of the service prove functionally equivalent to those of bound sources. Even if a plugin restricts its support to a particular access mode, e.g. read-only, it may not be able to support all the requests associated with that mode, or to support them all efficiently. For example, the bound sources:
  • may offer no lookup API because they do not mandate or regulate the existence of identifiers in the data;
  • may offer no query API, or support only (the equivalent of) a subset of the filters that clients may specify in query requests;
  • may not allow the plugin to retrieve, add, or update many trees at once;
  • may not support updates at all, may not support partial updates, or may not support deletions.
In some cases, the plugin may be able to compensate for differences, typically at the cost of QoS reduction. For example:
  • if the bound sources do not support lookups, the plugin may be configured in bind requests with queries that simulate them;
  • if the bound sources support only some filters, the plugin may apply additional ones on the data returned by the sources;
  • if the bound sources do not support partial updates, the plugin may first fetch the data and then update it locally.
In other cases, for example when bound sources do not support deletions, the plugin has no other obvious option but to reject client requests.

Answering the questions above fixes some of the free variables in plugin design and helps to characterise it ahead of implementation. Collectively, the answers define a "profile" for the plugin and the presentation of this profile should have a central role in its documentation.

Implementation Plan

Moving from the design of a plugin to its implementation, our overview shows that the framework expects the following components:

  • a Plugin implementation, which describes the plugin to the service;
  • a SourceBinder implementation, which binds data sources from client requests;
  • a Source implementation, which describes a bound source to the service;
  • a SourceLifecycle implementation, which defines actions triggered at key events in the management of a bound source;
  • a SourceReader implementation and/or a SourceWriter implementation, which provide read and write access over a bound source.

as well as:

  • a classpath resource META-INF/services/org.gcube.data.tmf.api.Plugin, which allows the service to discover the plugin.

Many of the components above give only structure to the plugin and most are straightforward to implement. As we shall see, the framework includes partial implementations of many interfaces that further simplify the development of the plugin, by reducing boilerplate code or providing overridable defaults.

With this support, the complexity of a plugin is concentrated in the implementation of SourceReaders and/or SourceWriters, where it varies with the capabilities of the data sources and the sophistication required in transforming their access APIs and data models.

In the rest of this guide we look at each interface defined by the framework in more detail, discussing the specifics of their methods and providing advice on how to implement them.

Plugin, PluginLifecycle, and Environment

The implementation of a plugin begins with the Plugin interface. The interface defines the following methods, which the service invokes to gather information about the plugin when it first loads it:

  • String name()
returns the name of the plugin. The service publishes this information so that its clients may find the service endpoints where the plugin is available.
  • String description()
returns a free-form description of the plugin. The service publishes this information so that it can be displayed to users through a range of interactive clients (e.g. monitoring tools);
  • List<Property> properties()
returns triples of (property name, property value, property description). The service publishes this information so that it can be displayed to users through a range of interactive clients (e.g. monitoring tools). The implementation must return null or an empty list if it has no properties to publish;
  • SourceBinder binder()
returns an implementation of the SourceBinder interface. The service will invoke this method whenever it receives a bind request, as discussed below. Typically, binders are stateless and plugins implement binder() so as to always return the same instance.
  • List<String> requestSchemas()
returns the schemas of the bind requests that the plugin can process. The service will publish this information to show how clients can use the plugin to bind data sources. There are interactive clients within the system that use the request schema to generate forms for interactive formulation of bind requests. The implementation may return null and decide to document its expectations elsewhere. If it does not return null, it is free to use any schema language of choice, though XML Schema remains the recommended language. In the common case in which bind requests are bound to Java classes with JAXB, the plugin can generate schemas dynamically using JAXBContext.generateSchema().
  • boolean isAnchored()
returns an indication of whether the plugin is anchored, i.e. stores data locally to service endpoints and does not access remote data sources. If the implementation returns true, the service inhibits its internal replication schemes for the plugin. If the plugin accesses remote data sources, the implementation must return false.

As mentioned above, a plugin that needs more control over its own lifetime can implement PluginLifecycle, which extends Plugin with the following methods:

  • void init(Environment)
is invoked by the service when the plugin is first loaded;
  • void stop(Environment)
is invoked by the service when the plugin is unloaded;

For example, the plugin may implement init() to start up a DI container.

Environment is implemented by the service to encapsulate access to the environment in which the plugin is deployed or undeployed. At the time of writing, it serves solely as a sandbox over the file system location which the plugin is allowed to read from and write to. Accordingly, Environment exposes only the following method:

  • File file(path)
returns a File with a given path relative to the storage location of the plugin. The plugin may then use the File to create new files or read existing files.
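
To make the shape of a Plugin implementation concrete, the following sketch shows a minimal, read-only, non-anchored plugin. All class and package names under org.acme.tmf are purely illustrative, and the packages of the framework types other than org.gcube.data.tmf.api.Plugin are assumed; only the method names and semantics documented above are taken from this page.

<source lang="java">
package org.acme.tmf;                                  // hypothetical plugin package

import java.util.List;

import org.gcube.data.tmf.api.Plugin;                  // only the Plugin FQN is documented above,
import org.gcube.data.tmf.api.Property;                // the other framework packages are assumed
import org.gcube.data.tmf.api.SourceBinder;

// Discovered by the service through the ServiceLoader mechanism: the plugin jar also contains
// META-INF/services/org.gcube.data.tmf.api.Plugin with the single line: org.acme.tmf.MyPlugin
public class MyPlugin implements Plugin {

  // binders are typically stateless, so a single instance can be returned every time
  private final SourceBinder binder = new MyBinder();

  @Override
  public String name() { return "acme-plugin"; }

  @Override
  public String description() { return "Exposes ACME repositories as sources of edge-labelled trees."; }

  @Override
  public List<Property> properties() { return null; }     // nothing to publish

  @Override
  public SourceBinder binder() { return binder; }

  @Override
  public List<String> requestSchemas() { return null; }   // bind requests documented elsewhere

  @Override
  public boolean isAnchored() { return false; }            // the data lives in remote sources
}
</source>

A plugin that needs to start or stop resources of its own (e.g. a DI container) would implement PluginLifecycle instead, adding init(Environment) and stop(Environment) to the methods above. MyBinder is sketched in the next section.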

SourceBinder

When it needs to relay bind requests to a given plugin, the service accesses its Plugin implementation and asks for a SourceBinder, as discussed above. It then invokes the single method of the binder:

List<Source> bind(Element)

The method returns one or more Sources, each of which represents a successful binding with a data source. The binding process is driven by the DOM element in input, which captures the request received by the service. It's up to the binder to inspect the request with some XML API, including data binding APIs such as JAXB, and act upon it. If the binder does not recognise the request, or else it finds it invalid, then the binder must throw an InvalidRequestException. The binder can throw a generic Exception for any other failure that occurs in the binding process, as the service will deal with it.

As to the binding process itself, note that:

  • the process may vary significantly across plugins. For many, it may be as simple as extracting from the request the addresses of service endpoints that provide access to existing data sources. For others, it may require discovering such addresses through a registry. Yet for others it may be a complex process comprised of a number of local and remote actions.
  • in most cases, the binder will bind a single data source, hence return a single Source. In some cases, however, the binder may bind many data sources from a single request, hence return multiple Sources.

Finally, note that:

  • the service may discard some of the Sources returned by the binder, in particular those that correspond to data sources already bound in previous invocations of bind(). For this reason, the binder should avoid performing expensive work in bind(), e.g. engage in network interactions. As we discuss below, the plugin should carry out this work in SourceLifecycle.init(), which the service invokes only for the Sources that it retains. The minimal amount of work that the binder should perform in bind() is really to identify data sources and build corresponding Sources.
  • the service configures a number of objects on the Sources returned by the binder, including a SourceNotifier and an Environment. As these objects are not yet available in bind(), the binder must not access the file system or fire events. This is a corollary of the previous recommendation, i.e. avoid side-effects in bind().
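
A minimal binder, under the assumption of a hypothetical bind request that carries a single <endpoint> element, might look as follows. Exception constructors, the throws clause of bind(), and all framework package names are assumed; MySource is sketched in the next section.

<source lang="java">
package org.acme.tmf;  // hypothetical

import java.util.Collections;
import java.util.List;

import org.gcube.data.tmf.api.Source;                               // packages assumed
import org.gcube.data.tmf.api.SourceBinder;
import org.gcube.data.tmf.api.exceptions.InvalidRequestException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MyBinder implements SourceBinder {

  @Override
  public List<Source> bind(Element request) throws Exception {

    // hypothetical request format: <request><endpoint>http://acme.org/repo</endpoint></request>
    NodeList endpoints = request.getElementsByTagName("endpoint");

    if (endpoints.getLength() == 0)
      throw new InvalidRequestException("missing <endpoint> element");

    String endpoint = endpoints.item(0).getTextContent();

    // identify the source and its lifecycle, deferring expensive work to SourceLifecycle.init()
    Source source = new MySource(endpoint);

    return Collections.singletonList(source);
  }
}
</source>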

Source

Plugins implement the Source interface to provide the service with information about the data sources bound by the SourceBinder. The service publishes this information on behalf of the plugin, and clients may use it to discover sources available for read and/or write access. Specifically, the interface defines the following methods:

  • String id()
returns an identifier for the bound source.
  • String name()
returns a descriptive name for the bound source.
  • String description()
returns a brief description of the source.
  • List<Property> properties()
returns properties for the bound source as triples of (property name, property value, property description), all String-valued. These properties mirror the properties returned by Plugin.properties(), though they relate to a single source rather than the plugin. Implementations must return null or an empty list if they have no properties to publish.
  • List<QName> types()
returns all the tree types produced and/or accepted by the plugin for the bound source.
  • Calendar creationTime()
returns the time in which the bound source was created. Note that this is the creation time of bound source, not the Source instance's. Implementations must return null if the plugin has no means to obtain this information.
  • boolean isUser()
indicates whether the bound source is intended for general access. This is not a security option as such, and it does not imply any form of authorisation or query filtering. Rather, it’s a marker that may be used by certain clients to exclude certain sources from their processes. Most plugins always bind user-level sources, hence return true systematically. If appropriate, the plugin can be designed to take hints from bind requests.


The callbacks above expose static information about bound sources, i.e. the plugin can set them at binding time. Others are instead dynamic, in that the plugin may update them during the lifetime of the source binding. The service publishes these properties along with static properties, but it also allows clients to be notified of their changes. The plugin is responsible for observing and relaying these changes to the service, as we discuss below. The dynamic properties are:

  • Calendar lastUpdate()
returns the time in which the bound source was last updated. Note that this is the last update time of the bound source, not the Source instance's. Implementations must return null if the plugin has no means to obtain this information.
  • Long cardinality()
returns the number of elements in the bound source. Again, implementations must return null if the plugin has no means to obtain this information.


Besides descriptive information, the service obtains from Sources other plugin components which are logically associated with the bound source. The relevant callbacks are:

  • SourceLifecycle lifecycle()
returns the SourceLifecycle associated with the bound source;
  • SourceReader reader()
returns the SourceReader associated with the bound source. Implementations must return null if the plugin does not support read requests. Note that in this case the plugin must support write requests.
  • SourceWriter writer()
returns the SourceWriter associated with the bound source. Implementations must return null if the plugin does not support write requests. Note that in this case the plugin must support read requests.

Finally, note that:

  • Sources may be passivated to disk by the service, as we discuss in more detail below. For this reason, Source is a Serializable interface, and implementations must honour this interface.
  • The framework provides an AbstractSource class that partially implements the interface. Source implementations can and should extend it to avoid a great deal of boilerplate code (state variables, accessor methods, default values, implementations of equals(), hashCode(), and toString(), shutdown hooks, correct serialisation, etc.). AbstractSource also simplifies the management of dynamic properties, in that it automatically fires a change event whenever the plugin changes the time of last update of a Source.
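
For illustration only, the following sketch spells out the whole interface for a hypothetical read-only source; a real plugin should extend AbstractSource instead. The framework packages, as well as any methods that the real interface may define beyond those documented above, are assumed here, and MyLifecycle and MyReader are sketched in the following sections.

<source lang="java">
package org.acme.tmf;  // hypothetical

import java.util.Calendar;
import java.util.Collections;
import java.util.List;

import javax.xml.namespace.QName;

import org.gcube.data.tmf.api.Property;          // framework types, packages assumed
import org.gcube.data.tmf.api.Source;
import org.gcube.data.tmf.api.SourceLifecycle;
import org.gcube.data.tmf.api.SourceReader;
import org.gcube.data.tmf.api.SourceWriter;

// spelled out in full only for illustration: extend AbstractSource in real plugins
public class MySource implements Source {

  private static final long serialVersionUID = 1L;           // Source is Serializable

  private final String endpoint;
  private final SourceLifecycle lifecycle = new MyLifecycle(this);

  public MySource(String endpoint) {
    this.endpoint = endpoint;
  }

  String endpoint() { return endpoint; }

  // static properties, set at binding time
  @Override public String id() { return "acme-source-" + endpoint.hashCode(); }
  @Override public String name() { return "ACME source at " + endpoint; }
  @Override public String description() { return "Trees derived from an ACME repository."; }
  @Override public List<Property> properties() { return null; }
  @Override public List<QName> types() { return Collections.singletonList(new QName("http://acme.org", "record")); }
  @Override public Calendar creationTime() { return null; }   // not known for this source
  @Override public boolean isUser() { return true; }

  // dynamic properties, not known for this source
  @Override public Calendar lastUpdate() { return null; }
  @Override public Long cardinality() { return null; }

  // components associated with the bound source
  @Override public SourceLifecycle lifecycle() { return lifecycle; }
  @Override public SourceReader reader() { return new MyReader(this); }
  @Override public SourceWriter writer() { return null; }     // read-only plugin
}
</source>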

SourceLifecycle

Plugins implement the SourceLifecycle interface to be called back at key points in the lifetime of their source bindings. The interface defines the following callbacks:

  • void init()
the service calls this method on new bindings. As discussed above, a plugin can use this method to carry out expensive initialisation processes or produce side-effects. If the plugin needs to engage in remote interactions or has some tasks to schedule, this is the place where it should do it. Failures thrown from this method fail bind requests.
  • void reconfigure(Element)
the service calls this method on existing bindings when clients attempt to rebind the same source. The plugin can use the bind request to reconfigure the existing binding. If the plugin does not support reconfiguration, the implementation must throw an InvalidRequestException. If reconfiguration is possible but fails, the implementation must throw a generic Exception.
  • void stop()
the service calls this method when it is shutting down, or when it is passivating the bindings to storage in order to release some memory. If the plugin schedules some tasks, this is where it should stop them.
  • void resume()
the service calls this method when it restarts after a previous shutdown, or when clients need a binding that was previously passivated by the service. If the plugin schedules some tasks, this is where it should re-start them. If the binding cannot be resumed, the implementation must throw the failure so that the service can handle it.
  • void terminate()
the service calls this method when clients no longer need access to the bound source. If the plugin has some resources to release, this is where it should do it, typically after invoking stop() to gracefully stop any scheduled tasks that may still be running.


Note that:

  • plugins that need to implement only a subset of the callbacks above can extend LifecycleAdapter and override only the callbacks of interest.
  • like Source, SourceLifecycle is a Serializable interface. The implementation must honour this interface.
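
For example, a lifecycle for the hypothetical source sketched above could extend LifecycleAdapter, whose package and exact callback signatures are assumed here, and override only the callbacks it needs:

<source lang="java">
package org.acme.tmf;  // hypothetical

import org.gcube.data.tmf.api.impl.LifecycleAdapter;   // adapter class, package assumed

public class MyLifecycle extends LifecycleAdapter {

  private static final long serialVersionUID = 1L;     // lifecycles are Serializable

  private final MySource source;

  public MyLifecycle(MySource source) {
    this.source = source;
  }

  @Override
  public void init() {
    // expensive or side-effecting work belongs here rather than in bind():
    // e.g. check that source.endpoint() is reachable, schedule polling tasks,
    // create working files through the Environment, notify initial events
  }

  @Override
  public void stop() {
    // cancel any scheduled tasks before shutdown or passivation
  }

  @Override
  public void terminate() {
    stop();
    // then release any resources held on behalf of the bound source
  }
}
</source>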

SourceEvent, SourceNotifier, and SourceConsumer

During its lifetime, a plugin may have the means to observe events that relate to data sources, typically through subscription or polling mechanisms exposed by the sources. In this case, the plugin may report these events to the SourceNotifier that the service configures on the Source at binding time.

The framework defines the tagging interface SourceEvent to model events that relate to data sources and that may only be observed by the plugin. Two such events are pre-defined as constants of the interface:

  • SourceEvent.CHANGE
this event occurs when the dynamic properties of a Source change, such as its cardinality or the time of its last update.
  • SourceEvent.REMOVE
this event occurs when a bound source is no longer available. Note that this is different from the event that occurs when clients indicate that access to the source is no longer needed (cf. SourceLifetime.terminate()).

If the plugin observes these events, it can report them to the service by invoking the following method of the SourceNotifier:

void notify(SourceEvent);

Note again that:

  • when Sources extend AbstractSource, changing their time of last update automatically fires SourceEvent.CHANGE events. Unless there are other reasons to notify events to the service, the plugin may never have to invoke notify() explicitly.
  • as already noted, the service will configure SourceNotifiers on Sources only after these are returned by SourceBinder.bind(). Any attempt to notify events prior to that moment will fail. For this reason, if the plugin needs to change dynamic properties at binding time, then it should do so in SourceLifecycle.init().

SourceNotifier has a second method that can be invoked to subscribe consumers for SourceEvent notifications:

void subscribe(SourceConsumer,SourceEvent...)

This method subscribes a SourceConsumer to one or more SourceEvents. Normally, plugins do not have to invoke it, as the service will subscribe its own SourceConsumers with the SourceNotifiers. In other words, the common flow of events is from the plugin to the service.

However, the plugin is free to make internal use of the available support for event subscription and notification. In particular, the plugin can define its own SourceEvents and implement and subscribe its own SourceConsumers. In this case, SourceConsumers must implement the single method:

void onEvent(SourceEvent...)

which is invoked by the SourceNotifier with one or more SourceEvents. Normally, the subscriber will receive single event notifications, but the first notification after subscription will carry the history of all the events previously notified by the SourceNotifier.
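
As an internal-use example, the sketch below defines a custom event and a consumer that logs notifications, and wires them to a SourceNotifier. Beyond the method names documented above, the package names are assumed, as is the fact that the plugin can reach the notifier of its Source (e.g. from SourceLifecycle.init()).

<source lang="java">
package org.acme.tmf;  // hypothetical

import org.gcube.data.tmf.api.SourceConsumer;   // framework types, packages assumed
import org.gcube.data.tmf.api.SourceEvent;
import org.gcube.data.tmf.api.SourceNotifier;

public class SyncEvents {

  // a plugin-defined event, alongside the predefined SourceEvent.CHANGE and SourceEvent.REMOVE
  static final SourceEvent SYNCHRONISED = new SourceEvent() {};

  // a plugin-defined consumer, used here only for internal bookkeeping
  static final SourceConsumer LOGGER = new SourceConsumer() {
    @Override
    public void onEvent(SourceEvent... events) {
      for (SourceEvent event : events)
        System.out.println("observed: " + event);
    }
  };

  // typically invoked from SourceLifecycle.init(), once the notifier is available
  static void register(SourceNotifier notifier) {
    notifier.subscribe(LOGGER, SourceEvent.CHANGE, SYNCHRONISED);
  }

  // invoked whenever the plugin detects that the bound source has been synchronised
  static void reportSynchronisation(SourceNotifier notifier) {
    notifier.notify(SYNCHRONISED);
  }
}
</source>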

Auxiliary APIs

All the previous interfaces provide a skeleton around the core functionality of the plugin, which is to transform the API and the tree model of the service to those of the bound sources. The task requires familiarity with three APIs defined outside the framework:

  • the tree API
the plugin uses this API to construct and deconstruct the edge-labelled trees that it accepts in write requests and/or returns in read requests. The API offers a hierarchy of classes that model whole trees (Tree) as well as individual nodes (Node), fluent APIs to construct object graphs based on these classes, and various APIs to traverse them;
  • the pattern API
the plugin uses this API to construct and deconstruct tree patterns, i.e. sets of constraints that clients use in read requests to characterise the trees of interest, both in terms of topology and leaf values. The API offers a hierarchy of patterns (Pattern), methods to fluently construct patterns, as well as methods to match trees against patterns (i.e. verify that the trees satisfy the constraints, cf. Pattern.match(Node)) and to prune trees with patterns (i.e. retain only the nodes that have been explicitly constrained, cf. Pattern.prune(Node)). The plugin must ensure that it returns trees that have been pruned with the patterns provided by clients;
  • the streams API
the plugin uses this API to model the data streams that flow in and out of the plugin. Streams are used in read requests and write requests that take or return many data items at once, such as trees, tree identifiers, or even paths to tree nodes. The streams API models such data streams as instances of the Stream interface, a generalisation of the standard Java Iterator interface which reflects the remote nature of the data. Not all plugins need to implement stream-based operations from scratch, as the framework offers synthetic implementations for them. These implementations, however, are derived from those that work with one data item at a time, hence have very poor performance when the data source is remote. Plugins should use them only when native implementations are not an option because the bound sources do not offer any stream-based or paged bulk operation. When they do, the plugin should feed their transformed outputs directly into Streams. In a few cases, the plugin may need advanced facilities provided by the streams API, such as fluent idioms to convert, pre-process or post-process data streams.

Documentation on working with trees, tree patterns, and streams is available elsewhere, and we do not replicate it here. The tree API and the pattern API are packaged together in a trees library available in our Maven repositories. The streams API is packaged in a streams library also available in the same repositories. If the plugin also uses Maven for build purposes, these libraries are already available in its classpath as indirect dependencies of the framework.

SourceReader

A plugin implements the SourceReader interface to process read requests. The interface defines the following methods for "tree lookup":

  • Tree get(String,Pattern)
returns a tree with a given identifier and pruned with a given Pattern. The reader must throw an UnknownTreeException if the identifier does not identify a tree in the source, and an InvalidTreeException if a tree can be identified but does not match the pattern. The reader should report any other failure to the service, i.e. rethrow it.
  • Stream<Tree> get(Stream<String>,Pattern)
returns trees with given identifiers and pruned with a given Pattern. The reader must throw a generic Exception if it cannot produce the stream at all. It must otherwise handle lookup failures for individual trees as get(String,Pattern) does, inserting them into the stream.


In addition, a SourceReader implements the following "query" method:

  • Stream<Tree> get(Pattern)
returns trees pruned with a given Pattern. Again, the reader must throw a generic Exception if it cannot produce the stream at all; trees that do not match the pattern are simply not added to the stream.


Finally, a SourceReader implements lookup methods for individual tree nodes:

  • Node getNode(Path)
returns a node from the Path of node identifiers that connect it to the root of a tree. The reader must throw an UnknownPathException if the path does not identify a tree node.
  • Stream<Node> getNodes(Stream<Path>)
returns nodes from the Paths of node identifiers that connect them to the root of trees. The reader must throw a generic Exception if it cannot produce the stream at all. It must otherwise handle lookup failures for individual paths as getNode(Path) does, inserting them into the stream.

Depending on the capabilities of the bound source, implementing some of the methods above may prove challenging or altogether impossible. For example, if the source offers only lookup capabilities, the reader may not be able to implement query methods. In this sense, notice that the reader is not forced to fully implement any of the methods above. In particular, it can:

  • throw an UnsupportedOperationException for all requests to a given method, or
  • throw an UnsupportedRequestException for certain requests of a given method.

When this is the case, the plugin should clearly report its limitations in its documentation.


Similarly, the plugin is not forced to implement all methods from scratch. The framework defines a partial implementation of SourceReader, AbstractReader, which the plugin can extend to obtain default implementations of certain methods, including:

  • a default implementation of get(Stream<String>,Pattern);
  • a default implementation of getNode(Path);
  • a default implementation of getNodes(Stream<Path>).

These defaults are derived from the implementation of get(String,Pattern) provided by the plugin. Note, however, that their performance is likely to be poor over remote sources, as get(String,Pattern) moves data one item at a time. For getNode(Path) the problem is marginal, but for stream-based methods the impact on performance is likely to be substantial. The default implementations should thus be considered as surrogates for real implementations, and the plugin should override them if and when a more direct mapping onto the capabilities of the bound sources exists.


When the reader does implement the methods above natively, the following issues arise:

  • applying patterns
in some cases, the reader may be able to transform patterns in terms of the querying/filtering capabilities of the bound source. Often, it may be able to do so only partially, i.e. by extracting from the patterns the subset of constraints that it can transform. In this case, the reader would push this subset towards the source, transform the results into trees, and then prune the trees with the original pattern, so as to post-filter the data along the constraints that it could not transform.
If the bound source offers no querying/filtering capabilities, then the reader must apply the pattern only locally on the unfiltered results returned by the source. Note that the performance of get(Pattern) in this scenario can be severely compromised if the bound source is remote, as the reader would effectively transfer its entire contents over the network at each invocation of the method. The reader may then opt for not implementing this method at all, or for rejecting requests that use particularly ‘inclusive’ patterns (e.g. Patterns.tree() or Patterns.any(), which do not constrain trees at all).
  • transforming data into trees
the reader is free to follow the approach and choose the technologies that seem most relevant to the purpose, in that the framework neither limits nor supports any particular choice. It is a good design practice to push the transformations outside the reader, particularly when the plugin supports multiple tree types, but also to simplify unit testing. The transformation may even be pushed outside the whole plugin and put in a separate library that may be reused in different contexts. For example, it may be reused in another plugin that binds sources through a different protocol but under the same data model. If the transformation works both ways (e.g. because the plugin supports write requests), it may also be reused at the client-side, to revert from tree types to the original data models.
  • streaming data
a reader that implements get(Pattern), or that overrides the surrogate implementations of stream-based methods inherited from AbstractReader, must implement the Stream interface over the bulk transfer mechanisms offered by the bound source. In the common case in which the source uses a paging mechanism, the plugin can provide a 'look-ahead' Stream implementation that fetches a new page of data in hasNext() whenever next() has fully traversed the previously fetched page, as in the sketch after this list. Other transfer mechanisms may require more custom solutions.
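
The following sketch illustrates the look-ahead idiom over a paged source. It assumes, for simplicity, that Stream exposes only the Iterator-style methods mentioned above (the real interface may require more, e.g. for resource management), and PagedClient stands for a hypothetical, source-specific client that returns already-transformed and pruned pages of trees.

<source lang="java">
package org.acme.tmf;  // hypothetical

import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.gcube.data.streams.Stream;    // streams library, package assumed
import org.gcube.data.trees.data.Tree;   // trees library, package assumed

// look-ahead stream: hasNext() fetches a new page whenever next() has traversed the previous one
public class PagedTreeStream implements Stream<Tree> {

  private final PagedClient client;       // hypothetical client for the bound source
  private Iterator<Tree> page = Collections.<Tree>emptyList().iterator();
  private int nextPage = 0;
  private boolean exhausted = false;

  public PagedTreeStream(PagedClient client) {
    this.client = client;
  }

  @Override
  public boolean hasNext() {
    if (page.hasNext())
      return true;
    if (exhausted)
      return false;
    List<Tree> trees = client.fetchPage(nextPage++);   // transformed results for one page
    exhausted = trees.isEmpty();
    page = trees.iterator();
    return page.hasNext();
  }

  @Override
  public Tree next() {
    return page.next();
  }

  public void remove() {
    throw new UnsupportedOperationException();
  }

  // hypothetical, source-specific paged client: returns an empty list when there are no more pages
  public interface PagedClient {
    List<Tree> fetchPage(int page);
  }
}
</source>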


In summary, the plugin can deliver a simple implementation of SourceReader by:

  • implementing get(String,Pattern) and get(Pattern), and
  • inheriting surrogate implementations of all the other methods from AbstractReader.

Alternatively, the plugin may be able to deliver a more performant implementation of SourceReader by:

  • inheriting from AbstractReader and
  • overriding one or more surrogate implementations with native ones.

Of course, the plugin may be able to deliver native implementations of some methods and not others.
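
Concretely, a minimal reader for the hypothetical plugin used in the previous sketches could look as follows. Package names, exception constructors, the throws clauses, and the assumption that AbstractReader can be extended without constructor arguments are all unverified here, and the private helpers stand for source-specific access and transformation code.

<source lang="java">
package org.acme.tmf;  // hypothetical

import org.gcube.data.streams.Stream;                            // packages assumed
import org.gcube.data.tmf.api.exceptions.UnknownTreeException;
import org.gcube.data.tmf.api.impl.AbstractReader;
import org.gcube.data.trees.data.Tree;
import org.gcube.data.trees.patterns.Pattern;

// implements the two core methods and inherits surrogate implementations of the rest
public class MyReader extends AbstractReader {

  private final MySource source;

  public MyReader(MySource source) {
    this.source = source;
  }

  @Override
  public Tree get(String id, Pattern pattern) throws Exception {

    Object record = lookup(source.endpoint(), id);        // hypothetical remote lookup
    if (record == null)
      throw new UnknownTreeException(id);

    // transform the record into a tree, then match and prune it with the pattern
    // (cf. Pattern.match(Node) and Pattern.prune(Node) in the trees library)
    return transformAndPrune(record, pattern);
  }

  @Override
  public Stream<Tree> get(Pattern pattern) throws Exception {

    // push towards the source the constraints that translate into its query language,
    // post-filter the rest, and wrap the paged results in a look-ahead Stream
    // such as the PagedTreeStream sketched earlier
    return query(source.endpoint(), pattern);
  }

  // hypothetical helpers, standing for source-specific access and transformation code
  private Object lookup(String endpoint, String id) {
    throw new UnsupportedOperationException("source-specific lookup");
  }

  private Tree transformAndPrune(Object record, Pattern pattern) {
    throw new UnsupportedOperationException("source-specific transformation");
  }

  private Stream<Tree> query(String endpoint, Pattern pattern) {
    throw new UnsupportedOperationException("source-specific query");
  }
}
</source>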

SourceWriter

A plugin implements the SourceWriter interface to process write requests. Writers are rarely implemented by plugins that bind to remote sources, which typically offer read-only interfaces. Writers may be implemented instead by plugins that bind to local sources, so as to turn the service endpoint into a storage service for structured data. The Tree Repository is a primary example of this type of plugin.

The SourceWriter interface defines the following methods to insert new data in the bound source:

  • Tree add(Tree)
inserts a tree in the bound source and returns the tree as it has been inserted in the source. With this signature, the method supports data sources with different insertion models:
  • If the data is annotated at the point of insertion with identifiers, timestamps, versions and similar metadata, the writer can return these annotations back to the client;
  • If instead the data is unmodified at the point of insertion, the writer can return null to the client so as to support a true "fire-and-forget" model and avoid unnecessary data transfers.
Fire-and-forget insertions may also be desirable under the first model, when clients have no use for the annotations added to the data at the point of insertion. The plugin may support these clients if it allows them to specify directives in the input tree itself (e.g. special attributes on root nodes). The writer would recognise directives and return null to clients.
Regardless of the insertion model of the bound source, input trees may be invalid for insertion, e.g. miss required metadata, have metadata that they should not have (e.g. identifiers that should be assigned by the bound source), or be otherwise malformed with respect to insertion requirements. When this happens, the writer must throw an InvalidTreeException.
  • Stream<Tree> add(Stream<Tree>)
inserts trees in the bound source and returns the outcomes in the same order, where the outcomes are those that add(Tree) would return for each input tree. In particular, the writer must model failures for individual trees as add(Tree) would, inserting them into the stream. It must instead throw a generic Exception if it cannot produce the stream at all.


The SourceWriter interface also defines the following methods to change data already in the bound source:

  • Tree update(Tree)
updates a given tree in the bound source and returns the tree as it has been updated in the source. As with insertions, the signature of the method supports sources with diverse update models:
  • If the bound source models updates in terms of replacement, the input tree may simply encode the new version of the data;
  • if instead the bound source supports in-place updates, the input tree may encode no more and no less than the exact changes to be applied to the existing data. The tree API supports in-place updates with the notion of a delta tree, i.e. a special tree that encodes the changes applied to a given tree over time: it contains only the nodes of the tree that have been added, modified, or deleted, marked with a corresponding attribute. The API can also compute the delta tree between a tree and another tree that represents its evolution at a given point in time (cf. Node.delta()). Clients may thus compute the delta tree for a set of changes and invoke the service with it. The writer may parse the delta tree to effect the changes or, more simply, revert to a replacement model of update: retrieve the data to be updated, transform it into a tree, and then use the tree API again to update it with the changes carried in the delta tree (cf. Node.update(Node)).
Under both models, the input tree can carry the directive to delete existing data, rather than modify it.
In all cases, the plugin must document the expectations of its writers over the input tree. Note that the input tree must allow the writer to identify which data should be updated. If the target data cannot be identified (e.g. it no longer exists in the source), the writer must throw an UnknownTreeException. If the input tree does allow the writer to identify the target data but it does not meet expectations otherwise, then the writer must throw an InvalidTreeException.
  • Stream<Tree> update(Stream<Tree>)
updates given trees in the bound source and returns the outcomes in the same order, where the outcomes are those that update(Tree) would return for each input tree. In particular, the writer must model failures for individual trees as update(Tree) would, inserting them into the stream. It must instead throw a generic Exception if it cannot produce the stream at all.
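
To close, the sketch below outlines a writer for a hypothetical local store, following a replacement model of update. LocalStore, the private helpers, exception constructors, the throws clauses, and all package names are assumed, as is the fact that Tree takes part in the Node hierarchy so that Node.update(Node) applies; only the method names and semantics documented above come from this page, and the stream-based methods are left unimplemented here.

<source lang="java">
package org.acme.tmf;  // hypothetical

import org.gcube.data.streams.Stream;                             // packages assumed
import org.gcube.data.tmf.api.SourceWriter;
import org.gcube.data.tmf.api.exceptions.InvalidTreeException;
import org.gcube.data.tmf.api.exceptions.UnknownTreeException;
import org.gcube.data.trees.data.Tree;

public class MyWriter implements SourceWriter {

  private final LocalStore store;     // hypothetical storage backend

  public MyWriter(LocalStore store) {
    this.store = store;
  }

  @Override
  public Tree add(Tree tree) throws Exception {

    if (!meetsInsertionRequirements(tree))          // hypothetical validation
      throw new InvalidTreeException("tree does not meet insertion requirements");

    return store.insert(tree);    // the tree as stored, e.g. annotated with store-assigned metadata
  }

  @Override
  public Tree update(Tree delta) throws Exception {

    String id = targetOf(delta);                    // hypothetical: extracts the target identifier
    Tree current = store.lookup(id);
    if (current == null)
      throw new UnknownTreeException(id);

    // revert to a replacement model: apply the delta to the stored tree and write it back
    // (cf. Node.update(Node) in the trees library)
    current.update(delta);
    return store.replace(current);
  }

  @Override
  public Stream<Tree> add(Stream<Tree> trees) throws Exception {
    throw new UnsupportedOperationException("left to the plugin: wrap add(Tree) or use a bulk API of the store");
  }

  @Override
  public Stream<Tree> update(Stream<Tree> trees) throws Exception {
    throw new UnsupportedOperationException("left to the plugin");
  }

  // hypothetical helpers and storage interface
  private boolean meetsInsertionRequirements(Tree tree) {
    throw new UnsupportedOperationException("store-specific validation");
  }

  private String targetOf(Tree delta) {
    throw new UnsupportedOperationException("store-specific identification");
  }

  public interface LocalStore {
    Tree insert(Tree tree);
    Tree lookup(String id);
    Tree replace(Tree tree);
  }
}
</source>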