The Tree Manager Framework

From Gcube Wiki
Revision as of 10:02, 29 June 2012

The Tree Manager service may be called to store or retrieve edge-labelled trees. Either way, the data is not necessarily stored locally to service endpoints, nor is it stored as trees. Instead, the data is most often held in remote data sources, it is managed independently from the service, and it is exposed by other access services in a variety of forms and through different APIs.

The service applies transformations from its own API and tree model to those of the underlying data sources. Transformations are implemented in plugins, libraries developed in autonomy from the service so as to extend its capabilities at runtime. Service and plugins interact through a protocol defined by a set of local interfaces which, collectively, define a framework for plugin development.

The framework is packaged and distributed as a stand-alone library, the tree-manager-framework, and serves as a dependency for both service and plugins.


Tree-manager-framework-overview.png


The library and all its transitive dependencies are available in our Maven repositories. Plugins that are managed with Maven can resolve them with a single dependency declaration:

<dependency>
  <groupId>org.gcube.data.access</groupId>
  <artifactId>tree-manager-framework</artifactId>
  <version>...</version>
  <scope>compile</scope>
</dependency>

In what follows, we address the plugin developer and describe the framework in detail, illustrating also design options and best practices for plugin development.

Overview

The service and the plugins interact in order to notify each other of the occurrence of certain events:

  • the service observes events that relate to its clients, first and foremost their requests. These events translate into actions which plugins must perform on data sources;
  • plugins may observe events that relate to data sources, first and foremost changes to their state. These events need to be reported to the service.

The framework defines the interfaces through which all the relevant events may be notified.

The most important events are client requests, which can be of one of the following types:

  • bind request
a client asks the service to "connect" to one or more data sources. The client targets a specific plugin and includes in the request all the information that the plugin needs in order to establish the bindings. The service delivers the request to a SourceBinder provided by the plugin, and it expects back one Source instance for each bound source. The plugin configures the Sources with information extracted or derived from the request. Thereafter, the service manages the Sources on behalf of the plugin.


Tree-manager-framework-bind-requests.png

  • read request
a client asks the service to retrieve trees from a data source that has been previously bound to some plugin. The client may not be aware of the plugin, having only discovered that the service can read data from the target source. The service identifies a corresponding Source from the request and then relays the request to a SourceReader associated with the Source, expecting trees back. It is the job of the reader to translate the request for the API of the data source, and to transform the results returned by the source into trees.

Tree-manager-framework-read-requests.png


  • write request
a client asks the service to add or update trees in a data source that has been previously bound to some plugin. The client knows about the plugin and what type of trees it expects. The service identifies a corresponding Source from the request and then relays the request to a SourceWriter associated with the Source. Again, it is the job of the writer to translate the request for the API of the target source, including transforming the input trees into the data structures that the source expects.

Tree-manager-framework-write-requests.png


Besides relaying client requests, the service also notifies plugins of key events in the lifetime of their bindings. It does so by invoking event-specific callbacks of SourceLifecycles associated with Sources. As we shall see, lifetime events include the initialisation, reconfiguration, passivation, resumption, and termination of bindings.


Tree-manager-framework-service-events.png


These are all the events that the service observes and passes on to plugins. Other events may be observed directly by plugins, including changes in the state of bound sources. These events are predefined SourceEvents, and plugins report them to SourceNotifiers that the service itself associates with Sources. The service also registers its own SourceConsumers with SourceNotifiers so as to receive event notifications.


Tree-manager-framework-plugin-events.png


 All the key components of a plugin are introduced to the service through an implementation of the Plugin interface. From it, the service obtains SourceBinders and, from the binders, bound Sources. From bound Sources, the service obtains SourceLifecycles, SourceReaders, and SourceWriters.

In addition, Plugin implementations expose descriptive information about plugins, which the service publishes in the infrastructure and uses in order to manage the plugins. For increased control over their own lifecycle, plugins may implement the PluginLifecycle interface, which extends Plugin with callbacks invoked by the service when it loads and unloads plugins.

To bootstrap the process of component discovery and find Plugin implementations, the service uses the standard ServiceLoader mechanism. Accordingly, plugins include a file META-INF/services/org.gcube.data.tmf.api.Plugin in their Jar distributions, where the file contains a single line with the qualified name of the Plugin or PluginLifecycle implementation which they provide.
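For instance, a plugin whose implementation class is the purely hypothetical org.acme.tm.MyPlugin would ship the following one-line file in its Jar:

```
# contents of META-INF/services/org.gcube.data.tmf.api.Plugin
org.acme.tm.MyPlugin
```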


Tree-manager-framework-pliugin-discovery.png


This completes our quick overview of the main interfaces and classes provided by the framework. Note that, besides implementing the interfaces that define the interaction protocol with the service, plugins are free to design and develop against the framework using any technology that seems appropriate to the task.

In the rest of this guide we look at each component of the framework in more detail.

Key Design Issues

The framework has been designed to support a wide range of plugins. The following questions characterise the design of a plugin and illustrate the possible variations across plugins:

  • what sources can it bind to?
source-specific plugin: a plugin may be dedicated to specific data sources;
source-generic plugin: a plugin may target open-ended classes of data sources that publish data through standard APIs and in standard models.
  • what kind of trees does it accept and/or return?
all plugins transform to trees and/or from trees, but the structure and intended semantics of those trees may vary substantially across plugins:
type-generic plugin: a plugin may be fully general, i.e. transform a data model which is as general-purpose as the tree model. A plugin for data sources that expose arbitrary instances of XML data or RDF data through some standard API, for example, falls within this category. In this case, the meaning and shape of the trees handled by the plugin may be unconstrained in principle, or else it may be constrained only at the point of binding to specific data sources, after bind requests.
type-specific plugin: a plugin may be extremely specific and transform a concrete data model into trees with statically known structures. In this case, the tree model of the service is used as a general-purpose carrier for the original data model. The plugin documentation will include the definition of its tree type, anything ranging from informal narrative to formal XML Schema definitions. This definition may be introduced by the plugin or else reflect wider consensus.
multi-type plugin: most plugins will always work with a single tree type, but this is not a constraint imposed by the framework. A plugin may support transformations into a number of tree types and allow binding clients to indicate in their bind requests the type they desire sources to be bound with. The plugin may embed the transformations, or else take a more dynamic approach and define a framework for transformers which are discoverable on the classpath. Furthermore, the plugin may support a single transformation but produce trees that can be assigned multiple types, from more generic to more specific types, e.g. a generic RDF type as well as a more specific type associated with some RDF schema.
  • what requests does it support?
all plugins must accept at least one form of bind request, but a plugin may support many so as to cater for different types of bindings, or to support reconfiguration of previous bindings. For example:
  • most plugins bind a single source per bind request, but some may bind many at once for some requests;
  • most plugins support read requests but do not support write requests, typically because the bound sources are static, or grant write access only to privileged clients. In principle at least, the converse may apply and a plugin may grant only write access to the sources. Overall, a plugin may support one of the following access modes: read-only, write-only, or read-write.
  • what functional and QoS limitations does it have?
rarely will the API and tree model of the service prove functionally equivalent to those of bound sources. Even if a plugin restricts its support to a particular access mode, e.g. read-only, it may not be able to support all the requests associated with that mode, or to support them all efficiently. For example, its bound sources:
  • may offer no lookup API because they do not mandate or regulate the existence of identifiers in the data;
  • may offer no query API, or else support (the equivalent of) a subset of the filters that clients may specify in query requests;
  • may not allow the plugin to retrieve, add, or update many trees at once;
  • may not support updates at all, or may not support partial updates, or may not support deletions.
In some cases, the plugin may be able to compensate for differences, typically at the cost of reduction in QoS. For example, the plugin:
  • may be configured at binding time with queries that model lookups, differently for different bound sources;
  • may partially transform filters and then do local matches on the results returned by sources;
  • if the bound sources do not support partial updates, may fetch the data first and then apply the updates locally.
In other cases, for example when bound sources do not support deletions, the plugin has no other obvious option but to reject the client requests.

Answering the questions above fixes some of the free variables in plugin design and helps to characterise it ahead of implementation. Collectively, the answers define a "profile" for the plugin and the presentation of this profile should have a central role in its documentation.

Plugin, PluginLifecycle, and Environment

A plugin implements the following methods of the Plugin interface:

  • String name(): returns the name of the plugin. The service will publish it and its clients may use it to discover instances of the service which have been extended with the plugin.
  • String description(): returns a brief description of the plugin. The service will publish it so that it can be inspected and displayed by a range of clients;
  • List<Property> properties(): returns triples (name, value, description), all String-valued. The service will publish them and its clients may use them to identify instances of the service which have been extended with the plugin. The plugin decides what properties may be useful to clients for discovery, inspection, or display. For example, if the plugin is multi-type, it will probably list the types that it supports here. The implementation returns null or an empty list if it has no properties to publish;
  • SourceBinder binder(): returns the plugin's implementation of the SourceBinder interface. The service will relay bind requests to it.
  • List<String> requestSchemas(): returns the schemas of the bind requests that the plugin can process. These will be published by the service to help binding clients formulate their bind requests. There are GUIs within the system that use the schemas to generate forms for interactive formulation of bind requests. The implementation may return null and document its expectations elsewhere. If it does not return null, it is free to use any schema language of choice, though the existing GUIs expect XML Schemas, which is thus the recommended language. Note that, in the common case in which the plugin models requests with Java classes and uses JAXB as the standard data binding solution, it can easily generate schemas directly in the implementation of the method, using JAXBContext.generateSchema().
  • boolean isAnchored(): returns an indication of whether the plugin is anchored, i.e. stores data locally to the service. If true, the service will inhibit its internal replication schemes. In the common case in which the plugin targets remote data sources, the implementation will simply return false.
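To make the contract concrete, the following sketch shows a minimal Plugin implementation. Since the real interface lives in org.gcube.data.tmf.api, it is reproduced here as a local stub reduced to four of the methods above; the plugin name and description are invented for illustration:

```java
import java.util.List;

// Stub of the framework interface, cut down to four of the methods described
// above (the real interface is org.gcube.data.tmf.api.Plugin).
interface Plugin {
    String name();
    String description();
    List<String> requestSchemas();
    boolean isAnchored();
}

// Hypothetical plugin for a remote data source; all values are invented.
public class MyPlugin implements Plugin {

    @Override
    public String name() { return "my-plugin"; } // published for discovery

    @Override
    public String description() { return "Binds sources exposed by the ACME access service"; }

    @Override
    public List<String> requestSchemas() {
        return null; // bind request expectations documented elsewhere
    }

    @Override
    public boolean isAnchored() {
        return false; // data is held remotely, so the service may replicate
    }
}
```

A real implementation would of course implement the full interface, including properties() and binder().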

As mentioned above, a plugin that needs more control over its own lifetime can implement PluginLifecycle, which extends Plugin with the following callback methods:

  • void init(Environment) is invoked when the plugin is first loaded;
  • void stop(Environment) is invoked when the plugin is unloaded;

For example, the plugin may implement init() to start up a DI container such as Spring, Guice, or CDI.

Environment is implemented by the service to encapsulate access to the environment in which the plugin is deployed or undeployed. At the time of writing, it serves solely as a sandbox over the location of the file system which may be accessed by the plugin. Accordingly, it exposes only the following method:

  • File file(path), which returns a file with a given path relative to the storage location.

SourceBinder

Whenever clients request bindings of data sources, the service consults the Plugin implementation discussed above and obtains a SourceBinder. It then invokes its single method:

List<Source> bind(Element)

The SourceBinder attempts the binding on the basis of the information found in the client request, and it returns a list of corresponding Sources. Note that:

  • the service ignores the particular shape of the request and passes it to bind() as a DOM’s Element. The plugin may inspect the request with the DOM API, any other XML API, or by binding the request to some Java class, e.g. using JAXB;
  • the plugin may accept a single type of request or many alternative types;
  • the plugin must throw an InvalidRequestException if the request is unrecognised or otherwise invalid, and a generic Exception for any other problem that it may encounter in the execution of bind();
  • in most cases, the request will result in the binding of a single data source, providing precise coordinates to identify it (e.g. an endpoint address). In some cases, however, the request may provide less pinpoint information, and the plugin may identify and bind at once many data sources from it. This explains the List type for the return value.

The actual binding process may vary significantly across plugins. For many, it may be as simple as extracting the endpoint of some remote data access service from the request (and checking its availability). For others, it may require discovering such an endpoint through some registry. Yet for others it may be a complex process comprised of a number of local and remote actions.

Finally, it should be noted that:

  • the service may not use all the Sources returned by the plugin. In particular, it will discard Sources that the plugin has already bound in previous invocations of bind() (this may occur if two bind requests target overlapping sets of data sources, or because they are identical requests issued from two autonomous clients, or because one request is aimed explicitly at the re-configuration of sources already bound by the other). Whenever possible, the plugin should avoid side-effects or expensive work in bind(), e.g. engaging in network interactions. Rather, it should defer expensive work to SourceLifecycle.init(), as the service will make this callback only for Sources that it effectively retains. The minimal amount of work that the plugin must do in bind() is really to identify sources and set their SourceLifecycles. We discuss SourceLifecycle below.
  • the service sets SourceNotifiers and Environments on Sources when the SourceBinder returns them from bind(). Accordingly, if the plugin needs to access the file system or notify an event at binding time, it should do so in SourceLifecycle.init() rather than in bind(). This is a corollary of the recommendation made above, i.e. avoid actions with side-effects in bind().
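As an illustration of the points above, here is a self-contained sketch of a binder for a hypothetical plugin. The request format (an endpoint element per data source) is invented, and the real SourceBinder returns a List<Source> rather than the plain identifiers used here; note that the binder only identifies the sources and defers any expensive work:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MySourceBinder {

    // One bind request may carry the coordinates of many data sources,
    // hence the list in the return type (cf. List<Source> in the real interface).
    public List<String> bind(Element request) throws Exception {

        NodeList endpoints = request.getElementsByTagName("endpoint");

        // stands in for the framework's InvalidRequestException
        if (endpoints.getLength() == 0)
            throw new IllegalArgumentException("invalid bind request: no endpoints");

        List<String> sourceIds = new ArrayList<>();
        for (int i = 0; i < endpoints.getLength(); i++)
            // identify the sources only: expensive work is deferred to SourceLifecycle.init()
            sourceIds.add(endpoints.item(i).getTextContent());

        return sourceIds;
    }

    // small helper to parse a request document, for demonstration purposes
    public static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                                     .newDocumentBuilder()
                                     .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                                     .getDocumentElement();
    }
}
```

A request that carries two endpoints would then yield two identifiers, one per bound source.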

Source

If Plugin implementations provide the service with information about the plugin, Sources provide it with information about the data sources that become bound to the plugin. They do so by implementing the following methods:

  • String id(): returns the identifier of the source, which the service uses to tell sources apart;
  • String name(): returns the descriptive name of the source, which the service publishes and clients may use to discover the source;
  • String description(): returns a brief description of the source, which the service publishes for reporting purposes;
  • List<Property> properties(): returns arbitrary properties of the bound source as triples (name, value, description), all String-valued. The service publishes them and its clients may use them to discover the source. Implementations must return null or an empty list if they have no properties to publish;
  • List<QName> types(): returns all the tree types produced and/or accepted by the bound source, as discussed above. These are qualified names that characterise the edge labels and leaf values that the SourceReader produces and that the SourceWriter consumes. The service publishes the types and its clients may use them to discover sources that produce or consume data with expected properties;
  • Calendar creationTime(): returns the time in which the source was created (the source, not the Source object). The service publishes this information, but implementations can return null if the source does not expose this information;
  • boolean isUser(): indicates whether the source ought to be marked as a user-level source or a system-level source. This is not a security option as such, and it does not imply any form of authorisation or query filtering. It’s rather a marker that may be used by certain clients to exclude system sources from their processes. In the vast majority of cases, plugins will bind user-level sources. If appropriate, the plugin may be configured by binding clients to bind system-level sources;

The Source properties exposed through methods above are static in nature, in that the plugin sets them at source binding time. Others are instead dynamic, in that the plugin may update them during the lifetime of the binding:

  • Calendar lastUpdate(): returns the time in which the source was last updated (the source, not the Source object). Implementations can return null if the source does not expose this information;
  • Long cardinality(): returns the number of elements in the source. Implementations can return null if the data source does not expose this information and the implementation cannot derive it;

The service publishes dynamic properties along with static properties, but it also associates them with topics for notification. Clients can subscribe for changes to the source and be notified when these changes occur. The plugin is responsible for changing these properties and for firing the corresponding event to the service, which then takes over and does the rest. We discuss how plugins can fire events later on.

Besides descriptive information, Sources must provide the service with other components that are logically associated with them:

  • SourceLifecycle lifecycle(): returns the lifecycle of the source. The service invokes its methods to notify the occurrence of certain events in the source’s lifetime;
  • SourceReader reader(): returns the SourceReader of the source. The service invokes its methods to relay read requests to the plugin. Implementations can return null if the plugin does not support read requests. Note that in this case the plugin must support write requests;
  • SourceWriter writer(): returns the SourceWriter of the source. The service invokes its methods to relay write requests to the plugin. Implementations can return null if the plugin does not support write requests. Note that in this case the plugin must support read requests;

If the plugin extends the default implementations of SourceLifecycle, SourceReader, or SourceWriter, the methods above can be overridden to restrict their output to more specific classes. This avoids casts in components that access the implementations through Sources, e.g.:

@Override
public MyReader reader() {
  return (MyReader) super.reader();
}

Next, Sources allow the service to set and then access its implementations of Environment and SourceNotifier:

Environment environment();
void setEnvironment(Environment);
SourceNotifier notifier();
void setNotifier(SourceNotifier);

We have discussed above how plugins can use the Environment to access the deployment context of the plugin. We discuss later how they can use the SourceNotifier to notify the service of events that relate to the source.

Note also that Sources may be passivated to disk by the service, as we discuss in more detail below. Source is indeed a Serializable interface, and the final requirement is that implementations honour that interface.

The framework provides an AbstractSource class that implements the interface partially. Source implementations can and should extend it to avoid plenty of boilerplate code (state variables, accessor methods, default values, implementations of equals(), hashcode(), and toString(), shutdown hooks, correct serialisation, etc.). AbstractSource also simplifies the management of dynamic properties, in that it automatically fires a change event whenever the plugin changes the time of last update of Sources.

At its simplest, a Source implementation may take the following form:

public class MySource extends AbstractSource {
 
	private static final long serialVersionUID = 1L;
 
	//your additional fields, if any
 
	public MySource(String id) {
		super(id);
	}
 
	@Override
	public List<QName> types() {
		//here factored-out in constants because fixed
		return Collections.singletonList(MyConstants.TYPE);
	}
	@Override
	public List<Property> properties() {
		//here factored-out in constants because fixed
		return MyConstants.PROPERTIES;
	}
 
	//your additional methods, if any
}

SourceLifecycle

The SourceLifecycle interface defines the following callbacks:

  • void init(): called by the service during bind requests to initialise the Sources previously bound by the SourceBinder. As discussed above, this is the place to perform actions that are expensive or generate side-effects. If the plugin needs to perform remote interactions or has tasks to schedule, this is where it should do so. It should also report any failure it encounters, so that the service can relay it to the binding client as the outcome of the request;
  • void reconfigure(Element): called by the service during bind requests to reconfigure Sources previously bound by the plugin. As discussed above, this occurs when the SourceBinder returns a Source that it had already produced in previous bind requests. In this case, the service will use the old Source to relay the request and simply discard the one returned last by the SourceBinder. If the plugin does not support reconfiguration, it must throw an InvalidRequestException. If reconfiguration is supported but fails, the plugin must throw a generic Exception instead;
  • void stop(): called by the service if it is shutting down, or if it is passivating Sources to storage to release some memory. If the plugin has scheduled tasks for the management of the Sources, this is a good time to gracefully stop them;
  • void resume(): called by the service when Sources are revived from storage, either because the service has been restarted after a shutdown, or because the Sources had been passivated to release memory resources but are now needed by service clients. If the plugin has scheduled tasks for the management of the Sources, this is a good time to restart them. If the attempt fails, the plugin should throw the failure so that the service can relay it to clients;
  • void terminate(): called by the service to signal that its clients no longer need to access Sources. If the plugin has some resources to release, this is the time to do it, typically after invoking stop() to gracefully stop any scheduled tasks that may be running.

Plugins that need to implement only a subset of the callbacks above can extend LifecycleAdapter and override only the callbacks of interest. Note also that, like Source, SourceLifecycle is a Serializable interface. The implementation must honour that interface.

Finally, note that all the callbacks assume that SourceLifecycles have access to the associated Sources. Typically, implementations adopt the following pattern:

 public class MyLifeCycle extends LifecycleAdapter {
 
	private static final long serialVersionUID = 1L;
 
	private final MySource source;
 
	//additional fields, if any...
 
	public MyLifeCycle(MySource source) {
		this.source = source;
	}
 
	//callbacks and additional methods, if any...
}
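The stop() and resume() callbacks often reduce to the management of scheduled tasks. The sketch below shows the kind of task management a SourceLifecycle implementation might delegate to; the PollingTask class, its polling period, and its shutdown timeout are assumptions made for this example, not part of the framework.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative task management for a SourceLifecycle: start() captures what
// init() or resume() would do, stop() what stop() would do before the
// service shuts down or passivates the Source.
class PollingTask {

	private ScheduledExecutorService scheduler;

	// (re)start the polling task, as init() or resume() would
	public synchronized void start() {
		scheduler = Executors.newSingleThreadScheduledExecutor();
		scheduler.scheduleAtFixedRate(this::poll, 0, 30, TimeUnit.SECONDS);
	}

	// gracefully stop the task, as stop() would
	public synchronized void stop() throws InterruptedException {
		scheduler.shutdown();
		scheduler.awaitTermination(5, TimeUnit.SECONDS);
	}

	public synchronized boolean isRunning() {
		return scheduler != null && !scheduler.isShutdown();
	}

	private void poll() {
		// check the bound source for changes here
	}
}
```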


SourceEvent, SourceNotifier, and SourceConsumer

SourceEvent is a tagging interface for objects that represent events that relate to data sources and that may only be observed by the plugin. In the interface, two such events are pre-defined as constants:

  • SourceEvent.CHANGE: this event occurs in correspondence with a change to the dynamic properties of a Source, such as its cardinality or the time of its last update;
  • SourceEvent.REMOVE: this event occurs when a data source is no longer available. Note that this is different from the event that occurs when clients indicate that access to the source is no longer needed (cf. SourceLifecycle.terminate());

The plugin may have the means to observe these events, e.g. because the data source offers subscription mechanisms, or because it exposes its cardinality and the plugin polls it, or even because the plugin offers write-access to the source and thus observes directly when the source and its cardinality change.

In all these cases, the plugin should report events to the SourceNotifier that the service has set on the Sources, invoking its method:

void notify(SourceEvent);

Note again that, when Sources extend AbstractSource, changing their time of last update automatically fires SourceEvent.CHANGE events. If there are no other reasons to notify events to the service, the plugin may never have to invoke notify() explicitly.

Note also that the service will inject a SourceNotifier in the plugin's Sources only after these are returned to it by SourceBinder.bind(). Any attempt to notify events prior to that moment will fail. For this reason, if the plugin needs to change dynamic properties at binding time, it should do so in SourceLifecycle.init().

SourceNotifier has a second method that can be invoked to subscribe consumers for SourceEvent notifications:

void subscribe(SourceConsumer,SourceEvent...)

This method subscribes a SourceConsumer to one or more SourceEvents. Normally, plugins will not have to invoke it, as the service will subscribe its own SourceConsumers with the SourceNotifiers.

However, the plugin is free to use the available support for event notification within its own codebase. In this case, the plugin can define its own SourceEvents and implement and subscribe its own SourceConsumers. SourceConsumers must implement the single method:

void onEvent(SourceEvent...)

which is invoked by the SourceNotifier with one or more SourceEvents. Normally, the subscriber will receive single event notifications, but the first notification after subscription will carry the history of all the events previously notified by the SourceNotifier.
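The subscription contract above, including the replay of past events on the first notification after subscription, can be pictured as follows. SimpleNotifier and the two stand-in interfaces below are written for this example only; they are not the framework's SourceNotifier, SourceEvent, or SourceConsumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-ins for the framework's event interfaces, for illustration only.
interface SourceEvent {}
interface SourceConsumer { void onEvent(SourceEvent... events); }

// A notifier sketch that mimics the contract described above: subscribers
// receive the events they subscribed to, and the first notification after
// subscription carries the history of previously notified events.
class SimpleNotifier {

	private final List<SourceEvent> history = new ArrayList<>();
	private final Map<SourceConsumer, Set<SourceEvent>> subscriptions = new LinkedHashMap<>();

	void subscribe(SourceConsumer consumer, SourceEvent... events) {
		subscriptions.put(consumer, new HashSet<>(Arrays.asList(events)));
		if (!history.isEmpty())  // replay past events to the new subscriber
			consumer.onEvent(history.toArray(new SourceEvent[0]));
	}

	void notify(SourceEvent event) {
		history.add(event);
		subscriptions.forEach((consumer, events) -> {
			if (events.contains(event))
				consumer.onEvent(event);
		});
	}
}
```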

Auxiliary APIs

All the previous interfaces provide a skeleton around the core functionality of the plugin, which is to transform the API and the tree model of the service to those of the bound sources. The task requires familiarity with three APIs defined outside the framework:

  • the tree API, with which the plugin constructs and deconstructs the edge-labelled tree that it accepts in write requests and/or returns in read requests.
The API offers a hierarchy of classes that model whole trees (Tree) as well as individual nodes (Node), fluent APIs to construct object graphs based on these classes, and various APIs to traverse them;
  • the pattern API, with which the plugin constructs and deconstructs tree patterns, i.e. sets of constraints that clients use in read requests to characterise the trees of interest, both in terms of topology and leaf values.
The API offers a hierarchy of patterns (Pattern), methods to fluently construct patterns, as well as methods to match trees against patterns (i.e. verify that the trees satisfy the constraints, cf. Pattern.match(Node)) and to prune trees with patterns (i.e. retain only the nodes that have been explicitly constrained, cf. Pattern.prune(Node)). The plugin must ensure that it returns trees that have been pruned with the patterns provided by clients;
  • the stream API, with which the plugin models the data streams that flow in and out of the plugin. Streams are used in read requests and write requests that take or return many data items at once, such as trees, tree identifiers, or even paths to tree nodes.
The streams API models such data streams as instances of the Stream interface, a generalisation of the standard Java Iterator interface which reflects the remote nature of the data. Not all plugins need to implement stream-based operations from scratch, as the framework offers synthetic implementations for them. These implementations, however, are derived from those that work with one data item at a time, and hence have very poor performance when the data source is remote. Plugins should use them only when native implementations are not an option because the bound sources do not offer any stream-based or paged bulk operation. When they do, the plugin should feed their transformed outputs directly into Streams. In a few cases, the plugin may need advanced facilities provided by the streams API, such as fluent idioms to convert, pre-process or post-process data streams.

Documentation on working with trees, tree patterns, and streams is available elsewhere, and we do not replicate it here. The tree API and the pattern API are packaged together in a trees library available in our Maven repositories. The streams API is packaged in a streams library also available in the same repositories. If the plugin also uses Maven for build purposes, these libraries are already available in its classpath as indirect dependencies of the framework.
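As a concrete illustration of feeding a paged bulk operation into a stream, the adapter below wraps a hypothetical paged API as a lazy Java Iterator; in the framework, the Stream interface would play the role of Iterator here. The PageSource interface and its fetchPage() method are assumptions made for this sketch, not part of any API discussed above.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Wraps a paged bulk operation as a lazy iterator: pages are fetched on
// demand, one remote call per page, and an empty page marks the end.
class PagedIterator<T> implements Iterator<T> {

	// hypothetical paged API of a bound source
	interface PageSource<T> { List<T> fetchPage(int pageNumber); }

	private final PageSource<T> source;
	private int page = 0;
	private Iterator<T> current = Collections.emptyIterator();
	private boolean exhausted = false;

	PagedIterator(PageSource<T> source) { this.source = source; }

	@Override
	public boolean hasNext() {
		while (!current.hasNext() && !exhausted) {
			List<T> next = source.fetchPage(page++);  // one remote call per page
			if (next.isEmpty()) exhausted = true;
			else current = next.iterator();
		}
		return current.hasNext();
	}

	@Override
	public T next() {
		if (!hasNext()) throw new NoSuchElementException();
		return current.next();
	}
}
```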

SourceReader

A plugin implements the SourceReader interface to provide a tree view of the data in a bound source.

A SourceReader implements the following "tree lookup" methods:

  • Tree get(String,Pattern): returns a tree with a given identifier and pruned with a given Pattern.
The reader must throw an UnknownTreeException if the identifier does not identify a tree in the source, and an InvalidTreeException if a tree can be identified but does not match the pattern;
  • Stream<Tree> get(Stream<String>,Pattern): returns trees with given identifiers and pruned with a given Pattern.
The reader must throw a generic Exception if it cannot produce the stream at all, while trees that cannot be identified or that do not match the pattern must simply be omitted from the stream.


In addition, a SourceReader implements the following "query" method:

  • Stream<Tree> get(Pattern): returns trees pruned with a given Pattern.
Again, the reader must throw a generic Exception if it cannot produce the stream at all, while trees that do not match the pattern must simply be omitted from the stream.


Finally, a SourceReader implements lookup and query methods for individual tree nodes:

  • Node getNode(Path): returns a node from the Path of node identifiers that connect it to the root of a tree.
The reader must throw an UnknownPathException if the path does not identify a tree node;
  • Stream<Node> getNodes(Stream<Path>): returns nodes from the Paths of node identifiers that connect them to the root of trees.
The reader must throw a generic Exception if it cannot produce the stream at all, while nodes that cannot be identified from the paths must simply be omitted from the stream.


Depending on the capabilities of the bound source, implementing some of the methods above may prove challenging or altogether impossible. For example, if the source offers only lookup capabilities, the reader may not be able to implement query methods. In this sense, notice that the reader is not forced to fully implement any of the methods above. In particular, it can:

  • throw an UnsupportedOperationException for all requests to a given method, or:
  • throw an UnsupportedRequestException for certain requests of a given method.

When this is the case, the plugin should clearly report its limitations in its documentation.


Similarly, the plugin is not forced to implement all methods from scratch. The framework defines a partial implementation of SourceReader, AbstractReader, which the plugin can derive to obtain default implementations of certain methods, precisely:

  • a default implementation of get(Stream<String>,Pattern);
  • a default implementation of getNode(Path);
  • a default implementation of getNodes(Stream<Path>).

These defaults are derived from the implementation of get(String,Pattern) provided by the plugin. Note, however, that their performance is likely to be sub-optimal over remote bound sources, as get(String,Pattern) moves data one item at a time. For getNode(Path) the problem is marginal, but for the stream-based methods the impact is likely to be substantial. In this sense, the default implementations should be considered as surrogates for real implementations, and the plugin should override them if and when a more direct mapping onto the capabilities of the bound sources exists.
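The performance issue with such surrogates can be seen from how the derivation necessarily looks. The sketch below derives a bulk lookup from a single-item lookup in the spirit of AbstractReader, with plain lists and a Function standing in for the framework's streams and reader; SurrogateBulkLookup is illustrative, not a framework class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Derives a bulk lookup from a single-item lookup: one round-trip per item,
// hence poor performance over remote sources. Items that cannot be resolved
// are simply skipped, as the contract for stream-based methods requires.
class SurrogateBulkLookup<T> {

	private final Function<String, T> singleLookup;

	SurrogateBulkLookup(Function<String, T> singleLookup) {
		this.singleLookup = singleLookup;
	}

	List<T> get(List<String> ids) {
		List<T> results = new ArrayList<>();
		for (String id : ids) {
			try {
				T item = singleLookup.apply(id);  // one round-trip per item
				if (item != null) results.add(item);
			} catch (RuntimeException failure) {
				// skip items that cannot be identified
			}
		}
		return results;
	}
}
```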


When the reader does implement the methods above natively, the following issues arise:

  • applying patterns
In some cases, the reader may be able to translate patterns into the querying/filtering capabilities of the bound source. Often, it may be able to do so only partially, i.e. by extracting from the patterns the subset of constraints that it can translate. In this case, the reader would push this subset towards the source, transform the results into trees, and then prune the trees with the original pattern, so as to post-filter the data along the constraints that it could not translate.
If the bound source offers no querying/filtering capabilities, then the reader must apply the pattern only locally, on the unfiltered results returned by the source. Note that the performance of get(Pattern) in this scenario can be severely compromised if the bound source is remote, as the reader would effectively transfer its entire contents over the network at each invocation of the method. The reader may then opt for not implementing this method at all, or for rejecting requests that use particularly ‘inclusive’ patterns (e.g. Patterns.tree() or Patterns.any(), which do not constrain trees at all).
  • transforming data into trees
The reader is free to follow the approach and choose the technologies that seem most relevant to the purpose, in that the framework neither limits nor supports any particular choice. It is a good design practice to push the transformations outside the reader, particularly when the plugin supports multiple tree types, but also to simplify unit testing. The transformation may even be pushed outside the whole plugin and put in a separate library that may be reused in different contexts. For example, it may be reused in another plugin that binds sources through a different protocol but under the same data model. If the transformation works both ways (e.g. because the plugin supports write requests), it may also be reused at the client-side, to revert from tree types to the original data models.
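The partial-pushdown strategy above can be sketched with plain predicates standing in for patterns: one predicate models the constraints that can be translated for the source, the other the client's complete pattern, applied locally as a post-filter. The PartialPushdown class, the String items, and both predicates are illustrative assumptions, not framework types.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Partial constraint pushdown: the translatable part of a pattern is
// evaluated "at the source", then the full pattern post-filters locally.
class PartialPushdown {

	static List<String> query(List<String> sourceContents,
	                          Predicate<String> pushableConstraints,
	                          Predicate<String> fullPattern) {
		return sourceContents.stream()
		                     .filter(pushableConstraints) // what the source can evaluate
		                     .filter(fullPattern)         // local post-filtering
		                     .collect(Collectors.toList());
	}
}
```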


In summary, the plugin can deliver a simple implementation of SourceReader by:

  • implementing get(String,Pattern) and get(Pattern), and
  • inheriting surrogate implementations of all the other methods from AbstractReader.

Alternatively, the plugin may be able to deliver a more performant implementation of SourceReader by:

  • inheriting from AbstractReader and
  • overriding one or more surrogate implementations with native ones.

Of course, the plugin may be able to deliver native implementations of some methods and not others.

SourceWriter

A plugin implements the SourceWriter interface to change the contents of a bound source in response to write requests. Writers are rarely implemented by plugins that bind to remote sources, which typically offer read-only interfaces. Writers may be implemented instead by plugins that bind to local sources, so as to turn the service endpoint into a storage service for structured data.

A SourceWriter implements methods to insert new data in the bound source:

  • Tree add(Tree): inserts a tree in the bound source and returns the tree as it has been inserted in the source.
With its signature, the method supports multiple insertion models:
  • If the data is annotated at the point of insertion with identifiers, timestamps, versions and similar metadata, the writer can return these annotations back to the client;
  • If instead the data is unmodified at the point of insertion, the writer can return null to the client so as to simulate a true "fire-and-forget" model and avoid unnecessary data transfers.
Fire-and-forget insertions may also be desirable under the first model, when clients have no use for the annotations added to the metadata at the point of insertion. The plugin may support these clients if it allows them to specify directives in the input tree itself (e.g. special attributes on root nodes). The writer would recognise directives and return null to clients.
Regardless of the insertion model of the bound source, input trees may be invalid for insertion, e.g. they may miss required metadata, carry metadata that they should not have (e.g. identifiers that should be assigned by the bound source), or be otherwise malformed with respect to insertion requirements. When this happens, the writer must throw an InvalidTreeException.
  • Stream<AddOutcome> add(Stream<Tree>): inserts trees in the bound source and returns the outcomes in the same order.
AddOutcome is a wrapper for either the same output returned by add(Tree) (i.e. the tree as it has been inserted in the source), or for the Exception encountered in the process of inserting a tree. In the latter case, the writer must wrap the same Exceptions that it would throw in add(Tree), and throw a generic Exception only if it cannot produce the stream at all.
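The AddOutcome contract can be pictured as a result-or-failure wrapper along the following lines. The generic Outcome class below is written for this example and is not the framework's AddOutcome type.

```java
// A result-or-failure wrapper in the style of AddOutcome: each element of
// the returned stream carries either the inserted tree or the exception
// met while inserting it. Illustrative only.
final class Outcome<T> {

	private final T value;
	private final Exception failure;

	private Outcome(T value, Exception failure) {
		this.value = value;
		this.failure = failure;
	}

	static <T> Outcome<T> success(T value) { return new Outcome<>(value, null); }

	static <T> Outcome<T> failure(Exception cause) { return new Outcome<>(null, cause); }

	boolean succeeded() { return failure == null; }

	T value() { return value; }

	Exception cause() { return failure; }
}
```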


A SourceWriter also implements methods to change the data in the bound source:

  • void update(Tree): updates a given tree in the bound source.
Like for insertions, the signature of the method supports multiple update models:
  • If the bound source models updates in terms of replacement, the input tree may simply encode the new version of the data;
  • If instead the bound source supports in-place updates, the input tree may encode no more and no less than the exact changes to be applied to the existing data. The tree API supports in-place updates with the notion of a delta tree, i.e. a special tree that encodes the changes applied to a given tree over time, in that it contains only the nodes of the tree that have been added, modified, or deleted, marked with a corresponding attribute. The API can also compute the delta tree between a tree and another tree that represents its evolution at a given point in time (cf. Node.delta()). Clients may thus compute the delta tree for a set of changes and invoke the service with it. The writer may parse the delta tree to effect the changes or, more simply, revert to a replacement model of update: retrieve the data to be updated, transform it into a tree, and then use the tree API again to update it with the changes carried in the delta tree (cf. Node.update(Node)).
  • Under both models, the input tree can carry the directive to delete existing data, rather than modify it.
In all cases, the plugin must document the expectations of its writers over the input tree. Note that the input tree must allow the writer to identify which data should be updated. If the target data cannot be identified (e.g. it no longer exists in the source), the writer must throw an UnknownTreeException. If the input tree does allow the writer to identify the target data but does not otherwise meet expectations, then the writer must throw an InvalidTreeException.
  • Stream<UpdateTreeFailure> update(Stream<Tree>): updates given trees in the bound source.
The trees in input may have the same range of semantics discussed above for update(Tree). UpdateTreeFailure is a wrapper for the Exception encountered in the process of updating a given tree. The writer must wrap the same Exceptions that it would throw in update(Tree), and throw a generic Exception only if it cannot produce the stream at all.
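The fallback from in-place to replacement updates described above can be sketched with flat maps standing in for trees: retrieve the stored data, apply the changes carried by the delta, and write the result back. The map-based encoding, with null marking a deletion, is an assumption made for this example; the real writer would operate on trees via Node.update(Node).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reverting an in-place update to a replacement model: apply a "delta"
// (field -> new value, null meaning deletion) to the stored data, then
// write the merged result back to the source. Illustrative only.
class ReplacementUpdate {

	static Map<String, String> apply(Map<String, String> stored,
	                                 Map<String, String> delta) {
		Map<String, String> updated = new LinkedHashMap<>(stored);
		for (Map.Entry<String, String> change : delta.entrySet()) {
			if (change.getValue() == null)
				updated.remove(change.getKey());                 // marked as deleted
			else
				updated.put(change.getKey(), change.getValue()); // added or modified
		}
		return updated;
	}
}
```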