The Tree Manager Framework

From Gcube Wiki
Revision as of 11:41, 20 June 2012 by Fabio.simeoni (Talk | contribs)


The Tree Manager service may be called to persist or retrieve edge-labelled trees, either one at a time or many at once. Either way, the data is not necessarily stored locally to the service, or as trees. Instead, the data is most often held remotely, managed autonomously, and exposed by other services in a variety of forms and through different APIs.

The main value proposition of the service is that, in many cases and for many purposes, this variety of data sources may be ignored and the data uniformly accessed under a sufficiently general API and data model. This uniformity defines a basis for interoperability between service clients and data sources. It enables generic clients to implement cross-domain functions - including data indexing, transformation, discovery, transfer, browsing, viewing, etc. - over a single data model, against a single API, and with a consistent set of tools. Similar advantages can be granted to less generic clients that implement domain-specific or application-specific functions, provided that consensus is achieved around conventional uses of the tree model.

Within the service, uniformity is achieved with two-way transformations between the API and tree model of the service and those of the underlying data sources. Transformations are implemented in plugins, i.e. libraries developed independently of the service so as to extend its capabilities at runtime. Service and plugins interact through a protocol defined by a set of local interfaces which, collectively, serve as a framework for plugin development. The framework is packaged as a stand-alone library, the tree-manager-framework, and is a dependency for both service and plugins.


Tree-manager-framework-overview.png


The library and all its transitive dependencies are available in our Maven repositories. Plugins that are managed with Maven can resolve them with a single dependency:

<dependency>
  <groupId>org.gcube.data.access</groupId>
  <artifactId>tree-manager-framework</artifactId>
  <version>...</version>
  <scope>compile</scope>
</dependency>

In what follows, we address the plugin developer and describe the framework in detail, illustrating also design options and best practices for plugin development.

Overview

We start by overviewing the key components of the framework, their role in the design of a plugin, and their relationships.

The service and the plugin interact in order to notify each other of the occurrence of certain events. The service observes events that relate to its clients, first and foremost their requests; these translate into actions which the plugin must perform on data sources. Vice versa, the plugin may observe events that relate to the data sources, first and foremost changes to their state; these need to be reported to the service. The framework defines the interfaces through which all these events may be notified.

There are three types of client requests that the service may relay to the plugin:

  • bind requests, where a client asks the plugin to connect to given sources. This type of client knows about the plugin and has in fact included in the request all the information that the plugin needs in order to establish a binding. The service delivers the request to a SourceBinder provided by the plugin and expects back one Source for each bound source. The plugin configures Sources with information extracted or derived from the request, and injects them with other components that the service needs to access for later requests. Thereafter, the service manages Sources on the plugin's behalf.


Tree-manager-framework-bind-requests.png

  • read requests, where a client wants to retrieve data from a source previously bound to the plugin, and specifies either one or more identifiers to resolve (lookup requests), or patterns to match (query requests). This type of client may not know about the plugin at all, having simply discovered that the service gives read access to a source of interest. The service will resolve the plugin's Sources from requests and then deliver them to the SourceReaders provided by the Sources, expecting trees back. It is the plugin's job to translate requests for the API of the bound sources and to transform the results returned by the sources into trees.

Tree-manager-framework-read-requests.png


  • write requests, where a client wants to store data in a source previously bound to the plugin, either new data (add requests) or changes to existing data, including removals (update requests). This type of client typically knows about the plugin and what it expects from the data. The service resolves the plugin's Sources from requests and then delivers them to the SourceWriters provided by the Sources, passing trees to them. Again, it is the plugin's job to translate requests for the API of the bound sources.

Tree-manager-framework-write-requests.png


Note that the plugin must provide a SourceBinder and at least one SourceReader or SourceWriter.
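The flow of bind and read requests described above can be sketched with a handful of toy types. This is only an illustration under loud assumptions: SketchBinder, SketchSource and SketchReader are hypothetical, much-reduced stand-ins for SourceBinder, Source and SourceReader, whose actual signatures are not shown in this guide; requests and trees are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SourceReader: a real reader would return a tree, not a String.
interface SketchReader {
    String lookup(String treeId);
}

// Hypothetical stand-in for Source: identifies a bound source and provides its reader.
interface SketchSource {
    String id();
    SketchReader reader();
}

// Hypothetical stand-in for SourceBinder: one Source is returned per bound data source.
interface SketchBinder {
    List<SketchSource> bind(String request);
}

// Toy plugin-side binder: the "request" is a comma-separated list of source names.
class InMemoryBinder implements SketchBinder {
    @Override
    public List<SketchSource> bind(String request) {
        List<SketchSource> sources = new ArrayList<>();
        for (String name : request.split(",")) {
            final String sourceId = name.trim();
            sources.add(new SketchSource() {
                @Override public String id() { return sourceId; }
                @Override public SketchReader reader() {
                    // Pretend to resolve the identifier against the bound source.
                    return treeId -> sourceId + "/" + treeId;
                }
            });
        }
        return sources;
    }
}

// Toy service side: keeps the Sources returned by the binder and resolves them on read requests.
class SketchService {
    private final Map<String, SketchSource> bound = new HashMap<>();

    void bind(SketchBinder binder, String request) {
        for (SketchSource source : binder.bind(request)) {
            bound.put(source.id(), source);
        }
    }

    String lookup(String sourceId, String treeId) {
        SketchSource source = bound.get(sourceId);
        if (source == null) {
            throw new IllegalArgumentException("unknown source: " + sourceId);
        }
        return source.reader().lookup(treeId);
    }
}
```

The point of the sketch is the division of labour: the plugin decides how a request maps to sources and how identifiers are resolved, while the service only holds on to the resulting Sources and routes later requests to them.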

Besides relaying client requests, the service also notifies the plugin of key events in the lifetime of its source bindings. It does so by invoking event-specific callbacks of the SourceLifecycle associated with the plugin's Sources. As we shall see, lifetime events include binding initialisation, reconfiguration, passivation, resumption, and termination.


Tree-manager-framework-service-events.png

These are all the events that the service observes and passes on to the plugin. Other events, however, may be observed by the plugin, such as changes in properties or status of bound sources. These events are predefined as SourceEvents and the plugin reports them to the SourceNotifiers that the service injects in the plugin's Sources. The service has its own SourceConsumers that receive the notifications of the plugin. If useful to connect its components, the plugin can implement its own SourceEvents and SourceConsumers.
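The relationship between events, notifiers and consumers is essentially an observer pattern. The following is a minimal sketch of that pattern only: SketchEvent, SketchConsumer and SketchNotifier are hypothetical names, and the method names subscribe, emit and onEvent are assumptions that need not match the real SourceEvent, SourceConsumer and SourceNotifier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical events, loosely modelled on "changes in properties or status of bound sources".
enum SketchEvent { PROPERTIES_CHANGED, STATUS_CHANGED }

// Hypothetical stand-in for SourceConsumer: receives the events reported by the plugin.
interface SketchConsumer {
    void onEvent(SketchEvent event);
}

// Hypothetical stand-in for SourceNotifier: the plugin reports events, consumers are called back.
class SketchNotifier {
    private final List<SketchConsumer> consumers = new ArrayList<>();

    void subscribe(SketchConsumer consumer) {
        consumers.add(consumer);
    }

    void emit(SketchEvent event) {
        for (SketchConsumer consumer : consumers) {
            consumer.onEvent(event);
        }
    }
}
```

In this reading, the service would subscribe its own consumers before injecting the notifier in a Source, and the plugin would call emit whenever it observes a change in the bound source.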


Tree-manager-framework-plugin-events.png

All the key components of the plugin are introduced to the service through an implementation of the Plugin interface. From it, the service gets the plugin's SourceBinder, and from the binder it obtains the plugin's Sources, their SourceLifecycles, and their SourceReaders and SourceWriters. In addition, the Plugin exposes descriptive information about the plugin, which the service publishes in the infrastructure and uses in order to manage the plugin. Optionally, the plugin may implement PluginLifecycle, which extends Plugin with callbacks invoked by the service when the plugin is loaded and unloaded. This gives the plugin more control over its lifecycle.

To bootstrap the process of component discovery and find Plugin implementations, the service adopts the standard Java mechanism based on a ServiceLoader. Accordingly, the plugin includes a file META-INF/services/org.gcube.data.tmf.api.Plugin in its Jar. The file contains a single line with the qualified name of its Plugin implementation.
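For readers unfamiliar with the mechanism, the following sketch shows how a ServiceLoader-based lookup works in general. It does not reproduce the service's own code: DemoPlugin is a hypothetical one-method stand-in for the Plugin interface, and PluginDiscovery is an illustrative helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for org.gcube.data.tmf.api.Plugin, reduced to a single method.
interface DemoPlugin {
    String name();
}

class PluginDiscovery {

    // Collects every implementation registered for the given interface in a
    // META-INF/services/<qualified interface name> file found on the classpath.
    static <T> List<T> discover(Class<T> service) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(service)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        List<DemoPlugin> plugins = discover(DemoPlugin.class);
        System.out.println("plugins found: " + plugins.size());
    }
}
```

When no registration file names an implementation, the loader simply yields nothing. A plugin whose Jar lacks the file, or whose file names the wrong class, is therefore likely to go unnoticed rather than fail loudly, which is worth checking first when a plugin does not appear to be loaded.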

Tree-manager-framework-pliugin-discovery.png

This completes our quick overview of the main interfaces and classes provided by the framework. Note at the outset that, besides implementing the interfaces that define the interaction protocol with the service, the plugin is free to design and develop against the framework using any technology that seems appropriate to the task.

In the rest of this guide we look at the key components of the framework in more detail.

Key Design Issues

The framework has been designed to support a wide range of plugins. There are indeed many degrees of freedom in plugin design:

  • what sources can it bind to? The plugin may be dedicated to specific data sources (source-specific plugin), or it may target open-ended classes of data sources which publish data through standard APIs and in standard models (source-generic plugin);
  • what kind of trees does it accept and/or return? All plugins transform to trees and/or from trees, but what is the structure and intended semantics of those trees? Depending on the bound sources and the design of the plugin:
    • the plugin may be fully generic, i.e. transform a data model which is as general-purpose as the tree model (type-generic plugin). A plugin for data sources that expose arbitrary instances of XML data or RDF data through some standard API, for example, falls within this category. In this case, the meaning and shape of the trees may be unconstrained in principle, or it may be constrained only at the point of binding to specific data sources.
    • alternatively, the plugin may be extremely specific and transform a concrete data model into trees with well-defined structures, i.e. abiding by a set of constraints on edge labels and leaf values which is statically defined (type-specific plugin). In this case, the tree model of the service serves as a general-purpose carrier for the original data model. The plugin documentation will include the definition of its tree type, anything ranging from narrative to formal XML Schema definitions. The definition may be specific to the plugin, or it may reflect a wider consensus towards which the plugin and many others may converge, regardless of the variety of their bound sources.
    • most plugins will always work with a single tree type, but this is not a constraint imposed by the framework. The plugin may support transformations into a number of tree types and allow binding clients to indicate in their requests the type they desire sources to be bound with (multi-type plugin). The plugin may then embed multiple transformations, or take a more dynamic approach, define a framework for transformers, and discover the transformers available on the classpath.
    • finally, the plugin may support a single transformation which outputs trees that can be assigned multiple types, from more generic to more specific types, e.g. a generic RDF type as well as a more specific type associated with some RDF schema.
  • what requests does it support? All plugins must accept at least one form of bind request, but a plugin may support many so as to cater for different types of bindings, or to support reconfiguration of a previous binding. Further, most plugins bind a single source per bind request, but some may bind many at once for some requests. Most plugins also support read requests but do not support write requests, typically because the bound sources are static, or grant write access only to privileged clients. In principle at least, the converse may apply and a plugin may grant only write access to the sources. Overall, a plugin may support one of the following access modes: read-only, write-only, or read-write.
  • what functional and QoS limitations does it have? Rarely will the API and tree model of the service prove functionally equivalent to those of the bound sources. Even if the plugin restricts its support to a particular access mode, e.g. read-only, it may not be able to support all the requests associated with that mode, or to support them all efficiently. Its bound sources, for example, may offer no lookup API because they do not mandate or regulate the existence of identifiers in the data. Alternatively, they may offer no query API, or else support (the equivalent of) a subset of the patterns that clients may indicate in query requests. Again, the bound sources may not allow the plugin to retrieve, add, or update many trees at once. In some cases, the plugin may be able to compensate for differences, typically at the cost of a reduction in QoS. For example, the plugin may be configured at binding time with queries that model lookups, differently for different bound sources. Similarly, it may partially transform patterns and then do local matches on the results returned by sources (2-phase match). Coming to write requests, the bound sources may not support partial updates, forcing the plugin to fetch the data and apply the updates locally. Or they may not support updates at all, or they may not support deletions, leaving the plugin with no obvious option but to fail update requests.

Answering these questions fixes some of the free variables in plugin design and helps to characterise it ahead of implementation. Collectively, the answers define a profile for the plugin and should serve as a key element of its documentation.
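To make the 2-phase match mentioned above more concrete, here is a small sketch of the idea in isolation. Everything in it is an assumption made for illustration: items are plain strings rather than trees, the bound source is imagined to support prefix queries only, and the part of the pattern that the source cannot evaluate is modelled as a Predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TwoPhaseMatch {

    // Phase 1 stand-in: a hypothetical source that can only answer prefix queries.
    // In a real plugin this would be a remote call to the bound source.
    static List<String> remotePrefixQuery(List<String> sourceData, String prefix) {
        List<String> candidates = new ArrayList<>();
        for (String item : sourceData) {
            if (item.startsWith(prefix)) {
                candidates.add(item);
            }
        }
        return candidates;
    }

    // Phase 2: apply locally the residual part of the pattern that the source could not evaluate.
    static List<String> match(List<String> sourceData, String prefix, Predicate<String> residual) {
        List<String> results = new ArrayList<>();
        for (String candidate : remotePrefixQuery(sourceData, prefix)) {
            if (residual.test(candidate)) {
                results.add(candidate);
            }
        }
        return results;
    }
}
```

The cost in QoS is visible in the shape of the code: every candidate that passes the first phase must be transferred and inspected locally, even if the second phase then discards it.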

Plugin, PluginLifecycle, and Environment

A plugin implements the following methods of the Plugin interface:

  • String name(): returns the name of the plugin. The service will publish it and its clients may use it to discover instances of the service which have been extended with the plugin.
  • String description(): returns a brief description of the plugin. The service will publish it so that it can be inspected and displayed by a range of clients;
  • List<Property> properties(): returns triples (name, value, description), all String-valued. The service will publish them and its clients may use them to identify instances of the service which have been extended with the plugin. The plugin decides what properties may be useful to clients for discovery, inspection, or display. For example, if the plugin is multi-type, it will probably list the types that it supports here. The implementation returns null or an empty list if it has no properties to publish;
  • SourceBinder binder(): returns the plugin's implementation of the SourceBinder interface. The service will relay bind requests to it.
  • List<String> requestSchemas(): returns the schemas of the bind requests that the plugin can process. These will be published by the service to instruct binding clients on how to formulate their bind requests. There are GUIs within the system that use the schemas to generate forms for interactive formulation of bind requests. The implementation may return null and decide to document its expectations elsewhere. If it does not return null, it is free to use any schema language of choice, though the existing GUIs expect XML Schema, which is thus the recommended language. Note that, in the common case in which the plugin models requests with Java classes and uses JAXB as the standard data binding solution, it can easily generate schemas directly in the implementation of the method, using JAXBContext.generateSchema().
  • boolean isAnchored(): returns an indication of whether the plugin is anchored, i.e. stores data locally to the service. If true, the service will inhibit its internal replication schemes. In the common case in which the plugin targets remote data sources, the implementation will simply return false.
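As an illustration of the method list above, here is a sketch of what a minimal implementation might look like. To keep the example self-contained it does not compile against the framework: SketchPlugin, SketchProperty and SketchSourceBinder are hypothetical stand-ins that mirror only the signatures listed above, and the returned values are invented for the example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Property: a (name, value, description) triple of Strings.
final class SketchProperty {
    final String name;
    final String value;
    final String description;

    SketchProperty(String name, String value, String description) {
        this.name = name;
        this.value = value;
        this.description = description;
    }
}

// Hypothetical stand-in for SourceBinder; its methods are omitted here.
interface SketchSourceBinder { }

// Hypothetical stand-in for Plugin, mirroring the methods listed above.
interface SketchPlugin {
    String name();
    String description();
    List<SketchProperty> properties();
    SketchSourceBinder binder();
    List<String> requestSchemas();
    boolean isAnchored();
}

class SamplePlugin implements SketchPlugin {
    @Override public String name() { return "sample-plugin"; }

    @Override public String description() { return "Exposes a sample remote source as trees."; }

    // A single property advertising the tree type (an invented value).
    @Override public List<SketchProperty> properties() {
        return Collections.singletonList(
            new SketchProperty("tree-type", "sample", "the tree type produced by this plugin"));
    }

    @Override public SketchSourceBinder binder() { return new SketchSourceBinder() { }; }

    // Returning null signals that request expectations are documented elsewhere.
    @Override public List<String> requestSchemas() { return null; }

    // The sources are remote, so the plugin is not anchored.
    @Override public boolean isAnchored() { return false; }
}
```

A real plugin would implement the framework's Plugin interface directly and register the implementation class for discovery through the ServiceLoader mechanism described in the overview.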

As mentioned above, a plugin that needs more control over its own lifetime can implement PluginLifecycle, which extends Plugin with the following callback methods:

  • void init(Environment) is invoked when the plugin is first loaded;
  • void stop(Environment) is invoked when the plugin is unloaded.

For example, the plugin may implement init() to start up a DI container such as Spring, Guice, or CDI.

Environment is implemented by the service to encapsulate access to the environment in which the plugin is deployed or undeployed. At the time of writing, it serves solely as a sandbox over the location of the file system which may be accessed by the plugin. Accordingly, it exposes only the following method:

  • File file(path), which returns a file with a given path relative to the storage location.
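The lifecycle callbacks and the Environment sandbox can be exercised together in a small sketch. Again, the types are hypothetical stand-ins rather than the framework's own: SketchEnvironment assumes that the path argument is a String, DirectoryEnvironment is a test double of the kind a plugin developer might write for unit tests (the real Environment is implemented by the service), and CachingPlugin is an invented plugin that keeps a local cache directory.

```java
import java.io.File;

// Hypothetical stand-in for Environment; the String parameter type is an assumption.
interface SketchEnvironment {
    File file(String path);
}

// Hypothetical stand-in for the two callbacks of PluginLifecycle.
interface SketchLifecycle {
    void init(SketchEnvironment environment);
    void stop(SketchEnvironment environment);
}

// Test double: resolves paths relative to a root directory chosen by the test.
class DirectoryEnvironment implements SketchEnvironment {
    private final File root;

    DirectoryEnvironment(File root) {
        this.root = root;
    }

    @Override
    public File file(String path) {
        return new File(root, path);
    }
}

// Invented plugin that creates a cache directory when loaded and removes it when unloaded.
class CachingPlugin implements SketchLifecycle {
    private File cacheDir;

    @Override
    public void init(SketchEnvironment environment) {
        cacheDir = environment.file("cache");
        cacheDir.mkdirs();
    }

    @Override
    public void stop(SketchEnvironment environment) {
        File dir = environment.file("cache");
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File entry : entries) {
                entry.delete();   // the cache is assumed to be flat, with no subdirectories
            }
        }
        dir.delete();
    }

    File cacheDir() {
        return cacheDir;
    }
}
```

Resolving every path through the Environment, rather than building absolute paths, keeps the plugin inside the storage location that the service has set aside for it.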