The Tree Manager Library

From Gcube Wiki
Revision as of 06:59, 15 December 2012 by Fabio.simeoni (Talk | contribs) (Suitable Endpoints)

The Tree Manager Library is a client library for the Tree Manager services. It can be used to read from and write into data sources that are accessible via those services.

The library is available in our Maven repositories with the following coordinates:

<dependency>
  <groupId>org.gcube.data.access</groupId>
  <artifactId>tree-manager-library</artifactId>
  <version>...</version>
</dependency>

Preliminary Concepts

As clients of the tree-manager-library we need to understand:

  • what the Tree Manager services are, and what they can do for us;
  • where they can be found and how to find them;
  • which tools we use alongside the tree-manager-library to work with them.

We build this understanding in the rest of this Section, so that we can conceptualise our work ahead of implementation. We then illustrate how the tree-manager-library helps us with the implementation.

The Services

The Tree Manager services let us access heterogeneous data sources in a uniform manner. They present us with a unifying view of the data as edge-labelled trees, and translate that view to the native data model and access APIs of individual sources. These translations are implemented as service plugins, which are defined independently of the services, either for specific data sources or for whole classes of similar sources. If there is a plugin for a given source, then that source is "pluggable" in the Tree Manager services.

In more detail:

  • the T-Reader and T-Writer services give us read and write APIs over pluggable sources. We may be able to access any such source through either or both, depending on the source itself, the capabilities of the associated plugin, or the intended access policy.
  • the T-Binder service lets us connect T-Reader and/or T-Writer services to pluggable sources. Thus the service stages the others, creating read and/or write views over the sources.

We can then interact with the Tree Manager services in one of two roles. When we act as staging clients, we invoke the T-Binder service to create read and write views over pluggable sources, ahead of access. When we act as access clients, we invoke the T-Reader and the T-Writer to consume such views, i.e. pull and push data from and towards them. Of course, we can play both roles within the same piece of code.

In summary, services, plugins, clients, and sources come together as follows:

Tree-manager-client-overview.png

Suitable Endpoints

Abstractions aside, it's worth remembering that we do not interact with the services, which are unreachable software abstractions. Rather, we interact with addressable service endpoints, which come to life when we deploy and start the services. Deployment is largely back-end business, but as clients we should be aware of the following facts:

  • T-Binder, T-Reader, and T-Writer are always deployed together, on one or more nodes.
  • plugins are always deployed "near" the service endpoints, i.e. on the same nodes.
  • plugins are deployed independently from each other. There is no requirement that a plugin be deployed wherever the services are.

This makes for a wide range of deployment scenarios, where different scenarios address different requirements of access policy or quality of service. Consider for example the following scenario:

Tree-manager-client-deployment.png

Here the services are deployed on two nodes, N1 and N2. There are different plugins on the two nodes but one plugin P1 is available on both nodes. Using this plugin, the T-Binder at N1 has created read and write views over a source S1. Similarly, the T-Binder at N2 has created a read view over the same source. We may now read data from S1 using the T-Reader at either node. This redundancy helps avoid bottlenecks under load and service outages after partial failures.

Note however that we may only write data into S1 from the T-Writer at N1, as only the T-Binder at N1 has created a write view over it. Perhaps scalability and outages are less of a concern for write operations. At least they aren't yet; the T-Writer at N2 may be bound to S1 later, when and if those concerns arise.

We may also read from a different source S2, through the T-Reader at N1 and via a plugin P2 which is not available on N2, at least not yet. We may not write into S2, however, as there is no T-Writer around that lets us do that. Perhaps the source is truly read-only to remote clients. Perhaps it is the intended policy that we may not change the source through the Tree Manager services.

This scenario makes it clear that, at any given time:

  • not all endpoints are equally suitable to all clients.
If we want to write in S1 we have no business with what's on N2. Likewise, if we want to read from S2 we can ignore N2.
  • not all endpoints are uniquely suitable to their clients.
If we want to read from S1, either of the two nodes will do, and if we cannot work with one we can always try the other.
  • the suitability of any endpoint may vary over time.
If a node holds no interest for us today for a given task, it may do so tomorrow. P2 may be deployed on N2, a T-Writer for S may be staged on it, and so on.
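
To make the notion of suitability concrete, here is a minimal sketch that models the scenario as plain data and selects the nodes that suit a given task. All types and names are hypothetical and serve only as an illustration; they are not part of the tree-manager-library, which resolves such questions through the Information Services. The sketch assumes that the write view over S1 sits at N1, the node whose T-Binder created it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the scenario above; not part of the tree-manager-library.
class EndpointScenario {

    enum Kind { READER, WRITER }

    // A service endpoint on a node, bound to a source.
    static final class Endpoint {
        final String node;
        final Kind kind;
        final String source;

        Endpoint(String node, Kind kind, String source) {
            this.node = node;
            this.kind = kind;
            this.source = source;
        }
    }

    // Assumed bindings: N1 reads and writes S1 and reads S2; N2 only reads S1.
    static final List<Endpoint> PUBLISHED = Arrays.asList(
        new Endpoint("N1", Kind.READER, "S1"),
        new Endpoint("N1", Kind.WRITER, "S1"),
        new Endpoint("N1", Kind.READER, "S2"),
        new Endpoint("N2", Kind.READER, "S1"));

    // Returns the nodes that host an endpoint of the given kind bound to the given source.
    static List<String> suitableNodes(Kind kind, String source) {
        return PUBLISHED.stream()
            .filter(e -> e.kind == kind && e.source.equals(source))
            .map(e -> e.node)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("read S1:  " + suitableNodes(Kind.READER, "S1"));
        System.out.println("write S1: " + suitableNodes(Kind.WRITER, "S1"));
        System.out.println("write S2: " + suitableNodes(Kind.WRITER, "S2"));
    }
}
```

An empty result, as for writing into S2, means that no endpoint is currently suitable. Since bindings change over time, the same question may have a different answer tomorrow.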

So, service deployment may well be back-end business, but as clients we simply cannot ignore that the world out there is varied and in movement. Our first task is thus to locate endpoints that are suitable, such as T-Binders with the right plugin or T-Readers and T-Writers bound to the right sources.

We can find suitable endpoints by querying the Information Services. These services resolve queries against descriptions that the endpoints have previously published. T-Binders publish the list of plugins that are locally available on their nodes. T-Readers and T-Writers publish information about the data sources they are bound to.

The good news is that the tree-manager-library can do the heavy lifting for us. We do not need to know how to specify and submit queries, only declare the properties of suitable endpoints. The library takes over from there, acting as a client of the Information Services on our behalf. Overall, the library exposes us to the realities of service deployment no more and no less than we really have to.

Tree Types

At first glance, clients and plugins have little to share. The previous picture makes it clear: clients sit in front of the services, plugins live behind them. At a closer look, however, clients may have clear dependencies on plugins.

For access clients the dependency has to do with the "shape" of trees. T-Reader and T-Writer do not constrain what edges trees may have and what values may be in their leaves. In contrast, plugins typically translate the data model of data sources to a particular tree type. If we want to access those sources, we need to know those types. Often they are fully defined by the plugin, and in this case we will find them defined in its documentation. In other cases, plugins work with types defined by broader agreement. In these cases, we go wherever the agreement is documented to find the definition of the tree type.
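
As an illustration of what a tree type amounts to, the following sketch models edge-labelled trees with a small stand-in class and checks one tree against an invented type. Everything here is hypothetical: the class is not the model of the trees library, and the "title" and "author" labels do not come from any real plugin.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an edge-labelled tree; the real model lives in the trees library.
class LabelledTree {

    final String value; // non-null for leaves, null for inner nodes
    final Map<String, List<LabelledTree>> edges = new LinkedHashMap<>();

    private LabelledTree(String value) {
        this.value = value;
    }

    static LabelledTree leaf(String value) {
        return new LabelledTree(value);
    }

    static LabelledTree node() {
        return new LabelledTree(null);
    }

    // Adds a child under the given edge label; the same label may be used more than once.
    LabelledTree add(String label, LabelledTree child) {
        edges.computeIfAbsent(label, k -> new ArrayList<>()).add(child);
        return this;
    }

    // Returns the children reached through edges with the given label.
    List<LabelledTree> children(String label) {
        return edges.getOrDefault(label, Collections.emptyList());
    }

    // An invented tree type: exactly one "title" leaf and at least one "author" leaf.
    static boolean conformsToSampleType(LabelledTree tree) {
        List<LabelledTree> titles = tree.children("title");
        List<LabelledTree> authors = tree.children("author");
        return titles.size() == 1
            && titles.get(0).value != null
            && !authors.isEmpty()
            && authors.stream().allMatch(a -> a.value != null);
    }
}
```

The services would accept any tree built this way. A check like conformsToSampleType() stands for the knowledge of the tree type that we take from the plugin documentation, or from the broader agreement, when we process the trees.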

For staging clients the dependency is even more explicit. When we ask the T-Binder to create read and write views over a source, we need to name what plugin is going to handle that source. This plugin may also allow us to configure how it is going to handle the source. These staging directives vary from plugin to plugin, and may be arbitrarily complex. The T-Binder will take them and dutifully pass them on to the target plugin. We learn about these directives in the documentation of the plugin. In most cases, the plugin will give us facilities to formulate directives, such as bean classes and facilities to serialise them on the wire. The facilities will ship in a shared library that we will then add to our dependencies. Again, the documentation of the plugin will provide the required coordinates.

Do we always depend on plugins? No, if we do not stage the services and can process trees of arbitrary types. In a word, if we are truly generic read clients. A good example here is a source explorer, which renders in some generic fashion the contents of any data source that can be accessed through the services. Another example is a generic indexer or transformer, which can be declaratively configured to work on any tree type. gCube includes in fact many clients that belong to this category. These clients use the Tree Manager services as the single data source they need to deal with.

In all the other cases, however, we need to know the plugins that are relevant to our task, as they define the data we will be accessing and/or how to make that data accessible in the first place. The following picture illustrates the point.

Tree-manager-client-and-plugins.png

Dependencies

We use the tree-manager-library to locate and invoke service endpoints, but we find in other libraries the tools to work with trees. We share these libraries with services, plugins, and in fact any other component which requires similar support. The tree-manager-library specifies them as dependencies and Maven brings them automatically onto our classpath:

  • trees: this library contains an object implementation of the tree model used by the services. We use it to:
construct, change, and inspect edge-labelled trees;
describe what trees or parts thereof we want to read from sources;
describe what trees or parts thereof we want to change in or remove from sources.
In some cases, we may also use additional facilities, e.g. obtain resolvable URIs for trees and tree nodes, or generate synthetic trees for testing purposes.
  • streams: this library lets us work with data streams, which we encounter when we read or write many trees at once. We use it primarily for its facilities to produce and consume streams. In some cases, we may also use it to transform streams, handle their failures, publish them on the network, and more.

We need to be familiar with the documentation of both libraries before we can work with the tree-manager-library.

Reading From Sources

To read data from a given source, we use a local proxy of a T-Reader bound to that source. We then invoke the methods of the proxy to pull data from the source.

Read Proxies

To obtain a proxy, we provide a query that identifies the target source, e.g. by name. In code:


In words:

  • we create proxies and queries with static methods of the TServiceFactory class, a one-stop shop to obtain objects from the library. For added fluency, we first import the static methods of the factory.
  • we invoke the method readSource() of the factory to get a query builder, and use the builder to characterise suitable T-Readers.
  • We repeat the process to get a proxy. We invoke the reader() method to obtain a proxy builder, and use it to get a proxy configured with the query.
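
The steps above can be sketched as follows. This is a self-contained illustration with stand-in classes, not the library's actual types: only TServiceFactory, readSource() and reader() are named in the text, while withName(), matching(), build(), the class names, and the source name are assumptions made for the example. Since the stand-in factory is nested in the same file, the sketch qualifies its calls rather than importing them statically.

```java
// Stand-ins that mimic the shape of the API described above.
// Only TServiceFactory, readSource() and reader() appear in the text; the rest is hypothetical.
class ReadProxySketch {

    // Identifies suitable T-Readers, here by the name of the source they are bound to.
    static final class SourceQuery {
        final String sourceName;

        SourceQuery(String sourceName) {
            this.sourceName = sourceName;
        }
    }

    static final class QueryBuilder {
        private String name;

        QueryBuilder withName(String name) {
            this.name = name;
            return this;
        }

        SourceQuery build() {
            if (name == null) throw new IllegalStateException("no source name given");
            return new SourceQuery(name);
        }
    }

    // Local proxy of a T-Reader; a real proxy would resolve the query lazily and call the service.
    static final class ReaderProxy {
        final SourceQuery query;

        ReaderProxy(SourceQuery query) {
            this.query = query;
        }
    }

    static final class ProxyBuilder {
        private SourceQuery query;

        ProxyBuilder matching(SourceQuery query) {
            this.query = query;
            return this;
        }

        ReaderProxy build() {
            if (query == null) throw new IllegalStateException("no query given");
            return new ReaderProxy(query);
        }
    }

    // One-stop shop for builders, as in the description above.
    static final class TServiceFactory {
        static QueryBuilder readSource() {
            return new QueryBuilder();
        }

        static ProxyBuilder reader() {
            return new ProxyBuilder();
        }
    }

    public static void main(String[] args) {
        // First build the query, then a proxy configured with it.
        SourceQuery query = TServiceFactory.readSource().withName("some-source").build();
        ReaderProxy proxy = TServiceFactory.reader().matching(query).build();
        System.out.println("proxy targets source: " + proxy.query.sourceName);
    }
}
```

The two-step shape keeps the description of suitable endpoints separate from the proxy that uses it, so the same query could configure more than one proxy.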

We assume here we know the name of the source. We could equally work with source identifiers:


Read a Tree

Read Many Trees

Reading Tree Nodes

Writing To Sources

Adding Trees

Changing Trees