The Tree Manager Library
The Tree Manager Library is a client library for the Tree Manager services. It can be used to read from and write into data sources that are accessible via those services.
The library is available in our Maven repositories with the following coordinates:
<groupId>org.gcube.data.access</groupId>
<artifactId>tree-manager-library</artifactId>
<version>...</version>
Preliminary Concepts
As clients of the tree-manager-library we need to understand:
- what the Tree Manager services are, and what they can do for us;
- where they can be found and how to find them;
- which tools we use alongside the tree-manager-library to work with them.
We build this understanding in the rest of this Section, so that we can conceptualise our work ahead of implementation. We then illustrate how the tree-manager-library helps us with the implementation.
The Services
The Tree Manager services let us access heterogeneous data sources in a uniform manner. They present us with a unifying view of the data as edge-labelled trees, and translate that view to the native data model and access APIs of individual sources. These translations are implemented as service plugins, which are defined independently of the services, for specific data sources or for whole classes of similar sources. If there is a plugin for a given source, then that source is "pluggable" in the Tree Manager services.
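To fix intuitions about the data model, the following is a purely illustrative sketch of an edge-labelled tree in plain Java. It is not the object model of the trees library discussed later, only a picture of the shape of the data: each node may carry a value and has zero or more outgoing edges, each labelled with a name and leading to a child node.

  // Purely illustrative sketch, NOT the trees library's API.
  import java.util.List;
  import java.util.Map;

  public class EdgeLabelledTreeSketch {

    // a node holds an optional value and zero or more labelled edges to children
    record Node(String value, Map<String, List<Node>> edges) {}

    public static void main(String[] args) {

      // a tiny tree describing, say, a bibliographic record
      Node title  = new Node("On the Origin of Species", Map.of());
      Node author = new Node("Charles Darwin", Map.of());
      Node record = new Node(null, Map.of(
          "title",  List.of(title),
          "author", List.of(author)));

      // inspecting the tree means following labelled edges
      System.out.println(record.edges().get("title").get(0).value());
    }
  }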
In more detail:
- the T-Reader and T-Writer services give us read and write APIs over pluggable sources. We may be able to access any such source through either or both, depending on the source itself, the capabilities of the associated plugin, or the intended access policy.
- the T-Binder service lets us connect T-Reader and/or T-Writer services to pluggable sources. The service thus stages the others, creating read and/or write views over the sources.
We can then interact with the Tree Manager services in one of two roles. When we act as staging clients, we invoke the T-Binder service to create read and write views over pluggable sources, ahead of access. When we act as access clients, we invoke the T-Reader and the T-Writer services to consume such views, i.e. to pull data from them and push data towards them. Of course, we can play both roles within the same piece of code.
In summary, services, plugins, clients, and sources come together as follows:
Suitable Endpoints
Abstractions aside, it's worth remembering that we do not interact with the services themselves, which are unreachable software abstractions. Rather, we interact with addressable service endpoints, which come to life when we deploy and start the services. Deployment is largely back-end business, but as clients we should be aware of the following facts:
- T-Binder, T-Reader, and T-Writer are always deployed together, on one or more nodes.
- plugins are always deployed "near" the service endpoints, i.e. on the same nodes.
- plugins are deployed independently of each other. There is no requirement that a plugin be deployed wherever the services are.
This makes for a wide range of deployment scenarios, where different scenarios address different requirements of access policy or quality of service. Consider for example the following scenario:
Here the services are deployed on two nodes, N1 and N2. There are different plugins on the two nodes, but one plugin P1 is available on both. Using this plugin, the T-Binder at N1 has created read and write views over a source S1. Similarly, the T-Binder at N2 has created a read view over the same source. We may now read data from S1 using the T-Reader at either node. This redundancy helps avoid bottlenecks under load and service outages after partial failures.
Note however that we may only write data into S1 through the T-Writer at N1. Perhaps scalability and outages are less of a concern for write operations. At least they aren't yet; the T-Writer at N2 may be bound to S1 later, when and if those concerns arise.
We may also read from a different source S2, through the T-Reader at N1 and via a plugin P2 which is not available on N2, at least not yet. We may not write into S2, however, as there is no T-Writer around which lets us do that. Perhaps the source is truly read-only for remote clients. Perhaps it is the intended policy that we may not change the source through the Tree Manager services.
This scenario makes it clear that, at any given time:
- not all endpoints are equally suitable to all clients: if we want to write into S1 we have no business with what's on N2; conversely, if we want to read from S2 we can ignore N2.
- not all endpoints are uniquely suitable to their clients: if we want to read from S1, either of the two nodes will do, and if we cannot work with one we can always try the other.
- the suitability of any endpoint may vary over time: a node that holds no interest for us today may well hold some tomorrow. P2 may be deployed on N2, a T-Writer for S2 may be staged on it, and so on.
So, service deployment may well be back-end business, but as clients we simply cannot ignore that the world out there is varied and in movement. Our first task is thus to locate endpoints that are suitable, such as T-Binders with the right plugin or T-Readers and T-Writers bound to the right sources.
We can find suitable endpoints by querying the Information Services. These services resolve queries against descriptions that the endpoints have previously published. T-Binders publish the list of plugins that are locally available on their nodes. T-Readers and T-Writers publish information about the data sources they are bound to.
The good news is that the tree-manager-library can do the heavy lifting for us. We do not need to know how to specify and submit queries; we only need to declare the properties of suitable endpoints. The library takes over from there, acting as a client of the Information Services on our behalf. Overall, the library exposes us to the realities of service deployment no more and no less than we really have to be.
Dependencies
We use the tree-manager-library to locate and invoke service endpoints, but we find the tools to work with trees in other libraries. We share these libraries with services, plugins, and in fact any other component which requires similar support. The tree-manager-library specifies them as dependencies, and Maven brings them automatically onto our classpath:
- trees: this library contains an object implementation of the tree model used by the services. We use it to:
- construct, change, and inspect edge-labelled trees;
- describe what trees or parts thereof we want to read from sources;
- describe what trees or parts thereof we want to change or delete in sources.
In some cases, we may also use additional facilities, e.g. obtain resolvable URIs for trees and tree nodes, or generate synthetic trees for testing purposes.
- streams: this library lets us work with data streams, which we encounter when we read or write many trees at once. We use it primarily for its facilities to produce and consume streams. In some cases, we may also use it to transform streams, handle their failures, publish them on the network, and more.
We need to be familiar with their documentation before we can work with the tree-manager-library.
Reading From Sources
To read data from a given source, we use a local proxy of a T-Reader bound to that source. We then invoke the methods of the proxy to pull data from the source.
Read Proxies
To obtain a proxy, we provide a query that identifies the target source, e.g. by name. In code:
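What follows is only a sketch of such code, not verbatim library code: the text below names TServiceFactory, readSource(), and reader(), while the package name and the remaining builder methods (withName(), matching(), build()) are assumptions made for illustration.

  // A sketch under assumptions: only TServiceFactory, readSource(), and reader()
  // are named by this documentation; the package and the other builder methods
  // (withName(), matching(), build()) are illustrative guesses.
  import static org.gcube.data.tml.proxies.TServiceFactory.*; // package name assumed

  public class ReadProxySketch {

    public static void main(String[] args) {

      // a query that characterises suitable T-Readers: those bound to
      // a source with the given name (withName()/build() assumed)
      var query = readSource().withName("somesource").build();

      // a proxy configured with the query (matching()/build() assumed)
      var proxy = reader().matching(query).build();
    }
  }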
In words:
- we create proxies and queries with static methods of the TServiceFactory class, a one-stop shop to obtain objects from the library. For added fluency, we first import the static methods of the factory.
- we invoke the method readSource() of the factory to get a query builder, and use the builder to characterise suitable T-Readers.
- we repeat the process to get a proxy: we invoke the reader() method of the factory to obtain a proxy builder, and use it to get a proxy configured with the query.
We assume here we know the name of the source. We could equally work with source identifiers:
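Continuing the sketch above under the same assumptions, a hypothetical withId() builder method would take the place of withName():

  // withId() is hypothetical: it stands for whatever the query builder
  // offers to match sources by identifier rather than by name
  var query = readSource().withId("source-id").build();
  var proxy = reader().matching(query).build();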