The Tree Manager Library
Revision as of 17:11, 15 December 2012
The Tree Manager Library is a client library for the Tree Manager services. It can be used to read from and write into data sources that are accessible via those services.
The library is available in our Maven repositories with the following coordinates:
<groupId>org.gcube.data.access</groupId>
<artifactId>tree-manager-library</artifactId>
<version>...</version>
Concepts
As clients of the tree-manager-library we need to understand:
- what the Tree Manager services are, and what they can do for us;
- what data they may take and return;
- how to stage them and where to find them;
- which tools to use to work with them, in addition to the tree-manager-library.
We build this understanding in the rest of this Section, so that we can conceptualise our work ahead of implementation. We then illustrate how the tree-manager-library helps us with the implementation.
About the Services
The Tree Manager services let us access heterogeneous data sources in a uniform manner. They present us with a unifying view of the data as edge-labelled trees, and translate that view to the native data model and access APIs of individual sources. These translations are implemented as service plugins, which are defined independently of the services, either for specific data sources or for whole classes of similar sources. If there is a plugin for a given source, then that source is "pluggable" in the Tree Manager services.
In more detail:
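To make the notion of an edge-labelled tree concrete, here is a minimal sketch in plain Java. The classes TreeNode and TreeEdge, the labels, and the sample values are all made up for illustration; they are not the object model the services actually use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node of an edge-labelled tree (not a library type).
final class TreeNode {
    final String value;                          // non-null only for leaves
    final List<TreeEdge> edges = new ArrayList<>();

    private TreeNode(String value) { this.value = value; }

    static TreeNode leaf(String value) { return new TreeNode(value); }

    static TreeNode inner(TreeEdge... edges) {
        TreeNode node = new TreeNode(null);
        for (TreeEdge edge : edges) node.edges.add(edge);
        return node;
    }

    // Collects the values of the leaves reached directly through edges with the given label.
    List<String> valuesOf(String label) {
        List<String> values = new ArrayList<>();
        for (TreeEdge edge : edges)
            if (edge.label.equals(label) && edge.target.value != null)
                values.add(edge.target.value);
        return values;
    }
}

// Hypothetical stand-in for a labelled edge: the label lives on the edge, not on the node.
final class TreeEdge {
    final String label;
    final TreeNode target;

    private TreeEdge(String label, TreeNode target) { this.label = label; this.target = target; }

    static TreeEdge e(String label, TreeNode target) { return new TreeEdge(label, target); }
}

class TreeSketch {
    // An invented record: two leaves share the label "tag", one edge leads to an inner node.
    static TreeNode sample() {
        return TreeNode.inner(
            TreeEdge.e("title", TreeNode.leaf("A sample title")),
            TreeEdge.e("author", TreeNode.inner(
                TreeEdge.e("name", TreeNode.leaf("A sample author")))),
            TreeEdge.e("tag", TreeNode.leaf("first")),
            TreeEdge.e("tag", TreeNode.leaf("second")));
    }

    public static void main(String[] args) {
        System.out.println(sample().valuesOf("tag"));
    }
}
```

Nothing in the sketch constrains which labels or values a tree may have; in the real system that "shape" is what a plugin contributes.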
- the T-Reader and T-Writer services give us read and write APIs over pluggable sources. We may be able to access any such source through either or both, depending on the source itself, the capabilities of the associated plugin, or the intended access policy.
- the T-Binder service lets us connect T-Reader and/or T-Writer services to pluggable sources. In this sense, the T-Binder stages the other two services, creating read and/or write views over the sources.
We can then interact with the Tree Manager services in either of two roles. When we act as staging clients, we invoke the T-Binder service to create read and write views over pluggable sources, ahead of access. When we act as access clients, we invoke the T-Reader and the T-Writer to consume such views, i.e. to pull data from them and push data towards them. Of course, we can play both roles within the same piece of code.
In summary, services, plugins, clients, and sources come together as follows:
Tree Types
At first glance, clients and plugins have little to share. The previous picture makes it clear: clients sit in front of the services, plugins live behind them. At a closer look, however, clients may have clear dependencies on plugins.
For access clients the dependency has to do with the "shape" of trees. T-Reader and T-Writer do not constrain what edges trees may have and what values may be in their leaves. In contrast, plugins typically translate the data model of data sources to a particular tree type. If we want to access those sources, we need to know those types. Often they are fully defined by the plugin, and in this case we will find them in the plugin's documentation. In other cases, plugins work with types defined by broader agreement. In these cases, we go wherever the agreement is documented to find the definition of the tree type.
For staging clients the dependency is even more explicit. When we ask the T-Binder to create read and write views over a source, we need to name the plugin that is going to handle that source. This plugin may also allow us to configure how it is going to handle the source. These staging directives vary from plugin to plugin, and may be arbitrarily complex. The T-Binder will take them and dutifully pass them on to the target plugin. We learn about these directives in the documentation of the plugin. In most cases, the plugin will give us facilities to formulate directives, such as bean classes and the means to serialise them on the wire. These facilities ship in a shared library that we then add to our dependencies. Again, the documentation of the plugin will provide the required coordinates.
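The hand-over of staging directives can be pictured with a small sketch. PluginSketch, BinderSketch, the plugin name "sample-plugin" and the directive string are all hypothetical: they only illustrate that the binder resolves a plugin by name and forwards the directives without interpreting them. Real directives are plugin-specific and are typically richer than a string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a plugin: it alone understands its staging directives.
interface PluginSketch {
    String bind(String directives);
}

// Hypothetical stand-in for a T-Binder endpoint and the plugins deployed next to it.
final class BinderSketch {
    private final Map<String, PluginSketch> plugins = new HashMap<>();

    void install(String name, PluginSketch plugin) {
        plugins.put(name, plugin);
    }

    // The binder looks the plugin up by name and passes the directives on untouched.
    String bind(String pluginName, String directives) {
        PluginSketch plugin = plugins.get(pluginName);
        if (plugin == null)
            throw new IllegalArgumentException("unknown plugin: " + pluginName);
        return plugin.bind(directives);
    }
}

class StagingSketch {
    static BinderSketch binder() {
        BinderSketch binder = new BinderSketch();
        // An invented plugin that simply echoes what it was asked to stage.
        binder.install("sample-plugin", directives -> "bound with [" + directives + "]");
        return binder;
    }

    public static void main(String[] args) {
        System.out.println(binder().bind("sample-plugin", "source=sample;mode=read"));
    }
}
```

Naming a plugin that is not installed fails at the binder, which is one reason why staging clients need to know where a given plugin is deployed.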
Do we always depend on plugins? Not if we do not stage the services and can process trees of arbitrary types; in a word, not if we are truly generic read clients. A good example here is a source explorer, which renders in some generic fashion the contents of any data source that can be accessed through the services. Another example is a generic indexer or transformer, which can be declaratively configured to work on any tree type. gCube in fact includes many clients that belong to this category. These clients use the Tree Manager services as the single data source they need to deal with.
In all the other cases, however, we need to know the plugins that are relevant to our task, as they define the data we will be accessing and/or how to make that data accessible in the first place. The following picture illustrates the point.
Suitable Endpoints
Abstractions aside, it's worth remembering that we do not interact with the services as such, which are unreachable software abstractions. Rather, we interact with addressable service endpoints, which come to life when we deploy and start the services. Deployment is largely back-end business, but as clients we should be aware of the following facts:
- T-Binder, T-Reader, and T-Writer are always deployed together, on one or more nodes.
- plugins are always deployed "near" the service endpoints, i.e. on the same nodes.
- plugins are deployed independently from each other. There is no requirement that a plugin be deployed wherever the services are.
This makes for a wide range of deployment scenarios, where different scenarios address different requirements of access policy or quality of service. Consider for example the following scenario:
Here the services are deployed on two nodes, N1 and N2. There are different plugins on the two nodes but one plugin P1 is available on both nodes. Using this plugin, the T-Binder at N1 has created read and write views over a source S1. Similarly, the T-Binder at N2 has created a read view over the same source. We may now read data from S1 using the T-Reader at either node. This redundancy helps avoid bottlenecks under load and service outages after partial failures.
Note however that we may only write data into S1 from the T-Writer at N1. Perhaps scalability and outages are less of a concern for write operations. At least they aren't yet; the T-Writer at N2 may be bound to S1 later, when and if those concerns arise.
We may also read from a different source S2, through the T-Reader at N2 and via a plugin P2 which is not available on N1, at least not yet. We may not write into S2, however, as there is no T-Writer around that lets us do that. Perhaps the source is truly read-only to remote clients. Perhaps it is the intended policy that we may not change the source through the Tree Manager services.
This scenario makes it clear that, at any given time:
- not all endpoints are equally suitable
  - if we want to write into S1 we have no business with what's on N2. Conversely, if we want to read from S2 we can ignore N1.
- not all endpoints are uniquely suitable
  - if we want to read from S1, either of the two nodes will do and if we cannot work with one we can always try the other.
- the suitability of endpoints may vary over time
  - a node that holds no interest for us today for a given task may do so tomorrow. P2 may be deployed on N1, T-Readers and T-Writers for S2 may be staged on it, a T-Writer for S1 may be staged on N2, and so on.
So, service deployment may well be back-end business, but as clients we simply cannot ignore that the world out there is varied and in motion. Our first task is thus to locate suitable endpoints right when we need them, i.e. T-Binders with the right plugin or T-Readers and T-Writers bound to the right sources.
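The scenario can be replayed with a small model. NodeSketch and DeploymentSketch are hypothetical types written for this illustration only, and the model records just the plugins P1 and P2 that the scenario names; it shows how "suitable endpoint" depends on what we want to do and on when we ask.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical description of what one node currently offers.
final class NodeSketch {
    final String name;
    final Set<String> plugins = new LinkedHashSet<>();     // plugins available to the T-Binder
    final Set<String> readViews = new LinkedHashSet<>();   // sources the T-Reader is bound to
    final Set<String> writeViews = new LinkedHashSet<>();  // sources the T-Writer is bound to

    NodeSketch(String name) { this.name = name; }
}

// Hypothetical snapshot of the two-node scenario described in the text.
final class DeploymentSketch {
    final NodeSketch n1 = new NodeSketch("N1");
    final NodeSketch n2 = new NodeSketch("N2");

    DeploymentSketch() {
        n1.plugins.add("P1");
        n1.readViews.add("S1");
        n1.writeViews.add("S1");
        n2.plugins.addAll(Arrays.asList("P1", "P2"));
        n2.readViews.addAll(Arrays.asList("S1", "S2"));
    }

    private List<NodeSketch> nodes() { return Arrays.asList(n1, n2); }

    // Nodes whose T-Reader can serve the source.
    List<String> readersOf(String source) {
        List<String> names = new ArrayList<>();
        for (NodeSketch node : nodes()) if (node.readViews.contains(source)) names.add(node.name);
        return names;
    }

    // Nodes whose T-Writer can serve the source.
    List<String> writersOf(String source) {
        List<String> names = new ArrayList<>();
        for (NodeSketch node : nodes()) if (node.writeViews.contains(source)) names.add(node.name);
        return names;
    }

    // Nodes whose T-Binder can stage sources with the plugin.
    List<String> bindersWith(String plugin) {
        List<String> names = new ArrayList<>();
        for (NodeSketch node : nodes()) if (node.plugins.contains(plugin)) names.add(node.name);
        return names;
    }

    public static void main(String[] args) {
        DeploymentSketch world = new DeploymentSketch();
        System.out.println("read S1:  " + world.readersOf("S1"));
        System.out.println("write S1: " + world.writersOf("S1"));
        System.out.println("read S2:  " + world.readersOf("S2"));
        System.out.println("write S2: " + world.writersOf("S2"));

        // Suitability changes over time: a T-Writer for S1 is later staged on N2.
        world.n2.writeViews.add("S1");
        System.out.println("write S1, later: " + world.writersOf("S1"));
    }
}
```

A client that cached the answer to "who can write into S1" before the last step would miss N2, which is why we look endpoints up right when we need them.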
We can find suitable endpoints by querying the Information Services. These services resolve queries against descriptions that the endpoints have previously published. T-Binders publish the list of plugins that are locally available on their nodes. T-Readers and T-Writers publish information about the data sources they are bound to.
The good news is that the tree-manager-library can do the heavy lifting for us. We do not need to know how to specify and submit actual queries, only declare the properties of suitable endpoints. The library takes over from there, acting as a client of the Information Services on our behalf.
Additional Dependencies
We use the tree-manager-library to locate and invoke service endpoints, but we find in other libraries the tools to work with trees. We share these libraries with services, plugins, and in fact any other component which requires similar support. The tree-manager-library specifies them as dependencies and Maven brings them automatically onto our classpath:
- trees: this library contains an object implementation of the tree model used by the services. We use it to:
  - construct, change, and inspect edge-labelled trees;
  - describe what trees or parts thereof we want to read from sources;
  - describe what trees or parts thereof we want to change in or delete from sources.
  In some cases, we may also use additional facilities, e.g. to obtain resolvable URIs for trees and tree nodes, or to generate synthetic trees for testing purposes.
- streams: this library lets us work with data streams, which we encounter when we read or write many trees at once. We use it primarily for its facilities to produce and consume streams. In some cases, we may also use it to transform streams, handle their failures, publish them on the network, and more.
We need to be familiar with their documentation before we can work with the tree-manager-library.
Reading From Sources
To read data from a given source, we use a local proxy of a T-Reader bound to that source. We then invoke the methods of the proxy to pull data from the source.
Read Proxies
To obtain a proxy, we provide a query that identifies the target source, e.g. by name. In code:
In words:
- we create proxies and queries with static methods of the TServiceFactory class, a one-stop shop to obtain objects from the library. For added fluency, we first import the static methods of the factory.
- we invoke the method readSource() of the factory to get a query builder, and use the builder to characterise suitable T-Readers.
- we repeat the process to get a proxy. We invoke the reader() method to obtain a proxy builder, and use it to get a proxy configured with the query.
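The steps above can be sketched with a self-contained mock of the fluent style they describe. Only the method names readSource() and reader() come from the text. Everything else is an assumption made for illustration: the class names FactorySketch, SourceQuery and ReaderProxy, the builder methods withName(), matching() and build(), and the source name "sample-source". The mock does not query any service, so it should not be read as the library's actual API.

```java
// Hypothetical query: it only records the name of the target source.
final class SourceQuery {
    final String sourceName;
    SourceQuery(String sourceName) { this.sourceName = sourceName; }
}

// Hypothetical query builder, as returned by readSource() in this mock.
final class SourceQueryBuilder {
    private String name;

    SourceQueryBuilder withName(String name) { this.name = name; return this; }

    SourceQuery build() {
        if (name == null) throw new IllegalStateException("no source name given");
        return new SourceQuery(name);
    }
}

// Hypothetical read proxy: a real one would resolve the query to a T-Reader endpoint.
final class ReaderProxy {
    final SourceQuery query;
    ReaderProxy(SourceQuery query) { this.query = query; }

    String describe() { return "T-Reader proxy for source '" + query.sourceName + "'"; }
}

// Hypothetical proxy builder, as returned by reader() in this mock.
final class ReaderProxyBuilder {
    private SourceQuery query;

    ReaderProxyBuilder matching(SourceQuery query) { this.query = query; return this; }

    ReaderProxy build() {
        if (query == null) throw new IllegalStateException("no query given");
        return new ReaderProxy(query);
    }
}

// Mock of the one-stop factory; in client code its methods would be statically imported.
final class FactorySketch {
    private FactorySketch() {}

    static SourceQueryBuilder readSource() { return new SourceQueryBuilder(); }

    static ReaderProxyBuilder reader() { return new ReaderProxyBuilder(); }
}

class ReadProxySketch {
    public static void main(String[] args) {
        // Step 1: characterise suitable T-Readers by the name of the source they are bound to.
        SourceQuery query = FactorySketch.readSource().withName("sample-source").build();

        // Step 2: obtain a proxy configured with that query.
        ReaderProxy reader = FactorySketch.reader().matching(query).build();

        System.out.println(reader.describe());
    }
}
```

The design point the mock preserves is the split into two builders: the query says which endpoints are suitable, while the proxy is the object we later invoke to pull data.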
We assume here we know the name of the source. We could equally work with source identifiers: