Tree-Based Access

Overview

Discovery, indexing, transformation, transfer, and presentation are key examples of data management subsystems that abstract, in principle, over the domain-specific semantics of the data. Other, equally generic system functions build in turn on those subsystems, most notably search over indexing and process execution over data transfer. It is precisely in this generality that the main value proposition of the system lies, as an enabler of data e-Infrastructures.

Directly or indirectly, all the processes mentioned above require access to the data. As in small-scale systems, good design requires that they do so against a uniform interface that aligns with their generality and insulates them from the variety of network locations, data models, and access APIs that characterise individual data sources.

Providing this interface is essentially an interoperability requirement. For consistency and uniform growth, the requirement is addressed once, in a dedicated place of the system’s architecture, on behalf of an open-ended number of subsystems rather than separately within each of them. This place is the data access and storage subsystem, and specifically the cluster of components that base storage and access on a uniform data model and API.

Key features

Tree-based access and storage components provide the following key features:

uniform model and access API over structured data: The model is based on edge-labelled and node-attributed trees, and the API is based on a suite of CRUD operations exposed by the Tree Manager service.

fine-grained access to structured data: The read operations can filter and prune trees based on a sophisticated pattern language, and they can resolve whole trees or arbitrary nodes from a URI scheme derived from local node identifiers. The write operations can perform updates in place, applying the tree model to the changes themselves (delta trees). Both read and write operations can work on individual trees as well as on arbitrarily large tree streams.

dynamically pluggable architecture of model and API transformations: The Tree Manager implements the interface against an open-ended number of data sources, from local sources to remote sources, including those that are managed outside its boundaries. The implementation relies on two-way transformations between the tree model and API of the service and those of individual sources. Transformations are implemented in plugins, i.e. libraries developed independently of the service that extend its capabilities at runtime (hot deployment). Plugins may implement the API partially (e.g. for read-only access) and employ best-effort strategies in adapting individual operations to ad-hoc data sources or to an open-ended class of data sources that align with model and API standards.

scalable access to remote data sources: The Tree Manager may be replicated within the system, and its replicas can autonomically balance their state by publishing records of their local activity (activation records) in the Directory Services, and by subscribing to those services for notifications of such publications. The Directory Services then act as infrastructure-wide load balancers for the replicas, pointing clients to the least-loaded replicas first.

efficient and scalable storage of structured data: A distinguished plugin of the Tree Manager, the Tree Repository, offers tree storage at the service endpoints, using graph database technology (Neo4j) to avoid model impedance mismatches and to offer full coverage and efficient implementations of the service API.

flexible viewing mechanisms over structured data: The View Manager service uses the tree pattern language to maintain “passive views” of data sources that are accessible through the Tree Manager service. Views can be published and maintained under generic regimes, but the service can be extended with plugins for custom management of specific classes of views.

rich tooling for client and plugin development: A rich set of libraries supports the development of Java clients and plugins, offering embedded DSLs for the manipulation of trees, patterns, and streams. The libraries also simplify access to remote Tree Manager and View Manager endpoints, through high-level facades for their remote APIs.
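
To make the first two features more concrete, the following is a minimal Java sketch of an edge-labelled, node-attributed tree and of pattern-based pruning on read. It is an illustration under stated assumptions, not the Tree Manager API: every name in it (TreeSketch, Node, Edge, Pattern, prune, render, sample) is hypothetical, and the toy pattern only lists the edge labels to keep, which is far simpler than the actual tree pattern language.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the gCube libraries.
final class TreeSketch {

    /** A node carries attributes; each outgoing edge carries a label. */
    static final class Node {
        final String value; // leaf value, or null for inner nodes
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Edge> edges = new ArrayList<>();

        Node(String value) { this.value = value; }

        Node attr(String name, String val) { attributes.put(name, val); return this; }

        Node add(String label, Node child) { edges.add(new Edge(label, child)); return this; }
    }

    static final class Edge {
        final String label;
        final Node target;

        Edge(String label, Node target) { this.label = label; this.target = target; }
    }

    /** Toy pattern: the edge labels to keep, each with a pattern for the subtree below it. */
    static final class Pattern {
        final Map<String, Pattern> keep = new LinkedHashMap<>();

        Pattern with(String label, Pattern sub) { keep.put(label, sub); return this; }

        // An empty sub-pattern keeps the target node itself but none of its children.
        Pattern with(String label) { return with(label, new Pattern()); }
    }

    /** Returns a pruned copy of the tree; the input tree is left untouched. */
    static Node prune(Node node, Pattern pattern) {
        Node copy = new Node(node.value);
        copy.attributes.putAll(node.attributes);
        for (Edge e : node.edges) {
            Pattern sub = pattern.keep.get(e.label);
            if (sub != null) {
                copy.add(e.label, prune(e.target, sub));
            }
        }
        return copy;
    }

    /** Renders edge labels and leaf values only, for display purposes. */
    static String render(Node n) {
        if (n.edges.isEmpty()) return String.valueOf(n.value);
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < n.edges.size(); i++) {
            Edge e = n.edges.get(i);
            if (i > 0) sb.append(", ");
            sb.append(e.label).append('=').append(render(e.target));
        }
        return sb.append('}').toString();
    }

    /** An invented sample document with three labelled edges under the root. */
    static Node sample() {
        return new Node(null).attr("id", "doc-1")
            .add("title", new Node("Tree-Based Access"))
            .add("author", new Node(null)
                .add("name", new Node("A. Writer"))
                .add("email", new Node("writer@example.org")))
            .add("body", new Node("..."));
    }

    public static void main(String[] args) {
        Pattern p = new Pattern().with("title").with("author", new Pattern().with("name"));
        // prints {title=Tree-Based Access, author={name=A. Writer}}
        System.out.println(render(prune(sample(), p)));
    }
}
```

In this sketch the pruned result is itself a tree, so a client would receive only the branches it asked for. The same idea of describing data with the tree model underlies the delta trees mentioned above, although updates are not modelled here.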

Design

Philosophy

This is the rationale behind the design. An example will be provided.

Architecture

The main software components forming the subsystem should be identified and roughly described. An architecture diagram has to be added here. A template for the representation of the architecture diagram will be proposed, together with the open-source tool required to produce it.

Deployment

Usually, a subsystem consists of a number of components. This section describes the setting governing component deployment, e.g. the hardware components where software components are expected to be deployed. In particular, two deployment scenarios should be discussed where appropriate, i.e. Large deployment and Small deployment. If this distinction is not appropriate, a single deployment diagram has to be produced.

Large deployment

A deployment diagram suggesting the deployment schema that maximizes scalability should be described here.

Small deployment

A deployment diagram suggesting the "minimal" deployment schema should be described here.

Use Cases

The subsystem has been conceived to support a number of use cases, and it will be used to serve a number of scenarios. This area will collect these "success stories".

Well suited Use Cases

Describe here scenarios where the subsystem proves to outperform other approaches.

Less well suited Use Cases

Describe here scenarios where the subsystem only partially satisfies expectations.
