Tree-Based Access
A cluster of components within the system focuses on uniform access and storage of structured data of arbitrary semantics, origin, and size. These components form a distinguished subset of the [[Data_Access_and_Storage_Facilities|subsystem]] dedicated to data access and storage.
This document outlines their design rationale, key features, and high-level architecture as well as the options for their deployment.
Overview
Discovery, indexing, transformation, transfer, and presentation are key examples of data management subsystems that abstract, in principle, over the domain-specific semantics of the data. Other, equally generic system functions build in turn on those subsystems, most notably search over indexing and process execution over data transfer. The main value proposition of the system as an enabler of data e-Infrastructures lies precisely in this generality.
Directly or indirectly, all the processes mentioned above require access to the data. As in small-scale systems, good design requires that they do so against a uniform interface that aligns with their generality and insulates them from the variety of network locations, data models, and access APIs that characterise individual data sources.
Providing this interface is essentially an interoperability requirement. For consistency and uniform growth, the requirement is addressed in a dedicated place of the system’s architecture, i.e. for an open-ended number of subsystems rather than within individual subsystems. This place is the data access and storage subsystem, and specifically the cluster of components that base storage and access on a uniform data model and API.
Key features
Tree-based access and storage components provide the following key features:
- uniform model and access API over structured data
- The model is based on edge-labelled and node-attributed trees, and the API is based on a suite of CRUD operations exposed by the Tree Manager service.
- fine-grained access to structured data
- The read operations can filter and prune trees based on a sophisticated pattern language, and they can resolve whole trees or arbitrary nodes from a URI scheme derived from local node identifiers. The write operations can perform updates in place, applying the tree model to the changes themselves (delta trees). Both read and write operations can work on individual trees as well as on arbitrarily large tree streams.
- dynamically pluggable architecture of model and API transformations
- The Tree Manager implements the interface against an open-ended number of data sources, from local sources to remote sources, including those that are managed outside its boundaries. The implementation relies on two-way transformations between the tree model and API of the service and those of individual sources. Transformations are implemented in plugins, libraries developed independently of the service so as to extend its capabilities at runtime (hot deployment). Plugins may implement the API partially (e.g. for read-only access) and employ best-effort strategies in adapting individual operations to ad-hoc data sources or to an open-ended class of data sources that align with model and API standards.
- scalable access to remote data sources
- The Tree Manager may be replicated within the system, and its replicas can autonomically balance their state by publishing records of their local activity (activation records) in the Directory Services, and by subscribing with those services for notifications of such publications. The Directory Services then act as infrastructure-wide load balancers for the replicas, pointing clients to least-loaded replicas first.
- efficient and scalable storage of structured data
- A distinguished plugin of the Tree Manager, the Tree Repository, offers tree storage at the service endpoints, using graph database technology (Neo4j) to avoid model impedance mismatches and to offer full coverage and efficient implementations of the service API.
- flexible viewing mechanisms over structured data
- The View Manager service uses the tree pattern language to maintain “passive views” of data sources that are accessible through the Tree Manager service. Views can be published and maintained under generic regimes, but the service can be extended with plugins for custom management of specific classes of views.
- rich tooling for client and plugin development
- A rich set of libraries supports the development of Java clients and plugins, offering embedded DSLs for the manipulation of trees, patterns, and streams. The libraries also simplify access to remote Tree Manager and View Manager endpoints, through high-level facades for their remote APIs.
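To make the model and the pruning behaviour of read operations more concrete, the following Java sketch builds an edge-labelled, node-attributed tree and prunes it against a toy pattern. It is only an illustration under stated assumptions: `TreeSketch`, `Node`, `Edge`, `Pattern`, and `prune` are hypothetical names invented here, and the toy pattern (keep a whole subtree, or keep only the listed edge labels) is far simpler than the actual tree pattern language and DSLs offered by the libraries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: these types are hypothetical, not the API of the trees library. */
final class TreeSketch {

    /** A labelled edge from a parent node to a child node. */
    static final class Edge {
        final String label;
        final Node target;
        Edge(String label, Node target) { this.label = label; this.target = target; }
    }

    /** A node with attributes, an optional leaf value, and labelled edges to its children. */
    static final class Node {
        final Map<String, String> attributes = new HashMap<>();
        final List<Edge> edges = new ArrayList<>();
        final String value; // null for inner nodes

        Node(String value) { this.value = value; }
        Node attr(String name, String v) { attributes.put(name, v); return this; }
        Node add(String label, Node child) { edges.add(new Edge(label, child)); return this; }
    }

    /** A toy pattern: either keep a whole subtree, or keep only the listed edge labels. */
    static final class Pattern {
        static final Pattern ANY = new Pattern(true);
        final boolean keepAll;
        final Map<String, Pattern> keep = new HashMap<>();

        private Pattern(boolean keepAll) { this.keepAll = keepAll; }
        static Pattern only() { return new Pattern(false); }
        Pattern edge(String label, Pattern sub) { keep.put(label, sub); return this; }
    }

    /** Returns a pruned copy of the tree; edges not covered by the pattern are dropped. */
    static Node prune(Node node, Pattern pattern) {
        Node copy = new Node(node.value);
        copy.attributes.putAll(node.attributes);
        for (Edge e : node.edges) {
            Pattern sub = pattern.keepAll ? Pattern.ANY : pattern.keep.get(e.label);
            if (sub != null) {
                copy.add(e.label, prune(e.target, sub));
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Node doc = new Node(null).attr("id", "doc-1")
            .add("title", new Node("A sample document"))
            .add("author", new Node(null)
                .add("name", new Node("Jane"))
                .add("email", new Node("jane@example.org")))
            .add("body", new Node("..."));

        // Keep the title, and only the name below the author; drop everything else.
        Pattern p = Pattern.only()
            .edge("title", Pattern.ANY)
            .edge("author", Pattern.only().edge("name", Pattern.ANY));

        Node pruned = prune(doc, p);
        System.out.println(pruned.edges.size() + " edges retained under the root");
    }
}
```

In this sketch pruning returns a copy and leaves the source tree untouched, which keeps the example easy to reason about. Whether the real read operations copy or stream their results is not stated above.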
Design
Philosophy
This is the rationale behind the design. An example will be provided.
Architecture
Tree-based access and storage are collectively provided by the following components:
- trees: a library that contains the implementation of the tree model and associated DSL, the tree pattern language and associated DSL, the URI protocol scheme, and tree bindings to and from XML and XML-related technologies;
- streams: a library that contains the implementation of a DSL for stream conversion, filtering, and publication;
- tree-manager: a suite of stateful Web Services that expose a tree-oriented API of CRUD operations and implement it by delegation to dynamically deployable plugins for target data sources within and outside the system;
- tree-manager-framework: a framework of local classes and interfaces for third-party plugin development;
- tree-manager-library: a client library that implements a high-level facade to the remote API of the Tree Manager service;
- tree-repository: a plugin of the Tree Manager service for local tree storage in graph databases (Neo4j);
- view-manager: a suite of stateful Web Services that use tree patterns to define and maintain passive views over data sources that can be accessed through the Tree Manager service;
- view-manager-library: a client library that implements a high-level facade to the remote API of the View Manager service.
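The delegation from the service to source-specific plugins, including plugins that implement the API only partially, can be sketched in Java as follows. This is a minimal illustration and not the tree-manager-framework API: `DelegationSketch`, `SourcePlugin`, `Service`, and `InMemoryPlugin` are hypothetical names, trees are reduced to plain strings for brevity, and read-only plugins are assumed to reject writes with an `UnsupportedOperationException`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only: hypothetical types, not the tree-manager-framework API. */
final class DelegationSketch {

    /** Hypothetical plugin contract: reading is mandatory, writing is optional. */
    interface SourcePlugin {
        Optional<String> read(String id);

        // Assumption: plugins that do not override this are read-only.
        default void write(String id, String tree) {
            throw new UnsupportedOperationException("read-only source");
        }
    }

    /** Hypothetical service that binds sources to plugins at runtime and delegates to them. */
    static final class Service {
        private final Map<String, SourcePlugin> plugins = new ConcurrentHashMap<>();

        void register(String source, SourcePlugin plugin) { plugins.put(source, plugin); }

        Optional<String> read(String source, String id) { return pluginFor(source).read(id); }

        void write(String source, String id, String tree) { pluginFor(source).write(id, tree); }

        private SourcePlugin pluginFor(String source) {
            SourcePlugin plugin = plugins.get(source);
            if (plugin == null) {
                throw new IllegalArgumentException("no plugin bound to source: " + source);
            }
            return plugin;
        }
    }

    /** A plugin that supports both read and write against an in-memory store. */
    static final class InMemoryPlugin implements SourcePlugin {
        private final Map<String, String> store = new ConcurrentHashMap<>();

        @Override public Optional<String> read(String id) { return Optional.ofNullable(store.get(id)); }
        @Override public void write(String id, String tree) { store.put(id, tree); }
    }

    public static void main(String[] args) {
        Service service = new Service();
        service.register("local", new InMemoryPlugin());
        // A read-only plugin needs to supply only the read operation.
        service.register("remote", id -> Optional.of("tree " + id + " from a remote source"));

        service.write("local", "t1", "a stored tree");
        System.out.println(service.read("local", "t1").orElse("not found"));
        System.out.println(service.read("remote", "t2").orElse("not found"));

        try {
            service.write("remote", "t2", "an update");
        } catch (UnsupportedOperationException e) {
            System.out.println("write rejected: " + e.getMessage());
        }
    }
}
```

Keeping write as a default method lets a read-only plugin be written as a single lambda, while clients still go through one uniform entry point regardless of the source behind it.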
Deployment
Usually, a subsystem consists of a number of components. This section describes the setting that governs component deployment, e.g. the hardware components where software components are expected to be deployed. In particular, two deployment scenarios should be discussed, i.e. Large deployment and Small deployment, if appropriate. If it is not appropriate, a single deployment diagram has to be produced.
Large deployment
A deployment diagram suggesting the deployment schema that maximizes scalability should be described here.
Small deployment
A deployment diagram suggesting the "minimal" deployment schema should be described here.
Use Cases
The subsystem has been conceived to support a number of use cases and will be used to serve a number of scenarios. This area will collect these "success stories".
Well suited Use Cases
Describe here scenarios where the subsystem proves to outperform other approaches.
Less well suited Use Cases
Describe here scenarios where the subsystem only partially satisfies expectations.