Data Transfer Facilities
Implementing reliable data transfer mechanisms between the nodes of a gCube-based Hybrid Data Infrastructure is one of the main objectives when dealing with large sets of multi-type datasets distributed across different repositories.
To promote an efficient and optimized consumption of these data resources, a number of components have been designed to meet the data transfer requirements.
This document outlines the design rationale and high-level architecture of such components.
The components of the subsystem provide the following key features:
- Point-to-point transfer
  - one writer, one reader as core functionality
- Produce only what is requested
  - a producer-consumer model that blocks when needed and reduces unnecessary data transfers
- Intuitive stream- and iterator-based interface
  - simplified usage with reasonable default behavior for common use cases and a variety of features for increased usability and flexibility
- Multiple protocol support
  - data transfer currently supports the following protocols: TCP and HTTP
- HTTP Broker Servlet
  - transfer results are exposed as an HTTP endpoint
- Reliable data transfer between Infrastructure Data Sources and Data Storages
  - by exploiting the uniform access interfaces provided by gCube
- Structured and unstructured data transfer
  - both tree-based and file-based transfer to cover all possible use cases
- Transfers to local nodes for data staging
  - data staging for particular use cases can be enabled on each node of the infrastructure
- Advanced transfer scheduling and transfer optimization
  - a dedicated gCube service responsible for data transfer scheduling and transfer optimization
- Transfer statistics availability
  - transfers are logged by the system and made available to interested consumers
The subsystem comprises the following families of components:
- the Result Set components
  - this family of components provides a common data transfer mechanism that aims to establish high-throughput, point-to-point, on-demand communication. It has been designed as a core functionality of the overall system and can also be considered the building block for the Data Transfer Scheduler & Agent components.
- the Data Transfer Scheduler & Agent components
  - this family of components gives VO/VRE Administrators the ability to transfer data among Data Sources and Data Storages. It can also be exploited by any client or web service to implement data movement between infrastructure nodes.
- the Data Transfer 2
  - this family of components ...
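The "produce only what is requested" behavior and the iterator-based interface listed among the key features can be illustrated with a small sketch. This is not the gCube Result Set API: the names `BoundedStream`, `transfer`, and the `capacity` parameter are hypothetical, and the in-process queue merely stands in for a TCP or HTTP channel. It only shows the general idea of a one-writer, one-reader stream whose writer blocks once a bounded buffer is full.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream (illustrative choice)


class BoundedStream:
    """Hypothetical one-writer, one-reader stream backed by a bounded buffer."""

    def __init__(self, capacity=4):
        self._buffer = queue.Queue(maxsize=capacity)

    def put(self, record):
        # Blocks while the buffer is full, so the writer cannot run
        # far ahead of what the reader has actually requested.
        self._buffer.put(record)

    def close(self):
        # Signals the reader that no more records will arrive.
        self._buffer.put(_END)

    def __iter__(self):
        # Reader side: a plain iterator that ends when the writer closes.
        while True:
            record = self._buffer.get()
            if record is _END:
                return
            yield record


def transfer(records, capacity=4):
    """Start a writer thread; return the stream, a log of written records, and the thread."""
    stream = BoundedStream(capacity)
    produced = []

    def writer():
        for record in records:
            stream.put(record)
            produced.append(record)
        stream.close()

    thread = threading.Thread(target=writer, daemon=True)
    thread.start()
    return stream, produced, thread


if __name__ == "__main__":
    stream, produced, thread = transfer(range(10), capacity=2)
    for record in stream:  # the reader consumes records through ordinary iteration
        print(record)
    thread.join()
```

In this sketch the writer stalls after `capacity` records until the reader starts iterating, which is the property that avoids producing and moving data nobody asked for.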