Data Transfer Agent
Data Transfer Agent Service
The Data Transfer Agent Service facilitates the transfer of data (both structured and unstructured) for the following use cases:
- Transfer of local files from an external or internal client to a remote GHN.
- File transfer from a remote Data Source to a remote GHN over standard protocols (HTTP, FTP, BitTorrent), by integrating the URLResolution Library.
- File transfer from a remote Data Source to the gCube Storage Manager (MongoDB) over standard protocols (HTTP, FTP, BitTorrent), by integrating the Storage Manager Library and the URLResolution Library.
- Tree-based data transfer from a remote Data Source to a remote Tree-Based Storage.
The service uses gRS to implement Local Transfer, Tree-based Data Transfer, and the delivery of Transfer Outcomes to clients.
In addition, the service uses the Messaging infrastructure to publish transfer statistics, which can be consumed by:
- Accounting statistics consumers
- the Data Transfer Scheduler Service, which uses messaging to consume Agent transfer results.
Architecture
The Data Transfer Agent Service is a stateful gCore service (singleton) that implements asynchronous transfer operations between the nodes of a gCube infrastructure. All server-side operations are asynchronous; the Data Transfer Agent Library (CL) builds synchronous operations on top of them as well.
For asynchronous operations (both server and client side), the service uses a local DB to store the transfer details and provide them to clients.
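The relationship between the asynchronous server-side operations and a synchronous client call can be illustrated with a minimal, self-contained sketch. It is not the service's implementation: the class AsyncTransferSketch, its methods, and the status names QUEUED, ONGOING, DONE and UNKNOWN are all hypothetical, and an in-memory map stands in for the agent's local DB.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: every name here is hypothetical, not part of the gCube API.
public class AsyncTransferSketch {
    // Stand-in for the agent's local DB: transfer id -> last known status.
    private final Map<String, String> statusById = new ConcurrentHashMap<>();
    // Handles to the transfers started so far.
    private final Map<String, CompletableFuture<String>> running = new ConcurrentHashMap<>();

    // Asynchronous operation: returns a transfer id immediately.
    public String startTransfer(String sourceUri) {
        String id = UUID.randomUUID().toString();
        statusById.put(id, "QUEUED");
        running.put(id, CompletableFuture.supplyAsync(() -> {
            statusById.put(id, "ONGOING");
            // A real agent would copy the data from sourceUri here.
            statusById.put(id, "DONE");
            return "DONE";
        }));
        return id;
    }

    // Clients poll the stored details using the transfer id.
    public String status(String id) {
        return statusById.getOrDefault(id, "UNKNOWN");
    }

    // Synchronous operation built on top of the asynchronous one:
    // start the transfer, then block until it completes.
    public String transferSync(String sourceUri) {
        String id = startTransfer(sourceUri);
        return running.get(id).join();
    }
}
```

In this sketch the synchronous call is only a thin wrapper: it starts the asynchronous transfer and waits for its completion, which mirrors how a client library can offer blocking calls without the server exposing any.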
Installation and Configuration
The Data Transfer Agent Service can be deployed dynamically on the infrastructure using the gCube Enabling Layers. Alternatively, it can be deployed manually on a GHN by downloading the full-gar artifact from here, which also contains the required dependencies. In this case the installation is triggered by:
gcore-deploy-service agent-service-1.0.0-full.gar
Once installed, the service runs without further configuration, using default configuration parameters. To customize the configuration, edit the file located at $GLOBUS_LOCATION/etc/agent-service-x.x.x/jndi-deploy.xml
Data Transfer Agent Library
The Data Transfer Agent Library is the client library (CL) implementing the API for Data Transfer. In particular, the library implements the API used to contact the Data Transfer Agent Service.
The latest version of the library (1.1.0-SNAPSHOT) and the related javadoc artifacts can be downloaded from the maven repository at [1]
When integrating with a maven component, the dependency to include in the pom file is:
<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>agent-library</artifactId>
  <version>1.1.0-SNAPSHOT</version>
</dependency>
The library offers both synchronous and asynchronous data transfer operations.
Sync Operations
As an initial overview of the API, the following code snippet invokes a synchronous transfer of a list of URIs to a remote GHN that runs an instance of the Data Transfer Agent Service.
// Set the scope in which the call is made
ScopeProvider.instance.set("/gcube/devsec/");

// Build a proxy to the agent running on the target GHN
AgentLibrary library = transferAgent().at("test.research-infrastructures.eu", 9090).build();

// Source URIs to transfer
ArrayList<URI> input = new ArrayList<URI>();
URI uri1 = new URI("ftp://pcd4science3.cern.ch/test.txt");
URI uri2 = new URI("http://dl.dropbox.com/u/8704957/commons-dbcp-1.3.jar");
input.add(uri1);
input.add(uri2);

// Destination path, relative to the VFS root configured on the agent
String outPath = "/test";

TransferOptions options = new TransferOptions();
options.setOverwriteFile(true);
options.setType(StorageType.LocalGHN);
options.setUnzipFile(false); // set to true to unzip archives (only zip for now) after the transfer

// Blocks until the transfer completes and returns one outcome per file
ArrayList<FileTransferOutcome> outcomes = library.startTransferSync(input, outPath, options);
Please note that, in order to use the static methods of the Agent Library (such as transferAgent() in the snippet above), the following static import is required:
import static org.gcube.datatransfer.agent.library.proxies.Proxies.*;
Please note as well that outPath is not an absolute path within the agent file system: it is appended to the VFS root path configured on the Agent side.
For example, if the VFS root path on the Agent side is /tmp, the files will be stored in /tmp/test/...
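The path rule above can be sketched in a few lines of plain Java. This only illustrates the described behaviour and is not the agent's actual code: the class OutPathExample and the helper resolve are hypothetical, and stripping the leading slash before joining is an assumption about how the two paths are combined.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: OutPathExample and resolve are hypothetical names.
public class OutPathExample {
    // Joins a client-supplied outPath to the VFS root configured on the agent.
    static Path resolve(String vfsRoot, String outPath) {
        // Assumption: leading slashes are dropped so that outPath is treated
        // as relative to the root rather than as an absolute path.
        String relative = outPath.replaceFirst("^/+", "");
        return Paths.get(vfsRoot).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // With root /tmp and outPath /test this prints /tmp/test
        // on a Unix-like file system.
        System.out.println(resolve("/tmp", "/test"));
    }
}
```

Under these assumptions, a transferred file such as test.txt would end up at /tmp/test/test.txt on the agent.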