How to use Data Transfer 2

From Gcube Wiki
Revision as of 15:41, 12 September 2016 by Fabio.sinibaldi (Talk | contribs) (Data Transfer library)

Data Transfer 2 is one of the subsystems forming the gCube Data_Transfer_Facilities. It aims to provide gCube applications with a common layer for efficient and transparent data transfer towards gCube SmartGears nodes. It is designed as a client-service architecture that exploits the plugin design pattern. A generic overview and its design are described here.

The following sections describe how to use and interact with the components involved.

Data Transfer Service

The Data Transfer Service is a SmartGears-aware web application developed on top of the Jersey framework. Its main functionalities are:

  • receive and serve data transfer requests;
  • expose capabilities;

At startup it gathers information on:

  • the current network configuration (e.g. exposed hostname, available ports) in order to negotiate the transfer channel with clients;
  • the available data-transfer plugins.

Installation

The data transfer service is released as a WAR with the following Maven coordinates:

  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>data-transfer-service</artifactId>

It needs to be hosted in a SmartGears installation in order to run. Please refer to SmartGears for further information.

Interface

This section describes the HTTP interfaces exposed by the service.

Capabilities

The Capabilities interface exposes information regarding:

  • Instance details (i.e. hostname, port, nodeId)
  • Available plugins
  • Available persistence IDs

Capabilities are mapped to a Java object of class org.gcube.data.transfer.model.TransferCapabilities.
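
As an illustration of how a client might use this information, the following minimal sketch models the three kinds of data listed above and checks them before a transfer is submitted. CapabilitiesSketch, its fields and the supports method are hypothetical names chosen for this example; they are not the API of TransferCapabilities.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the capabilities advertised by a service instance. */
final class CapabilitiesSketch {
    final String hostname;
    final int port;
    final String nodeId;
    final Set<String> pluginIds;       // identifiers of the available plugins
    final Set<String> persistenceIds;  // identifiers of the available persistence folders

    CapabilitiesSketch(String hostname, int port, String nodeId,
                       Set<String> pluginIds, Set<String> persistenceIds) {
        this.hostname = hostname;
        this.port = port;
        this.nodeId = nodeId;
        // Defensive copies keep the advertised sets immutable.
        this.pluginIds = Collections.unmodifiableSet(new HashSet<>(pluginIds));
        this.persistenceIds = Collections.unmodifiableSet(new HashSet<>(persistenceIds));
    }

    /** Returns true when the instance offers both the plugin and the persistence folder. */
    boolean supports(String pluginId, String persistenceId) {
        return pluginIds.contains(pluginId) && persistenceIds.contains(persistenceId);
    }
}
```

A client could call supports before building a request, so that an unavailable plugin or persistence ID is detected locally rather than after submission.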

Transfer requests

The Requests interface receives transfer requests from clients, returning the associated ticket ID if the request has been successfully registered. A request is expected to specify:

  • The transfer settings decided by the caller client (including the data source);
  • The transfer destination (see #Transfer destination);
  • An optional set of plugin invocations (see #Plugin invocation)


Transfer requests are mapped to Java objects of class org.gcube.data.transfer.model.TransferRequest.

Transfer status

The Status interface provides information on the progress of the transfer identified by its related ticket ID. The transfer status provides information about:

  • The related transfer request;
  • Transfer statistics (i.e. transferred bytes, elapsed time);
  • Destination file absolute location;
  • Overall status;
  • Error message, if any.
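
To make the role of the ticket ID concrete, the sketch below polls a status lookup until the transfer reaches a final state. It is only an illustration under stated assumptions: the State values, the Snapshot class and the lookup function are hypothetical and do not reflect the actual model classes or HTTP calls of the service.

```java
import java.util.function.Function;

final class StatusPollingSketch {
    /** Hypothetical overall states; the real service may use different names. */
    enum State { PENDING, TRANSFERRING, SUCCESS, ERROR }

    /** Hypothetical snapshot of some of the fields listed above. */
    static final class Snapshot {
        final State state;
        final long transferredBytes;
        final String errorMessage; // null when no error occurred

        Snapshot(State state, long transferredBytes, String errorMessage) {
            this.state = state;
            this.transferredBytes = transferredBytes;
            this.errorMessage = errorMessage;
        }

        boolean isFinal() {
            return state == State.SUCCESS || state == State.ERROR;
        }
    }

    /**
     * Queries the lookup with the ticket ID until a final state is seen or
     * maxAttempts is exhausted. Returns the last snapshot obtained, or null
     * when maxAttempts is not positive.
     */
    static Snapshot waitFor(String ticketId, Function<String, Snapshot> lookup,
                            int maxAttempts, long pauseMillis) {
        Snapshot last = null;
        for (int i = 0; i < maxAttempts; i++) {
            last = lookup.apply(ticketId);
            if (last.isFinal()) {
                return last;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return last;
            }
        }
        return last;
    }
}
```

The lookup is passed in as a function so that the polling logic stays independent of how the status is actually retrieved.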

Data Transfer library

The data transfer library is a Java library that serves applications as a client to the data transfer facilities. In order to use the library, applications must declare the following dependency in their Maven POM files:

<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>data-transfer-library</artifactId>
</dependency>


The library is designed to offer a simple API for submitting transfers to the selected services without dealing with:

  • HTTP calls;
  • status monitoring;
  • transfer channel selection and negotiation according to the server's capabilities.

Submit a transfer

In order to submit a transfer to a chosen server, the application needs to get an instance of the class org.gcube.data.transfer.library.DataTransferClient. Instances of the client are obtained by calling one of the following static methods:

public static DataTransferClient getInstanceByEndpoint(String endpoint) throws UnreachableNodeException, ServiceNotFoundException;
 
public static DataTransferClient getInstanceByNodeId(String id) throws HostingNodeNotFoundException, UnreachableNodeException, ServiceNotFoundException;

To perform a transfer operation, the application just needs to invoke one of the exposed methods, providing:

  • a transfer source (i.e. a java.io.File object or its absolute path);
  • a transfer destination, which in the basic scenario is simply the destination file name (see #Transfer destination for more in-depth details);
  • an optional set of plugin invocations (see #Plugin invocation for more in-depth details).

Please note that the library exposes different signatures for the same logic in order to hide unneeded functionality from clients. For example, the following three calls perform the same operation:

// Basic scenario: only the local file and the destination file name are given.
DataTransferClient client = DataTransferClient.getInstanceByEndpoint(...);
String localFile = "..";
String transferredFileName = "..";
client.localFile(localFile, transferredFileName);

Using an org.gcube.data.transfer.model.Destination object (see #Transfer destination for more in-depth details):

// Same transfer, with the destination wrapped in a Destination object.
DataTransferClient client = DataTransferClient.getInstanceByEndpoint(...);
String localFile = "..";
String transferredFileName = "..";
Destination dest = new Destination(transferredFileName);
client.localFile(localFile, dest);

Using org.gcube.data.transfer.model.PluginInvocation objects (see #Plugin invocation for more in-depth details):

// Same transfer, passing an explicit (here empty) set of plugin invocations.
DataTransferClient client = DataTransferClient.getInstanceByEndpoint(...);
String localFile = "..";
String transferredFileName = "..";
Destination dest = new Destination(transferredFileName);
client.localFile(localFile, dest, Collections.<PluginInvocation>emptySet());

Transfer destination

For each transfer operation, clients are required to declare a destination definition using objects of the class org.gcube.data.transfer.model.Destination. Destination definitions include the following parameters:

  • String destination file name
    • the name that will be used for the transferred file in the remote service file system;
  • org.gcube.data.transfer.model.DestinationClashPolicy onExistingFileName
    • declares the policy to follow in case the specified destination file name already exists in the declared location (see #Destination Clash Policies for further information);
  • String persistence id
    • the persistence folder in the service runtime environment, identified by the target application's context name (see SmartGears for further information). Clients can use the service capabilities to gather information on the available context IDs (see #Capabilities for further information). To use the default value (which identifies the data-transfer-service itself), clients can use the static member Destination.DEFAULT_PERSISTENCE_ID;
  • String subFolder
    • declares a destination sub-path starting from the selected persistence folder;
  • Boolean createSubFolders
    • tells the service whether or not it must consider the subFolder option;
  • org.gcube.data.transfer.model.DestinationClashPolicy onExistingSubFolder
    • declares the policy to follow in case the specified destination subFolder already exists in the declared persistence folder (see #Destination Clash Policies for further information).
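
The way the persistence folder, the subFolder and the destination file name might combine into a final location can be sketched as follows. This is an illustrative assumption rather than the service's actual resolution logic: DestinationPathSketch and resolve are hypothetical names, and the check that rejects paths leaving the persistence folder is a precaution added for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DestinationPathSketch {
    /**
     * Builds persistenceFolder/subFolder/fileName. A null or empty subFolder
     * is skipped. Throws IllegalArgumentException if the resulting path would
     * fall outside the persistence folder (an assumption of this sketch).
     */
    static Path resolve(Path persistenceFolder, String subFolder, String fileName) {
        Path base = persistenceFolder.normalize();
        Path target = base;
        if (subFolder != null && !subFolder.isEmpty()) {
            target = target.resolve(subFolder);
        }
        target = target.resolve(fileName).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException(
                "Destination escapes the persistence folder: " + target);
        }
        return target;
    }

    public static void main(String[] args) {
        // Hypothetical persistence folder used only for demonstration.
        Path root = Paths.get("/data/persistence");
        System.out.println(resolve(root, "reports/2016", "out.csv"));
    }
}
```

Clash policies are deliberately left out of the sketch, since they apply only once the resolved file or sub-folder is found to exist already.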


Destination Clash Policies

Plugin invocation