How to use Data Transfer 2
Data Transfer 2 is one of the subsystems forming the gCube Data_Transfer_Facilities. It aims to provide gCube applications with a common layer for efficient and transparent data transfer towards gCube SmartGears nodes. It is designed as a client-service architecture that exploits the plugin design pattern. A generic overview and its design are described here.
The following sections describe how to use and interact with the involved components.
- 1 Data Transfer Service
- 2 Data Transfer library
- 3 REST Invocations
- 4 Data Transfer Plugins
- 4.1 General Purpose Plugins
- 4.2 Specific Plugins
Data Transfer Service
The Data Transfer Service is a SmartGears-aware web application developed on top of the Jersey framework. Its main functionalities are :
- receive and serve data transfer requests;
- expose capabilities;
At startup it gathers information on :
- current network configuration (i.e. exposed hostname, available ports) in order to negotiate transfer channel with clients;
- available data-transfer plugins
The data transfer service is released as a war with the following maven coordinates
It needs to be hosted in a SmartGears installation in order to run. Please refer to SmartGears for further information.
In this section we will describe the http interfaces exposed by the service.
The Capabilities interface exposes information regarding :
- Instance details (i.e. hostname, port, nodeId)
- Available plugins
- Available persistence Ids
Capabilities are mapped in a Java Object of class org.gcube.data.transfer.model.TransferCapabilities.
The Requests interface receives transfer requests from clients, returning the associated ticket ID if the request has been successfully registered. Each request is expected to specify :
- The transfer settings decided by the caller client (including the data source);
- The transfer destination (see #Transfer destination);
- An optional set of plugin invocations (see #Plugin invocation)
Transfer requests are mapped in Java Objects of class org.gcube.data.transfer.model.TransferRequest.
The Status interface provides information on the progress of the transfer identified by its related ticket ID. The transfer status provides information about :
- The related transfer request;
- Transfer statistics (i.e. transferredBytes, elapsed Time);
- Destination file absolute location;
- Overall status;
- Error Message if any;
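As an illustration of how a client could follow a transfer through the Status interface, the sketch below polls a status source by ticket ID until the transfer is no longer running. It is only a sketch under stated assumptions: StatusPollingSketch, Snapshot, the Overall values and the fetchStatus function are hypothetical stand-ins, not types of the data-transfer model, and the real overall-status values may differ.

```java
import java.util.function.Function;

// Hypothetical sketch: none of these types belong to the data-transfer model.
final class StatusPollingSketch {

    // Placeholder status values; the real overall-status values are defined by the service.
    enum Overall { RUNNING, DONE, FAILED }

    // Minimal stand-in for the information returned by the Status interface.
    static final class Snapshot {
        final Overall overall;
        final long transferredBytes;
        final String errorMessage; // null when no error occurred

        Snapshot(Overall overall, long transferredBytes, String errorMessage) {
            this.overall = overall;
            this.transferredBytes = transferredBytes;
            this.errorMessage = errorMessage;
        }
    }

    // Polls fetchStatus with the ticket ID until the transfer leaves RUNNING,
    // giving up after maxPolls attempts.
    static Snapshot waitForCompletion(String ticketId, Function<String, Snapshot> fetchStatus,
                                      long pollMillis, int maxPolls) {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            Snapshot snapshot = fetchStatus.apply(ticketId);
            if (snapshot.overall != Overall.RUNNING) {
                return snapshot;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for ticket " + ticketId, e);
            }
        }
        throw new IllegalStateException("Ticket " + ticketId + " still running after " + maxPolls + " polls");
    }

    public static void main(String[] args) {
        // Fake status source: reports RUNNING twice, then DONE.
        int[] calls = {0};
        Snapshot last = waitForCompletion("ticket-1",
                id -> new Snapshot(++calls[0] < 3 ? Overall.RUNNING : Overall.DONE, calls[0] * 100L, null),
                10L, 10);
        System.out.println(last.overall + " after " + calls[0] + " polls");
    }
}
```

In a real client the fetchStatus function would wrap the HTTP call to the Status interface; applications using the data transfer library do not need such a loop, since the library takes care of status monitoring.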
Data Transfer library
The data transfer library is a java library which serves applications as a client to data transfer facilities. In order to use the library, applications must declare the following dependency in their maven pom files :
<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>data-transfer-library</artifactId>
</dependency>
The library is designed in order to offer a simple api to submit transfers to the selected services without dealing with :
- http calls;
- status monitoring;
- transfer channel selection negotiation according to server's capabilities;
Submit a transfer
In order to submit a transfer to a chosen server, the application needs to get an instance of the class org.gcube.data.transfer.library.DataTransferClient. Instances of the client are obtained by calling one of the following static methods :
public static DataTransferClient getInstanceByEndpoint(String endpoint)
    throws UnreachableNodeException, ServiceNotFoundException;
public static DataTransferClient getInstanceByNodeId(String id)
    throws HostingNodeNotFoundException, UnreachableNodeException, ServiceNotFoundException;
To perform a transfer operation, the application just needs to invoke one of the exposed methods, providing :
- a transfer source (e.g. a java.io.File object or its absolute path);
- a transfer destination, which in the basic scenario is simply the destination file name (see #Transfer destination for more in-depth details);
- an optional set of plugin invocations (see #Plugin invocation for more in-depth details).
Please note that the library exposes several signatures for the same logic, so that clients can ignore the functionalities they do not need. For example, the following three calls perform the same operation :
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
client.localFile(localFile,transferredFileName);
Using object org.gcube.data.transfer.model.Destination (see #Transfer destination for more in-depth details).
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest);
Using object org.gcube.data.transfer.model.PluginInvocation (see #Plugin invocation for more in-depth details).
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,Collections.<PluginInvocation> emptySet());
Transfer destination
For each transfer operation, clients are required to declare a destination definition using objects of the class org.gcube.data.transfer.model.Destination. Destination definitions include the following parameters :
- destination file name (String) : the name that will be used for the transferred file in the remote service file system;
- onExistingFileName (org.gcube.data.transfer.model.DestinationClashPolicy) [default value = ADD_SUFFIX] : the policy to follow if the specified destination file name already exists in the declared location (see #Destination Clash Policies for further information);
- persistence id (String) [default value = Destination.DEFAULT_PERSISTENCE_ID] : the persistence folder on the service runtime environment, identified by the target's application context name (see SmartGears for further information). Clients can use service capabilities in order to gather information on available context ids (see #Capabilities for further information). To use the default value (which identifies the data-transfer-service itself), clients can use the static member Destination.DEFAULT_PERSISTENCE_ID;
- subFolder (String) [default value = null] : a destination sub-path starting from the selected persistence folder;
- createSubFolders (Boolean) [default value = false] : tells the service whether or not it must take the subFolder option into account;
- onExistingSubFolder (org.gcube.data.transfer.model.DestinationClashPolicy) [default value = APPEND] : the policy to follow if the specified destination subFolder already exists in the declared persistence folder (see #Destination Clash Policies for further information);
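To make the interplay between persistence folder, subFolder, createSubFolders and destination file name more concrete, the following sketch computes a target location from these values. It is an illustration only: DestinationPathSketch is a hypothetical helper, the example paths are invented, and the rule "subFolder is ignored unless createSubFolders is true" is an assumption derived from the parameter descriptions above, not the service's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of org.gcube.data.transfer.model.
final class DestinationPathSketch {

    // Combines the destination parameters into a target path.
    // Assumption: the subFolder is taken into account only when createSubFolders is true.
    static Path resolve(Path persistenceFolder, String subFolder,
                        boolean createSubFolders, String destinationFileName) {
        Path folder = persistenceFolder;
        if (createSubFolders && subFolder != null && !subFolder.isEmpty()) {
            folder = folder.resolve(subFolder);
        }
        return folder.resolve(destinationFileName);
    }

    public static void main(String[] args) {
        // Invented persistence folder, used only for the demonstration.
        Path persistence = Paths.get("/data/thredds");
        System.out.println(resolve(persistence, "public/netcdf", true, "raster.nc"));
        System.out.println(resolve(persistence, "public/netcdf", false, "raster.nc"));
    }
}
```

Clash handling (onExistingFileName, onExistingSubFolder) is deliberately left out of this sketch, since it depends on what already exists on the server.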
Destination Clash Policies
The enum class org.gcube.data.transfer.model.DestinationClashPolicy represents the policies available in case of file system clashes on the server side. The supported clash policies behave as follows :
- abort the transfer;
- overwrite the destination, deleting the existing one first;
- add a bracket-isolated counter at the end of the clashing name (e.g. myFileName becomes myFileName(1));
- add the transferred content to the existing one.
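The suffix-based policy (ADD_SUFFIX, the default for onExistingFileName) can be illustrated with a few lines of code. This is a minimal sketch, not the service's implementation: AddSuffixSketch is a hypothetical class, and the rule that the counter starts at 1 and grows until a free name is found is an assumption extrapolated from the myFileName(1) example.

```java
import java.util.Set;

// Hypothetical illustration, not part of the data-transfer model.
final class AddSuffixSketch {

    // Returns 'name' if it is free, otherwise name(1), name(2), ... (assumed numbering).
    static String resolve(String name, Set<String> existingNames) {
        if (!existingNames.contains(name)) {
            return name;
        }
        int counter = 1;
        while (existingNames.contains(name + "(" + counter + ")")) {
            counter++;
        }
        return name + "(" + counter + ")";
    }

    public static void main(String[] args) {
        System.out.println(resolve("myFileName", Set.of("myFileName")));
    }
}
```

How the service places the counter when the file name has an extension is not covered here, so the sketch simply appends it to the whole name.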
Plugin invocation
Plugin invocations are declared by using instances of the class org.gcube.data.transfer.model.PluginInvocation.
These objects are formed by the following members :
- pluginId (String) : the id of the installed plugin. Available plugins are listed in the server capabilities (see #Capabilities for more information);
- parameters (Map<String,String>) : map of parameter-name -> parameter-value to be used in plugin invocations. Please use the static member PluginInvocation.DESTINATION_FILE_PATH as the parameter value for those parameters that need the actual destination's absolute path;
REST Invocations
Note : from gCube 4.9.0 the <TransferMethod> option has been removed from the PATH and is handled as the query parameter "method" (default value "FileUpload").
The service offers a REST interface for simple transfer requests / handling in the following format :
The following query parameters can be specified :
- create-dirs [Default : false]
- on-existing-file [Default : ADD_SUFFIX]
- on-existing-dir [Default : APPEND]
The following FORM DATA parameters can also be used :
- uploadedFile : the file uploaded by the client
- plugin-invocations : JSON representation of plugin invocation set
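The query parameters listed above are appended to the upload URL as an ordinary query string. The sketch below shows one way to build it in Java. It is only an illustration: RestQuerySketch is a hypothetical helper, and the base URL in main is an invented example rather than the layout of a real installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for composing the query string of a REST invocation.
final class RestQuerySketch {

    // Appends the given parameters to baseUrl, URL-encoding names and values.
    static String withQuery(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("create-dirs", "true");
        params.put("on-existing-file", "REWRITE");
        params.put("on-existing-dir", "APPEND");
        // Invented base URL, used only for the demonstration.
        System.out.println(withQuery("http://example.org/upload", params));
    }
}
```

The form data parameters (uploadedFile, plugin-invocations) travel in the multipart body of the request and are therefore not part of this query string.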
THREDDS upload and metadata publication via cURL
The cURL command below :
1. uploads the file to the "thredds" destination, subfolder "public/netcdf/myCatalog";
2. invokes the "SIS/GEOTK" plugin.
curl -F "uploadedFile=@/home/fabio/raster-1465493223336242.nc" \
     --header "gcube-token:<GCUBE-TOKEN>" \
     http://thredds-d-d4s.d4science.org/data-transfer-service/gcube/service/REST/FileUpload/thredds/public/netcdf/myCatalog \
     --form "plugin-invocations="SIS/GEOTK""
Data Transfer Plugins
This section describes the implemented plugins in order to help developers exploit their functionalities. Plugins are modules that are optionally invoked after the transfer is complete. Plugin invocations are declared within the transfer request by specifying a set of PluginInvocation instances. The following sections list, respectively :
- #General Purpose Plugins, which are available on every SmartGears node;
- #Specific Plugins, which are meant to address a particular installation.
General Purpose Plugins
This section describes general-purpose plugins, which are included in default distributions and are therefore always available on a SmartGears node.
Decompress Archive Plugin
The 'Decompress Archive' plugin extracts the content of an archive to a specified path. The implementing module (needed at service side) is
<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>decompress-archive-plugin</artifactId>
</dependency>
- ID : "DECOMPRESS"
Parameters List :
- "DESTINATION" : [String value] The destination folder for the uncompressed content, expressed as a path relative to SOURCE_ARCHIVE. Default is the same directory as SOURCE_ARCHIVE;
- "OVERWITE_DESTINATION" : [Boolean value] Set true in order to overwrite DESTINATION content. Default is false;
- "DELETE_ARCHIVE" : [Boolean value] Set true in order to delete SOURCE_ARCHIVE after extracting content. Default is false;
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
// Plugin parameters: extract into "myFolder", using the transferred file as the archive
Map<String,String> params=new HashMap<>();
params.put("DESTINATION", "myFolder");
params.put("SOURCE_ARCHIVE", PluginInvocation.DESTINATION_FILE_PATH);
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,Collections.<PluginInvocation> singleton(new PluginInvocation("DECOMPRESS",params)));
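The relationship between the DESTINATION and SOURCE_ARCHIVE parameters can be pictured with plain java.nio paths. The sketch below is not the plugin's code: DecompressDestinationSketch is a hypothetical helper, the example paths are invented, and it assumes that "relative to SOURCE_ARCHIVE" means relative to the directory that contains the archive, consistent with the documented default.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating how the extraction folder could be derived.
final class DecompressDestinationSketch {

    // Returns the folder where the archive content would be extracted.
    // Assumption: DESTINATION is resolved against the archive's parent directory;
    // when it is missing, the parent directory itself is used.
    static Path extractionFolder(Path sourceArchive, String destination) {
        String relative = (destination == null) ? "" : destination;
        return sourceArchive.resolveSibling(relative);
    }

    public static void main(String[] args) {
        // Invented archive location, used only for the demonstration.
        Path archive = Paths.get("/data/in/archive.zip");
        System.out.println(extractionFolder(archive, "myFolder"));
        System.out.println(extractionFolder(archive, null));
    }
}
```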
Specific Plugins
This section lists plugin modules designed to address a particular installation (typically the management of third-party applications). They are available only on certain installation nodes, depending on needs.
Thredds Plugin Suite
The Thredds plugin suite contains a set of plugins for managing a Thredds installation in a gCube infrastructure. The implementing module (needed at service side) is
<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>sis-geotk-plugin</artifactId>
</dependency>
Following sections describe plugins exposed by this module.
PLUGIN INFO OUTPUT
Each of the following plugins exposes an info object of class 'org.gcube.data.transfer.model.plugins.thredds.ThreddsInfo'. Following is a serialized example of this object :
"name": "Thredds Root Catalog",
"name": "Catalogs of Virtual Research Environments VRE",
"name": "preprodVRECatalog Catalog",
SIS/GEOTK Plugin
The 'SIS/GEOTK' plugin extracts metadata information from NetCDF files by exploiting Apache SIS library features and publishes ISO metadata entries in GeoNetwork.
- ID : "SIS/GEOTK"
Parameters List :
- "GEONETWORK_CATEGORY" : [String value] GeoNetwork category for published metadata. Default is 'Dataset';
- "GEONETWORK_STYLESHEET" : [String value] GeoNetwork stylesheet for published metadata. Default is '_none_';
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,new PluginInvocation("SIS/GEOTK"));
REGISTER CATALOG Plugin
The 'REGISTER CATALOG' plugin modifies Thredds' main catalog.xml file in order to add/update a reference to the transferred catalog file.
- ID : "REGISTER_CATALOG"
Parameters List :
- "CATALOG_REFERENCE" : [String value] The reference title to be set under catalog.xml which will link to the transferred catalog file
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
// 'reference' (the catalog title) and 'catalogFile' (the local catalog file) are provided by the caller
Destination dest=new Destination();
dest.setPersistenceId("thredds");
dest.setDestinationFileName(reference.replace(" ", "_")+".xml");
dest.setOnExistingFileName(DestinationClashPolicy.REWRITE);
PluginInvocation invocation=new PluginInvocation("REGISTER_CATALOG");
invocation.setParameters(Collections.singletonMap("CATALOG_REFERENCE", reference));
client.localFile(catalogFile, dest,invocation);