Difference between revisions of "Data Transfer Agent"

From Gcube Wiki
Latest revision as of 17:30, 25 February 2014

Data Transfer Agent Service

The Data Transfer Agent Service has been implemented to facilitate the transfer of data (both structured and unstructured) for the following use cases:

  • Transfer of local files from an external/internal client to a remote GHN.
  • File transfer from a remote Data Source to a remote GHN using standard protocols (HTTP, FTP, WebDav, etc.) by integrating the Apache VFS Library.
  • File transfer from a remote Data Source to the gCube Storage Manager (MongoDB) using standard protocols (HTTP, FTP, WebDav) by integrating the Storage Manager Library and the Apache VFS Library.
  • File transfer from a remote Data Source to a remote Data Storage.
  • Tree-based data transfer from a remote Data Source to a remote Tree-Based Storage.

The service makes use of gRS to implement Local Transfer, Tree-based Data Transfer and delivery of Transfer Outcomes to clients.

In addition, the service exploits the Messaging infrastructure to publish transfer statistics, which can be consumed by:

  • Accounting statistics consumers
  • the Data Transfer Scheduler Service, which uses messaging to consume Agent transfer results.

Architecture

The Data Transfer Agent Service is a gCore stateful service (singleton) which implements asynchronous transfer operations between nodes of a gCube infrastructure. All server-side operations are asynchronous, while the Data Transfer Agent Library CL also implements synchronous operations on top of them.

For asynchronous operations (both server and client side) the service uses a local DB to store transfer details and provide them to clients.

Details on the architecture of the service can be found on the specification page Data Transfer Scheduler & Agent components.

Installation and Configuration

The Data Transfer Agent Service can be deployed dynamically on the infrastructure using the gCube Enabling Layers. Alternatively, it can be deployed manually on a GHN by downloading the full-gar artifact, which also contains the needed dependencies, from [1]. In this case the installation is triggered by:

gcore-deploy-service agent-service-full-1.2.0-SNAPSHOT.gar

Once installed, the service runs without further configuration, using default configuration parameters. To customize the configuration, edit the file located under $GLOBUS_LOCATION/etc/agent-service-x.x.x/jndi-deploy.xml.

The following parameters can be customized:

  • dbname : the name of the database used by the service (default: transfer-agent-db)
  • dbConfigurationFile : the name of the configuration file for the DB (default: db.properties)
  • supportedTransfers : the transfer protocols supported by the agent, which are published on the IS as WS-Resource-Properties. This is useful for dedicating an agent to a subset of transfer protocols, because the Scheduler Service uses this information. (default: FTP,HTTP)
  • messaging : enables the use of Messaging to deliver transfer outcomes (default: false)
  • connectionTimeout : the timeout in milliseconds applied when performing the transfer (default: 10000)
  • vfsRoot : the root folder of the agent VFS (default: /tmp)
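
As a rough illustration of how a consumer of the supportedTransfers value might use it, the sketch below checks whether the scheme of a source URI appears in the comma-separated list. The class and method names are hypothetical and are not part of the agent or scheduler code; the actual matching logic of the Scheduler Service is not documented here.

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical helper: not part of the Data Transfer Agent or Scheduler code base.
final class SupportedTransfersSketch {

    // Returns true if the URI scheme appears in the comma-separated list (case-insensitive).
    static boolean supports(String supportedTransfers, URI source) {
        String scheme = source.getScheme();
        if (scheme == null) {
            return false; // relative URIs carry no protocol information
        }
        return Arrays.stream(supportedTransfers.split(","))
                .map(String::trim)
                .anyMatch(protocol -> protocol.equalsIgnoreCase(scheme));
    }
}
```

With the default value FTP,HTTP, an http:// source would be accepted and any other scheme rejected.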


The dbConfigurationFile can be customized as well, for instance to use a different DB backend. The relevant configuration parameters are:

  • datanucleus.ConnectionDriverName : the DB driver to use. If a different DB backend is needed, the related DB driver must be installed separately, since it is not provided by default (default: org.hsqldb.jdbcDriver)
  • datanucleus.ConnectionURL : the DB connection URL (default: jdbc:hsqldb:/tmp/DataTransferAgent/transfer-agent-db)
  • javax.jdo.option.ConnectionUserName : the DB user name (default: sa)
  • javax.jdo.option.ConnectionPassword : the DB user password (default: empty)
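
The defaults listed above can be mirrored with java.util.Properties, which may help when preparing a customized db.properties file. This is only a sketch: the class name and the override helper are hypothetical, and the service itself reads these keys from the configured file rather than from code like this.

```java
import java.util.Properties;

// Hypothetical helper: it only restates the documented defaults.
final class DbConfigSketch {

    // Returns a Properties object holding the default values documented above.
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("datanucleus.ConnectionDriverName", "org.hsqldb.jdbcDriver");
        p.setProperty("datanucleus.ConnectionURL",
                "jdbc:hsqldb:/tmp/DataTransferAgent/transfer-agent-db");
        p.setProperty("javax.jdo.option.ConnectionUserName", "sa");
        p.setProperty("javax.jdo.option.ConnectionPassword", "");
        return p;
    }

    // Returns a copy of the defaults with a single key overridden.
    static Properties withOverride(String key, String value) {
        Properties p = defaults();
        p.setProperty(key, value);
        return p;
    }
}
```

The resulting object can be written out with Properties.store(...) to produce a file in the expected key=value format.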

Data Transfer Agent Library

The Data Transfer Agent Library is the CL implementing the API for Data Transfer. In particular, the Library implements the API to contact the Data Transfer Agent Service.

The latest version of the library (1.2.0-SNAPSHOT) and the related javadoc artifacts can be downloaded from the Maven repository at [2].

When integrating with a Maven component, the dependency to be included in the pom file is:

<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>agent-library</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>


The Library offers both synchronous and asynchronous data transfer operations.

Sync Operations

As an initial overview of the API, the following code snippet invokes a synchronous data transfer of a list of URIs to a remote GHN which runs an instance of the Data Transfer Agent Service.

ScopeProvider.instance.set("/gcube/devsec/");
 
AgentLibrary library = transferAgent().at("test.research-infrastructures.eu", 9090).build();
 
ArrayList<URI> input = new ArrayList<URI>();
 
URI uri1 = new URI("ftp://pcd4science3.cern.ch/test.txt");
URI uri2 = new URI("http://dl.dropbox.com/u/8704957/commons-dbcp-1.3.jar");
 
input.add(uri1);
input.add(uri2);
 
String outPath = "/test";
 
TransferOptions options = new TransferOptions();
options.setOverwriteFile(true);
options.setType(StorageType.LocalGHN);
options.setUnzipFile(false); //it can be used to unzip the archives ( only zip for now) after the transfer
 
ArrayList<FileTransferOutcome> outcomes = library.startTransferSync(input, outPath, options);

Please note that in order to use the static methods from the Agent Library this import has to be provided:

import static org.gcube.datatransfer.agent.library.proxies.Proxies.*;

Please note as well that outPath is not an absolute path within the agent FS: it is appended to the VFS root path configured on the Agent side.

For example, if the VFS root path on the Agent side is /tmp, the files will be stored in /tmp/test/...
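
The path rule described above can be sketched with java.nio.file. This is an illustration of the documented behaviour only: the class and method names are hypothetical, and the agent's own resolution code may differ (for instance in how it treats ".." segments, which this sketch does not guard against).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating "outPath is appended to the VFS root".
final class OutPathSketch {

    // Resolves outPath below vfsRoot, treating a leading '/' in outPath as relative.
    static Path resolve(String vfsRoot, String outPath) {
        String relative = outPath.replaceFirst("^/+", ""); // drop leading slashes
        return Paths.get(vfsRoot).resolve(relative).normalize();
    }
}
```

Under this sketch, a root of /tmp combined with either /test or ./test yields the same destination folder, test under /tmp.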


Async Operations

To implement async operations, the Data Transfer Agent Service uses a DB (either embedded or external).

The async APIs of the Data Transfer Agent Service let clients submit, monitor and retrieve transfer results as follows:

ScopeProvider.instance.set("/gcube/devsec/");
 
AgentLibrary library = transferAgent().at("test.research-infrastructures.eu", 9000).build();
 
ArrayList<URI> inputs = new ArrayList<URI>();
inputs.add(new URI("http://img821.imageshack.us/img821/6658/gisviewerdiagram.png"));
inputs.add(new URI("http://img11.imageshack.us/img11/9008/geoexplorerdiagram.png"));
 
String outPath = "./data";
 
TransferOptions options = new TransferOptions();
options.setOverwriteFile(true);
options.setType(StorageType.LocalGHN);
options.setUnzipFile(false);
 
String transferId = library.startTransfer(inputs, outPath, options);

The transferId returned by the startTransfer operation can be used to monitor the transfer status:

TransferStatus transferStatus = null;

do {
	try {
		Thread.sleep(5000); // wait between two polls
		transferStatus = TransferStatus.valueOf(library.monitorTransfer(transferId));
	} catch (MonitorTransferException e) {
		e.printStackTrace();
	}
	// keep polling while the status is unknown (failed call) or not yet final
} while (transferStatus == null || !transferStatus.hasCompleted());

System.out.println("done!");


The transfer status can also be checked together with progress information:

MonitorTransferReportMessage message = null;
TransferStatus transferStatus = null;

do {
	try {
		message = library.monitorTransferWithProgress(transferId);
		transferStatus = TransferStatus.valueOf(message.getTransferStatus());
		System.out.println("Status: " + message.getTransferStatus());
		System.out.println("totalBytes: " + message.getTotalBytes());
		System.out.println("bytesTransferred: " + message.getBytesTransferred());
		System.out.println("totalTransfers: " + message.getTotalTransfers());
		System.out.println("transfersCompleted: " + message.getTransferCompleted());
		Thread.sleep(5000); // wait between two polls
	} catch (Exception e) {
		e.printStackTrace();
	}
	// keep polling while the status is unknown (failed call) or not yet final
} while (transferStatus == null || !transferStatus.hasCompleted());
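
Polling loops of this kind never terminate if the transfer does not reach a final state. One possible way to bound the wait is a small generic helper with an overall deadline, sketched below. The class and method names are hypothetical and the helper is not part of the Agent Library; a probe built on library.monitorTransfer would need to wrap the checked MonitorTransferException, since Supplier cannot throw checked exceptions.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: generic polling with an overall deadline.
final class PollSketch {

    // Calls 'probe' every intervalMillis until 'isFinal' accepts the value
    // or timeoutMillis has elapsed. Returns the last value seen, which may be
    // null if every probe failed, or non-final if the deadline was reached.
    static <T> T pollUntil(Supplier<T> probe, Predicate<T> isFinal,
                           long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        T last = null;
        while (true) {
            try {
                last = probe.get();
            } catch (RuntimeException e) {
                // failed probe: status unknown, keep the previous value and retry
            }
            if (last != null && isFinal.test(last)) {
                return last;
            }
            if (System.nanoTime() - deadline >= 0) {
                return last; // deadline reached without a final status
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
        }
    }
}
```

The caller still has to inspect the returned value, because a non-final or null result means the deadline expired before the transfer completed.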

Once the transfer has completed, the transfer outcomes can also be retrieved:

ArrayList<FileTransferOutcome> outcomes = library.getTransferOutcomes(transferId, FileTransferOutcome.class);

for (FileTransferOutcome outcome : outcomes) {
	System.out.println("file: " + outcome.getDest() + "; " + (outcome.isSuccess() ? "SUCCESS" : "FAILURE"));
}

Transfer Options

The Agent Library methods take as parameters a number of transfer options, which are modeled in the TransferOptions class. In detail, the following options can be customized:

TransferOptions options = new TransferOptions();
 
options.setOverwriteFile(true); // sets whether the file has to be overwritten on the destination
 
options.setType(StorageType.LocalGHN); //set the type of the Storage (LocalGHN, DataStorage, StorageManager)
 
options.setUnzipFile(false); // tells the service to unzip the file after the transfer
 
options.setCovertFile(true); // tells the service to perform a File Conversion after the transfer 
 
options.setConversionType(ConversionType.GEOTIFF); // the type of conversion to perform
 
options.setDeleteOriginalFile(true); // tells the service to remove the original file after conversion or unzip.
 
options.setTransferTimeout(2, TimeUnit.SECONDS); //the timeout for each transfer object ( DEFAULT= 60 mins)

Transfer Status

When using the asynchronous transfer facilities, each submitted transfer can be monitored using the monitorTransfer method. The output of the operation is a TransferStatus enum value (defined in the Data Transfer Library). The enum values are the following:

  • QUEUED : the transfer is queued;
  • STARTED : the transfer is started;
  • DONE : the transfer has been successfully completed;
  • DONE_WITH_ERRORS : the transfer has been completed, but there have been some transfer errors;
  • CANCEL : the transfer has been canceled;
  • FAILED : the transfer failed to start;

The enum also exposes some methods to check the value easily:

  • hasCompleted(): returns true if the value is different from STARTED and QUEUED
  • hasErrors(): returns true if the value is equal to FAILED or DONE_WITH_ERRORS
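
The semantics listed above can be restated as a small stand-alone enum. This is an illustrative sketch written from the description on this page, under the hypothetical name TransferStatusSketch; it is not the source of the TransferStatus enum shipped with the Data Transfer Library.

```java
// Illustrative restatement of the documented semantics, not the library source.
enum TransferStatusSketch {
    QUEUED, STARTED, DONE, DONE_WITH_ERRORS, CANCEL, FAILED;

    // True for every value other than QUEUED and STARTED.
    boolean hasCompleted() {
        return this != QUEUED && this != STARTED;
    }

    // True only for FAILED and DONE_WITH_ERRORS.
    boolean hasErrors() {
        return this == FAILED || this == DONE_WITH_ERRORS;
    }
}
```

Under this reading, CANCEL and FAILED both count as completed, so a monitoring loop based on hasCompleted() also stops for transfers that never succeeded; hasErrors() can then be used to tell the cases apart.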

Transfer Monitor

The monitorTransferWithProgress operation can be used to retrieve a report about the transfer activity. The following information is provided:

  • transferId : the transfer ID assigned by the service
  • transferStatus : see above
  • totalTransfers : the total number of transfers to perform
  • transfersCompleted : the number of completed transfers
  • totalBytes : the total size of the transfers in bytes
  • bytesTransfered : the number of bytes transferred so far
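
A report of this kind is often turned into a completion percentage. The sketch below shows one way to do that from the two byte counters; the class and method names are hypothetical, and treating a non-positive total as "not yet known" is an assumption rather than documented service behaviour.

```java
// Hypothetical helper: derives a completion percentage from the byte counters.
final class ProgressSketch {

    // Percentage of bytes transferred, clamped to [0, 100].
    // Returns 0 when the total size is not (yet) known.
    static double percent(long bytesTransferred, long totalBytes) {
        if (totalBytes <= 0) {
            return 0.0;
        }
        double p = 100.0 * bytesTransferred / totalBytes;
        return Math.max(0.0, Math.min(100.0, p));
    }
}
```

The inputs would typically come from getBytesTransferred() and getTotalBytes() on the report message returned by monitorTransferWithProgress.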