Difference between revisions of "WorkflowGridAdaptor"

From Gcube Wiki

Revision as of 18:08, 29 January 2010

Overview

This adaptor constructs an ExecutionPlan that submits a job, described by a JDL file, through a gLite Grid UI node. After submission the job's status is monitored and, once the job completes, its output files are retrieved and stored in the StorageSystem. All provided resources that need to be moved to the Grid UI are transferred through the StorageSystem: they are stored when the plan is constructed and retrieved when the execution starts. The one exception is the provided user proxy, which is transferred as an attachment directly to the remote node, so that it can be sent securely if the SSL communication option is enabled.

Plan Template

The entire execution process takes place on the gLite Grid UI node. This node is picked from the InformationSystem and is currently chosen at random from all the available ones. Once the node has been picked, the execution currently cannot be moved to a different one, even if there is a problem communicating with that node. The execution is a series of steps performed sequentially:

  • Contact the remote node
  • Retrieve the data stored in the StorageSystem, namely the resources marked as Configuration, Input Data, and JDL description
  • Submit the job using the provided JDL file, any additional configuration that was optionally provided, and the provided user proxy certificate
  • Go into a loop until either the job is completed or a timeout has expired (if a timeout has been set)
    • Wait for a defined period
    • Retrieve the job status
    • Retrieve the job logging info
    • Process the results of the above two steps
  • Check the reason the loop ended
  • If a timeout happened, cancel the job
  • Otherwise retrieve the output files of the job
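
The steps above can be sketched in Python as follows. This is only an illustration of the control flow, not the adaptor's actual code: the function names, the `LoopExit` type, the `"DONE"` status string, and the callables standing in for the remote gLite operations are all assumptions made for the example.

```python
import random
import time
from enum import Enum
from typing import Callable, Optional, Sequence


class LoopExit(Enum):
    """Reason the monitoring loop ended (hypothetical type)."""
    COMPLETED = "completed"
    TIMEOUT = "timeout"


def pick_ui_node(nodes: Sequence[str],
                 rng: Optional[random.Random] = None) -> str:
    """Choose one UI node at random; there is no failover afterwards."""
    if not nodes:
        raise ValueError("no gLite Grid UI node available")
    chooser = rng if rng is not None else random
    return chooser.choice(nodes)


def monitor_job(get_status: Callable[[], str],
                get_logging_info: Callable[[], str],
                poll_interval: float,
                timeout: Optional[float] = None,
                done_states: Sequence[str] = ("DONE",),  # assumed status name
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> LoopExit:
    """Poll until the job completes or the optional timeout expires."""
    start = clock()
    while True:
        if timeout is not None and clock() - start >= timeout:
            return LoopExit.TIMEOUT
        sleep(poll_interval)          # wait for a defined period
        status = get_status()         # retrieve the job status
        get_logging_info()            # retrieve the job logging info
        if status in done_states:     # process the results of the two calls
            return LoopExit.COMPLETED


def finish(exit_reason: LoopExit,
           cancel_job: Callable[[], None],
           retrieve_output: Callable[[], None]) -> str:
    """Check why the loop ended, then cancel or fetch the output files."""
    if exit_reason is LoopExit.TIMEOUT:
        cancel_job()
        return "cancelled"
    retrieve_output()
    return "retrieved"
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. In this sketch a job that finishes in a failed state would have to be listed in `done_states` to end the loop, and its output would still be fetched, which mirrors the "Results retrieval" limitation listed below.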

Known limitations

Some of the known limitations of the currently created plan are listed below. These limitations stem mainly from the simplicity of the plan and not from a lack of constructs to cover them. This list will be updated in later versions of the adaptor that will enrich the produced plan.

  • gLite Grid UI node selection
    Have a more elaborate selection strategy for the node submitting the jobs
  • Relocation of execution
    Once the UI node has been picked, the execution cannot be moved. This means that if the node becomes unreachable after it has been picked, the whole workflow is lost. Allow relocation, cancellation of the previous submission and resubmission, continued monitoring of a previously submitted job, etc.
  • Error Handling
    Currently all errors are fatal. Be more resilient when errors are non-critical
  • Results retrieval
    Only retrieve output files if completion is successful
  • Add SSL option in communication
  • Allow multiple JDLs and collection-style submission
  • Clean up the files stored in the StorageSystem as intermediate steps after completion
  • Allow user cancellation