WorkflowJDLAdaptor

From Gcube Wiki
 
=Overview=
 
This adaptor, as part of the adaptors offered by the [[WorkflowEngine]], constructs an [[ExecutionPlan]] based on the description of a job defined in JDL syntax. The description can cover a single job or a Directed Acyclic Graph (DAG) of jobs. The JDL description is parsed, and the adaptor then processes the retrieved parsed info to create the [[ExecutionPlan]]. During the parsing procedure, the jobs' Input and Output Sandboxes are examined to determine the overall input and output set of the workflow. The input set is covered by the submitting client through the attached resources they provide. The output resources of the workflow are constructed from the elements found in all the jobs' Output Sandboxes.
  
=JDL attributes=
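For illustration, a minimal single-job description of the kind the adaptor consumes might look like the following; all attribute values (executable and file names) are made up for the example:

<pre>
[
  Type = "Job";
  JobType = "Normal";
  Executable = "process.sh";
  Arguments = "input.txt output.txt";
  InputSandbox = {"process.sh", "input.txt"};
  OutputSandbox = {"output.txt", "stderr.log"};
]
</pre>

The <tt>InputSandbox</tt> entries are matched against the resources attached by the submitting client, while the <tt>OutputSandbox</tt> entries contribute to the overall output set of the workflow.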
  
=Highlights=
 
==Parallelization factor==
Depending on the configuration, the adaptor will create the [[ExecutionPlan]] that orchestrates the execution of a DAG of jobs either as a series of [[SequencePlanElement]] and [[FlowPlanElement]] elements or as a single [[BagPlanElement]]. The first case allows for a well defined series of operations. However, producing such a series of constructs is an exercise in graph topological sorting, a problem that can have multiple solutions, and depending on the nature of the original graph the chosen solution might restrict the parallelization factor of the overall DAG; for complex graphs this can damage the parallelization capabilities of the overall plan. The second case is much more dynamic: it allows the decision of which nodes to execute to be made at execution time. This comes as a tradeoff of increased runtime complexity with respect to the well defined plan, but it can provide optimal parallelization capabilities.
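As a sketch of the first strategy, the following made-up example groups a DAG's nodes into topological levels: nodes within one level have no dependencies on each other and could run in parallel (in the spirit of a [[FlowPlanElement]]), while the levels themselves run in sequence (in the spirit of a [[SequencePlanElement]]). This is not the adaptor's actual code; it only illustrates the scheduling idea:

```python
from collections import defaultdict

def parallel_levels(deps):
    """Group DAG nodes into sequential levels of parallelizable work.

    `deps` maps a node to the list of nodes it depends on.
    Returns a list of levels; every node's prerequisites appear
    in strictly earlier levels (Kahn-style layering).
    """
    indegree = defaultdict(int)
    children = defaultdict(list)
    nodes = set(deps)
    for node, prereqs in deps.items():
        nodes.update(prereqs)
        for p in prereqs:
            indegree[node] += 1
            children[p].append(node)
    # Start with the nodes that have no prerequisites.
    level = [n for n in sorted(nodes) if indegree[n] == 0]
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for n in level:
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    nxt.append(c)
        level = sorted(nxt)
    return levels

# A diamond DAG: B and C depend on A; D depends on both B and C.
dag = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(parallel_levels(dag))  # [['A'], ['B', 'C'], ['D']]
```

Note that a node in level ''n'' waits for the whole of level ''n-1'' even if its own prerequisites finished earlier; a [[BagPlanElement]]-style scheduler avoids this barrier by releasing each node as soon as its individual prerequisites complete.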
==Staging==
Staging of input for the executables constituting the execution units of the jobs is performed at the level of the Input Sandbox defined for each job. The resources that are attached to the adaptor are stored in the [[StorageSystem]] and are retrieved on the node that hosts the Input Sandbox that declares them. The files declared in the Output Sandbox of a job are stored in the [[StorageSystem]], and information on the way to retrieve the output is provided through the Output Resources defined by the adaptor, which are valid after the completion of the execution.
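The staging flow can be sketched as follows; the in-memory store, the key scheme and the function names are all hypothetical stand-ins for the actual [[StorageSystem]] interface, used only to show the stage-in / stage-out round trip:

```python
class InMemoryStore:
    """Hypothetical stand-in for the Storage System (not the real API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def stage_in(store, job_id, input_sandbox):
    """Retrieve each Input Sandbox file on the node that declared it."""
    return {name: store.get(f"{job_id}/in/{name}") for name in input_sandbox}


def stage_out(store, job_id, output_sandbox, produced):
    """Store each Output Sandbox file and return the keys through
    which the output can be retrieved after the execution completes."""
    keys = []
    for name in output_sandbox:
        key = f"{job_id}/out/{name}"
        store.put(key, produced[name])
        keys.append(key)
    return keys


store = InMemoryStore()
store.put("job1/in/input.txt", b"payload")          # client attaches a resource
inputs = stage_in(store, "job1", ["input.txt"])     # node pulls its Input Sandbox
keys = stage_out(store, "job1", ["result.txt"], {"result.txt": b"done"})
print(inputs, keys)
```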
  
 

''Revision as of 16:44, 30 January 2010''


=Known limitations=

* Supported jobs are only those of type Normal.
* The case of node collocation is not handled correctly because multiple [[BoundaryPlanElement]] instances are created. The node used is still a single one, but it is contacted multiple times and data locality is not exploited correctly.
* The arguments defined for an executable in the respective JDL attribute, when passed to the [[ShellPlanElement]], are split using the space character (' ') as a delimiter. This way, no phrase containing spaces can be passed as a single argument.
* The Retry and Shallow Retry attributes of the JDL are treated equally, and they are used at the level of the [[ShellPlanElement]] and not at the level of the [[BoundaryPlanElement]].
* After the execution completes, no cleanup in the [[StorageSystem]] is done.
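The argument-splitting limitation can be demonstrated with plain Python string splitting; the adaptor's behaviour is analogous, and the command line below is invented for the example:

```python
import shlex

# A JDL-style Arguments string containing a quoted, space-containing phrase.
arguments = 'convert "my file.txt" out.txt'

# Naive split on the space character, as the limitation describes:
# the quoted phrase is torn apart into two tokens.
print(arguments.split(' '))   # ['convert', '"my', 'file.txt"', 'out.txt']

# A quote-aware split would keep the phrase as a single argument.
print(shlex.split(arguments)) # ['convert', 'my file.txt', 'out.txt']
```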