Execution Engine Specification

From Gcube Wiki

This is part of the [[Facilities Specification Template]].
 
 
 
== Overview ==

The Execution Engine aims to execute arbitrarily complex Execution Plans. Execution Plans are plans for the invocation of code components (aka invocables, i.e. services, binary executables, scripts, …) that ensure that prerequisite data are prepared and delivered to their consumers by defining the flow of data and/or control. The initial Execution Plans provided for execution by the Execution Engine originate from a [[Workflow Engine Specification | WorkflowEngine]] instance. In addition, since the Execution Engine supports distributed execution, it can forward subplans of its initial Execution Plan to other Execution Engine instances. In this way, one can execute any kind of workflow on top of a distributed computational infrastructure.
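The notion of an Execution Plan as a tree of elements that defines both control flow and data flow can be illustrated with a minimal Python sketch. The class names `Sequence` and `Invocation` are hypothetical and stand in for the engine's actual plan elements:

```python
class Invocation:
    """Leaf element: invokes a single code component (service, script, ...)."""
    def __init__(self, name, func):
        self.name, self.func = name, func

    def execute(self, data):
        return self.func(data)


class Sequence:
    """Flow element: runs its children in order, piping each output onward."""
    def __init__(self, *children):
        self.children = children

    def execute(self, data):
        for child in self.children:
            data = child.execute(data)
        return data


# A trivial plan: fetch -> sort -> take first, with data flowing between steps.
plan = Sequence(
    Invocation("fetch", lambda _: [3, 1, 2]),
    Invocation("sort", sorted),
    Invocation("take_first", lambda xs: xs[0]),
)
result = plan.execute(None)  # -> 1
```

In the real engine the leaves would be remote services or executables rather than local callables, and subtrees of such a plan could be shipped to other engine instances for distributed execution.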
  
When the Workflow Engine is acting in the context of the gCube platform, it provides a gCube compliant Web Service interface. This Web Service acts not only as the front end to the Execution facilities, but also as the "face" of the component with respect to the gCube platform. The Running Instance profile of the service is the placeholder where the Execution Environment Providers of the underlying Execution Engine instance push information that needs to be made available to other engine instances. Configuration properties that are used throughout the Workflow Engine instance are retrieved by the appropriate technology-specific constructs and are used to initiate services and providers once the service starts.
  
 
=== Key features ===
  
;Orchestration for external computational and storage infrastructures.
 
:PE2ng allows users and systems to exploit computational resources that reside on multiple eInfrastructures in a single complex process workflow.
 
;Native computational infrastructure provider and manager.
 
:PE2ng can be used to exploit the computational capacity of the D4Science Infrastructure by executing tasks directly on the latter.
 
 
;Control and monitoring of a processing flow execution.

:The Execution Engine provides progress reporting and control constructs based on an event mechanism for tasks operating both on the D4Science and external infrastructures.

;Handling of data staging among different storage providers.

:All PE2ng Adaptors handle data staging in a transparent way, without requiring any kind of external intervention.
 
;Handling of data streaming among computational elements.

:PE2ng exploits the high-throughput, point-to-point, on-demand communication facilities offered by [[Result Set components|gRS2]].
 
;Expressive and powerful execution plan language

:The execution plan elements comprising the language can execute literally anything. In addition, the Execution Engine is technology-unaware regarding the components it can invoke, handling in the same uniform manner executables such as SOAP Web Services & WSRF, HTTP APIs (RESTful Web Services), various executables (including shell scripts), Java Objects etc.

;Multiple ways of invoking executables

:In-process: ultra-high performance, no security boundary crossing, low need for data exchanges
:Inter-process: high throughput and performance, local security boundaries crossed
:Inter-node: low throughput (depending on network), organisational security boundaries crossed

;Advanced error handling support through contingency reaction

:Each Execution Plan element which invokes executables can be annotated with contingency reaction triggers.
 
;Unbound extensibility via providers for integration with different environments.

:The system is designed in an extensible manner, allowing the transparent integration with a variety of providers for storage, resource registries, reporting systems, etc.

;Alignment with cloud computing principles

:Application as a Service (AaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)

;User Interface

:PE2ng offers a User Interface to launch and monitor commonly used workflows and aims to provide a fully fledged Graphical User Interface for the graphical composition of arbitrary workflows.
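The technology-unaware invocation claimed above can be illustrated by hiding different executable technologies behind one interface. The sketch below is an assumption for illustration, with hypothetical class names rather than the engine's real API; it wraps an in-process callable (the "Java Object" case) and an external executable behind the same `invoke` method:

```python
import subprocess


class CallableInvocable:
    """Wraps an in-process object or function (e.g. the 'Java Object' case)."""
    def __init__(self, func):
        self.func = func

    def invoke(self, *args):
        return self.func(*args)


class ShellInvocable:
    """Wraps an external executable, such as a shell command or script."""
    def __init__(self, command):
        self.command = command

    def invoke(self, *args):
        completed = subprocess.run(self.command + list(args),
                                   capture_output=True, text=True, check=True)
        return completed.stdout.strip()


# The engine sees only .invoke(), whatever the underlying technology is.
invocables = [CallableInvocable(str.upper), ShellInvocable(["echo"])]
results = [inv.invoke("hello") for inv in invocables]  # ['HELLO', 'hello']
```

A Web Service or WSRF invocable would follow the same pattern, differing only in how `invoke` dispatches the call.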
  
 
== Design ==
  
 
=== Philosophy ===

The Execution Engine is designed to support an expressive, feature-rich workflow language. It aims to enable the execution of arbitrarily complex workflows of literally all kinds by offering a wide array of constructs, namely Execution Plan Elements, which can be used to invoke any kind of executable or to group collections of elements in execution flow structures. The uniform handling of such constructs by the Execution Engine allows the construction of such arbitrarily complex workflows.

As a constituent part of PE2ng, the Execution Engine is designed with a layered architecture that decouples the business domain, the infrastructure-specific logic and the core execution functionality, thereby allowing the core to be reused across a multitude of use cases and avoiding the sub-optimal compromises of strictly agnostic solutions.
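One way to picture this decoupling is dependency injection of environment providers into an agnostic core. The following sketch is illustrative only; `StorageProvider`, `InMemoryProvider` and `Core` are hypothetical names, not the engine's actual SPI:

```python
class StorageProvider:
    """Abstract layer the core codes against; infrastructure-agnostic."""
    def store(self, key, value):
        raise NotImplementedError


class InMemoryProvider(StorageProvider):
    """One concrete environment; a gCube or filesystem provider plugs in alike."""
    def __init__(self):
        self.data = {}

    def store(self, key, value):
        self.data[key] = value


class Core:
    """Core execution logic, reusable unchanged across environments."""
    def __init__(self, storage):
        self.storage = storage  # injected, so the core never names an infrastructure

    def run_step(self, key, func, arg):
        self.storage.store(key, func(arg))


core = Core(InMemoryProvider())
core.run_step("out", len, "abcd")  # core.storage.data == {"out": 4}
```

Swapping `InMemoryProvider` for an infrastructure-specific provider changes the environment without touching `Core`, which is the reuse the layered design is after.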
  
 
=== Architecture ===

The Execution Engine comprises a single component, whose internal architecture corresponds to the constructs it provides. This grouping can be summarized as follows:

*Execution Elements
*Data Types
*Events
*Contingencies
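To give a feel for how these construct groups relate, the sketch below (all names hypothetical, mirroring the grouping above) shows an execution element that emits events while it runs and consults a contingency on failure:

```python
class Event:
    """'Events' group: progress and state notifications from elements."""
    def __init__(self, source, kind):
        self.source, self.kind = source, kind


class RetryContingency:
    """'Contingencies' group: a reaction trigger annotating an element."""
    def __init__(self, attempts):
        self.attempts = attempts


class Element:
    """'Execution Elements' group: a plan node invoking an executable.
    The 'Data Types' group would type the payloads flowing through it."""
    def __init__(self, name, func, contingency=None):
        self.name, self.func, self.contingency = name, func, contingency
        self.events = []

    def execute(self, data):
        tries = 1 + (self.contingency.attempts if self.contingency else 0)
        for attempt in range(tries):
            self.events.append(Event(self.name, "started"))
            try:
                result = self.func(data)
            except Exception:
                self.events.append(Event(self.name, "failed"))
                continue
            self.events.append(Event(self.name, "completed"))
            return result
        raise RuntimeError(f"{self.name}: all attempts failed")


# A step that fails once, then succeeds thanks to its retry contingency.
calls = {"n": 0}
def flaky(data):
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("transient failure")
    return data * 2

elem = Element("flaky-step", flaky, RetryContingency(attempts=1))
out = elem.execute(21)  # -> 42
```

The event stream (`started`, `failed`, `started`, `completed`) is what a monitoring facility would subscribe to for progress reporting.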
  
 
== Deployment ==
 
  
The Execution Engine, in its service wrapped version, should be deployed at:

*Each node which should participate in the execution of Execution Plans of local or remote origin.
*Each node which is aimed to act as a gateway to external infrastructures.

=== Large deployment ===

In case of high demand for computational power, the Execution Engine should be deployed on as many nodes as possible, so that the Workflow Engine instances which contact it are able to contact a large number of nodes and distribute the computational load evenly across the infrastructure.

[[File:ExecutionEngine_LargeDeployment.png|800px|center|Execution Engine large deployment]]
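The even spreading of load could, for instance, be achieved with a simple round-robin assignment of subplans to engine nodes. This sketch is an assumption for illustration, not the Workflow Engine's actual scheduling policy, and the node names are hypothetical:

```python
from itertools import cycle


def distribute(subplans, nodes):
    """Assign subplans to engine nodes round-robin for an even load."""
    assignment = {node: [] for node in nodes}
    for subplan, node in zip(subplans, cycle(nodes)):
        assignment[node].append(subplan)
    return assignment


nodes = ["engine-a", "engine-b", "engine-c"]       # hypothetical engine nodes
subplans = [f"subplan-{i}" for i in range(7)]
load = distribute(subplans, nodes)
# Per-node counts differ by at most one: one node gets 3 subplans, the rest 2.
```

The more engine nodes are deployed, the smaller each node's share of the overall plan becomes, which is exactly why this scenario calls for deploying on as many nodes as possible.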
 
=== Small deployment ===

If the processing requirements in the infrastructure are low and/or there is no need to contact external infrastructures, the Execution Engine can be deployed only at the node which also hosts the Workflow Engine and acts as an entry point for incoming workflow processing requests. This means that execution will take place only at that node, locally. In this minimal deployment scenario, one need only deploy the Execution Engine as a library.

[[File:ExecutionEngine_SmallDeployment.png|800px|center|Execution Engine small deployment]]
  
 
== Use Cases ==
 
  
 
=== Well suited Use Cases ===

The Execution Engine has been successfully used for the execution of all workflows involved in the [[Workflow_Engine_Specification#Use_Cases | use cases]] of the Workflow Engine, as the enabling element of the latter.
  
 
=== Less well suited Use Cases ===

As the Execution Engine aims to provide a generic facility for executing workflows, it cannot know the semantics of its input and output data. Applications which need this kind of data comprehension should instead opt for implementing special adaptors for the Workflow Engine.

Revision as of 22:42, 30 April 2012
