Difference between revisions of "Workflow Management Facilities"


Revision as of 17:14, 30 April 2012

Overview

The gCube system exposes its Workflow Management capabilities through its PE2ng component, a process execution engine. PE2ng manages the execution of software elements in a distributed infrastructure under the coordination of a composite plan that defines the data dependencies among its actors. It provides a flow-oriented processing model that supports several computational middleware platforms without compromising performance. A task can thus be designed as a workflow of invocations of code components (services, binary executables, scripts, map-reduce jobs, etc.), with the engine controlling the flow of data so that prerequisite data are prepared and delivered to their consumers.

PE2ng aims to bring together and integrate computing paradigms, execution patterns and infrastructures at the level of execution. It bridges infrastructures without hiding them, so that a task can target the infrastructure that best fits its current needs, turning an infrastructure into a utility. The result is an unrestrictive meta-infrastructure with a single point for submission, monitoring and access, offering a single language for "Programming in the Large" and "Programming in the Small".
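
The idea of a plan that orders its actors by their data dependencies can be illustrated with a minimal sketch. This is not PE2ng's execution plan language or API; the function `run_plan`, the step names and the data are hypothetical, and the sketch only assumes Python 3.9+ for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_plan(steps, dependencies):
    """Run each step only after the steps it depends on have produced output.

    steps: maps a step name to a function taking a dict of upstream outputs.
    dependencies: maps a step name to the set of step names it consumes.
    """
    results = {}
    # static_order() yields every step after all of its prerequisites.
    for name in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](inputs)
    return results


# Hypothetical three-step plan: fetch -> transform -> publish.
steps = {
    "fetch": lambda inputs: [3, 1, 2],
    "transform": lambda inputs: sorted(inputs["fetch"]),
    "publish": lambda inputs: f"published {len(inputs['transform'])} records",
}
dependencies = {
    "fetch": set(),
    "transform": {"fetch"},
    "publish": {"transform"},
}

results = run_plan(steps, dependencies)
print(results["publish"])
```

In this sketch the steps run sequentially in one process; a real engine would additionally dispatch each step to the infrastructure chosen for it and stage or stream the data between them.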

Key Features

Orchestration for external computational and storage infrastructures.
PE2ng allows users and systems to exploit computational resources that reside on multiple eInfrastructures in a single complex process workflow.
Native computational infrastructure provider and manager.
PE2ng can be used to exploit the computational capacity of the D4Science Infrastructure by executing tasks directly on it.
Control and monitoring of a processing flow execution.
PE2ng provides progress reporting and control constructs based on an event mechanism for tasks operating both on the D4Science and external infrastructures.
Handling of data staging among different storage providers.
All PE2ng Adaptors handle data staging in a transparent way, without requiring any kind of external intervention.
Handling of data streaming among computational elements.
PE2ng exploits the high-throughput, point-to-point, on-demand communication facilities offered by gRS2.
Expressive and powerful execution plan language
The execution plan elements comprising the language can invoke any kind of code component, such as services, binary executables, scripts and map-reduce jobs.
Unbound extensibility via providers for integration with different environments.
The system is designed in an extensible manner, allowing the transparent integration with a variety of providers for storage, resource registries, reporting systems, etc.
Alignment with cloud computing principles
Application as a Service (AaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
User Interface
PE2ng offers a User Interface to launch and monitor commonly used workflows, and aims to provide a fully fledged Graphical User Interface for composing arbitrary workflows.
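
The event-based progress reporting mentioned under "Control and monitoring of a processing flow execution" can be pictured with a small publish/subscribe sketch. It is an illustration only: `EventBus`, `run_task`, the event names and the file names are hypothetical and do not reflect the actual PE2ng event mechanism.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe hub (hypothetical, not a PE2ng class)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        # Deliver the payload to every handler registered for this event type.
        for handler in self._handlers[event_type]:
            handler(payload)


def run_task(bus, task_name, work_items):
    """Process items one by one, emitting a progress event after each."""
    total = len(work_items)
    for done, _item in enumerate(work_items, start=1):
        # The real work on _item would happen here.
        bus.emit("progress", {"task": task_name, "done": done, "total": total})
    bus.emit("completed", {"task": task_name})


bus = EventBus()
log = []
bus.subscribe("progress", lambda e: log.append(f"{e['task']}: {e['done']}/{e['total']}"))
bus.subscribe("completed", lambda e: log.append(f"{e['task']}: completed"))

run_task(bus, "staging", ["a.dat", "b.dat"])
print("\n".join(log))
```

Decoupling the task from its observers in this way lets the same monitoring code follow tasks regardless of where they run, which is the property the feature description emphasises.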

Subsystems

gCube Workflow Management Facilities are provided by PE2ng, which itself comprises the following two interrelated subsystems:

Workflow Engine Specification

Execution Engine Specification