Workflow Management Facilities


Overview

The gCube system exposes its Workflow Management capabilities through its PE2ng component, a process execution engine. PE2ng manages the execution of software elements in a distributed infrastructure under the coordination of a composite plan that defines the data dependencies among its actors. It provides a powerful, flow-oriented processing model that supports several computational middleware systems without performance compromises. Thus, a task can be designed as a workflow of invocations of code components (services, binary executables, scripts, map-reduce jobs, etc.), with the engine controlling the flow of data so that prerequisite data are prepared and delivered to their consumers.
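The idea of a plan that orders invocations by their data dependencies can be illustrated with a minimal sketch. This is not the PE2ng plan language or API; the function `run_plan`, the step names and the dictionary layout are all hypothetical, chosen only to show how a step runs once the outputs it consumes are available.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_plan(steps, dependencies):
    """Run each step after the steps it depends on, handing over their outputs.

    steps: step name -> callable taking a dict of upstream outputs
    dependencies: step name -> set of step names whose outputs it consumes
    (Both structures are illustrative assumptions, not PE2ng constructs.)
    """
    outputs = {}
    # static_order() yields every step after all of its prerequisites.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        inputs = {dep: outputs[dep] for dep in dependencies.get(name, ())}
        outputs[name] = steps[name](inputs)
    return order, outputs


# Hypothetical three-step workflow: fetch -> transform -> report
steps = {
    "fetch": lambda inputs: [3, 1, 2],
    "transform": lambda inputs: sorted(inputs["fetch"]),
    "report": lambda inputs: "sorted: " + str(inputs["transform"]),
}
dependencies = {
    "fetch": set(),
    "transform": {"fetch"},
    "report": {"transform"},
}

order, outputs = run_plan(steps, dependencies)
```

In a real engine the steps would be services, executables or jobs running on different infrastructures, and the hand-over of `inputs` would involve data staging or streaming rather than an in-memory dictionary.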

PE2ng aims to bring together and integrate, at the level of execution, computing paradigms, execution patterns and infrastructures. It bridges infrastructures without hiding them, so that a task can target the infrastructure that best fits its current needs, turning an infrastructure into a utility. Overall, the result is an unrestrictive meta-infrastructure with a single point for submitting, monitoring and accessing executions, offering a single language for "Programming in the Large" and "Programming in the Small".

The following figure summarizes the architecture of PE2ng at the subsystem level:

PE2ng Architecture

Key Features

Orchestration for external computational and storage infrastructures.
PE2ng allows users and systems to exploit computational resources that reside on multiple eInfrastructures in a single complex process workflow.
Native computational infrastructure provider and manager.
PE2ng can be used to exploit the computational capacity of the D4Science Infrastructure by executing tasks directly on the latter.
Control and monitoring of a processing flow execution.
PE2ng provides progress reporting and control constructs based on an event mechanism for tasks operating both on the D4Science and external infrastructures.
Handling of data staging among different storage providers.
All PE2ng Adaptors handle data staging in a transparent way, without requiring any kind of external intervention.
Handling of data streaming among computational elements.
PE2ng exploits the high-throughput, point-to-point, on-demand communication facilities offered by gRS2.
Expressive and powerful execution plan language.
The execution plan elements comprising the language can execute literally anything.
Unbound extensibility via providers for integration with different environments.
The system is designed in an extensible manner, allowing the transparent integration with a variety of providers for storage, resource registries, reporting systems, etc.
Alignment with cloud computing principles.
Application as a Service (AaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS).
User Interface.
PE2ng offers a User Interface to launch and monitor commonly used workflows and aims to provide a fully fledged Graphical User Interface for the graphical composition of arbitrary workflows.
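
The event-based progress reporting mentioned among the key features can be pictured with a small sketch. It does not reproduce PE2ng's actual event mechanism; `ProgressEvent`, `EventBus` and `run_task` are hypothetical names used only to show a task publishing progress events that a monitoring client subscribes to.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProgressEvent:
    """Hypothetical progress event; not a PE2ng type."""
    task: str
    completed: int
    total: int


class EventBus:
    """Minimal publish/subscribe hub (illustrative assumption)."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[ProgressEvent], None]] = []

    def subscribe(self, listener: Callable[[ProgressEvent], None]) -> None:
        self._listeners.append(listener)

    def publish(self, event: ProgressEvent) -> None:
        # Deliver the event to every subscriber in subscription order.
        for listener in self._listeners:
            listener(event)


def run_task(name: str, total_steps: int, bus: EventBus) -> None:
    """Pretend to do work, publishing one event per finished step."""
    for done in range(1, total_steps + 1):
        # Real work would happen here.
        bus.publish(ProgressEvent(name, done, total_steps))


bus = EventBus()
seen: List[str] = []
bus.subscribe(lambda e: seen.append(f"{e.task}: {e.completed}/{e.total}"))
run_task("demo-task", 3, bus)
```

Because the task only publishes events and never calls the monitor directly, the same reporting path could serve tasks running on different infrastructures, which is the property the feature list describes.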

Subsystems

gCube Workflow Management Facilities are provided by PE2ng, which itself comprises the following two interrelated subsystems:

Workflow Engine Specification

Execution Engine Specification

In order to better support workflow execution in the gCube infrastructure, the components of this subsystem exploit the facilities offered by a separate family of components:

Resource Registry Specification