Difference between revisions of "Tabular Data Flow Manager"

From Gcube Wiki
m (Philosophy)
(Well suited Use Cases)
 
(17 intermediate revisions by 2 users not shown)

Latest revision as of 13:45, 21 November 2013

The goal of this facility is to realize an integrated environment supporting the definition and management of workflows of tabular data. Each workflow consists of a number of tabular data processing steps, where each step is realized by an existing service component conceptually offered by a gCube-based infrastructure.

The following sections describe the design rationale, key features, high-level architecture, and deployment scenarios.

Overview

The goal of this service is to offer facilities for tabular data workflow management, execution and monitoring. A workflow can involve a number of data manipulation steps, each potentially performed by a different service component, to produce the desired output.

Key features

The subsystem provides for:

declarative approach
Instead of describing the workflow as a sequence of transformation steps, the user provides a table template: a set of properties that the target table should comply with.
flexible and open workflow definition mechanism
The set of workflow steps can be extended, which widens what a template is able to express.
user-friendly interface
The subsystem offers a graphical user interface where users can define table templates. The environment also allows users to run a workflow by applying a template to an imported table.
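
The declarative approach can be illustrated with a minimal sketch. It is not the gCube API: `ColumnSpec`, `TableTemplate` and `violations` are hypothetical names, and the column properties (name, type, whether empty values are allowed) are assumptions chosen for illustration. The template only states what the target table should look like; it says nothing about the steps needed to get there.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical types for illustration only; not part of any gCube library.
@dataclass(frozen=True)
class ColumnSpec:
    name: str
    dtype: type
    required: bool = True  # when True, empty (None) values are rejected


@dataclass(frozen=True)
class TableTemplate:
    columns: tuple[ColumnSpec, ...]


def violations(template: TableTemplate, rows: list[dict]) -> list[str]:
    """List the ways in which `rows` fails to comply with `template`."""
    problems = []
    for i, row in enumerate(rows):
        for col in template.columns:
            if col.name not in row:
                problems.append(f"row {i}: missing column '{col.name}'")
                continue
            value = row[col.name]
            if value is None:
                if col.required:
                    problems.append(f"row {i}: '{col.name}' is empty")
            elif not isinstance(value, col.dtype):
                problems.append(
                    f"row {i}: '{col.name}' is not {col.dtype.__name__}"
                )
    return problems


# The template describes the desired result, not how to reach it.
template = TableTemplate(columns=(
    ColumnSpec("species", str),
    ColumnSpec("year", int),
    ColumnSpec("catch_tonnes", float, required=False),
))

rows = [
    {"species": "Thunnus albacares", "year": 2012, "catch_tonnes": 10.5},
    {"species": "Gadus morhua", "year": "2013", "catch_tonnes": None},
]

for problem in violations(template, rows):
    print(problem)
```

In this sketch the second row is reported because its `year` is a string, while the empty `catch_tonnes` is accepted because that column is marked as not required.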

Design

Philosophy

Tabular Data Flow Manager offers a service for tabular data workflow creation, management and monitoring. The underlying idea is to let the service client request multiple operations at once by providing a table template. A table template is defined as a set of properties that the table resulting from the workflow should comply with. End users can create table templates through the UI and save them for later reuse. Applying a template to a target table materializes a set of workflow steps on the service, which can be monitored remotely. Each step is managed by a single software component, which can also be invoked individually. This approach aims to maximize the exploitation and reuse of components that offer data manipulation facilities.
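
How applying a template might materialize into monitorable steps can be sketched as follows. Everything here is an assumption made for illustration: the operation names (`add_column`, `change_type`, `validate`), the `Step` and `Status` types, and the rule used to derive steps are hypothetical and do not come from the service itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Step:
    operation: str  # hypothetical operation name
    column: str
    status: Status = Status.PENDING


def unpack(template: dict[str, type], table_types: dict[str, type]) -> list[Step]:
    """Derive the steps needed to make a table comply with a template.

    `template` maps target column names to target types; `table_types`
    maps the table's current column names to their current types.
    """
    steps = []
    for column, target in template.items():
        if column not in table_types:
            steps.append(Step("add_column", column))
        elif table_types[column] is not target:
            steps.append(Step("change_type", column))
        steps.append(Step("validate", column))
    return steps


def run(steps: list[Step], handlers: dict) -> list[Step]:
    """Execute steps in order, recording a status that a client could poll."""
    for step in steps:
        try:
            handlers[step.operation](step.column)
            step.status = Status.DONE
        except Exception:
            step.status = Status.FAILED
            break  # steps after a failure stay PENDING
    return steps


template = {"species": str, "year": int}
table_types = {"species": str, "year": str}

log = []
handlers = {
    name: (lambda column, name=name: log.append((name, column)))
    for name in ("add_column", "change_type", "validate")
}

steps = run(unpack(template, table_types), handlers)
for step in steps:
    print(step.operation, step.column, step.status.value)
```

Keeping a status on each step is one simple way to support remote monitoring: a client only needs to read the step list to see how far the workflow has progressed.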

Architecture

The subsystem comprises the following components:

  • Flow Service: a subset of the Tabular Data Service functionality that allows workflow creation, management, execution and monitoring;
  • Flow UI: the user interface of this functional area. It provides users with a web-based interface for creating, executing and monitoring workflows;
  • Workflow Orchestrator: a service component that unpacks a table template into a sequence of operations to be performed on a target table;
  • Operation modules: a set of software modules, each one managing a specific operation (transformation, validation, import, export).
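
The relationship between operation modules and an orchestrated sequence can be sketched with a small registry. This is an assumption about one possible structure, not the actual design: `REGISTRY`, the `operation` decorator, `run_sequence` and the two example operations are hypothetical names. Only the operation kinds (transformation, validation, import, export) come from the list above.

```python
from __future__ import annotations

from typing import Callable

Table = list[dict]
Operation = Callable[[Table], Table]

# Hypothetical registry keyed by (kind, name).
REGISTRY: dict[tuple[str, str], Operation] = {}


def operation(kind: str, name: str):
    """Register a function as the module responsible for one operation."""
    def register(func: Operation) -> Operation:
        REGISTRY[(kind, name)] = func
        return func
    return register


@operation("transformation", "strip_whitespace")
def strip_whitespace(table: Table) -> Table:
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in table
    ]


@operation("validation", "no_empty_cells")
def no_empty_cells(table: Table) -> Table:
    for i, row in enumerate(table):
        for key, value in row.items():
            if value is None or value == "":
                raise ValueError(f"row {i}: '{key}' is empty")
    return table


def run_sequence(table: Table, sequence: list[tuple[str, str]]) -> Table:
    """Apply registered operations in the order given by the sequence."""
    for key in sequence:
        table = REGISTRY[key](table)
    return table


table = [{"species": "  Gadus morhua "}]

# Each module can be invoked on its own...
print(strip_whitespace(table))

# ...or as part of a sequence produced by an orchestrator.
print(run_sequence(table, [
    ("transformation", "strip_whitespace"),
    ("validation", "no_empty_cells"),
]))
```

Registering operations by kind and name keeps the set of steps open: a new module becomes available to sequences as soon as it is registered, without changes to the code that runs them.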

The following figure shows the relationships between these components:

Tabular Data Flow Manager, internal Architecture

Deployment

The Service should be deployed on a single node along with the operation modules. The User Interface can be deployed in the infrastructure portal along with the required client library.

Use Cases

Well suited Use Cases

This component is well suited to all cases where a defined flow of data manipulation steps must be managed. An example is a data flow in which a user curates a set of uncurated data provided periodically by a data provider, applies a set of default transformation and validation procedures, and merges all the curated data chunks into a single table at the end of the process.
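
The curation example can be sketched as a short pipeline. The column names, the default transformation (trim the species name, cast the year to an integer) and the default validation (the species name must not be empty) are assumptions chosen for illustration, and `curate_chunk` and `merge_chunks` are hypothetical helpers rather than service operations.

```python
from __future__ import annotations


def curate_chunk(chunk: list[dict]) -> tuple[list[dict], list[dict]]:
    """Apply the default transformation and validation to one chunk.

    Returns the curated rows and the rows that were rejected.
    """
    curated, rejected = [], []
    for row in chunk:
        try:
            # Assumed default transformation: trim text, cast year to int.
            clean = {
                "species": row["species"].strip(),
                "year": int(row["year"]),
            }
        except (KeyError, ValueError, TypeError, AttributeError):
            rejected.append(row)
            continue
        # Assumed default validation: the species name must not be empty.
        if clean["species"]:
            curated.append(clean)
        else:
            rejected.append(row)
    return curated, rejected


def merge_chunks(chunks: list[list[dict]]) -> tuple[list[dict], list[dict]]:
    """Curate every chunk and merge the curated rows into a single table."""
    merged, rejected = [], []
    for chunk in chunks:
        ok, bad = curate_chunk(chunk)
        merged.extend(ok)
        rejected.extend(bad)
    return merged, rejected


# Two chunks, as if delivered by a data provider at different times.
chunks = [
    [{"species": " Gadus morhua", "year": "2012"}],
    [
        {"species": "Thunnus albacares ", "year": 2013},
        {"species": "", "year": "n/a"},
    ],
]

merged, rejected = merge_chunks(chunks)
print(merged)
print(rejected)
```

Rejected rows are returned rather than dropped silently, so the user who curates the data can inspect them before the next delivery.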