Cube Manager

From Gcube Wiki
Revision as of 16:24, 28 November 2013 by Luigi.fortunati (Talk | contribs) (Cube Manager)

Overview

This page documents the concepts behind the Cube Manager component and reports information that can help developers understand how the project is organized.

Motivation and concepts

Tabular Data is a pluggable service that manages the lifecycle of statistical data. Operation modules that perform operations on data (such as import, export, transformation and validation) can be added in a modular fashion. Each operation module produces one or more tables as the result of its computation. A table in the Tabular Data system consists of data, managed by a data backend, and metadata, which records where the data can be found and enriches its description. The system offers a common model for representing table metadata, the Tabular Model, which describes a table's structure, location and additional metadata. Cube Manager is the low-level component that enables the Tabular Data service. Its main features are:

  • Creating new tables described using the Tabular Model
  • Cloning or modifying an existing table in terms of its structure or metadata
  • Keeping track of created tables and their associated metadata (table structure, additional metadata)
  • Providing metadata information to clients
  • Pooling connections to the relational database (data backend)

Prerequisites

Working with the Cube Manager code requires familiarity with the following technologies.

  • Maven: Cube Manager is a fairly complex project organized as a multi-module Maven project. Basic knowledge of Maven and of its multi-module feature is required.
  • Dependency injection and CDI/Weld: To achieve modularity and Inversion of Control, Cube Manager (like the whole Tabular Data stack) relies on CDI (Contexts and Dependency Injection) and on its reference implementation, Weld, both for standalone execution and for testing.
  • PostgreSQL and PostGIS: The data backend is a relational database (PostgreSQL) with the PostGIS extension. Basic knowledge of the administration and management of a PostgreSQL database instance is required. Cube Manager is known to work with PostgreSQL 9.2 and PostGIS 2.0.
  • JPA: Cube Manager keeps track of the metadata associated with each table. It does so by using JPA on a PostgreSQL backend, mapping tabular model beans directly into the database. The JPA implementation used in the project is EclipseLink.

Modules

Cube Manager is a multi-module project made of several subprojects:

  • Data Wrangler: module that handles table management on the data backend (PostgreSQL) and the mapping between the tabular model types and the database types.
  • Metadata Wrangler: module that handles the persistence of table metadata on a PostgreSQL backend with JPA.
  • Cube Manager API: the API that exposes the functionality offered by a cube manager. Clients can depend on it at compile time.
  • Cube Manager: default implementation of the Cube Manager API.
  • Cube Manager Parent: Maven parent project that wraps all of the modules above.

Database Wrangler

Database Wrangler is a single module specifically designed to work with a PostgreSQL database; there is no separate API module targeting a generic relational database. It provides methods for:

  • Creating or removing tables on the data backend
  • Adding or removing a column from a table on the data backend
  • Creating indexes or constraints on columns
  • Setting up a triggered procedure on a table

Database connections are provided by a DatabaseProvider implementation based on the connection pooling facilities of the Tomcat JDBC library. The coordinates of the data backend are retrieved automatically from the Information System through the [Database Resource] library.
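The operations listed above map naturally onto PostgreSQL DDL statements. The following is a minimal sketch of what such statements could look like, not the module's actual code: the class DdlSketch and all of its method names are hypothetical, and a real implementation would execute the statements through a pooled JDBC connection rather than return strings.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Database Wrangler code base.
final class DdlSketch {
    private DdlSketch() {}

    // Quotes an identifier the PostgreSQL way, doubling embedded double quotes.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // columns maps column name to SQL type; the type is trusted and not escaped.
    static String createTable(String table, Map<String, String> columns) {
        StringJoiner body = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            body.add(quote(column.getKey()) + " " + column.getValue());
        }
        return "CREATE TABLE " + quote(table) + " " + body;
    }

    static String dropTable(String table) {
        return "DROP TABLE " + quote(table);
    }

    static String addColumn(String table, String column, String sqlType) {
        return "ALTER TABLE " + quote(table) + " ADD COLUMN " + quote(column) + " " + sqlType;
    }

    static String dropColumn(String table, String column) {
        return "ALTER TABLE " + quote(table) + " DROP COLUMN " + quote(column);
    }

    // Lets PostgreSQL pick the index name.
    static String createIndex(String table, String column) {
        return "CREATE INDEX ON " + quote(table) + " (" + quote(column) + ")";
    }

    // Binds an existing trigger function to a table.
    static String createTrigger(String trigger, String table, String function) {
        return "CREATE TRIGGER " + quote(trigger) + " BEFORE INSERT OR UPDATE ON " + quote(table)
                + " FOR EACH ROW EXECUTE PROCEDURE " + quote(function) + "()";
    }
}
```

Quoting every identifier keeps generated table and column names safe even when they contain upper-case letters or reserved words.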

Metadata Wrangler

Metadata Wrangler provides a simple API based on tabular model beans that allows for easy storage of metadata. It provides methods for:

  • Saving a table
  • Retrieving tables by table type, or all registered tables
  • Retrieving a single table by its id
  • Removing a table, given its id

The default implementation uses JPA and a PostgreSQL database, whose coordinates are retrieved through [Database Resource] in the same way as for the Data Wrangler. It translates tabular model beans into JPA-annotated beans on the fly.
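To make the shape of this API concrete, here is a minimal in-memory sketch of the four operations listed above. It is only an illustration: the names StoredTable and InMemoryMetadataStore are hypothetical, the table type is reduced to a plain string, and a map stands in for the JPA and PostgreSQL persistence layer.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a tabular model table bean.
final class StoredTable {
    final long id;
    final String type;

    StoredTable(long id, String type) {
        this.id = id;
        this.type = type;
    }
}

// Hypothetical store; a map replaces the JPA-backed persistence.
final class InMemoryMetadataStore {
    private final Map<Long, StoredTable> tables = new LinkedHashMap<>();

    // Saves a table, replacing any table with the same id.
    void save(StoredTable table) {
        tables.put(table.id, table);
    }

    // Returns all registered tables.
    Collection<StoredTable> getAll() {
        return new ArrayList<>(tables.values());
    }

    // Returns the registered tables of the given type.
    List<StoredTable> getByType(String type) {
        List<StoredTable> result = new ArrayList<>();
        for (StoredTable table : tables.values()) {
            if (table.type.equals(type)) {
                result.add(table);
            }
        }
        return result;
    }

    // Returns the table with the given id, if any.
    Optional<StoredTable> get(long id) {
        return Optional.ofNullable(tables.get(id));
    }

    // Removes the table with the given id; returns false if it was not registered.
    boolean remove(long id) {
        return tables.remove(id) != null;
    }
}
```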

Cube Manager API

Cube Manager API exposes the interfaces that clients use. As of version 3.0 they are:

// Entry point for clients: creates, retrieves and removes tables.
public interface CubeManager {

	public TableCreator createTable(TableType type);

	public TableMetaCreator modifyTableMeta(TableId tableId) throws NoSuchTableException;

	public Collection<Table> getTables();

	public Collection<Table> getTables(TableType tableType);

	public Table getTable(TableId id) throws NoSuchTableException;

	public void removeTable(TableId id) throws NoSuchTableException;

	public Table createTimeCodelist(PeriodType periodType);

}

// Builder for a table structure, either from scratch or starting from an existing table.
public interface TableCreator {

	public TableCreator addColumn(Column column);

	public TableCreator addColumns(Column... columns);

	public TableCreator like(Table table, boolean copyData);

	public TableCreator like(Table table, boolean copyData, List<Column> columnsToRemove);

	public Table create() throws TableCreationException;

}

// Builder for changes to the table-level and column-level metadata of a table.
public interface TableMetaCreator {

	public TableMetaCreator setTableMetadata(TableMetadata... metadata);

	public TableMetaCreator removeTableMetadata(Class<? extends TableMetadata> metadataType);

	public TableMetaCreator removeAllTableMetadata();

	public TableMetaCreator setColumnMetadata(ColumnLocalId columnId, ColumnMetadata... metadata);

	public TableMetaCreator removeColumnMetadata(ColumnLocalId columnId, Class<? extends ColumnMetadata> metadataType);

	public TableMetaCreator removeAllColumnMetadata(ColumnLocalId columnId);

	public Table create() throws TableCreationException;

}

TableCreator and TableMetaCreator are builders. TableCreator creates, clones or modifies a table structure. TableMetaCreator modifies table metadata by creating a new metadata table, without creating a new data table on the relational database.

Cube Manager also offers simplified creation of time-based dimension tables, which come with a predefined structure and metadata.
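The builder style of TableCreator can be illustrated with a small runnable sketch. It uses a reduced copy of the TableCreator interface listed above; Column, Table and TableCreationException are simplified stand-ins for the real Tabular Model types, and InMemoryTableCreator and TableCreatorExample are hypothetical classes that exist only so the example can run without a database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for the Tabular Model types; the real classes are richer.
final class Column {
    final String name;

    Column(String name) {
        this.name = name;
    }
}

final class Table {
    final List<Column> columns;

    Table(List<Column> columns) {
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
}

class TableCreationException extends Exception {
    TableCreationException(String message) {
        super(message);
    }
}

// Reduced copy of the TableCreator interface.
interface TableCreator {
    TableCreator addColumn(Column column);

    TableCreator like(Table table, boolean copyData);

    Table create() throws TableCreationException;
}

// Hypothetical in-memory creator; it only tracks columns and ignores copyData.
final class InMemoryTableCreator implements TableCreator {
    private final List<Column> columns = new ArrayList<>();

    public TableCreator addColumn(Column column) {
        columns.add(column);
        return this; // returning the builder allows call chaining
    }

    public TableCreator like(Table table, boolean copyData) {
        columns.addAll(table.columns); // start from the structure of an existing table
        return this;
    }

    public Table create() throws TableCreationException {
        if (columns.isEmpty()) {
            throw new TableCreationException("a table needs at least one column");
        }
        return new Table(columns);
    }
}

final class TableCreatorExample {
    // Clones the structure of source and appends one extra column.
    static Table cloneWithExtraColumn(TableCreator creator, Table source, Column extra)
            throws TableCreationException {
        return creator.like(source, false).addColumn(extra).create();
    }
}
```

With the real API the creator would be obtained from CubeManager.createTable(TableType) rather than instantiated directly.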

Cube Manager

Cube Manager provides the default implementation of the Cube Manager API. Because different table types call for different policies on column indexes, there are several implementations of TableCreator, each one covering a different table type. This component relies on both the Data Wrangler and the Metadata Wrangler for the management of tables.
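One way to picture the "one creator per table type" arrangement is a registry keyed by table type. The sketch below is an assumption about how such a dispatch could be organized, not a description of the actual implementation: the enum values, the class names and the index policies in the strings are all made up for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical table types; the real TableType is defined by the Tabular Model.
enum SketchTableType { CODELIST, DATASET }

// Hypothetical creator abstraction that only exposes its index policy.
interface SketchCreator {
    String indexPolicy();
}

final class CodelistCreator implements SketchCreator {
    public String indexPolicy() {
        return "index every column"; // illustrative policy, not the real one
    }
}

final class DatasetCreator implements SketchCreator {
    public String indexPolicy() {
        return "index dimension columns only"; // illustrative policy, not the real one
    }
}

// Maps each table type to a factory for the creator that handles it.
final class CreatorRegistry {
    private final Map<SketchTableType, Supplier<SketchCreator>> creators =
            new EnumMap<>(SketchTableType.class);

    void register(SketchTableType type, Supplier<SketchCreator> supplier) {
        creators.put(type, supplier);
    }

    // Returns a fresh creator for the given type, or fails if none is registered.
    SketchCreator creatorFor(SketchTableType type) {
        Supplier<SketchCreator> supplier = creators.get(type);
        if (supplier == null) {
            throw new IllegalArgumentException("no creator registered for " + type);
        }
        return supplier.get();
    }
}
```

In a CDI-based code base the same selection could also be expressed with qualifiers and injection instead of an explicit registry.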

[Figure: component dependencies (ComponentDependencies.png)]

Along with the Java implementation there are several SQL scripts (stored procedures) that must be loaded into the PostgreSQL data database for the methods that create time dimension tables to work.