Data Consumption Software Consolidated Specifications

Overview

This page gives an overview of the components and facilities provided by the gCube Data Consumption Software, along with links to the software specifications and to the Developers' guides. The main aim is to summarise the supported software at different levels of granularity. The facilities cover the gCube components that deal with several aspects of Data Consumption, in particular: Retrieval, Manipulation, Mining, Visualisation and Semantic Data Analysis.

Key Features

The gCube Data Consumption facilities provide the following key features:

Data Retrieval

Declarative Query Language over a heterogeneous environment: The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL standard.

On the fly Integration of Data Sources: A Data Source that publishes its Information Retrieval (IR) capabilities can be involved in the IR process on the fly.

Scalability in the number of Data Sources: Planning and Optimization mechanisms detect the minimum number of Sources that need to be involved in answering a query, along with an optimal plan for execution.

Direct Integration of External Information Providers: Through the OpenSearch standard, external Information Providers can be queried dynamically. The results they provide can be aggregated with the rest of results during query answering.

Indexing Capabilities for Replication and High Availability: Multidimensional and Full-text indexing capabilities using an architecture that efficiently supports replication and high availability.

Distributed Execution Environment offering High Performance and Flexibility: Efficient execution of search plans over a large heterogeneous environment.
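
As an illustration of the OpenSearch-based integration listed above, the following minimal sketch fills an OpenSearch URL template before an external provider is queried. It is not gCube code: the function name fill_template and the example.org template are assumptions made for this example. Only the template syntax ({name} for required parameters, {name?} for optional ones, which become empty when no value is supplied) comes from the OpenSearch standard.

```python
import re
from urllib.parse import quote

# Matches OpenSearch template parameters such as {searchTerms} or {startPage?}.
_PARAM = re.compile(r"\{([A-Za-z0-9_:]+)(\?)?\}")


def fill_template(template, values):
    """Substitute OpenSearch parameters; missing optional ones become empty."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical provider template, for illustration only.
TEMPLATE = "http://example.org/search?q={searchTerms}&page={startPage?}"
url = fill_template(TEMPLATE, {"searchTerms": "tuna catch"})
print(url)
```

The results returned from such a URL would then be merged with the results from the other Data Sources; that aggregation step is outside the scope of this sketch.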

Data Manipulation

Automatic transformation path identification: Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. In addition, it can dynamically build a path of several transformation steps to produce the final format. Shorter paths are preferred.

Fine-grained sub typing of formats: Content types can be freely defined and refined through their parameters (e.g. resolution, fps etc).

Pluggable algorithms for content transformation: A generic transformation framework that is based on pluggable components termed transformation programs. Transformation programs reveal the transformation capabilities of the framework. This approach makes it possible to provide domain-specific and application-specific data transformations.

Exploitation of PE2ng Infrastructure: The integration with the PE2ng engine gives access to vast amounts of processing power and makes it possible to handle virtually any transformation task, which makes it the standard Data Manipulation facility for gCube applications.
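
The automatic transformation path identification described above can be pictured as a shortest-path search over a graph whose nodes are content types and whose edges are transformation programs. The sketch below uses a breadth-first search, which returns a path with the fewest steps. It is an assumption about one possible approach, not the framework's actual algorithm; the program names and the PROGRAMS registry are hypothetical.

```python
from collections import deque


def find_transformation_path(programs, source_type, target_type):
    """Return the shortest list of program names from source to target, or None."""
    # Build adjacency: content type -> list of (program name, output type).
    graph = {}
    for name, (src, dst) in programs.items():
        graph.setdefault(src, []).append((name, dst))

    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current, path = queue.popleft()
        if current == target_type:
            return path
        for name, dst in graph.get(current, []):
            if dst not in visited:
                visited.add(dst)
                queue.append((dst, path + [name]))
    return None  # no chain of programs reaches the target type


# Hypothetical registry: program name -> (input type, output type).
PROGRAMS = {
    "tiff_to_png": ("image/tiff", "image/png"),
    "png_to_jpeg": ("image/png", "image/jpeg"),
    "tiff_to_bmp": ("image/tiff", "image/bmp"),
    "bmp_to_png": ("image/bmp", "image/png"),
}
path = find_transformation_path(PROGRAMS, "image/tiff", "image/jpeg")
print(path)
```

Because breadth-first search explores paths in order of length, the two-step route through PNG is found before the three-step route through BMP. Matching on format parameters such as resolution is left out to keep the sketch short.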

Data Mining

Parallel processing: Parallelization of statistical algorithms using a map-reduce approach; Cloud computing approach that is seamless to the users

Pre-cooked state-of-the-art data mining algorithms: Algorithms oriented to biological-related problems supplied as-a-service; General purpose algorithms (e.g. Clustering, Principal Component Analysis, Artificial Neural Networks) supplied as-a-service

Data trends generation and analysis: Extraction of trends for biodiversity data; Inspection of time series of observations on biological species; Basic signal processing techniques to explore periodicities in trends

Ecological niche modelling: Algorithms to perform ecological niche modelling using either mechanistic or correlative approaches; Species distribution maps generation
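
To make the map-reduce approach to parallel processing mentioned above more concrete, the following sketch computes a mean and a population variance from partial summaries. It is a generic illustration under stated assumptions, not the Statistical Manager's implementation: the function names are invented for this example, and a local thread pool stands in for the distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: summarise one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: merge two partial summaries; the operation is associative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(values, chunk_size=1000, workers=4):
    """Compute the mean and population variance from per-chunk summaries."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # The thread pool only stands in for remote workers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, total, total_sq = reduce(combine, partials, (0, 0.0, 0.0))
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


mean, variance = mean_and_variance(
    [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], chunk_size=3
)
print(mean, variance)
```

Each chunk can be summarised independently, and the summaries can be merged in any order, which is what lets the work be spread across machines. The sum-of-squares formula is used here for brevity; it can lose precision on large or badly scaled data.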

Data Visualisation

Uniform access over geospatial GIS layers: Investigation over layers indexed by GeoNetwork; Visualization of distributed layers; Addition of remote layers published in standard OGC formats (WMS or WFS)

Filtering and analysis capabilities: Possibility to apply CQL filters to layers; Possibility to trace transect charts; Possibility to select areas for investigating environmental features

Search and indexing capabilities: Possibility to sort a huge quantity of layers by title; Possibility to search over titles and names on a huge quantity of layers; Possibility to index layers by invoking GeoNetwork functionalities
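
As a rough illustration of applying a CQL filter to a layer, the sketch below builds a WMS 1.1.1 GetMap request that carries the filter in the CQL_FILTER parameter, a vendor parameter supported by GeoServer. It assumes the layer is served by a GeoServer instance; the base URL, layer name and attribute name are hypothetical, and this is not necessarily how the gCube viewer issues its requests.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getmap_url(base_url, layer, bbox, cql_filter, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL restricted by a CQL filter."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "CQL_FILTER": cql_filter,  # GeoServer vendor parameter
    }
    # urlencode percent-encodes the values, including the filter expression.
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, layer and attribute, for illustration only.
url = build_getmap_url(
    "http://example.org/geoserver/wms",
    "demo:species_occurrences",
    (-180, -90, 180, 90),
    "depth > 200",
)
# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["CQL_FILTER"])
```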

Semantic Data Analysis

Provision of results clustering over any search system: Applies to any system that returns textual snippets and for which there is an OpenSearch description.

Provision of snippet or contents-based entity recognition: Generic as well as vertical - based on predetermined entity categories and lists which can be obtained by querying SPARQL endpoints.

Provision of gradual faceted (session-based) search: Allows users to gradually restrict the answer based on the selected entities and/or clusters.

Ability to fetch and display semantic information of an identified entity: Achieved by querying appropriate SPARQL endpoints.

Ability to apply these services on any web page through a web browser: Using the functionality of bookmarklets.
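
The gradual, session-based faceted search described above can be sketched as a session object that remembers the entities selected so far and narrows the answer at each step. The class below is a simplified illustration and not the X-Search implementation: the class name, its methods and the sample results are assumptions, and entity recognition is taken as already done.

```python
class FacetedSession:
    """Keeps the selected entities and restricts the results step by step."""

    def __init__(self, results):
        # Each result is (title, set of entity names recognised in its snippet).
        self._results = list(results)
        self._selected = []

    def select(self, entity):
        """Add one entity to the restriction and return the narrowed answer."""
        self._selected.append(entity)
        return self.current()

    def current(self):
        """Return titles of results that contain every selected entity."""
        return [
            title
            for title, entities in self._results
            if all(e in entities for e in self._selected)
        ]


# Hypothetical search results with already recognised entities.
RESULTS = [
    ("Tuna fisheries report", {"Thunnus albacares", "Pacific Ocean"}),
    ("Atlantic cod stocks", {"Gadus morhua", "Atlantic Ocean"}),
    ("Yellowfin in the Atlantic", {"Thunnus albacares", "Atlantic Ocean"}),
]
session = FacetedSession(RESULTS)
first = session.select("Thunnus albacares")
second = session.select("Atlantic Ocean")
print(first, second)
```

Each call to select narrows the previous answer further, which mirrors the way a user would click through entities or clusters one at a time.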

Components

Data Retrieval

Search Planning and Execution Specification: which enables the integration of CQL-compliant Data Sources and is responsible for answering queries by combining Data Source capabilities and Search Operators

Data Sources Specification: which aims to provide integration of the data from different data providers into our infrastructure

Data Manipulation

Data Transformation Service Specification: which transforms content and metadata among different formats and specifications

Data Mining

Statistical Manager: a Service allowing the management of statistical data and multi-user requests for computation

Ecological Modeling: a set of methods for performing Data Mining operations. These include experiments and techniques categorization

Signal_Processing: a set of methods to perform digital signal processing

Data Visualisation

Gis Viewer: a tool for visual analysis of geospatial layers stored on a GeoServer or published remotely through the WFS or WMS protocols

Geo Explorer: a tool for searching and browsing geo-spatial data sets spread across a number of data providers linked to the infrastructure

Geospatial_Data_Processing#TIFFUploader_Algorithm: a tool for transforming geo-spatial data from one format into another that is accepted by common GIS visualisers

Semantic Data Analysis

X-Search: a meta-search engine that reads the description of an underlying search source, and is able to query that source and analyze in various ways the returned results and also exploit the availability of semantic repositories

Specifications

The specifications require preparatory information to be properly understood. In particular:

How to Develop a gCube Component: A basic guide to build a gCube Component

Building Components using the gCube Featherweight Stack: A guide to develop libraries or clients for the gCube Services

Developer's Guide: the overall gCube Developer's Guide

Task-oriented specifications can be found in the following:

Data Retrieval Specifications: the specifications for the Data Retrieval components

Data Manipulation Facilities: the specifications for the Data Manipulation components

Data Mining Specifications: the specifications for the Data Mining components

Data Visualisation Specifications: the specifications for the Data Visualisation components

Semantic Data Analysis: the specifications for the Semantic Data Analysis components
