Data Consumption Software Consolidated Specifications
Overview
This page provides an overview of the components and facilities of the gCube Data Consumption Software, along with links to the software specifications and the Developers' guides. Its main aim is to summarise the supported software at different levels of granularity. The facilities cover the gCube components that deal with the main aspects of Data Consumption, namely: Retrieval, Manipulation, Mining, Visualisation and Semantic Data Analysis.
Key Features
The gCube Data Consumption facilities provide the following key features:
Data Retrieval
- Declarative Query Language over a heterogeneous environment
- The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL standard.
- On-the-fly Integration of Data Sources
- A Data Source that publishes its Information Retrieval (IR) capabilities can be involved in the IR process on the fly.
- Scalability in the number of Data Sources
- Planning and Optimization mechanisms detect the minimum number of Sources that must be involved in answering a query, along with an optimal execution plan.
- Direct Integration of External Information Providers
- Through the OpenSearch standard, external Information Providers can be queried dynamically, and their results can be aggregated with the other results during query answering.
- Indexing Capabilities for Replication and High Availability
- Multidimensional and Full-text indexing capabilities using an architecture that efficiently supports replication and high availability.
- Distributed Execution Environment offering High Performance and Flexibility
- Efficient execution of search plans over a large heterogeneous environment.
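To make the OpenSearch integration more concrete: an external provider publishes an OpenSearch description document whose URL template tells a client how to build a query. The sketch below expands such a template, URL-encoding supplied values and replacing optional parameters (`{name?}`) that have no value with an empty string. It is a minimal illustration of the standard's template syntax, not the gCube implementation; the provider URL and the `expand_template` helper are hypothetical.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, as it could appear in the template attribute of
# the <Url> element in a provider's OpenSearch description document.
TEMPLATE = "https://provider.example.org/search?q={searchTerms}&start={startIndex?}&n={count?}"

# Matches {name} (required) and {name?} (optional) template parameters.
_PARAM = re.compile(r"\{([^}?]+)(\?)?\}")


def expand_template(template, values):
    """Replace OpenSearch template parameters with URL-encoded values.

    Required parameters must be supplied; optional parameters without a value
    are replaced by an empty string.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2) is not None
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)


url = expand_template(TEMPLATE, {"searchTerms": "thunnus albacares", "count": 20})
print(url)
# https://provider.example.org/search?q=thunnus%20albacares&start=&n=20
```

The resulting URL can then be fetched and its results merged with those of the internal Data Sources.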
Data Manipulation
- Automatic transformation path identification
- Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. It can also dynamically compose a path of several transformation steps to produce the final format; shorter paths are preferred.
- Fine-grained subtyping of formats
- Extensive freedom in the supported types and their parameters (e.g. resolution, fps, etc.).
- Pluggable algorithms for content transformation
- A generic transformation framework based on pluggable components termed transformation programs, which expose the transformation capabilities of the framework. This approach makes it possible to provide domain- and application-specific data transformations.
- Exploitation of PE2ng Infrastructure
- The integration with the PE2ng engine provides access to vast amounts of processing power and makes it possible to handle virtually any transformation task, which makes it the standard Data Manipulation facility for gCube applications.
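Automatic transformation path identification can be pictured as a search over a graph whose nodes are content types and whose edges are transformation programs. Since shorter paths are preferred, a breadth-first search, which finds a path with the fewest steps, is a natural fit. The sketch below assumes a flat catalogue of hypothetical programs and plain MIME types; it ignores the fine-grained subtype parameters and is not the actual gCube algorithm.

```python
from collections import deque

# Hypothetical catalogue of transformation programs: (source type, target type, program).
TRANSFORMATIONS = [
    ("image/tiff", "image/png", "tiff2png"),
    ("image/png", "image/jpeg", "png2jpeg"),
    ("image/tiff", "image/bmp", "tiff2bmp"),
    ("image/bmp", "image/jpeg", "bmp2jpeg"),
    ("image/png", "image/gif", "png2gif"),
]


def find_path(source, target, edges=TRANSFORMATIONS):
    """Return the shortest list of programs turning `source` into `target`, or None."""
    graph = {}
    for src, dst, program in edges:
        graph.setdefault(src, []).append((dst, program))

    # Breadth-first search: the first time `target` is dequeued, the path is minimal.
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, programs = queue.popleft()
        if current == target:
            return programs
        for nxt, program in graph.get(current, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, programs + [program]))
    return None


print(find_path("image/tiff", "image/jpeg"))  # ['tiff2png', 'png2jpeg']
```

Weighting edges by cost and switching to Dijkstra's algorithm would be a straightforward extension if some transformation programs are more expensive than others.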
Data Mining
- Parallel processing
- Parallelization of statistical algorithms using a map-reduce approach
- A cloud computing approach that is seamless for the users
- Pre-cooked state-of-the-art data mining algorithms
- Algorithms oriented to biology-related problems supplied as-a-service
- General purpose algorithms (e.g. Clustering, Principal Component Analysis, Artificial Neural Networks) supplied as-a-service
- Data trends generation and analysis
- Extraction of trends for biodiversity data
- Inspection of time series of observations on biological species
- Basic signal processing techniques to explore periodicities in trends
- Ecological niche modelling
- Algorithms to perform ecological niche modelling using either mechanistic or correlative approaches
- Generation of species distribution maps
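The map-reduce parallelization of statistical algorithms can be illustrated with a simple statistic. In the sketch below, each partition of the data is mapped to a partial summary (count, sum, sum of squares), and the partial summaries are reduced by addition to obtain the mean and population variance. The partitions are processed sequentially here; in a real deployment the map step would run on separate nodes. The function names are hypothetical and do not reflect the Statistical Manager's API.

```python
from functools import reduce


def map_partition(values):
    """Map step: summarise one partition as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(a, b):
    """Reduce step: partial summaries merge by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(partitions):
    """Population mean and variance computed from per-partition summaries."""
    n, s, ss = reduce(combine, map(map_partition, partitions))
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance


mean, variance = mean_and_variance([[1, 2, 3], [4, 5], [6]])
print(f"mean={mean:.2f} variance={variance:.4f}")  # mean=3.50 variance=2.9167
```

The sum-of-squares formula keeps the example short; for large values, a numerically more stable way of merging partial variances is preferable.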
Data Visualisation
- Uniform access over geospatial GIS layers
- Investigation of layers indexed by GeoNetwork;
- Visualisation of distributed layers;
- Addition of remote layers published in standard OGC formats (WMS or WFS);
- Filtering and analysis capabilities
- Possibility to apply CQL filters to layers;
- Possibility to trace transect charts;
- Possibility to select areas for investigating environmental features;
- Search and indexing capabilities
- Possibility to sort a large number of layers by title;
- Possibility to search a large number of layers by title and name;
- Possibility to index layers by invoking GeoNetwork functionality;
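As an illustration of CQL filtering on layers, the sketch below builds a WMS 1.1.1 GetMap request for a GeoServer instance and passes the filter through GeoServer's `CQL_FILTER` vendor parameter, which is not part of the OGC WMS standard itself. The endpoint, layer name and attribute names (`depth`, `year`) are hypothetical, and the sketch does not reproduce how the GIS Viewer builds its requests.

```python
from urllib.parse import urlencode

# Hypothetical GeoServer WMS endpoint.
WMS_ENDPOINT = "https://geoserver.example.org/geoserver/wms"


def getmap_url(layer, bbox, cql_filter, width=800, height=400):
    """Build a GetMap URL whose features are restricted by a CQL filter."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(c) for c in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "CQL_FILTER": cql_filter,  # GeoServer vendor parameter
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"


# Hypothetical layer and attributes: only features deeper than 200 m observed in 2010.
url = getmap_url("example:species_occurrences", (-180, -90, 180, 90),
                 "depth > 200 AND year = 2010")
print(url)
```

Letting `urlencode` handle the escaping avoids hand-encoding the spaces and comparison operators that CQL expressions typically contain.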
Semantic Data Analysis
- Provision of results clustering over any search system
- Applicable to any search system that returns textual snippets and has an OpenSearch description.
- Provision of snippet- or content-based entity recognition
- Both generic and vertical, the latter based on predetermined entity categories and lists that can be obtained by querying SPARQL endpoints.
- Provision of gradual faceted (session-based) search
- Allows the answer to be gradually restricted based on the selected entities and/or clusters.
- Ability to fetch and display semantic information of an identified entity
- Achieved by querying appropriate SPARQL endpoints.
- Ability to apply these services to any web page through a web browser
- Achieved through bookmarklets.
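Gradual faceted (session-based) search can be sketched as follows: each result carries the set of entities recognised in its snippet, each selection adds an entity to the session, and the answer keeps only the results containing every selected entity. The facet counts show which entities remain available for further restriction. The data model and class names are hypothetical and only illustrate the idea, not the X-Search implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Result:
    title: str
    entities: frozenset  # entity labels recognised in the result snippet


@dataclass
class FacetedSession:
    results: list
    selected: set = field(default_factory=set)

    def select(self, entity):
        """Add an entity to the session and return the narrowed answer."""
        self.selected.add(entity)
        return self.current()

    def current(self):
        """Results that contain every entity selected so far."""
        return [r for r in self.results if self.selected <= r.entities]

    def facets(self):
        """Count the not-yet-selected entities among the current results."""
        counts = Counter()
        for r in self.current():
            counts.update(r.entities - self.selected)
        return counts


session = FacetedSession([
    Result("Tuna stocks in the Indian Ocean", frozenset({"Thunnus albacares", "Indian Ocean"})),
    Result("Yellowfin tuna tagging", frozenset({"Thunnus albacares", "Pacific Ocean"})),
    Result("Coral reefs of the Indian Ocean", frozenset({"Indian Ocean"})),
])
print([r.title for r in session.select("Indian Ocean")])
print(session.facets())  # Counter({'Thunnus albacares': 1})
print([r.title for r in session.select("Thunnus albacares")])
```

Clusters could be handled the same way, by treating a selected cluster as one more constraint on the result set.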
Components
Data Retrieval
- Search Planning and Execution Specification
- which covers the components that enable the integration of CQL-compliant Data Sources and are responsible for answering queries by combining Data Source capabilities and Search Operators
- Data Sources Specification
- which aims to integrate data from different data providers into the infrastructure
Data Manipulation
- Data Transformation Service Specification
- which transforms content and metadata among different formats and specifications
Data Mining
- Statistical Manager
- a service that manages statistical data and multi-user computation requests
- Ecological Modeling
- a set of methods for performing Data Mining operations. These include experiments and a categorization of techniques
- Signal Processing
- a set of methods to perform digital signal processing
Data Visualisation
- Gis Viewer
- a tool for the visual analysis of geospatial layers stored on a GeoServer instance or published remotely via the WFS or WMS protocols
- Geo Explorer
- a tool for searching and browsing geospatial data sets spread across a number of data providers linked to the infrastructure
- Geospatial Data Processing: TIFFUploader Algorithm
- a tool for transforming geospatial data from one format into another that common GIS visualisers accept
Semantic Data Analysis
- X-Search
- a meta-search engine that reads the description of an underlying search source, queries that source, analyses the returned results in various ways and exploits the availability of semantic repositories
Specifications
Some preparatory reading is needed to understand the specifications properly. In particular:
- How to Develop a gCube Component
- A basic guide to build a gCube Component
- Building Components using the gCube Featherweight Stack
- A guide to develop libraries or clients for the gCube Services
- Developer's Guide
- the overall gCube Developer's Guide
Task-oriented specifications can be found in the following pages:
- Data Retrieval Specifications
- the specifications for the Data Retrieval components
- Data Manipulation Facilities
- the specifications for the Data Manipulation components
- Data Mining Specifications
- the specifications for the Data Mining components
- Data Visualisation Specifications
- the specifications for the Data Visualisation components
- Semantic Data Analysis
- the specifications for the Semantic Data Analysis components