Data Mining Facilities
Overview
Data Mining facilities include a set of features, services, and methods for performing data processing and mining on biological information sets. These features address several aspects of biological data processing, ranging from ecological modelling to niche modelling experiments. This set of services and libraries is used by the D4Science e-Infrastructure to manage data mining problems, also from the point of view of computational complexity: algorithms are executed in a parallel and possibly distributed fashion, using D4Science nodes as working nodes. Furthermore, the services performing data mining operations are deployed according to a distributed architecture, in order to balance the load of the procedures requiring local resources.
By means of the above features, Data Mining aims to address problems such as (i) the prediction of the impact of climate change on biodiversity, (ii) the prevention of the spread of invasive species, (iii) the identification of geographical and ecological aspects of disease transmission, (iv) conservation planning, and (v) the prediction of suitable habitats for marine species. By using the computational facilities of the D4Science e-Infrastructure, algorithms can run in a cost-effective way, allowing scientists to perform more experiments and to combine different techniques.
Key Features
The components of this subsystem provide the following key features:
- parallel processing
  - parallelization of statistical algorithms using a map-reduce approach (a minimal sketch follows this list)
  - a cloud computing approach that is seamless to users
- pre-cooked state-of-the-art data mining algorithms
  - algorithms oriented to biology-related problems, supplied as-a-service
  - general purpose algorithms (e.g. Clustering, Principal Component Analysis, Artificial Neural Networks), supplied as-a-service
- data trends generation and analysis
  - extraction of trends from biodiversity data
  - inspection of time series of observations on biological species
  - basic signal processing techniques to explore periodicities in trends
- ecological niche modelling
  - algorithms to perform ecological niche modelling using either mechanistic or correlative approaches
  - generation of species distribution maps
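The map-reduce parallelization mentioned above can be illustrated with a minimal, self-contained Java sketch: a simple statistical computation (the mean of a numeric series) is split into chunks, each chunk is mapped to a partial result on a worker thread, and the partial results are reduced into the final value. All class names and data here are invented for illustration; this is not the DataMiner or gCube API.

// Illustrative only: map-reduce style parallelization of a simple statistic.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MapReduceMeanSketch {

    // Partial result produced by the "map" step for one chunk of data.
    record Partial(double sum, long count) {}

    // Map step: each worker computes the partial sum and count of its chunk.
    static Partial mapChunk(double[] chunk) {
        double sum = 0;
        for (double v : chunk) sum += v;
        return new Partial(sum, chunk.length);
    }

    // Reduce step: merge the partial results into the global mean.
    static double reduce(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum();
            count += p.count();
        }
        return sum / count;
    }

    public static void main(String[] args) throws Exception {
        // Toy data split into chunks; in a real deployment the chunks would be
        // distributed across working nodes rather than local threads.
        double[][] chunks = { {1, 2, 3}, {4, 5}, {6, 7, 8, 9} };

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Partial>> futures = new ArrayList<>();
        for (double[] chunk : chunks) {
            futures.add(pool.submit(() -> mapChunk(chunk)));
        }
        List<Partial> partials = new ArrayList<>();
        for (Future<Partial> f : futures) partials.add(f.get());
        pool.shutdown();

        System.out.println("mean = " + reduce(partials)); // prints 5.0
    }
}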
Specifications
- DataMiner: a service allowing the management of statistical data and of multi-user requests for computation
- DataMiner Algorithms: the complete list of algorithms supported by the DataMiner
- How-to Implement Algorithms for DataMiner: a guide to implementing algorithms for DataMiner
- Statistical Algorithms Importer: a tool to import R processes into DataMiner
- DataMiner Installation: the installation guide for DataMiner
- How to Interact with the DataMiner by client: interacting with DataMiner from a thin client (see the sketch after this list)
- Ecological Modeling: a set of methods for performing Data Mining operations, including experiments and the categorization of techniques
- Signal Processing: a set of methods to perform digital signal processing
- Statistical Manager: the previous gCube system for Cloud computing
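As a complement to the "How to Interact with the DataMiner by client" entry, the sketch below shows one way a thin client could talk to DataMiner, assuming its OGC WPS interface: it sends a WPS GetCapabilities request over HTTP and prints the XML response listing the available processes. The endpoint URL and the gcube-token value are placeholders, and the standard java.net.http client (Java 11+) is used rather than any gCube-specific library.

// Minimal sketch: query a WPS endpoint for its capabilities from a thin client.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DataMinerCapabilitiesSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and token; replace with the values valid for
        // your infrastructure and Virtual Research Environment.
        String endpoint = "https://dataminer.example.org/wps/WebProcessingService";
        String token = "YOUR-GCUBE-TOKEN";

        // Standard WPS GetCapabilities request, authenticated via the token parameter.
        String url = endpoint
                + "?gcube-token=" + token
                + "&service=WPS"
                + "&version=1.0.0"
                + "&request=GetCapabilities";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The response body is an XML document describing the offered processes.
        System.out.println(response.body());
    }
}

Subsequent DescribeProcess and Execute requests against the same endpoint would follow the same pattern; the details are covered in the "How to Interact with the DataMiner by client" page.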