Data Mining Facilities

From Gcube Wiki
Revision as of 13:58, 22 October 2012 by Gianpaolo.coro (Talk | contribs) (Specifications)

Overview

Data Mining facilities comprise a set of features, services and methods for processing and mining biological data sets. They address several aspects of biological data processing, ranging from ecological modeling to niche modeling experiments. The D4Science e-infrastructure uses this set of services and libraries to manage data mining problems, including their computational complexity. Algorithms are executed in parallel and, possibly, in a distributed fashion, using the D4Science nodes themselves as worker nodes. Furthermore, the services performing Data Mining operations are deployed according to a distributed architecture, in order to balance the load of procedures that require local resources.
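
The parallel execution pattern described above can be illustrated with a minimal sketch. It is not the D4Science implementation: the function names (partition, score_chunk, run_parallel), the use of local threads in place of infrastructure nodes, and the toy scoring rule are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_chunks):
    """Split items into at most n_chunks contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def score_chunk(chunk):
    """Hypothetical stand-in for a per-species model run on one worker.

    The score is a toy value derived from the name length, not a real
    ecological model.
    """
    return {name: (len(name) % 5) / 4 for name in chunk}


def run_parallel(species, n_workers=4):
    """Run score_chunk on each chunk concurrently and merge the partial results."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(score_chunk, partition(species, n_workers)):
            results.update(partial)
    return results


if __name__ == "__main__":
    print(run_parallel(["Gadus morhua", "Thunnus thynnus", "Solea solea"], n_workers=2))
```

In this sketch each chunk is independent, so the merge step is a plain dictionary update. A real deployment would replace the thread pool with remote worker nodes and the toy score with an actual model.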

By means of the above features, Data Mining in i-Marine aims to address problems such as (i) predicting the impact of climate change on biodiversity, (ii) preventing the spread of invasive species, (iii) identifying geographical and ecological aspects of disease transmission, (iv) conservation planning, and (v) predicting suitable habitats for marine species. By using the computational facilities of the D4Science e-Infrastructure, algorithms can run cost-effectively, letting scientists perform more experiments and combine different techniques.

Specifications

Statistical Manager
a service that manages statistical data and multi-user requests for computation.
Ecological Modeling
a set of methods for performing Data Mining operations, including a categorization of experiments and techniques.