Difference between revisions of "Statistical Manager Algorithms"

From Gcube Wiki

Revision as of 09:15, 14 June 2016

The complete list of algorithms supported by the Statistical Manager service is reported below.

Algorithms are clustered in the following categories: ... to be completed

DBSCAN, KMEANS
AQUAMAPSNN, AQUAMAPS_NATIVE, AQUAMAPS_NATIVE_2050, AQUAMAPS_NATIVE_NEURALNETWORK
A: ABSENCE_CELLS_FROM_AQUAMAPS;
B: BIOCLIMATE_HCAF, BIOCLIMATE_HSPEC, BIOCLIMATE_HSPEN, BIONYM, BIONYM_BIODIV, BIONYM_LOCAL;
F: FEED_FORWARD_ANN, FEED_FORWARD_A_N_N_DISTRIBUTION, FIN_GSAY_MATCH, FIN_TAXA_MATCH;
G: GET_OCCURRENCES_ALGORITHM, GET_TAXA_ALGORITHM;
H: HCAF_FILTER, HCAF_INTERPOLATION, HRS, HSPEN, HSPEN_FILTER
T: TIMEEXTRACTION
Z: ZETAEXTRACTION_TABLE

Clustering Algorithms

DBSCAN
Description A clustering algorithm for real-valued vectors that relies on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. A maximum of 4000 points is allowed.
A clustering algorithm for real-valued vectors that relies on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. It accepts as input a table and some parameters characterising the expected result, such as epsilon (the neighbourhood radius) and the minimum number of items in a cluster. It produces <output>. <limitation>. For more information see: Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise". In Evangelos Simoudis, Jiawei Han, Usama M. Fayyad. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231. ISBN 1-57735-004-9.
Type Clustering
Execution ...
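As a rough illustration of how the epsilon and minimum-items parameters drive the result, the following Python sketch implements the textbook DBSCAN procedure on a small list of points. It is not the Statistical Manager implementation; the function name dbscan, its parameter names and the use of Euclidean distance are assumptions made for the example.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The neighbour search here is quadratic in the number of points, which is acceptable only for small tables such as the 4000-point limit mentioned above.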
LOF
Description Local Outlier Factor (LOF). A clustering algorithm for real-valued vectors that relies on the Local Outlier Factor algorithm, i.e. an algorithm for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours. A maximum of 4000 points is allowed.
Type Clustering
Execution ...
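To make "local deviation with respect to its neighbours" concrete, the sketch below computes the standard Local Outlier Factor score for each point: values close to 1 indicate a point as dense as its neighbours, while clearly larger values indicate an outlier. This is an illustrative Python sketch, not the service code; the function name local_outlier_factors, the parameter k and the assumption that all points are distinct are choices made for the example.

```python
from math import dist


def local_outlier_factors(points, k):
    """Return the LOF score of every point (assumes distinct points, k < len(points))."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted(d[i][j] for j in range(n) if j != i)
        kd = others[k - 1]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # k-neighbourhood: every other point within the k-distance (ties included)
        neigh.append([j for j in range(n) if j != i and d[i][j] <= kd])

    def local_reachability_density(i):
        # reachability distance of i from o = max(k-distance(o), d(i, o))
        reach = [max(k_dist[o], d[i][o]) for o in neigh[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF = average density of the neighbours divided by the point's own density
    return [sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factors(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # index of the most anomalous point
```

In this example the four corner points of the unit square score 1.0, while the isolated point (10, 10) receives a much larger score.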
KMEANS
Description A clustering algorithm for real-valued vectors that relies on the k-means algorithm, i.e. a method aiming to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. A maximum of 4000 points is allowed.
A clustering algorithm for real-valued vectors that relies on the k-means algorithm, i.e. a method aiming to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. It accepts as input a table and some parameters characterising the result, such as the number of expected clusters, the maximum number of iterations and the minimum number of points defining an outlier. It produces … . The implementation supports tables containing at most 4000 entries. For more information see: MacQueen, J. B. (1967). "Some Methods for classification and Analysis of Multivariate Observations". Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability 1. University of California Press. pp. 281–297. MR 0214227. Zbl 0214.46201. Retrieved 2009-04-07.
Type Clustering
Execution ...
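The partitioning step described above can be sketched with Lloyd's iteration, shown here in Python with the number of clusters and the maximum number of iterations as parameters. This is only an illustration under stated assumptions: the function name kmeans, the naive choice of the first k points as initial centroids and the omission of the outlier threshold are simplifications, not the behaviour of the service.

```python
from math import dist


def kmeans(points, k, max_iterations=100):
    """Return (assignment, centroids) after at most max_iterations rounds."""
    # Assumption: the first k points seed the centroids (deterministic but naive).
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


assignment, centroids = kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)
print(assignment)  # [0, 1, 0, 1]
print(centroids)   # [(0.0, 0.5), (10.0, 10.5)]
```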
XMEANS
Description A clustering algorithm for occurrence points that relies on the X-Means algorithm, i.e. an extended version of the K-Means algorithm improved by an Improve-Structure part. A maximum of 4000 points is allowed.
Type Clustering
Execution ...

Ecological Modeling Algorithms

AQUAMAPSNN
Description The AquaMaps model trained using a Feed Forward Neural Network. This is a method to train a generic Feed Forward Artificial Neural Network to be used by the AquaMaps Neural Network algorithm. It produces a trained neural network in the form of a compiled file which can be used later.
A <type> algorithm that <what it does>. It accepts as input <input>. It produces <output>. <limitation>. For more information see: <citation/ref>
Type Models
Execution ...
AQUAMAPS_NATIVE
Description Algorithm for Native Distribution by AquaMaps. A distribution algorithm that generates a table containing species distribution probabilities on half-degree cells according to the AquaMaps approach for Native (Actual) distributions.
A distribution algorithm that generates a table containing species distribution probabilities on half-degree cells according to the AquaMaps approach with suitable distribution. It accepts as input a table containing species envelopes (HSPEN), a table containing environmental parameters (HCAF) and a table containing species occurrence points (half-degree cells). It produces a table containing species distribution probabilities. <limitation>. For more information see: Kesner-Reyes, K., K. Kaschner, S. Kullander, C. Garilao, J. Barile, and R. Froese. 2012. AquaMaps: algorithm and data sources for aquatic organisms. In: Froese, R. and D. Pauly. Editors. 2012. FishBase. World Wide Web electronic publication. www.fishbase.org, version (04/2012).
Type Distributions
Execution ...
AQUAMAPS_NATIVE_2050
Description Algorithm for Native 2050 Distribution by AquaMaps. A distribution algorithm that generates a table containing species distribution probabilities on half-degree cells according to the AquaMaps approach with native distribution estimated for 2050.
Type Distributions
Execution ...
AQUAMAPS_NATIVE_NEURALNETWORK
Description AquaMaps Native Algorithm calculated by a Neural Network. A distribution algorithm that relies on Neural Networks and AquaMaps data for native distributions to generate a table containing species distribution probabilities on half-degree cells.
Type Distributions
Execution ...
AQUAMAPS_SUITABLE
Description Algorithm for Suitable Distribution by AquaMaps. A distribution algorithm that generates a table containing species distribution probabilities on half-degree cells according to the AquaMaps approach for suitable (potential) distributions.
Type Distributions
Execution ...
AQUAMAPS_SUITABLE_2050
Description Algorithm for Suitable 2050 Distribution by AquaMaps. A distribution algorithm that generates a table containing species distribution probabilities on half-degree cells according to the AquaMaps approach for suitable (potential) distributions for the 2050 scenario.
Type Distributions
Execution ...
AQUAMAPS_SUITABLE_NEURALNETWORK
Description AquaMaps Algorithm for Suitable Environment calculated by a Neural Network. A distribution algorithm that relies on Neural Networks and AquaMaps data for suitable distributions to generate a table containing species distribution probabilities on half-degree cells.
Type Distributions
Execution ...

Signal Processing Algorithms

Time Series Analysis
Description An algorithm applying signal processing to a non-uniform time series. A maximum of 10000 distinct points in time can be processed. The process uniformly samples the series, then extracts hidden periodicities and signal properties. The sampling period is the shortest time difference between two points. Finally, the algorithm forecasts the time series using Caterpillar-SSA. The output shows the detected periodicity, the forecasted signal and the spectrogram.
Type Time Series Analysis
Execution ...
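The uniform sampling step can be illustrated with a short Python sketch that takes the shortest gap between consecutive timestamps as the sampling period and fills the resulting grid. The description above does not say how values between observations are obtained, so linear interpolation is an assumption here, as are the function name resample_uniform and the requirement of at least two distinct timestamps.

```python
def resample_uniform(times, values):
    """Resample an irregular series onto a uniform grid.

    The period is the shortest difference between consecutive timestamps;
    grid values are linearly interpolated (an assumption of this sketch).
    """
    pairs = sorted(zip(times, values))
    ts = [t for t, _ in pairs]
    vs = [v for _, v in pairs]
    period = min(b - a for a, b in zip(ts, ts[1:]))
    steps = int((ts[-1] - ts[0]) // period)

    grid, sampled = [], []
    j = 0  # index of the segment [ts[j], ts[j + 1]] containing the grid time
    for i in range(steps + 1):
        t = ts[0] + i * period
        while j < len(ts) - 2 and ts[j + 1] < t:
            j += 1
        w = (t - ts[j]) / (ts[j + 1] - ts[j])
        grid.append(t)
        sampled.append(vs[j] + w * (vs[j + 1] - vs[j]))
    return grid, sampled


grid, sampled = resample_uniform([0, 2, 3, 7], [0.0, 4.0, 6.0, 14.0])
print(grid)     # [0, 1, 2, 3, 4, 5, 6, 7]
print(sampled)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

The uniformly sampled series is what makes the subsequent periodicity detection and spectrogram computation possible, since both presuppose a constant sampling period.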

Miscellaneous Algorithms

ABSENCE_CELLS_FROM_AQUAMAPS
Description An algorithm producing cells and features (HCAF) for a species containing absence points taken from an AquaMaps distribution.
A transducer algorithm that generates a Half-degree Cells Authority File (HCAF) dataset for estimated species absence points. It accepts as input a table xxx, a table xxx, the target species and the number of points to select. It produces an HCAF table containing environmental parameters on selected points. <limitation>. For more information see: Kesner-Reyes, K., K. Kaschner, S. Kullander, C. Garilao, J. Barile, and R. Froese. 2012. AquaMaps: algorithm and data sources for aquatic organisms. In: Froese, R. and D. Pauly. Editors. 2012. FishBase. World Wide Web electronic publication. www.fishbase.org, version (04/2012).
Type Transducer
Execution Single machine
BIOCLIMATE_HCAF
Description A transducer algorithm that generates a Half-degree Cells Authority File (HCAF) dataset for a certain time frame, with environmental parameters used by the AquaMaps approach. It evaluates the impact of climatic changes on the variation of the ocean features contained in HCAF tables.
Type Transducer
Execution Single machine
BIOCLIMATE_HSPEC
Description A transducer algorithm that generates a table containing an estimate of species distributions per half-degree cell (HSPEC) in time. It evaluates the impact of climatic changes on species presence.
Type Transducer
Execution Single machine
BIOCLIMATE_HSPEN
Description A transducer algorithm that generates a table containing species envelopes (HSPEN) in time, i.e. models capturing species tolerance with respect to environmental parameters, used by the AquaMaps approach. It evaluates the impact of climatic changes on the variation of the salinity values in several ranges of a set of species envelopes.
Type Transducer
Execution Single machine
BIONYM
Description An algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows several taxa name matching algorithms to be activated and returns the list of possible transcriptions for a list of raw input species names, possibly including authorship information.
Type  ???
Execution  ???
BIONYM_BIODIV
Description An algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows several taxa name matching algorithms to be activated and returns the list of possible transcriptions for a list of raw input species names, possibly including authorship information.
Type  ???
Execution  ???
BIONYM_LOCAL
Description A fast version of the algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows several taxa name matching algorithms to be activated and returns the list of possible transcriptions for a list of raw input species names, possibly including authorship information.
Type  ???
Execution Single machine
DISCREPANCY_ANALYSIS
Description An evaluator algorithm that compares two tables containing real valued vectors. It drives the comparison by relying on a geographical distance threshold and a threshold for K-Statistic.
An evaluator algorithm that compares two tables containing estimations of species occurrence by species and half-degree cell (HSPEC). It accepts as input the two tables and some parameters driving the comparison such as the comparison threshold and the threshold for K-Statistic. It produces <output>. <limitation>. For more information see: <citation/ref>
Type Evaluator
Execution Single machine
FEED_FORWARD_ANN
Description A method to train a generic Feed Forward Artificial Neural Network in order to simulate a function from the features space (R^n) to R. Uses the Back-propagation method. Produces a trained neural network in the form of a compiled file which can be used in the FEED FORWARD NEURAL NETWORK DISTRIBUTION algorithm.
A modeling algorithm that relies on Neural Networks to <xxx>. It accepts as input a table containing the training dataset and some parameters affecting the algorithm behaviour such as the number of neurons, the learning threshold and the maximum number of iterations. It produces <output>. <limitation>. For more information see: <citation/ref>
Type Models
Execution  ???
FEED_FORWARD_A_N_N_DISTRIBUTION
Description A Bayesian method using a Feed Forward Neural Network to simulate a function from the features space (R^n) to R. A modeling algorithm that relies on Neural Networks to simulate a real valued function. It accepts as input a table containing the training dataset and some parameters affecting the algorithm behaviour such as the number of neurons, the learning threshold and the maximum number of iterations.
Type Distribution
Execution  ???
FIN_GSAY_MATCH
Description An algorithm for GSAy Matching with respect to the FishBase database.
Type Transducers
Execution Single Machine
FIN_TAXA_MATCH
Description An algorithm for Taxa Matching with respect to the FishBase database.
A transducer algorithm that compares a species nomenclature with the FishBase database according to the TAXAMATCH approach. It accepts as input the species nomenclature (genus and species) and the comparison operators to use, e.g. equal, begins with, contains. It produces <output>. <limitation>. For more information see: Rees, T., 2008. Applications of fuzzy (approximate string) matching in taxonomic database searches, with an example multi-tiered approach. [Extended abstract]. Pp. 12-14 in Worcester, T., Bajona, L. & Branton, B. (eds): Proceedings of a Conference on Ocean Biodiversity Informatics, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, 2-4 October 2007. Bedford Institute of Oceanography, 2008 (CSAS/SCCS Proceedings Series 2008/024).
Type Transducers
Execution Single Machine
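The role of the comparison operators (equal, begins with, contains) can be sketched in Python as simple predicates applied separately to the genus and the species part of a name. This illustrates only the operator selection, not the TAXAMATCH fuzzy matching itself; the operator keys, the function name match_names and the in-memory list of (genus, species) pairs standing in for the reference database are all hypothetical.

```python
# Hypothetical operator names mapped to case-insensitive string predicates.
OPERATORS = {
    "EQUAL": lambda value, query: value == query,
    "BEGINS_WITH": lambda value, query: value.startswith(query),
    "CONTAINS": lambda value, query: query in value,
}


def match_names(records, genus, species, genus_op="EQUAL", species_op="EQUAL"):
    """Return the (genus, species) records satisfying both comparisons."""
    genus_test = OPERATORS[genus_op]
    species_test = OPERATORS[species_op]
    return [
        (g, s)
        for g, s in records
        if genus_test(g.lower(), genus.lower())
        and species_test(s.lower(), species.lower())
    ]


# A tiny stand-in for the reference database (illustrative data only).
records = [("Gadus", "morhua"), ("Gadus", "macrocephalus"), ("Thunnus", "albacares")]
print(match_names(records, "gadus", "m", species_op="BEGINS_WITH"))
# [('Gadus', 'morhua'), ('Gadus', 'macrocephalus')]
```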
GET_OCCURRENCES_ALGORITHM
Description An algorithm that retrieves occurrences from a data provider based on the given search options.
A transducer algorithm that produces a dataset of species occurrences for a set of target species by retrieving these from major data providers including GBIF and OBIS. It accepts as input a list of species names and parameters including the data provider to use and query expansion criteria. It produces a DarwinCore file with the occurrences. <limitation>. For more information see: <citation/ref>
Type Transducers
Execution Single Machine
GET_TAXA_ALGORITHM
Description An algorithm that retrieves taxa from a data provider based on the given search options.
A transducer algorithm that produces a dataset of species taxonomic information for a set of target species by retrieving these from major data providers including Catalogue of Life, OBIS, WoRMS. It accepts as input a list of species names and parameters including the data provider to use and query expansion criteria. It produces a DarwinCore file with the occurrences. It produces <output>. <limitation>. For more information see: <citation/ref>
Type Transducers
Execution Single Machine
HCAF_FILTER
Description An algorithm producing an HCAF table on a selected Bounding Box (default identifies Indonesia).
A transducer algorithm that produces a version of a Half-degree Cells Authority File (HCAF) dataset with environmental parameters to be used by the AquaMaps approach for a target area. It accepts as input the table and the bounding box representing the target area. It produces <output>. <limitation>. For more information see: <citation/ref>
Type Transducers
Execution Single Machine
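Conceptually, restricting an HCAF table to a bounding box amounts to keeping the half-degree cells whose centre falls inside the box. The Python sketch below shows this on rows represented as dictionaries; the column names centerlat and centerlong, the function name and the sample values are assumptions made for illustration, and boxes crossing the antimeridian are not handled.

```python
def filter_by_bounding_box(rows, min_lat, max_lat, min_lon, max_lon):
    """Keep the rows whose cell centre lies inside the bounding box (inclusive)."""
    return [
        row
        for row in rows
        if min_lat <= row["centerlat"] <= max_lat
        and min_lon <= row["centerlong"] <= max_lon
    ]


# Illustrative rows only; a real HCAF table carries many more columns.
rows = [
    {"cell": "a", "centerlat": -2.25, "centerlong": 117.75},
    {"cell": "b", "centerlat": 45.25, "centerlong": 12.75},
    {"cell": "c", "centerlat": -8.75, "centerlong": 140.25},
]
selected = filter_by_bounding_box(rows, min_lat=-12.0, max_lat=6.0,
                                  min_lon=95.0, max_lon=141.0)
print([row["cell"] for row in selected])  # ['a', 'c']
```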
HCAF_INTERPOLATION
Description Evaluates the impact of climatic changes on species presence.
A transducer algorithm that generates, by interpolation, a number of Half-degree Cells Authority File (HCAF) datasets with environmental parameters to be used by the AquaMaps approach. It accepts as input the HCAF table representing the starting case, the HCAF table representing the ending case and parameters affecting interpolation, such as the number of tables to produce and the interpolation function to use, e.g. linear, parabolic. It produces <output>. <limitation>. For more information see: <citation/ref>
Type Transducers
Execution Single Machine
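The linear case of this interpolation can be sketched as follows: given a starting and an ending table, each intermediate table holds, for every cell and parameter, a weighted average of the two endpoint values. The Python sketch below is illustrative only; the nested-dictionary table layout, the parameter name sst, the function name and the choice to exclude the two endpoint tables from the output are assumptions.

```python
def interpolate_tables(start, end, n_tables):
    """Produce n_tables intermediate tables by linear interpolation.

    Tables map cell id -> {parameter name -> value}; both tables are assumed
    to contain the same cells and parameters.
    """
    tables = []
    for step in range(1, n_tables + 1):
        w = step / (n_tables + 1)  # weight of the ending table
        tables.append({
            cell: {
                name: (1 - w) * start[cell][name] + w * end[cell][name]
                for name in params
            }
            for cell, params in start.items()
        })
    return tables


start = {"c1": {"sst": 10.0}}
end = {"c1": {"sst": 14.0}}
for table in interpolate_tables(start, end, n_tables=3):
    print(table["c1"]["sst"])  # 11.0, then 12.0, then 13.0
```

A parabolic variant would replace the weight w with a non-linear function of the step; its exact form is not specified here and is therefore left out.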
HRS
Description An evaluator algorithm that calculates the Habitat Representativeness Score, i.e. an indicator assessing whether a specific survey coverage, or another environmental features dataset, contains data that are representative of all available habitat variable combinations in an area.
An evaluator algorithm that calculates the Habitat Representativeness Score, i.e. an indicator assessing whether a specific survey coverage contains data that are representative of all available habitat variable combinations in an area. It accepts as input the target area, a table with positive cases and a table with negative cases. It produces <output>. <limitation>. For more information see: Colin D. MacLeod (2010). Habitat representativeness score (HRS): a novel concept for objectively assessing the suitability of survey coverage for modelling the distribution of marine species. Journal of the Marine Biological Association of the United Kingdom, 90, pp 1269-1277. doi:10.1017/S0025315410000408.
Type Evaluators
Execution Single Machine
HSPEN
Description The AquaMaps HSPEN algorithm. A modeling algorithm that generates a table containing species envelopes (HSPEN), i.e. models capturing species tolerance with respect to environmental parameters, to be used by the AquaMaps approach.
A modeling algorithm that generates a table containing species envelopes (HSPEN), i.e. models capturing species tolerance with respect to environmental parameters, to be used by the AquaMaps approach. It accepts as input a starting version of the HSPEN table, a table containing a Half-degree Cells Authority File (HCAF) dataset with environmental parameters, and a table containing species occurrence data in half-degree cells. It produces <output>. <limitation>. For more information see: Kesner-Reyes, K., K. Kaschner, S. Kullander, C. Garilao, J. Barile, and R. Froese. 2012. AquaMaps: algorithm and data sources for aquatic organisms. In: Froese, R. and D. Pauly. Editors. 2012. FishBase. World Wide Web electronic publication. www.fishbase.org, version (04/2012).
Type Models
Execution Single Machine
HSPEN_FILTER
Description An algorithm producing an HSPEN table containing only the selected species.
A transducer algorithm that generates a table containing species envelopes (HSPEN), i.e. models capturing species tolerance with respect to environmental parameters, to be used by the AquaMaps approach for a set of target species. It accepts as input a starting version of the HSPEN table and a list of target species. It produces <output>. <limitation>. For more information see: Kesner-Reyes, K., K. Kaschner, S. Kullander, C. Garilao, J. Barile, and R. Froese. 2012. AquaMaps: algorithm and data sources for aquatic organisms. In: Froese, R. and D. Pauly. Editors. 2012. FishBase. World Wide Web electronic publication. www.fishbase.org, version (04/2012).
Type Transducers
Execution Single Machine
TIMEEXTRACTION
Description An algorithm to extract a time series of values associated with a geospatial features repository (e.g. NetCDF, ASC, GeoTIFF files, etc.). The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and possibly the spectrogram.
Type Transducer
Execution Single machine
ZETAEXTRACTION_TABLE
Description An algorithm to extract a time series of values associated with a table containing geospatial information. The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and possibly the spectrogram.
Type Transducer
Execution Single machine