Difference between revisions of "Geospatial Data Mining"

From Gcube Wiki
 
(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Geospatial Data Mining is a set of facilities that aim to (i) compare two geospatial distributions, (ii) to retrieve spatiotemporal information from a remotely hosted geospatial layer and (iii) to perform data mining on geographical layers containing environmental information. Geospatial data mining is included in the '''EcologicalEngineGeoSpatialExtension''' library of the gCube framework, as it relies on the data mining processes contained in the '''EcologicalEngine''' library.
+
[[Category:gCube Spatial Data Infrastructure]][[Category:TO BE REMOVED]]
 +
 
 +
Geospatial Data Mining is a set of facilities that aim to (i) compare two spatial probability\quantities distributions, (ii) retrieve spatiotemporal information from a remotely hosted geospatial layer and (iii) perform data mining on geographical layers containing environmental information. Geospatial Data Mining is included in the '''EcologicalEngineGeoSpatialExtension''' library of the gCube framework, and relies on the data mining processes contained in the '''EcologicalEngine''' library.
  
 
== Overview ==
 
== Overview ==
Geospatial processing is useful in many applications of marine sciences, including Niche Modeling, Vessels Information processing, Ecological Modeling and Biodiversity monitoring. Environmental characteristics are usually put in the format of n-dimensional vector of real values. Such vectors must be as independent as possible in order to properly describe a phenomenon. Dependent vectors correspond to redundant information are not useful to automatic models. Geospatial processing includes procedures to retrieve environmental information in the format of n-dimensional vectors and the processing needed to evaluate the differences between two datasets or the degree of completeness of a single dataset. This can be essential in calculating the difference between the presence distributions of a certain species in two different years, in order to understand if the distribution is wider or narrower.
+
Geospatial processing is useful in many applications of marine sciences, including Niche Modeling, Vessels Information processing, Ecological Modeling and Biodiversity monitoring. Environmental characteristics are usually put in the format of n-dimensional vectors of real values. Such vectors must be independent in order to properly describe a phenomenon. Dependent vectors, in fact, correspond to redundant information and are not useful to automatic models. Geospatial processing includes procedures to retrieve environmental information in the format of n-dimensional vectors and the processing needed to evaluate the differences between two datasets or the degree of completeness of a single dataset. This can be essential in calculating the difference between the presence distributions of a certain species in two different years, in order to understand if the distribution is wider or narrower.
  
 
== Features ==
 
== Features ==
 
The features currently supported by the Geospatial data mining facilities include:
 
The features currently supported by the Geospatial data mining facilities include:
  
* Environmental layers indexing on a GeoNetwork instance, with respect to the ISO19115:2003 specifications.
+
* Environmental layers indexing on a GeoNetwork instance, with respect to the ISO19115:2003 specifications;
* Retrieval of environmental parameters information associated to a coordinates triple. Such information is given according to the time instants included in the layer.
+
* Retrieval of environmental parameters information associated to a coordinates triple. Such information is given according to the time instants included in a layer;
* Retrieval of environmental parameters values associates to a set of points and an a time instant.
+
* Retrieval of environmental parameters values associates to a set of points and to a time instant;
* Automatic simulation of values in the points in which information is not defined.
+
* Automatic simulation of values in the points in which information is not defined;
* Management of WFS and OpenDap based layers in seamless way to the library users.
+
* Management of WFS and OpenDAP based layers in seamless way to the library users.
  
 
== Software ==
 
== Software ==
Line 24: Line 26:
 
</source>
 
</source>
  
An example to call the spectrogram analysis with STFT and produce the chart is:  
+
An example to call the retrieval of several layers metadata which are stored on a thredds instance in a gCube based e-infrastructure:  
  
 
<source lang="java">
 
<source lang="java">
SignalConversions.spectrogram(name, signal, samplingRate, windowshift, frameslength, display)
+
FeaturesManager featurer = new FeaturesManager();
 +
featurer.setScope("gcube/devsec");
 +
List<Metadata> metadata = featurer.getAllGNInfobyText("thredds", "1");
 
</source>
 
</source>
  
Where the input variables are:
+
An example to retrieve a grid of values according to a certain bounding box:
 
<source lang="java">
 
<source lang="java">
String name: the title of the chart
+
String layertitle = "temperature";
double[] signal: the sequence of values representing the trend
+
GeoIntersector intersector = new GeoIntersector("gcube/devsec", "./cfg/");
int samplingRate: the sampling frequency in integer value and multiple of 2
+
//takes the values contained in a bounding box according to the last time recorded in the layer
int windowshift: the window shift of the STFT in samples
+
double[][] valuesgrid = intersector.takeLastTimeChunk(layertitle, -10, 10, -10, 10, 0,1, 1);
int frameslength: the length of each window in samples
+
boolean display: a flag to ask the procedure to run an applet which displays the spectrogram
+
 
</source>
 
</source>
  
An example which performs a signal reconstruction is:
+
An example to write Metadata in ISO19115:2003 format on a GeoNetwork instance, associated to a certain parameter contained in a NetCDF file:
 
+
<source lang="java">
+
AlgorithmConfiguration config = new AlgorithmConfiguration();
+
config.setConfigPath(configDir);
+
config.initRapidMiner();
+
SignalProcessing.fillSignal(signal)
+
</source>
+
 
+
where the input parameters are defined as follows:
+
 
+
 
<source lang="java">
 
<source lang="java">
double[] signal: the sequence of values representing the trend
+
NetCDFMetadata metadataInserter = new NetCDFMetadata();
String configDir: a configuration folder containing the configuration files required by the Ecological Engine library
+
metadataInserter.setGeonetworkUrl("http://geoserver-dev.d4science-ii.research-infrastructures.eu/geonetwork/");
 +
metadataInserter.setGeonetworkUser("username");
 +
metadataInserter.setGeonetworkPwd("password");
 +
metadataInserter.setThreddsCatalogUrl("http://thredds.research-infrastructures.eu:8080/thredds/catalog/public/netcdf/catalog.xml");
 +
metadataInserter.setLayerUrl("http://thredds.research-infrastructures.eu:8080/thredds/dodsC/public/netcdf/04091217_ruc.nc");
 +
metadataInserter.setTitle("temperature (04091217ruc.nc)");
 +
//parameter name as defined in the NetCDF file
 +
metadataInserter.setLayerName("T");
 +
metadataInserter.setSourceFileName("04091217_ruc.nc");
 +
metadataInserter.setAbstractField("T: temperature (degK) from 04091217ruc.nc resident on a THREDDS instance");
 +
metadataInserter.setResolution(0.5);
 +
metadataInserter.setXLeftLow(-180);
 +
metadataInserter.setYLeftLow(-85.5);
 +
metadataInserter.setXRightUpper(180);
 +
metadataInserter.setYRightUpper(85.5);
 +
metadataInserter.insertMetaData();
 
</source>
 
</source>
  
The cfg directory and the Ecological Engine library are accessible at this svn link: http://svn.research-infrastructures.eu/d4science/gcube/trunk/data-analysis/EcologicalEngine
+
The cfg directory and the Ecological Engine GeoSpatial Extension library are accessible at this svn link: https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/EcologicalEngineGeoSpatialExtension
  
 
== Experiments ==
 
== Experiments ==
 +
The complete list of environmental parameters, currently produced by the Geospatial Data Mining facilities and indexed on the D4Science e-infrastructure Geonetwork, can be found at the following link: [http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d280656/ListofAvailableEnvironmentalParameters.xlsx ListofAvailableEnvironmentalParameters]

Latest revision as of 19:16, 6 July 2016


Geospatial Data Mining is a set of facilities that aim to (i) compare two spatial distributions of probabilities or quantities, (ii) retrieve spatiotemporal information from a remotely hosted geospatial layer and (iii) perform data mining on geographical layers containing environmental information. Geospatial Data Mining is included in the EcologicalEngineGeoSpatialExtension library of the gCube framework, and relies on the data mining processes contained in the EcologicalEngine library.

Overview

Geospatial processing is useful in many applications of marine sciences, including Niche Modeling, Vessels Information processing, Ecological Modeling and Biodiversity monitoring. Environmental characteristics are usually represented as n-dimensional vectors of real values. Such vectors must be independent in order to describe a phenomenon properly: dependent vectors carry redundant information and are not useful to automatic models. Geospatial processing includes procedures to retrieve environmental information as n-dimensional vectors, together with the processing needed to evaluate the differences between two datasets or the degree of completeness of a single dataset. This is essential, for example, when calculating the difference between the presence distributions of a certain species in two different years, in order to understand whether the distribution has become wider or narrower.
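
The comparison of two presence distributions can be illustrated with a minimal sketch. It assumes that both distributions are already available as probability grids of the same shape and that undefined cells are marked with NaN. The class name, the method names and the 0.5 presence threshold are hypothetical and are not part of the gCube libraries.

```java
/** Illustrative helper, not part of the EcologicalEngine libraries. */
class DistributionComparison {

    /** Counts the cells whose probability reaches the threshold; NaN cells are skipped. */
    static int occupiedCells(double[][] grid, double threshold) {
        int count = 0;
        for (double[] row : grid) {
            for (double v : row) {
                if (!Double.isNaN(v) && v >= threshold) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Mean absolute difference over the cells defined in both grids; NaN if there are none. */
    static double meanAbsoluteDifference(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("grids differ in row count");
        }
        double sum = 0;
        int n = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i].length != b[i].length) {
                throw new IllegalArgumentException("grids differ in column count at row " + i);
            }
            for (int j = 0; j < a[i].length; j++) {
                if (Double.isNaN(a[i][j]) || Double.isNaN(b[i][j])) {
                    continue; // compare only cells defined in both years
                }
                sum += Math.abs(a[i][j] - b[i][j]);
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        // Made-up presence probabilities for the same area in two different years.
        double[][] year1 = {{0.9, 0.2}, {0.1, 0.0}};
        double[][] year2 = {{0.8, 0.7}, {0.6, 0.0}};
        int before = occupiedCells(year1, 0.5);
        int after = occupiedCells(year2, 0.5);
        String trend = after > before ? "wider" : after < before ? "narrower" : "unchanged";
        System.out.println("Distribution is " + trend);
        System.out.println("Mean absolute difference: " + meanAbsoluteDifference(year1, year2));
    }
}
```

Counting the cells above a threshold gives a rough measure of extent, while the mean absolute difference summarizes how much the values changed cell by cell; other measures may suit a given study better.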

Features

The features currently supported by the Geospatial Data Mining facilities include:

  • Indexing of environmental layers on a GeoNetwork instance, according to the ISO 19115:2003 specification;
  • Retrieval of the environmental parameter information associated with a coordinate triple. This information is given for the time instants included in a layer;
  • Retrieval of the environmental parameter values associated with a set of points and a time instant;
  • Automatic simulation of values at the points where information is not defined;
  • Management of WFS and OpenDAP based layers in a way that is seamless to the library users.
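
How values are simulated at undefined points is not described on this page. The sketch below shows only one simple possibility, which is an assumption and not the library's method: each undefined cell, marked here with NaN, receives the mean of its defined horizontal and vertical neighbours. The class and method names are hypothetical.

```java
/** Illustrative gap filling, not the algorithm used by the gCube libraries. */
class GapFiller {

    /** Returns a copy of the grid in which each NaN cell is replaced by the mean of its defined 4-neighbours. */
    static double[][] fillGaps(double[][] grid) {
        int rows = grid.length;
        double[][] filled = new double[rows][];
        int[][] offsets = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        for (int i = 0; i < rows; i++) {
            filled[i] = grid[i].clone();
            for (int j = 0; j < grid[i].length; j++) {
                if (!Double.isNaN(grid[i][j])) {
                    continue; // the cell is already defined
                }
                double sum = 0;
                int n = 0;
                for (int[] o : offsets) {
                    int r = i + o[0];
                    int c = j + o[1];
                    if (r < 0 || r >= rows || c < 0 || c >= grid[r].length) {
                        continue; // neighbour lies outside the grid
                    }
                    // Read from the original grid so that filled values do not propagate.
                    if (!Double.isNaN(grid[r][c])) {
                        sum += grid[r][c];
                        n++;
                    }
                }
                if (n > 0) {
                    filled[i][j] = sum / n;
                }
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        double[][] grid = {{1.0, Double.NaN, 3.0}, {4.0, 5.0, 6.0}};
        double[][] filled = fillGaps(grid);
        System.out.println("Filled value: " + filled[0][1]);
    }
}
```

Cells without any defined neighbour stay undefined in this sketch, and the input grid is left unchanged.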

Software

The software is available on the gCube Maven repository and can be used by adding the following dependency to the pom.xml file:

<dependency>
  <groupId>org.gcube.dataanalysis</groupId>
  <artifactId>ecological-engine-geospatial-extensions</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

The following example retrieves the metadata of several layers stored on a THREDDS instance in a gCube-based e-infrastructure:

FeaturesManager featurer = new FeaturesManager();
featurer.setScope("gcube/devsec");
List<Metadata> metadata = featurer.getAllGNInfobyText("thredds", "1");

The following example retrieves a grid of values within a given bounding box:

String layertitle = "temperature";
GeoIntersector intersector = new GeoIntersector("gcube/devsec", "./cfg/");
// takes the values contained in the bounding box at the last time instant recorded in the layer
double[][] valuesgrid = intersector.takeLastTimeChunk(layertitle, -10, 10, -10, 10, 0,1, 1);
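
Once a grid such as valuesgrid is available, it can be summarized before further processing. The following sketch is independent of the gCube libraries: it works on a hand-written array standing in for the retrieved grid and assumes that undefined cells are encoded as NaN, which this page does not state. The class and method names are hypothetical.

```java
/** Illustrative summary of a grid of values, not part of the gCube libraries. */
class GridSummary {

    /** Returns {min, max, mean, definedCount} computed over the non-NaN cells. */
    static double[] summarize(double[][] grid) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        int n = 0;
        for (double[] row : grid) {
            for (double v : row) {
                if (Double.isNaN(v)) {
                    continue; // skip undefined cells
                }
                min = Math.min(min, v);
                max = Math.max(max, v);
                sum += v;
                n++;
            }
        }
        if (n == 0) {
            return new double[] {Double.NaN, Double.NaN, Double.NaN, 0};
        }
        return new double[] {min, max, sum / n, n};
    }

    public static void main(String[] args) {
        // Stand-in for a grid returned by a layer query.
        double[][] valuesgrid = {{10.0, 12.0}, {Double.NaN, 14.0}};
        double[] s = summarize(valuesgrid);
        System.out.println("min=" + s[0] + " max=" + s[1] + " mean=" + s[2] + " defined=" + (int) s[3]);
    }
}
```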

The following example writes metadata in ISO 19115:2003 format to a GeoNetwork instance, for a given parameter contained in a NetCDF file:

NetCDFMetadata metadataInserter = new NetCDFMetadata();
metadataInserter.setGeonetworkUrl("http://geoserver-dev.d4science-ii.research-infrastructures.eu/geonetwork/");
metadataInserter.setGeonetworkUser("username");
metadataInserter.setGeonetworkPwd("password");
metadataInserter.setThreddsCatalogUrl("http://thredds.research-infrastructures.eu:8080/thredds/catalog/public/netcdf/catalog.xml");
metadataInserter.setLayerUrl("http://thredds.research-infrastructures.eu:8080/thredds/dodsC/public/netcdf/04091217_ruc.nc");
metadataInserter.setTitle("temperature (04091217ruc.nc)");
//parameter name as defined in the NetCDF file
metadataInserter.setLayerName("T");
metadataInserter.setSourceFileName("04091217_ruc.nc");
metadataInserter.setAbstractField("T: temperature (degK) from 04091217ruc.nc resident on a THREDDS instance");
metadataInserter.setResolution(0.5);
metadataInserter.setXLeftLow(-180);
metadataInserter.setYLeftLow(-85.5);
metadataInserter.setXRightUpper(180);
metadataInserter.setYRightUpper(85.5);
metadataInserter.insertMetaData();
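
The bounding box and resolution passed above determine the size of the described grid. As a worked check, the sketch below validates such values and derives the grid dimensions. It is a hypothetical helper, not part of NetCDFMetadata, and whether the library performs similar checks itself is not stated on this page.

```java
/** Illustrative bounding box arithmetic, not part of the gCube libraries. */
class BoundingBoxGrid {

    /** Rejects non-positive resolutions, inverted boxes and coordinates outside the usual longitude and latitude ranges. */
    static void validate(double xLeftLow, double yLeftLow, double xRightUpper, double yRightUpper, double resolution) {
        if (resolution <= 0) {
            throw new IllegalArgumentException("resolution must be positive");
        }
        if (xLeftLow >= xRightUpper || yLeftLow >= yRightUpper) {
            throw new IllegalArgumentException("lower-left corner must lie below and left of the upper-right corner");
        }
        if (xLeftLow < -180 || xRightUpper > 180 || yLeftLow < -90 || yRightUpper > 90) {
            throw new IllegalArgumentException("coordinates outside [-180, 180] x [-90, 90]");
        }
    }

    /** Number of cells along one axis for the given extent and resolution. */
    static long cells(double low, double high, double resolution) {
        return Math.round((high - low) / resolution);
    }

    public static void main(String[] args) {
        // Same values as in the metadata example: 0.5 degree cells over [-180, 180] x [-85.5, 85.5].
        validate(-180, -85.5, 180, 85.5, 0.5);
        long columns = cells(-180, 180, 0.5);
        long rows = cells(-85.5, 85.5, 0.5);
        System.out.println(columns + " columns x " + rows + " rows");
    }
}
```

With the values used in the metadata example, the longitude extent of 360 degrees divided by 0.5 gives 720 columns, and the latitude extent of 171 degrees divided by 0.5 gives 342 rows.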

The cfg directory and the Ecological Engine GeoSpatial Extension library are available from the following SVN location: https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/EcologicalEngineGeoSpatialExtension

Experiments

The complete list of environmental parameters currently produced by the Geospatial Data Mining facilities and indexed on the GeoNetwork of the D4Science e-infrastructure can be found at the following link: ListofAvailableEnvironmentalParameters (http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d280656/ListofAvailableEnvironmentalParameters.xlsx)