Legacy applications Biological Observations Enrichment

From Gcube Wiki

Revision as of 12:34, 12 June 2014

1. Main Goal

We aim to enrich existing biological observations with values of environmental parameters. The biological observations are georeferenced with various geometries / Features (points, lines, polygons…) and managed in vector formats (in a spatial RDBMS or any OGR data format: shapefiles…). The environmental values are taken from datasets which are coverages managed in multi-dimensional raster data formats (netCDF-CF, HDF or any GDAL data format). The process therefore has to deal with the usual rasterization and vectorization issues, which are not straightforward for many ecologists.

Depending on the data sources, the process has to handle cases where the coverages expected to supply additional environmental observations are not available for some dates and locations (e.g. because of cloud cover). When a coverage is missing, the data search has to be extended to a set of coverages within a given spatio-temporal frame (through buffers). Moreover, the spatio-temporal distance between Features and Coverages used to collect a set of raster cells, as well as the statistical methods applied to calculate missing values, will differ according to the data types, the users, and the kinds of biological and environmental parameters. The ability to tune the execution of each process with a set of input parameters is therefore key. However, default methods are used for users who do not know how to set up the underlying spatial analysis.
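As a minimal sketch of the temporal-buffer fallback and of tunable parameters with defaults, the Python fragment below (the actual process is written in R) tries the observation date first, then neighbouring dates within a buffer, nearest first. The names `EnrichmentParams`, `candidate_dates` and `find_value` are hypothetical, and for illustration only each coverage is reduced to the single value it holds at the Feature's location (`None` when cloud cover hides it).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class EnrichmentParams:
    """Hypothetical tuning parameters; the defaults apply when the user sets nothing."""
    temporal_buffer_days: int = 2  # search up to N days before/after the observation date
    prefer_past: bool = True       # at equal distance, try the earlier date first


def candidate_dates(obs_date: date, params: EnrichmentParams) -> List[date]:
    """Dates to try, nearest first: the observation date, then +/-1 day, +/-2 days, ..."""
    dates = [obs_date]
    for k in range(1, params.temporal_buffer_days + 1):
        before, after = obs_date - timedelta(days=k), obs_date + timedelta(days=k)
        dates.extend([before, after] if params.prefer_past else [after, before])
    return dates


def find_value(
    obs_date: date,
    values_by_date: Dict[date, Optional[float]],
    params: Optional[EnrichmentParams] = None,
) -> Optional[Tuple[date, float]]:
    """Return (coverage date, value) for the nearest available coverage, or None."""
    params = params or EnrichmentParams()  # fall back to the default parameters
    for d in candidate_dates(obs_date, params):
        value = values_by_date.get(d)
        if value is not None:  # missing date or cloud-covered pixel: keep searching
            return d, value
    return None
```

With `{date(2014, 6, 10): None, date(2014, 6, 9): 27.4, date(2014, 6, 11): 27.9}`, an observation on 10 June gets the 9 June value with the defaults, and the 11 June value when `prefer_past=False`.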

We will incrementally enrich a process (written in R) structured as a set of functions in charge of: (1) rasterization / vectorization issues; (2) temporal and spatial buffers to collect additional values (populations of pixels) when the expected ones are missing; (3) statistical methods to analyze the characteristics of the population of pixels returned by function (2) and to calculate a set of values (mean value, standard deviation…).
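The first two functions can be sketched for a point Feature as follows, assuming a regular grid held in memory as a list of rows, with `None` for missing pixels. `point_to_cell`, `collect_pixels` and `pixels_for_point` are illustrative names, not the API of the R process, and the square buffer widened one ring of cells at a time is only one of several possible spatial buffers.

```python
import math
from typing import List, Optional, Tuple

# grid[row][col]; None marks a missing pixel (e.g. cloud cover)
Grid = List[List[Optional[float]]]


def point_to_cell(lon: float, lat: float, x0: float, y0: float, res: float) -> Tuple[int, int]:
    """Function (1) for a point Feature: (row, col) of the cell containing (lon, lat).

    Assumes square cells of size `res`, an upper-left corner at (x0, y0),
    and rows running from north to south.
    """
    col = math.floor((lon - x0) / res)
    row = math.floor((y0 - lat) / res)
    return row, col


def collect_pixels(grid: Grid, row: int, col: int, radius: int) -> List[Tuple[float, float]]:
    """Function (2): valid pixels within `radius` cells of (row, col),
    returned as (value, distance in cells) pairs."""
    population = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[r])
            if inside and grid[r][c] is not None:
                population.append((grid[r][c], math.hypot(r - row, c - col)))
    return population


def pixels_for_point(grid: Grid, lon: float, lat: float, x0: float, y0: float,
                     res: float, max_radius: int = 2) -> List[Tuple[float, float]]:
    """Use the pixel under the point when valid; otherwise widen the buffer one ring at a time."""
    row, col = point_to_cell(lon, lat, x0, y0, res)
    for radius in range(max_radius + 1):
        population = collect_pixels(grid, row, col, radius)
        if population:
            return population
    return []  # nothing valid within max_radius
```

Lines and polygons would need a different function (1), for example selecting every cell crossed by the line or whose centre falls inside the polygon.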

Figure 1 gives an example of biological observations enriched with environmental parameters. In this case, points describing fishing operations are extracted from the BALBAYA database, managed in a spatial RDBMS (PostgreSQL & PostGIS). The user provides the URL giving access to a netCDF-CF file (through the OPeNDAP protocol) from which values of the SST (sea surface temperature) variable are extracted.

Figure 1: Biological observations enrichment: SQL extraction from a fisheries database and enrichment with SST values from a netCDF-CF file (OPeNDAP).

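For the OPeNDAP case of Figure 1, only the cells around each fishing operation need to be transferred. The sketch below builds a DAP2 constraint expression (`var[start:stride:stop]`, with an inclusive stop) for such a subset. It assumes a variable named `sst` with dimensions ordered (time, lat, lon), ascending regular axes with a known origin and resolution, and a placeholder URL; none of these details come from the BALBAYA setup.

```python
import math


def index_of(value: float, origin: float, step: float) -> int:
    """Nearest index of `value` on an ascending regular axis starting at `origin`."""
    return math.floor((value - origin) / step + 0.5)


def opendap_subset_url(base_url: str, var: str, t_idx: int, lat: float, lon: float,
                       lat0: float, lon0: float, res: float, radius: int = 1) -> str:
    """DAP2 constraint selecting var[time][lat][lon] around a point.

    Each hyperslab is [start:stride:stop] with stop inclusive. Start indices are
    clamped at 0; stop indices are NOT clamped to the axis lengths here.
    """
    i = index_of(lat, lat0, res)
    j = index_of(lon, lon0, res)
    slabs = [
        (t_idx, t_idx),                      # a single time step
        (max(i - radius, 0), i + radius),    # latitude window
        (max(j - radius, 0), j + radius),    # longitude window
    ]
    constraint = var + "".join(f"[{a}:1:{b}]" for a, b in slabs)
    return f"{base_url}?{constraint}"
```

For example, `opendap_subset_url("http://example.org/opendap/sst.nc", "sst", 5, -10.0, 5.0, -90.0, -180.0, 0.25)` selects a 3 × 3 window of cells at time index 5. A real implementation would read the axis lengths and ordering from the dataset metadata before building the request.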

In the next sections, we describe the available options for collecting environmental parameters in different ways and for applying different calculation methods (mean value with or without weighting, error bars…).
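As an example of such calculation methods, the sketch below summarizes a population of pixels given as (value, distance) pairs: a mean with or without weighting, the standard deviation, and the standard error of the mean as an error bar. The inverse-distance weight `1 / (1 + d)` is an assumption, chosen so that the pixel under the Feature (distance 0) does not cause a division by zero; it is not a method prescribed by this page.

```python
import math
from typing import Dict, List, Tuple


def summarize(population: List[Tuple[float, float]], weighted: bool = False) -> Dict[str, float]:
    """Summary of (value, distance) pixels.

    weighted=True applies hypothetical inverse-distance weights 1 / (1 + d) to the mean.
    The standard deviation and error bar are computed on the unweighted values.
    """
    if not population:
        raise ValueError("empty pixel population")
    values = [v for v, _ in population]
    weights = [1.0 / (1.0 + d) if weighted else 1.0 for _, d in population]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)

    n = len(values)
    plain_mean = sum(values) / n
    # Sample standard deviation (n - 1 denominator); undefined for one pixel, reported as 0
    sd = math.sqrt(sum((v - plain_mean) ** 2 for v in values) / (n - 1)) if n > 1 else 0.0
    # Error bar: standard error of the unweighted mean
    error_bar = sd / math.sqrt(n)
    return {"n": n, "mean": mean, "sd": sd, "error_bar": error_bar}
```

For the pixels `[(10.0, 0.0), (12.0, 1.0), (14.0, 1.0)]`, the unweighted mean is 12.0 with a standard deviation of 2.0, while the weighted mean moves toward the central pixel (11.5).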