Difference between revisions of "Legacy applications Biological Observations Enrichment"


Revision as of 11:59, 12 June 2014

1. Main Goal :

We aim to add values of environmental parameters to existing biological observations. The environmental values are taken from datasets which are coverages managed in multi-dimensional raster data formats (netCDF-CF, HDF or any GDAL data format). The biological observations are georeferenced with various geometries / Features (points, lines, polygons…) and managed in vector formats (in a spatial RDBMS or any OGR data format, such as shapefiles). This process therefore has to deal with the usual rasterization and vectorization issues, which are not obvious for many scientists in ecology. Depending on the data sources, it also has to handle cases where the coverages expected to supply the additional environmental observations are not available for some dates and locations (cloud cover…). When a coverage is lacking, the data search has to be extended to a set of coverages within a given spatio-temporal frame (through buffers). Moreover, depending on the data types, the users and the kinds of biological and environmental parameters, two things will differ: the spatio-temporal distance between Features and Coverages used to collect a set of raster cells, and the statistical methods that can be applied to calculate missing values. The ability to tune the execution of each process with a set of input parameters is therefore key. However, default methods are provided for users who do not know how to manage the underlying spatial analysis.

We will enrich, step by step, a process (written in R) structured as a set of functions in charge of: (1) rasterization / vectorization issues; (2) temporal and spatial buffers to collect additional values (populations of pixels) when the expected ones are missing; (3) statistical methods to analyze the characteristics of the population of pixels returned by function 2 and to calculate a set of values (mean value, standard deviation…).

Figure 1 gives an example of biological observations enriched with environmental parameters. In this case, points describing fishing operations are extracted from the BALBAYA database, which is managed in a spatial RDBMS (Postgres & PostGIS). The user provides the URL giving access to a netCDF-CF file (through the OPeNDAP protocol) to extract values of the SST variable.

Figure 1 : biological observations enrichment: SQL extraction from a fisheries database and enrichment with SST values from a netCDF-CF (OPeNDAP).

In the next sections, we describe the options available to collect environmental parameters in different ways and to apply different kinds of calculation methods (mean value with or without weighting, error bar…).
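
As an illustration of the kind of calculation mentioned above, here is a minimal sketch of a mean and standard deviation computed over a population of pixel values, with or without weights. It is written in Python for illustration only and is not taken from the R script. The function name `weighted_stats`, the handling of NaN values and the use of a population (not sample) standard deviation are assumptions.

```python
import math


def weighted_stats(values, weights=None):
    """Return (mean, standard deviation) of pixel values, skipping NaN.

    `weights` is an optional list of per-pixel weights (hypothetical input,
    e.g. the share of each pixel covered by the observation area).
    Without weights, every pixel counts equally.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Keep only pixels that hold a value and carry a positive weight.
    pairs = [(v, w) for v, w in zip(values, weights)
             if not math.isnan(v) and w > 0]
    if not pairs:
        return (math.nan, math.nan)
    total = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total
    # Weighted population variance around the weighted mean.
    variance = sum(w * (v - mean) ** 2 for v, w in pairs) / total
    return (mean, math.sqrt(variance))
```

Calling `weighted_stats(values)` gives the unweighted case, and passing a second list gives the weighted one, so both variants share the same code path.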


2. Default options of the process

2.1 Default function for temporal buffer:

By default, this R script executes a simple temporal buffer, 'ClosestTime', which collects, for each geometry related to a biological observation (supplied by users), the set of coverages (datasets) whose temporal distance (in given time units, days by default) to that observation is the smallest.

Figure 2 : Selection of coverages (from netCDF-CF) which are the closest in time for a given set of features related to biological observations attributes.
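
The closest-in-time selection can be sketched as follows. This is a Python illustration of the idea, not the code of the 'ClosestTime' function in the R script. The function name `closest_time`, its parameters and the choice to return every coverage tied for the smallest distance are assumptions.

```python
from datetime import datetime

# Supported time units (assumption: only these two are shown here).
SECONDS_PER_UNIT = {"days": 86400.0, "hours": 3600.0}


def closest_time(observation_time, coverage_times, unit="days"):
    """Return the indices of the coverages closest in time to an observation.

    `coverage_times` is the list of time steps available in the dataset.
    Several indices are returned when coverages are equally close.
    """
    if not coverage_times:
        return []
    scale = SECONDS_PER_UNIT[unit]
    # Absolute temporal distance between the observation and each coverage.
    distances = [abs((t - observation_time).total_seconds()) / scale
                 for t in coverage_times]
    smallest = min(distances)
    return [i for i, d in enumerate(distances) if d == smallest]


# Hypothetical example: one observation and four available time steps.
observation = datetime(2014, 6, 12)
steps = [datetime(2014, 6, 1), datetime(2014, 6, 10),
         datetime(2014, 6, 14), datetime(2014, 7, 1)]
selected = closest_time(observation, steps)
```

In this example the second and third time steps are both two days away from the observation, so both are selected.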


2.2 Default function for spatial buffer:

Whatever the type of feature, the location of each biological observation has to be rasterized in order to extract the values of environmental parameters from the proper cells of the coverages, which are stored in netCDF files. To achieve this, we use different spatial analysis functions according to the type of geometry of each feature (points, lines, polygons).

Figure 3 : Rasterization of a geometry (feature location of the biological observation) to get corresponding pixels in raster mode.
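
For the simplest geometry, a point, rasterization amounts to finding the grid cell that contains the point. The sketch below is a Python illustration under stated assumptions: a regular grid with square cells, described by its lower-left corner (`lon_min`, `lat_min`) and its `cell_size`, with rows counted from the south. None of these names come from the R script, and real netCDF grids may be ordered differently.

```python
import math


def point_to_cell(lon, lat, lon_min, lat_min, cell_size, n_rows, n_cols):
    """Return the (row, col) of the grid cell containing a point.

    Returns None when the point falls outside the grid.
    Row 0 is the southernmost row and column 0 the westernmost column.
    """
    col = math.floor((lon - lon_min) / cell_size)
    row = math.floor((lat - lat_min) / cell_size)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return (row, col)
    return None


# Hypothetical global grid with 1 degree cells.
cell = point_to_cell(55.4, -4.6, lon_min=-180.0, lat_min=-90.0,
                     cell_size=1.0, n_rows=180, n_cols=360)
```

Lines and polygons need more work because they can cross several cells, which is why a different function is used for each geometry type.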


2.2.1 Method for biological observation locations given by points

In this method, we first look up the location (“longitude – latitude”) of the points related to biological observations and we harvest the environmental data values which are:

  • either exactly at this location (matching cells in the raster file) as shown in Figure 2,
  • or, when the matching cells hold no values (NaN), in a search area enlarged by using different kinds of spatial buffer:
    • the default spatial buffer is illustrated in Figure 4: we collect the values of the 4 closest pixels,
    • alternative methods for spatial buffers are described in section 3.2.1.

Figure 4: Default spatial buffer when geometry type is "POINT".
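
The point method can be sketched as a lookup with a fallback. This is a Python illustration, not the R implementation. It assumes that the "4 closest pixels" are the four cells sharing an edge with the matching cell, which the text does not state explicitly. The function name `values_at_point` and the grid layout (a list of rows) are also assumptions.

```python
import math


def values_at_point(grid, row, col):
    """Return the pixel values to use for a point located in cell (row, col).

    If the matching cell holds a value, that value alone is returned.
    Otherwise the valid values among the 4 neighbouring cells are returned
    (assumption: neighbours are the cells above, below, left and right).
    """
    value = grid[row][col]
    if not math.isnan(value):
        return [value]
    neighbours = [(row - 1, col), (row + 1, col),
                  (row, col - 1), (row, col + 1)]
    found = []
    for r, c in neighbours:
        # Skip neighbours outside the grid and neighbours without a value.
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            if not math.isnan(grid[r][c]):
                found.append(grid[r][c])
    return found
```

The returned list can be empty when neither the matching cell nor its neighbours hold a value. In that case a wider buffer would be needed.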

2.2.2 Method for biological observation locations given by polygons

For this type of geometry, we extract a population of pixels / cells located in the rasterized areas of the biological observations. To achieve this, we aim to propose different methods:

  • default method (illustrated in Figure 5): the default spatial analysis collects the pixels which are within or overlapping the observation area. For each cell of this pixel population, a factor P (the weight of the pixel) is returned:
    • P = 1 for pixels within the polygon.
    • P ∊ ]0,1[ for pixels only overlapping the polygon.
  • alternative methods for spatial buffers are described in section 3.2.2


Figure 5: Default spatial buffer when geometry type is "POLYGON".
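
One possible way to obtain such a factor P is sketched below in Python. The text does not say how P is computed, so taking P as the share of the cell area covered by the polygon is an assumption, and so is estimating it by sampling a small grid of points inside each cell. The function names and the grid description (lower-left corner, square cells) are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cell_weights(polygon, x_min, y_min, cell_size, n_rows, n_cols, samples=10):
    """Return {(row, col): P} for every cell with P > 0.

    P is estimated as the fraction of samples x samples points,
    regularly spread inside the cell, that fall inside the polygon.
    """
    weights = {}
    for row in range(n_rows):
        for col in range(n_cols):
            hits = 0
            for i in range(samples):
                for j in range(samples):
                    x = x_min + (col + (i + 0.5) / samples) * cell_size
                    y = y_min + (row + (j + 0.5) / samples) * cell_size
                    if point_in_polygon(x, y, polygon):
                        hits += 1
            if hits:
                weights[(row, col)] = hits / (samples * samples)
    return weights


# Hypothetical example: a square observation area over a 3 x 3 grid.
area = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]
p = cell_weights(area, x_min=0.0, y_min=0.0, cell_size=1.0, n_rows=3, n_cols=3)
```

With this approach, cells entirely within the polygon get P = 1 and cells only partly covered get a value strictly between 0 and 1. The estimate is approximate and depends on `samples`, so a very small overlap can be missed.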


2.2.3 Method for biological observation locations given by lines