Kernel density

From Gcube Wiki

Latest revision as of 18:27, 6 July 2016


In statistics, Kernel Density Estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. In our case, the variable is two-dimensional: it represents the occurrence points of a given species within a given date range.
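As a minimal illustration of what KDE computes for two-dimensional occurrence points, the Python sketch below evaluates a fixed-bandwidth Gaussian kernel estimate at any location. It is not the IRD R implementation used by the service. The Gaussian kernel, the single bandwidth shared by both axes and the function name `gaussian_kde_2d` are all assumptions made for this example.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return f(x, y), a Gaussian kernel density estimate built from 2-D points.

    Assumption: one fixed bandwidth h for both axes, so that
    f(x, y) = 1 / (n * 2*pi*h^2) * sum_i exp(-((x - xi)^2 + (y - yi)^2) / (2*h^2))
    """
    if not points:
        raise ValueError("at least one occurrence point is required")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for xi, yi in points:
            # Squared distance from the evaluation point to this occurrence.
            d2 = (x - xi) ** 2 + (y - yi) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density
```

With a single point at the origin and bandwidth 1, the estimate at the origin equals 1/(2π) ≈ 0.159. The bandwidth strongly shapes the result, and in practice it is usually chosen by a selection rule rather than fixed by hand.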

The WPS service associated with this algorithm is called Species Occurrences Kernel Density Estimation (SOKDE).


Overview

The goal of this WPS service is to offer a functional, scalable and OGC-compliant tool that takes a set of species and a temporal range and generates one Shapefile per species, representing the KDE of that species' occurrence points. To do this, the service first retrieves the occurrence points from each species' scientific name, then applies the KDE to those points. This is done for each species in the set.

The KDE algorithm, written in R, is provided by the Institut de recherche pour le développement (IRD). It takes an occurrence file and a set of percentages, and generates a shapefile containing the contour polygons for each given percentage of the estimated probability density.
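This page does not state how the percentages become polygons. One common interpretation, assumed here, is the highest-density region. For a percentage p, the contour is drawn at the density level t such that the area where the density is at least t holds p% of the total estimated mass. The hypothetical helper `contour_thresholds` below computes these levels from densities sampled on a regular grid. Tracing the actual polygons at each level is left to a contouring library.

```python
def contour_thresholds(grid_values, percentages):
    """Map each percentage p to the density level t such that grid cells with
    density >= t hold at least p% of the total estimated mass.

    Assumes a regular grid: every cell has the same area, so a cell's mass is
    proportional to its density value.
    """
    # Visit cells from the densest to the sparsest.
    values = sorted((v for v in grid_values if v > 0), reverse=True)
    total = sum(values)
    if total == 0:
        raise ValueError("density grid is empty or all zero")
    thresholds = {}
    for p in percentages:
        if not 0 < p <= 100:
            raise ValueError(f"percentage out of range: {p}")
        target = total * p / 100.0
        cumulative = 0
        for v in values:
            cumulative += v
            if cumulative >= target:
                thresholds[p] = v
                break
        else:
            # Guard against rounding leaving the target just above the total.
            thresholds[p] = values[-1]
    return thresholds
```

A larger percentage yields a lower density level, and therefore a wider polygon. Under this interpretation, the 98% contour encloses the 25% one.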

Occurrence points are generated from the species names and date range by Species Product Discovery Tools, a CLI application that produces an occurrence CSV file using the Species Product Discovery Service.
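The columns of the occurrence CSV produced by Species Product Discovery Tools are not documented here. The sketch below assumes the file has a header row with longitude and latitude columns. Under that assumption, the points could be loaded for a KDE step as shown. The default names `longitude` and `latitude` are placeholders, not the tool's actual header.

```python
import csv
import io

def read_occurrences(csv_text, lon_col="longitude", lat_col="latitude"):
    """Parse occurrence rows into (lon, lat) float pairs, skipping incomplete rows.

    The column names are hypothetical defaults; pass the real header names
    of the occurrence file instead.
    """
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = (row.get(lon_col) or "").strip()
        lat = (row.get(lat_col) or "").strip()
        if not lon or not lat:
            continue  # no usable coordinates on this row
        points.append((float(lon), float(lat)))
    return points
```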

The whole process is parallelized across all the given species, following the Wps-Hadoop specifications.
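Wps-Hadoop distributes the per-species work over a Hadoop cluster. As a local analogy only, the sketch below fans the same per-species job out over a thread pool. Here `process_one` is a hypothetical callable standing in for "retrieve occurrences, then run the KDE" for a single species.

```python
from concurrent.futures import ThreadPoolExecutor

def run_per_species(species_list, process_one, max_workers=4):
    """Call process_one(index, species) for every species concurrently.

    Each species is handled independently; results are returned in the
    same order as species_list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_one, i, s) for i, s in enumerate(species_list)]
        return [f.result() for f in futures]
```

Each species is processed independently, so the work parallelizes without any coordination between jobs.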


WPS Request Syntax

The process Identifier is com.terradue.wps_hadoop.processes.ird.KernelDensity.

The process description is obtained with the WPS DescribeProcess request:

<wps_host>
?service=wps
&version=1.0.0
&request=DescribeProcess
&identifier=com.terradue.wps_hadoop.processes.kernel_density.KernelDensity
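The same DescribeProcess request can be assembled programmatically with only the Python standard library. In this sketch, the function name `describe_process_url` is illustrative, and `wps_host` is whatever base URL the service is deployed at.

```python
from urllib.parse import urlencode

def describe_process_url(wps_host, identifier):
    """Build a WPS 1.0.0 DescribeProcess request URL (KVP encoding)."""
    query = urlencode({
        "service": "wps",
        "version": "1.0.0",
        "request": "DescribeProcess",
        "identifier": identifier,
    })
    return f"{wps_host}?{query}"
```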

The process is executed with the WPS Execute request:

<wps_host>
?Service=WPS
&version=1.0.0
&Request=execute
&identifier=com.terradue.wps_hadoop.processes.ird.KernelDensity
&DataInputs=<data_inputs>
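For Execute, the `DataInputs` value is a `;`-separated list of `name=value` pairs, with a name repeated for multiple-valued inputs, as in the use cases below. The hypothetical helper `execute_url` percent-encodes each value, keeping `:` readable for ISO 8601 timestamps, and leaves the `;` and `=` separators literal. That the server accepts this encoding is an assumption based on the example requests on this page.

```python
from urllib.parse import quote

def execute_url(wps_host, identifier, inputs):
    """Build a WPS 1.0.0 Execute request URL using KVP encoding.

    inputs is a list of (name, value) pairs; repeat a name to pass a
    multiple-valued input such as species or percentages.
    """
    data_inputs = ";".join(
        f"{name}={quote(str(value), safe=':')}" for name, value in inputs
    )
    return (
        f"{wps_host}?service=WPS&version=1.0.0&request=Execute"
        f"&identifier={quote(identifier)}&DataInputs={data_inputs}"
    )
```

Keeping the inputs as an ordered list of pairs, rather than a dict, is what allows repeated names such as several `species` entries.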

Data Input Parameters

  • species: list of species, using scientific name (multiple, at least one required)
  • fromDate: lower limit of the temporal filter applied to the occurrences, in ISO 8601 date format (e.g. "1980-11-04T00:00Z") (single, optional)
  • toDate: upper limit of the temporal filter applied to the occurrences, in ISO 8601 date format (single, optional)
  • percentages: percentages of the total estimated density to be enclosed by the contour lines (multiple, optional, default={25, 50, 75, 80, 90, 95, 98})
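A client can check these inputs before sending a request. The sketch below applies the defaults listed above. It parses dates only in the `YYYY-MM-DDThh:mmZ` form used on this page, which is just one of many ISO 8601 variants. Restricting percentages to the open interval (0, 100) is an assumption, not a documented rule of the service. The function names are illustrative.

```python
from datetime import datetime, timezone

# Default contour percentages, as listed for the percentages input.
DEFAULT_PERCENTAGES = [25, 50, 75, 80, 90, 95, 98]

def parse_iso_minute(value):
    """Parse a UTC timestamp written like '1980-11-04T00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)

def normalize_inputs(species, from_date=None, to_date=None, percentages=None):
    """Validate the KDE inputs client-side and fill in the documented defaults."""
    if not species:
        raise ValueError("at least one species is required")
    start = parse_iso_minute(from_date) if from_date else None
    end = parse_iso_minute(to_date) if to_date else None
    if start is not None and end is not None and start > end:
        raise ValueError("fromDate must not be later than toDate")
    pcts = sorted(percentages) if percentages else list(DEFAULT_PERCENTAGES)
    if any(not 0 < p < 100 for p in pcts):  # open interval: an assumption
        raise ValueError("percentages must be strictly between 0 and 100")
    return {"species": list(species), "fromDate": start, "toDate": end, "percentages": pcts}
```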


WPS Response

Like all wps-hadoop-streaming processes, the KDE WPS process returns a complex_data object: an XML document with the following structure:

<streamingOutput>
	<algorithmName> algorithmName </algorithmName>
	<jobId> jobId </jobId>
	<executionResult>
		<inputData>
			<url> input data file url </url>
		</inputData>
		<outputData>
			<url> output data file url 1 </url>
			<url> output data file url 2 </url>
			...
			<url> output data file url n </url>
		</outputData>
	</executionResult>
	<executionResult>
		...
	</executionResult>
	<executionResult>
		...
	</executionResult>
	...
</streamingOutput>
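Every wps-hadoop-streaming process returns this same structure, so a client can read it generically. Below is a minimal standard-library sketch. It assumes the document is exactly as shown, with no XML namespace on `streamingOutput` or its children. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

def parse_streaming_output(xml_text):
    """Return (algorithm_name, job_id, results) from a <streamingOutput> document.

    results holds one dict per <executionResult>, with the input and output
    URLs listed in document order and surrounding whitespace stripped.
    """
    root = ET.fromstring(xml_text)
    algorithm = (root.findtext("algorithmName") or "").strip()
    job_id = (root.findtext("jobId") or "").strip()
    results = []
    for execution in root.findall("executionResult"):
        results.append({
            "input": [(u.text or "").strip() for u in execution.findall("inputData/url")],
            "output": [(u.text or "").strip() for u in execution.findall("outputData/url")],
        })
    return algorithm, job_id, results
```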


In particular, the KDE process returns this output:

<streamingOutput>
	<algorithmName>kernelDensity</algorithmName>
	<jobId> jobId </jobId>
	<executionResult>
		<inputData>
			<url> input data file url (values of species name, fromDate, toDate, percentages) </url>
		</inputData>
		<outputData>
			<url> occurrence file url </url>
			<url> shapefile url (tar.gz) </url>
		</outputData>
	</executionResult>
	<executionResult>
		...
	</executionResult>
	<executionResult>
		...
	</executionResult>
	...
</streamingOutput>
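The two output URLs of a KDE execution can be told apart by file extension, as in the example responses below (`occ.csv` and `output.tgz`). Matching on the extension rather than on position is a design choice of this hypothetical helper. It assumes those extensions stay stable.

```python
def classify_kde_outputs(output_urls):
    """Split a KDE execution's output URLs into the occurrence file and the
    shapefile archive, based on file extension. Missing entries are None."""
    occurrences = archive = None
    for url in output_urls:
        name = url.lower()
        if name.endswith(".csv"):
            occurrences = url
        elif name.endswith((".tgz", ".tar.gz")):
            archive = url
    return {"occurrences": occurrences, "shapefile_archive": archive}
```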


Use cases

Basic Example

Kernel Density of Carcharodon carcharias (White Shark), with no date filter and the default set of contour percentages.

Request:

wps01.i-marine.d4science.org/wps/WebProcessingService
?Service=WPS
&version=1.0.0
&Request=execute
&identifier=com.terradue.wps_hadoop.processes.ird.KernelDensity
&dataInputs=
species=Carcharodon carcharias;


Response:

<?xml version="1.0" encoding="UTF-8" ?>
<ns:ExecuteResponse xmlns:ns="http://www.opengis.net/wps/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://schemas.opengis.net/wps/1.0.0/wpsExecute_response.xsd" serviceInstance="http://wps01.i-marine.d4science.org:80/wps/WebProcessingService?REQUEST=GetCapabilities&amp;SERVICE=WPS"
xml:lang="en-US" service="WPS" version="1.0.0">
    <ns:Process ns:processVersion="1.0.0">
        <ns1:Identifier xmlns:ns1="http://www.opengis.net/ows/1.1">com.terradue.wps_hadoop.processes.kernel_density.KernelDensity</ns1:Identifier>
        <ows:Title xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink">Species Occurrences Kernel Density</ows:Title>
    </ns:Process>
    <ns:Status creationTime="2013-04-24T15:14:33.571+02:00">
        <ns:ProcessSucceeded>Process has succeeded</ns:ProcessSucceeded>
    </ns:Status>
    <ns:ProcessOutputs>
        <ns:Output>
            <ns1:Identifier xmlns:ns1="http://www.opengis.net/ows/1.1">result</ns1:Identifier>
            <ows:Title xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink">result</ows:Title>
            <ns:Data>
                <ns:ComplexData mimeType="application/xml">
                    <streamingOutput>
                        <algorithmName>kernelDensity</algorithmName>
                        <jobId>c95d212d-c7b5-490c-9ed0-e1696fe41c41</jobId>
                        <executionResult>
                            <inputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/c95d212d-c7b5-490c-9ed0-e1696fe41c41/output/files/exec0/inputData.txt</url>
                            </inputData>
                            <outputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/c95d212d-c7b5-490c-9ed0-e1696fe41c41/output/files/exec0/occ.csv</url>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/c95d212d-c7b5-490c-9ed0-e1696fe41c41/output/files/exec0/output.tgz</url>
                            </outputData>
                        </executionResult>
                    </streamingOutput>
                </ns:ComplexData>
            </ns:Data>
        </ns:Output>
    </ns:ProcessOutputs>
</ns:ExecuteResponse>


Input data contents:

species="Carcharodon carcharias", fromDate=, toDate=, percentages=25 50 75 80 90 95 98
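In a full ExecuteResponse such as the one above, the `streamingOutput` document sits inside the `ns:ComplexData` element, while the WPS wrapper elements live in the `http://www.opengis.net/wps/1.0.0` namespace. The sketch below first checks for `ProcessSucceeded`, then collects the output URLs of each execution. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"wps": "http://www.opengis.net/wps/1.0.0"}

def output_urls_from_execute_response(xml_text):
    """Return one list of output URLs per <executionResult> in an ExecuteResponse.

    Raises RuntimeError if the status is not ProcessSucceeded.
    """
    root = ET.fromstring(xml_text)
    if root.find("wps:Status/wps:ProcessSucceeded", NS) is None:
        raise RuntimeError("the WPS process did not succeed")
    # streamingOutput carries no namespace, unlike the wps:* wrapper elements.
    stream = root.find(".//wps:ComplexData/streamingOutput", NS)
    if stream is None:
        raise ValueError("no streamingOutput found in the response")
    return [
        [(u.text or "").strip() for u in execution.findall("outputData/url")]
        for execution in stream.findall("executionResult")
    ]
```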

Complex Example

Kernel Density of Amphiprion percula (Percula Clownfish), Thunnus atlanticus (Blackfin Tuna) and Architeuthis (Giant Squid), with dates from 1990-01-01 to 2005-01-01 and the set of contour percentages (25%, 50%, 75%, 90%).

Request:

wps01.i-marine.d4science.org/wps/WebProcessingService
?Service=WPS
&version=1.0.0
&Request=execute
&identifier=com.terradue.wps_hadoop.processes.ird.KernelDensity
&dataInputs=
species=Amphiprion percula;
species=Thunnus atlanticus;
species=Architeuthis;
fromDate=1990-01-01T00:00Z;
toDate=2005-01-01T00:00Z;
percentages=25;
percentages=50;
percentages=75;
percentages=90


Response:

<?xml version="1.0" encoding="UTF-8" ?>
<ns:ExecuteResponse xmlns:ns="http://www.opengis.net/wps/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://schemas.opengis.net/wps/1.0.0/wpsExecute_response.xsd" serviceInstance="http://wps01.i-marine.d4science.org:80/wps/WebProcessingService?REQUEST=GetCapabilities&amp;SERVICE=WPS"
xml:lang="en-US" service="WPS" version="1.0.0">
    <ns:Process ns:processVersion="1.0.0">
        <ns1:Identifier xmlns:ns1="http://www.opengis.net/ows/1.1">com.terradue.wps_hadoop.processes.kernel_density.KernelDensity</ns1:Identifier>
        <ows:Title xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink">Species Occurrences Kernel Density</ows:Title>
    </ns:Process>
    <ns:Status creationTime="2013-04-24T15:14:41.949+02:00">
        <ns:ProcessSucceeded>The service succesfully processed the request.</ns:ProcessSucceeded>
    </ns:Status>
    <ns:ProcessOutputs>
        <ns:Output>
            <ns1:Identifier xmlns:ns1="http://www.opengis.net/ows/1.1">result</ns1:Identifier>
            <ows:Title xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink">result</ows:Title>
            <ns:Data>
                <ns:ComplexData mimeType="application/xml">
                    <streamingOutput>
                        <algorithmName>kernelDensity</algorithmName>
                        <jobId>b5f1caa3-125b-4f0c-81a9-30dae0bb2873</jobId>
                        <executionResult>
                            <inputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec0/inputData.txt</url>
                            </inputData>
                            <outputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec0/occ.csv</url>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec0/output.tgz</url>
                            </outputData>
                        </executionResult>
                        <executionResult>
                            <inputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec1/inputData.txt</url>
                            </inputData>
                            <outputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec1/occ.csv</url>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec1/output.tgz</url>
                            </outputData>
                        </executionResult>
                        <executionResult>
                            <inputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec2/inputData.txt</url>
                            </inputData>
                            <outputData>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec2/occ.csv</url>
                                <url>http://wps01.i-marine.d4science.org:80/wps/store/b5f1caa3-125b-4f0c-81a9-30dae0bb2873/output/files/exec2/output.tgz</url>
                            </outputData>
                        </executionResult>
                    </streamingOutput>
                </ns:ComplexData>
            </ns:Data>
        </ns:Output>
    </ns:ProcessOutputs>
</ns:ExecuteResponse>


Input data contents (one for each of the three executions, i.e. one per species):

species="Architeuthis", fromDate=1990-01-01T00:00Z, toDate=2005-01-01T00:00Z, percentages=25 50 75 90

species="Amphiprion percula", fromDate=1990-01-01T00:00Z, toDate=2005-01-01T00:00Z, percentages=25 50 75 90

species="Thunnus atlanticus", fromDate=1990-01-01T00:00Z, toDate=2005-01-01T00:00Z, percentages=25 50 75 90
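The `inputData.txt` contents follow a simple `key=value` layout. That format is inferred only from the example lines above and is not formally specified on this page. On that basis, a hypothetical parser could look like this:

```python
import re

# key=value pairs; a value is either double-quoted or runs up to the next comma.
_FIELD = re.compile(r'(\w+)=("[^"]*"|[^,]*)')

def parse_input_data(line):
    """Parse an inputData.txt line into a dict.

    Empty values become None and percentages become a list of ints.
    """
    fields = {}
    for key, raw in _FIELD.findall(line):
        value = raw.strip().strip('"')
        fields[key] = value or None
    if fields.get("percentages"):
        fields["percentages"] = [int(p) for p in fields["percentages"].split()]
    return fields
```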