Difference between revisions of "Occurrence Data Reconciliation"
(Created page with '{| align="right" ||__TOC__ |} A service for performing assessment and harmonization on occurrence points of species. The aim is to provide users with an interface and methods fo…') |
m |
||
(9 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | [[Category:gCube Features]] | ||
{| align="right" | {| align="right" | ||
||__TOC__ | ||__TOC__ | ||
|} | |} | ||
− | A service for performing assessment and harmonization on occurrence points of species. The aim is to provide users with an interface and methods for assessing if occurrence points are repeated, anomalous or for performing | + | A service for performing assessment and harmonization on occurrence points of species. |
− | This document outlines the design rationale, key features, and high-level architecture, as well as the | + | The aim is to provide users with an interface and methods for assessing if occurrence points are repeated, anomalous or for performing processing and aggregation operations on such data. |
+ | |||
+ | This document outlines the design rationale, key features, and high-level architecture, as well as the deployment context. | ||
== Overview == | == Overview == | ||
− | The goal of this service is to offer a single entry for processing, assessing and harmonizing occurrence points belonging to species observations. Data can come from the Species Discovery Service or they could be uploaded from a user by means of a web interface. | + | The goal of this service is to offer a single entry point, in a certain scope, for processing, assessing and harmonizing occurrence points belonging to species observations. |
+ | Data can come from the [[Biodiversity Access | Species Discovery Service]] or they could be uploaded from a user by means of a web interface. | ||
The service is able to interface to other infrastructural services in order to expand the number of functionalities and applications to the data under analysis. | The service is able to interface to other infrastructural services in order to expand the number of functionalities and applications to the data under analysis. | ||
+ | |||
+ | === Key features === | ||
+ | |||
+ | * Occurrence Points Enrichment | ||
+ | * Occurrence Points visualization, aggregation and transformation | ||
== Design == | == Design == | ||
Line 16: | Line 25: | ||
=== Philosophy === | === Philosophy === | ||
− | This represents an endpoint for users who want to process species observation in order to explore their coherence and to extract some hidden properties from | + | This represents an endpoint for users who want to process species observation in order to explore their coherence and to extract some hidden properties from collected data coming from difference sources. |
+ | This is meant as a complement to other services for species and occurrence points analysis. | ||
=== Architecture === | === Architecture === | ||
Line 25: | Line 35: | ||
* '''Occurrence Point Processors''': a set of internal objects which can invoke external systems in order to process data or extract hidden properties from them. These include Clustering, Anomaly Points Detection etc.; | * '''Occurrence Point Processors''': a set of internal objects which can invoke external systems in order to process data or extract hidden properties from them. These include Clustering, Anomaly Points Detection etc.; | ||
− | * '''Occurrence Points Enrichment''': a connector to another d4Science service dealing with the enrichment of occurrence points | + | * '''Occurrence Points Enrichment''': a connector to another d4Science service (the [[Occurrence Data Enrichment Service]]) dealing with the enrichment of occurrence points associated information, which is able to add indications about the chemical and physical characteristics of the oceans and earth; |
* '''Occurrence Points Operations''': a connector to another d4Science interface which is able to operate on tabular data, by performing visualization, aggregation and transformations. | * '''Occurrence Points Operations''': a connector to another d4Science interface which is able to operate on tabular data, by performing visualization, aggregation and transformations. | ||
Line 33: | Line 43: | ||
A diagram of the relationships between these components is reported in the following figure: | A diagram of the relationships between these components is reported in the following figure: | ||
− | [[Image:occpointsreco.png|frame|center|Occurrence Points Reconciliation Service, internal | + | [[Image:occpointsreco.png|frame|center|Occurrence Points Reconciliation Service, internal architecture]] |
== Deployment == | == Deployment == | ||
− | All the components of the service must be deployed together in a single node. This subsystem can be replicated | + | All the components of the service must be deployed together in a single node. This subsystem can be replicated on multiple hosts and scopes, this does not guarantee a performance improvement because it is a management system for a single input dataset. |
=== Small deployment === | === Small deployment === | ||
Line 42: | Line 52: | ||
The deployment follows the following schema as it needs the presence of other complementary services. | The deployment follows the following schema as it needs the presence of other complementary services. | ||
− | [[Image:occpointsarchitecture.png|frame|center|Occurrence Points Reconciliation Service, | + | [[Image:occpointsarchitecture.png|frame|center|Occurrence Points Reconciliation Service, deployment schema]] |
== Use Cases == | == Use Cases == | ||
Line 48: | Line 58: | ||
=== Well suited Use Cases === | === Well suited Use Cases === | ||
− | The subsystem is particularly suited when experiment have to be performed on occurrence points referring to a certain species or family. The set of operations which can be applied, even lying on state-of-the-art algorithms | + | The subsystem is particularly suited when experiment have to be performed on occurrence points referring to a certain species or family. The set of operations which can be applied, even lying on state-of-the-art and general purpose algorithms, have been studied and developed for managing such kind of information. |
== Subsystems == | == Subsystems == | ||
− | Data | + | Occurrence Data Reconciliation Service depends on the following subsystems, where each subsystem specializes along the structure or the semantics of the data: |
*[[Statistical_Manager | Statistical Manager]] | *[[Statistical_Manager | Statistical Manager]] | ||
*[[Occurrence_Data_Enrichment_Service | Occurrence Data Enrichment Service]] | *[[Occurrence_Data_Enrichment_Service | Occurrence Data Enrichment Service]] | ||
*[[Tabular_Data_Manager | Tabular Data Manager]] | *[[Tabular_Data_Manager | Tabular Data Manager]] |
Latest revision as of 09:26, 24 July 2013
A service for assessing and harmonizing occurrence points of species. The aim is to provide users with an interface and methods for assessing whether occurrence points are repeated or anomalous, and for performing processing and aggregation operations on such data.
This document outlines the design rationale, key features, and high-level architecture, as well as the deployment context.
Overview
The goal of this service is to offer a single entry point, in a certain scope, for processing, assessing and harmonizing occurrence points belonging to species observations. Data can come from the Species Discovery Service or can be uploaded by a user through a web interface.
The service can interface with other infrastructural services in order to expand the range of functionalities and applications available for the data under analysis.
Key features
- Occurrence Points Enrichment
- Occurrence Points visualization, aggregation and transformation
Design
Philosophy
This represents an endpoint for users who want to process species observations in order to explore their coherence and to extract some hidden properties from collected data coming from different sources. This is meant as a complement to other services for species and occurrence points analysis.
Architecture
The subsystem comprises the following components:
- Inputs Managers: a set of internal processors which manage the variety of inputs that could come from users or from other services;
- Occurrence Point Processors: a set of internal objects which can invoke external systems in order to process data or extract hidden properties from them. These include Clustering, Anomaly Points Detection etc.;
- Occurrence Points Enrichment: a connector to another d4Science service (the Occurrence Data Enrichment Service) dealing with the enrichment of the information associated with occurrence points, which is able to add indications about the chemical and physical characteristics of the oceans and earth;
- Occurrence Points Operations: a connector to another d4Science interface which is able to operate on tabular data, by performing visualization, aggregation and transformations;
- Processing Orchestrator: an internal process which manages the interaction and the usage of the other components. It accepts and dispatches requests coming from outside the service.
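The component list above can be illustrated with a minimal sketch of how an orchestrator might accept a request and dispatch it to a registered processor. This is not the service's actual code: the class names, the `register` and `dispatch` methods, the `OccurrencePoint` schema and the `"deduplicate"` operation are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OccurrencePoint:
    """One species observation (hypothetical minimal schema)."""
    species: str
    latitude: float
    longitude: float


# A processor takes a list of points and returns a list of points.
Processor = Callable[[List[OccurrencePoint]], List[OccurrencePoint]]


class ProcessingOrchestrator:
    """Accepts requests from outside and dispatches them to components."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def register(self, operation: str, processor: Processor) -> None:
        """Make a processor available under an operation name."""
        self._processors[operation] = processor

    def dispatch(
        self, operation: str, points: List[OccurrencePoint]
    ) -> List[OccurrencePoint]:
        """Route a request to the processor registered for the operation."""
        if operation not in self._processors:
            raise ValueError(f"unknown operation: {operation}")
        return self._processors[operation](points)


def remove_exact_duplicates(points: List[OccurrencePoint]) -> List[OccurrencePoint]:
    """Stand-in for an Occurrence Point Processor: drop exact repeats, keep order."""
    return list(dict.fromkeys(points))


orchestrator = ProcessingOrchestrator()
orchestrator.register("deduplicate", remove_exact_duplicates)
```

In this sketch the connectors to the enrichment and tabular data services would be registered in the same way as the internal processor, so that callers only ever talk to the orchestrator.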
A diagram of the relationships between these components is reported in the following figure:
[Figure: Occurrence Points Reconciliation Service, internal architecture]
Deployment
All the components of the service must be deployed together in a single node. This subsystem can be replicated on multiple hosts and scopes, but replication does not guarantee a performance improvement, because the service is a management system for a single input dataset.
Small deployment
The deployment follows the schema below, as the service needs other complementary services to be present.
[Figure: Occurrence Points Reconciliation Service, deployment schema]
Use Cases
Well suited Use Cases
The subsystem is particularly suited to cases in which experiments have to be performed on occurrence points referring to a certain species or family. The set of operations which can be applied, although relying on state-of-the-art and general purpose algorithms, has been studied and developed for managing this kind of information.
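As an illustration of the kind of experiment described above, the following sketch flags repeated and anomalous occurrence points for a single species. It does not reproduce the algorithms the service actually uses: the rounding-based duplicate check, the distance-from-median rule, the function names and the default thresholds are all assumptions chosen for simplicity.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def find_repeated(points: List[Point], decimals: int = 3) -> List[int]:
    """Return indices of points that repeat an earlier point after rounding."""
    seen = set()
    repeated = []
    for i, (lat, lon) in enumerate(points):
        key = (round(lat, decimals), round(lon, decimals))
        if key in seen:
            repeated.append(i)
        else:
            seen.add(key)
    return repeated


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def find_anomalous(points: List[Point], max_km: float = 1000.0) -> List[int]:
    """Return indices of points farther than max_km from the median position.

    The centre is the coordinate-wise median, a simplification that ignores
    longitude wrap-around at the antimeridian.
    """
    center = (median(p[0] for p in points), median(p[1] for p in points))
    return [i for i, p in enumerate(points) if haversine_km(p, center) > max_km]
```

A caller would first select the points of the species or family of interest, then pass their coordinates to `find_repeated` and `find_anomalous` and review the returned indices before removing anything.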
Subsystems
The Occurrence Data Reconciliation Service depends on the following subsystems, where each subsystem specializes along the structure or the semantics of the data:
- Statistical Manager
- Occurrence Data Enrichment Service
- Tabular Data Manager