Occurrence Data Reconciliation

From Gcube Wiki

A service for assessing and harmonizing occurrence points of species. It aims to provide users with an interface and methods for assessing whether occurrence points are repeated or anomalous, and for performing processing and aggregation operations on such data.

This document outlines the design rationale, key features, and high-level architecture, as well as the deployment context.

Overview

The goal of this service is to offer a single entry point, in a certain scope, for processing, assessing and harmonizing occurrence points belonging to species observations. Data can come from the Species Discovery Service or be uploaded by a user through a web interface.

The service can interface with other infrastructure services in order to expand the functionality and the applications available for the data under analysis.

Key features

  • Occurrence Points Enrichment
  • Occurrence Points visualization, aggregation and transformation

Design

Philosophy

This service represents an endpoint for users who want to process species observations in order to explore their coherence and to extract hidden properties from data collected from different sources. It is meant as a complement to other services for species and occurrence points analysis.

Architecture

The subsystem comprises the following components:

  • Inputs Managers: a set of internal processors which manage the variety of inputs that could come from users or from other services;
  • Occurrence Point Processors: a set of internal objects which can invoke external systems in order to process data or extract hidden properties from them. These include Clustering, Anomaly Points Detection etc.;
  • Occurrence Points Enrichment: a connector to another d4Science service (the Occurrence Data Enrichment Service) that enriches the information associated with occurrence points by adding indications about the chemical and physical characteristics of the oceans and earth;
  • Occurrence Points Operations: a connector to another d4Science interface that operates on tabular data, performing visualization, aggregation and transformation;
  • Processing Orchestrator: an internal process which manages the interaction and the usage of the other components. It accepts and dispatches requests coming from outside the service.
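
The interaction between these components can be illustrated with a minimal Python sketch. It is not the service's actual code: the record layout (`OccurrencePoint`), the function names (`parse_rows`, `drop_repeated`) and the operation name are all assumptions introduced only to show how an orchestrator might accept a request, normalize the input and dispatch it to a registered processor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class OccurrencePoint:
    # Hypothetical record layout; the real data model is not described in this page.
    species: str
    longitude: float
    latitude: float


# A processor takes a list of points and returns a list of points.
Processor = Callable[[List[OccurrencePoint]], List[OccurrencePoint]]


def parse_rows(rows: Iterable[dict]) -> List[OccurrencePoint]:
    """Stand-in for an Inputs Manager: normalizes raw rows into points."""
    return [
        OccurrencePoint(
            str(row["species"]).strip(),
            float(row["longitude"]),
            float(row["latitude"]),
        )
        for row in rows
    ]


def drop_repeated(points: List[OccurrencePoint]) -> List[OccurrencePoint]:
    """Stand-in for an Occurrence Point Processor: drops exact repeats, keeps order."""
    return list(dict.fromkeys(points))


class ProcessingOrchestrator:
    """Accepts requests from outside and dispatches them to registered processors."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def register(self, name: str, processor: Processor) -> None:
        self._processors[name] = processor

    def handle(self, operation: str, rows: Iterable[dict]) -> List[OccurrencePoint]:
        if operation not in self._processors:
            raise ValueError(f"unknown operation: {operation}")
        return self._processors[operation](parse_rows(rows))


orchestrator = ProcessingOrchestrator()
orchestrator.register("drop_repeated", drop_repeated)
```

In this sketch, connectors to external services (enrichment, tabular operations) would be registered in the same way as the local processor, so that the orchestrator remains the only component exposed to callers.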

A diagram of the relationships between these components is reported in the following figure:

Occurrence Points Reconciliation Service, internal architecture

Deployment

All the components of the service must be deployed together on a single node. The subsystem can be replicated on multiple hosts and scopes, but this does not guarantee a performance improvement, because it is a management system for a single input dataset.

Small deployment

The deployment follows the schema below, since the service requires other complementary services to be present.

Occurrence Points Reconciliation Service, deployment schema

Use Cases

Well suited Use Cases

The subsystem is particularly suited to experiments on occurrence points referring to a certain species or family. The operations that can be applied, although they rely on state-of-the-art, general-purpose algorithms, have been studied and developed specifically for managing this kind of information.
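
As an illustration of the kind of operation such an experiment might involve, the following Python sketch merges occurrence points of one species that lie within a distance tolerance of each other. It is only an example under stated assumptions: the greedy strategy, the `(longitude, latitude)` tuple layout, the default tolerance of 1 km and the function names are hypothetical and are not taken from the service.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees; assumed layout

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def merge_near_repeats(points: List[Point], tolerance_km: float = 1.0) -> List[Point]:
    """Greedily keep a point only if it is farther than the tolerance from all kept points."""
    kept: List[Point] = []
    for p in points:
        if all(haversine_km(p, q) > tolerance_km for q in kept):
            kept.append(p)
    return kept
```

The greedy pass is quadratic in the number of points, which is acceptable for a sketch; a spatial index or a clustering algorithm would be a more natural choice for large datasets.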

Subsystems

Occurrence Data Reconciliation Service depends on the following subsystems, where each subsystem specializes along the structure or the semantics of the data: