Occurrence Data Enrichment Service

From Gcube Wiki
Revision as of 11:03, 2 July 2012 by Gianpaolo.coro

A service for enriching species occurrence points with additional information, e.g. the environmental parameters that characterise the points. The aim is to provide users with an interface for searching the available environmental information that can be attached to the occurrence points under analysis.

This document outlines the design rationale, key features and high-level architecture of the service, as well as its deployment options.

Overview

The goal of this service is to offer a single entry point for enriching the information associated with the coordinates of a set of occurrence points. Data can come from the Species Discovery Service or from the Occurrence Data Reconciliation Service, or can be uploaded by a user through a web interface.
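
Before points can be enriched, every source has to be converted into one common representation. The sketch below covers only the user-upload case. It assumes a CSV file whose columns use the Darwin Core terms `scientificName`, `decimalLatitude` and `decimalLongitude`. The `OccurrencePoint` type and the `parse_uploaded_csv` function are hypothetical names and are not part of the service's actual interface.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class OccurrencePoint:
    """A single occurrence record: species name plus WGS84 coordinates."""
    species: str
    latitude: float
    longitude: float


def parse_uploaded_csv(text: str) -> list[OccurrencePoint]:
    """Parse a user-uploaded CSV into OccurrencePoint records.

    Assumption: columns are named after the Darwin Core terms
    'scientificName', 'decimalLatitude' and 'decimalLongitude'.
    Rows with missing, non-numeric or out-of-range coordinates are skipped.
    """
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed or incomplete row: skip it
        # NaN fails both comparisons, so it is rejected here as well
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            species = (row.get("scientificName") or "").strip()
            points.append(OccurrencePoint(species, lat, lon))
    return points
```

Records from the other two sources would be mapped onto the same type, so all later steps only need to handle one shape.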

The service can interface with other infrastructure services to extend the range of functionalities and applications available for the data under analysis.

The environmental information, together with the list of environmental resources available in the infrastructure, will be supplied by the Environmental Service of d4Science.
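
This document does not describe the interface of the Environmental Service, so the following sketch shows only the basic enrichment step, under stated assumptions. Each environmental parameter is modelled as a hypothetical in-memory regular latitude/longitude grid (`GridLayer`). Each point then receives the value of the grid cell it falls in. Points outside the grid get NaN instead of an invented value.

```python
import math
from dataclasses import dataclass


@dataclass
class GridLayer:
    """Hypothetical regular lat/lon raster holding one environmental parameter.

    values[i][j] covers latitudes [south + i*cell, south + (i+1)*cell)
    and longitudes [west + j*cell, west + (j+1)*cell); row 0 is the southernmost.
    """
    name: str
    south: float
    west: float
    cell: float
    values: list[list[float]]

    def sample(self, lat: float, lon: float) -> float:
        """Return the value of the cell containing (lat, lon), or NaN outside the grid."""
        i = math.floor((lat - self.south) / self.cell)
        j = math.floor((lon - self.west) / self.cell)
        if 0 <= i < len(self.values) and 0 <= j < len(self.values[i]):
            return self.values[i][j]
        return math.nan


def enrich(points, layers):
    """Attach one value per layer to each (lat, lon) point, returning one dict per point."""
    return [
        {"lat": lat, "lon": lon, **{layer.name: layer.sample(lat, lon) for layer in layers}}
        for lat, lon in points
    ]
```

For example, `enrich([(0.5, 1.5)], [GridLayer("sst", 0.0, 0.0, 1.0, [[10.0, 11.0], [12.0, 13.0]])])` attaches the value of the second cell of the southern row to the point.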

Key features

    • Merge, Subtraction and Intersection operations
    • Points Clustering
    • Anomaly Point Detection
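
In the architecture described below, the Statistical Manager performs the Merge, Subtraction and Intersection operations. The sketch only illustrates their intended semantics locally. It rests on one hypothetical matching rule: two occurrence points are the same when their species names match case-insensitively and their coordinates agree after rounding to a fixed number of decimal places.

```python
from typing import Iterable

Point = tuple[str, float, float]  # (species, latitude, longitude)


def _key(p: Point, decimals: int) -> tuple[str, float, float]:
    """Hypothetical identity of a point: normalised species name plus rounded coordinates."""
    species, lat, lon = p
    return (species.strip().lower(), round(lat, decimals), round(lon, decimals))


def merge(a: Iterable[Point], b: Iterable[Point], decimals: int = 4) -> list[Point]:
    """Union: points of a and b, keeping the first occurrence of each key."""
    seen, out = set(), []
    for p in list(a) + list(b):
        k = _key(p, decimals)
        if k not in seen:
            seen.add(k)
            out.append(p)
    return out


def subtract(a: Iterable[Point], b: Iterable[Point], decimals: int = 4) -> list[Point]:
    """Difference: points of a with no matching point in b."""
    drop = {_key(p, decimals) for p in b}
    return [p for p in a if _key(p, decimals) not in drop]


def intersect(a: Iterable[Point], b: Iterable[Point], decimals: int = 4) -> list[Point]:
    """Intersection: points of a that have a matching point in b."""
    keep = {_key(p, decimals) for p in b}
    return [p for p in a if _key(p, decimals) in keep]
```

Rounding is a crude tolerance. Two points that lie on opposite sides of a rounding boundary count as different even when they are very close, so a distance threshold may work better in practice.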

Design

Philosophy

The service is an endpoint for users who want to add environmental information to the coordinates of occurrence points. It is meant to complement other services for the analysis of species and occurrence points.

Architecture

The subsystem comprises the following components:

  • Input Managers: a set of internal processors that handle the variety of inputs coming from users or from other services. Data can come, for example, from the Occurrence Data Reconciliation Service;
  • Occurrence Points Sets Operations: a set of internal objects that can invoke external systems to process data sets. Merge, Subtraction and Intersection operations are invoked by interfacing with the Statistical Manager;
  • Occurrence Points Enrichment: a connector to the Environmental Service for (i) discovering the available environmental information, (ii) retrieving environmental data already present in d4Science, and (iii) producing data by interpolation or kriging when necessary;
  • Processing Orchestrator: an internal process that manages the interaction among, and the usage of, the other components. It accepts requests coming from outside the service and dispatches them.
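
The Occurrence Points Enrichment component may produce missing values by interpolation or kriging. This document does not say which method the Environmental Service actually uses, so the sketch below substitutes inverse distance weighting (IDW), a simpler deterministic interpolator. Unlike kriging, IDW does not fit a variogram and gives no estimate of the prediction error. The `(lat, lon, value)` sample format, the spherical Earth radius and the default `power` of 2 are assumptions.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))  # clamp guards against rounding above 1


def idw(samples, lat: float, lon: float, power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (lat, lon).

    samples: iterable of (lat, lon, value) measurements.
    If the target coincides with a sample, that sample's value is returned unchanged.
    """
    num = den = 0.0
    for s_lat, s_lon, value in samples:
        d = haversine_km(lat, lon, s_lat, s_lon)
        if d == 0.0:
            return value
        w = 1.0 / d ** power  # closer samples receive larger weights
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no samples to interpolate from")
    return num / den
```

A higher `power` makes the estimate depend more on the nearest samples. Kriging would instead derive the weights from the spatial correlation of the data.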

A diagram of the relationships between these components is shown in the following figure:

Occurrence Points Enrichment Service, internal architecture
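
The Processing Orchestrator accepts external requests and dispatches them to the other components. That role can be sketched as a small registry of handlers. The operation names, the request format (a dict with an `operation` key) and the `ProcessingOrchestrator` class are all hypothetical. In the service itself, handlers would forward work to the Statistical Manager or to the Environmental Service instead of computing results locally.

```python
from typing import Callable

# A handler receives the full request and returns a result dict.
Handler = Callable[[dict], dict]


class ProcessingOrchestrator:
    """Minimal dispatcher that routes external requests to registered components."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, operation: str, handler: Handler) -> None:
        """Associate an operation name with the component that serves it."""
        self._handlers[operation] = handler

    def dispatch(self, request: dict) -> dict:
        """Route a request by its 'operation' field; unknown operations yield an error."""
        op = request.get("operation")
        handler = self._handlers.get(op)
        if handler is None:
            return {"status": "error", "reason": f"unsupported operation: {op}"}
        return {"status": "ok", "result": handler(request)}
```

A registry keeps the orchestrator unaware of how each component works internally, so a new operation can be added by registering another handler.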

Deployment

All the components of the service must be deployed together on a single node. The subsystem can be replicated on multiple hosts and scopes, but this does not guarantee a performance improvement, because the service is a management system for a single input dataset.

Small deployment

The deployment follows the schema below, since the service requires the presence of other complementary services.

Occurrence Points Enrichment Service, deployment schema

Use Cases

Well suited Use Cases

The subsystem is particularly well suited to users who want to investigate the marine properties of the places where species live, which helps in understanding the characteristics of the habitats they prefer. Having environmental information discovered by an external service (e.g. the Environmental Service) can speed up the investigation of species habitats, a task that normally takes scholars a large amount of time.

Subsystems

The Occurrence Data Enrichment Service depends on the following subsystems, each of which specializes in either the structure or the semantics of the data: