CMEMS Dataset Importer

From Gcube Wiki

Overview

D4Science facilitates access to CMEMS[1] data through a Dataset Importer facility that lets users browse and select datasets, configure the area of interest, specify a basic post-processing action (e.g. average), and have the data published and regularly updated in the D4Science THREDDS Data Server and indexed in GeoNetwork. Products can then be made available in all interested VREs. The adopted approach simplifies access to CMEMS metadata and datasets, overcomes some of the limitations of the Motu server and client, and provides users with easy-to-use command-line and web interfaces to manage synchronized datasets.

Key Features

  • Support one-time and scheduled CMEMS Dataset import tasks;
  • Web and command-line interfaces to search and browse datasets and to manage import tasks;
  • Full Motu[2] API[3] implementation (both XML and HTML). The HTML API is used where no equivalent XML operation is available;
  • Support for CAS authentication;
  • Support for large dataset downloads (i.e. > 1 GB). The request is split into a number of smaller requests, and their results are finally merged into a single output;
  • Parallel request submission and dataset download, within the limits enforced by the server;
  • Robust parameter management. Default values are set in the client to avoid server errors.
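
The request splitting and parallel download listed above can be illustrated with a minimal Python sketch. It is not the importer's actual code: the function names, the 1 GB limit constant, the uniform size-per-day estimate, and the `fetch` callable are all assumptions made for illustration. The sketch splits a date range into sub-ranges that fit the limit, fetches them with a bounded number of parallel workers, and returns the results in chunk order so they can be merged.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

# Assumed per-request limit (about 1 GB); real servers may enforce other values.
MAX_REQUEST_BYTES = 1_000_000_000


def split_time_range(start, end, bytes_per_day, max_bytes=MAX_REQUEST_BYTES):
    """Split the inclusive range [start, end] into sub-ranges that fit max_bytes.

    The size estimate is deliberately naive: every day is assumed to
    contribute the same number of bytes (hypothetical simplification).
    """
    if bytes_per_day <= 0:
        raise ValueError("bytes_per_day must be positive")
    if bytes_per_day > max_bytes:
        raise ValueError("a single day already exceeds the request limit")
    days_per_chunk = max_bytes // bytes_per_day
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(end, chunk_start + timedelta(days=days_per_chunk - 1))
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def download_all(chunks, fetch, max_parallel=2):
    """Fetch all chunks with at most max_parallel concurrent requests.

    `fetch(start, end)` is a caller-supplied function (hypothetical here).
    Results come back in the same order as `chunks`, ready for merging.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda chunk: fetch(*chunk), chunks))
```

For example, `split_time_range(date(2020, 1, 1), date(2020, 1, 10), 300_000_000)` yields sub-ranges of at most three days each. Keeping `max_parallel` small reflects the idea of staying within the limits enforced by the server.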

Design

The core of the CMEMS Importer is realized as a plugin for the SmartExecutor, which offers facilities to execute tasks in a scheduled and reliable way and to monitor their progress and status. The following picture shows the overall architecture of the importer.

Cmems-importer-architecture.png

The CMEMS Importer consists of the following components:

  • cmems-importer: is a graphical user interface (GUI), integrated in the gCube Portal, with the same capabilities offered by the command-line interface (CLI) of cmems-importer-client.
  • cmems-importer-service: is a SmartGear service...
  • cmems-client: is responsible for the interaction with CMEMS services. It provides:
    • cmems-search including facilities to browse and search the CMEMS catalogue, as available online[4];
    • motu-client: a complete Java implementation of a Motu client, exposing the most useful REST and HTML APIs[5]; the Motu client overcomes the 1 GB maximum download size enforced by most Motu servers by automatically splitting the request into small chunks and merging the results back into a single dataset.
  • cmems-importer-client: is a client library to easily interact with the SmartExecutor to schedule, start, stop and monitor the execution of an import task, as well as to list the tasks currently running/scheduled. It also includes a command-line interface (CLI) to perform all the relevant actions (search and browse datasets, import task management).
  • cmems-importer-se-plugin: is the component of the Importer in charge of downloading the relevant (part of the) dataset from the various CMEMS servers, uploading it to the gCube THREDDS Server and keeping it updated. It is implemented as a plugin for the SmartExecutor; the import/update action can be executed either once or on a periodic basis, as requested by the user.
  • dataset-importer-common: is a set of utility classes, methods, and clients for other gCube services, shared by the other CMEMS Importer components.
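
To make the distinction between one-time and periodic import tasks concrete, the following Python sketch models a task and computes its next execution time. It does not reproduce the SmartExecutor or cmems-importer-client API: the `ImportTask` class, its field names, and the `next_run` method are hypothetical and serve only to illustrate the scheduling behaviour described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ImportTask:
    """Hypothetical description of an import task (illustrative names only)."""

    dataset_id: str
    first_run: datetime
    interval: Optional[timedelta] = None  # None means "execute once"

    def __post_init__(self):
        if self.interval is not None and self.interval <= timedelta(0):
            raise ValueError("interval must be positive")

    def next_run(self, now: datetime) -> Optional[datetime]:
        """Return the next execution time at or after `now`, or None if no run remains."""
        if now <= self.first_run:
            return self.first_run
        if self.interval is None:
            return None  # one-time task whose only run lies in the past
        elapsed = now - self.first_run
        # Ceiling division: number of whole intervals needed to reach `now`.
        periods = -(-elapsed // self.interval)
        return self.first_run + periods * self.interval
```

A one-time task is simply a task without an interval, so the same structure covers both execution modes; a real scheduler would additionally track task status and failures.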

The CMEMS Importer interacts with the following gCube/external services:

  • CMEMS online catalogue: holding metadata about all datasets along with references to the servers delivering them with various protocols (Subsetter, WMS, DGF, FTP);
  • Motu servers: a number of servers, operated by different research institutions, providing authenticated access to CMEMS datasets;
  • CAS authentication server: a single Identity Provider performing user authentication on behalf of all services in the CMEMS environment;
  • Data Transfer service on the gCube THREDDS Data Server: the gCube web server holding the metadata of, and providing data access to, all the imported CMEMS datasets;
  • SmartExecutor: manages the scheduled execution of all import tasks.

Deployment

The CMEMS Dataset Importer is delivered as a set of three deployable artefacts:

  • The CMEMS Importer (embedding the cmems-importer) is deployed as a portlet in the gCube Infrastructure Gateway.
  • The CMEMS Importer Service (embedding the cmems-importer-client and dataset-importer-common) can be deployed on any host, also outside the gCube infrastructure, equipped with an adequate authorization token.
  • The SmartExecutor plugin (embedding the cmems-importer-se-plugin, cmems-client and dataset-importer-common) is deployed on a SmartExecutor node in the gCube infrastructure.

References

  1. http://marine.copernicus.eu
  2. https://github.com/clstoulouse/motu
  3. https://github.com/clstoulouse/motu#ClientsAPI
  4. http://marine.copernicus.eu/services-portfolio/access-to-products
  5. https://github.com/clstoulouse/motu#ClientsAPI