GFeed

From Gcube Wiki

Part of the , this service focuses on coordinating data retrieval from multiple sources and transformation/publication towards multiple destinations.

This document outlines the service's design rationale, key features, and high-level architecture, as well as its deployment options.

Overview

gFeed is an extensible service that manages:

  • Data Retrieval from multiple sources
  • Data Transformation and publication towards multiple destinations

Key Features

The subsystem provides for:

  • Publication towards multiple destinations: depending on the available implementations, collected data can be published to multiple destinations in the same execution.
  • Data retrieval from multiple sources: depending on the available implementations, data can be collected from multiple sources in the same execution.
  • Extensible behaviour: business logic can be extended by deploying plugins for both data retrieval and data publication.
  • Orientation towards reuse: the subsystem is conceived to promote the reuse of its facilities by keeping data retrieval and data publication implementations separate.

Design

Philosophy

The system is designed to maximize modularity in order to isolate implementations.

Data retrieval and transformation are managed by collector implementations, and a common framework is provided to maximize code reuse. Each implementation declares which destinations it can support; the transformation logic, which depends heavily on the data being represented, is delegated to the collector.

Data publication is managed by controller implementations, again with a common framework provided to maximize code reuse. Each implementation handles one particular destination and contains the logic needed to interact with that specific service.

gFeed itself takes care of orchestrating triggered executions, offering monitoring mechanisms through HTTP interfaces.
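The split between collectors, controllers, and the orchestrating service can be illustrated with a minimal Java sketch. Every name in it (`CollectorPlugin`, `ControllerPlugin`, `Orchestrator`, and the in-memory stand-ins) is hypothetical and chosen for illustration only; it does not reproduce the actual gFeed interfaces. It assumes that each collector declares the destinations it supports and that each controller is bound to exactly one destination.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Entry point declared first so the file can be launched directly as a single source file.
final class OrchestratorDemo {
    static Map<String, Integer> run() {
        List<CollectorPlugin> collectors = List.of(
            new FixedCollector("DataMiner", Set.of("CKAN", "GeoNetwork")),
            new FixedCollector("IS", Set.of("CKAN")));
        List<ControllerPlugin> controllers = List.of(
            new CountingController("CKAN"),
            new CountingController("GeoNetwork"));
        return new Orchestrator(collectors, controllers).execute();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Hypothetical contract for a data-retrieval plugin (not the real gFeed API).
interface CollectorPlugin {
    // Destinations this collector can transform its data for.
    Set<String> supportedDestinations();

    // Retrieve the data and transform it into records for the given destination.
    List<String> collectFor(String destination);
}

// Hypothetical contract for a publication plugin bound to one destination.
interface ControllerPlugin {
    String destination();

    // Publish the records and return how many were accepted.
    int publish(List<String> records);
}

// Pairs every collector with every controller whose destination it supports.
final class Orchestrator {
    private final List<CollectorPlugin> collectors;
    private final List<ControllerPlugin> controllers;

    Orchestrator(List<CollectorPlugin> collectors, List<ControllerPlugin> controllers) {
        this.collectors = new ArrayList<>(collectors);
        this.controllers = new ArrayList<>(controllers);
    }

    // Runs one execution and returns the number of published records per destination.
    Map<String, Integer> execute() {
        Map<String, Integer> report = new LinkedHashMap<>();
        for (CollectorPlugin collector : collectors) {
            for (ControllerPlugin controller : controllers) {
                String destination = controller.destination();
                if (!collector.supportedDestinations().contains(destination)) {
                    continue; // this collector cannot feed this destination
                }
                int published = controller.publish(collector.collectFor(destination));
                report.merge(destination, published, Integer::sum);
            }
        }
        return report;
    }
}

// In-memory stand-in: yields one record per supported destination.
final class FixedCollector implements CollectorPlugin {
    private final String source;
    private final Set<String> destinations;

    FixedCollector(String source, Set<String> destinations) {
        this.source = source;
        this.destinations = destinations;
    }

    @Override public Set<String> supportedDestinations() { return destinations; }

    @Override public List<String> collectFor(String destination) {
        return List.of(source + " record formatted for " + destination);
    }
}

// In-memory stand-in: "publishes" by counting the records it receives.
final class CountingController implements ControllerPlugin {
    private final String destination;

    CountingController(String destination) { this.destination = destination; }

    @Override public String destination() { return destination; }

    @Override public int publish(List<String> records) { return records.size(); }
}
```

In this sketch the first collector feeds both destinations and the second feeds only CKAN, so a single execution reports two records for CKAN and one for GeoNetwork. Keeping the transformation inside the collector and the service interaction inside the controller means either side can be replaced without touching the other.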

Architecture

Figure: GFeed-Service

Deployment

The gFeed service is distributed as a WAR and relies on SmartGears for common business logic (authentication, accounting, etc.), so it needs to be deployed on a SmartGears node.

Plugin implementations are looked up at run time on the service classpath, so they need to be deployed along with the service, together with their required libraries.

A typical way to distribute plugins is as an "uber-jar", which can be deployed as expected by the container.
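One common way to implement this kind of run-time lookup in Java is the standard `java.util.ServiceLoader`, which instantiates every provider listed in a `META-INF/services/<interface name>` file found on the classpath. Whether gFeed relies on this exact mechanism is not stated here, so the following is only a sketch under that assumption; `DiscoverablePlugin` and `PluginDiscovery` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class PluginDiscovery {
    // Instantiates every provider of the given contract that is registered on the classpath.
    static <T> List<T> discover(Class<T> contract) {
        List<T> found = new ArrayList<>();
        for (T plugin : ServiceLoader.load(contract)) {
            found.add(plugin);
        }
        return found;
    }

    public static void main(String[] args) {
        List<DiscoverablePlugin> plugins = discover(DiscoverablePlugin.class);
        System.out.println("Discovered " + plugins.size() + " plugin(s)");
        for (DiscoverablePlugin plugin : plugins) {
            System.out.println(" - " + plugin.name());
        }
    }
}

// Hypothetical plugin contract (an assumption for illustration, not the gFeed API).
interface DiscoverablePlugin {
    String name();
}
```

With this approach, a plugin jar that carries its own registration file becomes visible as soon as it is placed on the service classpath, and no plugins are found when no such jar is present.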

Use Cases

  • Publish DataMiner algorithms towards multiple catalogues at the same time (e.g. CKAN, GeoNetwork)
  • Gather information from multiple sources in order to populate one catalogue (e.g. IS information and GeoNetwork information towards CKAN)
  • Bulk update of metadata from multiple sources into multiple destinations