Difference between revisions of "SDMX Data Source"

From Gcube Wiki
Revision as of 13:08, 21 December 2017

Introduction

The gCube SDMX Data Source Service is a REST web service compliant with versions 2.0 and 2.1 of the SDMX standard. It exports tabular data stored in the D4Science Infrastructure in SDMX format, relying on the Tabular Data Facilities and the Information System.

High level Architecture and deployment

The gCube SDMX Data Source Service is a web service deployed on Tomcat and configured as a Smart Gear application supporting anonymous access. It relies on a set of gCube services, as the following picture shows:


SDMX-exporter.png

The Service gets all the references it needs from the Information System, in particular:

  • URL of the associated SDMX Registry
  • References of Tabular Resources and Tables
  • References of Time Dimension and Primary Measure columns.

Tabular data are obtained in real time from the Tabular Data Management Service, based on the information retrieved from the Information System (IS) and on the Data Structures obtained from the SDMX Registry. The SDMX Data Source Service then creates an SDMX Document of the requested version and returns the requested data to the client.

From the point of view of the Tabular Data Management Service, exporting a Dataset to SDMX means that the content of the exported table is shared with the whole VRE of the caller. In general, a VRE enabled to request SDMX operations is associated with a single Fusion Registry and at least one Data Source. Each Data Source must be associated with a single VRE. This association is established at Node level: for this reason, an SDMX Data Source must be deployed on a Smart Gear Node running on a single VRE.

This deployment model lets the Data Source use the (single) container token associated with the (single) VRE to query the Information System and to get exported (and therefore, as explained above, pre-shared) data from the Tabular Data Management Service.

Controls/Data flow

An SDMX Client, which in this case is a REST client (the Service supports only SDMX REST requests), asks for some data. The requested data must already have been exported in SDMX format by the Tabular Data Management Service. When it receives the request, the Service:

1. gets from the Information System the URL of the SDMX Registry associated with that VRE

2. gets from the SDMX Registry the associated Data Structure Definition

3. gets from the Information System the IDs of the Tabular Resource, Table, Time Dimension Column and Primary Measure Column associated with that Data Structure Definition

4. gets the tables from the Tabular Data Management Service

5. creates an SDMX Data Document and sends the response to the Client.
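
The five steps above can be sketched as a small orchestration function. This is only an illustrative outline, not the Service's actual code: every function name and parameter below is hypothetical, and each collaborator (Information System lookup, SDMX Registry client, Tabular Data Management client, document builder) is passed in as a plain callable so that the order of the calls is the only thing the sketch commits to.

```python
from typing import Any, Callable, Dict


def handle_data_request(
    flow_ref: str,
    vre: str,
    lookup_registry_url: Callable[[str], str],
    fetch_dsd: Callable[[str, str], Any],
    lookup_table_refs: Callable[[str, Any], Dict[str, str]],
    fetch_table: Callable[[Dict[str, str]], Any],
    build_document: Callable[[Any, Any, Dict[str, str]], str],
) -> str:
    """Hypothetical outline of the request flow; all names are assumptions."""
    # Step 1: ask the Information System for the SDMX Registry URL of the VRE.
    registry_url = lookup_registry_url(vre)
    # Step 2: ask the SDMX Registry for the Data Structure Definition.
    dsd = fetch_dsd(registry_url, flow_ref)
    # Step 3: ask the Information System for the IDs of the Tabular Resource,
    # Table, Time Dimension column and Primary Measure column.
    refs = lookup_table_refs(vre, dsd)
    # Step 4: get the table from the Tabular Data Management Service.
    table = fetch_table(refs)
    # Step 5: build the SDMX Data Document that is returned to the client.
    return build_document(dsd, table, refs)
```

Passing the collaborators as callables keeps the sketch independent of any real gCube client library, whose APIs are not described on this page.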

Supported versions, REST URL and examples

Currently the Service supports the following SDMX versions:

  • Structure specific time series version 2.1 (Data type: application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1)
  • Generic time series version 2.1 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.1)
  • Structure specific time series version 2.0 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.0)
  • Generic time series version 2.0 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.0)
  • Structure specific cross sectional data version 2.0 (Data type: application/vnd.sdmx.structurespecificdata+xml;version=2.0).

The client can ask for a specific version by including one or more data types in the Accept header of the request. If the Accept header contains no valid data type, the default, Generic time series version 2.1, is used. If more than one valid data type is listed, priority follows the order of the list above.
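
The selection rule just described can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the Service's implementation: the function and constant names are hypothetical, Accept parameters such as q-values are not handled, and because the list above gives the same data type for both version 2.0 time series entries, that data type appears only once here.

```python
from typing import Optional

# Supported data types in priority order, copied from the list above.
SUPPORTED = [
    "application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1",
    "application/vnd.sdmx.generictimeseriesdata+xml;version=2.1",
    "application/vnd.sdmx.generictimeseriesdata+xml;version=2.0",
    "application/vnd.sdmx.structurespecificdata+xml;version=2.0",
]
# Default used when no valid data type is requested: Generic time series 2.1.
DEFAULT = SUPPORTED[1]


def normalize(media_type: str) -> str:
    """Drop all whitespace and lowercase, so '; version=2.1' still matches."""
    return "".join(media_type.split()).lower()


def choose_data_type(accept_header: Optional[str]) -> str:
    """Return the highest-priority supported data type found in the header."""
    requested = {
        normalize(part)
        for part in (accept_header or "").split(",")
        if part.strip()
    }
    for candidate in SUPPORTED:  # list order encodes the priority
        if candidate in requested:
            return candidate
    return DEFAULT
```

Note that the order in which the client lists the data types does not matter in this sketch; only the order of the supported list does, as the text states.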

The REST URL used to get the data is (almost) compliant with the SDMX standard:


<sdmx-service-base-url>/ws/data/<data-flow-agency>,<data-flow-id>,<data-flow-version>/<dimensions-filters>/?<optional-parameters>&gcube-token=<token>

The only non-standard field is the gcube-token parameter, which Smart Gear uses to authenticate the user and to identify the VRE. The other fields comply with the standard, in particular:

  • data-flow-agency,data-flow-id,data-flow-version: only data-flow-id is mandatory, but if it is not enough to unambiguously identify a data flow, an error is returned. If data-flow-agency or data-flow-version is not set, the field is left blank and its comma is omitted
  • dimensions-filters: this optional field filters on the dimensions (not on the attributes). Standard dot-based notation is used and multiple filters are supported; please refer to the standard for more details
  • optional-parameters: the current version supports startPeriod, endPeriod, firstNObservations, endNObservations, dimensionAtObservation and detail.


For more information, please refer to specific SDMX documentation.
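
As an illustration of the URL template and the field rules above, the following Python sketch assembles a request URL. The function name and its parameters are hypothetical, and the shape of the URL when no dimension filter is given (the filter segment is simply left out) is an assumption, since the page does not show that case.

```python
from urllib.parse import urlencode


def build_data_url(base_url, flow_id, token, agency="", version="",
                   filters="", **optional_parameters):
    """Assemble a data request URL following the template above (a sketch)."""
    # Unset agency/version fields are left out together with their comma.
    flow_ref = ",".join(part for part in (agency, flow_id, version) if part)
    url = f"{base_url.rstrip('/')}/ws/data/{flow_ref}/"
    if filters:
        # Dot-based dimension filter, e.g. "1" for a single dimension.
        url += f"{filters}/"
    # Optional parameters first, then the non-standard gcube-token field.
    query = dict(optional_parameters)
    query["gcube-token"] = token
    return f"{url}?{urlencode(query)}"
```

For instance, calling `build_data_url(base, "NEW_DS_DIVISION_dataFlow", token, agency="BlueBridge", filters="1", startPeriod="2005", endPeriod="2011")` yields a URL of the same form as the example request on this page.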

A valid example is the following:


Header:

Accept: application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1


URL:

GET <sdmx-service-base-url>/ws/data/BlueBridge,NEW_DS_DIVISION_dataFlow/1/?startPeriod=2005&endPeriod=2011&gcube-token=<token>


This request asks for the data associated with the latest version of the data flow NEW_DS_DIVISION_dataFlow, maintained by the BlueBridge agency. The response should contain only the data whose first (and only) dimension, according to the order defined in the SDMX Registry, is 1 and which refer to the period from 2005 to 2011.
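
The same request could be issued from Python with the standard library, roughly as follows. This is a minimal sketch: the helper names are hypothetical, the base URL and token must be supplied by the caller, and no error handling or response parsing is shown.

```python
from urllib.request import Request, urlopen

# Data type requested in the example: structure specific time series 2.1.
ACCEPT = "application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1"


def build_example_request(base_url: str, token: str) -> Request:
    """Build (without sending) the example GET request described above."""
    url = (
        f"{base_url.rstrip('/')}/ws/data/BlueBridge,NEW_DS_DIVISION_dataFlow/1/"
        f"?startPeriod=2005&endPeriod=2011&gcube-token={token}"
    )
    return Request(url, headers={"Accept": ACCEPT}, method="GET")


def fetch_example(base_url: str, token: str) -> bytes:
    """Send the request and return the raw SDMX document bytes."""
    with urlopen(build_example_request(base_url, token)) as response:
        return response.read()
```

Separating request construction from sending makes it possible to inspect the URL and headers before contacting the Service.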