SDMX Data Source

From Gcube Wiki
Latest revision as of 16:51, 26 January 2018

Introduction

The gCube SDMX Data Source Service is a REST web service, compliant with SDMX standard versions 2.0 and 2.1, that exports tabular data stored in the D4Science Infrastructure in SDMX format. To do so, it relies on the Tabular Data Facilities and the Information System.

High level Architecture and deployment

The gCube SDMX Data Source Service is a web service deployed on Tomcat and configured as a Smart Gear application that supports anonymous access. It relies on a set of gCube services, as shown in the following picture:


[Figure: SDMX-exporter.png]

The Service obtains all its references from the Information System, in particular:

  • URL of the associated SDMX Registry
  • References of Tabular Resources and Tables
  • References of Time Dimension and Primary Measure columns.

Tabular data are obtained in real time from the Tabular Data Management Service, based on the information retrieved from the IS and on the Data Structures obtained from the SDMX Registry. The SDMX Data Source Service then creates an SDMX document of the requested version and returns the requested data to the client.

Technically speaking, for the Tabular Data Management Service an SDMX export operation on a Dataset means that the content of the exported table is shared with the whole VRE of the caller. In general, a VRE enabled to request SDMX operations is associated with a single Fusion Registry and at least one Data Source. Each Data Source must be associated with a single VRE. This association is made at Node level; for this reason, an SDMX Data Source must be deployed on a Smart Gear Node running on a single VRE.

This deployment model lets the Data Source use the single container token associated with that single VRE to query the Information System and to fetch exported data from the Tabular Data Management Service (data which, as explained above, is already shared with the VRE).

Two requirements apply to the application configuration (the gcube-app.xml file):

1. the application name is the name under which the Data Source must be registered on the Information System;

2. the request-validation handler must be excluded.

An example of a valid gcube-app.xml file is the following:

<application mode='online'>
  <name>SDMXDataSource1</name>
  <group>DataPublishing</group>
  <version>0.0.1-SNAPSHOT</version>
  <description>SDMX Data Source linked with Tabman</description>
  <local-persistence location='target' />
  <exclude handlers='request-validation'>/*</exclude>
</application>
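The two requirements can be checked before deployment. Below is a minimal Python sketch (not part of gCube) that parses a gcube-app.xml with the standard library, returns the application name to be reused on the Information System and the SDMX Registry, and fails if request-validation is not excluded. The helper name check_gcube_app is hypothetical, and two points are assumptions taken from the example above: the exclusion must cover /*, and the handlers attribute may list several handler names separated by commas or spaces.

```python
import re
import xml.etree.ElementTree as ET

# The example configuration shown above.
SAMPLE = """<application mode='online'>
  <name>SDMXDataSource1</name>
  <group>DataPublishing</group>
  <version>0.0.1-SNAPSHOT</version>
  <description>SDMX Data Source linked with Tabman</description>
  <local-persistence location='target' />
  <exclude handlers='request-validation'>/*</exclude>
</application>"""


def check_gcube_app(xml_text: str) -> str:
    """Hypothetical helper: return the application name if both requirements hold.

    Raises ValueError if <name> is missing or if request-validation is not
    excluded for all paths ("/*").
    """
    root = ET.fromstring(xml_text)
    if root.tag != "application":
        raise ValueError("root element must be <application>")

    # Requirement 1: the name later used on the IS and on the SDMX Registry.
    name = (root.findtext("name") or "").strip()
    if not name:
        raise ValueError("missing <name>")

    # Requirement 2: request-validation excluded for every path.
    # Assumption: several handlers may be listed, separated by commas or spaces.
    for exclude in root.findall("exclude"):
        handlers = re.split(r"[,\s]+", exclude.get("handlers", "").strip())
        if "request-validation" in handlers and (exclude.text or "").strip() == "/*":
            return name
    raise ValueError("request-validation handler is not excluded for /*")


if __name__ == "__main__":
    print(check_gcube_app(SAMPLE))
```

Run against the example file, the sketch returns SDMXDataSource1, the name that the next paragraph requires on the Information System.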

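In this example the application name is SDMXDataSource1. A resource with this name must be registered on the Information System, in the same VRE as the associated SDMX Registry, with the following mandatory parameters:

  • Service Endpoint of type RuntimeResource
  • Category SDMXDataSources
  • Interface/Endpoint: the base URL of the service, for instance http://sdmx-datasource-d.dev.d4science.org/sdmxdatasource/ws/data/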

If the Data Source is not registered on the Information System, the SDMX Exporter module of the Tabular Data Management Service cannot discover it and therefore cannot associate any Tabular Data with it.

A Data Provider with the same name must also be defined on the SDMX Registry and associated with all the agencies that will be used to export data.
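These naming rules tie three components together: the application name, an Information System Service Endpoint (type RuntimeResource, category SDMXDataSources, Interface/Endpoint set to the service base URL, in the same VRE as the SDMX Registry) and an SDMX Registry Data Provider. The Python sketch below checks them on hypothetical plain records. ServiceEndpoint and registration_problems are illustrative names, not the gCube Information System client API, and the association of the Data Provider with agencies is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class ServiceEndpoint:
    """Hypothetical, simplified view of the Service Endpoint resource on the IS."""
    name: str
    resource_type: str
    category: str
    endpoint_url: str
    vre: str


def registration_problems(
    app_name: str,
    endpoint: Optional[ServiceEndpoint],
    registry_vre: str,
    data_providers: Set[str],
) -> List[str]:
    """Return the problems found; an empty list means the registration is consistent."""
    if endpoint is None:
        return [f"no Service Endpoint named {app_name!r} on the Information System"]
    problems = []
    if endpoint.name != app_name:
        problems.append("Service Endpoint name differs from the application name")
    if endpoint.resource_type != "RuntimeResource":
        problems.append("Service Endpoint must be of type RuntimeResource")
    if endpoint.category != "SDMXDataSources":
        problems.append("Service Endpoint category must be SDMXDataSources")
    if not endpoint.endpoint_url.startswith(("http://", "https://")):
        problems.append("Interface/Endpoint must be the base URL of the service")
    if endpoint.vre != registry_vre:
        problems.append("Service Endpoint and SDMX Registry are in different VREs")
    if app_name not in data_providers:
        problems.append(f"no Data Provider named {app_name!r} on the SDMX Registry")
    return problems
```

A missing Service Endpoint is reported on its own, since that alone is enough to make the Data Source invisible to the SDMX Exporter.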

Controls/Data flow

An SDMX client (a REST client, since the Service supports only SDMX REST requests) asks for some data. The requested data must already have been exported in SDMX format by the Tabular Data Management Service. When it receives the request, the Service:

1. gets from the Information System the URL of the SDMX Registry associated with that VRE

2. gets from the SDMX Registry the associated Data Structure Definition

3. gets from the Information System the IDs of the Tabular Resource, Table, Time Dimension Column and Primary Measure Column associated with that Data Structure Definition

4. gets the tables from the Tabular Data Management Service

5. creates an SDMX Data Document and sends the response to the client.
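The five steps above form a simple pipeline. In this Python sketch each lookup is an injected callable that stands in for the real Information System, SDMX Registry and Tabular Data Management Service clients, whose APIs are not described here. All names (TableRefs, serve_request and the callable parameters) are hypothetical. The callables take no token because, under the single-VRE deployment model, the node's container token is assumed to be applied by the clients themselves.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TableRefs:
    """IDs that the IS associates with a Data Structure Definition (step 3)."""
    tabular_resource_id: str
    table_id: str
    time_dimension_column: str
    primary_measure_column: str


def serve_request(
    flow_ref: str,
    media_type: str,
    registry_url_from_is: Callable[[], str],
    dsd_from_registry: Callable[[str, str], Any],
    table_refs_from_is: Callable[[Any], TableRefs],
    tables_from_tdm: Callable[[TableRefs], Any],
    build_document: Callable[[Any, Any, TableRefs, str], str],
) -> str:
    """Hypothetical orchestration of steps 1-5; returns the SDMX document."""
    registry_url = registry_url_from_is()                  # 1. SDMX Registry of this VRE
    dsd = dsd_from_registry(registry_url, flow_ref)        # 2. Data Structure Definition
    refs = table_refs_from_is(dsd)                         # 3. resource, table, column IDs
    tables = tables_from_tdm(refs)                         # 4. tables from Tabular Data Management
    return build_document(dsd, tables, refs, media_type)   # 5. SDMX document for the client
```

Injecting the lookups keeps the order of the calls visible and makes the flow testable with stubs instead of live services.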

Supported versions, REST URL and examples

Currently the Service supports the following SDMX versions:

  • Structure specific time series version 2.1 (Data type: application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1)
  • Generic time series version 2.1 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.1)
  • Structure specific time series version 2.0 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.0)
  • Generic time series version 2.0 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.0)
  • Structure specific cross sectional data version 2.0 (Data type: application/vnd.sdmx.structurespecificdata+xml;version=2.0).

The client selects a format by including one or more of these data types in the Accept header of the request. If the Accept header contains no valid data type, the default Generic time series version 2.1 is used. If more than one valid data type is present, priority follows the order of the list above.
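The selection rule can be sketched as follows. This is a minimal Python illustration of the documented behaviour, not the service's code. Assumptions: q-values are ignored (priority is the list order, not the client's preference), matching is case-insensitive, and because the list above gives the same media type for both 2.0 time series formats, that media type appears only once here.

```python
from typing import Optional

# Supported media types in the priority order of the list above.
SUPPORTED = [
    "application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1",
    "application/vnd.sdmx.generictimeseriesdata+xml;version=2.1",
    "application/vnd.sdmx.generictimeseriesdata+xml;version=2.0",
    "application/vnd.sdmx.structurespecificdata+xml;version=2.0",
]
DEFAULT = "application/vnd.sdmx.generictimeseriesdata+xml;version=2.1"


def _normalize(media_range: str) -> str:
    """Lower-case one Accept entry, drop spaces and any q parameter."""
    parts = [p.strip().lower() for p in media_range.split(";") if p.strip()]
    if not parts:
        return ""
    params = [p.replace(" ", "") for p in parts[1:] if not p.startswith("q=")]
    return ";".join([parts[0]] + params)


def choose_media_type(accept_header: Optional[str]) -> str:
    """Return the highest-priority supported type present, else the default."""
    requested = {_normalize(m) for m in (accept_header or "").split(",")}
    for media_type in SUPPORTED:
        if media_type in requested:
            return media_type
    return DEFAULT
```

For example, a header listing the generic 2.0 type first and the structure specific 2.1 type second still yields structure specific 2.1, since that type comes first in the list.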

The REST URL used to get the data is compliant with the SDMX standard:


<sdmx-service-base-url>/ws/data/<data-flow-agency>,<data-flow-id>,<data-flow-version>/<dimensions-filters>/?<optional-parameters>

All the fields follow the SDMX standard; in particular:

  • data-flow-agency,data-flow-id,data-flow-version: only data-flow-id is mandatory, but if it is not enough to unambiguously identify a data flow, an error is returned. If data-flow-agency or data-flow-version is not set, the field is omitted together with its comma
  • dimensions-filters: this optional field is a filter on the dimensions (not on attributes). The standard dot-based notation is used and multiple filters are supported; please refer to the standard for details
  • optional-parameters: the current version supports startPeriod, endPeriod, firstNObservations, lastNObservations, dimensionAtObservation and detail.


For more information, please refer to the SDMX standard documentation.
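The URL template can be assembled as in this minimal Python sketch. build_data_url is a hypothetical helper. The following points are assumptions: filters are given as one list of accepted values per dimension in DSD order, and values are joined with "+" as in standard SDMX REST keys. The sketch also refuses a version without an agency, because "id,version" would read as "agency,id". Parameter names are passed through unchanged.

```python
from typing import Mapping, Optional, Sequence
from urllib.parse import quote, urlencode


def build_data_url(
    base_url: str,
    flow_id: str,
    agency: Optional[str] = None,
    version: Optional[str] = None,
    filters: Sequence[Sequence[str]] = (),
    params: Optional[Mapping[str, object]] = None,
) -> str:
    """Build <base>/ws/data/<flowRef>/<dimensions-filters>/?<optional-parameters>."""
    if version and not agency:
        # Assumption: an "id,version" flowRef would be misread as "agency,id".
        raise ValueError("data-flow-version requires data-flow-agency")
    # Unset agency/version are omitted together with their commas.
    flow_ref = ",".join(p for p in (agency, flow_id, version) if p)
    url = f"{base_url.rstrip('/')}/ws/data/{quote(flow_ref, safe=',')}/"
    if filters:
        # Dots separate dimensions; an empty list leaves that dimension unfiltered.
        url += ".".join("+".join(values) for values in filters) + "/"
    if params:
        url += "?" + urlencode(params)
    return url
```

With base URL http://sdmx-datasource-d.dev.d4science.org/sdmxdatasource (an assumed base URL, obtained by stripping /ws/data/ from the example Interface/Endpoint), agency BlueBridge, flow NEW_DS_DIVISION_dataFlow, filter [["1"]] and startPeriod/endPeriod 2005/2011, the helper produces the URL of the example below.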

A valid example request is the following:


Header:

Accept: application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1


URL:

GET <sdmx-service-base-url>/ws/data/BlueBridge,NEW_DS_DIVISION_dataFlow/1/?startPeriod=2005&endPeriod=2011


This request asks for data associated with the latest version of the data flow NEW_DS_DIVISION_dataFlow, maintained by the BlueBridge agency, in the Structure specific time series version 2.1 format. The response should contain only data whose first (and only) dimension, according to the order defined in the SDMX Registry, has value 1 and that refer to the period from 2005 to 2011.
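The same request can be prepared with the Python standard library, as in the sketch below. The base URL is an assumption: it is derived from the example Interface/Endpoint http://sdmx-datasource-d.dev.d4science.org/sdmxdatasource/ws/data/ by stripping /ws/data/. No gcube-token is added because the service supports anonymous access. The helper names example_request and fetch are hypothetical.

```python
import urllib.request

# Assumed base URL (example Interface/Endpoint without the trailing /ws/data/).
BASE_URL = "http://sdmx-datasource-d.dev.d4science.org/sdmxdatasource"
ACCEPT = "application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1"


def example_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the example GET request; no gcube-token is needed (anonymous access)."""
    url = (f"{base_url}/ws/data/BlueBridge,NEW_DS_DIVISION_dataFlow/1/"
           "?startPeriod=2005&endPeriod=2011")
    return urllib.request.Request(url, headers={"Accept": ACCEPT}, method="GET")


def fetch(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return (Content-Type header, raw SDMX document bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()
```

Calling fetch(example_request()) performs the network call. Building the request separately makes it easy to inspect the URL and the Accept header without contacting the service.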


Summary of configuration steps

Configuration Steps

Step | Component | Description
Define the Data Source name | The Smart Gear Node containing the Service | Every Data Source in a given VRE must have a unique name
Define a single token and a single VRE | The Smart Gear Node containing the Service | Every Data Source must be associated with a single VRE
Exclude the request validation handler | The Smart Gear Node containing the Service | The Data Source is a read-only service for data that have already been exported, so request validation must be disabled
Define the Service Endpoint pointing to the Data Source Service | Information System | The name of the endpoint must be the Data Source name defined above
Define a Data Provider for the Data Source Service | SDMX Registry | The name of the Data Provider must be the Data Source name defined above
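The two node-level rules in the table (unique Data Source names within a VRE, one VRE per Data Source) can be checked on an inventory of deployments. The Python sketch below assumes a hypothetical inventory that you compile yourself. Deployment and deployment_problems are illustrative names, not gCube APIs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Deployment:
    """Hypothetical inventory entry: one Data Source on one Smart Gear Node."""
    data_source: str               # application name from gcube-app.xml
    node_vres: Tuple[str, ...]     # VREs (tokens) configured on that node


def deployment_problems(deployments: Sequence[Deployment]) -> List[str]:
    """Report nodes not bound to exactly one VRE and names reused within a VRE."""
    problems = []
    for d in deployments:
        if len(d.node_vres) != 1:
            problems.append(
                f"{d.data_source}: node must run on exactly one VRE, found {len(d.node_vres)}"
            )
    # Count (name, VRE) pairs only for correctly bound nodes.
    pairs = Counter(
        (d.data_source, d.node_vres[0]) for d in deployments if len(d.node_vres) == 1
    )
    for (name, vre), count in sorted(pairs.items()):
        if count > 1:
            problems.append(f"{name}: name used by {count} Data Sources in VRE {vre}")
    return problems
```

The remaining rows of the table concern the Information System and the SDMX Registry and are not covered by this check.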