SDMX Data Source
Introduction
The gCube SDMX Data Source Service is a REST web service compliant with versions 2.0 and 2.1 of the SDMX standard. It exports tabular data stored in the D4Science Infrastructure in SDMX format, leveraging the Tabular Data Facilities and the Information System.
High level Architecture and deployment
The gCube SDMX Data Source Service is a web service deployed on Tomcat and configured as a Smart Gear application supporting anonymous access. It relies on a set of gCube services to provide its functionality; the following picture shows the model:
The Service gets all its references from the Information System, in particular:
- URL of the associated SDMX Registry
- References of Tabular Resources and Tables
- References of Time Dimension and Primary Measure columns.
Tabular data are obtained in real time from the Tabular Data Management Service, based on the information retrieved from the Information System and on the Data Structures obtained from the SDMX Registry. The SDMX Data Source Service then creates an SDMX Document of the requested version and returns the requested data to the client.
Technically speaking, for the Tabular Data Management Service an SDMX export operation on a Dataset means that the content of the exported table is shared with the whole VRE of the caller. In general a VRE enabled to request SDMX operations is associated with a single Fusion Registry and at least one Data Source. Each Data Source must be associated with a single VRE. This association is established at Node level: for this reason an SDMX Data Source must be deployed on a Smart Gear Node running on a single VRE.
This deployment model enables the Data Source to use the (single) container token associated with the (single) VRE to perform the required queries on the Information System and to get exported (and therefore, as explained above, pre-shared) data from the Tabular Data Management Service.
Concerning the application configuration (the gcube-app.xml file), two important considerations should be taken into account:
1. the application name should be used to register the Data Source on the Information System
2. the request-validation handler should be excluded.
An example of a valid gcube-app.xml file is the following:
<application mode='online'>
    <name>SDMXDataSource1</name>
    <group>DataPublishing</group>
    <version>0.0.1-SNAPSHOT</version>
    <description>SDMX Data Source linked with Tabman</description>
    <local-persistence location='target' />
    <exclude handlers='request-validation'>/*</exclude>
</application>
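The two considerations above can be checked mechanically before deployment. The following is a minimal sketch, not part of the service: the helper name check_gcube_app is hypothetical, and it assumes that the handlers attribute may hold a comma-separated list.

```python
import xml.etree.ElementTree as ET

# The example configuration from this page.
GCUBE_APP_XML = """<application mode='online'>
    <name>SDMXDataSource1</name>
    <group>DataPublishing</group>
    <version>0.0.1-SNAPSHOT</version>
    <description>SDMX Data Source linked with Tabman</description>
    <local-persistence location='target' />
    <exclude handlers='request-validation'>/*</exclude>
</application>"""


def check_gcube_app(xml_text):
    """Hypothetical helper: return (application name, request-validation excluded for /*)."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    excluded = False
    for element in root.findall("exclude"):
        # Assumption: several handlers may be listed, separated by commas.
        handlers = [h.strip() for h in (element.get("handlers") or "").split(",")]
        if "request-validation" in handlers and (element.text or "").strip() == "/*":
            excluded = True
    return name, excluded


name, excluded = check_gcube_app(GCUBE_APP_XML)
print(name, excluded)
```

The returned name is the one that has to match the registration on the Information System.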
The application name is SDMXDataSource1 and must be registered on the Information System, in the same VRE as the associated SDMX Registry, with the following mandatory parameters:
- Service Endpoint of type RuntimeResource
- Category SDMXDataSources
- Interface/Endpoint: the base URL of the service, for instance
http://sdmx-datasource-d.dev.d4science.org/sdmxdatasource/ws/data/
If the Data Source is not registered on the Information System, the SDMX Exporter module of the Tabular Data Management Service will not be able to see it and will not be able to associate any Tabular Data to it.
Controls/Data flow
An SDMX Client, which in this case is a REST client (the Service supports only SDMX REST requests), asks for some data. The requested data must already have been exported in SDMX format by the Tabular Data Management Service. When it receives the request, the Service:
1. gets from the Information System the URL of the SDMX Registry associated with that VRE
2. gets from the SDMX Registry the associated Data Structure Definition
3. gets from the Information System the IDs of the Tabular Resource, Table, Time Dimension Column and Primary Measure Column associated with that Data Structure Definition
4. gets the tables from the Tabular Data Management Service
5. creates an SDMX Data Document and sends the response to the Client.
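The ordering of the five steps, and the way each one feeds the next, can be sketched as follows. This is only an illustration: every function name, argument and return value is hypothetical and does not correspond to the actual gCube client APIs. The collaborators are replaced by stubs that record the call order.

```python
def serve_data_request(data_flow, vre, lookup_registry_url, fetch_dsd,
                       lookup_table_refs, fetch_table, build_document):
    """Run the five steps in order; every collaborator is a hypothetical callable."""
    registry_url = lookup_registry_url(vre)        # 1. ask the Information System
    dsd = fetch_dsd(registry_url, data_flow)       # 2. ask the SDMX Registry
    refs = lookup_table_refs(vre, dsd)             # 3. ask the Information System
    table = fetch_table(refs)                      # 4. ask Tabular Data Management
    return build_document(dsd, refs, table)        # 5. build the SDMX Data Document


# Stub collaborators that only record the order in which they are called.
calls = []


def _stub(name, result):
    def call(*args):
        calls.append(name)
        return result
    return call


document = serve_data_request(
    "NEW_DS_DIVISION_dataFlow",
    "/example/vre",  # placeholder VRE scope
    lookup_registry_url=_stub("registry_url", "http://registry.example.org"),
    fetch_dsd=_stub("dsd", {"id": "EXAMPLE_DSD"}),
    lookup_table_refs=_stub("table_refs", {"table": 1, "time": 2, "measure": 3}),
    fetch_table=_stub("table", [("2005", 10.0)]),
    build_document=_stub("document", "<sdmx/>"),
)
print(calls)
```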
Supported versions, REST URL and examples
Currently the Service supports the following SDMX versions:
- Structure specific time series version 2.1 (Data type: application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1)
- Generic time series version 2.1 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.1)
- Structure specific time series version 2.0 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.0)
- Generic time series version 2.0 (Data type: application/vnd.sdmx.generictimeseriesdata+xml;version=2.0)
- Structure specific cross sectional data version 2.0 (Data type: application/vnd.sdmx.structurespecificdata+xml;version=2.0).
The client can ask for a certain version by including one or more Data types in the Accept header of the request message. If no valid data type is present in the Accept header, the default, Generic time series version 2.1, is used. If more than one valid data type is given, the priority follows the order of the list above.
The REST URL used to get the data is (almost) compliant with the SDMX standard:
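The selection rule just described can be sketched as below. This is an illustrative reimplementation, not the service's code: the function name is hypothetical, only the distinct data types from the list above are used, and the matching is assumed to be exact after removing whitespace and lowercasing (quality values such as q=0.8 are not handled).

```python
# Default used when the Accept header contains no valid data type.
DEFAULT_TYPE = "application/vnd.sdmx.generictimeseriesdata+xml;version=2.1"

# Distinct data types in the priority order of the list above.
PRIORITY = [
    "application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1",
    "application/vnd.sdmx.generictimeseriesdata+xml;version=2.1",
    "application/vnd.sdmx.generictimeseriesdata+xml;version=2.0",
    "application/vnd.sdmx.structurespecificdata+xml;version=2.0",
]


def _normalise(media_type):
    # Assumption: whitespace and letter case are not significant.
    return "".join(media_type.split()).lower()


def select_data_type(accept_header):
    """Return the highest-priority supported type named in the header, else the default."""
    requested = {_normalise(part)
                 for part in (accept_header or "").split(",") if part.strip()}
    for candidate in PRIORITY:
        if candidate in requested:
            return candidate
    return DEFAULT_TYPE


print(select_data_type("text/html"))
```

With two valid types in the header, the one that appears earlier in PRIORITY is chosen regardless of the order in which the client listed them.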
<sdmx-service-base-url>/ws/data/<data-flow-agency>,<data-flow-id>,<data-flow-version>/<dimensions-filters>/?<optional-parameters>
All the fields are compliant with the SDMX standard, in particular:
- data-flow-agency,data-flow-id,data-flow-version: only data-flow-id is mandatory, but if it is not enough to identify a data flow unambiguously, an error is returned. If data-flow-agency or data-flow-version is not set, the field is left out and its comma is not used
- dimensions-filters: this optional field is a filter on the dimensions (and not on the attributes). The standard dot-based notation is used and multiple filters are supported; please refer to the standard for more details
- optional-parameters: the current version supports startPeriod, endPeriod, firstNObservations, endNObservations, dimensionAtObservation and detail.
For more information, please refer to specific SDMX documentation.
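A small helper can make the URL layout concrete. This is a sketch under stated assumptions: the function name and the example host are hypothetical, unset agency or version fields are dropped together with their comma as described above, and the dimension filter segment is assumed to be omitted entirely when no filter is given.

```python
from urllib.parse import urlencode


def build_data_url(base_url, flow_id, agency=None, version=None,
                   dimension_filters=None, **params):
    """Hypothetical helper composing a data request URL for the Data Source."""
    # Only the fields that are set are joined, so no dangling comma appears.
    flow_ref = ",".join(part for part in (agency, flow_id, version) if part)
    url = base_url.rstrip("/") + "/ws/data/" + flow_ref + "/"
    if dimension_filters:
        # Assumption: the segment is simply left out when there is no filter.
        url += dimension_filters + "/"
    query = urlencode({key: value for key, value in params.items()
                       if value is not None})
    return url + ("?" + query if query else "")


url = build_data_url(
    "http://datasource.example.org/sdmxdatasource",  # placeholder base URL
    "NEW_DS_DIVISION_dataFlow",
    agency="BlueBridge",
    dimension_filters="1",
    startPeriod=2005,
    endPeriod=2011,
)
print(url)
```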
A valid example is the following
Header:
Accept: application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1
URL:
GET <sdmx-service-base-url>/ws/data/BlueBridge,NEW_DS_DIVISION_dataFlow/1/?startPeriod=2005&endPeriod=2011
This request asks for data associated with the latest version of the data flow NEW_DS_DIVISION_dataFlow, maintained by the BlueBridge agency. The response should contain only data whose first (and only) dimension, according to the order defined in the SDMX Registry, is 1 and which refer to the period from 2005 to 2011.
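For illustration, the same request can be prepared with the Python standard library. The base URL below is a placeholder for the real service address, and the query uses the startPeriod spelling given in the list of supported parameters. The request object is only built here; sending it requires a reachable Data Source.

```python
from urllib.request import Request

# Placeholder: replace with the real <sdmx-service-base-url>.
BASE_URL = "http://datasource.example.org/sdmxdatasource"

request = Request(
    BASE_URL + "/ws/data/BlueBridge,NEW_DS_DIVISION_dataFlow/1/"
               "?startPeriod=2005&endPeriod=2011",
    headers={
        "Accept": "application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1",
    },
    method="GET",
)

print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```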