Catalogue restful service

From Gcube Wiki

The Catalogue Web Service has been DISMISSED. Please use the GCat Service instead.

Catalogue Web Service

Starting with gCube 4.5, a new RESTful Web Service has been created to let external services and users interact with D4Science's catalogues. It builds on the underlying CKAN APIs [1], but at the same time allows operations on the catalogue to be performed relying on the gCube Security Token (see below).

We suggest reading the main catalogue wiki page (GCube_Resource_Catalogue#gCube_Data_Catalogue_Metadata) before continuing.

Retrieve web service endpoint

The endpoint URL of the catalogue service depends on the context in which you are interested in publishing. You can either open the Infrastructure Monitor and look for a GCore EndPoint with name Catalogue-WS and Service Class Data-Catalogue, or perform the same lookup programmatically with the IC-Client library.

For the development environment, its value is:

http://catalogue-ws-d-d4s.d4science.org/catalogue-ws/rest

For the pre-production environment, it is:

https://catalogue-ws-t.pre.d4science.org/catalogue-ws/rest

For the production environment, instead, it is:

http://catalogue-ws.d4science.org/catalogue-ws/rest
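
As a small illustration, the three endpoints listed above can be kept in a lookup table and joined with the API paths described in the rest of this page. The names ENDPOINTS and endpoint_url are hypothetical helpers introduced here, not part of any gCube library.

```python
# Endpoints copied from the list above; the dictionary keys are illustrative labels.
ENDPOINTS = {
    "development": "http://catalogue-ws-d-d4s.d4science.org/catalogue-ws/rest",
    "pre-production": "https://catalogue-ws-t.pre.d4science.org/catalogue-ws/rest",
    "production": "http://catalogue-ws.d4science.org/catalogue-ws/rest",
}


def endpoint_url(environment: str, path: str) -> str:
    """Join the service endpoint of an environment with an API path."""
    base = ENDPOINTS[environment].rstrip("/")
    return base + "/" + path.lstrip("/")


print(endpoint_url("production", "api/licenses/list/"))
```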

Item vs Package vs Dataset

The term used by CKAN to refer to what can be published into the catalogue is 'dataset' or 'package'. We decided to use the term 'item' since it is a more generic term. However, keep in mind that the three terms refer to the same thing.

Supported operations

The APIs allow you to interact with the catalogue and perform the following operations on it. Please note that they accept and return JSON objects. Be sure to set the following headers in your HTTP requests:

Content-Type: application/json
Accept: application/json
gcube-token: Token

The gcube-token can be set in the HTTP header of the request or as a query parameter. In the following, we assume it has been set in the header.
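
The two ways of passing the token can be sketched with Python's standard urllib. This is only a sketch: the function names and the placeholder token are hypothetical, and the query parameter is assumed to be named gcube-token like the header, which this page does not state explicitly. The requests are built but not sent.

```python
import urllib.parse
import urllib.request

# Pre-production endpoint from the list above.
SERVICE_ENDPOINT = "https://catalogue-ws-t.pre.d4science.org/catalogue-ws/rest"


def request_with_header(path: str, token: str) -> urllib.request.Request:
    """Build a request that carries the token in the gcube-token header."""
    return urllib.request.Request(
        SERVICE_ENDPOINT + path,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "gcube-token": token,
        },
    )


def request_with_query(path: str, token: str) -> urllib.request.Request:
    """Build a request that carries the token as a query parameter.

    Assumption: the query parameter has the same name as the header.
    """
    query = urllib.parse.urlencode({"gcube-token": token})
    return urllib.request.Request(
        SERVICE_ENDPOINT + path + "?" + query,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Placeholder token, replace it with your own.
print(request_with_query("/api/licenses/list/", "your-gcube-token").full_url)
```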

Operation result and HTTP Code

Every response is a JSON object that looks like this. Please note that methods always return HTTP status code 200, so you should always check the "success" and "message" fields within the response object.

{
 
    "success": true,
    "message": ...,
    "result": {....}
 
}

The field message reports an error message, if any.
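
Since the HTTP status code does not signal failures, a client has to inspect the envelope itself. A minimal sketch of such a check follows; CatalogueError and unwrap are hypothetical names chosen for this example.

```python
import json


class CatalogueError(Exception):
    """Raised when the response envelope reports success = false."""


def unwrap(body: str):
    """Parse a response body and return its "result" field.

    The HTTP status is always 200, so the "success" flag is the only
    indicator of failure that the client can rely on.
    """
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise CatalogueError(envelope.get("message") or "unknown error")
    return envelope.get("result")
```

A caller would wrap every request with unwrap and handle CatalogueError where a failed operation needs to be reported.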

Licenses

Retrieve the list of licenses (GET)

Return the list of licenses available for datasets on the site.

SERVICE_ENDPOINT/api/licenses/list/

Organizations

Show organization (GET)

Return the details of an organization. Parameters:

  • id (string) – the id or name of the organization
  • include_datasets (boolean) – include a truncated list of the org’s items (optional, default: False)
  • include_dataset_count (boolean) – include the full item count (optional, default: True)
  • include_extras – include the organization’s extra fields (optional, default: True)
  • include_users – include the organization’s users (optional, default: True)
  • include_groups – include the organization’s sub groups (optional, default: True)
  • include_tags – include the organization’s tags (optional, default: True)
  • include_followers – include the organization’s number of followers (optional, default: True)

Path

SERVICE_ENDPOINT/api/organizations/show?id=organization_name
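
The optional include_* flags can be appended to the query string together with id. The sketch below only builds the URL; organization_show_url is a hypothetical helper, and serialising booleans as "true"/"false" is an assumption about the form the service accepts.

```python
import urllib.parse

# Pre-production endpoint from the list above.
SERVICE_ENDPOINT = "https://catalogue-ws-t.pre.d4science.org/catalogue-ws/rest"


def organization_show_url(org_id: str, **flags: bool) -> str:
    """Build the URL for /api/organizations/show.

    Keyword arguments are the optional include_* flags; booleans are
    written as "true"/"false" (an assumption, see the text above).
    """
    params = {"id": org_id}
    for name, value in flags.items():
        params[name] = "true" if value else "false"
    return SERVICE_ENDPOINT + "/api/organizations/show?" + urllib.parse.urlencode(params)


print(organization_show_url("organization_name", include_datasets=True, include_users=False))
```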

Show organizations list (GET)

Return a list of the names of the site’s organizations. Parameters:

  • order_by (string) – the field to sort the list by, must be 'name' or 'packages' (optional, default: 'name'). Deprecated: use sort instead.
  • sort (string) – sorting of the search results. Optional. Default: “name asc” string of field name and sort-order. The allowed fields are ‘name’, ‘package_count’ and ‘title’
  • limit (int) – if given, the list of organizations will be broken into pages of at most limit organizations per page and only one page will be returned at a time (optional)
  • offset (int) – when limit is given, the offset to start returning organizations from
  • organizations (list of strings) – a list of names of the organizations to return, if given only organizations whose names are in this list will be returned (optional)
  • all_fields (boolean) – return group dictionaries instead of just names. Only core fields are returned - get some more using the include_* options. Returning a list of packages is too expensive, so the packages property for each group is deprecated, but there is a count of the packages in the package_count property. (optional, default: False)
  • include_dataset_count (boolean) – if all_fields, include the full items count (optional, default: True)
  • include_extras (boolean) – if all_fields, include the organization extra fields (optional, default: False)
  • include_tags (boolean) – if all_fields, include the organization tags (optional, default: False)
  • include_groups – if all_fields, include the organizations the organizations are in (optional, default: False)
  • include_users (boolean) – if all_fields, include the organization users (optional, default: False).

Path

SERVICE_ENDPOINT/api/organizations/list
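
The limit and offset parameters lend themselves to a simple paging loop. The sketch below keeps the HTTP call out of the picture: fetch_page is a caller-supplied function (hypothetical, not part of the service) that performs the GET with the given limit and offset and returns the list found in the "result" field.

```python
from typing import Callable, Iterator, List


def iter_organizations(
    fetch_page: Callable[[int, int], List[str]], limit: int = 50
) -> Iterator[str]:
    """Yield organization names page by page.

    fetch_page(limit, offset) must return one page of names. Iteration
    stops at the first page that is shorter than limit.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

When the total number of organizations is an exact multiple of limit, the loop makes one extra call that returns an empty page and then stops.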

Create organization (POST)

Create a new organization.

You must be authorized to create organizations.

Plugins may change the parameters of this function depending on the value of the type parameter, see the IGroupForm plugin interface. Parameters:

  • name (string) – the name of the organization, a string between 2 and 100 characters long, containing only lowercase alphanumeric characters, - and _
  • id (string) – the id of the organization (optional)
  • title (string) – the title of the organization (optional)
  • description (string) – the description of the organization (optional)
  • image_url (string) – the URL to an image to be displayed on the organization’s page (optional)
  • state (string) – the current state of the organization, e.g. 'active' or 'deleted', only active organizations show up in search results and other lists of organizations, this parameter will be ignored if you are not authorized to change the state of the organization (optional, default: 'active')
  • approval_status (string) – (optional)
  • extras (list of dataset extra dictionaries) – the organization’s extras (optional), extras are arbitrary (key: value) metadata items that can be added to organizations, each extra dictionary should have keys 'key' (a string), 'value' (a string), and optionally 'deleted'
  • packages (list of dictionaries) – the datasets (packages) that belong to the organization, a list of dictionaries each with keys 'name' (string, the id or name of the dataset) and optionally 'title' (string, the title of the dataset)
  • users (list of dictionaries) – the users that belong to the organization, a list of dictionaries each with key 'name' (string, the id or name of the user) and optionally 'capacity' (string, the capacity in which the user is a member of the organization)

Returns:

the newly created organization (unless ‘return_id_only’ is set to True in the context, in which case just the organization id will be returned)

Path

SERVICE_ENDPOINT/api/organizations/create
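
Before sending a create request it can be convenient to check the name constraint locally. The following sketch validates the name and builds the JSON body; organization_payload is a hypothetical helper, and "lowercase alphanumeric" is interpreted here as the ASCII characters a-z and 0-9, which is an assumption.

```python
import json
import re

# 2 to 100 characters: lowercase alphanumerics, "-" and "_".
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def organization_payload(name: str, title: str = "", description: str = "") -> str:
    """Return the JSON body for /api/organizations/create."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("invalid organization name: %r" % name)
    payload = {"name": name}
    # Optional fields are only sent when they have a value.
    if title:
        payload["title"] = title
    if description:
        payload["description"] = description
    return json.dumps(payload)
```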

Update organization (POST)

Update an organization.

You must be authorized to edit the organization. For further parameters see organization create.

Parameters:

  • id (string) – the name or id of the organization to update

Path

SERVICE_ENDPOINT/api/organizations/update

Patch organization (POST)

Patch an organization

Parameters:

  • id (string) – the id or name of the organization

The difference between the update and patch methods is that patch updates only the provided parameters and leaves all other parameters unchanged, whereas update deletes all parameters not explicitly provided in the data_dict.

Path

SERVICE_ENDPOINT/api/organizations/patch
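
The difference between update and patch can be pictured with a simplified local model that uses plain dictionaries. This is not the service's implementation, only an illustration of the semantics described above; the record contents are made up for the example.

```python
def apply_update(stored: dict, data_dict: dict) -> dict:
    """Update: the stored record is replaced by what was sent."""
    return dict(data_dict)


def apply_patch(stored: dict, data_dict: dict) -> dict:
    """Patch: sent fields overwrite stored ones, everything else is kept."""
    merged = dict(stored)
    merged.update(data_dict)
    return merged


# Illustrative record and change.
stored = {"id": "my-org", "title": "My Org", "description": "Old text"}
change = {"id": "my-org", "description": "New text"}

print(apply_update(stored, change))  # the title is gone
print(apply_patch(stored, change))   # the title is preserved
```

In practice this means that an update request should carry the complete representation of the organization, while a patch request only needs the id and the fields to change.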

Delete organization (DELETE)

Delete an organization.

You must be authorized to delete the organization.

Parameters:

  • id (string) – the name or id of the organization

Path

SERVICE_ENDPOINT/api/organizations/delete/

Purge organization (DELETE)

Purging an organization completely removes the organization from the catalogue, whereas deleting an organization simply marks the organization as deleted (it will no longer show up in the frontend, but is still in the db).

Datasets owned by the organization will remain, just not in an organization any more.

You must be authorized to purge the organization.

Parameters:

  • id (string) – the name or id of the organization to be purged

Path

SERVICE_ENDPOINT/api/organizations/purge/

Groups

Group show (GET)

Return the details of a group.

Parameters:

  • id (string) – the id or name of the group
  • include_datasets (boolean) – include a truncated list of the group’s datasets (optional, default: False)
  • include_dataset_count (boolean) – include the full package_count (optional, default: True)
  • include_extras – include the group’s extra fields (optional, default: True)
  • include_users – include the group’s users (optional, default: True)
  • include_groups – include the group’s sub groups (optional, default: True)
  • include_tags – include the group’s tags (optional, default: True)
  • include_followers – include the group’s number of followers (optional, default: True)

Path

SERVICE_ENDPOINT/api/groups/show?id=group_name

Groups list (GET)

Return a list of the names of the site’s groups.

Parameters:

  • order_by (string) – the field to sort the list by, must be 'name' or 'packages' (optional, default: 'name'). Deprecated: use sort instead.
  • sort (string) – sorting of the search results. Optional. Default: “name asc” string of field name and sort-order. The allowed fields are ‘name’, ‘package_count’ and ‘title’
  • limit (int) – if given, the list of groups will be broken into pages of at most limit groups per page and only one page will be returned at a time (optional)
  • offset (int) – when limit is given, the offset to start returning groups from
  • groups (list of strings) – a list of names of the groups to return, if given only groups whose names are in this list will be returned (optional)
  • all_fields (boolean) – return group dictionaries instead of just names. Only core fields are returned - get some more using the include_* options. Returning a list of packages is too expensive, so the packages property for each group is deprecated, but there is a count of the packages in the package_count property. (optional, default: False)
  • include_dataset_count (boolean) – if all_fields, include the full package_count (optional, default: True)
  • include_extras (boolean) – if all_fields, include the group extra fields (optional, default: False)
  • include_tags (boolean) – if all_fields, include the group tags (optional, default: False)
  • include_groups (boolean) – if all_fields, include the groups the groups are in (optional, default: False).
  • include_users (boolean) – if all_fields, include the group users (optional, default: False).

Path

SERVICE_ENDPOINT/api/groups/list

Group create (POST)

Create a new group.

You must be authorized to create groups.

Parameters:

  • name (string) – the name of the group, a string between 2 and 100 characters long, containing only lowercase alphanumeric characters, - and _
  • id (string) – the id of the group (optional)
  • title (string) – the title of the group (optional)
  • description (string) – the description of the group (optional)
  • image_url (string) – the URL to an image to be displayed on the group’s page (optional)
  • type (string) – the type of the group (optional), IGroupForm plugins associate themselves with different group types and provide custom group handling behaviour for these types Cannot be ‘organization’
  • state (string) – the current state of the group, e.g. 'active' or 'deleted', only active groups show up in search results and other lists of groups, this parameter will be ignored if you are not authorized to change the state of the group (optional, default: 'active')
  • approval_status (string) – (optional)
  • extras (list of dataset extra dictionaries) – the group’s extras (optional), extras are arbitrary (key: value) metadata items that can be added to groups, each extra dictionary should have keys 'key' (a string), 'value' (a string), and optionally 'deleted'
  • packages (list of dictionaries) – the datasets (packages) that belong to the group, a list of dictionaries each with keys 'name' (string, the id or name of the dataset) and optionally 'title' (string, the title of the dataset)
  • groups (list of dictionaries) – the groups that belong to the group, a list of dictionaries each with key 'name' (string, the id or name of the group) and optionally 'capacity' (string, the capacity in which the group is a member of the group)
  • users (list of dictionaries) – the users that belong to the group, a list of dictionaries each with key 'name' (string, the id or name of the user) and optionally 'capacity' (string, the capacity in which the user is a member of the group)

Returns:

the newly created group (unless ‘return_id_only’ is set to True in the context, in which case just the group id will be returned)

Path

SERVICE_ENDPOINT/api/groups/create/

Group delete (DELETE)

Delete a group.

You must be authorized to delete the group.

Parameters:

  • id (string) – the name or id of the group

Path

SERVICE_ENDPOINT/api/groups/delete/

Group purge (DELETE)

Purge a group.

Purging a group cannot be undone! Purging a group completely removes the group from the catalogue, whereas deleting a group simply marks the group as deleted (it will no longer show up in the frontend, but is still in the db).

Datasets in the group will remain in the catalogue, just no longer in the purged group.

You must be authorized to purge the group.

Parameters:

  • id (string) – the name or id of the group to be purged

Path

SERVICE_ENDPOINT/api/groups/purge/

Group update (POST)

Update a group.

You must be authorized to edit the group.

For further parameters see group create.

Parameters:

  • id (string) – the name or id of the group to update

Path

SERVICE_ENDPOINT/api/groups/update/

Group patch (POST)

Patch a group

Parameters:

  • id (string) – the id or name of the group

The difference between the update and patch methods is that patch updates only the provided parameters and leaves all other parameters unchanged, whereas update deletes all parameters not explicitly provided in the data_dict.

Path

SERVICE_ENDPOINT/api/groups/patch/

Resources

Resource show (GET)

Return the metadata of a resource.

Parameters:

  • id (string) – the id of the resource

Path

SERVICE_ENDPOINT/api/resources/show?id=....

Resource create (POST)

Add a new resource to an Item

Parameters:

  • package_id (string) – id of item that the resource should be added to.
  • url (string) – url of resource
  • revision_id (string) – (optional)
  • description (string) – (optional)
  • format (string) – (optional)
  • hash (string) – (optional)
  • name (string) – (optional)
  • resource_type (string) – (optional)
  • mimetype (string) – (optional)
  • mimetype_inner (string) – (optional)
  • cache_url (string) – (optional)
  • size (int) – (optional)
  • created (iso date string) – (optional)
  • last_modified (iso date string) – (optional)
  • cache_last_updated (iso date string) – (optional)
  • upload (FieldStorage (optional) needs multipart/form-data) – (optional)

Path

SERVICE_ENDPOINT/api/resources/create/
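
A resource create body needs package_id and url, plus any of the optional fields listed above. The sketch below builds such a body and rejects field names that are not in that list, which helps catch typos early. resource_payload is a hypothetical helper; upload is left out because it requires multipart/form-data rather than JSON.

```python
import json

# Optional fields from the parameter list above (upload excluded).
OPTIONAL_FIELDS = {
    "revision_id", "description", "format", "hash", "name", "resource_type",
    "mimetype", "mimetype_inner", "cache_url", "size", "created",
    "last_modified", "cache_last_updated",
}


def resource_payload(package_id: str, url: str, **optional) -> str:
    """Return the JSON body for /api/resources/create."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError("unknown resource fields: " + ", ".join(sorted(unknown)))
    payload = {"package_id": package_id, "url": url}
    payload.update(optional)
    return json.dumps(payload)


print(resource_payload("my-item-type-a", "http://example.org/data.csv", name="data", format="CSV"))
```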

Resource delete (DELETE)

Delete a resource from a dataset.

You must be a sysadmin or the owner of the resource to delete it.

Parameters:

  • id (string) – the id of the resource

Path

SERVICE_ENDPOINT/api/resources/delete/

Resource update (POST)

Update a resource.

To update a resource you must be authorized to update the item that the resource belongs to.

For further parameters see resource create.

Parameters:

  • id (string) – the id of the resource to update

Path

SERVICE_ENDPOINT/api/resources/update/

Resource patch (POST)

Patch a resource

Parameters:

  • id (string) – the id of the resource

The difference between the update and patch methods is that patch updates only the provided parameters and leaves all other parameters unchanged, whereas update deletes all parameters not explicitly provided in the data_dict.

Path

SERVICE_ENDPOINT/api/resources/patch/

Items Profiles

Show profiles in context (GET)

Returns the names of the available item profiles in this context

Path

SERVICE_ENDPOINT/api/profiles/profile_names/

Show profile source in XML (GET)

Returns the XML source of the item profile with the given name in this context

Parameters

  • name (string) – the name of the profile to show

Path

SERVICE_ENDPOINT/api/profiles/profile?name=...

You need to properly set the "Accept" HTTP header field.

Show profile namespaces/categories (GET)

Returns the namespaces available in a given context. Specifically, for each namespace, the following information is returned

  • id
  • name
  • title
  • description

Path

SERVICE_ENDPOINT/api/profiles/namespaces/

Items

Item create (POST)

Create a new item.

You must be authorized to create new items, which means you need to have at least the Catalogue-Editor role in the context in which you want to publish.

If at least one profile has been defined within this context, then you need to specify the profile's type when creating the item. You need to insert, among the extras of the JSON object describing the item, a "system:type" property with the proper value (i.e. its value must be equal to the type property contained in the profile). The submitted request will be validated against the profile whose type has been specified. The profile's other properties need to be specified within the extras field as well.

If no profile has been defined, then no validation will be performed. Thus you do not need to set any system:type property.

See below for a complete example.

Parameters:

  • name (string) – the name of the new dataset, must be between 2 and 100 characters long and contain only lowercase alphanumeric characters, - and _, e.g. 'warandpeace'
  • title (string) – the title of the item (optional, default: same as name)
  • private (bool) – If True creates a private dataset
  • author (string) – the name of the dataset’s author (automatically compiled)
  • author_email (string) – the email address of the dataset’s author (automatically compiled)
  • maintainer (string) – the name of the dataset’s maintainer (optional)
  • maintainer_email (string) – the email address of the dataset’s maintainer (optional)
  • license_id (license id string) – the id of the dataset’s license, see license_list() for available values (mandatory)
  • notes (string) – a description of the dataset (optional)
  • version (string, no longer than 100 characters) – (optional)
  • resources (list of resource dictionaries) – the dataset’s resources must have a name/url property and optionally a description property
  • tags (list of tag dictionaries) – the dataset’s tags must have a name property
  • extras (list of dataset extra dictionaries) – the dataset’s extras (optional), extras are arbitrary (key: value) metadata items that can be added to datasets, each extra dictionary should have keys 'key' (a string), 'value' (a string)
  • groups (list of dictionaries) – the groups to which the dataset belongs (optional), each group dictionary should have one or more of the following keys which identify an existing group: 'id' (the id of the group, string), or 'name' (the name of the group, string), to see which groups exist call group_list()
  • owner_org (string) – the id of the dataset’s owning organization

Please note that:

  • owner_org is automatically derived from the gcube-token you provide, if it is bound to a VRE (otherwise you need to specify the owner_org on submission);
  • author and author email are automatically derived from the gcube-token you provide.

Path

SERVICE_ENDPOINT/api/items/create

Example

The steps needed to publish an item in the catalogue are shown schematically in the following figure.

Figure: Catalogue Service - Publication Steps

Suppose you discover that the following profiles are defined in the context in which you want to publish: 'A', 'B' and 'C'.

You are interested in publishing an item with profile 'A', so you download the XML of profile 'A', which looks like the following:

<metadataformat type="Type A">
   <metadatafield categoryref="category1">
      <fieldName>Field 1</fieldName>
      <mandatory>false</mandatory>
      <dataType>String</dataType>
      <defaultValue />
      <note>Write something here</note>
      <validator />
      <tagging create="true" separator="-">onFieldName</tagging>
   </metadatafield>
   <metadatafield categoryref="category1">
      <fieldName>Field 2</fieldName>
      <mandatory>false</mandatory>
      <dataType>Boolean</dataType>
      <defaultValue>true</defaultValue>
      <note>Set true or false to the checkbox</note>
      <validator />
   </metadatafield>
   <metadatafield categoryref="category2">
      <fieldName>Field 3</fieldName>
      <mandatory>true</mandatory>
      <dataType>String</dataType>
      <defaultValue>A</defaultValue>
      <note>A listbox of values</note>
      <vocabulary isMultiSelection="true">
         <vocabularyField>A3</vocabularyField>
         <vocabularyField>B3</vocabularyField>
         <vocabularyField>C3</vocabularyField>
         <vocabularyField>D3</vocabularyField>
         <vocabularyField>E3</vocabularyField>
         <vocabularyField>F3</vocabularyField>
      </vocabulary>
      <validator />
      <tagging create="true" separator="-">onValue</tagging>
   </metadatafield>
   <metadatafield categoryref="category2">
      <fieldName>Field 4</fieldName>
      <mandatory>true</mandatory>
      <dataType>Number</dataType>
      <defaultValue>4</defaultValue>
      <validator />
   </metadatafield>
</metadataformat>

The previous information can be easily discovered with the aforementioned Item Profile operations. Then, the JSON object of the create request will look like this:

{
   "name":"my-item-type-a",
   "title":"My Item with Type A",
   "license_id":"cc-by",
   "tags":[
      {
         "name":"my-tag"
      },
      ....
   ],
   "groups":[
      {
         "name":"group-1"
      },
      ....
   ],
   "extras":[
      {
         "key":"system:type",
         "value":"Type A"
      },
      {
         "key":"category1:Field 1",
         "value":"Field 1 value"
      },
      {
         "key":"category2:Field 3",
         "value":"A3"
      },
      {
         "key":"category2:Field 3",
         "value":"F3"
      },
      {
         "key":"category2:Field 4",
         "value":"5"
      }
   ]
}

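Categories/namespaces are no longer managed transparently: as in the example above, each extras key carries its category explicitly as a prefix (e.g. "category1:Field 1").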
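
The relation between the profile XML and the extras of the create request can be checked locally before submitting. The sketch below parses an abridged copy of the "Type A" profile with Python's xml.etree.ElementTree and reports which required extras are still missing. The helper names are hypothetical, and the check is only a rough pre-check under the assumption that mandatory fields must appear as "category:field name" keys; the validation performed by the service remains authoritative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the "Type A" profile shown above.
PROFILE_XML = """
<metadataformat type="Type A">
   <metadatafield categoryref="category1">
      <fieldName>Field 1</fieldName>
      <mandatory>false</mandatory>
   </metadatafield>
   <metadatafield categoryref="category2">
      <fieldName>Field 3</fieldName>
      <mandatory>true</mandatory>
   </metadatafield>
   <metadatafield categoryref="category2">
      <fieldName>Field 4</fieldName>
      <mandatory>true</mandatory>
   </metadatafield>
</metadataformat>
"""


def mandatory_keys(profile_xml: str) -> list:
    """Return the extras keys ("category:field name") of all mandatory fields."""
    root = ET.fromstring(profile_xml)
    keys = []
    for field in root.findall("metadatafield"):
        if field.findtext("mandatory", "").strip() == "true":
            category = field.get("categoryref")
            name = field.findtext("fieldName", "").strip()
            keys.append(category + ":" + name if category else name)
    return keys


def missing_extras(profile_xml: str, extras: list) -> list:
    """List what a create request still lacks: system:type and mandatory fields."""
    root = ET.fromstring(profile_xml)
    present = {extra["key"]: extra["value"] for extra in extras}
    missing = []
    if present.get("system:type") != root.get("type"):
        missing.append("system:type")
    missing.extend(key for key in mandatory_keys(profile_xml) if key not in present)
    return missing


extras = [
    {"key": "system:type", "value": "Type A"},
    {"key": "category2:Field 3", "value": "A3"},
]
print(missing_extras(PROFILE_XML, extras))
```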


Item update (POST)

Update an item.

You must be authorized to update the item.

The content of the request must be the JSON representation of the item to update. The item to update is identified by the 'id' field in the JSON representation.

SERVICE_ENDPOINT/api/items/update

Item show (GET)

Return the metadata of an item and its resources.

Parameters:

  • id (string) – the id or name of the dataset

Path

SERVICE_ENDPOINT/api/items/show?id=...
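
A complete round trip for this operation could look like the following sketch, which sends the GET with urllib and checks the response envelope described earlier. show_item is a hypothetical helper; the opener argument exists only so that the network call can be replaced in tests.

```python
import json
import urllib.parse
import urllib.request

# Pre-production endpoint from the list above.
SERVICE_ENDPOINT = "https://catalogue-ws-t.pre.d4science.org/catalogue-ws/rest"


def show_item(item_id: str, token: str, opener=urllib.request.urlopen) -> dict:
    """GET /api/items/show and return the item metadata from "result"."""
    query = urllib.parse.urlencode({"id": item_id})
    request = urllib.request.Request(
        SERVICE_ENDPOINT + "/api/items/show?" + query,
        headers={"Accept": "application/json", "gcube-token": token},
    )
    with opener(request) as response:
        envelope = json.loads(response.read().decode("utf-8"))
    # The HTTP status is always 200, so the success flag must be checked.
    if not envelope.get("success"):
        raise RuntimeError(envelope.get("message") or "item show failed")
    return envelope["result"]
```

With a valid token, show_item("my-item-type-a", token) would return the dictionary describing the item and its resources.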

Item delete (DELETE)

Delete an item.

This makes the item disappear from all web & API views, apart from the trash.

You must be authorized to delete the item.

Parameters:

  • id (string) – the id or name of the item to delete

Path

SERVICE_ENDPOINT/api/items/delete

Item purge (DELETE)

Purge an item.

Purging an item cannot be undone!

Purging an item completely removes the item from the catalogue, whereas deleting an item simply marks the item as deleted (it will no longer show up in the front-end, but is still in the db).

You must be authorized to purge the item.

Parameters:

  • id (string) – the name or id of the item to be purged

Path

SERVICE_ENDPOINT/api/items/purge
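
Delete and purge differ only in the path, so a single helper can build both requests. The sketch below assumes that the id is sent as a JSON object in the body of the DELETE request, in line with the statement that the APIs accept JSON objects; this page does not spell out how the id is passed, so treat that as an assumption. removal_request is a hypothetical helper and the request is built but not sent.

```python
import json
import urllib.request

# Pre-production endpoint from the list above.
SERVICE_ENDPOINT = "https://catalogue-ws-t.pre.d4science.org/catalogue-ws/rest"


def removal_request(item_id: str, token: str, purge: bool = False) -> urllib.request.Request:
    """Build a DELETE request for /api/items/delete or /api/items/purge.

    Assumption: the item id travels in a JSON body of the form {"id": ...}.
    """
    action = "purge" if purge else "delete"
    return urllib.request.Request(
        SERVICE_ENDPOINT + "/api/items/" + action,
        data=json.dumps({"id": item_id}).encode("utf-8"),
        method="DELETE",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "gcube-token": token,
        },
    )
```

Keeping purge as an explicit opt-in argument makes the irreversible operation harder to trigger by accident.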

Other useful information

Retrieve your gCube Security Token

A security token is a UUID bound to you and to a given Infrastructure context. To retrieve it, go to the VRE for which you need the token and use the Authorisation Options portlet (see below).

Figure: Authorisation option

Click on the Show button and select the token.

[1] An overview of this technology and the APIs it offers can be found at https://docs.ckan.org/en/2.6/api/index.html