GCat Background
Revision as of 11:36, 28 June 2016
** THIS DOCUMENT IS A DRAFT **
The gCube Data Catalogue is built on CKAN.
CKAN is an open-source data management system (DMS) for powering data hubs and data portals. It provides tools to streamline publishing, sharing, finding and using data. See http://ckan.org/.
gCube Data Catalogue Metadata
A metadata record in the gCube Data Catalogue consists of two parts: CKAN's default metadata fields and a gCube Metadata Profile.
CKAN's default metadata fields
These metadata fields are common to all metadata types in the gCube Data Catalogue (and are used by default in the CKAN platform).
Label | Field Name (API) | Definition | Guidelines | Example |
---|---|---|---|---|
Title* | title | Name given to the dataset. | Short phrase, written in plain language. Should be sufficiently descriptive to allow for search and discovery. | Aquaculture Production and Consumption in Africa (2011) |
Description | description | Short description explaining the content and its origins. | Description of a few sentences, written in plain language. Should provide a sufficiently comprehensive overview of the resource for anyone to understand its content, origins, and any continuing work on it. The description can be written at the end, since it summarizes key information from the other metadata fields. | This dataset contains attributes of aquaculture production and consumption for each of Africa’s provinces in 2011. The data was provided by……… |
Tags | tags | An array of taxonomic terms stored as tags. | Taxonomic terms | Access to education, Bamboo |
License* | license_title | The license that applies to the published dataset. | ||
Organization* | organization | Organization the dataset belongs to. | See list of organizations on | D4Science |
Version | version | Version of dataset | Increase manually after editing | 1.0 |
Author* | | Owner of the dataset | The person who created the dataset | Joe Bloggs |
Author Contact | | Contact details of the owner | The email or other contact details of the person who created the dataset. | joe@example.com |
Maintainer | | Maintainer of the dataset | The person who maintains the dataset | Joe Bloggs |
Maintainer Contact | | Contact details of the maintainer | The email or other contact details of the person who maintains the dataset. | joe@example.com |
Mandatory fields are marked with an asterisk (*).
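As a minimal sketch of how the table above could be used in practice, the following Python snippet builds a dataset record as a plain dictionary and reports which mandatory fields are missing. The keys `title`, `description`, `tags`, `organization` and `version` come from the Field Name column; `license_title` is assumed to be the intended spelling of the License field name, and `author` and `author_contact` are hypothetical key names, since the table does not list API names for those rows. The license value is also only illustrative.

```python
# Keys of the fields marked with an asterisk in the table.
# "author" is an assumed key name: the table gives no API name for it.
MANDATORY_KEYS = ("title", "license_title", "organization", "author")


def missing_mandatory(dataset: dict) -> list:
    """Return the mandatory keys that are absent or blank in the record."""
    missing = []
    for key in MANDATORY_KEYS:
        value = dataset.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


dataset = {
    "title": "Aquaculture Production and Consumption in Africa (2011)",
    "description": "Attributes of aquaculture production and consumption.",
    "tags": ["Access to education", "Bamboo"],
    "license_title": "CC-BY-4.0",          # illustrative value only
    "organization": "D4Science",
    "version": "1.0",
    "author": "Joe Bloggs",                # assumed key name
    "author_contact": "joe@example.com",   # assumed key name
}

print(missing_mandatory(dataset))
```

Checking the record before submitting it keeps the error messages close to the user's input instead of relying on the catalogue to reject an incomplete dataset.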
gCube Metadata Profile
A gCube Metadata Profile defines an XML-based metadata schema for adding custom metadata fields.
A gCube Metadata Profile is composed of one Metadata Format (<metadataformat>) that contains one or more Metadata Fields (<metadatafield>). The schema is the following:
<?xml version="1.0" encoding="UTF-8"?>
<metadataformat>
  <metadatafield>
    <fieldName>Name</fieldName>
    <mandatory>true</mandatory>
    <isBoolean>false</isBoolean>
    <defaulValue>default value</defaulValue>
    <note>shown as suggestions in the insert/update metadata form of CKAN</note>
    <vocabulary>
      <vocabularyField>field1</vocabularyField>
      <vocabularyField>field2</vocabularyField>
      <!-- ... other vocabulary fields -->
    </vocabulary>
    <validator>
      <regularExpression>a regular expression for validating values</regularExpression>
    </validator>
  </metadatafield>
  <!-- ... other metadata fields -->
</metadataformat>
A Metadata Format document can be validated against the following DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT metadataformat (metadatafield+)>
<!ELEMENT metadatafield (fieldName, mandatory, isBoolean?, defaulValue?, note?, vocabulary?, validator?)>
<!ELEMENT fieldName (#PCDATA)>
<!ELEMENT mandatory (#PCDATA)>
<!ELEMENT isBoolean (#PCDATA)> <!-- MUST BE (true|false) -->
<!ELEMENT defaulValue (#PCDATA)>
<!ELEMENT note (#PCDATA)>
<!ELEMENT vocabulary (vocabularyField+)>
<!ELEMENT vocabularyField (#PCDATA)>
<!ELEMENT validator (regularExpression)>
<!ELEMENT regularExpression (#PCDATA)>
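Python's standard library parses XML but does not validate against a DTD, so the sketch below re-implements only part of the DTD's content model by hand: the root element name, the presence of at least one <metadatafield>, and the presence, uniqueness and order of each field's children. It does not check nested elements such as <vocabularyField>. The function name `check_profile` is hypothetical; a validating parser would be the more complete option.

```python
import xml.etree.ElementTree as ET

# Child order required by the DTD; only fieldName and mandatory are required.
CHILD_ORDER = ["fieldName", "mandatory", "isBoolean", "defaulValue",
               "note", "vocabulary", "validator"]
REQUIRED = ("fieldName", "mandatory")


def check_profile(xml_text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if root.tag != "metadataformat":
        return [f"root element is <{root.tag}>, expected <metadataformat>"]
    problems = []
    fields = list(root)
    if not fields:
        problems.append("<metadataformat> needs at least one <metadatafield>")
    for index, field in enumerate(fields, start=1):
        if field.tag != "metadatafield":
            problems.append(f"child {index}: unexpected <{field.tag}>")
            continue
        tags = [child.tag for child in field]
        for tag in tags:
            if tag not in CHILD_ORDER:
                problems.append(f"field {index}: unknown element <{tag}>")
        for tag in REQUIRED:
            if tag not in tags:
                problems.append(f"field {index}: missing <{tag}>")
        known = [tag for tag in tags if tag in CHILD_ORDER]
        if len(known) != len(set(known)):
            problems.append(f"field {index}: repeated child element")
        positions = [CHILD_ORDER.index(tag) for tag in known]
        if positions != sorted(positions):
            problems.append(f"field {index}: children out of order")
    return problems
```

Returning a list of messages rather than raising on the first error lets a profile author fix all reported issues in one pass.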
A possible instance of a Metadata Field (<metadatafield>):
<metadatafield>
  <fieldName>Accessibility</fieldName>
  <mandatory>true</mandatory>
  <defaulValue>virtual/public</defaulValue>
  <vocabulary>
    <vocabularyField>virtual/public</vocabularyField>
    <vocabularyField>virtual/private</vocabularyField>
    <vocabularyField>transactional</vocabularyField>
  </vocabulary>
</metadatafield>
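To illustrate how such a field definition might constrain user input, the sketch below parses the Accessibility field and checks a candidate value against its mandatory flag, vocabulary and optional regular expression. The function name `validate_value` is hypothetical, and two behaviours are assumptions rather than documented gCube semantics: an empty value is rejected for a mandatory field even when a default exists, and the regular expression must match the whole value.

```python
import re
import xml.etree.ElementTree as ET

FIELD_XML = """
<metadatafield>
  <fieldName>Accessibility</fieldName>
  <mandatory>true</mandatory>
  <defaulValue>virtual/public</defaulValue>
  <vocabulary>
    <vocabularyField>virtual/public</vocabularyField>
    <vocabularyField>virtual/private</vocabularyField>
    <vocabularyField>transactional</vocabularyField>
  </vocabulary>
</metadatafield>
"""


def validate_value(field: ET.Element, value: str) -> list:
    """Return the problems found for one value of one metadata field."""
    name = field.findtext("fieldName", default="?")
    mandatory = field.findtext("mandatory", default="false").strip().lower() == "true"
    if not value:
        # Assumption: a mandatory field rejects an empty value.
        return [f"{name}: a value is required"] if mandatory else []
    problems = []
    allowed = [v.text for v in field.findall("vocabulary/vocabularyField") if v.text]
    if allowed and value not in allowed:
        problems.append(f"{name}: '{value}' is not in the vocabulary")
    pattern = field.findtext("validator/regularExpression")
    # Assumption: the expression must match the entire value.
    if pattern and not re.fullmatch(pattern, value):
        problems.append(f"{name}: '{value}' does not match the validator")
    return problems


field = ET.fromstring(FIELD_XML)
print(validate_value(field, "virtual/private"))
print(validate_value(field, "open"))
```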
SoBigData.eu: Dataset Metadata
The current list of fields characterising a SoBigData resource is available at https://docs.google.com/spreadsheets/d/1kuhvmDVKpmqt2foyCB9wDo3HgzoAiCuRQ8CjRS-DVOM/edit?usp=sharing
The following fields have been identified:
Field | In Catalogue | Notes |
---|---|---|
Internal Fields | ||
Internal Identifier | Automatically created | |
Creation Date | Automatically created | |
Last Modification | Automatically updated | |
General Description | ||
Title | Title | |
Identifier |
<fieldName>External Identifier</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>This applies only to datasets that have already been published. Insert here a DOI, a handle, or any other identifier assigned when publishing the dataset elsewhere.</note> <vocabulary></vocabulary> <validator></validator> | |
Creators | Author is available, but unfortunately only one author per dataset is supported. Moreover, the technology supports only key/value pairs, not complex types.
<fieldName>Creator.Name</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The name of the creator. The format should be: family, given. Examples: Smith, John; Miller, Elizabeth </note> <vocabulary></vocabulary> <validator></validator> | |
Creation Date | ||
Distributor | ||
Publisher | ||
Publication Date | ||
Contact | Isn't this the Author / Maintainer? | |
Thematic Cluster | ||
Area | ||
Semantic Coverage | ||
Time Coverage Start Date | ||
Time Coverage End Date | ||
Geo Location | ||
ProcessingDegree | ||
ManifestationType | ||
Language | ||
Description | ||
RelatedLiterature | ||
RelatedDataset | ||
Accessibility properties | ||
Accessibility | ||
AccessibilityMode | ||
Privacy | ||
Technical properties | ||
Size | ||
DiskSize | ||
Format | ||
FormatSchema | ||
Api | ||
Legal and Ethical Aspects | ||
Personal data/ Non Personal | ||
Personal sensitive data | ||
Data set contains data of children | ||
Consent of the data subject | ||
Consent obtained also covers the envisaged transfer of the personal data outside the EU | ||
Personal data was manifestly made public by the data subject | ||
Data Protection Directive | ||
Intellectual properties | ||
IP/Copyrights | ||
Link to the source | ||
License | ||
Link to the license | ||
Field/Scope of use | ||
Basic rights | ||
Restrictions on use | ||
Prohibited actions | ||
Sublicense rights | ||
Attribution requirements | ||
Display requirements | ||
Distribution requirements | ||
Territory of use | ||
License term | ||
Requirement of non-disclosure (confidentiality mark) | ||
<metadatafield>
  <fieldName>xxx</fieldName>
  <mandatory>true</mandatory>
  <isBoolean>false</isBoolean>
  <defaulValue>default value</defaulValue>
  <note>shown as suggestions in the insert/update metadata form of CKAN</note>
  <vocabulary>
    <vocabularyField>field1</vocabularyField>
    <vocabularyField>field2</vocabularyField>
    <!-- ... other vocabulary fields -->
  </vocabulary>
  <validator>
    <regularExpression>a regular expression for validating values</regularExpression>
  </validator>
</metadatafield>
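The note on the Creators row points out that the catalogue stores only flat key/value pairs. One possible workaround, sketched below, is to flatten a nested record into dotted keys such as Creator.Name and to join multiple values into a single string. Both conventions are assumptions made for illustration: the dotted key follows the Creator.Name field shown in the table, and the "; " separator is borrowed from the example in that field's note, not from any documented gCube rule.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys with string values.

    Lists are joined with '; ' (an assumed convention, see the text above).
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        elif isinstance(value, (list, tuple)):
            flat[full_key] = "; ".join(str(item) for item in value)
        else:
            flat[full_key] = str(value)
    return flat


record = {
    "Creator": {"Name": ["Smith, John", "Miller, Elizabeth"]},
    "External Identifier": "doi:10.0000/example",  # made-up identifier
}

for key, value in flatten(record).items():
    print(f"{key} = {value}")
```

Joining values into one string loses structure, so a consumer has to split on the same separator; this is a trade-off of the key/value-only storage, not a recommendation.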
SoBigData.eu: Method Metadata
gCube Data Catalogue: Geo Harvesting
This extension contains plugins (ckanext-geonetwork and others) that add geospatial capabilities to CKAN (https://github.com/geosolutions-it/ckanext-geonetwork/wiki).
Several harvesters have been created in the gCube Data Catalogue to import geospatial metadata into CKAN from other sources, in ISO 19139 and other formats. In particular, all metadata created in gCube Geonetwork (GeoNetwork is the catalogue application that manages spatially referenced resources generated in the D4Science Infrastructure) are harvested through the 'Geonetwork Resolver', a "middle tier" able to:
- use the Geonetwork Manager to harvest private metadata (via authentication) stored in gCube Geonetwork into the CKAN Data Catalogue (e.g. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE to harvest private metadata generated from scope /gcube/devsec/devVRE);
- create a CKAN Harvester that, via configuration, skips all public metadata during scope harvesting (e.g. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE%23filterpublicids to filter public ids during harvesting of /gcube/devsec/devVRE);
- create a CKAN Harvester that harvests only public metadata (saved on Geonetwork) and, via configuration, avoids Geonetwork authentication (e.g. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE%23noauthentication).
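The three example URLs above follow a visible pattern: the scope's slashes become underscores, and an optional option name is appended after a URL-encoded '#' (%23). The sketch below reproduces that pattern. The function name `resolver_url` is hypothetical, the base URL is copied from the examples, and the pattern is inferred from those examples only, so other scopes or options may behave differently.

```python
from urllib.parse import quote

# Base URL taken from the examples in the list above.
RESOLVER_BASE = "http://data-d.d4science.org/geonetwork"


def resolver_url(scope: str, option: str = "") -> str:
    """Build a Geonetwork Resolver URL for a gCube scope.

    option is "" for authenticated harvesting of private metadata,
    or "filterpublicids" / "noauthentication" as in the examples.
    """
    scope_id = scope.strip("/").replace("/", "_")
    url = f"{RESOLVER_BASE}/{scope_id}"
    if option:
        # '#' must be percent-encoded so it is not read as a URL fragment.
        url += quote("#" + option, safe="")
    return url


for opt in ("", "filterpublicids", "noauthentication"):
    print(resolver_url("/gcube/devsec/devVRE", opt))
```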
The field mapping from ISO 19139 metadata to a CKAN dataset via ckanext-geonetwork is shown in the following table:
ISO19139 | Ckan Dataset |
---|---|
Title | Title |
Description | Description |
Digital Transfer Option | Data and Resource |
CI_OnlineResource | |
gmd:url | URL |
gmd:name | Name |
gmd:description | Description |
Descriptive Keywords | |
gmd:keyword | Tag |
Additional Info | |
bbox, metadata language, age, reference system, etc. | key/value |
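The mapping in the table can be sketched with Python's standard XML parser. This is an illustration of the mapping only, not the ckanext-geonetwork implementation: the function name `iso_to_ckan` and the output keys are hypothetical, the XPath expressions assume the usual ISO 19139 element layout (where the URL is carried by gmd:linkage/gmd:URL), and the "Additional Info" extras are left out.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def _text(parent: ET.Element, path: str) -> str:
    """Return the stripped text found at path, or '' when it is missing."""
    return (parent.findtext(path, default="", namespaces=NS) or "").strip()


def iso_to_ckan(xml_text: str) -> dict:
    """Map an ISO 19139 record to a CKAN-like dataset dictionary."""
    root = ET.fromstring(xml_text)
    resources = []
    # Only online resources under the digital transfer options are mapped.
    path = ".//gmd:MD_DigitalTransferOptions//gmd:CI_OnlineResource"
    for online in root.iterfind(path, NS):
        resources.append({
            "url": _text(online, "gmd:linkage/gmd:URL"),
            "name": _text(online, "gmd:name/gco:CharacterString"),
            "description": _text(online, "gmd:description/gco:CharacterString"),
        })
    tags = [
        kw.text.strip()
        for kw in root.iterfind(".//gmd:keyword/gco:CharacterString", NS)
        if kw.text and kw.text.strip()
    ]
    return {
        "title": _text(root, ".//gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString"),
        "description": _text(root, ".//gmd:abstract/gco:CharacterString"),
        "tags": tags,
        "resources": resources,
    }
```

Restricting the resource lookup to the digital transfer options avoids picking up online resources that belong to contact information elsewhere in the record.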