Zenodo Publication
Latest revision as of 14:56, 12 April 2018
Zenodo is a portal (launched in May 2013) that collects research outputs from all fields of science to promote open access and open data. It is supported by the OpenAIRE initiative and developed and hosted by CERN.
Each uploaded outcome (called a Deposition in Zenodo jargon) is stored by Zenodo along with a rich set of metadata, searchable via the portal and harvestable via the OAI-PMH protocol.
Zenodo assigns each deposition a unique Digital Object Identifier (DOI) to make the upload easily and uniquely citable.
gCube Community
The gCube Community is a community created with the objective of collecting all the depositions related to gCube software. New software artifacts are automatically uploaded at every gCube release and metadata is generated automatically (see #Automatic Metadata Generation).
Automatic Metadata Generation
The content for each deposition is generated automatically by the Zenodo Publisher from the information in the component source code and in ETICS.
The figure below shows the structure and the information published for each software artifact.
1. Title
The title of the deposition is built as follows:
gCube <name>
where:
- name is the content of the <name> tag in the pom.xml. If it is not present, the artifactId is used.
Previously the component version and the gCube release were also added to the title, but since the versioning of depositions is now supported, this redundant information has been removed.
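The title rule above can be sketched in a few lines of Python. This is only an illustration, not the publisher's actual code: the function name deposition_title is hypothetical, and the sketch assumes the pom.xml is available as a string and that <name> and <artifactId> are direct children of the <project> element.

```python
import xml.etree.ElementTree as ET

# Namespace declared by standard Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def deposition_title(pom_xml: str) -> str:
    """Build 'gCube <name>', falling back to the artifactId (hypothetical helper)."""
    root = ET.fromstring(pom_xml)

    def find_text(tag):
        # Try the namespaced tag first, then a POM without a namespace.
        node = root.find(f"m:{tag}", POM_NS)
        if node is None:
            node = root.find(tag)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
        return None

    name = find_text("name") or find_text("artifactId")
    if name is None:
        raise ValueError("pom.xml has neither <name> nor <artifactId>")
    return f"gCube {name}"
```

Neither the version nor the gCube release is appended, matching the current behaviour described above.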
2. Authors
Built from the content of the "vcs.authors" property in the ETICS configuration. That property is computed by ETICS by looking at the VCS history of the component. If the user has an ORCID identifier associated in ETICS, it is added (Zenodo shows it as a green circle before the author name).
3. Description
It is generated starting from the README and changelog.xml files. The two files are parsed to remove redundant information (e.g. the license, because it is already published in #14. License, and the source code link, because it is already published in #5. GitHub link) and useless information (e.g. links to the root page of the gCube wiki).
4. Files
The only file uploaded is the source package. It is generated during ETICS builds and contains everything checked out from the VCS.
5. GitHub link
The link to the location of the source code in GitHub.
6. DOI
This is the unique identifier assigned by Zenodo to this deposition.
7. Keywords
Keywords for each component are stored in ETICS in the description field of the corresponding ETICS module (not the configuration). For instance, keywords for "Common Authorization" are stored in the description of the "org.gcube.common.authorization-common" module. They are encoded as follows:
#keywords=keyword1, keyword 2, key word3, ...
If keywords are not found in the component, they are searched for in the subsystem; if they are not found there either, the keywords for the project are used. The org.gcube project defines these keywords:
#keywords=gCube, Java, Data Management System, Hybrid Data Infrastructure
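The keyword encoding and the component, subsystem, project fallback can be sketched as follows. This is a minimal illustration under stated assumptions: the function names parse_keywords and resolve_keywords are hypothetical, the ETICS description fields are assumed to be available as plain strings (or None), and the #keywords= entry is assumed to sit on a line of its own.

```python
import re
from typing import List, Optional

# Matches a line such as "#keywords=keyword1, keyword 2, key word3".
KEYWORDS_RE = re.compile(r"^\s*#keywords=(.*)$", re.MULTILINE)


def parse_keywords(description: Optional[str]) -> List[str]:
    """Extract the comma-separated keywords from a description field."""
    if not description:
        return []
    match = KEYWORDS_RE.search(description)
    if match is None:
        return []
    # Keywords may contain spaces; only the commas separate them.
    return [k.strip() for k in match.group(1).split(",") if k.strip()]


def resolve_keywords(component: Optional[str],
                     subsystem: Optional[str],
                     project: Optional[str]) -> List[str]:
    """Return the first non-empty keyword list: component, subsystem, project."""
    for description in (component, subsystem, project):
        keywords = parse_keywords(description)
        if keywords:
            return keywords
    return []
```

With this sketch, a component whose description has no #keywords= line simply inherits the keywords of its subsystem or, failing that, of the project.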
8. Grants
Static list of grants, the same for all depositions.
9. Binary artifact reference
A link to the binary package (stored in Nexus) generated from the compilation of the source package uploaded in #4. Files.
10. Documentation Reference
A reference to the documentation of the binary artifact. If present, this is a link to the jar file that contains the javadoc.
11. GitHub Reference
(same as 5)
12. PackageId
Internal identifier that uniquely identifies the deposition.
13. Community
All depositions belong to the gCube community.
14. License
Statically set to EUPL-1.1
15. Versions
This is automatically generated by Zenodo to list all the other versions of the same component published in Zenodo.
Support for .zenodo.json file
It is possible to override all the automatically generated metadata by adding a .zenodo.json file in the root of the component source code. This is a mechanism that Zenodo offers for publication through GitHub, and it is also implemented in our publisher.
The .zenodo.json file is read after the metadata has been generated from the source package; each generated metadata field is replaced by the same field in the file (if present).
Some examples of the format of the .zenodo.json file are available at https://github.com/nipy/nipype/blob/master/.zenodo.json and https://github.com/numenta/htmresearch/blob/master/.zenodo.json
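The override behaviour described in this section amounts to a field-by-field replacement, which can be sketched as below. This is an illustration only, not the publisher's implementation: the function name apply_zenodo_json is hypothetical, the generated metadata is assumed to be a flat dictionary, and the field names in the usage example (title, creators, license) are assumptions about the metadata layout.

```python
import json
from typing import Any, Dict, Optional


def apply_zenodo_json(generated: Dict[str, Any],
                      zenodo_json_text: Optional[str]) -> Dict[str, Any]:
    """Replace generated fields with those found in .zenodo.json, if any."""
    merged = dict(generated)  # do not modify the generated metadata in place
    if zenodo_json_text is None:
        # No .zenodo.json in the source root: keep the generated metadata.
        return merged
    overrides = json.loads(zenodo_json_text)
    # Each field present in the file replaces the generated one as a whole;
    # fields absent from the file keep their generated value.
    merged.update(overrides)
    return merged


generated = {
    "title": "gCube Common Authorization",
    "creators": [{"name": "Wrong Author"}],
    "license": "EUPL-1.1",
}
overridden = apply_zenodo_json(generated, '{"creators": [{"name": "Doe, Jane"}]}')
```

In this example only the creators list changes; the title and license keep their generated values because the file does not mention them.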
Troubleshooting
Wrong authors published
The automated publisher takes the list of authors from the vcs.authors property in the ETICS configuration. In turn, ETICS generates the vcs.authors property by analysing the SVN/Git commits in the component repository. The analysis is based on an implementation of the truck factor (https://arxiv.org/pdf/1604.06766.pdf) that takes the authors of all the commits and weights them on the basis of the type of commits they made. It also uses thresholds to try to remove people who contributed less to the component.
For some components, the authors published in Zenodo may not match the actual authors of the component. This can be caused by:
- an outdated vcs.authors property,
- inappropriate thresholds in the algorithm implementation, or
- authors who did not contribute (enough) to the code (e.g. because they only designed the component, or because the code was imported into SVN by someone else).
Solutions
For already published depositions, contact the release manager, who will take care of fixing the depositions directly on the Zenodo portal.
For future releases of the component:
- Try to trigger an update of the vcs.authors property by modifying something in the configuration or in the code. If the configuration is locked, you need to clone the configuration and release it (attach it to the current gCube release).
- If the property still reports wrong authors after the update, contact the ETICS support team: they will try to adjust the algorithm thresholds.
- If the algorithm cannot extract the correct authors (the third cause above), you can override the list of authors by providing a .zenodo.json file in the component codebase root (see #Support for .zenodo.json file).