Statistical Algorithms Importer: Docker Support
__TOC__
A Docker image represents an easy way to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files, and can therefore help simplify the configuration of the D4Science infrastructure. D4Science delivers a solution that allows Docker to be exploited while preserving the main features of the D4Science infrastructure: replicability, reusability, sharing and accounting of the execution are all preserved by using the [https://services.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.DOCKER_IMAGE_EXECUTOR Docker Image Executor] algorithm.

This page explains how to create and run Docker images in the D4Science infrastructure through the [[DataMiner_Manager|DataMiner Manager]] service and the algorithms developed with the [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]]. More information on Docker can be found [https://www.docker.com/ here].
== The Docker Image Executor algorithm ==
The [https://services.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.DOCKER_IMAGE_EXECUTOR Docker Image Executor] algorithm allows its users to retrieve an image from a [http://hub.docker.com Docker Hub] repository (only public repositories are supported) and run it on the D4Science Swarm cluster.

The algorithm is already published and made available by the D4Science infrastructure:

[[Image:DockerImageExecutor1.png|thumb|center|800px|Docker Image Executor, Docker Support]]

To run the algorithm the user must enter:
  
 
* '''Image''', the name of the repository (e.g. d4science/sortapp)
* '''CommandName''', the name of the command to invoke when the service is started (e.g. sortapp)
* '''ItemParam''', a file or a folder stored in the user's workspace to be passed as an input parameter along with the run command (e.g. [https://data.d4science.net/XH5K sortableelements.txt]; there is no specific constraint on the file format or content, every app developer is free to use the most suitable one)
  
This algorithm takes care of retrieving the user token and passing the parameters to the Docker service in this format:

<pre>
<command-name> <token> <item-id> <temp-dir-item-id>
</pre>
  
The algorithm delivers the token and the input item to the service created from the chosen image. It also delivers the id of a temporary folder created on the [[StorageHub_REST_API|StorageHub]] service: this is the folder where the results of the computation are expected to be stored. The service created from the chosen image is responsible for saving the data of its own computation in the indicated folder by interacting with the [[StorageHub_REST_API|StorageHub]] service. When the execution of the Docker service is completed, the [https://services.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.DOCKER_IMAGE_EXECUTOR Docker Image Executor] algorithm returns the content of that folder as a zip file. It is therefore important that the Docker image is written with these constraints in mind.

Attention: the Docker Image Executor algorithm cannot execute a generic image from Docker Hub, but only images that interact correctly with the infrastructure.
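
In code terms, the contract above can be sketched as follows. The snippet below is a minimal, illustrative Python outline (not the actual code of any published image): it only shows how the arguments delivered by the Docker Image Executor could be read and where the results are expected to end up. The helper functions download_item and upload_to_folder are hypothetical placeholders for the calls to the [[StorageHub_REST_API|StorageHub]] REST API and must be implemented against that service.

<pre>
import sys


def download_item(token, item_id):
    # Hypothetical placeholder: fetch the input item from StorageHub
    # (via its REST API) and return the path of a local copy.
    raise NotImplementedError("implement using the StorageHub REST API")


def upload_to_folder(token, folder_id, local_path):
    # Hypothetical placeholder: upload a local result file into the
    # temporary StorageHub folder identified by folder_id.
    raise NotImplementedError("implement using the StorageHub REST API")


def main():
    # The Docker Image Executor starts the service as:
    #   <command-name> <token> <item-id> <temp-dir-item-id>
    # so when this script is the invoked command, the arguments are:
    token, item_id, temp_dir_item_id = sys.argv[1:4]

    # 1. Retrieve the input item from StorageHub using the user token.
    input_path = download_item(token, item_id)

    # 2. Run the computation (here, as in sortapp, sort the input lines).
    with open(input_path) as f:
        lines = sorted(line.rstrip("\n") for line in f)
    with open("sorted.txt", "w") as out:
        out.write("\n".join(lines) + "\n")

    # 3. Save the results into the temporary folder on StorageHub:
    #    the algorithm returns the content of that folder as a zip file.
    upload_to_folder(token, temp_dir_item_id, "sorted.txt")


if __name__ == "__main__":
    main()
</pre>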
=== How to Create a Docker Image ===
The image can be constructed using different languages and base images.
It is, however, mandatory that the software packed in the image accepts the parameters as passed by the Docker Image Executor and respects the constraint of saving the results in the temporary folder on [[StorageHub_REST_API|StorageHub]] as indicated.
Here are some examples:

==== Python ====
An example of how to create a Docker image with Python that is suitable for running via the Docker Image Executor algorithm is available [https://code-repo.d4science.org/gCubeSystem/sortapp here]. This image is built starting from the base python:3.6-alpine image and installing the sortapp application written in Python 3.6 (see the [https://code-repo.d4science.org/gCubeSystem/sortapp/src/branch/master/Dockerfile Dockerfile]).
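
As a rough illustration of the structure of such an image, a Dockerfile along the following lines would install the application and expose it under the command name expected by the CommandName parameter. This is only a sketch, not the Dockerfile of the linked repository, and the file name sortapp.py is an assumption:

<pre>
# Illustrative sketch only; see the linked repository for the real Dockerfile.
FROM python:3.6-alpine

# Install the application script (assumed to start with "#!/usr/bin/env python3")
# under the command name that will be passed as CommandName (here: sortapp).
COPY sortapp.py /usr/local/bin/sortapp
RUN chmod +x /usr/local/bin/sortapp
</pre>

Since the Docker Image Executor invokes the command by name, it should be enough for that command to be available on the image's PATH.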
: The image is available on Docker Hub here: [https://hub.docker.com/r/d4science/sortapp d4science/sortapp]

The sortapp application built in this example simply sorts strings. The strings are contained in the file indicated by the ItemParam parameter.

To run it with the Docker Image Executor:
* '''Image''' = d4science/sortapp
* '''CommandName''' = sortapp
* '''ItemParam''' = the file [https://data.d4science.net/XH5K sortableelements.txt]
==== R ====
An example of how to create a Docker image with R that is suitable for running via the Docker Image Executor algorithm is available [https://code-repo.d4science.org/gCubeSystem/sortappr here]. This image is built starting from the base rocker/r-base image and installing the sortapp application written in R (see the [https://code-repo.d4science.org/gCubeSystem/sortappr/src/branch/master/Dockerfile Dockerfile]).
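
As with the Python example, the structure of the image can be sketched with an illustrative Dockerfile. Again, this is not the Dockerfile of the linked repository, and the script name sortapp.R is an assumption:

<pre>
# Illustrative sketch only; see the linked repository for the real Dockerfile.
FROM rocker/r-base

# Install the R script (assumed to start with "#!/usr/bin/env Rscript")
# under the command name that will be passed as CommandName.
COPY sortapp.R /usr/local/bin/sortapp
RUN chmod +x /usr/local/bin/sortapp
</pre>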
: The image is available on Docker Hub here: [https://hub.docker.com/r/d4science/sortappr d4science/sortappr]
  
The sortappr application built in this example simply sorts strings. The strings are contained in the file indicated by the ItemParam parameter.

Different versions of R images are available according to the type of need, see [https://hub.docker.com/u/rocker/ here].

To run it with the Docker Image Executor:
* '''Image''' = d4science/sortappr
* '''CommandName''' = sortapp
* '''ItemParam''' = the file [https://data.d4science.net/XH5K sortableelements.txt]
 
  
 