Difference between revisions of "How to use the DataMiner Pool Manager"

(Configuration and Testing)
(Process (From SAI to Production VRE))
 
(36 intermediate revisions by the same user not shown)
Line 3: Line 3:
  
 
== Maven coordinates ==
 
== Maven coordinates ==
The second version of the the service has been released in gCube 4.6.0.
+
The second version of the the service has been released in gCube 4.6.1.
 
The maven artifact coordinates are:  
 
The maven artifact coordinates are:  
 
  <dependency>
 
  <dependency>
Line 14: Line 14:
 
==Overview==
 
==Overview==
  
The service may accept an algorithm descriptor, including its dependencies (either OS, R and custom packages), queries the IS for dataminers in the current scope, generates (via templating) ansible playbook, inventory and roles for relevant stuff (algorithm installer, algorithms, dependencies), executes ansible playbook on a Staging DataMiner, udpdate the list of dependendencies and algorithms that will be used from Cron for the installation.
+
The service may accept an algorithm descriptor, including its dependencies, generates (via templating) ansible playbook, inventory and roles for the relevant stuff (algorithm installer, algorithms, dependencies), executes ansible playbook on a Staging DataMiner, and finally udpdates the lists of dependendencies and algorithms that will be used from a Cron-job for the installation.
 
+
In such sense, the service accepts as input, among the others, the url of an algorithm package (including jar, and metadata), extracts the information needed to installation, installs the script, update the list of dependencies, publishs the new algorithm in the Information System and returns asynchronously the execution outcome to the caller.
+
  
 +
In such sense, the service accepts as input, the url of an algorithm package (including jar, and metadata), extracts the information needed to installation, installs the script, updates the list of dependencies, publishes the new algorithm in the Information System and returns asynchronously the execution outcome to the caller.
  
 
==Architecture==
 
==Architecture==
Line 28: Line 27:
 
** dataminer1-devnext.d4science.org for the development environment
 
** dataminer1-devnext.d4science.org for the development environment
 
** dataminer-proto-ghost.d4science.org for the production environment
 
** dataminer-proto-ghost.d4science.org for the production environment
* '''SVN Dependencies Lists''': lists (in files on SVN) of dependencies that must be installed on Dataminer machines. There is one list for type of dependency (system, github, cran) both for RPrototypingLab and for Production.
+
* '''SVN Dependencies Lists''': lists (in files on SVN) of dependencies that must be installed on Dataminer machines. There is one list for type of dependency both for Dev, RProto and Production.
* '''SVN Algorithms List''': lists (in files on SVN) of algorithms that must be installed on Dataminer machines. The service uses two different lists, one for the proto environment, and another one for the production.
+
* '''SVN Algorithms List''': lists (in files on SVN) of algorithms that must be installed on Dataminer machines. The service uses three different lists, one for the Dev environment, one for RProto and another one for the production.
* The '''Cron job''': runs on every Dataminer and periodically (every minute) aligns the packages and the algorithms installed on the machine with the SVN Dependencies List and the SVN Algorithms List (both Production and RPrototypingLab). Concerning the Algorithms, The Cron Job should have to be configured to run the command line available as record of SVN list, while as far as the Dependencies concerns, the Cron Job should have to be configured in order to read and install from both the set of dependencies lists. The lists to consider are the following:
+
* The '''Cron job''': runs on every Dataminer and periodically (every minute) aligns the packages and the algorithms installed on the machine with the SVN Dependencies List and the SVN Algorithms Lists. Concerning the Algorithms, The Cron Job should have to be configured to run the command line available as record of SVN list, while as far as the Dependencies concerns, the Cron Job should have to be configured in order to just read from all the set of dependencies lists. The lists to consider are the following:
 
** Production Algorithms:  
 
** Production Algorithms:  
 
     http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/prod/algorithms
 
     http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/prod/algorithms
 
** RProto Algorithms:
 
** RProto Algorithms:
     http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/proto/algorithms)
+
     http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/proto/algorithms
** Production Dependencies:
+
** Dev Algorithms:
    https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt
+
  http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dev/algorithms
    https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_deb_pkgs.txt
+
    https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_github_pkgs.txt
+
**RProto Dependencies
+
    https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/test_r_cran_pkgs.txt
+
    https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/test_r_deb_pkgs.txt
+
    https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/test_r_github_pkgs.txt
+
  
 
==Process (From SAI to Production VRE)==
 
==Process (From SAI to Production VRE)==
  
Currently SAI is deployed in several scopes and the user may deploy the algorithm just in the actual VRE.
+
Until now, SAI was deployed in several scopes and the user may deploy the algorithm just in the actual VRE.
The idea is to have just an instance of SAI in RPrototypingLab VRE and allow the user to specify the VRE by providing the token for that VRE.
+
The idea is to have different instances of SAI in many VREs and allow the user to specify the VRE to consider for the deploy.
The Installation of the new algorithms by means of SAI involves the following input therefore:
+
 
+
* Package containing Metadata and dependencies
+
* The target VRE and the token to access it
+
  
 
The process is composed of two main phases:
 
The process is composed of two main phases:
  
* '''TEST Phase''': the installation of an algorithm and its dependencies in the staging dataminer; it ends with the publishing of an algorithm in the pool of dataminers of the RPrototypingLab VRE
+
* '''STAGING Phase''': the installation of an algorithm in the staging dataminer; it ends with the publishing of an algorithm in the pool of dataminers of the target VRE
 
** The DMPM contacts the Staging Dataminer and installs the algorithm and the dependencies
 
** The DMPM contacts the Staging Dataminer and installs the algorithm and the dependencies
** The output is retrieved. If there are errors in the installation (e.g. a dependency that does not exist) it stops and the log is returned to the user
+
** The output is retrieved. If there are errors in the installation (e.g. a dependency that does not exist or is written not correctly) it stops and the log is returned to the user (a mail notification is sent to the user and to the VRE adminstrators).
** The DMPM updates the SVN RPrototypingLab Dependencies lists
+
** The DMPM updates the SVN Algorithms list
** The DMPM updates the SVN RPrototypingLab Algorithms list
+
** A mail notification is sent to the user and to the VRE administrators
** Cron read the SVN lists (both Dependencies and Algorithms) and installs the algorithm only and the dependencies in RPrototypingLab dataminers.
+
** Cron read the SVN lists (both Dependencies and Algorithms) and installs the algorithm in the pool of dataminers for the current VRE.
** The script publishes the new algorithm in RPrototypingLab VRE (if an algorithm is already available on the IS in that scope, the script updates the .jar files, but the resource on the IS, the  .properties and the wps config do not change)
+
** The script publishes the new algorithm in the Information System.
  
[[File:5.png]]
+
[[File:staging.png]]
  
 
* '''RELEASE Phase'''
 
* '''RELEASE Phase'''
** SAI will invoke the service working in RELEASE PHASE in order to install the algorithm in a particular VRE of production (provided by the user); SAI will pass to the DMPM the target VRE name and the token to access to that VRE
+
** SAI will invoke the service working in RELEASE PHASE in order to install the algorithm in a particular VRE of production (provided by the user); SAI will pass to the DMPM the target VRE.
** The DMPM updates the SVN Production Dependencies lists
+
 
** The DMPM updates the SVN Production Algorithms list
 
** The DMPM updates the SVN Production Algorithms list
** Cron installs the algorithm only and the dependencies in the production dataminers
+
** Cron installs the algorithm in the production dataminers
 +
** A mail notification is sent to the user and to the VRE administrators
 
** The script publishes the algorithm in the VRE
 
** The script publishes the algorithm in the VRE
  
[[File:4.png]]
+
[[File:release.png]]
  
 
==Configuration and Testing==
 
==Configuration and Testing==
 
to do config:
 
= service.properties
 
= web.xml
 
  
 
DMPM is a SmartGears compliant service.  
 
DMPM is a SmartGears compliant service.  
Line 86: Line 71:
 
</source>
 
</source>
  
In such sense, an instance has already been deployed and configured at Development level.
+
In such sense, an instance has been deployed and configured at Development, Preprod and Prod levels.
  
 
<source lang="text">
 
<source lang="text">
Line 92: Line 77:
 
</source>
 
</source>
  
Such environment contains the configurations for ansible playbook, inventory and roles for algorithm installer, scripts, algorithms, dependencies and the logs of the executions.
+
In order to use the service, two manual configuration are needed:
 +
 
 +
** to modify the parameter ''<param-value>Dev</param-value>'' in ''/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/web.xml'' file according to scope where the service runs (Dev, RProto or Prod); such information will be read dinamically from the service for the switching among the list algorithms and dependencies to consider, of for the selection of the staging dataminer.
 +
** to edit the file ''/home/gcube/dataminer-pool-manager/dpmConfig/service.properties''. Such file contains among the others, the staging dataminer to consider (automatically selected based on the environment at the previous point), the SVN repositories for the algorithms of each environment and for all the typologies of dependencies generated from SAI and available in the metadata file. (e.g., the service carries out his checks on the correctness of the name of a dependency by going to read in the correspondent file according to the language defined in the ''info.txt'' file available in the algorithm package).
 +
An example of ''info.txt'' file is the following:
  
 
<source lang="text">
 
<source lang="text">
/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/classes/static    // static resource inside the WAR containing static roles
+
Username: giancarlo.panichi
/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/classes/templates  // static resource inside the WAR containing the templates
+
Full Name: Giancarlo Panichi
/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/classes/custom    // static resource inside the WAR containing the custom roles
+
Email: g.panichi@isti.cnr.it
/home/gcube/dataminer-pool-manager/dpmConfig/service.properties              // static resource on the filesystem containing configuration data
+
/home/gcube/dataminer-pool-manager/jobs                                      // dynamically generated resource concerning the logs of the different job executions
+
/home/gcube/dataminer-pool-manager/work                                      // dinamically generated resource concerning the Ansible worker for each job
+
  
 +
Language: R
 +
Algorithm Name: RBLACKBOX
 +
Class Name: org.gcube.dataanalysis.executor.rscripts.RBlackBox
 +
Algorithm Description: RBlackBox
 +
Algorithm Category: BLACK_BOX
 +
 +
Interpreter Version: 3.2.1
 +
 +
Packages:
 +
Package Name: DBI
 +
Package Name: RPostgreSQL
 +
Package Name: raster
 +
Package Name: maptools
 +
Package Name: sqldf
 +
Package Name: RJSONIO
 +
Package Name: data.table
 
</source>
 
</source>
  
In order to allow Ansible to work on the pool of DataMiners, is necessary that the SSH key of the VM on which the service run (e.g., node2-d-d4s.d4science.org) must be deployed on the pool of Staging dataminers with ''root'' and ''gcube'' permissions.
+
 
 +
An example of ''service.properties'' file is the following:
 +
 
 +
<source lang="text">
 +
#YML node file
 +
DEV_STAGING_HOST: dataminer1-devnext.d4science.org
 +
PROTO_PROD_STAGING_HOST: dataminer-proto-ghost.d4science.org
 +
SVN_REPO: https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/
 +
#HAPROXY_CSV:  http://data.d4science.org/Yk4zSFF6V3JOSytNd3JkRDlnRFpDUUR5TnRJZEw2QjRHbWJQNStIS0N6Yz0
 +
 
 +
 
 +
svn.repository = https://svn.d4science.research-infrastructures.eu/gcube
 +
 
 +
svn.algo.main.repo = /trunk/data-analysis/DataMinerConfiguration/algorithms
 +
 
 +
svn.rproto.algorithms-list = /trunk/data-analysis/DataMinerConfiguration/algorithms/proto/algorithms
 +
 
 +
svn.rproto.deps-linux-compiled =
 +
svn.rproto.deps-pre-installed =
 +
svn.rproto.deps-r-blackbox =
 +
svn.rproto.deps-r = /trunk/data-analysis/RConfiguration/RPackagesManagement/test_r_cran_pkgs.txt
 +
svn.rproto.deps-java =
 +
svn.rproto.deps-knime-workflow =
 +
svn.rproto.deps-octave =
 +
svn.rproto.deps-python =
 +
svn.rproto.deps-windows-compiled =
 +
 
 +
 
 +
svn.prod.algorithms-list = /trunk/data-analysis/DataMinerConfiguration/algorithms/prod/algorithms
 +
 
 +
svn.prod.deps-linux-compiled =
 +
svn.prod.deps-pre-installed =
 +
svn.prod.deps-r-blackbox =
 +
svn.prod.deps-r = /trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt
 +
svn.prod.deps-java =
 +
svn.prod.deps-knime-workflow =
 +
svn.prod.deps-octave =
 +
svn.prod.deps-python =
 +
svn.prod.deps-windows-compiled =
 +
 
 +
 
 +
 
 +
svn.dev.algorithms-list = /trunk/data-analysis/DataMinerConfiguration/algorithms/dev/algorithms
 +
 
 +
svn.dev.deps-linux-compiled =
 +
svn.dev.deps-pre-installed =
 +
svn.dev.deps-r-blackbox =
 +
svn.dev.deps-r = /trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt
 +
svn.dev.deps-java =
 +
svn.dev.deps-knime-workflow =
 +
svn.dev.deps-octave =
 +
svn.dev.deps-python =
 +
svn.dev.deps-windows-compiled =
 +
 
 +
 
 +
</source>
  
 
==Usage and APIs==
 
==Usage and APIs==
  
  
The DMPM REST Service will expose three main functionalities (one for the test phase, another one for the release phase, and a third one cross to both of them):
+
The DMPM REST Service will expose five main functionalities (three for the staging phase, and two for the release phase).
 +
The result of the execution will be monitored asynchronously by means of a REST call to a log having as parameter the ID of the operation.
 +
This can be done both at STAGING and RELEASE phases.
  
1. '''TEST PHASE''': a method returning immediately the log ID useful to monitor the execution, able to:
+
1. '''STAGING PHASE''': a method returning immediately the log ID useful to monitor the execution, able to:
** test the installation of the algorithm and its dependencies on a staging dataminer
+
** test the installation of the algorithm on a staging dataminer
** to update the SVN lists (both for dependencies and algorithms) dedicated to RPrototypingLab
+
** to update the algorithms SVN list
  
 
The parameters to consider are the following:
 
The parameters to consider are the following:
 
* the '''algorithm''' (URL to package containing the dependencies and the script to install)
 
* the '''algorithm''' (URL to package containing the dependencies and the script to install)
 +
* the '''targetVRE''' (actually the current VRE)
 
* the '''category''' to which the algorithm belong to
 
* the '''category''' to which the algorithm belong to
* the VRE '''token''' from which SAI is used (ideally RPrototypingLab)
+
* the '''algorithm_type'''  
  
An example of Rest call is the following:
+
An example of Rest call related to the Installation is the following:
  
 
<source lang="text">
 
<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/stage?
+
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/stage?gcube-token=*****
gcube-token=708e7eb8-11a7-4e9a-816b-c9ed7e7e99fe-98187548
+
&algorithmPackageURL=http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0
&algorithmPackageURL=http://data.d4science.org/dENQTTMxdjNZcGRpK0NHd2pvU0owMFFzN0VWemw3Zy9HbWJQNStIS0N6Yz0
+
&category=BLACK_BOX
&category=ICHTHYOP_MODEL       
+
&algorithm_type=transducerers
 +
&targetVRE=/gcube/devNext/NextNext   
 
</source>
 
</source>
  
 +
An example of Rest call related to the log is the following:
 +
 +
<source lang="text">
 +
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/log?gcube-token=*****
 +
&logUrl=id_from_previous_call
 +
</source>
 +
 +
An example of Rest call related to the monitoring of the execution is the following (three statuses are currently available: COMPLETED, IN PROGRESS, FAILED):
 +
 +
<source lang="text">
 +
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/monitor?gcube-token=*****
 +
&logUrl=id_from_previous_first_call
 +
</source>
  
  
 
2. '''RELEASE PHASE''': a method invoked from SAI, executed after that the Test phase has successfully finished, able to:
 
2. '''RELEASE PHASE''': a method invoked from SAI, executed after that the Test phase has successfully finished, able to:
** update the SVN list of production with the dependencies extracted from the package (if new ones are present)
+
** update the SVN list of production with the new algorithms (that is the input for the CRON-job); many attributes are extracted from the metadata file, others are generated dynamically (e.g., the VRE, the type of algorithm, the URL to package, the Timestamp related to the last modification of the package, the current environment and so on)
** update the SVN list of production with the algorithm (if new one)
+
  
Some of the parameters to consider are the following:
+
<source lang="text">
 +
| OCTAVEBLACKBOX | Giancarlo Panichi | BLACK_BOX | Dev | <notextile>./addAlgorithm.sh OCTAVEBLACKBOX BLACK_BOX org.gcube.dataanalysis.executor.rscripts.OctaveBlackBox /gcube/devNext/NextNext transducerers N http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0 "OctaveBlackBox" </notextile> | none | Fri Sep 01 16:58:47 UTC 2017 |
 +
</source>
 +
 
 +
 
 +
The parameters to consider are the following:
 
* the '''algorithm''' (URL to package containing the dependencies and the script to install)
 
* the '''algorithm''' (URL to package containing the dependencies and the script to install)
 +
* the '''targetVRE'''
 
* the '''category''' to which the algorithm belong to
 
* the '''category''' to which the algorithm belong to
* the VRE '''token''' from which SAI is used (ideally RPrototypingLab)
+
* the '''algorithm_type'''  
* The '''target VRE''' on which install the algorithm
+
* The '''token''' for the target VRE (before publishing the algorithm in the SVN repository, the service checks whether the user is registered to the targetVRE)
+
  
An example of Rest call is the following:
+
An example of Rest call related to the publishing is the following:
  
 
<source lang="text">
 
<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/api/algorithm/add?
+
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/add?gcube-token=*****
gcube-token=708e7eb8-11a7-4e9a-816b-c9ed7e7e99fe-98187548
+
&algorithmPackageURL=http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0
&algorithmPackageURL=http://data.d4science.org/dENQTTMxdjNZcGRpK0NHd2pvU0owMFFzN0VWemw3Zy9HbWJQNStIS0N6Yz0
+
&category=BLACK_BOX
&category=ICHTHYOP_MODEL
+
&algorithm_type=transducers
&targetVREToken=3a23bfa4-4dfe-44fc-988f-194b91071dd2-843339462
+
&targetVRE=/gcube/devNext/NextNext
&targetVRE=/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab   
+
 
</source>
 
</source>
  
 +
An example of Rest call related to the monitoring is the following:
  
 +
<source lang="text">
 +
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/monitor?gcube-token=*****
 +
&logUrl=id_from_previous_call
 +
</source>
  
3. The result of the execution will be monitored asynchronously by means of a REST call to a log having as parameter the ID of the operation. This can be done both at TEST and RELEASE phases.
+
==Notification==
 +
 
 +
Both for the Staging and Release phases, the user and the VRE administrators will be notified with the outcome of the execution.
 +
Some examples of notification are the following:
  
An example of Rest call is the following:
 
  
 
<source lang="text">
 
<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/log?
+
Subject: [DataMinerGhostInstallationRequestReport] is FAILED for OCTAVEBLACKBOX algorithm
gcube-token=708e7eb8-11a7-4e9a-816b-c9ed7e7e99fe-98187548
+
 
&logUrl=426c8e35-a624-4710-b612-c90929c32c27
+
Message:
 +
 
 +
Dear Nunzio,
 +
 
 +
DataMiner sent you a message:
 +
An error occurred while deploying your algorithm
 +
 
 +
Here are the error details:
 +
 
 +
Installation failed with return code = 2
 +
 
 +
 
 +
Algorithm details:
 +
 
 +
User: Giancarlo Panichi
 +
Algorithm name: OCTAVEBLACKBOX
 +
Staging DataMiner Host: dataminer1-devnext.d4science.org
 +
Caller VRE: /gcube/devNext
 +
Target VRE: /gcube/devNext
 
</source>
 
</source>
  
==Requirements toward the SAI integration==
 
  
The user allows SAI to generate the package.
 
Each package generated by SAI '''must have''' a ''Info.txt'' metadata file having the following information specified by the user:
 
  
Algorithm Name, Author, Category, Class Name, Packages (list of dependencies)
 
  
* The dependencies in the metadata file inside the algorithm package must respect the following guidelines:  
+
<source lang="text">
** R Dependencies must have prefix '''cran:'''
+
Subject: [DataMinerGhostInstallationRequestReport] is FAILED for OCTAVEBLACKBOX algorithm
** OS Dependencies must have prefix '''os:'''
+
 
** Custom Dependencies must have prefix '''github:'''
+
Message:
Such dependencies will be stored in the correspondent SVN file without the prefix.
+
 
 +
Dear Nunzio,
 +
 
 +
DataMiner sent you a message:
 +
An error occurred while deploying your algorithm
 +
 
 +
Here are the error details:
 +
 
 +
Following dependencies are not defined:
 +
 
 +
pippo
 +
 
 +
 
 +
Algorithm details:
 +
 
 +
User: Giancarlo Panichi
 +
Algorithm name: OCTAVEBLACKBOX
 +
Staging DataMiner Host: dataminer1-devnext.d4science.org
 +
Caller VRE: /gcube/devNext
 +
Target VRE: /gcube/devNext
 +
</source>
 +
 
 +
 
 +
 
 +
 
 +
 
 +
<source lang="text">
 +
Subject: [DataMinerGhostInstallationRequestReport] is FAILED for OCTAVEBLACKBOX algorithm
 +
 
 +
Message:
 +
 
 +
Dear Nunzio,
 +
 
 +
DataMiner sent you a message:
 +
An error occurred while deploying your algorithm
 +
 
 +
Here are the error details:
 +
 
 +
Installation completed but DataMiner Interface not working correctly or files OCTAVEBLACKBOX.jar and OCTAVEBLACKBOX_interface.jar not availables at the expected path
 +
 
 +
 
 +
Algorithm details:
 +
 
 +
User: Giancarlo Panichi
 +
Algorithm name: OCTAVEBLACKBOX
 +
Staging DataMiner Host: dataminer1-devnext.d4science.org
 +
Caller VRE: /gcube/devNext
 +
Target VRE: /gcube/devNext
 +
</source>
 +
 
 +
 
 +
 
 +
<source lang="text">
 +
Subject: [DataMinerGhostInstallationRequestReport] is SUCCESS for SAI_INHERITANCE algorithm
 +
 
 +
Message:  
 +
Dear Nunzio,
 +
 
 +
DataMiner sent you a message:
 +
The installation of the algorithm in the ghost dataminer is completed successfully.
 +
 
 +
You can retrieve experiment results under the '/DataMiner' e-Infrastructure Workspace folder or from the DataMiner interface.
 +
 
 +
 
 +
Algorithm details:
 +
 
 +
User: Gianpaolo Coro
 +
Algorithm name: SAI_INHERITANCE
 +
Staging DataMiner Host: dataminer1-devnext.d4science.org
 +
Caller VRE: /gcube/devNext/NextNext
 +
Target VRE: /gcube/devNext/NextNext
 +
 
 +
- This message was also sent to:
 +
 
 +
    Lucio Lelii
 +
    Gianpaolo Coro
 +
    Giancarlo Panichi
 +
    Paolo Scarponi
 +
    Gianpaolo Coro
 +
</source>
 +
 
 +
 
 +
<source lang="text">
 +
Subject: [DataMinerReleaseInstallationRequestReport] is SUCCESS for OCTAVEBLACKBOX algorithm
 +
 
 +
Message:
 +
Dear Nunzio,
 +
 
 +
DataMiner sent you a message:
 +
SVN REPOSITORY CORRECTLY UPDATED.
 +
 
 +
  The CRON job will install the algorithm in the target VRE 
 +
 
 +
 
 +
 
 +
 
 +
Algorithm details:
 +
 
 +
User: Giancarlo Panichi
 +
Algorithm name: OCTAVEBLACKBOX
 +
Caller VRE: /gcube/devNext/NextNext
 +
Target VRE: /gcube/devNext/NextNext
 +
 
 +
- This message was also sent to:
 +
 
 +
    Nunzio Andrea Galante
 +
    Lucio Lelii
 +
    Gianpaolo Coro
 +
    Giancarlo Panichi
 +
    Paolo Scarponi
 +
</source>
 +
 
 +
 
  
Three buttons will be available in the new SAI interface in order to allow the interaction among SAI and the three methods exposed by the Service.
 
  
* On the host where the Service is deployed, must be possible to execute the ansible-playbook command, in order to allow the installation of the dependencies on the staging dataminer, and to install the algorithm on the target VRE
+
==DataMinerPoolManager Portlet==
  
* At least for the staging dataminers used in the test phase, the application must have SSH root access
+
Please refer to https://next.d4science.org/group/nextnext/dataminerdeployer for a graphical representation of the service.

Latest revision as of 16:33, 5 September 2017

DataMiner Pool Manager

The DataMiner Pool Manager service (DMPM) is a REST service that rationalizes and automates the process of publishing SAI algorithms on DataMiner nodes and keeps the DataMiner cluster up to date.

Maven coordinates

The second version of the service has been released in gCube 4.6.1. The Maven artifact coordinates are:

<dependency>
   <groupId>org.gcube.dataanalysis</groupId>
   <artifactId>dataminer-pool-manager</artifactId>
   <version>2.0.0-SNAPSHOT</version> 
   <packaging>war</packaging>
</dependency>

Overview

The service accepts an algorithm descriptor, including its dependencies, generates (via templating) the Ansible playbook, inventory and roles for the relevant components (algorithm installer, algorithms, dependencies), executes the Ansible playbook on a Staging DataMiner, and finally updates the lists of dependencies and algorithms that a Cron job will use for the installation.

In practice, the service takes as input the URL of an algorithm package (including the jar and the metadata), extracts the information needed for the installation, installs the script, updates the list of dependencies, publishes the new algorithm in the Information System and asynchronously returns the execution outcome to the caller.

Architecture

The following main entities will be involved in the process of integration between SAI and the production environment:

  • SAI: the component that lets the user upload the package of the algorithm to deploy and choose the target VRE
  • Dataminer Pool Manager: a SmartGears REST service in charge of managing the installation of algorithms on the infrastructure dataminers
  • The Staging DataMiner: a dedicated dataminer machine, usable only by the Dataminer Pool Manager, used to test the installation of an algorithm and of its dependencies. Two dataminers in the d4science infrastructure are staging-oriented (the one to use can be set in the configuration file):
    • dataminer1-devnext.d4science.org for the development environment
    • dataminer-proto-ghost.d4science.org for the production environment
  • SVN Dependencies Lists: lists (in files on SVN) of the dependencies that must be installed on Dataminer machines. There is one list per type of dependency for each of Dev, RProto and Production.
  • SVN Algorithms Lists: lists (in files on SVN) of the algorithms that must be installed on Dataminer machines. The service uses three different lists: one for the Dev environment, one for RProto and one for Production.
  • The Cron job: runs on every Dataminer and periodically (every minute) aligns the packages and the algorithms installed on the machine with the SVN Dependencies Lists and the SVN Algorithms Lists. For the algorithms, the Cron job must be configured to run the command line stored in each record of the SVN list; for the dependencies, it must be configured to just read from all the dependencies lists. The lists to consider are the following:
    • Production Algorithms:
   http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/prod/algorithms
    • RProto Algorithms:
   http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/proto/algorithms
    • Dev Algorithms:
  http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dev/algorithms
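As an illustration of how a Cron job could obtain the command line stored in a record of an algorithms list, here is a minimal Python sketch. It assumes that each record is a pipe-delimited row with the command wrapped in `<notextile>` tags, as in the sample record quoted in the Usage and APIs part of this revision. The function name `extract_command` is hypothetical and is not part of the DMPM code base.

```python
import re
import shlex

# Sample record copied from the algorithms-list example in this page.
RECORD = (
    "| OCTAVEBLACKBOX | Giancarlo Panichi | BLACK_BOX | Dev | "
    "<notextile>./addAlgorithm.sh OCTAVEBLACKBOX BLACK_BOX "
    "org.gcube.dataanalysis.executor.rscripts.OctaveBlackBox "
    "/gcube/devNext/NextNext transducerers N "
    "http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0 "
    '"OctaveBlackBox" </notextile> | none | Fri Sep 01 16:58:47 UTC 2017 |'
)


def extract_command(record):
    """Return the install command of one record as an argument list.

    Assumption: the command is the text between <notextile> tags.
    Returns None when the record carries no such command.
    """
    match = re.search(r"<notextile>(.*?)</notextile>", record)
    if match is None:
        return None
    # shlex.split honours the double quotes around the description.
    return shlex.split(match.group(1).strip())


if __name__ == "__main__":
    print(extract_command(RECORD))
```

The sketch only extracts the arguments; actually running the script (and deciding whether an algorithm is already installed) is left to the Cron job itself.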

Process (From SAI to Production VRE)

Until now, SAI has been deployed in several scopes and the user could deploy an algorithm only in the current VRE. The idea is to have different instances of SAI in many VREs and to let the user specify the target VRE for the deployment.

The process is composed of two main phases:

  • STAGING Phase: the installation of an algorithm on the staging dataminer; it ends with the publishing of the algorithm in the pool of dataminers of the target VRE
    • The DMPM contacts the Staging Dataminer and installs the algorithm and the dependencies
    • The output is retrieved. If the installation fails (e.g. a dependency does not exist or is misspelled), the process stops and the log is returned to the user (a mail notification is sent to the user and to the VRE administrators).
    • The DMPM updates the SVN Algorithms list
    • A mail notification is sent to the user and to the VRE administrators
    • Cron reads the SVN lists (both Dependencies and Algorithms) and installs the algorithm in the pool of dataminers of the current VRE.
    • The script publishes the new algorithm in the Information System.

Staging.png

  • RELEASE Phase
    • SAI invokes the service in RELEASE phase to install the algorithm in a specific production VRE (provided by the user); SAI passes the target VRE to the DMPM.
    • The DMPM updates the SVN Production Algorithms list
    • Cron installs the algorithm in the production dataminers
    • A mail notification is sent to the user and to the VRE administrators
    • The script publishes the algorithm in the VRE

Release.png

Configuration and Testing

DMPM is a SmartGears compliant service.

/home/gcube/tomcat/webapps/dataminer-pool-manager-2.0.0-SNAPSHOT

An instance has been deployed and configured at the Development, Preprod and Prod levels.

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/rest/

In order to use the service, two manual configuration steps are needed:

    • to modify the parameter <param-value>Dev</param-value> in the /home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/web.xml file according to the scope where the service runs (Dev, RProto or Prod); the service reads this value dynamically to switch among the lists of algorithms and dependencies to consider, or to select the staging dataminer.
    • to edit the file /home/gcube/dataminer-pool-manager/dpmConfig/service.properties. This file contains, among other things, the staging dataminer to consider (automatically selected based on the environment set in the previous point) and the SVN repositories for the algorithms of each environment and for all the types of dependencies generated from SAI and available in the metadata file (e.g., the service checks that a dependency name is correct by reading the corresponding file, chosen according to the language defined in the info.txt file available in the algorithm package).

An example of info.txt file is the following:

Username: giancarlo.panichi
Full Name: Giancarlo Panichi
Email: g.panichi@isti.cnr.it
 
Language: R
Algorithm Name: RBLACKBOX
Class Name: org.gcube.dataanalysis.executor.rscripts.RBlackBox
Algorithm Description: RBlackBox
Algorithm Category: BLACK_BOX
 
Interpreter Version: 3.2.1
 
Packages:
Package Name: DBI
Package Name: RPostgreSQL
Package Name: raster
Package Name: maptools
Package Name: sqldf
Package Name: RJSONIO
Package Name: data.table
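
The metadata above can be read with a few lines of code. The following is a minimal sketch, assuming that every entry of info.txt is a single "Key: value" line as in the example; the function names parse_info and undefined_dependencies are hypothetical and are not part of the DMPM code base. The second function mimics the dependency check described above by comparing the declared packages with the content of a dependencies list.

```python
def parse_info(text):
    """Parse the content of an info.txt file.

    Returns (fields, packages): a dict with the scalar entries and the
    list of values found in the repeated "Package Name" entries.
    Assumes one "Key: value" pair per line, as in the example above.
    """
    fields = {}
    packages = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Package Name":
            packages.append(value)
        elif value:
            # Entries with an empty value (e.g. "Packages:") are section markers.
            fields[key] = value
    return fields, packages


def undefined_dependencies(packages, known):
    """Return the declared packages missing from a dependencies list (hypothetical check)."""
    known_set = set(known)
    return [name for name in packages if name not in known_set]
```

With the example file, fields["Language"] is "R" and packages starts with "DBI"; a package that is not present in the list passed as known is reported by undefined_dependencies.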


An example of service.properties file is the following:

#YML node file
DEV_STAGING_HOST: dataminer1-devnext.d4science.org 
PROTO_PROD_STAGING_HOST: dataminer-proto-ghost.d4science.org
SVN_REPO: https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/
#HAPROXY_CSV:  http://data.d4science.org/Yk4zSFF6V3JOSytNd3JkRDlnRFpDUUR5TnRJZEw2QjRHbWJQNStIS0N6Yz0
 
 
svn.repository = https://svn.d4science.research-infrastructures.eu/gcube
 
svn.algo.main.repo = /trunk/data-analysis/DataMinerConfiguration/algorithms
 
svn.rproto.algorithms-list = /trunk/data-analysis/DataMinerConfiguration/algorithms/proto/algorithms
 
svn.rproto.deps-linux-compiled = 
svn.rproto.deps-pre-installed = 
svn.rproto.deps-r-blackbox = 
svn.rproto.deps-r = /trunk/data-analysis/RConfiguration/RPackagesManagement/test_r_cran_pkgs.txt
svn.rproto.deps-java =
svn.rproto.deps-knime-workflow = 
svn.rproto.deps-octave = 
svn.rproto.deps-python = 
svn.rproto.deps-windows-compiled = 
 
 
svn.prod.algorithms-list = /trunk/data-analysis/DataMinerConfiguration/algorithms/prod/algorithms
 
svn.prod.deps-linux-compiled = 
svn.prod.deps-pre-installed = 
svn.prod.deps-r-blackbox = 
svn.prod.deps-r = /trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt
svn.prod.deps-java = 
svn.prod.deps-knime-workflow =
svn.prod.deps-octave = 
svn.prod.deps-python = 
svn.prod.deps-windows-compiled = 
 
 
 
svn.dev.algorithms-list = /trunk/data-analysis/DataMinerConfiguration/algorithms/dev/algorithms
 
svn.dev.deps-linux-compiled = 
svn.dev.deps-pre-installed = 
svn.dev.deps-r-blackbox = 
svn.dev.deps-r = /trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt
svn.dev.deps-java = 
svn.dev.deps-knime-workflow =
svn.dev.deps-octave = 
svn.dev.deps-python = 
svn.dev.deps-windows-compiled =
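
To illustrate how these entries could be combined, the following is a minimal sketch of a reader for the file above. It rests on several assumptions that the page does not confirm: the environment values of web.xml (Dev, RProto, Prod) are mapped to the key prefixes dev, rproto and prod; the full SVN URL of a list is obtained by appending its path to svn.repository; Dev uses DEV_STAGING_HOST, while RProto and Prod use PROTO_PROD_STAGING_HOST. All function names are hypothetical.

```python
import re

# A key ends at the first whitespace, ":" or "="; the rest of the line is the value.
_LINE = re.compile(r"^([^\s:=]+)\s*[:=]?\s*(.*)$")

# Assumed mapping between the web.xml environment value and the key prefix.
ENV_PREFIX = {"Dev": "dev", "RProto": "rproto", "Prod": "prod"}


def load_properties(text):
    """Read "key = value" and "key: value" lines, skipping comments and blanks."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        match = _LINE.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


def algorithms_list_url(props, environment):
    """Full URL of the algorithms list (assumes repository root + path)."""
    key = "svn.%s.algorithms-list" % ENV_PREFIX[environment]
    return props["svn.repository"] + props[key]


def dependency_lists(props, environment):
    """Map each dependency type with a configured path to its full URL."""
    prefix = "svn.%s.deps-" % ENV_PREFIX[environment]
    return {
        key[len(prefix):]: props["svn.repository"] + value
        for key, value in props.items()
        if key.startswith(prefix) and value
    }


def staging_host(props, environment):
    """Pick the staging dataminer (assumed rule, inferred from the key names)."""
    key = "DEV_STAGING_HOST" if environment == "Dev" else "PROTO_PROD_STAGING_HOST"
    return props[key]
```

Dependency types whose value is empty in the file are skipped, so with the example above only the R list is returned for each environment.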

Usage and APIs

The DMPM REST Service exposes five main functionalities (three for the staging phase and two for the release phase). The outcome of an execution can be monitored asynchronously by means of a REST call that takes the ID of the operation as a parameter. This can be done in both the STAGING and RELEASE phases.

1. STAGING PHASE: a method that immediately returns the log ID needed to monitor the execution, and that is able to:

    • test the installation of the algorithm on a staging dataminer
    • update the algorithms SVN list

The parameters to consider are the following:

  • the algorithm (URL to package containing the dependencies and the script to install)
  • the targetVRE (at present, the current VRE)
  • the category to which the algorithm belongs
  • the algorithm_type

An example of a REST call for the installation is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/stage?gcube-token=*****
&algorithmPackageURL=http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0
&category=BLACK_BOX
&algorithm_type=transducerers
&targetVRE=/gcube/devNext/NextNext
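
Since the package URL and the VRE scope contain reserved characters, a client may prefer to build the query string programmatically. The following is a minimal sketch that only assembles the URL shown above; the function name build_stage_url and its arguments are hypothetical, and the HTTP method used to submit the request is not stated on this page.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above; other deployments will differ.
BASE_URL = ("http://node2-d-d4s.d4science.org:8080/"
            "dataminer-pool-manager-2.0.0-SNAPSHOT/api")


def build_stage_url(token, package_url, category, algorithm_type,
                    target_vre, base_url=BASE_URL):
    """Build the URL of the staging call with percent-encoded parameters."""
    query = urlencode({
        "gcube-token": token,
        "algorithmPackageURL": package_url,
        "category": category,
        "algorithm_type": algorithm_type,
        "targetVRE": target_vre,
    })
    return "%s/algorithm/stage?%s" % (base_url, query)
```

The same approach applies to the other calls of the service, changing only the path and the parameter names.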

An example of a REST call for retrieving the log is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/log?gcube-token=*****
&logUrl=id_from_previous_call

An example of a REST call for monitoring the execution is the following (currently three different statuses are available: COMPLETED, IN PROGRESS, FAILED):

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/monitor?gcube-token=*****
&logUrl=id_from_previous_first_call
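
Because the execution is asynchronous, a caller typically polls the monitor call until a final status is reached. The following is a minimal sketch of such a loop. It assumes that the monitor call returns one of the three status strings listed above as plain text, which this page does not specify; fetch_status is a hypothetical callable supplied by the caller, which performs the HTTP request and returns the response body.

```python
import time

# Statuses after which polling stops; "IN PROGRESS" means "ask again later".
FINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_outcome(fetch_status, log_id, interval=5.0, max_attempts=60,
                     sleep=time.sleep):
    """Poll fetch_status(log_id) until the operation completes or fails.

    Raises TimeoutError when no final status is seen within max_attempts.
    """
    for _ in range(max_attempts):
        status = fetch_status(log_id).strip().upper()
        if status in FINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError("operation %s is still in progress" % log_id)
```

Passing the request function and the sleep function as arguments keeps the loop independent of any HTTP library and makes it easy to try out without contacting the service.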


2. RELEASE PHASE: a method invoked from SAI, executed after the test phase has finished successfully, able to:

    • update the production SVN list with the new algorithms (that is the input for the CRON-job); many attributes are extracted from the metadata file, others are generated dynamically (e.g., the VRE, the type of algorithm, the URL to the package, the timestamp of the last modification of the package, the current environment and so on). An example of a record is the following:
| OCTAVEBLACKBOX | Giancarlo Panichi | BLACK_BOX | Dev | <notextile>./addAlgorithm.sh OCTAVEBLACKBOX BLACK_BOX org.gcube.dataanalysis.executor.rscripts.OctaveBlackBox /gcube/devNext/NextNext transducerers N http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0 "OctaveBlackBox" </notextile> | none | Fri Sep 01 16:58:47 UTC 2017 |
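
A consumer of the list, such as the Cron job, needs the command line embedded in each record. The following is a minimal sketch of how a record like the one above could be split; it assumes that the command is always wrapped in <notextile> tags and that the columns are separated by "|", as in the example. The function name parse_record is hypothetical, and the meaning of the individual columns is not defined on this page, so they are returned as a plain list.

```python
import re
import shlex

_NOTEXTILE = re.compile(r"<notextile>(.*?)</notextile>", re.S)


def _cells(fragment):
    """Split a "| a | b |" fragment into its stripped cells."""
    return [cell.strip() for cell in fragment.strip().strip("|").split("|")]


def parse_record(line):
    """Extract the columns and the installation command from a list record."""
    match = _NOTEXTILE.search(line)
    if match is None:
        raise ValueError("record does not contain a <notextile> command")
    command = match.group(1).strip()
    # The command is cut out first, so a "|" inside it cannot break the split.
    columns = _cells(line[:match.start()]) + _cells(line[match.end():])
    return {
        "columns": columns,
        "command": command,
        "argv": shlex.split(command),  # shell-like tokenisation, quotes removed
    }
```

For the record above, the first element of argv is ./addAlgorithm.sh and the first column is OCTAVEBLACKBOX.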


The parameters to consider are the following:

  • the algorithm (URL to package containing the dependencies and the script to install)
  • the targetVRE
  • the category to which the algorithm belongs
  • the algorithm_type

An example of a REST call for the publishing is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/add?gcube-token=*****
&algorithmPackageURL=http://data-d.d4science.org/TSt3cUpDTG1teUJMemxpcXplVXYzV1lBelVHTTdsYjlHbWJQNStIS0N6Yz0
&category=BLACK_BOX
&algorithm_type=transducers
&targetVRE=/gcube/devNext/NextNext

An example of a REST call for the monitoring is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/monitor?gcube-token=*****
&logUrl=id_from_previous_call

Notification

For both the Staging and Release phases, the user and the VRE administrators are notified of the outcome of the execution. Some examples of notifications are the following:


Subject: [DataMinerGhostInstallationRequestReport] is FAILED for OCTAVEBLACKBOX algorithm
 
Message: 
 
Dear Nunzio,
 
DataMiner sent you a message:
An error occurred while deploying your algorithm
 
Here are the error details:
 
Installation failed with return code = 2
 
 
Algorithm details:
 
User: Giancarlo Panichi
Algorithm name: OCTAVEBLACKBOX
Staging DataMiner Host: dataminer1-devnext.d4science.org
Caller VRE: /gcube/devNext
Target VRE: /gcube/devNext



Subject: [DataMinerGhostInstallationRequestReport] is FAILED for OCTAVEBLACKBOX algorithm
 
Message: 
 
Dear Nunzio,
 
DataMiner sent you a message:
An error occurred while deploying your algorithm
 
Here are the error details:
 
Following dependencies are not defined:
 
pippo
 
 
Algorithm details:
 
User: Giancarlo Panichi
Algorithm name: OCTAVEBLACKBOX
Staging DataMiner Host: dataminer1-devnext.d4science.org
Caller VRE: /gcube/devNext
Target VRE: /gcube/devNext



Subject: [DataMinerGhostInstallationRequestReport] is FAILED for OCTAVEBLACKBOX algorithm
 
Message: 
 
Dear Nunzio,
 
DataMiner sent you a message:
An error occurred while deploying your algorithm
 
Here are the error details:
 
Installation completed but DataMiner Interface not working correctly or files OCTAVEBLACKBOX.jar and OCTAVEBLACKBOX_interface.jar not availables at the expected path
 
 
Algorithm details:
 
User: Giancarlo Panichi
Algorithm name: OCTAVEBLACKBOX
Staging DataMiner Host: dataminer1-devnext.d4science.org
Caller VRE: /gcube/devNext
Target VRE: /gcube/devNext


Subject: [DataMinerGhostInstallationRequestReport] is SUCCESS for SAI_INHERITANCE algorithm
 
Message: 
Dear Nunzio,
 
DataMiner sent you a message:
The installation of the algorithm in the ghost dataminer is completed successfully.
 
You can retrieve experiment results under the '/DataMiner' e-Infrastructure Workspace folder or from the DataMiner interface.
 
 
Algorithm details:
 
User: Gianpaolo Coro
Algorithm name: SAI_INHERITANCE
Staging DataMiner Host: dataminer1-devnext.d4science.org
Caller VRE: /gcube/devNext/NextNext
Target VRE: /gcube/devNext/NextNext
 
- This message was also sent to:
 
    Lucio Lelii
    Gianpaolo Coro
    Giancarlo Panichi
    Paolo Scarponi
    Gianpaolo Coro


Subject: [DataMinerReleaseInstallationRequestReport] is SUCCESS for OCTAVEBLACKBOX algorithm
 
Message: 
Dear Nunzio,
 
DataMiner sent you a message:
SVN REPOSITORY CORRECTLY UPDATED.
 
  The CRON job will install the algorithm in the target VRE  
 
 
 
 
Algorithm details:
 
User: Giancarlo Panichi
Algorithm name: OCTAVEBLACKBOX
Caller VRE: /gcube/devNext/NextNext
Target VRE: /gcube/devNext/NextNext
 
- This message was also sent to:
 
    Nunzio Andrea Galante
    Lucio Lelii
    Gianpaolo Coro
    Giancarlo Panichi
    Paolo Scarponi



DataMinerPoolManager Portlet

Please refer to https://next.d4science.org/group/nextnext/dataminerdeployer for a graphical representation of the service.