DataMiner Installation
Introduction
DataMiner is an e-Infrastructure service providing state-of-the-art data mining algorithms and ecological modelling approaches under the Web Processing Service (WPS) standard.
In this guide, we show how administrators and site managers can install DataMiner on top of a SmartGears service installation.
Prerequisites
See the SmartGears Web Hosting Node (wHN) Prerequisites
To manage a request load of 20,000 computations per month with a maximum allowed concurrency of 10 requests, we recommend the following machine configuration:
- Ubuntu 12.04.5 LTS
- 6 GB of RAM
- 10 virtual CPUs, e.g. Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
- 10 GB of HD space
Prerequisite Software
Some algorithms rely on additional software that must be present on the server machine.
At this link you can download a set of packages for offline installation (see the instructions below), plus a test script that checks whether many of the packages work.
An installation of the R interpreter is required on the machine; the required R version is 2.15.3.
In the following, we report the non-R software, the R packages, and the command sequences needed to install the offline packages.
The commands below assume that the local installation packages have been placed in the /root/download/ folder.
- Required software:
GDAL:
wget http://download.osgeo.org/gdal/1.11.0/gdal1110.zip
unzip gdal1110.zip -d gdal1110
# assumption: the archive unpacks into a gdal-1.11.0 source folder
cd gdal1110/gdal-1.11.0
./configure --with-gdal-config=/usr/local/bin/gdal-config
make && make install
JAGS: http://mcmc-jags.sourceforge.net/
- R Packages to install from CRAN
Installation command (from the R console):
install.packages("<package name>")
Packages: data.table, doBy, multcomp, mvtnorm, TH.data, abind, bayesmix, coda, R2jags, R2WinBUGS, rjags, runjags, maptools, sp. The following packages are also used but are distributed with R itself and need no separate installation: base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, utils.
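As a convenience, the CRAN packages above can also be installed in one pass from the shell. The following is a minimal sketch, assuming internet access and a reachable CRAN mirror:

Rscript -e 'install.packages(c("data.table", "doBy", "multcomp", "mvtnorm",
  "TH.data", "abind", "bayesmix", "coda", "R2jags", "R2WinBUGS", "rjags",
  "runjags", "maptools", "sp"), repos = "http://cran.r-project.org")'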
- Packages for ICES processes - Install sequence
install.packages("/root/download/Rcpp_0.9.10.tar.gz", repos = NULL, type="source") install.packages("/root/download/plyr_1.8.tar.gz", repos = NULL, type="source") install.packages("stringr") install.packages("/root/download/reshape2_1.2.tar.gz", repos = NULL, type="source") install.packages("/root/download/data.table_1.9.2.tar.gz", repos = NULL, type="source") install.packages("R2HTML") install.packages("multcomp") install.packages("Matrix") install.packages("lattice") install.packages("snow") install.packages("/root/download/RcppEigen_0.3.2.0.tar.gz", repos = NULL, type="source") install.packages("minqa") install.packages("/root/download/lme4_1.0-5.tar.gz", repos = NULL, type="source") install.packages("/root/download/doBy_4.5-3.tar.gz", repos = NULL, type="source") install.packages("mvtnorm") install.packages("survival") install.packages("data.table") help(package=splines) help(package=TH.data) help(package=MASS)
- Packages for FAO processes - Install sequence
install.packages("RCurl") install.packages("digest") install.packages("/root/download/httr_0.2.tar.gz", repos = NULL, type="source") install.packages("memoise") install.packages("whisker") install.packages("evaluate") install.packages("/root/download/devtools_1.4.1.tar.gz", repos = NULL, type="source") require(devtools) install_github("rsdmx", "opensdmx") apt-get update apt-get install libgdal1-dev apt-get install libgeos-dev apt-get install libspatialite-dev install.packages("rgdal") install.packages("rgeos") require(devtools) install_github("RFigisGeo", "openfigis")
- Test script (to be run from the R console):
source("/root/download/interpolateTacsat.r")
Installation of DataMiner
Steps required to build a fully working development-environment DataMiner installation from scratch:
1 - Install a SmartGears-enabled Tomcat service, possibly on port 80 or with a redirect to port 80. Use devsec as the starting scope.
2 - Download the 52°North WPS WAR application from the following link and put it under webapps:
http://data.d4science.org/uri-resolver/id?fileName=wps-3.3.2.war&smp-id=565d67b7e4b0eacf4a0fc5ad&contentType=application%2Fx-tika-java-web-archive
Download the following XML files and copy them into the web application's WEB-INF folder, so that the application is enabled on a gCube container:
http://data.d4science.org/id?fileName=web.xml&smp-id=56615ae0e4b0158fcb561817&contentType=application%2Fxml
http://data.d4science.org/id?fileName=gcube-app.xml&smp-id=56615ae0e4b0158fcb561815&contentType=application%2Fxml
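As an illustration, the downloads in this step can be scripted as follows (a sketch; /opt/tomcat is a placeholder for your Tomcat installation directory, and Tomcat must have unpacked the WAR before the two XML files are copied):

TOMCAT=/opt/tomcat   # placeholder: adjust to your installation
wget -O $TOMCAT/webapps/wps.war "http://data.d4science.org/uri-resolver/id?fileName=wps-3.3.2.war&smp-id=565d67b7e4b0eacf4a0fc5ad&contentType=application%2Fx-tika-java-web-archive"
# once Tomcat has unpacked the WAR, overwrite the two descriptors:
wget -O $TOMCAT/webapps/wps/WEB-INF/web.xml "http://data.d4science.org/id?fileName=web.xml&smp-id=56615ae0e4b0158fcb561817&contentType=application%2Fxml"
wget -O $TOMCAT/webapps/wps/WEB-INF/gcube-app.xml "http://data.d4science.org/id?fileName=gcube-app.xml&smp-id=56615ae0e4b0158fcb561815&contentType=application%2Fxml"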
3 - Replace 52n-wps-server-3.3.2-X.jar and 52n-wps-algorithm-3.3.2-X.jar with the corresponding JARs from our Maven gcube-externals repository:
(Repository: "gCube Externals") <dependency> <groupId>rapidminer-custom</groupId> <artifactId>52n-wps-server-d4science</artifactId> <version>3.3.2</version> </dependency>
(Otherwise available here)
<dependency>
  <groupId>rapidminer-custom</groupId>
  <artifactId>52n-wps-algorithm-d4science</artifactId>
  <version>3.3.2</version>
</dependency>
(Otherwise available here)
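One way to fetch the two replacement JARs is the Maven dependency plugin. The following is a sketch; it assumes Maven is installed, the gCube Externals repository is configured in your settings.xml, and /opt/tomcat is a placeholder:

LIB=/opt/tomcat/webapps/wps/WEB-INF/lib
# remove the stock 52North jars before copying in the D4Science builds
rm $LIB/52n-wps-server-3.3.2*.jar $LIB/52n-wps-algorithm-3.3.2*.jar
mvn dependency:copy -Dartifact=rapidminer-custom:52n-wps-server-d4science:3.3.2 -DoutputDirectory=$LIB
mvn dependency:copy -Dartifact=rapidminer-custom:52n-wps-algorithm-d4science:3.3.2 -DoutputDirectory=$LIB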
4 - Add the following Maven library, along with its dependencies, to the wps/WEB-INF/lib/ folder of the WPS application:
(Repository: "gCube Snapshots") <dependency> <groupId>org.gcube.dataanalysis</groupId> <artifactId>dataminer</artifactId> <version>[1.0.0-SNAPSHOT,2.0.0-SNAPSHOT)</version> </dependency>
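One possible way to resolve the dataminer JAR together with its dependencies is a throwaway POM plus dependency:copy-dependencies (a sketch; it assumes Maven is installed and the gCube Snapshots repository is configured in your settings.xml):

cat > /tmp/dataminer-fetch.xml <<'EOF'
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>tmp</groupId>
  <artifactId>dataminer-fetch</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.gcube.dataanalysis</groupId>
      <artifactId>dataminer</artifactId>
      <version>[1.0.0-SNAPSHOT,2.0.0-SNAPSHOT)</version>
    </dependency>
  </dependencies>
</project>
EOF
mvn -f /tmp/dataminer-fetch.xml dependency:copy-dependencies -DoutputDirectory=/opt/tomcat/webapps/wps/WEB-INF/lib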
5 - Create a folder named "persistence" under wps/
6 - Create a folder named "ecocfg" under wps/
NOTE: If vessel data analysis is also required, download the GEBCO bathymetric grid file into the "ecocfg" folder and rename it to "gebco_08.nc".
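For reference, steps 5 and 6 reduce to the following commands (with /opt/tomcat as a placeholder for your Tomcat installation directory):

mkdir -p /opt/tomcat/webapps/wps/persistence
mkdir -p /opt/tomcat/webapps/wps/ecocfg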
7 - Copy all the files available at this SVN link into the ecocfg folder:
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg
Also copy the algorithms JAR file, which contains the algorithm configurations, to the WEB-INF/lib/ folder:
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dataminer-algorithms.jar
8 - Copy the PARALLEL_PROCESSING folder from this SVN link into the ecocfg folder (thus creating the PARALLEL_PROCESSING folder under ecocfg):
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/PARALLEL_PROCESSING
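Steps 7 and 8 can be scripted with an svn client and wget, for example (a sketch; /opt/tomcat is a placeholder):

WPS=/opt/tomcat/webapps/wps
# step 7: configuration files into ecocfg, algorithms jar into WEB-INF/lib
svn export --force https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg $WPS/ecocfg
wget -P $WPS/WEB-INF/lib https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dataminer-algorithms.jar
# step 8: PARALLEL_PROCESSING folder under ecocfg
svn export https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/PARALLEL_PROCESSING $WPS/ecocfg/PARALLEL_PROCESSING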
9 - Copy the following XML file into the wps/config folder:
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/wpscfg/wps_config.xml
10 - In the following tag of that XML file, replace the hostname and the port with the machine's hostname and the Tomcat port (80):
<Server protocol="http" hostname="localhost" hostport="8080" includeDataInputsInResponse="false" computationTimeoutMilliSeconds="3600000" cacheCapabilites="false" webappPath="wps" repoReloadInterval="0.0" minPoolSize="10" maxPoolSize="20" keepAliveSeconds="1000" maxQueuedTasks="100">
11 - A reference example of a configured and working WPS application can be found at this link:
http://goo.gl/rtbHpW
12 - Testing
Test 1: algorithm descriptions (tests the basic availability of the service):
http://<hostname>:<port>/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL
Test 2: algorithm execution (tests a complete algorithm execution):
http://<hostname>:<port>/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL&DataInputs=Matcher_1=LEVENSHTEIN;Matcher_4=NONE;Matcher_5=NONE;Matcher_2=NONE;Matcher_3=NONE;Threshold_1=0.6;Threshold_2=0.6;Accuracy_vs_Speed=MAX_ACCURACY;MaxResults_2=10;MaxResults_1=10;Threshold_3=0.4;Taxa_Authority_File=FISHBASE;Parser_Name=SIMPLE;MaxResults_4=0;Threshold_4=0;MaxResults_3=0;MaxResults_5=0;Threshold_5=0;Use_Stemmed_Genus_and_Species=false;Activate_Preparsing_Processing=true;SpeciesAuthorName=Gadus morhua
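Both tests can be run from the shell with curl; quote the URL so the shell does not interpret the & separators (hostname, port and token are placeholders):

HOST=myhost.example.org; PORT=80; TOKEN=your-gcube-token
# Test 1: DescribeProcess
curl -s "http://$HOST:$PORT/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=$TOKEN&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL"
# Test 2: run the full Execute URL shown above, quoted in the same way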