DataMiner Installation

Introduction

DataMiner is an e-Infrastructure service providing state-of-the-art data mining algorithms and ecological modelling approaches under the Web Processing Service (WPS) standard.

In this guide, we show how administrators and site managers can install DataMiner on top of a SmartGears service installation.

Prerequisites

See the SmartGears Web Hosting Node (wHN) Prerequisites

In order to manage a request load of 20,000 computations per month with a maximum allowed concurrency of 10 requests, we recommend the following machine configuration:

  • Ubuntu 12.04.5 LTS
  • 6 GB of RAM
  • 10 virtual CPUs, e.g. Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
  • 10 GB of HD space

Prerequisite Software

Some algorithms rely on additional software that must be present on the server machine.

At this link you can download a set of packages to be installed offline (see the instructions below), plus a test script that checks whether most of the packages work.

An installation of the R interpreter is required on the machine; the required R version is 2.15.3.

In the following, we report the non-R software, the R packages, and the command sequences needed to install the offline packages.

The command sequences assume that the local installation packages have been downloaded to the /root/download/ folder.


  • Required software:
GDAL (built from source):
 wget http://download.osgeo.org/gdal/1.11.0/gdal1110.zip
 unzip gdal1110.zip -d gdal1110
 cd gdal1110/gdal-1.11.0   # enter the unzipped source folder
 ./configure
 make && make install
 (the option --with-gdal-config=/usr/local/bin/gdal-config can then be passed when building GDAL-dependent R packages such as rgdal)
JAGS:
 http://mcmc-jags.sourceforge.net/
  • R Packages to install from CRAN
 Installation commands (start the R interpreter, then install each package):
 R
 >install.packages("<package name>")
 Packages (duplicates removed; several entries, e.g. base, stats, utils, are base or recommended packages that ship with R and normally need no reinstallation; a consolidated batch-install sketch is given after the Test script item below):
 data.table
 doBy
 multcomp
 mvtnorm
 survival
 splines
 TH.data
 MASS
 Matrix
 lattice
 abind
 bayesmix
 coda
 R2jags
 R2WinBUGS
 rjags
 runjags
 maptools
 sp
 base
 boot
 class
 cluster
 codetools
 compiler
 datasets
 foreign
 graphics
 grDevices
 grid
 KernSmooth
 methods
 mgcv
 nlme
 nnet
 parallel
 rpart
 spatial
 stats
 stats4
 tcltk
 tools
 utils
  • Packages for ICES processes - Install sequence
 install.packages("/root/download/Rcpp_0.9.10.tar.gz", repos = NULL, type="source")
 install.packages("/root/download/plyr_1.8.tar.gz", repos = NULL, type="source")
 install.packages("stringr")
 install.packages("/root/download/reshape2_1.2.tar.gz", repos = NULL, type="source")
 install.packages("/root/download/data.table_1.9.2.tar.gz", repos = NULL, type="source")
 install.packages("R2HTML")
 install.packages("multcomp")
 install.packages("Matrix")
 install.packages("lattice")
 install.packages("snow")
 install.packages("/root/download/RcppEigen_0.3.2.0.tar.gz", repos = NULL, type="source")
 install.packages("minqa")
 install.packages("/root/download/lme4_1.0-5.tar.gz", repos = NULL, type="source")
 install.packages("/root/download/doBy_4.5-3.tar.gz", repos = NULL, type="source")
 install.packages("mvtnorm")
 install.packages("survival")
 install.packages("data.table")
 help(package=splines)
 help(package=TH.data)
 help(package=MASS)
  • Packages for FAO processes - Install sequence (R commands, except the apt-get lines, which must be run as root in a system shell)
 install.packages("RCurl")
 install.packages("digest")
 install.packages("/root/download/httr_0.2.tar.gz", repos = NULL, type="source")
 install.packages("memoise")
 install.packages("whisker")
 install.packages("evaluate")
 install.packages("/root/download/devtools_1.4.1.tar.gz", repos = NULL, type="source")
 require(devtools)
 install_github("rsdmx", "opensdmx")
 # in a system shell, as root:
 apt-get update
 apt-get install libgdal1-dev
 apt-get install libgeos-dev
 apt-get install libspatialite-dev
 # back in R:
 install.packages("rgdal")
 install.packages("rgeos")
 require(devtools)
 install_github("RFigisGeo", "openfigis")
  • Test script (run from within R):
 source("/root/download/interpolateTacsat.r")

Installation of DataMiner

Steps required to build a fully working development-environment DataMiner installation from scratch:

1 - Install a SmartGears-enabled Tomcat service, preferably on port 80 or with a redirect to port 80. Use devsec as the starting scope.

2 - Download the 52°North WPS WAR application from the following link and put it under the Tomcat webapps folder:


 http://data.d4science.org/uri-resolver/id?fileName=wps-3.3.2.war&smp-id=565d67b7e4b0eacf4a0fc5ad&contentType=application%2Fx-tika-java-web-archive

Download the following XML files and copy them into the web application's WEB-INF folder, to enable the application on a gCube container:

 http://data.d4science.org/id?fileName=web.xml&smp-id=56615ae0e4b0158fcb561817&contentType=application%2Fxml
 http://data.d4science.org/id?fileName=gcube-app.xml&smp-id=56615ae0e4b0158fcb561815&contentType=application%2Fxml
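
A shell sketch of this step, assuming Tomcat is installed under /opt/tomcat (adjust the paths to your installation):

 # download the WPS WAR into Tomcat's webapps folder
 cd /opt/tomcat/webapps
 wget -O wps.war "http://data.d4science.org/uri-resolver/id?fileName=wps-3.3.2.war&smp-id=565d67b7e4b0eacf4a0fc5ad&contentType=application%2Fx-tika-java-web-archive"
 # expand the WAR (Tomcat would also do this on startup) and add the gCube descriptors
 unzip -o wps.war -d wps
 cd wps/WEB-INF
 wget -O web.xml "http://data.d4science.org/id?fileName=web.xml&smp-id=56615ae0e4b0158fcb561817&contentType=application%2Fxml"
 wget -O gcube-app.xml "http://data.d4science.org/id?fileName=gcube-app.xml&smp-id=56615ae0e4b0158fcb561815&contentType=application%2Fxml"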


3 - Replace the 52n-wps-server-3.3.2-X.jar and 52n-wps-algorithm-3.3.2-X.jar files with the corresponding JARs from our Maven gcube-externals repository:


 (Repository: "gCube Externals")
 <dependency>
   <groupId>rapidminer-custom</groupId>
   <artifactId>52n-wps-server-d4science</artifactId>
   <version>3.3.2</version>
 </dependency>

(Otherwise available here)

 <dependency>
   <groupId>rapidminer-custom</groupId>
   <artifactId>52n-wps-algorithm-d4science</artifactId>
   <version>3.3.2</version>
 </dependency>

(Otherwise available here)
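
A shell sketch of the replacement, assuming the webapp is deployed under /opt/tomcat/webapps/wps and that the two d4science JARs have already been downloaded to /root/download/ (the exact file names are illustrative):

 cd /opt/tomcat/webapps/wps/WEB-INF/lib
 # remove the stock 52North JARs...
 rm -f 52n-wps-server-3.3.2-*.jar 52n-wps-algorithm-3.3.2-*.jar
 # ...and drop in the d4science replacements
 cp /root/download/52n-wps-server-d4science-3.3.2.jar .
 cp /root/download/52n-wps-algorithm-d4science-3.3.2.jar .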

4 - add the following Maven library, along with its dependencies, to the wps/WEB-INF/lib/ folder of the wps application:


 (Repository: "gCube Snapshots")
 <dependency>
   <groupId>org.gcube.dataanalysis</groupId>
   <artifactId>dataminer</artifactId>
   <version>[1.0.0-SNAPSHOT,2.0.0-SNAPSHOT)</version>
 </dependency>
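
One way to collect this library together with its transitive dependencies is a throwaway Maven project: declare the dependency above (plus the "gCube Snapshots" repository) in a stub pom.xml, then let the standard dependency plugin copy everything into the webapp. A sketch, assuming the webapp path used in the previous steps:

 # from the folder containing the stub pom.xml: copies the dataminer
 # artifact and its dependency tree into the webapp's library folder
 mvn dependency:copy-dependencies -DoutputDirectory=/opt/tomcat/webapps/wps/WEB-INF/lib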


5 - create a folder named "persistence" under wps/

6 - create a folder named "ecocfg" under wps/

NOTE: if vessel data analysis is also required, download the gebco file into the "ecocfg" folder and rename it to "gebco_08.nc"

7 - copy all the files available at this SVN link into the ecocfg folder:


 https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg

copy the algorithms JAR file, which contains the algorithms referenced by the configuration files, to the WEB-INF/lib/ folder:

 https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dataminer-algorithms.jar

Optional: copy the algorithm JAR files (containing additional algorithms) available in the following folder to the WEB-INF/lib/ folder:

 https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/

8 - copy the PARALLEL_PROCESSING folder at this SVN link into the ecocfg folder (thus creating the PARALLEL_PROCESSING folder under ecocfg):


 https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/PARALLEL_PROCESSING
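
The following shell sketch consolidates steps 5-8, assuming the webapp is deployed under /opt/tomcat/webapps/wps and the svn command-line client is available:

 cd /opt/tomcat/webapps/wps
 # steps 5 and 6: create the runtime folders
 mkdir persistence ecocfg
 # step 7: fetch the configuration files and the algorithms JAR
 svn export --force https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg ecocfg
 svn export https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dataminer-algorithms.jar WEB-INF/lib/dataminer-algorithms.jar
 # step 8: fetch the parallel-processing configuration into ecocfg/PARALLEL_PROCESSING
 svn export https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/PARALLEL_PROCESSING ecocfg/PARALLEL_PROCESSING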


9 - copy the following xml file into the wps/config folder:


 https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/wpscfg/wps_config.xml


10 - substitute the hostname and the port inside the following tag of the previous XML file with the actual hostname of the machine and the Tomcat port (80):


 <Server protocol="http"
         hostname="localhost"
         hostport="8080"
         includeDataInputsInResponse="false"
         computationTimeoutMilliSeconds="3600000"
         cacheCapabilites="false"
         webappPath="wps"
         repoReloadInterval="0.0"
         minPoolSize="10"
         maxPoolSize="20"
         keepAliveSeconds="1000"
         maxQueuedTasks="100">
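
The substitution can be scripted; a sed sketch, assuming the machine's public hostname is dataminer.example.org (replace with your own) and the configuration path from step 9:

 sed -i -e 's/hostname="localhost"/hostname="dataminer.example.org"/' \
        -e 's/hostport="8080"/hostport="80"/' \
        /opt/tomcat/webapps/wps/config/wps_config.xml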


11 - a reference example of a configured and working wps application can be found at this link:

 http://goo.gl/rtbHpW

12 - Testing

Test 1: algorithm descriptions (tests the basic availability of the service):


 http://<hostname>:<port>/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL


Test 2: algorithm execution (tests a complete algorithm execution):


 http://<hostname>:<port>/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL&DataInputs=Matcher_1=LEVENSHTEIN;Matcher_4=NONE;Matcher_5=NONE;Matcher_2=NONE;Matcher_3=NONE;Threshold_1=0.6;Threshold_2=0.6;Accuracy_vs_Speed=MAX_ACCURACY;MaxResults_2=10;MaxResults_1=10;Threshold_3=0.4;Taxa_Authority_File=FISHBASE;Parser_Name=SIMPLE;MaxResults_4=0;Threshold_4=0;MaxResults_3=0;MaxResults_5=0;Threshold_5=0;Use_Stemmed_Genus_and_Species=false;Activate_Preparsing_Processing=true;SpeciesAuthorName=Gadus morhua
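
Both tests can also be run from the command line; a curl sketch for Test 1 (substitute your hostname, port, and gCube token). A working service answers with a WPS ProcessDescriptions XML document.

 curl "http://<hostname>:<port>/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL"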

Related Links

Related Experiments