DataMiner Installation
Introduction
DataMiner is an e-Infrastructure service providing state-of-the-art data mining algorithms and ecological modelling approaches under the Web Processing Service (WPS) standard.
In this guide, we show how administrators and site managers can install DataMiner on top of a SmartGears service installation.
Prerequisites
See the SmartGears Web Hosting Node (wHN) Prerequisites
To manage a request load of 20,000 computations per month with a maximum allowed concurrency of 10 requests, we recommend the following machine configuration:
- Ubuntu 12.04.5 LTS
- 6 GB of RAM
- 10 virtual CPUs, e.g. Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
- 10 GB of HD space
Prerequisite Software
Some algorithms rely on additional software that must be present on the server machine.
At this link you can download a set of packages for offline installation (see the instructions below), plus a test script that checks whether many of the packages work.
An installation of the R interpreter is required on the machine; the required R version is 2.15.3.
In the following, we report the non-R software, the R packages, and the command sequences needed to install the offline packages.
The commands below assume that the local installation packages have been placed in the /root/download/ folder.
- Required software:
GDAL:
wget http://download.osgeo.org/gdal/1.11.0/gdal1110.zip
unzip gdal1110.zip -d gdal1110
# assumption: the archive unpacks into a gdal-1.11.0 source folder
cd gdal1110/gdal-1.11.0
./configure --with-gdal-config=/usr/local/bin/gdal-config
make && make install
JAGS: http://mcmc-jags.sourceforge.net/
- R Packages to install from CRAN
Installation command (from the R console):
install.packages("<package name>")
Packages: data.table, doBy, multcomp, mvtnorm, TH.data, abind, bayesmix, coda, R2jags, R2WinBUGS, rjags, runjags, maptools, sp. The following packages are also used but are distributed with R itself and need no separate installation: base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, utils.
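As a convenience, the CRAN packages above can also be installed in one pass from the shell. The following is a minimal sketch, assuming internet access and a reachable CRAN mirror:

Rscript -e 'install.packages(c("data.table", "doBy", "multcomp", "mvtnorm",
  "TH.data", "abind", "bayesmix", "coda", "R2jags", "R2WinBUGS", "rjags",
  "runjags", "maptools", "sp"), repos = "http://cran.r-project.org")'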
- Packages for ICES processes - Install sequence
install.packages("/root/download/Rcpp_0.9.10.tar.gz", repos = NULL, type="source") install.packages("/root/download/plyr_1.8.tar.gz", repos = NULL, type="source") install.packages("stringr") install.packages("/root/download/reshape2_1.2.tar.gz", repos = NULL, type="source") install.packages("/root/download/data.table_1.9.2.tar.gz", repos = NULL, type="source") install.packages("R2HTML") install.packages("multcomp") install.packages("Matrix") install.packages("lattice") install.packages("snow") install.packages("/root/download/RcppEigen_0.3.2.0.tar.gz", repos = NULL, type="source") install.packages("minqa") install.packages("/root/download/lme4_1.0-5.tar.gz", repos = NULL, type="source") install.packages("/root/download/doBy_4.5-3.tar.gz", repos = NULL, type="source") install.packages("mvtnorm") install.packages("survival") install.packages("data.table") help(package=splines) help(package=TH.data) help(package=MASS)
- Packages for FAO processes - Install sequence
install.packages("RCurl") install.packages("digest") install.packages("/root/download/httr_0.2.tar.gz", repos = NULL, type="source") install.packages("memoise") install.packages("whisker") install.packages("evaluate") install.packages("/root/download/devtools_1.4.1.tar.gz", repos = NULL, type="source") require(devtools) install_github("rsdmx", "opensdmx") apt-get update apt-get install libgdal1-dev apt-get install libgeos-dev apt-get install libspatialite-dev install.packages("rgdal") install.packages("rgeos") require(devtools) install_github("RFigisGeo", "openfigis")
- Test script (to be run from the R console):
source("/root/download/interpolateTacsat.r")
Installation of DataMiner
Steps required to build a fully working development-environment DataMiner installation from scratch:
1 - Install a SmartGears-enabled Tomcat service, possibly on port 80 or with a redirect to port 80. Use devsec as the starting scope.
2 - Download the 52°North WPS WAR application from the following link and put it under webapps:
http://data.d4science.org/uri-resolver/id?fileName=wps-3.3.2.war&smp-id=565d67b7e4b0eacf4a0fc5ad&contentType=application%2Fx-tika-java-web-archive
Download the following XML files and copy them into the web application's WEB-INF folder, so that the application is enabled on a gCube container:
http://data.d4science.org/id?fileName=web.xml&smp-id=56615ae0e4b0158fcb561817&contentType=application%2Fxml
http://data.d4science.org/id?fileName=gcube-app.xml&smp-id=56615ae0e4b0158fcb561815&contentType=application%2Fxml
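As an illustration, the downloads in this step can be scripted as follows (a sketch; /opt/tomcat is a placeholder for your Tomcat installation directory, and Tomcat must have unpacked the WAR before the two XML files are copied):

TOMCAT=/opt/tomcat   # placeholder: adjust to your installation
wget -O $TOMCAT/webapps/wps.war "http://data.d4science.org/uri-resolver/id?fileName=wps-3.3.2.war&smp-id=565d67b7e4b0eacf4a0fc5ad&contentType=application%2Fx-tika-java-web-archive"
# once Tomcat has unpacked the WAR, overwrite the two descriptors:
wget -O $TOMCAT/webapps/wps/WEB-INF/web.xml "http://data.d4science.org/id?fileName=web.xml&smp-id=56615ae0e4b0158fcb561817&contentType=application%2Fxml"
wget -O $TOMCAT/webapps/wps/WEB-INF/gcube-app.xml "http://data.d4science.org/id?fileName=gcube-app.xml&smp-id=56615ae0e4b0158fcb561815&contentType=application%2Fxml"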
3 - Replace 52n-wps-server-3.3.2-X.jar and 52n-wps-algorithm-3.3.2-X.jar with the corresponding JARs from our Maven gcube-externals repository:
(Repository: "gCube Externals") <dependency> <groupId>rapidminer-custom</groupId> <artifactId>52n-wps-server-d4science</artifactId> <version>3.3.2</version> </dependency>
(Otherwise available here)
<dependency>
  <groupId>rapidminer-custom</groupId>
  <artifactId>52n-wps-algorithm-d4science</artifactId>
  <version>3.3.2</version>
</dependency>
(Otherwise available here)
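One way to fetch the two replacement JARs is the Maven dependency plugin. The following is a sketch; it assumes Maven is installed, the gCube Externals repository is configured in your settings.xml, and /opt/tomcat is a placeholder:

LIB=/opt/tomcat/webapps/wps/WEB-INF/lib
# remove the stock 52North jars before copying in the D4Science builds
rm $LIB/52n-wps-server-3.3.2*.jar $LIB/52n-wps-algorithm-3.3.2*.jar
mvn dependency:copy -Dartifact=rapidminer-custom:52n-wps-server-d4science:3.3.2 -DoutputDirectory=$LIB
mvn dependency:copy -Dartifact=rapidminer-custom:52n-wps-algorithm-d4science:3.3.2 -DoutputDirectory=$LIB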
4 - Add the following Maven library, along with its dependencies, to the wps/WEB-INF/lib/ folder of the WPS application:
(Repository: "gCube Snapshots") <dependency> <groupId>org.gcube.dataanalysis</groupId> <artifactId>dataminer</artifactId> <version>[1.0.0-SNAPSHOT,2.0.0-SNAPSHOT)</version> </dependency>
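One possible way to resolve the dataminer JAR together with its dependencies is a throwaway POM plus dependency:copy-dependencies (a sketch; it assumes Maven is installed and the gCube Snapshots repository is configured in your settings.xml):

cat > /tmp/dataminer-fetch.xml <<'EOF'
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>tmp</groupId>
  <artifactId>dataminer-fetch</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.gcube.dataanalysis</groupId>
      <artifactId>dataminer</artifactId>
      <version>[1.0.0-SNAPSHOT,2.0.0-SNAPSHOT)</version>
    </dependency>
  </dependencies>
</project>
EOF
mvn -f /tmp/dataminer-fetch.xml dependency:copy-dependencies -DoutputDirectory=/opt/tomcat/webapps/wps/WEB-INF/lib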
5 - Create a folder named "persistence" under wps/
6 - Create a folder named "ecocfg" under wps/
NOTE: If vessel data analysis is also required, download the GEBCO bathymetric grid file into the "ecocfg" folder and rename it to "gebco_08.nc".
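For reference, steps 5 and 6 reduce to the following commands (with /opt/tomcat as a placeholder for your Tomcat installation directory):

mkdir -p /opt/tomcat/webapps/wps/persistence
mkdir -p /opt/tomcat/webapps/wps/ecocfg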
7 - Copy all the files available at this SVN link into the ecocfg folder:
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg
Also copy the algorithms JAR file, which contains the algorithm configurations, to the WEB-INF/lib/ folder:
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dataminer-algorithms.jar
8 - Copy the PARALLEL_PROCESSING folder from this SVN link into the ecocfg folder (thus creating the PARALLEL_PROCESSING folder under ecocfg):
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/PARALLEL_PROCESSING
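Steps 7 and 8 can be scripted with an svn client and wget, for example (a sketch; /opt/tomcat is a placeholder):

WPS=/opt/tomcat/webapps/wps
# step 7: configuration files into ecocfg, algorithms jar into WEB-INF/lib
svn export --force https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg $WPS/ecocfg
wget -P $WPS/WEB-INF/lib https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dataminer-algorithms.jar
# step 8: PARALLEL_PROCESSING folder under ecocfg
svn export https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/PARALLEL_PROCESSING $WPS/ecocfg/PARALLEL_PROCESSING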
9 - Copy the following XML file into the wps/config folder:
https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/wpscfg/wps_config.xml
10 - In the following tag of that XML file, replace the hostname and the port with the machine's hostname and the Tomcat port (80):
<Server protocol="http" hostname="localhost" hostport="8080" includeDataInputsInResponse="false" computationTimeoutMilliSeconds="3600000" cacheCapabilites="false" webappPath="wps" repoReloadInterval="0.0" minPoolSize="10" maxPoolSize="20" keepAliveSeconds="1000" maxQueuedTasks="100">
11 - A reference example of a configured and working WPS application can be found at this link:
http://goo.gl/rtbHpW
12 - Testing
Test 1: algorithm descriptions (tests the basic availability of the service):
http://<hostname>:<port>/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL
Test 2: algorithm execution (tests a complete algorithm execution):
http://<hostname>:<port>/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=<token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL&DataInputs=Matcher_1=LEVENSHTEIN;Matcher_4=NONE;Matcher_5=NONE;Matcher_2=NONE;Matcher_3=NONE;Threshold_1=0.6;Threshold_2=0.6;Accuracy_vs_Speed=MAX_ACCURACY;MaxResults_2=10;MaxResults_1=10;Threshold_3=0.4;Taxa_Authority_File=FISHBASE;Parser_Name=SIMPLE;MaxResults_4=0;Threshold_4=0;MaxResults_3=0;MaxResults_5=0;Threshold_5=0;Use_Stemmed_Genus_and_Species=false;Activate_Preparsing_Processing=true;SpeciesAuthorName=Gadus morhua
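Both tests can be run from the shell with curl; quote the URL so the shell does not interpret the & separators (hostname, port and token are placeholders):

HOST=myhost.example.org; PORT=80; TOKEN=your-gcube-token
# Test 1: DescribeProcess
curl -s "http://$HOST:$PORT/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=$TOKEN&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL"
# Test 2: run the full Execute URL shown above, quoted in the same way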