'''Data Mining Facilities'''

[[Category:gCube Features]]
== Overview ==<br />
Data Mining facilities include a set of features, services and methods for performing data processing and mining on information sets. These features address several aspects of data processing, ranging from modelling to clustering, from the identification of anomalies to the detection of hidden series. This set of services and libraries is used by the D4Science e-Infrastructure to manage data mining problems also from a computational-complexity point of view. Algorithms are executed in a parallel and possibly distributed fashion, using D4Science nodes as worker nodes. Furthermore, the services performing Data Mining operations are deployed according to a distributed architecture, in order to balance the load of the procedures requiring local resources.
<br />
By means of the above features, Data Mining aims to manage problems such as (i) predicting the impact of climate change on biodiversity, (ii) preventing the spread of invasive species, (iii) identifying geographical and ecological aspects of disease transmission, (iv) conservation planning, and (v) predicting suitable habitats for marine species. By using the computational facilities of the D4Science e-Infrastructure, algorithms can run in a cost-effective way, letting scientists perform more experiments and combine different techniques.
<br />
== Key Features ==<br />
<br />
The components of this subsystem provide the following key features:
<br />
;parallel processing<br />
:parallelization of statistical algorithms using a map-reduce approach<br />
:a cloud computing approach delivered seamlessly to the users
<br />
;pre-cooked state-of-the-art data mining algorithms<br />
:algorithms oriented to biology-related problems, supplied as-a-service
:general purpose algorithms (e.g. Clustering, Principal Component Analysis, Artificial Neural Networks) supplied as-a-service<br />
<br />
;data trends generation and analysis<br />
:extraction of trends for biodiversity data<br />
:inspection of time series of observations on biological species<br />
:basic signal processing techniques to explore periodicities in trends<br />
<br />
;ecological niche modelling<br />
:algorithms to perform ecological niche modelling using either mechanistic or correlative approaches<br />
:species distribution maps generation<br />
<br />
== Specifications ==<br />
<br />
;[[DataMiner_Manager | DataMiner]]<br />
: a service for managing statistical data and multi-user computation requests
;[[DataMiner_Algorithms | DataMiner Algorithms]]<br />
: the complete list of algorithms supported by the [[DataMiner_Manager | DataMiner]]<br />
;[[How-to_Implement_Algorithms_for_DataMiner | How-to Implement Algorithms for DataMiner]]<br />
: How to implement algorithms for DataMiner<br />
;[[Statistical_Algorithms_Importer | Statistical Algorithms Importer]]<br />
: a tool to import processes into DataMiner
;[[DataMiner_Installation | DataMiner Installation]]<br />
: Installation guide for DataMiner<br />
;[[How_to_Interact_with_the_DataMiner_by_client | How to Interact with the DataMiner by client]]
: Interacting with DataMiner from a thin client (a minimal sketch is shown after this list)
;[[Ecological Modeling]]<br />
: a set of methods for performing Data Mining operations, including the categorization of experiments and techniques
;[[Signal Processing]]<br />
: a set of methods to perform digital signal processing.<br />
;[[Statistical_Manager | Statistical Manager]]<br />
: the previous gCube service for cloud computing, superseded by DataMiner
; [[How to use the DataMiner Pool Manager| DataMiner Pool Manager]]
: an automatic installation system for algorithms
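
As a minimal client-side sketch: DataMiner computations can be invoked programmatically and, assuming a standard OGC WPS endpoint (consistent with the WPS4R annotations used by the Statistical Algorithms Importer), a thin client in R could look as follows. The endpoint URL, the algorithm identifier and the token are illustrative placeholders:

<source lang="r">
# Minimal sketch of a thin-client call to DataMiner through a WPS interface.
# Endpoint, identifier and token are placeholders to replace with VRE values.
library(httr)

wps_endpoint <- "http://dataminer.d4science.org/wps/WebProcessingService"  # hypothetical host
vre_token    <- "YOUR-VRE-TOKEN"

# Ask the service to describe one of its algorithms
resp <- GET(wps_endpoint, query = list(
  service       = "WPS",
  version       = "1.0.0",
  request       = "DescribeProcess",
  identifier    = "org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.DBSCAN",  # illustrative
  `gcube-token` = vre_token))

cat(content(resp, as = "text"))  # prints the XML description of the process
</source>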
<br />
== Current usage statistics ==<br />
Data extracted using the [https://wiki.gcube-system.org/gcube/Accounting_Portlet gCube accounting system].<br />
<br />
;Overall number of process requests per month: ~64,300
<br />
;Nr. of Algorithms integrated: ~600<br />
<br />
;Overall number of users: ~3000<br />
<br />
;Availability: 99.7%<br />
<br />
<br />
'''Last Update:''' June 2023

<hr />
'''Workspace Interaction From R'''

= Overview =
<br />
This page reports examples of how to interact with the online Workspace (WS) from R.
<br />
= Key features =<br />
<br />
* interaction with R<br />
* saving files and folders on the WS<br />
* downloading files and folders from the WS<br />
<br />
= Functions = <br />
<br />
* INITIAL STEP: Import the [http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/workspace_interaction.r D4Science interaction functions R container script]. '''This step is not necessary when using the RStudio instance on one of the Web portals.'''<br />
<br />
<source lang="java"><br />
#REMOTE IMPORT OF ALL FUNCTIONS - OPTIONAL<br />
source("http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/workspace_interaction.r")<br />
<br />
#SETTING USERNAME AND TOKEN - NOT NEEDED WHEN USING RSTUDIO ON THE PORTAL<br />
username<<-"gianpaolo.coro"<br />
token<<-"..." #YOUR TOKEN FOR A VRE<br />
<br />
#LISTING<br />
a<-listHomeWS() #GET THE LIST OF FOLDERS IN THE WS ROOT<br />
b<-listWS("/Home/gianpaolo.coro/Workspace/TestSAI/") #GET THE LIST OF FILES AND FOLDERS IN ONE SUB-FOLDER<br />
<br />
<br />
#CREATING A NEW FOLDER<br />
pathonthews<-"/Home/gianpaolo.coro/Workspace"<br />
folderName = "MyNewFolder"<br />
newFolderID = createPrivateFolder(path=pathonthews,foldername=folderName)<br />
checkf<-listWS("/Home/gianpaolo.coro/Workspace/MyNewFolder/")<br />
<br />
#CREATING A NEW FOLDER WITH A DESCRIPTION<br />
pathonthews<-"/Home/gianpaolo.coro/Workspace/MyNewFolder/"<br />
folderName = "MyNextFolder"<br />
description = "MyNextFolder description"<br />
newFolderID = createPrivateFolderWithDescription(path=pathonthews,foldername=folderName,description = description)<br />
<br />
#DOWNLOADING<br />
remoteFile<-"/Home/gianpaolo.coro/Workspace/DataMiner/sample.xml" #REMOTE FILE TO DOWNLOAD<br />
downloadFileWS(remoteFile) #DOWNLOAD THE FILE LOCALLY<br />
<br />
folder<-"/Home/gianpaolo.coro/Workspace/TestSAI" #REMOTE FOLDER TO DOWNLOAD<br />
downloadFolderWS(folder) #DOWNLOAD THE FOLDER CONTENT LOCALLY<br />
<br />
#UPLOADING<br />
wsfolder<-"/Home/gianpaolo.coro/Workspace/TestUploads" #REMOTE DESTINATION FOLDER<br />
file="userconfig.csv" #LOCAL FILE TO UPLOAD<br />
overwrite<-T #CHOOSE IF THE FILE SHOULD BE OVERWRITTEN<br />
q<-uploadWS(wsfolder,file,overwrite) #UPLOAD THE FILE TO THE WS<br />
<br />
#UPLOADING THE COMPLETE LOCAL R WORKSPACE ONTO THE E-INFRA WS<br />
uploadAllWS(wsfolder)<br />
<br />
<br />
#OBTAINING A PUBLIC URL FOR A FILE (FOR NON-VRE FOLDERS)<br />
remotefile<-"/Home/gianpaolo.coro/Workspace/splist.txt"<br />
publicURL<-getPublicFileLinkWS(remotefile)<br />
<br />
#UPLOAD TO THE ROOT OF THE VRE FOLDER CORRESPONDING TO THE USER TOKEN<br />
outcome<-uploadToVREFolder("",'sampletext2.txt',T,F)<br />
#UPLOAD TO A SUBFOLDER OF THE VRE FOLDER<br />
outcome<-uploadToVREFolder("Samples/",'sampletext2.txt',T,F)<br />
#UPLOAD TO A SUBSUBFOLDER OF THE VRE FOLDER<br />
outcome<-uploadToVREFolder("Samples/TextSamples/",'sampletext2.txt',T,F)<br />
<br />
#GET FILE LINK FROM THE ROOT VRE FOLDER<br />
link<-getPublicFileLinkVREFolder('sampletext2.txt')<br />
<br />
#GET FILE LINK FROM A FILE IN A SUBFOLDER OF THE VRE FOLDER<br />
link<-getPublicFileLinkVREFolder('Samples/sampletext2.txt')<br />
<br />
#DOWNLOAD FILE FROM THE ROOT VRE FOLDER<br />
downloadFromVREFolder('sampletext2.txt')<br />
<br />
#DOWNLOAD FILE FROM A SUBFOLDER OF THE ROOT VRE FOLDER<br />
downloadFromVREFolder('Samples/sampletext2.txt')<br />
<br />
</source>

<hr />
'''Statistical Algorithms Importer: R Project'''

{| align="right"
||__TOC__<br />
|}<br />
<br />
:This page explains how to create an R project using the [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]] portlet.
<br />
[[Image:StatisticalAlgorithmsImporter_RBase1.png|thumb|center|250px|R Project, SAI]]<br />
<br />
==Project Folder==<br />
:After selecting an empty folder on the e-Infrastructure Workspace, the system creates an empty project in that folder.
<br />
[[Image:StatisticalAlgorithmsImporter_CreateProject.png|thumb|center|800px|Create Project, SAI]]<br />
<br />
==Import Resources==<br />
:Any resource needed to run the script can be imported into the Project Folder. Resources can be added either via the Workspace, by using the Add Resource button in the main menu, or by dragging and dropping files into the folder window.
[[Image:StatisticalAlgorithmsImporter_AddResource.png|thumb|center|800px|Add Resource, SAI]]<br />
<br />
:Thus, if the resources are on the user's local file system, they can be added through the Drag and Drop facility, which also works with multiple files.
<br />
[[Image:StatisticalAlgorithmsImporter_ProjectExplorerDND.png|thumb|center|800px|Adding resources with Drag and Drop, SAI]]<br />
<br />
==Import Resources From GitHub==<br />
:If you have a project on GitHub, you can import it into SAI. After creating a new project, just click the GitHub button in the menu.
<br />
[[Image:StatisticalAlgorithmsImporter_GitHubMenu.png|thumb|center|700px|GitHub on Menu, SAI]]<br />
<br />
:This opens the GitHub Connector wizard; please read [[GitHub Connector|GitHub Connector]] to see how to use it.
<br />
==Set Main Code==<br />
:After adding the scripts and resources, one of the script files should be marked as Main code. The e-Infrastructure will run this code, which is supposed to import and orchestrate the other scripts (a minimal sketch is shown below). A script can be marked as Main code by clicking the Set Main button in the Project Explorer; the file will then be loaded in the Editor. In this phase the system also reads possible annotations inside the script (e.g. WPS4R annotations). At this point, the user can change the code and save it using the Save button on the Editor panel. Alternatively, the user can write the code directly in the editor (e.g. via copy and paste) and then save it, again with the Save button in the Editor menu (a file name will be requested).
<br />
[[Image:StatisticalAlgorithmsImporter_MainCodeFull.png|thumb|center|800px|Set Main Code facility, SAI]]<br />
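
:A minimal sketch of a Main code script is given below; the file names and functions are illustrative and not part of SAI:

<source lang="r">
# Sketch of a "Main code" script: it imports the other scripts of the project
# and orchestrates them. File names and functions here are illustrative.
source("preprocess.r")  # defines a hypothetical preprocess() function
source("model.r")       # defines a hypothetical run_model() function

input_file <- "species.csv"       # an input declared to SAI, with its default value
data_table <- preprocess(input_file)
result     <- run_model(data_table)
write.csv(result, "output.csv")   # an output file declared to SAI
</source>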
<br />
==Input==<br />
:In this area the system collects all the information required to create software for the e-Infrastructure and to communicate with the e-Infrastructure team. Metadata, input/output information, global parameters and required packages are collected here.
<br />
===Global Variables===<br />
:In this panel you can add any Global Variables that the script uses as prerequisites.
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_GlobalV.png|thumb|center|800px|Global Variables indication, SAI]]<br />
<br />
===Input/Output===<br />
:In this area, the selected inputs and outputs of the script are collected. In order to add a new I/O, the user should select a row in the code (from the Editor) and then click the +Input (or +Output) button in the Editor menu.
A new row is added to the Input/Output list. The system parses the code behind the scenes and guesses the best type, description and name of the parameter. Once a row has been created in the Input/Output window, the user can change its information by clicking on the row. At least one input is required for compiling the project. '''The name of the input variable and the default value should not be changed unless a parsing error occurred''', because the infrastructure will discover the variables inside the script by using their names and default values.
<br />
'''Note: as a general rule, always set a default value for a variable, otherwise the execution of the algorithm may be compromised. Thus, do not use empty strings as default values.'''<br />
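
:For illustration, a SAI-friendly input declaration is a plain variable assignment with a non-empty default value (variable names and values here are just examples):

<source lang="r">
# Sketch of input declarations as SAI expects them: plain variables with
# non-empty default values. Variable names and values are illustrative.
input_file     <- "species.csv"  # a file input; the default must not be an empty string
resolution     <- 0.5            # a numeric input with a sensible default
occ_percentage <- 0.1            # another numeric input
</source>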
<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_InputOutput.png|thumb|center|800px|Input/Output window, SAI]]<br />
<br />
<br />
===Advanced Input===<br />
It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these data are reported in the [[Advanced Input|Advanced Input]] page.
<br />
===Interpreter Info===<br />
:You can add Version and Packages information in the Interpreter Info panel. The version number is mandatory for the project. Here, for example, a user should specify the version of the R interpreter and the packages needed to run the script. These will be installed on the e-Infrastructure machines during the first deployment session.<br />
<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_InterpreterInfo.png|thumb|center|800px|Interpreter Info, SAI]]<br />
<br />
:A list of pre-installed software on the infrastructure machines is available at this page: [[Pre Installed Packages|Pre Installed Packages]]<br />
<br />
===Project Info===<br />
:A name and a description of the project are mandatory. These will be displayed to the users of the e-Infrastructure and should also contain a proper citation of the algorithm. Special characters are not allowed in the algorithm name. The user can also indicate the category of the algorithm.
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_ProjectInfo.png|thumb|center|800px|Project Info, SAI]]<br />
<br />
==Save Project==<br />
:You can save the project by clicking the Save button in the main menu. A file called stat_algo.project is added to the Project Folder.
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_SaveProject.png|thumb|center|800px|Save Project, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
The following global variables are inherited by all the R scripts running in the e-Infrastructure. They are meant to allow the scripts to contact the infrastructure services (a usage sketch follows the list):
<br />
* '''gcube_username''' (the user who run the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
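As a minimal sketch, a hosted script can read these variables directly; the service URL below is a hypothetical example:

<source lang="r">
# Sketch: using the inherited global variables inside an R script running
# on the e-Infrastructure. The service URL is illustrative.
cat("Running as", gcube_username, "in scope", gcube_context, "\n")

# appending the user's token to a call towards an infrastructure service
service_url <- paste0("https://some-service.d4science.org/api/resource?gcube-token=", gcube_token)
</source>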
<br />
==Using WPS4R Annotations==
:SAI automatically parses R code containing [https://wiki.52north.org/bin/view/Geostatistics/WPS4R WPS4R annotations] and transforms the annotations into Input/Output panel and Project Info panel information. The name of the algorithm is mandatory in the annotations.
<br />
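:In brief, the three annotation types look like this (identifiers, titles and values are illustrative):

<source lang="r">
# Minimal sketch of the 52North WPS4R annotation types parsed by SAI.

# algorithm description:
# wps.des: id = My_Algorithm, title = My_Algorithm, abstract = What the algorithm does;

# an input parameter, bound to the variable declared right after it:
# wps.in: id = input_table, type = text/plain, title = input CSV file, value = "input.csv";
input_table <- "input.csv"

# an output parameter:
# wps.out: id = zipOutput, type = text/zip, title = zip file containing the results;
zipOutput <- "results.zip"
</source>

:We report below a full example of an annotated algorithm and attach the complete algorithm in a zip package: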
<source lang="r">
############################################################################################################################<br />
############# Absence Generation Script - Gianpaolo Coro and Chiara Magliozzi, CNR 2015, Last version 06-07-2015 ###########<br />
############################################################################################################################<br />
#Modified 25-05-2017<br />
<br />
#52North WPS annotations<br />
# wps.des: id = Absence_generation_from_OBIS, title = Absence_generation_from_OBIS, abstract = A script to estimate absence records from OBIS;<br />
<br />
####REST API VERSION#####<br />
rm(list=ls(all=TRUE))<br />
graphics.off() <br />
<br />
## charging the libraries<br />
library(DBI)<br />
library(RPostgreSQL)<br />
library(raster)<br />
library(maptools)<br />
library("sqldf")<br />
library(RJSONIO)<br />
library(httr)<br />
library(data.table)<br />
<br />
# time<br />
t0<-Sys.time()<br />
<br />
## parameters <br />
# wps.in: id = list, type = text/plain, title = list of species beginning with the speciesname header,value="species.txt";<br />
list= "species.txt"<br />
specieslist<-read.table(list,header=T,sep=",") # my short dataset 2 species<br />
#attach(specieslist)<br />
# wps.in: id = res, type = double, title = resolution of the analysis,value=1;<br />
res=1;<br />
extent_x=180<br />
extent_y=90<br />
n=extent_y*2/res;<br />
m=extent_x*2/res;<br />
# wps.in: id = occ_percentage, type = double, title = percentage of observations occurrence of a viable survey,value=0.1;<br />
occ_percentage=0.05 #between 0 and 1<br />
<br />
#uncomment for time filtering<br />
<br />
#No time filter<br />
TimeStart<-"";<br />
TimeEnd<-"";<br />
<br />
TimeStart<-gsub("(^ +)|( +$)", "",TimeStart)<br />
TimeEnd<-gsub("(^ +)|( +$)", "", TimeEnd)<br />
<br />
#AUX function<br />
pos_id<-function(latitude,longitude){<br />
#latitude<-round(latitude, digits = 3)<br />
#longitude<-round(longitude, digits = 3)<br />
latitude<-latitude<br />
longitude<-longitude<br />
code<-paste(latitude,";",longitude,sep="")<br />
return(code)<br />
}<br />
<br />
## opening the connection with postgres<br />
cat("REST API VERSION\n")<br />
cat("PROCESS VERSION 6 \n")<br />
cat("Opening the connection with the catalog\n")<br />
#drv <- dbDriver("PostgreSQL")<br />
#con <- dbConnect(drv, dbname="obis", host="...", port="...", user="...", password="...")<br />
<br />
cat("Analyzing the list of species\n")<br />
counter=0;<br />
overall=length(specieslist$scientificname)<br />
<br />
cat("Extraction from the different contributors the total number of obs per resource id...\n")<br />
<br />
timefilter<-""<br />
if (nchar(TimeStart)>0 && nchar(TimeEnd)>0)<br />
timefilter<-paste(" where datecollected>'",TimeStart,"' and datecollected<'",TimeEnd,"'",sep="");<br />
<br />
queryCache <- paste("select drs.resource_id, count(distinct position_id) as allcount from obis.drs", timefilter, " group by drs.resource_id",sep="")<br />
cat("Resources extraction query:",queryCache,"\n")<br />
<br />
allresfile="allresources.dat"<br />
if (file.exists(allresfile)){<br />
load(allresfile)<br />
} else{<br />
#allresources1<-dbGetQuery(con,queryCache)<br />
######QUERY 0 - REST CALL<br />
cat("Q0:querying for resources\n")<br />
<br />
getJsonQ0<-function(limit,offset){<br />
cat("Q0: offset",offset,"limit",limit,"\n")<br />
resources_query<-paste("http://api.iobis.org/resource?limit=",limit,"&offset=",offset,sep="")<br />
<br />
json_file <- fromJSON(resources_query)<br />
<br />
#res_count<-json_file$count<br />
res_count<-length(json_file$results)<br />
res_count_json<<-json_file$count<br />
cat("Q0:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
allresources1 <- data.frame(resource_id=integer(),allcount=integer())<br />
<br />
for (i in 1:res_count){<br />
#cat(i,"\n")<br />
if (is.null(json_file$results[[i]]$record_cnt))<br />
json_file$results[[i]]$record_cnt=0<br />
row<-data.frame(resource_id = json_file$results[[i]]$id, allcount = json_file$results[[i]]$record_cnt)<br />
allresources1 <- rbind(allresources1, row)<br />
}<br />
rm(json_file)<br />
return(allresources1)<br />
}<br />
objects = 1000<br />
allresources1<-getJsonQ0(objects,0)<br />
ceil<-ceiling(res_count_json/objects)<br />
if (ceil>1){<br />
for (i in 2:ceil){<br />
cat(">call n.",i,"\n")<br />
allresources1.1<-getJsonQ0(objects,objects*(i-1))<br />
allresources1<-rbind(allresources1,allresources1.1)<br />
}<br />
}<br />
######END REST CALL<br />
save(allresources1,file=allresfile)<br />
}<br />
<br />
<br />
cat("All resources saved\n")<br />
<br />
files<-vector()<br />
f<-0<br />
if (!file.exists("./data"))<br />
dir.create("./data")<br />
<br />
cat("About to analyse species\n")<br />
<br />
for (sp in specieslist$scientificname){<br />
f<-f+1<br />
t1<-Sys.time()<br />
graphics.off()<br />
grid=matrix(data=0,nrow=n,ncol=m)<br />
gridInfo=matrix(data="",nrow=n,ncol=m)<br />
outputfileAbs=paste("data/Absences_",sp,"_",res,"deg.csv",sep="");<br />
outputimage=paste("data/Absences_",sp,"_",res,"deg.png",sep="");<br />
<br />
counter=counter+1;<br />
cat("analyzing species",sp,"\n")<br />
cat("***Species status",counter,"of",overall,"\n")<br />
<br />
## first query: select the species<br />
cat("Extraction the species id from the OBIS database...\n")<br />
query1<-paste("select id from obis.tnames where tname='",sp,"'", sep="")<br />
#obis_id<- dbGetQuery(con,query1)<br />
<br />
######QUERY 1 - REST CALL<br />
cat("Q1:querying for the species",sp," \n")<br />
query1<-paste("http://api.iobis.org/taxa?scientificname=",URLencode(sp),sep="")<br />
cat("Q1:query: ",query1," \n")<br />
result_from_httr1<-GET(query1, timeout(1*3600))<br />
json_obis_taxa_id <- fromJSON(content(result_from_httr1, as="text"))<br />
<br />
#json_obis_taxa_id <- fromJSON(query1)<br />
cat("Q1:query done\n")<br />
res_count_json<-json_obis_taxa_id$count<br />
res_count<-length(json_obis_taxa_id$results)<br />
cat("Q1:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
obis_id<-json_obis_taxa_id$results[[1]]$id<br />
obis_id<-data.frame(id=obis_id)<br />
######END REST CALL<br />
<br />
cat("The ID extracted is ", obis_id$id, "for the species", sp, "\n", sep=" ")<br />
if (nrow(obis_id)==0) {<br />
cat("WARNING: there is no reference code for", sp,"\n")<br />
next;<br />
}<br />
<br />
## second query: select the contributors<br />
cat("Selection of the contributors in the database having recorded the species...\n")<br />
query2<- paste("select distinct resource_id from obis.drs where valid_id='",obis_id$id,"'", sep="")<br />
#posresource<-dbGetQuery(con,query2)<br />
<br />
######QUERY 2 - REST CALL<br />
cat("Q2:querying for obisid ",obis_id$id," \n")<br />
<br />
<br />
downlq<-paste("http://api.iobis.org/occurrence/download?obisid=",obis_id$id,"&sync=true",sep="")<br />
cat("Q2:query",downlq," \n")<br />
<br />
filezip<-paste("sp_",obis_id$id,".zip",sep="")<br />
dirzip<-paste("./sp_",obis_id$id,sep="")<br />
download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)<br />
cat("Q2:dirzip",dirzip," \n")<br />
<br />
if (!file.exists(dirzip))<br />
dir.create(dirzip)<br />
cat("Q2:unzipping",dirzip," \n")<br />
unzip(filezip,exdir=dirzip)<br />
<br />
csvfile<-dir(dirzip)<br />
csvfile<-paste(dirzip,"/",csvfile[1],sep="") <br />
cat("Q2:reading csv file",csvfile," \n")<br />
occurrences<-read.csv(csvfile)<br />
posresource<-sqldf("select resource_id from occurrences",drv="SQLite")<br />
tgtresources1<-sqldf("select resource_id, latitude || ';' || longitude as tgtcount from occurrences",drv="SQLite")<br />
posresource<-sqldf("select distinct * from posresource",drv="SQLite")<br />
rm(occurrences)<br />
######END REST CALL<br />
<br />
if (nrow(posresource)==0) {<br />
cat("WARNING: there are no resources for", sp,"\n")<br />
next;<br />
}<br />
<br />
<br />
## third query: select from the contributors different observations<br />
merge(allresources1, posresource, by="resource_id")-> res_ids<br />
<br />
## forth query: how many obs are contained in each contributors for the species<br />
cat("Extraction from the different contributors the number of obs for the species...\n")<br />
query4 <- paste("select drs.resource_id, count(distinct position_id) as tgtcount from obis.drs where valid_id='",obis_id$id,"'group by drs.resource_id ",sep="")<br />
#tgtresources1<-dbGetQuery(con,query4)<br />
<br />
######QUERY 4 - REST CALL<br />
cat("Q4:extracting obs from contributors ",obis_id$id," \n")<br />
getJsonQ4<-function(limit, offset){<br />
cat("Q4: offset",offset,"limit",limit,"\n")<br />
query4<-paste("http://api.iobis.org/occurrence?obisid=",obis_id$id,"&limit=",limit,"&offset=",offset,sep="")<br />
result_from_httr<-GET(query4, timeout(1*3600))<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
res_count_json<<-jsonDoc$count<br />
res_count<-length(jsonDoc$results)<br />
cat("Q4:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
tgtresources1 <- data.frame(resource_id=integer(),tgtcount=character())<br />
res_count<-length(jsonDoc$results)<br />
for (i in 1:res_count){<br />
positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)<br />
row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID , tgtcount=positionID)<br />
tgtresources1 <- rbind(tgtresources1, row)<br />
}<br />
#tgtresources1<-sqldf("select resource_id, count(distinct tgtcount) as tgtcount from tgtresources1 group by resource_id",drv="SQLite")<br />
<br />
return(tgtresources1)<br />
}<br />
<br />
#objects = 1500<br />
#tgtresources1<-getJsonQ4(objects,0)<br />
#ceil<-ceiling(res_count_json/objects)<br />
#if (ceil>1){<br />
#for (i in 2:ceil){<br />
# cat(">call n.",i,"\n")<br />
#tgtresources1.1<-getJsonQ4(objects,objects*(i-1))<br />
#tgtresources1<-rbind(tgtresources1,tgtresources1.1)<br />
#}<br />
#}<br />
<br />
tgtresources1<-sqldf("select resource_id, count(distinct tgtcount) as tgtcount from tgtresources1 group by resource_id",drv="SQLite")<br />
<br />
######END REST CALL<br />
<br />
<br />
merge(tgtresources1, posresource, by="resource_id")-> tgtresourcesSpecies <br />
<br />
## fifth query: select contributors that have at least the given fraction of observations of the species
#### we now have the full table: contributors, obs in each contributor for at least one species, and obs of the species in each contributor
cat("Extracting the contributors containing more than 10% of observations for the species\n")<br />
cat("Selected occurrence percentage: ",occ_percentage,"\n")<br />
<br />
tmp <- merge(res_ids, tgtresourcesSpecies, by= "resource_id",all.x=T)<br />
tmp["species_10"] <- NA <br />
as.numeric(tmp$tgtcount) / tmp$allcount -> tmp$species_10<br />
<br />
<br />
<br />
viable_res_ids <- subset(tmp,species_10 >= occ_percentage, select=c("resource_id","allcount","tgtcount", "species_10")) <br />
#cat(viable_res_ids)<br />
<br />
if (nrow(viable_res_ids)==0) {<br />
cat("WARNING: there are no viable points for", sp,"\n")<br />
next;<br />
}<br />
<br />
numericselres<-paste("'",paste(as.character(as.numeric(t(viable_res_ids["resource_id"]))),collapse="','"),"'",sep="")<br />
selresnumbers<-as.numeric(t(viable_res_ids["resource_id"]))<br />
<br />
## sixth query: select all the cell at 0.1 degrees resolution in the main contributors<br />
cat("Select the cells at 0.1 degrees resolution for the main contributors\n")<br />
query6 <- paste("select position_id, positions.latitude, positions.longitude, count(*) as allcount ", <br />
"from obis.drs ", <br />
"inner join obis.tnames on drs.valid_id=tnames.id ",<br />
"inner join obis.positions on position_id=positions.id ",<br />
"where resource_id in (", numericselres,") ",<br />
"group by position_id, positions.latitude, positions.longitude, resource_id")<br />
#all_cells <- dbGetQuery(con,query6)<br />
<br />
<br />
######QUERY 6 - REST CALL<br />
cat("Q6:extracting 0.1 cells from contributors \n")<br />
<br />
downlq<-paste("http://api.iobis.org/occurrence/download?resourceid=",gsub("'", "", numericselres),"&sync=true",sep="")<br />
cat("Q6:query",downlq," \n")<br />
filezip<-paste("rsp_",obis_id$id,".zip",sep="")<br />
dirzip<-paste("./rsp_",obis_id$id,sep="")<br />
download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)<br />
cat("Q6:dirzip",dirzip," \n")<br />
<br />
if (!file.exists(dirzip))<br />
dir.create(dirzip)<br />
cat("Q6:unzipping",dirzip," \n")<br />
unzip(filezip,exdir=dirzip)<br />
<br />
csvfile<-dir(dirzip)<br />
csvfile<-paste(dirzip,"/",csvfile[1],sep="") <br />
cat("Q6:reading csv file",csvfile," \n")<br />
occurrences<-read.csv(csvfile)<br />
<br />
all_cells_table<-sqldf("select resource_id, latitude || ';' || longitude as position, latitude ,longitude from occurrences",drv="SQLite")<br />
rm(occurrences)<br />
getJsonQ6<-function(limit,offset,selres){<br />
cat("Q6: offset",offset,"limit",limit,"\n")<br />
cat("Q6: resource",selres,"\n")<br />
#query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&limit=",limit,"&offset=",offset,sep="")<br />
if (offset>0)<br />
query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", selres),"&limit=",limit,"&skipid=",offset,sep="")<br />
else<br />
query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", selres),"&limit=",limit,sep="")<br />
<br />
cat("Q6:",query6," \n")<br />
<br />
<br />
jsonDoc = tryCatch({<br />
result_from_httr<-GET(query6, timeout(1*3600))<br />
cat("Q6: got answer\n")<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
}, warning = function(w) {<br />
cat("Warning: ",w,"\n")<br />
}, error = function(e) {<br />
cat("Error: Too small value for resolution for this species - the solution spaceis too large!\n")<br />
}, finally = {<br />
jsonDoc=NA<br />
})<br />
<br />
<br />
<br />
res_count_json<<-jsonDoc$count<br />
res_count<-length(jsonDoc$results)<br />
cat("Q6:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
all_cells2 <- data.frame(resource_id=integer(),position_id=character(),latitude=integer(),longitude=integer())<br />
for (i in 1:res_count){<br />
positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)<br />
row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID, position_id = positionID, latitude=jsonDoc$results[[i]]$decimalLatitude, longitude=jsonDoc$results[[i]]$decimalLongitude)<br />
all_cells2 <- rbind(all_cells2, row)<br />
}<br />
lastid<<-jsonDoc$results[[res_count]]$id<br />
return(all_cells2)<br />
}<br />
<br />
cat("All resources:",numericselres,"\n")<br />
<br />
all_cells<-sqldf("select position as position_id, latitude, longitude, count(*) as allcount from all_cells_table group by position, latitude, longitude, resource_id",drv="SQLite")<br />
<br />
######END REST CALL<br />
<br />
<br />
<br />
## seventh query: select all the cell at 0.1 degrees resolution in the main contributors for the selected species<br />
cat("Select the cells at 0.1 degrees resolution for the species in the main contributors\n")<br />
query7 <- paste("select position_id, positions.latitude, positions.longitude, count(*) as tgtcount ",<br />
"from obis.drs",<br />
"inner join obis.tnames on drs.valid_id=tnames.id ", <br />
"inner join obis.positions on position_id=positions.id ", <br />
"where resource_id in (", numericselres,") ",<br />
"and drs.valid_id='",obis_id$id,"'", <br />
"group by position_id, positions.latitude, positions.longitude")<br />
#presence_cells<-dbGetQuery(con,query7)<br />
<br />
######QUERY 7 - REST CALL<br />
cat("Q7:extracting 0.1 cells for the species ",obis_id$id,"\n")<br />
<br />
downlq<-paste("http://api.iobis.org/occurrence/download?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&sync=true",sep="")<br />
cat("Q7:query",downlq," \n")<br />
filezip<-paste("rspsp_",obis_id$id,".zip",sep="")<br />
dirzip<-paste("./rspsp_",obis_id$id,sep="")<br />
download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)<br />
cat("Q7:dirzip",dirzip," \n")<br />
<br />
if (!file.exists(dirzip))<br />
dir.create(dirzip)<br />
cat("Q7:unzipping",dirzip," \n")<br />
unzip(filezip,exdir=dirzip)<br />
<br />
csvfile<-dir(dirzip)<br />
csvfile<-paste(dirzip,"/",csvfile[1],sep="") <br />
cat("Q7:reading csv file",csvfile," \n")<br />
occurrences<-read.csv(csvfile)<br />
<br />
presence_cells2<-sqldf("select resource_id, latitude ,longitude, latitude || ';' || longitude as position from occurrences",drv="SQLite")<br />
rm(occurrences)<br />
getJsonQ7<-function(limit,offset){<br />
cat("Q7: offset",offset,"limit",limit,"\n")<br />
if (offset>0)<br />
query7<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&limit=",limit,sep="")<br />
else query7<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&limit=",limit,"&skipid=",offset,sep="")<br />
<br />
result_from_httr<-GET(query7, timeout(1*3600))<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
res_count_json<<-jsonDoc$count<br />
res_count<-length(jsonDoc$results)<br />
cat("Q7:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
presence_cells2 <- data.frame(resource_id=integer(),position_id=character(),latitude=integer(),longitude=integer())<br />
for (i in 1:res_count){<br />
<br />
positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)<br />
row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID, position_id = positionID, latitude=jsonDoc$results[[i]]$decimalLatitude, longitude=jsonDoc$results[[i]]$decimalLongitude)<br />
presence_cells2 <- rbind(presence_cells2, row)<br />
}<br />
<br />
lastid<<-jsonDoc$results[[res_count]]$id<br />
<br />
return(presence_cells2)<br />
}<br />
<br />
<br />
presence_cells<-sqldf("select position as position_id, latitude, longitude, count(*) as tgtcount from presence_cells2 group by position_id, latitude, longitude, resource_id",drv="SQLite")<br />
<br />
######END REST CALL<br />
<br />
## last query: for every cell in the sixth query if there is a correspondent in the seventh query I can put 1 otherwise 0<br />
#data.df<-merge(all_cells, presence_cells, by= "position_id",all.x=T)<br />
#data.df$longitude.y<-NULL <br />
#data.df$latitude.y<-NULL<br />
#data.df[is.na(data.df)] <- 0 <br />
<br />
######### Table resulting from the analysis<br />
#pres_abs_cells <- subset(data.df,select=c("latitude.x","longitude.x", "tgtcount","position_id"))<br />
#positions<-paste("'",paste(as.character(as.numeric(t(pres_abs_cells["position_id"]))),collapse="','"),"'",sep="")<br />
positions<-""<br />
query8<-paste("select position_id, resfullname,digirname,abstract,temporalscope,date_last_harvested",<br />
"from ((select distinct position_id,resource_id from obis.drs where position_id IN (", positions,<br />
") order by position_id ) as a",<br />
"inner join (select id,resfullname,digirname,abstract,temporalscope,date_last_harvested from obis.resources where id in (",<br />
numericselres,")) as b on b.id = a.resource_id) as d")<br />
<br />
#resnames<-dbGetQuery(con,query8)<br />
<br />
######QUERY 8 - REST CALL<br />
cat("Q8:extracting contributors details\n")<br />
data.df2<-merge(all_cells, presence_cells, by= "position_id",all.x=T)<br />
data.df2$longitude.y<-NULL <br />
data.df2$latitude.y<-NULL<br />
data.df2[is.na(data.df2)] <- 0 <br />
rm (all_cells)<br />
pres_abs_cells2 <- subset(data.df2,select=c("latitude.x","longitude.x", "tgtcount","position_id"))<br />
positions2<-paste("'",paste(as.character(as.character(t(pres_abs_cells2["position_id"]))),collapse="','"),"'",sep="")<br />
<br />
refofpositions<-sqldf(paste("select distinct resource_id from all_cells_table where position in (",positions2,")"),drv="SQLite")<br />
referencesn<-nrow(refofpositions)<br />
resnames_res2 <- data.frame(resource_id=integer(),resfullname=character(),digirname=character(),abstract=character(),temporalscope=character(),date_last_harvested=character())<br />
for (i in 1: referencesn){<br />
query8<-paste("http://api.iobis.org/resource/",refofpositions[i,1],sep="")<br />
result_from_httr<-GET(query8, timeout(1*3600))<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
<br />
daterecord<-as.POSIXct(jsonDoc$date_last_harvested/1000, origin="1970-01-01")#origin="1970-01-01")<br />
if (length(daterecord)==0)<br />
daterecord=""<br />
abstractst<-jsonDoc$abstract_str<br />
<br />
if (length(jsonDoc$abstract_str)==0)<br />
jsonDoc$abstract_str=""<br />
<br />
if (length(jsonDoc$id)==0)<br />
jsonDoc$id=""<br />
<br />
if (length(jsonDoc$fullname)==0)<br />
jsonDoc$fullname=""<br />
<br />
if (length(jsonDoc$temporalscope)==0)<br />
jsonDoc$temporalscope=""<br />
<br />
<br />
row<-data.frame(resource_id = jsonDoc$id, resfullname=jsonDoc$fullname, digirname=jsonDoc$digirname, abstract=jsonDoc$abstract_str,temporalscope=jsonDoc$temporalscope,date_last_harvested=daterecord)<br />
<br />
resnames_res2 <- rbind(resnames_res2, row) <br />
}<br />
<br />
resnames2<-sqldf(paste("select distinct position as position_id, resfullname, digirname, abstract, temporalscope, date_last_harvested from (select * from all_cells_table where position in (",positions2,")) as a inner join resnames_res2 as b on a.resource_id=b.resource_id"),drv="SQLite")<br />
resnames<-sqldf("select * from resnames2 order by position_id",drv="SQLite")<br />
pres_abs_cells<-sqldf("select * from pres_abs_cells2 order by position_id",drv="SQLite")<br />
rm(all_cells_table)<br />
######END REST CALL<br />
<br />
#sorting data df<br />
# pres_abs_cells<-pres_abs_cells[with(pres_abs_cells, order(position_id)), ]<br />
nrows = nrow(pres_abs_cells)<br />
######## FIRST Loop inside the rows of the dataset<br />
cat("Looping on the data\n")<br />
for(i in 1: nrows) {<br />
lat<-pres_abs_cells[i,1]<br />
long<-pres_abs_cells[i,2]<br />
value<-pres_abs_cells[i,3]<br />
resource_name<-paste("\"",paste(as.character(t(resnames[i,])),collapse="\",\""),"\"",sep="")#resnames[i,2]<br />
k=round((lat+90)*n/180)<br />
g=round((long+180)*m/360)<br />
if (k==0) k=1;<br />
if (g==0) g=1;<br />
if (k>n || g>m)<br />
next;<br />
if (value>=1){<br />
if (grid[k,g]==0){<br />
grid[k,g]=1<br />
gridInfo[k,g]=resource_name<br />
}<br />
else if (grid[k,g]==-1){<br />
grid[k,g]=-2<br />
gridInfo[k,g]=resource_name<br />
}<br />
}<br />
else if (value==0){<br />
if (grid[k,g]==0){<br />
grid[k,g]=-1<br />
#cat("resource abs",resource_name,"\n")<br />
gridInfo[k,g]=resource_name<br />
}<br />
else if (grid[k,g]==1){<br />
grid[k,g]=-2<br />
gridInfo[k,g]=resource_name<br />
}<br />
<br />
}<br />
}<br />
cat("End looping\n")<br />
<br />
cat("Generating image\n")<br />
absence_cells<-which(grid==-1,arr.ind=TRUE)<br />
presence_cells_idx<-which(grid==1,arr.ind=TRUE)<br />
latAbs<-((absence_cells[,1]*180)/n)-90<br />
longAbs<-((absence_cells[,2]*360)/m)-180<br />
latPres<-((presence_cells_idx[,1]*180)/n)-90<br />
longPres<-((presence_cells_idx[,2]*360)/m)-180<br />
resource_abs<-gridInfo[absence_cells]<br />
rm(gridInfo)<br />
rm(grid)<br />
absPoints <- cbind(longAbs, latAbs)<br />
absPointsData <- cbind(longAbs, latAbs,resource_abs)<br />
<br />
if (length(absPoints)==0)<br />
{<br />
cat("WARNING no viable point found for ",sp," after processing!\n")<br />
next;<br />
}<br />
data(wrld_simpl)<br />
projection(wrld_simpl) <- CRS("+proj=longlat")<br />
png(filename=outputimage, width=1200, height=600)<br />
plot(wrld_simpl, xlim=c(-180, 180), ylim=c(-90, 90), axes=TRUE, col="black")<br />
box()<br />
pts <- SpatialPoints(absPoints,proj4string=CRS(proj4string(wrld_simpl)))<br />
<br />
## Find which points do not fall over land<br />
cat("Retreiving the poing that do not fall on land\n")<br />
pts<-pts[which(is.na(over(pts, wrld_simpl)$FIPS))]<br />
points(pts, col="green", pch=1, cex=0.50)<br />
datapts<-as.data.frame(pts)<br />
colnames(datapts) <- c("longAbs","latAbs")<br />
<br />
abspointstable<-merge(datapts, absPointsData, by.x= c("longAbs","latAbs"), by.y=c("longAbs","latAbs"),all.x=F)<br />
<br />
<br />
header<-"longitude,latitude,resource_id,resource_name,resource_identifier,resource_abstract,resource_temporalscope,resource_last_harvested_date"<br />
write.table(header,file=outputfileAbs,append=F,row.names=F,quote=F,col.names=F)<br />
<br />
write.table(abspointstable,file=outputfileAbs,append=T,row.names=F,quote=F,col.names=F,sep=",")<br />
files[f]<-outputfileAbs<br />
cat("Elapsed: created imaged in ",Sys.time()-t1," sec \n")<br />
graphics.off()<br />
}<br />
<br />
# wps.out: id = zipOutput, type = text/zip, title = zip file containing absence records and images;<br />
zipOutput<-"absences.zip"<br />
zip(zipOutput, files=c("./data"), flags= "-r9X", extras = "",zip = Sys.getenv("R_ZIPCMD", "zip"))<br />
<br />
cat("Closing database connection")<br />
cat("Elapsed: overall process finished in ",Sys.time()-t0," min \n")<br />
#dbDisconnect(con)<br />
graphics.off()<br />
<br />
</source><br />
[[File:AbsencesSpeciesList_prod_annotated.zip|AbsencesSpeciesList_prod_annotated.zip]]<br />
<br />
<br />
:The following screenshots report the result of importing this script into SAI:
<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_Annotations_Info.png|thumb|center|800px|Annotations Project Info, SAI]]<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_Annotations_InputOutput.png|thumb|center|800px|Annotations Input/Output, SAI]]<br />
<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]

<hr />
'''ServiceManager Guide'''

[[Category:Administrator's Guide]]
{|align=right<br />
||__TOC__<br />
|}<br />
This part of the guide covers the installation and configuration of gCube services that are not mentioned in the Administration guide, mainly services that are not Enabling and that can be installed dynamically by the Infrastructure/VO Managers. For each component, the list also includes known issues and specific configuration steps to follow.
<br />
=Search=<br />
<br />
==Search V 2.xx ==<br />
<br />
<br />
The installation of a Search node in gCube consists, in the minimal configuration, of the installation of two web services:
<br />
* SearchSystemService<br />
* ExecutionEngineService<br />
<br />
This is the minimal installation scenario, but it is also possible to enable distributed search, which will require the installation and configuration of several ExecutionEngineServices.
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for a Search node are a single-CPU node with 2 GB RAM, but at least 3 GB RAM is strongly recommended on the node dedicated to the gHN.
<br />
=== Configuration ===<br />
<br />
The SearchSystemService and ExecutionEngineService have to be deployed (automatically or manually) in a VRE scope. In addition, to configure the SearchSystemService to exploit the local ExecutionEngineService to run the queries (minimal installation), the JNDI service should be configured as follows:
<br />
* excludeLocal = false<br />
* collocationThreshold = 0.3f<br />
* complexPlanNumNodes = 800000<br />
<br />
== Search v 3.x.x==<br />
<br />
Version 3.0 has moved to SmartGears and Tomcat.
<br />
The co-deployment requirement with the Execution Engine Service still holds, so the Execution Engine Service v2.0.0 has also been ported to SmartGears.
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for a Search node are a single-CPU node with 2 GB RAM, but at least 3 GB RAM is strongly recommended on the node dedicated to the gHN.
<br />
=== Configuration ===<br />
<br />
In order to fix a compatibility issue between DataNucleus and Java 7, the following change must be included in the Tomcat configuration:
<br />
* uncomment and modify the following line on the $CATALINA_HOME/bin/catalina.sh file:<br />
<br />
JAVA_OPTS="$JAVA_OPTS -noverify -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"<br />
<br />
* The conf file $CATALINA_HOME/conf/infrastructure.properties, containing infrastructure and scope information, needs to be present:
<br />
# a single infrastructure<br />
infrastructure=d4science.research-infrastructures.eu<br />
 # multiple scopes must be separated by a comma (e.g. FARM,gCubeApps)
scopes=Ecosystem<br />
clientMode=false<br />
<br />
<br />
* The conf file $CATALINA_HOME/webapps/<search>/WEB-INF/classes/deploy.properties needs to be filled with this info:
<br />
hostname = xx<br />
startScopes = xx<br />
port=xx<br />
<br />
=== Known Issues ===<br />
<br />
= Execution Engine =
<br />
Version 2.0 has moved to SmartGears and Tomcat.
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for an Execution Engine node are a single-CPU node with 2 GB RAM, but at least 3 GB RAM is strongly recommended on the node dedicated to the gHN.
<br />
=== Installation ===<br />
<br />
Different packagings of the Execution Engine are available, depending on the service they are going to be co-deployed with and invoked by:
<br />
* DTS : <artifactId>executionengineservice-dts</artifactId><br />
* Search: <artifactId>executionengineservice-search</artifactId><br />
<br />
=== Configuration ===<br />
In order to fix a compatibility issue between DataNucleus and Java 7, the following change must be included in the Tomcat configuration:
<br />
* uncomment and modify the following line on the $CATALINA_HOME/bin/catalina.sh file:<br />
<br />
JAVA_OPTS="$JAVA_OPTS -noverify -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"<br />
<br />
* The conf file $CATALINA_HOME/conf/infrastructure.properties, containing infrastructure and scope information, needs to be present:
<br />
# a single infrastructure<br />
infrastructure=d4science.research-infrastructures.eu<br />
 # multiple scopes must be separated by a comma (e.g. FARM,gCubeApps)
scopes=Ecosystem<br />
clientMode=false<br />
<br />
<br />
* The conf file $CATALINA_HOME/webapps/<execution-engine>/WEB-INF/classes/deploy.properties needs to be filled with this info:
<br />
hostname = xx<br />
startScopes = xx<br />
port=xx<br />
pe2ng.port = 4000<br />
<br />
* In case the Execution Engine needs to call DTS, add the following to container.xml:
<br />
<property name='dts.execution' value='true' /><br />
<br />
= Executor and GenericWorker =<br />
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for an Executor node with a GenericWorker plugin are a single-CPU node with 2 GB RAM, but at least 3 GB RAM is strongly recommended on the node dedicated to the gHN.
<br />
=== Configuration ===<br />
<br />
The following Software should be installed on the VM:<br />
<br />
* R version 2.14.1<br />
<br />
with the following packages (an installation sketch is shown after the list):
<br />
* coda<br />
* R2jags<br />
* R2WinBUGS<br />
* rjags <br />
* bayesmix<br />
* runjags<br />
<br />
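As a minimal sketch, the listed packages could be installed from an R session on the node (the CRAN repository URL is an assumption, and rjags/R2jags also require the JAGS system library):

<source lang="r">
# Sketch: installing the R packages required by the GenericWorker node.
# Package names come from the list above; the repository URL is an assumption.
install.packages(c("coda", "R2jags", "R2WinBUGS", "rjags", "bayesmix", "runjags"),
                 repos = "https://cran.r-project.org")
</source>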
=== Known Issues ===<br />
<br />
* The GenericWorker is exploited by the Statistical Manager service to run distributed computations. Given that the SM uses the root scope to discover instances of the GenericWorker, the plugin must be deployed at root scope level.
<br />
* Given that the GenericWorker plugin depends on the Executor Service, when dynamically deploying the plugin the Executor Service is also deployed.<br />
<br />
<br />
<br />
= SmartExecutor =<br />
<br />
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for an Executor node with a Generic Worker plugin are a single-CPU node with 2 GB RAM, but at least 3 GB RAM is strongly recommended on the node dedicated to the vHN (SmartGears gHN).
<br />
=== Configuration ===<br />
<br />
No specific configuration is needed for the SmartExecutor.
<br />
=== Known Issues ===<br />
<br />
* When correctly started, the SmartExecutor publishes a ServiceEndpoint with <Category>VREManagement</Category> and <Name>SmartExecutor</Name>. You can check the availability of the plugin on that resource; there is one <AccessPoint> per plugin, as sketched below.
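
A simplified sketch of the structure of such a ServiceEndpoint (element values are illustrative, and the real resource contains more fields):

<pre>
<Resource type="ServiceEndpoint">
  <Profile>
    <Category>VREManagement</Category>
    <Name>SmartExecutor</Name>
    <AccessPoint>
      <!-- one AccessPoint per available plugin -->
    </AccessPoint>
  </Profile>
</Resource>
</pre>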
<br />
<br />
<br />
= SmartGenericWorker =<br />
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for an Executor node with a SmartGenericWorker plugin are a single-CPU node with 2 GB RAM, but at least 3 GB RAM is strongly recommended on the node dedicated to the vHN.
<br />
=== Configuration ===<br />
<br />
The following Software should be installed on the VM:<br />
<br />
* R version 2.14.1<br />
<br />
with the following packages (see the installation sketch in the Executor and GenericWorker section above):
<br />
* coda<br />
* R2jags<br />
* R2WinBUGS<br />
* rjags <br />
* bayesmix<br />
* runjags<br />
<br />
=== Known Issues ===<br />
<br />
* The SmartGenericWorker is exploited by the Statistical Manager service to run distributed computations. Given that the SM use the root scope to discover instances of the SmartGenericWorker, the plugin must be deployed at root scope level<br />
* To deploy SmartGenericWorker you need to copy the SmartGenericWorker jar-with-dependecies in $CATALINA_HOME/webapps/smart-executor/WEB-INF/lib/ directory. A container restart is needed to load the new plugin.<br />
* When the container is restarted, the plugin availability can be checked by looking at the Service Endpoint published by the SmartExecutor.<br />
<br />
This simple script can help the deployment process.<br />
<br />
<br />
<source lang="bash"><br />
#!/bin/bash<br />
# Stop the container and remove any previous smart-executor deployment<br />
$CATALINA_HOME/bin/shutdown.sh -force<br />
rm -rf $CATALINA_HOME/webapps/smart-executor*<br />
<br />
# Deploy a fresh copy of the smart-executor web application<br />
cp ~/smart-executor.war $CATALINA_HOME/webapps/<br />
mkdir $CATALINA_HOME/webapps/smart-executor<br />
unzip $CATALINA_HOME/webapps/smart-executor.war -d $CATALINA_HOME/webapps/smart-executor<br />
<br />
# Add the SmartGenericWorker plugin (jar-with-dependencies) to the webapp libraries<br />
cp ~/smart-generic-worker-*.jar $CATALINA_HOME/webapps/smart-executor/WEB-INF/lib/<br />
<br />
# Restart the container to load the new plugin<br />
sleep 5s<br />
$CATALINA_HOME/bin/startup.sh<br />
</source><br />
<br />
= DTS =<br />
== DTS v2.x==<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for a DTS node are a single-CPU node with 2GB RAM, but it is strongly recommended to have at least 3GB RAM on the node dedicated to the GHN.<br />
<br />
=== Configuration ===<br />
<br />
DTS uses the Execution Engine to run the transformations, so at least one Execution Engine should be deployed in the same scope as DTS, and the related GHNLabels.xml file should contain:<br />
<br />
<pre><br />
<Variable><br />
<Key>dts.execution</Key><br />
<Value>true</Value><br />
</Variable><br />
</pre><br />
<br />
=== Known Issues ===<br />
<br />
none<br />
<br />
== DTS v3.x==<br />
<br />
=== HW requirements ===<br />
<br />
The minimal installation requirements for a DTS node with a Generic Worker plugin are a single-CPU node with 2GB RAM, but it is strongly recommended to have at least 3GB RAM on the node dedicated to the GHN.<br />
<br />
=== Configuration ===<br />
<br />
* The conf file $CATALINA_HOME/conf/infrastructure.properties, containing infrastructure and scope information, needs to be present:<br />
<br />
# a single infrastructure<br />
infrastructure=d4science.research-infrastructures.eu<br />
# multiple scopes must be separated by a comma (e.g. FARM,gCubeApps)<br />
scopes=Ecosystem<br />
clientMode=false<br />
<br />
* The conf file $CATALINA_HOME/webapps/<dts>/WEB-INF/classes/deploy.properties needs to be filled with this information:<br />
<br />
hostname = xx<br />
startScopes = xx<br />
port=xx<br />
<br />
DTS uses the Execution Engine to run the transformations, so at least one Execution Engine should be deployed in the same scope as DTS, and the related Smartgears conf file (container.xml) should contain this property:<br />
<br />
<pre><br />
<property name='dts.execution' value='true' /> <br />
</pre><br />
<br />
= Index =<br />
<br />
== Index Service ==<br />
<br />
The Index Service is the latest released RESTful service running on Smartgears. It implements both forward (FW) and full-text (FT) index functionalities.<br />
<br />
=== HW requirements ===<br />
<br />
Given the co-deployment with ElasticSearch (embedded), a VM with at least 4GB RAM and 2 CPUs is recommended. <br />
<br />
The open-file limit should also be raised to 32000.<br />
<br />
=== Configuration ===<br />
<br />
Details on the Index Service configuration are available at https://gcube.wiki.gcube-system.org/gcube/index.php/Index_Management_Framework#Deployment_Instructions<br />
<br />
== ForwardIndexNode (Dismissed) ==<br />
<br />
The ForwardIndexNode service needs to be co-deployed with an instance of the Couchbase service.<br />
<br />
=== HW requirements ===<br />
<br />
Given the co-deployment with Couchbase, a VM with at least 4GB RAM and 2 CPUs is recommended. <br />
<br />
=== Configuration ===<br />
<br />
The installation of Couchbase should be performed manually and depends on the OS (binary package, RPM, DEB). <br />
<br />
It is recommended to set a higher open-file limit on the VM (32000 minimum).<br />
<br />
The FWIndexNode configuration properties that should be customized (in the JNDI file) are:<br />
<br />
* couchBaseIP = the IP of the server hosting Couchbase (i.e. the same as the GHN)<br />
* couchBaseUseName = the username set when configuring Couchbase<br />
* couchBasePassword = the password set when configuring Couchbase<br />
<br />
Once configured, Couchbase needs to be initialized using the cb_initialize_node.sh script contained in the service configuration folder.<br />
<br />
=== Known Issues ===<br />
<br />
* Sometimes the cb_initialize_node.sh script fails; this could mean that there is not enough memory to initialize the data bucket. Try reducing the value of ''ramQuota'' in the JNDI file.<br />
<br />
= Statistical Manager =<br />
<br />
== Resources ==<br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|'''Runtime Resources'''<br />
| align="center" style="background:#f0f0f0;"|''''''<br />
| align="center" style="background:#f0f0f0;"|''''''<br />
|-<br />
| DataStorage/StorageManager||VO/VRE||StorageManager<br />
|-<br />
| Database/Obis2Repository||VRE||Trendylyzer<br />
|-<br />
| Database/StatisticalManagerDatabase||INFRA/VO/VRE||Statistical<br />
|-<br />
| Database/AquamapsDB||VO/VRE||Algorithms<br />
|-<br />
| Database/FishCodesConversion||VO/VRE||Algorithms<br />
|-<br />
| Database/FishBase||VO/VRE||Algorithms - TaxaMatch<br />
|-<br />
| DataStorage/Storage Manager||INFRA/VO/VRE||All<br />
|-<br />
| Gis/Geoserver1..n||VRE||Maps Algorithms<br />
|-<br />
| Gis/TimeSeriesDatastore||VO/VRE||Maps Algorithms<br />
|-<br />
| Gis/GeoNetwork||VRE||Maps Algorithms<br />
|-<br />
| Service/MessageBroker||VO||Service<br />
|-<br />
| BiodiversityRepository/CatalogofLife||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/GBIF||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/ITIS||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/WoRDSS||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/WoRMS||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/OBIS||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/NCBI||VO/VRE||Occurrence Algorithms<br />
|-<br />
| BiodiversityRepository/SpeciesLink||VO/VRE||Occurrence Algorithms<br />
|-<br />
| DataAnalysis/Dataminer||VRE||Required if Dataminer is needed in the VRE<br />
|-<br />
| Database/UsersGisTablesDB||VRE||Required if Dataminer and SDI are needed in the VRE<br />
|}<br />
<br />
<br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|'''WS Resources'''<br />
| align="center" style="background:#f0f0f0;"|''''''<br />
| align="center" style="background:#f0f0f0;"|''''''<br />
|-<br />
| Workers||INFRA/VO||Parallel Computations<br />
|}<br />
<br />
<br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|'''Generic Resources'''<br />
| align="center" style="background:#f0f0f0;"|''''''<br />
| align="center" style="background:#f0f0f0;"|''''''<br />
|-<br />
| ISO/MetadataConstants||VO/VRE||Maps Algorithms<br />
|}<br />
<br />
=== Known Issues ===<br />
<br />
Tested on gHN 4.0.0 and StatisticalManager service 1.4.0:<br />
* Install the SM on the same network where the databases and the other used resources are located; otherwise, production databases would have to be restarted because direct access could not be granted to such resources.<br />
* Remove the library axis-1.4.jar from gCore/lib.<br />
* Replace the library hsqldb-1.8.jar with hsqldb-2.2.8.jar in gCore/lib (the last two steps are sketched after this list).<br />
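<br />
A minimal shell sketch of the last two steps, assuming gCore is installed under the gcube user's home directory and that hsqldb-2.2.8.jar has already been downloaded to the current directory:<br />
<br />
<source lang="bash"><br />
# Remove the conflicting libraries from the gCore classpath<br />
rm ~/gCore/lib/axis-1.4.jar<br />
rm ~/gCore/lib/hsqldb-1.8.jar<br />
# Add the newer HSQLDB library<br />
cp ./hsqldb-2.2.8.jar ~/gCore/lib/<br />
</source><br />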
<br />
=== Additional Installation Steps ===<br />
* create a suitable R environment[https://support.d4science.org/issues/2174]<br />
* download the following file [http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/gebco_08_OCEANS_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc gebco] under /home/gcube/gCore/etc/statistical-manager-service-full-XXX/cfg and rename it as gebco_08.nc<br />
* copy the gcube keys under /home/gcube/gCore/etc/statistical-manager-service-full-XXX/cfg/PARALLEL_PROCESSING (the last two steps are sketched after this list)<br />
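<br />
A minimal sketch of these two steps (the XXX part of the path depends on the installed service version, and the source location of the key files is hypothetical):<br />
<br />
<source lang="bash"><br />
CFG=/home/gcube/gCore/etc/statistical-manager-service-full-XXX/cfg<br />
# Download the GEBCO bathymetry file and rename it as expected by the service<br />
wget -O $CFG/gebco_08.nc "http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/gebco_08_OCEANS_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc"<br />
# Copy the gCube keys (source path is hypothetical)<br />
cp /path/to/gcube-keys/* $CFG/PARALLEL_PROCESSING/<br />
</source><br />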
<br />
== Services and Databases used by the Statistical Manager and Data Analysis facilities ==<br />
<br />
====GHN====<br />
gcube@statistical-manager1.d4science.org<br />
<br />
gcube@statistical-manager2.d4science.org<br />
<br />
gcube@statistical-manager3.d4science.org<br />
<br />
gcube@statistical-manager4.d4science.org<br />
<br />
gcube2@statistical-manager.d.d4science.org<br />
<br />
====TOMCAT====<br />
<br />
(root user)<br />
<br />
thredds.research-infrastructures.eu<br />
<br />
wps.statistical.d4science.org<br />
<br />
rstudio.p.d4science.research-infrastructures.eu<br />
<br />
geoserver.d4science.org<br />
<br />
geoserver2.d4science.org<br />
<br />
geoserver3.d4science.org<br />
<br />
geoserver4.d4science.org<br />
<br />
geoserver-dev.d4science-ii.research-infrastructures.eu<br />
<br />
geoserver-dev2.d4science-ii.research-infrastructures.eu<br />
<br />
geonetwork.geothermaldata.d4science.org<br />
<br />
geonetwork.d4science.org<br />
<br />
====THIRD PARTY SERVICES====<br />
<br />
(root user)<br />
<br />
rstudio.p.d4science.research-infrastructures.eu (software: RStudio; restart command: rstudio-server restart)<br />
<br />
====DATABASES====<br />
<br />
(root user)<br />
<br />
geoserver-db.d4science.org<br />
<br />
node49.p.d4science.research-infrastructures.eu<br />
<br />
biodiversity.db.i-marine.research-infrastructures.eu <br />
<br />
db1.p.d4science.research-infrastructures.eu <br />
<br />
db5.p.d4science.research-infrastructures.eu <br />
<br />
dbtest.research-infrastructures.eu<br />
<br />
dbtest3.research-infrastructures.eu<br />
<br />
geoserver.d4science-ii.research-infrastructures.eu <br />
<br />
geoserver2.i-marine.research-infrastructures.eu<br />
<br />
geoserver-db.d4science.org <br />
<br />
geoserver-test.d4science-ii.research-infrastructures.eu<br />
<br />
node50.p.d4science.research-infrastructures.eu <br />
<br />
node49.p.d4science.research-infrastructures.eu <br />
<br />
node59.p.d4science.research-infrastructures.eu <br />
<br />
obis2.i-marine.research-infrastructures.eu <br />
<br />
statistical-manager.d.d4science.org <br />
<br />
====WORKER NODES====<br />
<br />
(gcube2 user)<br />
<br />
(production)<br />
<br />
node3.d4science.org<br />
<br />
node4.d4science.org<br />
<br />
node11.d4science.org<br />
<br />
node12.d4science.org<br />
<br />
node13.d4science.org<br />
<br />
node14.d4science.org<br />
<br />
node15.d4science.org<br />
<br />
node16.d4science.org<br />
<br />
node18.d4science.org<br />
<br />
node20.d4science.org<br />
<br />
node21.d4science.org<br />
<br />
node23.d4science.org<br />
<br />
node27.d4science.org<br />
<br />
node28.d4science.org<br />
<br />
node29.d4science.org<br />
<br />
node30.d4science.org<br />
<br />
node31.d4science.org<br />
<br />
node32.d4science.org<br />
<br />
node33.d4science.org<br />
<br />
node34.d4science.org<br />
<br />
node35.d4science.org<br />
<br />
node36.d4science.org<br />
<br />
node37.d4science.org<br />
<br />
node38.d4science.org<br />
<br />
node39.d4science.org<br />
<br />
<br />
(development)<br />
<br />
node17.d4science.org<br />
<br />
node19.d4science.org<br />
<br />
node22.d4science.org<br />
<br />
====TESTING====<br />
<br />
[http://goo.gl/SnfA0M Test plan for the Statistical Manager.]<br />
<br />
=GIS Technologies=<br />
<br />
In order to handle GIS technologies, developers should rely on the ''geonetwork'' and ''gisinterface'' libraries, both distributed under the ''org.gcube.spatial.data'' subsystem. <br />
Depending on which libraries are used, different resources are mandatory.<br />
<br />
==Geonetwork==<br />
<br />
This section covers the default behavior of the ''geonetwork'' library. Please note that clients of the library might override it.<br />
<br />
===Geonetwork Service Discovery===<br />
<br />
A single Service Endpoint per Geonetwork instance is needed; you can find more details on the resource [[GeoNetwork Configuration#Runtime Resource|here]].<br />
<br />
===Metadata Publication===<br />
<br />
In order to exploit the library's features to generate ISO metadata, the following Generic Resource is needed in the scope: <br />
* Secondary Type : ISO<br />
* Name : MetadataConstants<br />
<br />
===Metadata Resolution===<br />
<br />
The Geonetwork library uses the "Uri Resolver Manager" library to resolve the GIS layers generated via the HTTP protocol; the following Generic Resource is needed in the scope: <br />
<br />
* Uri Resolver Manager<br />
<br />
https://gcube.wiki.gcube-system.org/gcube/URI_Resolver#Uri_Resolver_Manager<br />
<br />
<pre><br />
<Type>GenericResource</Type><br />
<SecondaryType>UriResolverMap</SecondaryType><br />
<Name>Uri-Resolver-Map</Name><br />
</pre><br />
<br />
==GeoServer==<br />
<br />
In order to let the ''gisinterface'' library discover instances of GeoServer, an Access Point must be defined for each instance. <br />
The Service Endpoint resource for such Access Points must have (see the sketch after this list):<br />
* Category : Gis<br />
* Platform/Name : GeoServer<br />
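<br />
A purely illustrative sketch of the relevant fragment of such a Service Endpoint, following the notation of the resource snippets below (the actual resource schema contains additional elements):<br />
<br />
<pre><br />
<Type>RuntimeResource</Type><br />
<Category>Gis</Category><br />
<Platform><Name>GeoServer</Name></Platform><br />
</pre><br />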
<br />
==GeoExplorer==<br />
<br />
In order to let the GeoExplorer portlet work properly, you must copy the following resources from the root scope (/d4science.research-infrastructures.eu/) to the VRE where it must run:<br />
<br />
* Transect<br />
<pre><br />
<Type>RuntimeResource</Type><br />
<Category>Application</Category><br />
<Name>Transect</Name><br />
</pre><br />
<br />
* Gis Resolver<br />
<br />
https://gcube.wiki.gcube-system.org/gcube/URI_Resolver#GIS_Resolver<br />
<br />
<pre><br />
<Type>RuntimeResource</Type><br />
<Category>Service</Category><br />
<Name>Gis-Resolver</Name><br />
</pre><br />
<br />
* Gis Viewer Application<br />
<br />
<pre><br />
<Type>GenericResource</Type><br />
<SecondaryType>ApplicationProfile</SecondaryType><br />
<Name>Gis Viewer Application</Name><br />
</pre><br />
<br />
and then you must edit the Generic Resource shown here: https://gcube.wiki.gcube-system.org/gcube/URI_Resolver#Generic_Resource_for_Gis_Viewer_Application<br />
<br />
=Tabular Data Manager=<br />
Each service operation may need a specific configuration. The following is a list of the needed resources per operation module.<br />
<br />
==Operation View==<br />
The module requires GIS Technologies to be already configured in the operating scope. See [[#Gis Technologies|Gis Technologies]].<br />
<br />
The module also requires the following Generic Resource:<br />
*Secondary Type : TDMConfiguration<br />
<br />
Since the operation needs to put data in a PostGIS database already connected with GeoServer, a Service Endpoint for such a database must be present in the same scope.<br />
Constraints for retrieving such a Service Endpoint are taken from the Generic Resource described above (values are indicated with their XML element name as declared in the Generic Resource's body; see the sketch after this list):<br />
<br />
*Category : <gisDBCategory><br />
*Platform/Name : <gisDBPlatformName><br />
*AccessPoint/<tdmDataStoreFlag> : true<br />
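<br />
A purely illustrative sketch of such a TDMConfiguration Generic Resource body (the element names come from the list above, while the wrapper element and the values are hypothetical):<br />
<br />
<pre><br />
<tdmConfiguration><br />
  <gisDBCategory>Database</gisDBCategory><br />
  <gisDBPlatformName>postgis</gisDBPlatformName><br />
  <tdmDataStoreFlag>tdmDataStore</tdmDataStoreFlag><br />
</tdmConfiguration><br />
</pre><br />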
<br />
= Resource Catalogue =<br />
In this section the resources required to deploy the Catalogue in a given context are reported. <br />
<br />
'''Please note that only the mandatory ones are shown.'''<br />
<br />
== CKAN Connector ==<br />
<source lang="xml"><br />
ServiceClass = DataAccess<br />
ServiceName = CkanConnector<br />
</source><br />
This is the service that allows login operations on CKAN to be performed from the Gateways. It runs on SmartGears, so once it is published in the context there is not much left to do. However, it is fundamental.<br />
<br />
== Generic Resource ==<br />
=== Portlet URL ===<br />
<source lang="xml"><br />
SecondaryType = ApplicationProfile<br />
Name = CkanPortlet<br />
Description = The url of the gcube-ckan-datacatalog portlet for this scope<br />
</source><br />
<br />
The content (body) of the resource has to report the URL of the catalogue portlet for this context, e.g. <br />
<br />
<source lang="xml"><br />
<url>https://services.research-infrastructures.eu/group/d4science-services-gateway/data-catalogue</url><br />
</source><br />
=== Item Catalogue ===<br />
<source lang="xml"><br />
SecondaryType = ApplicationProfile<br />
Name = Item Catalogue<br />
Description = This is the Item Catalogue application profile for alerting items creation in the infrastructure's catalogues<br />
</source><br />
This resource is deployed at root level. It contains a list of endpoints in the following format <br />
<br />
<source lang="xml"><br />
<EndPoint><br />
<URL>....</URL><br />
<Scope>....</Scope><br />
</EndPoint><br />
</source><br />
<br />
Each pair maps a context to the URL where the portlet is deployed in that context. <br />
<br />
'''NOTE: the resource is automatically updated by the portal (since gCube 4.11).'''<br />
<br />
=== Catalogue-Resolver ===<br />
<source lang="xml"><br />
SecondaryType = ApplicationProfile<br />
Name = Catalogue-Resolver<br />
Description = Used by Catalogue Resolver for mapping VRE NAME with its SCOPE so that resolve correctly URL of kind: http://[CATALOGUE_RESOLVER_SERVLET]/[VRE_NAME]/[entity_context value]/[entity_name value]<br />
</source><br />
See wiki page at: [https://wiki.gcube-system.org/gcube/URI_Resolver#CATALOGUE_Resolver CATALOGUE_Resolver]<br />
<br />
'''NOTE: the resource is automatically updated by the Catalogue Resolver'''<br />
<br />
=== DataCatalogueMapScopesUrls ===<br />
<source lang="xml"><br />
SecondaryType = ApplicationProfile<br />
Name = DataCatalogueMapScopesUrls<br />
Description = EndPoints that map url to scope for the data catalogue portlet instances<br />
</source><br />
This resource is deployed at root level. It contains a list of "exceptions", i.e. how to manage catalogues at VO or root VO level.<br />
<br />
== Service Endpoint(s) ==<br />
=== CKanDataCatalogue ===<br />
<source lang="xml"><br />
Category = Application<br />
Name = CKanDataCatalogue<br />
Description = A Tomcat Server hosting the ckan data catalogue<br />
</source><br />
<br />
Among the other properties of the SE, these should be reported:<br />
* HostedOn (in RunTime) is the URL of the CKAN instance, e.g. ckan-d4s.d4science.org;<br />
* Username (in AccessData) is the username of the CKAN SysAdmin;<br />
* the property URL_RESOLVER, whose value is equal to the URL of the URI-RESOLVER in the context;<br />
* the encrypted property API_KEY is the API key of the CKAN SysAdmin.<br />
=== CKanDatabase ===<br />
<source lang="xml"><br />
Category = Database<br />
Name = CKanDatabase<br />
Description = A Postgres Server hosting the ckan database<br />
</source><br />
<br />
Among the other properties of the SE, these should be reported:<br />
* HostedOn (in RunTime) is the machine hosting the PostgreSQL instance CKAN uses (e.g. ckan-pg-d4s.d4science.org);<br />
* EndPoint (in AccessPoint) is the URL of the machine hosting the PostgreSQL instance CKAN uses, followed by the port number (e.g. ckan-pg-d4s.d4science.org:5432);<br />
* in AccessData please report the credentials (the password must be encrypted) of the user allowed to access the database.<br />
<br />
<br />
'''Please note that gCat needs to communicate with PostgreSQL; hence the gCat host must be enabled on the PostgreSQL installation.'''<br />
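<br />
As an illustration only, on a standard PostgreSQL installation this typically means adding an entry like the following to pg_hba.conf (database name, user and address are hypothetical):<br />
<br />
<pre><br />
# allow the gCat host to connect to the CKAN database<br />
host  ckan  ckan_user  <gcat-host-ip>/32  md5<br />
</pre><br />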
<br />
=== Enable view per VRE ===<br />
In order to enable this special view (which allows the catalogue portlet to render itself on a single organization), one should access the portal and, as administrator, enable a special custom field of the VRE. The custom field can be found, on the VRE page, under "Admin > Pages > Configuration > Site Settings > Custom Field". Set it to true to enable the view.</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Workspace_Interaction_From_R&diff=33077Workspace Interaction From R2019-12-02T16:12:26Z<p>Gianpaolo.coro: </p>
<hr />
<div>= Overview =<br />
<br />
This page reports examples to interact with the online Workspace (WS) from R.<br />
<br />
= Key features =<br />
<br />
* interaction with R<br />
* saving files and folders on the WS<br />
* downloading files and folders from the WS<br />
<br />
= Functions = <br />
<br />
* INITIAL STEP: Import the [http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/workspace_interaction.r D4Science interaction functions R container script]. '''This step is not necessary when using the RStudio instance on one of the Web portals.'''<br />
<br />
<source lang="java"><br />
#REMOTE IMPORT OF ALL FUNCTIONS - OPTIONAL<br />
source("http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/workspace_interaction.r")<br />
<br />
#SETTING USERNAME AND TOKEN - NOT NEEDED WHEN USING RSTUDIO ON THE PORTAL<br />
username<<-"gianpaolo.coro"<br />
token<<-"..." #YOUR TOKEN FOR A VRE<br />
<br />
#LISTING<br />
a<-listHomeWS() #GET THE LIST OF FOLDERS IN THE WS ROOT<br />
b<-listWS("/Home/gianpaolo.coro/Workspace/TestSAI/") #GET THE LIST OF FILES AND FOLDERS IN ONE SUB-FOLDER<br />
<br />
#DOWNLOADING<br />
remoteFile<-"/Home/gianpaolo.coro/Workspace/DataMiner/sample.xml" #REMOTE FILE TO DOWNLOAD<br />
downloadFileWS(remoteFile) #DOWNLOAD THE FILE LOCALLY<br />
<br />
folder<-"/Home/gianpaolo.coro/Workspace/TestSAI" #REMOTE FOLDER TO DOWNLOAD<br />
downloadFolderWS(folder) #DOWNLOAD THE FOLDER CONTENT LOCALLY<br />
<br />
#UPLOADING<br />
wsfolder<-"/Home/gianpaolo.coro/Workspace/TestUploads" #REMOTE DESTINATION FOLDER<br />
file="userconfig.csv" #LOCAL FILE TO UPLOAD<br />
overwrite<-T #CHOOSE IF THE FILE SHOULD BE OVERWRITTEN<br />
q<-uploadWS(wsfolder,file,overwrite) #UPLOAD THE FILE TO THE WS<br />
<br />
#UPLOADING THE COMPLETE LOCAL R WORKSPACE ONTO THE E-INFRA WS<br />
uploadAllWS(wsfolder)<br />
<br />
<br />
#OBTAINING A PUBLIC URL FOR A FILE (FOR NON-VRE FOLDERS)<br />
remotefile<-"/Home/gianpaolo.coro/Workspace/splist.txt"<br />
publicURL<-getPublicFileLinkWS(remotefile)<br />
<br />
#UPLOAD TO THE ROOT OF THE VRE FOLDER CORRESPONDING TO THE USER TOKEN<br />
outcome<-uploadToVREFolder("",'sampletext2.txt',T,F)<br />
#UPLOAD TO A SUBFOLDER OF THE VRE FOLDER<br />
outcome<-uploadToVREFolder("Samples/",'sampletext2.txt',T,F)<br />
#UPLOAD TO A SUBSUBFOLDER OF THE VRE FOLDER<br />
outcome<-uploadToVREFolder("Samples/TextSamples/",'sampletext2.txt',T,F)<br />
<br />
#GET FILE LINK FROM THE ROOT VRE FOLDER<br />
link<-getPublicFileLinkVREFolder('sampletext2.txt')<br />
<br />
#GET FILE LINK FROM A FILE IN A SUBFOLDER OF THE VRE FOLDER<br />
link<-getPublicFileLinkVREFolder('Samples/sampletext2.txt')<br />
<br />
#DOWNLOAD FILE FROM THE ROOT VRE FOLDER<br />
downloadFromVREFolder('sampletext2.txt')<br />
<br />
#DOWNLOAD FILE FROM A SUBFOLDER OF THE ROOT VRE FOLDER<br />
downloadFromVREFolder('Samples/sampletext2.txt')<br />
<br />
</source></div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=StorageHub_REST_API&diff=32871StorageHub REST API2019-10-01T13:15:37Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
= Overview =<br />
<br />
The StorageHub API components provide simple access to the StorageHub service. <br />
<br />
It is conceived to support both Java and REST-based calls. <br />
<br />
== Dependencies ==<br />
<br />
=== Maven coordinates ===<br />
<br />
<source lang="java"><br />
<dependency><br />
<groupId>org.gcube.common</groupId><br />
<artifactId>storagehub-client-library</artifactId><br />
<version>[1.0.0,2.0.0)</version><br />
</dependency><br />
</source><br />
<br />
= Key features =<br />
Users must use their personal token to access the REST interface. They can access only their own files and the folders shared with them.<br />
<br />
'''StorageHub REST interface''' supports the following operations:<br />
* '''Retrieve WS''': to retrieve the user Workspace;<br />
* '''Folder Listing''': to list the content of a folder;<br />
* '''Retrieve VRE Folder''': to retrieve the VREFolder related to the token;<br />
* '''Find''': to find a file or a folder by name or pattern;<br />
* '''Delete''': to remove a file or a folder (including subfolders);<br />
* '''Download''': to download a file or a folder in ZIP format;<br />
* '''Get Public Link''': to get a public link of a file;<br />
* '''Create Folder''': to create a folder in the given parent folder;<br />
* '''Unzip''': to upload a zip file in a specific folder;<br />
* '''Upload file''': to upload a file in a folder;<br />
* '''Versions''': to get a specific version of a file.<br />
<br />
== API ==<br />
<br />
== Retrieve Workspace ==<br />
<br />
Returns the Item representing the workspace of the user retrieved by the token used for the call.<br />
<br />
=== Java ===<br />
<source lang="java"><br />
import org.gcube.common.storagehub.client.dsl.StorageHubClient;<br />
import org.gcube.common.storagehub.client.dsl.FolderContainer;<br />
<br />
<br />
// retrieve the root folder of the user's workspace<br />
StorageHubClient shc = new StorageHubClient();<br />
FolderContainer rootContainer = shc.getWSRoot();<br />
</source><br />
<br />
=== REST API ===<br />
<br />
GET /workspace/?gcube-token={user-token}<br />
<br />
==== responses ====<br />
<br />
200 The workspace root item (in json format) is returned. <br />
<br />
500 The error is specified in the body of the response message<br />
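<br />
==== CURL Example ====<br />
<br />
A hypothetical invocation, following the notation of the CURL example further below (protocol, host and port depend on the specific deployment):<br />
<br />
curl protocol://host:port/storagehub/workspace/?gcube-token={user-token}<br />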
<br />
== Get Item ById ==<br />
<br />
=== Java ===<br />
<source lang="java"><br />
import org.gcube.common.storagehub.client.dsl.FileContainer;<br />
import org.gcube.common.storagehub.client.dsl.StorageHubClient;<br />
import org.gcube.common.storagehub.model.items.Item;<br />
<br />
...<br />
<br />
String theId = "{itemIdentifier}"; // the identifier of the item <br />
StorageHubClient shc = new StorageHubClient();<br />
FileContainer fileContainer = shc.open(theId).asFile();<br />
</source><br />
<br />
== List Item Versions, Get Item Version ==<br />
<br />
=== Java ===<br />
<source lang="java"><br />
import org.gcube.common.storagehub.client.dsl.FileContainer;<br />
import org.gcube.common.storagehub.client.dsl.StorageHubClient;<br />
import org.gcube.common.storagehub.model.items.Item;<br />
import org.gcube.common.storagehub.model.service.Version;<br />
<br />
...<br />
<br />
String theId = "{itemIdentifier}"; // the identifier of the item <br />
StorageHubClient shc = new StorageHubClient();<br />
FileContainer fileContainer = shc.open(theId).asFile();<br />
List<Version> fileVersions = fileContainer.getVersions();<br />
<br />
//to download a specific version (versionName is the name of one of the versions retrieved above)<br />
fileContainer.downloadSpecificVersion(versionName);<br />
</source><br />
<br />
== Folder Listing ==<br />
<br />
Returns the content of a Folder<br />
<br />
=== Java ===<br />
<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
FolderContainer folderContainer = shc.open("{folderIdentifier}").asFolder();<br />
List<? extends Item> items = folderContainer.list().getItems();<br />
</source><br />
<br />
=== REST API ===<br />
<br />
GET /workspace/items/{folder-identifier}/children?gcube-token={userToken}<br />
<br />
==== responses ====<br />
<br />
200 the list of the Items (in JSON format)<br />
<br />
500 The error is specified in the body of the response message<br />
<br />
== Retrieve VRE Folder ==<br />
<br />
Returns the Item representing the root of the VRE folder related to the user token.<br />
<br />
=== Java ===<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
FolderContainer rootContainer = shc.openVREFolder();<br />
</source><br />
<br />
=== REST API ===<br />
<br />
GET /workspace/vrefolder?gcube-token={user-token}<br />
<br />
==== responses ====<br />
<br />
200 The VRE folder root item (in json format) is returned. <br />
<br />
500 The error is specified in the body of the response message<br />
<br />
== Create Folder ==<br />
<br />
Creates a new folder under another folder (specified by its id)<br />
<br />
=== Java ===<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
FolderContainer root = shc.getWSRoot();<br />
//Creating the folder on the root workspace folder<br />
root.newFolder("name", "description");<br />
</source><br />
<br />
=== REST API ===<br />
<br />
POST /workspace/items/{destination-folder-id}/create/FOLDER?gcube-token={user-token} <br />
Content-Type: application/x-www-form-urlencoded <br />
{String name, String description, boolean hidden}<br />
<br />
<br />
==== responses ====<br />
<br />
200 The Folder item identifier is returned. <br />
<br />
500 The error is specified in the body of the response message<br />
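<br />
==== CURL Example ====<br />
<br />
A hypothetical invocation (parameter values are illustrative):<br />
<br />
curl -d "name=myfolder" -d "description=my folder" -d "hidden=false" protocol://host:port/storagehub/workspace/items/{destination-folder-id}/create/FOLDER?gcube-token={token}<br />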
<br />
== Upload File ==<br />
<br />
Creates a new file under a folder (specified by its id)<br />
<br />
=== Java ===<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
FolderContainer root = shc.getWSRoot();<br />
//Creating the folder on the root workspace folder<br />
FileContainer file = null;<br />
try(InputStream is = new FileInputStream(new File("{file-to-upload}"))){<br />
file = root.uploadFile(is, "name", "description");<br />
} catch (Exception e) {<br />
//print the error<br />
}<br />
</source><br />
<br />
== Upload Archive ==<br />
<br />
Extracts the content of the specified archive into a destination folder (specified by its id).<br />
<br />
=== Java ===<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
FolderContainer root = shc.getWSRoot();<br />
//Creating the folder on the root workspace folder<br />
FileContainer file = null;<br />
try(InputStream is = new FileInputStream(new File("{file-to-upload}"))){<br />
file = root.uploadArchive(is, "parentFolderName");<br />
} catch (Exception e) {<br />
//print the error<br />
}<br />
</source><br />
<br />
<br />
=== REST API ===<br />
<br />
POST /workspace/items/{destination-folder-id}/create/ARCHIVE?gcube-token={user-token}<br />
Content-Type: multipart/form-data<br />
{String parentFolderName, File file}<br />
<br />
==== CURL Example ====<br />
<br />
curl -F "parentFolderName=hspentest.csv" -F "file=@/home/lucio/Downloads/hspen.csv" protocol://host:port/storagehub/workspace/items/{folder-destination-id}/create/ARCHIVE?gcube-token={token}<br />
<br />
== Get Public Url of a File ==<br />
<br />
Returns the public URL of a file<br />
<br />
=== Java ===<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
shc.open("{item-id}").asFile().getPublicLink();<br />
<br />
//returns the public link for a specific version of the file <br />
shc.open("{item-id}").asFile().getPublicLink("{file-version}");<br />
<br />
</source><br />
<br />
=== REST API ===<br />
<br />
GET /workspace/items/{item-id}/publiclink?gcube-token={user-token}&version={file-version} //version is optional<br />
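<br />
==== CURL Example ====<br />
<br />
A hypothetical invocation (the version parameter can be omitted to get the link of the latest version):<br />
<br />
curl "protocol://host:port/storagehub/workspace/items/{item-id}/publiclink?gcube-token={user-token}&version={file-version}"<br />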
<br />
== Find By Name ==<br />
<br />
Returns all the items whose name matches a pattern, on the first level of a given folder. <br />
The result will be empty in case none of the items in the folder matches the pattern.<br />
A wildcard (*) can be used at the start or at the end of the pattern (e.g. ends with = *{name}, starts with = {name}*). <br />
<br />
=== Java ===<br />
<source lang="java"><br />
StorageHubClient shc = new StorageHubClient();<br />
//getting my root workspace folder<br />
FolderContainer myRoot = shc.getWSRoot();<br />
myRoot.findByName("{pattern}");<br />
</source><br />
<br />
=== REST API ===<br />
<br />
GET /workspace/items/{parent-folder-id}/items/{pattern}?gcube-token={user-token} <br />
<br />
==== responses ====<br />
<br />
200 A JSON with the retrieved items or an empty itemList. <br />
<br />
500 The error is specified in the body of the response message<br />
<br />
= DataMiner and SAI Interactions =<br />
Algorithms created in [[Statistical_Algorithms_Importer|SAI]] and executed by [[DataMiner_Manager|DataMiner]] can interact with StorageHub through the StorageHub REST APIs. <br />
Here are some examples:<br />
* [[Statistical_Algorithms_Importer:_Java_Project_FAQ#StorageHub|StorageHub Facility Java]]<br />
* [[Statistical_Algorithms_Importer:_Python_Project_FAQ#StorageHub|StorageHub Facility Python]]<br />
<br />
= R Client =<br />
An R client implementing the above functions is available [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/workspace_interaction.r here]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Java_Project&diff=31513Statistical Algorithms Importer: Java Project2018-10-22T10:52:48Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
This page explains how to create a Java project using two alternative approaches: [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#Black_Box_Integration '''Black-box'''] and [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#White_Box_Integration '''White-box'''] integration. The next sections explain how these approaches work and which cases they address.<br />
<br />
=Black Box Integration=<br />
<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox0.png|thumb|center|250px|Java Project, SAI]]<br />
<br />
This is the preferred way for developers who want their process executions distributed based on the load of the requests. Each process request will run on one dedicated machine and is allowed to use multi-core processing. Black-box processes usually do not use the e-Infrastructure resources but "live on their own". '''The [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]] portlet must be used for this integration'''.<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox1.png|thumb|center|750px|Java Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .jar file)<br />
:'''Important: the full class path (including the package path) should be indicated as the FIRST parameter. It should also be indicated as a System parameter so that it appears neither in the GUI nor among the user's inputs.'''<br />
:For example, the default value of the ClassToRun parameter would be '''org.gcube.dataanalysis.SimpleProducer''' should the package of the SimpleProducer class be org.gcube.dataanalysis. If the package is the default one, there is no need for this specification (as in the example).<br />
[[File:StatisticalAlgorithmsImporter_JavaBlackBox2b.png|thumb|center|750px|Java I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Java version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox3.png|thumb|center|750px|Java Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox4.png|thumb|center|750px|Java Create, SAI]]<br />
<br />
== Example Code ==<br />
:Java code in sample:<br />
<br />
<source lang='java'><br />
/**<br />
* <br />
* @author Giancarlo Panichi<br />
* <br />
*<br />
*/<br />
import java.io.File;<br />
import java.io.FileWriter;<br />
<br />
public class SimpleProducer<br />
{<br />
public static void main(String[] args)<br />
{<br />
try<br />
{<br />
FileWriter fw = new FileWriter(new File("program.txt"));<br />
fw.write("Check: " + args[0]);<br />
fw.close();<br />
}<br />
catch (Exception e)<br />
{<br />
e.printStackTrace();<br />
}<br />
}<br />
}<br />
</source><br />
<br />
==Example Download==<br />
[[File:JavaBlackBox.zip|JavaBlackBox.zip]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
At each run of the process the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv), which contains the following global variables that are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
"globalvariable","globalvalue"<br />
"gcube_username","gianpaolo.coro"<br />
"gcube_context","/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab"<br />
"gcube_token","1234-567-890"<br />
</source><br />
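<br />
As an illustration, a black-box Java process could read these variables with a minimal sketch like the following (a hypothetical helper, not part of any gCube library; it assumes the simple quoted CSV format shown above):<br />
<br />
<source lang='java'><br />
import java.io.IOException;<br />
import java.nio.file.Files;<br />
import java.nio.file.Paths;<br />
import java.util.HashMap;<br />
import java.util.List;<br />
import java.util.Map;<br />
<br />
public class GlobalVariablesReader {<br />
    // Reads ./globalvariables.csv into a map: variable name -> value<br />
    public static Map<String, String> read() throws IOException {<br />
        Map<String, String> vars = new HashMap<>();<br />
        List<String> lines = Files.readAllLines(Paths.get("globalvariables.csv"));<br />
        for (String line : lines.subList(1, lines.size())) { // skip the header line<br />
            String[] fields = line.split("\",\"");<br />
            if (fields.length == 2) {<br />
                vars.put(fields[0].replace("\"", ""), fields[1].replace("\"", ""));<br />
            }<br />
        }<br />
        return vars;<br />
    }<br />
<br />
    public static void main(String[] args) throws IOException {<br />
        Map<String, String> vars = read();<br />
        System.out.println("Running as " + vars.get("gcube_username")<br />
                + " in " + vars.get("gcube_context"));<br />
    }<br />
}<br />
</source><br />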
<br />
=White Box Integration=<br />
This is the preferred way for developers who want their processes to fully exploit the e-Infrastructure resources, for example to implement cloud computing using the e-Infrastructure computational nodes. This integration modality also allows the Java data mining frameworks integrated by DataMiner to be fully reused, i.e. [https://www.knime.com/ Knime], [https://rapidminer.com/ RapidMiner], [https://www.cs.waikato.ac.nz/ml/weka/ Weka], [https://wiki.gcube-system.org/gcube/Statistical_Manager_Algorithms gCube EcologicalEngine]. '''The Eclipse IDE should be used for this integration'''.<br />
<br />
[https://gcube.wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner Step-by-step guide to integrate Java processes as white boxes]<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Knime-Workflow_Project&diff=31511Statistical Algorithms Importer: Knime-Workflow Project2018-10-19T16:58:43Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create a Knime-Workflow project using the [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]] portlet.<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox0.png|thumb|center|250px|Knime Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox1.png|thumb|center|750px|Knime Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .knwf file)<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox2.png|thumb|center|750px|Knime I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Knime version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox3.png|thumb|center|750px|Knime Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox4.png|thumb|center|750px|Knime Create, SAI]]<br />
<br />
==Example Download==<br />
[[File:KnimeBlackBox.zip|KnimeBlackBox.zip]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
At each run of the process the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv), which contains the following global variables that are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
"globalvariable","globalvalue"<br />
"gcube_username","gianpaolo.coro"<br />
"gcube_context","/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab"<br />
"gcube_token","1234-567-890"<br />
</source><br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Java_Project_FAQ&diff=31389Statistical Algorithms Importer: Java Project FAQ2018-06-18T15:15:14Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
F.A.Q. of the Statistical Algorithms Importer (SAI); here are common mistakes we have found in Java projects.<br />
<br />
== Main Class ==<br />
The first parameter of a Java process is the main class that will be executed. This integration is for Java processes that accept arguments as a list; in other words, a Java process will be invoked as:<br />
<br />
*java -jar <project jar> arg1 arg2 arg3...<br />
<br />
Please note that the full package name must be entered as the default value, for example:<br />
<br />
*org.d4science.projectx.XClass<br />
<br />
If your process works like an executable that requires some parameters, you can consider using the pre-installed software integration, which allows you to write a shell script through which you can build the invocation to your executable (it is added as a resource of the project): <br />
*[[Statistical Algorithms Importer: Pre-Installed Project|Pre-Installed Project]]<br />
*[[Statistical Algorithms Importer: Pre-Installed Project FAQ|Pre-Installed Project FAQ]]<br />
<br />
== How to Use File Input ==<br />
:Add input file parameter in Java project:<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox_FileInputParameter.png|thumb|center|750px|File Input Parameter, SAI]]<br />
<br />
:Java source code in sample:<br />
<br />
<source lang='java'><br />
package org.myfactory.specialgroup;<br />
<br />
import java.nio.file.Files;<br />
import java.nio.file.Path;<br />
import java.nio.file.Paths;<br />
import java.nio.file.StandardCopyOption;<br />
<br />
/**<br />
* <br />
* @author Giancarlo Panichi<br />
*<br />
*/<br />
public class FileConsumer {<br />
<br />
public static void main(String[] args) {<br />
try {<br />
Path fileInput=Paths.get(args[0]); //second parameter in the project<br />
Path fileOutput=Paths.get("output.txt");<br />
Files.copy(fileInput, fileOutput, StandardCopyOption.REPLACE_EXISTING);<br />
<br />
} catch (Throwable e) {<br />
System.out.println("Error in process: " + e.getLocalizedMessage());<br />
e.printStackTrace();<br />
}<br />
}<br />
<br />
}<br />
</source><br />
<br />
:Java code in sample:<br />
[[File:JavaBlackBox_FileInputPrameter.zip|JavaBlackBox_FileInputPrameter.zip]]<br />
<br />
== How to Use Enumerated Input == <br />
:Consider the Las Vegas algorithm:<br />
[[Image:StatisticalAlgorithmsImporter_LasVegas-0.png|thumb|center|750px|Las Vegas Info, SAI]]<br />
<br />
:Indicates the java version:<br />
[[Image:StatisticalAlgorithmsImporter_LasVegas-1.png|thumb|center|750px|Las Vegas Interpreter, SAI]]<br />
<br />
:Indicates the I/O parameters:<br />
[[Image:StatisticalAlgorithmsImporter_LasVegas-2.png|thumb|center|750px|Las Vegas I/O Parameters, SAI]]<br />
<br />
:DataMiner result:<br />
[[Image:StatisticalAlgorithmsImporter_LasVegas-3.png|thumb|center|750px|Las Vegas on DataMiner, SAI]]<br />
<br />
<br />
:Java source code in Las Vegas:<br />
<br />
<source lang='java'><br />
package org.myfactory.specialgroup;<br />
<br />
import java.io.IOException;<br />
import java.nio.file.Files;<br />
import java.nio.file.Path;<br />
import java.nio.file.Paths;<br />
import java.nio.file.StandardOpenOption;<br />
import java.util.function.ToIntFunction;<br />
import java.util.stream.Stream; <br />
<br />
/**<br />
* <br />
* Las Vegas<br />
* <br />
* @author Giancarlo Panichi<br />
*<br />
*/<br />
public class Casino {<br />
<br />
private static String game;<br />
private static boolean bluff;<br />
private static Path betsFile;<br />
<br />
private static void init(String g, String f, String b) {<br />
game = g;<br />
betsFile = Paths.get(f);<br />
<br />
try {<br />
bluff = Boolean.valueOf(b);<br />
} catch (Exception e) {<br />
bluff = false;<br />
}<br />
}<br />
<br />
private static ToIntFunction<String> play = new ToIntFunction<String>() {<br />
<br />
@Override<br />
public int applyAsInt(String beat) {<br />
Integer b = 0;<br />
try {<br />
b = Integer.valueOf(beat);<br />
} catch (NumberFormatException e) {<br />
<br />
}<br />
<br />
Integer winnings = 0;<br />
if (b > 0) {<br />
winnings = playSpecificGame(b);<br />
}<br />
<br />
return winnings;<br />
}<br />
};<br />
<br />
private static Integer playSpecificGame(Integer beat) {<br />
Integer winnings = 0;<br />
int factor = 0;<br />
switch (game) {<br />
case "slots":<br />
// 4<br />
factor = (int) (Math.random() * 4);<br />
if (factor > 2) {<br />
winnings = beat * 4;<br />
} else {<br />
winnings = 0;<br />
}<br />
break;<br />
<br />
case "roulette":<br />
// 38<br />
factor = (int) (Math.random() * 38);<br />
if (factor > 19) {<br />
winnings = beat * 38;<br />
} else {<br />
winnings = 0;<br />
}<br />
break;<br />
<br />
case "poker":<br />
// 52<br />
factor = (int) (Math.random() * 52);<br />
if (factor > 26 || (bluff && factor > 13)) {<br />
winnings = beat * 52;<br />
} else {<br />
winnings = 0;<br />
}<br />
break;<br />
default:<br />
winnings = 0;<br />
break;<br />
<br />
}<br />
<br />
return winnings;<br />
}<br />
<br />
public static void main(String[] args) {<br />
<br />
try {<br />
System.out.println("Las Vegas");<br />
System.out.println("Game: " + args[0]);<br />
System.out.println("Bets: " + args[1]);<br />
System.out.println("Bluff: " + args[2]);<br />
<br />
init(args[0], args[1],args[2]);<br />
Integer winnings = 0;<br />
<br />
// read stream<br />
try (Stream<String> stream = Files.lines(betsFile)) {<br />
winnings = stream.mapToInt(play::applyAsInt).sum();<br />
} catch (IOException e) {<br />
System.out.println("Error reading the file: " + e.getLocalizedMessage());<br />
e.printStackTrace();<br />
}<br />
<br />
String resultString = "You Won: " + winnings;<br />
<br />
Path result = Paths.get("win.txt");<br />
Files.write(result, resultString.getBytes(), StandardOpenOption.CREATE);<br />
<br />
} catch (Throwable e) {<br />
System.out.println("Error in process: " + e.getLocalizedMessage());<br />
e.printStackTrace();<br />
}<br />
}<br />
<br />
}<br />
</source><br />
<br />
:bets.txt<br />
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;"><br />
100<br />
50<br />
40<br />
200<br />
10<br />
30<br />
400<br />
</pre><br />
<br />
:result in win.txt: <br />
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;"><br />
You Won: 43160<br />
</pre><br />
<br />
:Java code in Las Vegas:<br />
[[File:JavaBlackBox_LasVegas.zip|JavaBlackBox_LasVegas.zip]]<br />
<br />
== StorageHub == <br />
:StorageHub is the new service for accessing the user's workspace. Below we show the StorageHubFacilityJava algorithm, which exhibits the interactions with StorageHub through its REST API:<br />
[[Image:StorageHubFacilityJava0.png|thumb|center|750px|StorageHub Facility Java, SAI]]<br />
<br />
:Indicate the I/O parameters:<br />
[[Image:StorageHubFacilityJava1.png|thumb|center|750px|StorageHub Facility Java I/O parameters, SAI]]<br />
<br />
:Indicate the Java version:<br />
[[Image:StorageHubFacilityJava2.png|thumb|center|750px|StorageHub Facility Java version, SAI]]<br />
<br />
:Indicate the code jar:<br />
[[Image:StorageHubFacilityJava3.png|thumb|center|750px|StorageHub Facility Java Code, SAI]]<br />
<br />
<br />
:DataMiner result:<br />
[[Image:StorageHubFacilityJava4.png|thumb|center|750px|StorageHub Facility Java in DataMiner, SAI]]<br />
<br />
:This algorithm shows 5 types of interactions with StorageHub:<br />
:* Get Root Info<br />
:* Get Item Info (requires an itemId as argument1)<br />
:* Get Root Children<br />
:* Get Item Children (requires an itemId as argument1)<br />
:* Item Download (requires an itemId as argument1)<br />
<br />
:Java source code of StorageHubFacilityJava:<br />
:[[File:StorageHubFacilityJava.zip|StorageHubFacilityJava.zip]]<br />
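<br />
:As an illustrative sketch only, the Get Root Info interaction boils down to an authenticated HTTP GET. The base URL below is a hypothetical placeholder: the actual StorageHub endpoint must be discovered in your infrastructure (e.g. through the Information System):<br />
<source lang='java'><br />
import java.io.BufferedReader;<br />
import java.io.InputStreamReader;<br />
import java.net.HttpURLConnection;<br />
import java.net.URL;<br />
<br />
public class StorageHubRootInfo {<br />
	public static void main(String[] args) throws Exception {<br />
		// hypothetical placeholder: replace with the StorageHub base URL of your infrastructure<br />
		String base = "https://<storagehub-endpoint>/workspace";<br />
		String token = "1234-567-890"; // your gcube_token<br />
		URL url = new URL(base + "?gcube-token=" + token);<br />
		HttpURLConnection conn = (HttpURLConnection) url.openConnection();<br />
		conn.setRequestMethod("GET");<br />
		BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));<br />
		String line;<br />
		while ((line = in.readLine()) != null) {<br />
			System.out.println(line); // JSON description of the workspace root<br />
		}<br />
		in.close();<br />
	}<br />
}<br />
</source><br />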
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Java_Project&diff=31281Statistical Algorithms Importer: Java Project2018-05-16T14:44:42Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
This page explains how to create a Java project using two alternative approaches: [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#Black_Box_Integration '''Black-box'''] and [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#White_Box_Integration '''White-box'''] integration. The next sections explain how these approaches work and which cases they address.<br />
<br />
=Black Box Integration=<br />
<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox0.png|thumb|center|250px|Java Project, SAI]]<br />
<br />
This is the preferred way for developers who want their process executions distributed based on the load of the requests. Each process request will run on one dedicated machine and is allowed to use multi-core processing. Black-box processes usually do not use the e-Infrastructure resources but "live on their own". '''The Statistical Algorithms Importer (SAI) portlet must be used for this integration'''.<br />
<br />
==Project Configuration==<br />
:Define the project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox1.png|thumb|center|750px|Java Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .jar file)<br />
:'''Important: the full class path (including the package path) should be indicated as the FIRST parameter. It should also be marked as a System parameter, so that it appears neither in the GUI nor among the user's inputs.'''<br />
:For example, the default value of the ClassToRun parameter would be '''org.gcube.dataanalysis.SimpleProducer''' if the package of the SimpleProducer class were org.gcube.dataanalysis. If the package is the "default" one, as in the example below, this specification is not needed.<br />
[[File:StatisticalAlgorithmsImporter_JavaBlackBox2b.png|thumb|center|750px|Java I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Java version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox3.png|thumb|center|750px|Java Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox4.png|thumb|center|750px|Java Create, SAI]]<br />
<br />
== Example Code ==<br />
:Sample Java code:<br />
<br />
<source lang='java'><br />
/**<br />
* <br />
* @author Giancarlo Panichi<br />
* <br />
*<br />
*/<br />
import java.io.File;<br />
import java.io.FileWriter;<br />
<br />
public class SimpleProducer<br />
{<br />
public static void main(String[] args)<br />
{<br />
try<br />
{<br />
FileWriter fw = new FileWriter(new File("program.txt"));<br />
fw.write("Check: " + args[0]);<br />
fw.close();<br />
}<br />
catch (Exception e)<br />
{<br />
e.printStackTrace();<br />
}<br />
}<br />
}<br />
</source><br />
<br />
==Example Download==<br />
[[File:JavaBlackBox.zip|JavaBlackBox.zip]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is shown in the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
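<br />
A minimal sketch of how a black-box process could read these variables at runtime, assuming the standard CSV format shown above (the class name is illustrative):<br />
<source lang='java'><br />
import java.io.IOException;<br />
import java.nio.file.Files;<br />
import java.nio.file.Paths;<br />
import java.util.HashMap;<br />
import java.util.List;<br />
import java.util.Map;<br />
<br />
public class GlobalVariablesReader {<br />
	public static Map<String, String> read() throws IOException {<br />
		Map<String, String> vars = new HashMap<String, String>();<br />
		List<String> lines = Files.readAllLines(Paths.get("globalvariables.csv"));<br />
		// skip the header line "globalvariable,globalvalue"<br />
		for (String line : lines.subList(1, lines.size())) {<br />
			String[] kv = line.split(",", 2);<br />
			if (kv.length == 2)<br />
				vars.put(kv[0], kv[1]);<br />
		}<br />
		return vars; // e.g. vars.get("gcube_token")<br />
	}<br />
}<br />
</source><br />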
<br />
=White Box Integration=<br />
This is the preferred way for developers who want their processes to fully exploit the e-Infrastructure resources, for example to implement Cloud computing using the e-Infrastructure computational resources. This integration modality also allows developers to fully reuse the Java data mining frameworks integrated by DataMiner, i.e. [https://www.knime.com/ Knime], [https://rapidminer.com/ RapidMiner], [https://www.cs.waikato.ac.nz/ml/weka/ Weka], [https://wiki.gcube-system.org/gcube/Statistical_Manager_Algorithms gCube EcologicalEngine]. '''The Eclipse IDE should be used for this integration'''.<br />
<br />
[https://gcube.wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner Step-by-step guide to integrate Java processes as white boxes]<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=File:StatisticalAlgorithmsImporter_JavaBlackBox2b.png&diff=31280File:StatisticalAlgorithmsImporter JavaBlackBox2b.png2018-05-16T14:35:37Z<p>Gianpaolo.coro: Gianpaolo.coro uploaded a new version of File:StatisticalAlgorithmsImporter JavaBlackBox2b.png</p>
<hr />
<div></div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=File:StatisticalAlgorithmsImporter_JavaBlackBox2b.png&diff=31279File:StatisticalAlgorithmsImporter JavaBlackBox2b.png2018-05-16T14:35:13Z<p>Gianpaolo.coro: Gianpaolo.coro uploaded a new version of File:StatisticalAlgorithmsImporter JavaBlackBox2b.png</p>
<hr />
<div></div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=File:StatisticalAlgorithmsImporter_JavaBlackBox2b.png&diff=31278File:StatisticalAlgorithmsImporter JavaBlackBox2b.png2018-05-16T14:34:41Z<p>Gianpaolo.coro: </p>
<hr />
<div></div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Java_Project&diff=31277Statistical Algorithms Importer: Java Project2018-05-16T14:34:12Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
This page explains how to create a Java project using two alternative approaches: [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#Black_Box_Integration '''Black-box'''] and [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#White_Box_Integration '''White-box'''] integration. The next sections explain how these approaches work and which cases they address.<br />
<br />
=Black Box Integration=<br />
<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox0.png|thumb|center|250px|Java Project, SAI]]<br />
<br />
This is the preferred way for developers who want their process executions distributed based on the load of the requests. Each process request will run on one dedicated machine and is allowed to use multi-core processing. Black-box processes usually do not use the e-Infrastructure resources but "live on their own". '''The Statistical Algorithms Importer (SAI) portlet must be used for this integration'''.<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox1.png|thumb|center|750px|Java Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .jar file)<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox2b.png|thumb|center|750px|Java I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Java version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox3.png|thumb|center|750px|Java Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox4.png|thumb|center|750px|Java Create, SAI]]<br />
<br />
== Example Code ==<br />
:Java code in sample:<br />
<br />
<source lang='java'><br />
/**<br />
* <br />
* @author Giancarlo Panichi<br />
* <br />
*<br />
*/<br />
import java.io.File;<br />
import java.io.FileWriter;<br />
<br />
public class SimpleProducer<br />
{<br />
public static void main(String[] args)<br />
{<br />
try<br />
{<br />
FileWriter fw = new FileWriter(new File("program.txt"));<br />
fw.write("Check: " + args[0]);<br />
fw.close();<br />
}<br />
catch (Exception e)<br />
{<br />
e.printStackTrace();<br />
}<br />
}<br />
}<br />
</source><br />
<br />
==Example Download==<br />
[[File:JavaBlackBox.zip|JavaBlackBox.zip]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is shown in the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
=White Box Integration=<br />
This is the preferred way for developers who want their processes to fully exploit the e-Infrastructure resources, for example to implement Cloud computing using the e-Infrastructure computational resources. This integration modality also allows developers to fully reuse the Java data mining frameworks integrated by DataMiner, i.e. [https://www.knime.com/ Knime], [https://rapidminer.com/ RapidMiner], [https://www.cs.waikato.ac.nz/ml/weka/ Weka], [https://wiki.gcube-system.org/gcube/Statistical_Manager_Algorithms gCube EcologicalEngine]. '''The Eclipse IDE should be used for this integration'''.<br />
<br />
[https://gcube.wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner Step-by-step guide to integrate Java processes as white boxes]<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_FAQ&diff=31263Statistical Algorithms Importer: FAQ2018-05-14T10:07:25Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
F.A.Q. of the Statistical Algorithms Importer (SAI): here are the common mistakes we have found.<br />
<br />
== Project Type FAQ ==<br />
<br />
* [[Statistical Algorithms Importer: R Project FAQ|R Project FAQ]]<br />
* [[Statistical Algorithms Importer: Java Project FAQ|Java Project FAQ]]<br />
* [[Statistical Algorithms Importer: Linux-compiled Project FAQ|Linux-compiled Project FAQ]]<br />
* [[Statistical Algorithms Importer: Python Project FAQ|Python Project FAQ]]<br />
* [[Statistical Algorithms Importer: Pre-Installed Project FAQ|Pre-Installed Project FAQ]]<br />
<br />
== Installed Software ==<br />
:A list of pre-installed software on the infrastructure machines is available at this page:<br />
* [[Pre Installed Packages|Pre Installed Packages]]<br />
<br />
== Project Folder ==<br />
Each algorithm must have its own project folder. The project folder keeps the code created by the developer, so a separate folder, different for each algorithm, must be used.<br />
<br />
== Parameters ==<br />
An algorithm must always have at least one input parameter and one output parameter.<br />
<br />
== I don't see my algorithm in DataMiner ==<br />
DataMiner portlets store algorithms in the user session, so if an algorithm has been deployed but is not visible you should log out and reconnect to the portal. Remember that, after the deployment, a few minutes are needed to update the system.<br />
<br />
== Advanced Input ==<br />
It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these inputs are reported in the [[Advanced Input| Advanced Input ]] page.<br />
<br />
== Update the status of a computation ==<br />
It is possible to update the inner status of a computation by writing a status.txt file locally to the process: [[Statistical Algorithms Importer: StatusUpdate| Updating the status of a computation]]<br />
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_StatusUpdate&diff=31262Statistical Algorithms Importer: StatusUpdate2018-05-14T10:07:18Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to update the status of a process from a SAI-integrated algorithm.<br />
<br />
<br />
== Updating the status of a process ==<br />
<br />
It is sufficient to '''write a file named "status.txt"''' locally to the process, indicating '''a number from 0 to 100'''. DataMiner will transform this information into a WPS status, also visible through the status bar of the DataMiner GUI. The algorithm's status is always forced to 100 by DataMiner at the end of the computation. <br />
<br />
For example, the following R script writes a local ./status.txt file indicating its internal status. <br />
<br />
<source lang = "java"><br />
nseconds <- 60<br />
<br />
nsteps = nseconds/10<br />
<br />
for (i in 1:nsteps){<br />
status = i*100/nsteps<br />
cat("Status",status,"\n")<br />
write(status,file="status.txt")<br />
Sys.sleep(1)<br />
<br />
}<br />
<br />
output="test.txt"<br />
write(nseconds,file=output)<br />
<br />
</source><br />
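<br />
For a Java black-box process, a minimal equivalent sketch (assuming, as in the R case, that the process is run with its own folder as working directory) is:<br />
<source lang="java"><br />
import java.io.IOException;<br />
import java.nio.file.Files;<br />
import java.nio.file.Paths;<br />
import java.nio.file.StandardOpenOption;<br />
<br />
public class StatusWriter {<br />
	// overwrite ./status.txt with a number from 0 to 100<br />
	public static void update(int percent) throws IOException {<br />
		Files.write(Paths.get("status.txt"), String.valueOf(percent).getBytes(),<br />
				StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);<br />
	}<br />
}<br />
</source><br />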
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer&diff=31261Statistical Algorithms Importer2018-05-14T10:05:10Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
In this guide we describe the Statistical Algorithms Importer web interface.<br />
<br />
== Overview ==<br />
Statistical Algorithms Importer (SAI) is a tool to import algorithms in the D4Science e-Infrastructure. Currently, it supports different types of software integration. SAI separates software development from its deployment in the infrastructure in a very flexible way. After the first deployment, made in collaboration with the e-Infrastructure team, script developers can modify and update their scripts by themselves, without the intervention of the e-Infrastructure team.<br />
<br />
In order to integrate an algorithm, three main steps are required: <br />
<br />
1 - Indicate the inputs, the outputs and their types for a main script orchestrating the process<br />
<br />
2 - Create the Software: this operation creates the interface from the e-Infrastructure service to the script and should be used each time the interface (I/O), the name of the algorithm or the required additional packages change<br />
<br />
3 - Publish the Software: this operation communicates to the infrastructure that a newly created software should be put online<br />
<br />
Additionally, the Repackage function can be used when only the internal code of the orchestrating script changes and the algorithm has already been published.<br />
The pages in this Wiki explain the details of these operations. <br />
<br />
[[Image:StatisticalAlgorithmsImporter1.png|thumb|center|800px|Statistical Algorithms Importer (SAI), portlet. Main interface.]]<br />
<br />
== F.A.Q. ==<br />
Please, read our best practices first: [[Statistical Algorithms Importer: FAQ|F.A.Q.]]<br />
<br />
== Demonstration ==<br />
A demonstration video is available [http://data.d4science.org/SkhVR3AyTUNLaStCV2tNdUhsL2VIQ3AwMG1tTjVka3dHbWJQNStIS0N6Yz0 here].<br />
<br />
== Main Steps==<br />
<br />
# [[Statistical Algorithms Importer: Create Project|Creating a new Project]]<br />
# [[Statistical Algorithms Importer: Publish Algorithms|Publishing Algorithms for deployment]]<br />
# [[Statistical Algorithms Importer: Repackage| Repackaging a script]]<br />
# [[Advanced Input| Advanced Input ]]<br />
# [[Statistical Algorithms Importer: StatusUpdate| Updating the status of a computation]]<br />
# [[Statistical Algorithms Importer: FAQ|F.A.Q.]]<br />
# [[Statistical_Algorithms_Importer:_R_Project#Import_Resources_From_GitHub | Import projects from GitHub ]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_StatusUpdate&diff=31260Statistical Algorithms Importer: StatusUpdate2018-05-14T10:03:57Z<p>Gianpaolo.coro: Created page with "{| align="right" ||__TOC__ |} :This page explains how to update the status of a process from a SAI-integrated algorithm. == Updating the status of a process == It is suff..."</p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to update the status of a process from a SAI-integrated algorithm.<br />
<br />
<br />
== Updating the status of a process ==<br />
<br />
It is sufficient to '''write a file named "status.txt"''' in the algorithm's process folder, indicating '''a number from 0 to 100'''. DataMiner will transform this information into a WPS status, also visible through the status bar of the DataMiner GUI. The algorithm's status is always forced to 100 by DataMiner at the end of the computation. <br />
<br />
For example, the following R script writes a local ./status.txt file indicating its internal status. <br />
<br />
<source lang = "java"><br />
nseconds <- 60<br />
<br />
nsteps = nseconds/10<br />
<br />
for (i in 1:nsteps){<br />
status = i*100/nsteps<br />
cat("Status",status,"\n")<br />
write(status,file="status.txt")<br />
Sys.sleep(1)<br />
<br />
}<br />
<br />
output="test.txt"<br />
write(nseconds,file=output)<br />
<br />
</source><br />
<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=How_to_Interact_with_the_DataMiner_by_client&diff=31065How to Interact with the DataMiner by client2018-04-09T09:29:49Z<p>Gianpaolo.coro: </p>
<hr />
<div><!-- CATEGORIES --><br />
[[Category:Developer's Guide]]<br />
<!-- END CATEGORIES --><br />
=Prerequisites=<br />
One of the following software packages is required:<br />
<br />
* Firefox or Google Chrome Web browser<br />
* IDE: Eclipse Java EE IDE for Web Developers. Version: 3.7+<br />
* R 3.3.1<br />
* QGIS<br />
<br />
=Introduction=<br />
Here we show how to invoke an algorithm residing on the DataMiner (DM) from outside an e-Infrastructure or from a client.<br />
<br />
=Notes on Authorization=<br />
The username required to use our clients is the one of the Web portal, e.g. john.smith, whereas the authorization token identifies the VRE to interact with. The token must be generated through the "Service Authorization Token" tool present in the VRE home page; the same panel also displays the user name.<br />
<br />
=HTTP interface=<br />
Each DM execution has an associated HTTP link that can be obtained by pressing the "Show" button on the [https://wiki.gcube-system.org/gcube/DataMiner_Manager#Execute_an_Experiment DM interface during the execution of an experiment]. By changing the parameters after the DataInputs parameter it is possible to launch new computations. Note that a gCube token is required.<br />
<br />
Examples of WPS requests to the DataMiner D4Science cluster:<br />
<br />
http://dataminer.d4science.org/wps/WebProcessingService?Request=GetCapabilities&Service=WPS&gcube-token=<VRE token><br />
<br />
http://dataminer.d4science.org/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<VRE token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.clusterers.DBSCAN<br />
<br />
http://dataminer.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=<VRE token>&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.clusterers.DBSCAN&DataInputs=OccurrencePointsClusterLabel=OccClustersTest;epsilon=10;min_points=1;OccurrencePointsTable=http://goo.gl/VDzpch;FeaturesColumnNames=depthmean|sstmnmax|salinitymean;<br />
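<br />
As a minimal sketch, the GetCapabilities request above can be issued from plain Java as follows (replace the token placeholder with your VRE token; the class name is illustrative):<br />
<source lang="java"><br />
import java.io.BufferedReader;<br />
import java.io.InputStreamReader;<br />
import java.net.URL;<br />
import java.net.URLConnection;<br />
<br />
public class GetCapabilitiesExample {<br />
	public static void main(String[] args) throws Exception {<br />
		String token = "<VRE token>"; // placeholder<br />
		URL url = new URL("http://dataminer.d4science.org/wps/WebProcessingService"<br />
				+ "?Request=GetCapabilities&Service=WPS&gcube-token=" + token);<br />
		URLConnection conn = url.openConnection();<br />
		BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));<br />
		String line;<br />
		while ((line = rd.readLine()) != null)<br />
			System.out.println(line); // XML listing the available processes<br />
		rd.close();<br />
	}<br />
}<br />
</source><br />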
<br />
=Java Client=<br />
The [https://wiki.52north.org/Geoprocessing/ClientAPI 52North WPS Client] can be used to interact with DataMiner. Note that interactions should use either basic HTTP authentication or add the gcube-token parameter to the requests.<br />
<br />
=Plain Java Client=<br />
<br />
Here is an example of a Java call that invokes an algorithm asynchronously through a [http://data.d4science.org/S0x1VzJ5YXZaOHNQc3NicTZ1SkpGa01VcEkrcVZmbWdHbWJQNStIS0N6Yz0 template file]<br />
<br />
<source lang="java"><br />
package it.test;<br />
<br />
import java.io.BufferedReader;<br />
import java.io.File;<br />
import java.io.FileReader;<br />
import java.io.IOException;<br />
import java.io.InputStream;<br />
import java.io.InputStreamReader;<br />
import java.io.OutputStream;<br />
import java.io.OutputStreamWriter;<br />
import java.io.Reader;<br />
import java.io.StringWriter;<br />
import java.io.Writer;<br />
import java.net.HttpURLConnection;<br />
import java.net.ProtocolException;<br />
import java.net.URL;<br />
import java.net.URLConnection;<br />
<br />
public class InvokeDataMinerViaPost {<br />
<br />
static String dataMinerUrl = "http://dataminer-prototypes.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0";<br />
static String token = "your-token-here";<br />
<br />
private static void pipe(Reader reader, Writer writer) throws IOException {<br />
char[] buf = new char[1024];<br />
int read = 0;<br />
while ((read = reader.read(buf)) >= 0) {<br />
writer.write(buf, 0, read);<br />
}<br />
writer.flush();<br />
}<br />
<br />
public static void postData(Reader data, URL endpoint, Writer output) throws Exception {<br />
HttpURLConnection urlc = null;<br />
try {<br />
urlc = (HttpURLConnection) endpoint.openConnection();<br />
try {<br />
urlc.setRequestMethod("POST");<br />
} catch (ProtocolException e) {<br />
throw new Exception("Shouldn't happen: HttpURLConnection doesn't support POST??", e);<br />
}<br />
urlc.setDoOutput(true);<br />
urlc.setDoInput(true);<br />
urlc.setUseCaches(false);<br />
urlc.setAllowUserInteraction(false);<br />
urlc.setRequestProperty("Content-type", "text/xml; charset=" + "UTF-8");<br />
<br />
OutputStream out = urlc.getOutputStream();<br />
<br />
try {<br />
Writer writer = new OutputStreamWriter(out, "UTF-8");<br />
pipe(data, writer);<br />
writer.close();<br />
} catch (IOException e) {<br />
throw new Exception("IOException while posting data", e);<br />
} finally {<br />
if (out != null)<br />
out.close();<br />
}<br />
<br />
InputStream in = urlc.getInputStream();<br />
try {<br />
Reader reader = new InputStreamReader(in);<br />
pipe(reader, output);<br />
reader.close();<br />
} catch (IOException e) {<br />
throw new Exception("IOException while reading response", e);<br />
} finally {<br />
if (in != null)<br />
in.close();<br />
}<br />
<br />
} catch (IOException e) {<br />
throw new Exception("Connection error (is server running at " + endpoint + " ?): " + e);<br />
} finally {<br />
if (urlc != null)<br />
urlc.disconnect();<br />
}<br />
}<br />
<br />
public static String getStatus(String endpoint) {<br />
String result = null;<br />
<br />
// Send a GET request to the servlet<br />
try {<br />
// Send data<br />
String urlStr = endpoint;<br />
<br />
URL url = new URL(urlStr);<br />
URLConnection conn = url.openConnection();<br />
conn.setConnectTimeout(120000);<br />
conn.setReadTimeout(120000);<br />
<br />
// Get the response<br />
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));<br />
StringBuffer sb = new StringBuffer();<br />
String line;<br />
while ((line = rd.readLine()) != null) {<br />
sb.append(line);<br />
}<br />
rd.close();<br />
result = sb.toString();<br />
} catch (Exception e) {<br />
e.printStackTrace();<br />
}<br />
<br />
return result;<br />
}<br />
<br />
<br />
<br />
public static void main(String args[]) throws Exception{<br />
String template = "templateDBScan.txt";<br />
// String template = "templateNOV_QRA.txt";<br />
<br />
StringWriter sw = new StringWriter();<br />
FileReader fr = new FileReader(new File(template));<br />
<br />
postData(fr , new URL(dataMinerUrl+"&gcube-token="+token), sw);<br />
<br />
fr.close();<br />
<br />
String answer = sw.toString();<br />
<br />
String statusLocation = answer.substring(answer.indexOf("statusLocation=\"")+"statusLocation=\"".length(), answer.indexOf("\">")); <br />
<br />
System.out.println(sw.toString());<br />
System.out.println(statusLocation);<br />
<br />
String status = getStatus(statusLocation+"&gcube-token="+token);<br />
<br />
while (!(status.contains("wps:ProcessSucceeded") || status.contains("wps:ProcessFailed"))){<br />
System.out.println(status);<br />
status = getStatus(statusLocation+"&gcube-token="+token);<br />
Thread.sleep(5000);<br />
}<br />
<br />
<br />
System.out.println(status);<br />
<br />
if (status.contains("wps:ProcessFailed"))<br />
System.out.println("Process Failed!");<br />
else{<br />
String UrlToOutput = status.substring(status.lastIndexOf("<d4science:Data>")+"<d4science:Data>".length(), status.lastIndexOf("</d4science:Data>"));<br />
System.out.println("Url to output:"+UrlToOutput);<br />
}<br />
}<br />
}<br />
<br />
</source><br />
<br />
<br />
Another template file example is available [http://data.d4science.org/dlhZbHNqazVGbjBQc3NicTZ1SkpGZ21ZbXk4TW1HSDZHbWJQNStIS0N6Yz0 here]<br />
<br />
<br />
=WPS Client=<br />
A WPS Client is also available to invoke algorithms from R. Click [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/WPSRConnector.zip this link] to get the client and examples.<br />
<br />
* An example of model (BiOnym) invoked via POST and asynchronously with input data table embedding and uploading is available at [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/BiOnym%20from%20R_v3.zip this link]<br />
* An example of long running model (Ichtyop) invoked via POST and asynchronously is available at [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/IchtyopClient.zip this link].<br />
* An example of model (XMeans) invoked via POST and asynchronously with input data table embedding and uploading is available at [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RD4SFunctions/XMeansClientExample.zip this link]<br />
<br />
<br />
The template used to invoke a process copies the process description, which can be obtained by building a link like this:<br />
<br />
http://<DataMiner Cluster>/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<your_token>&Identifier=<Process ID><br />
<br />
<br />
For example, the BiOnym process description is: <br />
<br />
http://dataminer-cloud1.d4science.org/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=<your_token>&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.BIONYM<br />
<br />
Note that the following information is required in order to reconstruct a description like the one above in the VRE you are using:<br />
<br />
1. '''Your token''': you can find this in the home page of the VRE;<br />
<br />
2. '''The DataMiner cluster''' (can change depending on the VRE): after executing your target process through the Web interface, press "Show". The DataMiner cluster address is the one reported after "http://";<br />
<br />
3. '''The Process ID''': in the link contained in the "Show" window, the Process ID is the value of the "Identifier=" parameter.<br />
<br />
=QGIS=<br />
QGIS supports a number of clients for WPS. We advise using the [http://geolabs.fr/BlogPost;id=80 GeoLabs WPS plugin] (add the plugin repository at http://geolabs.fr/plugins.xml) and entering the http://dataminer.d4science.org/wps/WebProcessingService endpoint.<br />
<br />
[[Image:Qgis.png|frame|center|QGIS WPS interface]]<br />
<br />
[[Image:Mapscompqgis.png|frame|center|Maps comparison with QGIS]]<br />
<br />
= Related Links =<br />
[[DataMiner_Manager | DataMiner Tutorial]]<br />
<br />
[[Data_Mining_Facilities | Data Mining page]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer&diff=31018Statistical Algorithms Importer2018-03-14T14:46:44Z<p>Gianpaolo.coro: /* Main Steps */</p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
In this guide we describe the Statistical Algorithms Importer web interface.<br />
<br />
== Overview ==<br />
Statistical Algorithms Importer (SAI) is a tool to import algorithms in the D4Science e-Infrastructure. Currently, it supports different types of software integration. SAI separates software development from its deployment in the infrastructure in a very flexible way. After the first deployment, made in collaboration with the e-Infrastructure team, script developers can modify and update their scripts by themselves, without the intervention of the e-Infrastructure team.<br />
<br />
In order to integrate an algorithm, three main steps are required: <br />
<br />
1 - Indicate the inputs, the outputs and their types for a main script orchestrating the process<br />
<br />
2 - Create the Software: this operation creates the interface from the e-Infrastructure service to the script and should be used each time the interface (I/O), the name of the algorithm or the required additional packages change<br />
<br />
3 - Publish the Software: this operation communicates to the infrastructure that a newly created software should be put online<br />
<br />
Additionally, the Repackage function can be used when only the internal code of the orchestrating script changes and the algorithm has already been published.<br />
The pages in this Wiki explain the details of these operations. <br />
<br />
[[Image:StatisticalAlgorithmsImporter1.png|thumb|center|800px|Statistical Algorithms Importer (SAI), portlet. Main interface.]]<br />
<br />
== F.A.Q. ==<br />
Please, read our best practices first: [[Statistical Algorithms Importer: FAQ|F.A.Q.]]<br />
<br />
== Demonstration ==<br />
A demonstration video is available [http://data.d4science.org/SkhVR3AyTUNLaStCV2tNdUhsL2VIQ3AwMG1tTjVka3dHbWJQNStIS0N6Yz0 here].<br />
<br />
== Main Steps==<br />
<br />
# [[Statistical Algorithms Importer: Create Project|Creating a new Project]]<br />
# [[Statistical Algorithms Importer: Publish Algorithms|Publishing Algorithms for deployment]]<br />
# [[Statistical Algorithms Importer: Repackage| Repackaging a script]]<br />
# [[Advanced Input| Advanced Input ]]<br />
# [[Statistical Algorithms Importer: FAQ|F.A.Q.]]<br />
# [[Statistical_Algorithms_Importer:_R_Project#Import_Resources_From_GitHub | Import projects from GitHub ]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=How-to_Implement_Algorithms_for_DataMiner&diff=30640How-to Implement Algorithms for DataMiner2017-12-22T10:44:27Z<p>Gianpaolo.coro: </p>
<hr />
<div><!-- CATEGORIES --><br />
[[Category:Developer's Guide]]<br />
<!-- END CATEGORIES --><br />
=Prerequisites=<br />
IDE: Eclipse Java EE IDE for Web Developers. Version: 3.7+<br />
<br />
We advise you to also watch this video:<br />
<br />
http://i-marine.eu/Content/eTraining.aspx?id=e1777006-a08c-49ad-b2e6-c13e094f27d4<br />
<br />
Maven<ref>[https://maven.apache.org/guides/getting-started/index.html Maven Tutorial]</ref><br />
repository configuration: [[File:settings.xml|settings.xml]]<br />
<br />
Hello World Algorithm: [[File:hello-world-algorithm.zip|hello-world-algorithm.zip]]<br />
<br />
=Step by Step=<br />
Let's start by creating a project in the Eclipse IDE, mavenized according to our [http://gcube.wiki.gcube-system.org/gcube/index.php/Creating_gCube_Maven_components:_How-To indications].<br />
After having mavenized the project in Eclipse, you have to add the dependencies.<br />
====Maven coordinates ====<br />
The maven artifact coordinates are: <br />
<source lang="java"><br />
<dependencyManagement><br />
<dependencies><br />
<dependency><br />
<groupId>org.gcube.distribution</groupId><br />
<artifactId>gcube-bom</artifactId><br />
<version>LATEST</version><br />
<type>pom</type><br />
<scope>import</scope><br />
</dependency><br />
</dependencies><br />
</dependencyManagement><br />
<br />
<br />
<dependencies> <br />
<br />
<dependency><br />
<groupId>org.gcube.dataanalysis</groupId><br />
<artifactId>ecological-engine</artifactId><br />
<version>[1.6.0-SNAPSHOT,2.0.0-SNAPSHOT)</version><br />
<scope>provided</scope><br />
</dependency><br />
<br />
<dependency><br />
<groupId>junit</groupId><br />
<artifactId>junit</artifactId><br />
<version>[4.12,)</version><br />
<scope>test</scope><br />
</dependency><br />
<br />
<dependency><br />
<groupId>org.slf4j</groupId><br />
<artifactId>slf4j-api</artifactId><br />
</dependency><br />
<br />
<dependency><br />
<groupId>org.slf4j</groupId><br />
<artifactId>slf4j-log4j12</artifactId><br />
<version>1.7.5</version><br />
<scope>test</scope><br />
</dependency><br />
<br />
.....<br />
<br />
</dependencies><br />
</source><br />
<br />
And add '''BasicConfigurator.configure();''' at the beginning of your test methods (AND ONLY IN THE TEST METHODS) to activate the logs.<br />
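<br />
For example, a minimal JUnit test sketch (assuming log4j is on the test classpath through the slf4j-log4j12 dependency above; the class and method names are illustrative):<br />
<br />
<source lang="java"><br />
import org.apache.log4j.BasicConfigurator;<br />
import org.junit.Test;<br />
<br />
public class MyAlgorithmTest {<br />
<br />
	@Test<br />
	public void testAlgorithm() throws Exception {<br />
		BasicConfigurator.configure(); // activate the logs, in test methods only<br />
		// ... build the configuration and run the algorithm here ...<br />
	}<br />
}<br />
</source><br />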
<br />
Let's now create a new class which implements a basic algorithm; it will be executed by the DataMiner.<br />
The next step is to extend the basic interface <code>StandardLocalExternalAlgorithm</code>.<br />
The following snippet shows the unimplemented interface methods that we are going to fulfill.<br />
<source lang="java"><br />
public class SimpleAlgorithm extends StandardLocalExternalAlgorithm{<br />
<br />
@Override<br />
public void init() throws Exception {<br />
// TODO Auto-generated method stub <br />
}<br />
@Override<br />
public String getDescription() {<br />
// TODO Auto-generated method stub<br />
return null;<br />
}<br />
@Override<br />
protected void process() throws Exception {<br />
// TODO Auto-generated method stub<br />
<br />
}<br />
@Override<br />
protected void setInputParameters() {<br />
// TODO Auto-generated method stub<br />
<br />
}<br />
@Override<br />
public void shutdown() {<br />
// TODO Auto-generated method stub <br />
}<br />
@Override<br />
public StatisticalType getOutput() {<br />
return null;<br />
}<br />
}<br />
</source><br />
The <code>init()</code> method performs the initialization. In this simple example we need to initialize the logging facility, and we use the logger from the ecological engine library. In case the algorithm uses a database, we have to open its connection in this method.<br />
The <code>shutdown()</code> method closes the database connection, if any.<br />
In the <code>getDescription()</code> method we add a simple description for the algorithm.<br />
<br />
=Customize input visualization =<br />
==== String input parameters ====<br />
The user's input is obtained by calling, from <code>setInputParameters()</code>, the method addStringInput with the following parameters:<br />
* name of the variable ;<br />
* description for the variable;<br />
* default value;<br />
<br />
The user input is then retrieved using <code>getInputParameter()</code>, passing the same name used in <code>setInputParameters()</code>.<br />
<source lang="java"><br />
protected void setInputParameters() {<br />
addStringInput(NameOfVariable, "Description", "DefaultInput");<br />
<br />
}<br />
</source><br />
The input parameter will be automatically passed by DataMiner to the procedure.<br />
In particular, in the process method we can retrieve such a parameter by the name we set in the addStringInput method.<br />
<source lang="java"><br />
@Override<br />
protected void process() throws Exception {<br />
....<br />
String userInputValue = getInputParameter(NameOfVariable);<br />
}<br />
</source><br />
<br />
==== Combo box input parameter ====<br />
In order to obtain a combo box, we have to define an enumeration that contains the possible choices selectable in the combo box, and pass it to the method <code>addEnumerateInput</code> as follows (the selected value can then be read back in <code>process()</code>, as shown after the parameter list below):<br />
<br />
<source lang="java"><br />
public enum Enum {<br />
FIRST_ENUM,<br />
SECOND_ENUM<br />
}<br />
<br />
protected void setInputParameters() {<br />
addEnumerateInput(Enum.values(), variableName, "Description",<br />
Enum.FIRST_ENUM.name());<br />
}<br />
</source><br />
<code>addEnumerateInput</code> parameters are respectively:<br />
* values of declared enumerator;<br />
* name of variable used to extract value insert by user;<br />
* description of value;<br />
* default value visualized in the combo box<br />
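<br />
A minimal sketch of reading the selection back in <code>process()</code> (using the same <code>variableName</code> and the <code>Enum</code> enumeration declared above):<br />
<br />
<source lang="java"><br />
@Override<br />
protected void process() throws Exception {<br />
	// the user's selection arrives as the name of the enum constant<br />
	String selection = getInputParameter(variableName);<br />
	Enum choice = Enum.valueOf(selection);<br />
	AnalysisLogger.getLogger().debug("Selected choice: " + choice);<br />
}<br />
</source><br />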
<br />
==== File input parameter ====<br />
Users can upload their data to the DataMiner as files. After uploading a file, it is possible to use the uploaded data as input for an algorithm.<br />
<source lang="java"><br />
@Override<br />
protected void setInputParameters() {<br />
inputs.add(new PrimitiveType(File.class.getName(), null, PrimitiveTypes.FILE, <br />
"inputFileParameterName", "Input File Description", "Input File Name")); <br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
String fileParameter = getInputParameter("inputFileParameterName");<br />
FileInputStream fileStream = new FileInputStream(fileParameter);<br />
}<br />
<br />
</source><br />
<br />
==== Import input from the DataMiner database ====<br />
Users can upload their data in the DataMiner "Access to the Data Space" section.<br />
After uploading a file (for example, a CSV file), it is possible to use the uploaded data as input for an algorithm.<br />
In order to select the column values of a table extracted from the CSV, an algorithm developer implements the methods in the following way:<br />
<source lang="java"><br />
<br />
@Override<br />
protected void setInputParameters() {<br />
List<TableTemplates> templates = new ArrayList<TableTemplates>();<br />
templates.add(TableTemplates.GENERIC);<br />
InputTable tinput = new InputTable(templates, "Table","Table Description");<br />
ColumnTypesList columns = new ColumnTypesList("Table","Columns", "Selected Columns Description", false);<br />
inputs.add(tinput);<br />
inputs.add(columns);<br />
DatabaseType.addDefaultDBPars(inputs);<br />
<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
	// retrieve the user's selections through the names used in setInputParameters()<br />
	String tablename = getInputParameter("Table");<br />
	String columnnames = getInputParameter("Columns");<br />
	config.setParam("DatabaseDriver", "org.postgresql.Driver");<br />
	SessionFactory dbconnection = DatabaseUtils.initDBSession(config);<br />
	String[] columnlist = columnnames.split(AlgorithmConfiguration.getListSeparator());<br />
	List<Object> speciesList = DatabaseFactory.executeSQLQuery("select " + columnlist[0] + " from " + tablename, dbconnection);<br />
}<br />
</source><br />
<br />
===Advanced Input===<br />
It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these inputs are reported in the [[Advanced_Input| Advanced Input page]].<br />
<br />
= Case of algorithms using databases =<br />
In order to use a database it is required to call, inside <code>setInputParameters()</code>, the method <code>addRemoteDatabaseInput()</code>.<br />
An important step is to pass as the first parameter the name of the Runtime Resource addressing the database. <br />
DataMiner automatically retrieves the following parameters from the runtime resource: url, user and password. In the process method, before connecting to the database, url, user and password are retrieved using <code>getInputParameter</code>, each through the name passed into <code>addRemoteDatabaseInput</code>.<br />
<source lang="java"><br />
@Override<br />
protected void setInputParameters() { <br />
... <br />
addRemoteDatabaseInput("Obis2Repository", urlParameterName,userParameterName, passwordParameterName, "driver", "dialect");<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
...<br />
<br />
String databaseJdbc = getInputParameter(urlParameterName);<br />
String databaseUser = getInputParameter(userParameterName);<br />
String databasePwd = getInputParameter(passwordParameterName);<br />
<br />
connection = DriverManager.getConnection(databaseJdbc, databaseUser,databasePwd);<br />
...<br />
<br />
}<br />
<br />
</source><br />
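<br />
A minimal sketch of using and releasing the connection (the query and table name are illustrative; as a fragment of the algorithm class, it assumes the java.sql imports and a <code>connection</code> field):<br />
<br />
<source lang="java"><br />
private Connection connection;<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
	// ... open the connection as shown above, then query it<br />
	Statement stmt = connection.createStatement();<br />
	ResultSet rs = stmt.executeQuery("SELECT count(*) FROM mytable"); // illustrative query<br />
	if (rs.next())<br />
		AnalysisLogger.getLogger().debug("Rows: " + rs.getString(1));<br />
	stmt.close();<br />
}<br />
<br />
@Override<br />
public void shutdown() {<br />
	try {<br />
		if (connection != null)<br />
			connection.close();<br />
	} catch (Exception e) {<br />
		AnalysisLogger.getLogger().debug("Error while closing the connection: " + e.getMessage());<br />
	}<br />
}<br />
</source><br />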
<br />
= Customize output =<br />
The last step is to set and specify the output of the procedure.<br />
For this purpose we override the method <code>getOutput()</code>, which returns a StatisticalType.<br />
The first output parameter we instantiate is a PrimitiveType object that wraps a string; so, we set its type to string.<br />
We associate a name and a description to the output value.<br />
We can instantiate a second output as another PrimitiveType.<br />
We then collect both outputs in a map, which keeps the order of the parameters.<br />
<br />
DataMiner invokes the <code>getOutput()</code> procedure to understand the type of the output object; at this point the algorithm will be indexed in the ecological engine library with the name set in the properties file.<br />
<br />
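As a minimal sketch of this pattern, following the complete example at the end of this page (the output names and values are illustrative):<br />
<br />
<source lang="java"><br />
@Override<br />
public StatisticalType getOutput() {<br />
	// a LinkedHashMap keeps the insertion order of the outputs<br />
	LinkedHashMap<String, StatisticalType> map = new LinkedHashMap<String, StatisticalType>();<br />
	PrimitiveType first = new PrimitiveType(String.class.getName(), "first value",<br />
			PrimitiveTypes.STRING, "FirstOutput", "First output description");<br />
	PrimitiveType second = new PrimitiveType(String.class.getName(), "second value",<br />
			PrimitiveTypes.STRING, "SecondOutput", "Second output description");<br />
	map.put("FirstOutput", first);<br />
	map.put("SecondOutput", second);<br />
	// wrap the map itself in a PrimitiveType<br />
	return new PrimitiveType(HashMap.class.getName(), map, PrimitiveTypes.MAP, "ResultsMap", "Results Map");<br />
}<br />
</source><br />
<br />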
==== String Output ====<br />
<br />
In order to have a string as output you have to create a <code>PrimitiveType</code> as follows:<br />
<source lang="java"><br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
….<br />
PrimitiveType val = new PrimitiveType(String.class.getName(), myString , PrimitiveTypes.STRING, stringName, defaultValue);<br />
return val;<br />
<br />
}<br />
<br />
</source><br />
<br />
==== Bar Chart Output ====<br />
In order to create a histogram chart you have to fill a <code>DefaultCategoryDataset</code> object and use it to create the chart:<br />
<source lang="java"><br />
<br />
DefaultCategoryDataset dataset;<br />
…<br />
dataset.addValue(...); <br />
….<br />
<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
….<br />
HashMap<String, Image> producedImages = new HashMap<String, Image>();<br />
JFreeChart chart = HistogramGraph.createStaticChart(dataset);<br />
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));<br />
producedImages.put("Species Observations", image);<br />
…<br />
}<br />
<br />
</source><br />
<br />
==== Timeseries Chart Output ====<br />
<br />
In order to create a time series chart you have to fill a <code>DefaultCategoryDataset</code> object and use it to create the chart.<br />
The second parameter of the createStaticChart method is the time format.<br />
<source lang="java"><br />
<br />
DefaultCategoryDataset dataset;<br />
…<br />
dataset.addValue(...); <br />
….<br />
@Override<br />
public StatisticalType getOutput() {<br />
...<br />
HashMap<String, Image> producedImages = new HashMap<String, Image>();<br />
JFreeChart chart = TimeSeriesGraph.createStaticChart(dataset, "yyyy");<br />
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));<br />
producedImages.put("TimeSeries chart", image);<br />
... <br />
}<br />
<br />
<br />
</source><br />
<br />
==== File Output ====<br />
In order to create a results file that users can download, algorithm developers have to add the following code:<br />
<source lang="java"><br />
protected String fileName;<br />
protected BufferedWriter out;<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
//Note you must add timestamp to the file name <br />
//<br />
fileName = super.config.getPersistencePath() + "results.csv";<br />
out = new BufferedWriter(new FileWriter(fileName));<br />
out.write(results);<br />
out.newLine();<br />
}<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
...<br />
PrimitiveType file = new PrimitiveType(File.class.getName(), new File(fileName), PrimitiveTypes.FILE, "Description ", "Default value");<br />
map.put("Output",file);<br />
...<br />
}<br />
</source><br />
<br />
= Test the algorithm = <br />
This is a template example to test an algorithm from Eclipse. Download the following folder https://goo.gl/r16rfF and place it next to the code. <br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.ecoengine.test.regression;<br />
<br />
import org.gcube.dataanalysis.ecoengine.configuration.AlgorithmConfiguration;<br />
import org.gcube.dataanalysis.ecoengine.interfaces.ComputationalAgent;<br />
<br />
public class TestTransducers {<br />
<br />
public static void main(String[] args) throws Exception {<br />
System.out.println("TEST 1");<br />
ComputationalAgent computationalAgent = new yourClassName();<br />
computationalAgent.setConfiguration(testConfigLocal());<br />
computationalAgent.init();<br />
Regressor.process(computationalAgent);<br />
computationalAgent.shutdown();<br />
}<br />
<br />
private static AlgorithmConfiguration testConfigLocal() {<br />
<br />
AlgorithmConfiguration config = Regressor.getConfig();<br />
config.setAgent("OCCURRENCES_DUPLICATES_DELETER");<br />
<br />
config.setParam("longitudeColumn", "decimallongitude");<br />
config.setParam("latitudeColumn", "decimallatitude");<br />
config.setParam("recordedByColumn", "recordedby");<br />
config.setParam("scientificNameColumn", "scientificname");<br />
config.setParam("eventDateColumn", "eventdate");<br />
config.setParam("lastModificationColumn", "modified");<br />
config.setParam("OccurrencePointsTableName", "whitesharkoccurrences2");<br />
config.setParam("finalTableName", "whitesharkoccurrencesnoduplicates");<br />
config.setParam("spatialTolerance", "0.5");<br />
config.setParam("confidence", "80");<br />
<br />
return config;<br />
}<br />
<br />
}<br />
</source><br />
<br />
= Properties File and Deploy =<br />
In order to deploy an algorithm we must create:<br />
* the jar corresponding to the Eclipse Java project containing the algorithm;<br />
* a properties file containing the name under which the algorithm should be displayed in the GUI and the fully qualified name of the algorithm class, e.g. MY_ALGORITHM=org.gcube.cnr.Myalgorithm<br />
<br />
You must provide these two files to the i-Marine team. They will move the algorithm onto a DataMiner instance and the interface will be automatically generated.<br />
<br />
In the following example, inside the src/main/java folder, the package <code>org.gcube.dataanalysis.myAlgorithms</code> contains the class <code>SimpleAlgorithm</code> implementing an algorithm.<br />
<code><br />
SIMPLE_ALGORITHM=org.gcube.dataanalysis.myAlgorithms.SimpleAlgorithm<br />
</code><br />
<br />
= Complete Example with multiple outputs =<br />
<source lang="java"><br />
public class AbsoluteSpeciesBarChartsAlgorithm extends<br />
StandardLocalExternalAlgorithm {<br />
LinkedHashMap<String, StatisticalType> map = new LinkedHashMap<String, StatisticalType>();<br />
static String databaseName = "DatabaseName";<br />
static String userParameterName = "DatabaseUserName";<br />
static String passwordParameterName = "DatabasePassword";<br />
static String urlParameterName = "DatabaseURL";<br />
private String firstSpeciesNumber="Species";<br />
private String yearStart="Starting_year";<br />
private String yearEnd="Ending_year";<br />
private int speciesNumber;<br />
private DefaultCategoryDataset defaultcategorydataset;<br />
@Override<br />
public void init() throws Exception {<br />
AnalysisLogger.getLogger().debug("Initialization"); <br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
return "Algorithm returning bar chart of most observed species in a specific years range (with respect to the OBIS database)";<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
defaultcategorydataset = new DefaultCategoryDataset();<br />
String driverName = "org.postgresql.Driver";<br />
String tmp=getInputParameter(firstSpeciesNumber);<br />
<br />
speciesNumber = Integer.parseInt(tmp);<br />
Class driverClass = Class.forName(driverName);<br />
Driver driver = (Driver) driverClass.newInstance();<br />
String databaseJdbc = getInputParameter(urlParameterName);<br />
String year_start = getInputParameter(yearStart);<br />
String year_end = getInputParameter(yearEnd);<br />
<br />
String databaseUser = getInputParameter(userParameterName);<br />
String databasePwd = getInputParameter(passwordParameterName);<br />
Connection connection = null;<br />
connection = DriverManager.getConnection(databaseJdbc, databaseUser,<br />
databasePwd);<br />
Statement stmt = connection.createStatement();<br />
String query = "SELECT tname, sum(count)AS count FROM public.count_species_per_year WHERE year::integer >="<br />
+ year_start<br />
+ "AND year::integer <="<br />
+ year_end<br />
+ "GROUP BY tname ORDER BY count desc;";<br />
ResultSet rs = stmt.executeQuery(query);<br />
int i =0;<br />
String s = "Species";<br />
while (rs.next()&& i<speciesNumber) {<br />
<br />
String tname = rs.getString("tname");<br />
String count = rs.getString("count");<br />
int countOcc=Integer.parseInt(count);<br />
<br />
// First output (list of string)<br />
PrimitiveType val = new PrimitiveType(String.class.getName(), count, PrimitiveTypes.STRING, tname, tname);<br />
map.put(tname, val); <br />
if(i<16)<br />
defaultcategorydataset.addValue(countOcc,s,tname); <br />
else<br />
break;<br />
i++;<br />
<br />
}<br />
connection.close();<br />
<br />
<br />
<br />
}<br />
<br />
@Override<br />
protected void setInputParameters() {<br />
addStringInput(firstSpeciesNumber,<br />
"Number of shown species", "10");<br />
addStringInput(yearStart, "Starting year of observations",<br />
"1800");<br />
addStringInput(yearEnd, "Ending year of observations", "2020");<br />
addRemoteDatabaseInput("Obis2Repository", urlParameterName,<br />
userParameterName, passwordParameterName, "driver", "dialect");<br />
<br />
<br />
}<br />
<br />
@Override<br />
public void shutdown() {<br />
AnalysisLogger.getLogger().debug("Shutdown"); <br />
}<br />
<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
PrimitiveType p = new PrimitiveType(Map.class.getName(), PrimitiveType.stringMap2StatisticalMap(outputParameters), PrimitiveTypes.MAP, "Discrepancy Analysis","");<br />
AnalysisLogger.getLogger().debug("MapsComparator: Producing Gaussian Distribution for the errors"); <br />
//build image:<br />
HashMap<String, Image> producedImages = new HashMap<String, Image>();<br />
<br />
JFreeChart chart = HistogramGraph.createStaticChart(defaultcategorydataset);<br />
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));<br />
producedImages.put("Species Observations", image);<br />
<br />
PrimitiveType images = new PrimitiveType(HashMap.class.getName(), producedImages, PrimitiveTypes.IMAGES, "SpeciesObservations", "Graphical representation of the most observed species");<br />
<br />
//end build image<br />
AnalysisLogger.getLogger().debug("Bar Charts Species Occurrences Produced");<br />
//collect all the outputs<br />
<br />
map.put("Result", p);<br />
map.put("Images", images);<br />
<br />
//generate a primitive type for the collection<br />
PrimitiveType output = new PrimitiveType(HashMap.class.getName(), map, PrimitiveTypes.MAP, "ResultsMap", "Results Map");<br />
<br />
<br />
return output;<br />
}<br />
<br />
}<br />
<br />
</source><br />
<br />
=Integrating R Scripts=<br />
DataMiner (DM) supports the integration of R scripts. This section explains how to integrate R scripts that will be executed sequentially on one single powerful machine. The computation will be assigned to one of the machines that make up the DataMiner system, and the DM will automatically manage multi-user requests. This section does not deal with enabling parallel processing for the script, which is discussed later.<br />
<br />
Download the following configuration folder into the Eclipse project: http://goo.gl/bNKrZK<br />
Then add the following Maven dependency:<br />
<br />
<source lang="java"><br />
<dependency><br />
<groupId>org.gcube.dataanalysis</groupId><br />
<artifactId>ecological-engine-smart-executor</artifactId><br />
<version>[1.0.0-SNAPSHOT,2.0.0)</version><br />
</dependency><br />
</source><br />
<br />
Then copy an R script inside the cfg folder. The DM framework assumes that the R file (i) accepts an input file whose name is hard-coded in the script, (ii) produces an output file whose name is hard-coded in the script, (iii) requires an R context made up of user's variables, (iv) possibly requires custom adjustment to the code.<br />
<br />
The DM framework facilitates the call to the script by adding context variables "on the fly" and managing multi-user synchronous calls. This mechanism works by generating a new temporary R script on the fly for each user. The DM is also responsible for dispatching the script to one powerful machine. Required packages are assumed to be preinstalled on the backend system.<br />
<br />
One example of an algorithm calling a complex interpolation model is the following:<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.executor.rscripts;<br />
<br />
import java.io.File;<br />
import java.util.HashMap;<br />
import java.util.LinkedHashMap;<br />
<br />
import org.gcube.contentmanagement.lexicalmatcher.utils.AnalysisLogger;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.PrimitiveType;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.StatisticalType;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.enumtypes.PrimitiveTypes;<br />
import org.gcube.dataanalysis.ecoengine.interfaces.StandardLocalExternalAlgorithm;<br />
import org.gcube.dataanalysis.executor.util.RScriptsManager;<br />
<br />
public class SGVMS_Interpolation extends StandardLocalExternalAlgorithm {<br />
<br />
private static int maxPoints = 10000;<br />
public enum methodEnum { cHs, SL};<br />
RScriptsManager scriptmanager;<br />
String outputFile;<br />
<br />
@Override<br />
public void init() throws Exception {<br />
AnalysisLogger.getLogger().debug("Initializing SGVMS_Interpolation");<br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
return "An interpolation method relying on the implementation by the Study Group on VMS (SGVMS). The method uses two interpolation approached to simulate vessels points at a certain temporal resolution. The input is a file in TACSAT format uploaded on the DataMiner. The output is another TACSAT file containing interpolated points." +<br />
"The underlying R code has been extracted from the SGVM VMSTools framework. This algorithm comes after a feasibility study (http://goo.gl/risQre) which clarifies the features an e-Infrastructure adds to the original scripts. Limitation: the input will be processed up to "+maxPoints+" vessels trajectory points.";<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
<br />
status = 0;<br />
//instantiate the R Script executor<br />
scriptmanager = new RScriptsManager();<br />
//this is the script name<br />
String scriptName = "interpolateTacsat.r";<br />
//absolute path to the input, provided by the DM <br />
String inputFile = config.getParam("InputFile");<br />
<br />
AnalysisLogger.getLogger().debug("Starting SGVM Interpolation-> Config path "+config.getConfigPath()+" Persistence path: "+config.getPersistencePath());<br />
//default input and outputs <br />
String defaultInputFileInTheScript = "tacsat.csv";<br />
String defaultOutputFileInTheScript = "tacsat_interpolated.csv";<br />
//input parameters: represent the context of the script. Values will be assigned in the R environment.<br />
LinkedHashMap<String,String> inputParameters = new LinkedHashMap<String, String>();<br />
inputParameters.put("npoints",config.getParam("npoints"));<br />
inputParameters.put("interval",config.getParam("interval"));<br />
inputParameters.put("margin",config.getParam("margin"));<br />
inputParameters.put("res",config.getParam("res"));<br />
inputParameters.put("fm",config.getParam("fm"));<br />
inputParameters.put("distscale",config.getParam("distscale"));<br />
inputParameters.put("sigline",config.getParam("sigline"));<br />
inputParameters.put("minspeedThr",config.getParam("minspeedThr"));<br />
inputParameters.put("maxspeedThr",config.getParam("maxspeedThr"));<br />
inputParameters.put("headingAdjustment",config.getParam("headingAdjustment"));<br />
inputParameters.put("equalDist",config.getParam("equalDist").toUpperCase());<br />
//add static context variables<br />
inputParameters.put("st", "c(minspeedThr,maxspeedThr)");<br />
inputParameters.put("fast", "TRUE");<br />
inputParameters.put("method", "\""+config.getParam("method")+"\"");<br />
<br />
AnalysisLogger.getLogger().debug("Starting SGVM Interpolation-> Input Parameters: "+inputParameters);<br />
//if other code injection is required, put the strings to substitute as keys and the substituting ones as values<br />
HashMap<String,String> codeInjection = null;<br />
//force the script to produce an output file, otherwise generate an exception <br />
boolean scriptMustReturnAFile = true;<br />
boolean uploadScriptOnTheInfrastructureWorkspace = false; //the DataMiner service will manage the upload<br />
AnalysisLogger.getLogger().debug("SGVM Interpolation-> Executing the script ");<br />
status = 10;<br />
//execute the script in multi-user mode<br />
scriptmanager.executeRScript(config, scriptName, inputFile, inputParameters, defaultInputFileInTheScript, defaultOutputFileInTheScript, codeInjection, scriptMustReturnAFile,uploadScriptOnTheInfrastructureWorkspace, config.getConfigPath());<br />
//assign the file path to an output variable for the DM<br />
outputFile = scriptmanager.currentOutputFileName;<br />
AnalysisLogger.getLogger().debug("SGVM Interpolation-> Output File is "+outputFile);<br />
status = 100;<br />
}<br />
<br />
@Override<br />
protected void setInputParameters() {<br />
//declare the input parameters the user will set: they will basically correspond to the R context<br />
inputs.add(new PrimitiveType(File.class.getName(), null, PrimitiveTypes.FILE, "InputFile", "Input file in TACSAT format. E.g. http://goo.gl/i16kPw"));<br />
addIntegerInput("npoints", "The number of pings or positions required between each real or actual vessel position or ping", "10");<br />
addIntegerInput("interval", "Average time in minutes between two adjacent datapoints", "120");<br />
addIntegerInput("margin", "Maximum deviation from specified interval to find adjacent datapoints (tolerance)", "10");<br />
addIntegerInput("res", "Number of points to use to create interpolation (including start and end point)", "100");<br />
addEnumerateInput(methodEnum.values(), "method","Set to cHs for cubic Hermite spline or SL for Straight Line interpolation", "cHs");<br />
addDoubleInput("fm", "The FM parameter in cubic interpolation", "0.5");<br />
addIntegerInput("distscale", "The DistScale parameter for cubic interpolation", "20");<br />
addDoubleInput("sigline", "The Sigline parameter in cubic interpolation", "0.2");<br />
addDoubleInput("minspeedThr", "A filter on the minimum speed to take into account for interpolation", "2");<br />
addDoubleInput("maxspeedThr", "A filter on the maximum speed to take into account for interpolation", "6");<br />
addIntegerInput("headingAdjustment", "Parameter to adjust the choice of heading depending on its own or previous point (0 or 1). Set 1 in case the heading at the endpoint does not represent the heading of the arriving vessel to that point but the departing vessel.", "0");<br />
inputs.add(new PrimitiveType(Boolean.class.getName(), null, PrimitiveTypes.BOOLEAN, "equalDist", "Whether the number of positions returned should be equally spaced or not", "true"));<br />
}<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
//return the output file by the procedure to the DM<br />
PrimitiveType o = new PrimitiveType(File.class.getName(), new File(outputFile), PrimitiveTypes.FILE, "OutputFile", "Output file in TACSAT format.");<br />
return o;<br />
}<br />
<br />
@Override<br />
public void shutdown() {<br />
//in the case of forced shutdown, stop the R process<br />
if (scriptmanager!=null)<br />
scriptmanager.stop();<br />
System.gc();<br />
}<br />
<br />
}<br />
</source><br />
<br />
In order to test the above algorithm, just modify the "transducerers.properties" file inside the cfg folder by adding the following string:<br />
<br />
SGVM_INTERPOLATION=org.gcube.dataanalysis.executor.rscripts.SGVMS_Interpolation<br />
<br />
which will assign a name to the algorithm. Then a test class for this algorithm will be the following:<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.executor.tests;<br />
<br />
import java.util.List;<br />
<br />
import org.gcube.dataanalysis.ecoengine.configuration.AlgorithmConfiguration;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.PrimitiveType;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.StatisticalType;<br />
import org.gcube.dataanalysis.ecoengine.interfaces.ComputationalAgent;<br />
import org.gcube.dataanalysis.ecoengine.test.regression.Regressor;<br />
<br />
public class TestSGVMInterpolation {<br />
<br />
public static void main(String[] args) throws Exception {<br />
// setup the configuration<br />
AlgorithmConfiguration config = new AlgorithmConfiguration();<br />
// set the path to the cfg folder and to the PARALLEL_PROCESSING folder<br />
config.setConfigPath("./cfg/");<br />
config.setPersistencePath("./PARALLEL_PROCESSING");<br />
//set the user's inputs. They will be passed by the DM to the script in the following way:<br />
config.setParam("InputFile", "<absolute path to the file>/tacsatmini.csv"); //put the absolute path to the input file<br />
config.setParam("npoints", "10");<br />
config.setParam("interval", "120");<br />
config.setParam("margin", "10");<br />
config.setParam("res", "100");<br />
config.setParam("method", "SL");<br />
config.setParam("fm", "0.5");<br />
config.setParam("distscale", "20");<br />
config.setParam("sigline", "0.2");<br />
config.setParam("minspeedThr", "2");<br />
config.setParam("maxspeedThr", "6");<br />
config.setParam("headingAdjustment", "0");<br />
config.setParam("equalDist", "true");<br />
<br />
//set the scope and the user (optional for this test)<br />
config.setGcubeScope( "/gcube/devsec/devVRE");<br />
config.setParam("ServiceUserName", "test.user");<br />
<br />
//set the name of the algorithm to call, as it is in the transducerers.properties file<br />
config.setAgent("SGVM_INTERPOLATION");<br />
<br />
//recall the transducerer with the above name <br />
ComputationalAgent transducer = new SGVMS_Interpolation();<br />
transducer.setConfiguration(config);<br />
<br />
//init the transducer<br />
transducer.init();<br />
//start the process<br />
Regressor.process(transducer);<br />
//retrieve the output<br />
StatisticalType st = transducer.getOutput();<br />
System.out.println("st:"+((PrimitiveType)st).getContent());<br />
}<br />
<br />
}<br />
</source><br />
<br />
=Enabling Cloud Computing for R Scripts=<br />
In the case of a process running in the Infrastructure and using Cloud computing, you have to extend the ActorNode class and define how to set up the process, how to chunk the input space, how to run the script on each chunk, and how to perform the Reduce phase.<br />
These steps are performed using the following methods:<br />
<br />
* setup(AlgorithmConfiguration config)<br />
* getNumberOfRightElements() and getNumberOfLeftElements(), which define how the input space is chunked<br />
* executeNode(...), which runs the script on one chunk (shown in the example below)<br />
* postProcess(boolean manageDuplicates, boolean manageFault), which implements the Reduce phase<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.executor.nodes.algorithms;<br />
<br />
public class LWR extends ActorNode {<br />
<br />
public String destinationTable;<br />
public String destinationTableLabel;<br />
public String originTable;<br />
public String familyColumn;<br />
public int count;<br />
<br />
public float status = 0;<br />
<br />
//specify the kind of parallel process: the following performs a matrix-to-matrix comparison<br />
@Override<br />
public ALG_PROPS[] getProperties() {<br />
ALG_PROPS[] p = { ALG_PROPS.PHENOMENON_VS_PARALLEL_PHENOMENON };<br />
return p;<br />
}<br />
<br />
@Override<br />
public String getName() {<br />
return "LWR";<br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
return "An algorithm to estimate Length-Weight relationship parameters for marine species, using Bayesian methods. Runs an R procedure. Based on the Cube-law theory.";<br />
}<br />
<br />
@Override<br />
public List<StatisticalType> getInputParameters() {<br />
List<TableTemplates> templateLWRInput = new ArrayList<TableTemplates>();<br />
templateLWRInput.add(TableTemplates.GENERIC);<br />
InputTable p1 = new InputTable(templateLWRInput, "LWR_Input", "Input table containing taxa and species information", "lwr");<br />
ColumnType p3 = new ColumnType("LWR_Input", "FamilyColumn", "The column containing Family information", "Family", false);<br />
ServiceType p4 = new ServiceType(ServiceParameters.RANDOMSTRING, "RealOutputTable", "name of the resulting table", "lwr_");<br />
PrimitiveType p2 = new PrimitiveType(String.class.getName(), null, PrimitiveTypes.STRING, "TableLabel", "Name of the table which will contain the model output", "lwrout");<br />
<br />
List<StatisticalType> parameters = new ArrayList<StatisticalType>();<br />
parameters.add(p1);<br />
parameters.add(p3);<br />
parameters.add(p2);<br />
parameters.add(p4);<br />
<br />
DatabaseType.addDefaultDBPars(parameters);<br />
<br />
return parameters;<br />
}<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
List<TableTemplates> template = new ArrayList<TableTemplates>();<br />
template.add(TableTemplates.GENERIC);<br />
OutputTable p = new OutputTable(template, destinationTableLabel, destinationTable, "Output lwr table");<br />
return p;<br />
}<br />
<br />
@Override<br />
public void initSingleNode(AlgorithmConfiguration config) {<br />
<br />
}<br />
<br />
@Override<br />
public float getInternalStatus() {<br />
return status;<br />
}<br />
<br />
private static String scriptName = "UpdateLWR_4.R";<br />
<br />
//the inputs delivered by the DM are: the index and number of elements to take from the left and right tables, an indication of whether the same request was already assigned to another worker node (in the case of errors), the sandbox folder in which the script will be executed, and the configuration of the algorithm<br />
@Override<br />
public int executeNode(int leftStartIndex, int numberOfLeftElementsToProcess, int rightStartIndex, int numberOfRightElementsToProcess, boolean duplicate, String sandboxFolder, String nodeConfigurationFileObject, String logfileNameToProduce) {<br />
String insertQuery = null;<br />
try {<br />
status = 0;<br />
//reconstruct the configuration<br />
AlgorithmConfiguration config = Transformations.restoreConfig(nodeConfigurationFileObject);<br />
config.setConfigPath(sandboxFolder);<br />
System.out.println("Initializing DB");<br />
//take the parameters and possibly initialize connection to the DB<br />
dbconnection = DatabaseUtils.initDBSession(config);<br />
destinationTableLabel = config.getParam("TableLabel");<br />
destinationTable = config.getParam("RealOutputTable");<br />
System.out.println("Destination Table: "+destinationTable);<br />
System.out.println("Destination Table Label: "+destinationTableLabel);<br />
originTable = config.getParam("LWR_Input");<br />
familyColumn = config.getParam("FamilyColumn");<br />
System.out.println("Origin Table: "+originTable);<br />
<br />
// take the families to process<br />
List<Object> families = DatabaseFactory.executeSQLQuery(DatabaseUtils.getDinstictElements(originTable, familyColumn, ""), dbconnection);<br />
<br />
// transform the families into a string<br />
StringBuffer familiesFilter = new StringBuffer();<br />
familiesFilter.append("Families <- Fam.All[");<br />
<br />
int end = rightStartIndex + numberOfRightElementsToProcess;<br />
//build the substitution string<br />
for (int i = rightStartIndex; i < end; i++) {<br />
familiesFilter.append("Fam.All == \"" + families.get(i) + "\"");<br />
if (i < end - 1)<br />
familiesFilter.append(" | ");<br />
}<br />
familiesFilter.append("]");<br />
<br />
//substitution to perform in the script<br />
String substitutioncommand = "sed -i 's/Families <- Fam.All[Fam.All== \"Acanthuridae\" | Fam.All == \"Achiridae\"]/" + familiesFilter + "/g' " + "UpdateLWR_Test2.R";<br />
System.out.println("Preparing for processing the families names: "+familiesFilter.toString());<br />
<br />
substituteInScript(sandboxFolder+scriptName,sandboxFolder+"UpdateLWR_Tester.R","Families <- Fam.All[Fam.All== \"Acanthuridae\" | Fam.All == \"Achiridae\"]",familiesFilter.toString());<br />
//for test only<br />
<br />
System.out.println("Creating local file from remote table");<br />
// download the table in csv format to feed the procedure<br />
DatabaseUtils.createLocalFileFromRemoteTable(sandboxFolder+"RF_LWR.csv", originTable, ",", config.getDatabaseUserName(),config.getDatabasePassword(),config.getDatabaseURL());<br />
<br />
String headers = "Subfamily,Family,Genus,Species,FBname,SpecCode,AutoCtr,Type,a,b,CoeffDetermination,Number,LengthMin,Score,BodyShapeI";<br />
System.out.println("Adding headers to the file");<br />
<br />
String headerscommand = "sed -i '1s/^/"+headers+"\\n/g' "+"RF_LWR2.csv";<br />
// substitute the string in the RCode<br />
addheader(sandboxFolder+"RF_LWR.csv",sandboxFolder+"RF_LWR2.csv",headers);<br />
System.out.println("Headers added");<br />
System.out.println("Executing R script " + "R --no-save < UpdateLWR_Tester.R");<br />
// run the R code: it can be alternatively made with the methods of the previous example<br />
Process process = Runtime.getRuntime().exec("R --no-save");<br />
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));<br />
bw.write("source('UpdateLWR_Tester.R')\n");<br />
bw.write("q()\n");<br />
bw.close();<br />
BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()));<br />
String line = br.readLine();<br />
while (line != null) {<br />
	System.out.println(line);<br />
	line = br.readLine();<br />
}<br />
process.destroy();<br />
System.out.println("Appending csv to table");<br />
// transform the output into table<br />
StringBuffer lines = readFromCSV("LWR_Test1.csv");<br />
insertQuery = DatabaseUtils.insertFromBuffer(destinationTable, columnNames, lines);<br />
DatabaseFactory.executeSQLUpdate(insertQuery, dbconnection);<br />
System.out.println("The procedure was successful");<br />
status = 1f;<br />
} catch (Exception e) {<br />
e.printStackTrace();<br />
System.out.println("warning: error in node execution " + e.getLocalizedMessage());<br />
System.out.println("Insertion Query: "+insertQuery);<br />
System.err.println("Error in node execution " + e.getLocalizedMessage());<br />
return -1;<br />
} finally {<br />
if (dbconnection != null)<br />
try {<br />
dbconnection.close();<br />
} catch (Exception e) {<br />
}<br />
}<br />
return 0;<br />
}<br />
<br />
//setup phase of the algorithm<br />
@Override<br />
public void setup(AlgorithmConfiguration config) throws Exception {<br />
<br />
destinationTableLabel = config.getParam("TableLabel");<br />
AnalysisLogger.getLogger().info("Table Label: "+destinationTableLabel);<br />
destinationTable = config.getParam("RealOutputTable");<br />
AnalysisLogger.getLogger().info("Uderlying Table Name: "+destinationTable);<br />
originTable = config.getParam("LWR_Input");<br />
AnalysisLogger.getLogger().info("Original Table: "+originTable);<br />
familyColumn = config.getParam("FamilyColumn");<br />
AnalysisLogger.getLogger().info("Family Column: "+familyColumn);<br />
haspostprocessed = false;<br />
<br />
AnalysisLogger.getLogger().info("Initializing DB Connection");<br />
dbconnection = DatabaseUtils.initDBSession(config);<br />
List<Object> families = DatabaseFactory.executeSQLQuery(DatabaseUtils.getDinstictElements(originTable, familyColumn, ""), dbconnection);<br />
count = families.size();<br />
<br />
//create the table were the script will write the output<br />
DatabaseFactory.executeSQLUpdate(String.format(createOutputTable, destinationTable), dbconnection);<br />
AnalysisLogger.getLogger().info("Destination Table Created! Addressing " + count + " species");<br />
} <br />
<br />
@Override<br />
public int getNumberOfRightElements() {<br />
return count; //each Worker node has to get all the elements in the right table<br />
}<br />
<br />
@Override<br />
public int getNumberOfLeftElements() {<br />
return 1; //each Worker node has to get only one element in the left table<br />
}<br />
<br />
@Override<br />
public void stop() {<br />
<br />
//if the algorithm has not postprocessed, abort the computation by removing the database table<br />
if (!haspostprocessed){<br />
try{<br />
AnalysisLogger.getLogger().info("The procedure did NOT correctly postprocessed ....Removing Table "+destinationTable+" because of computation stop!");<br />
DatabaseFactory.executeSQLUpdate(DatabaseUtils.dropTableStatement(destinationTable), dbconnection);<br />
}catch (Exception e) {<br />
AnalysisLogger.getLogger().info("Table "+destinationTable+" did not exist");<br />
}<br />
}<br />
else<br />
AnalysisLogger.getLogger().info("The procedure has correctly postprocessed: shutting down the connection!");<br />
if (dbconnection != null)<br />
try {<br />
dbconnection.close();<br />
} catch (Exception e) {<br />
}<br />
}<br />
<br />
boolean haspostprocessed = false;<br />
@Override<br />
public void postProcess(boolean manageDuplicates, boolean manageFault) {<br />
haspostprocessed=true;<br />
}<br />
<br />
}<br />
</source><br />
<br />
=Video=<br />
<br />
We advise you to also watch this video, which shows in practice how to build an algorithm:<br />
<br />
http://i-marine.eu/Content/eTraining.aspx?id=e1777006-a08c-49ad-b2e6-c13e094f27d4<br />
<br />
= Related Links =<br />
[https://wiki.gcube-system.org/gcube/DataMiner_Manager DataMiner Tutorial]<br />
<br />
[https://wiki.gcube-system.org/gcube/Data_Mining_Facilities Data Mining Facilities]<br />
<br />
==References==<br />
{{Reflist}}</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=How-to_Implement_Algorithms_for_DataMiner&diff=30639How-to Implement Algorithms for DataMiner2017-12-22T10:37:31Z<p>Gianpaolo.coro: </p>
<hr />
<div><!-- CATEGORIES --><br />
[[Category:Developer's Guide]]<br />
<!-- END CATEGORIES --><br />
=Prerequisites=<br />
IDE: Eclipse Java EE IDE for Web Developers. Version: 3.7+<br />
<br />
We advise you to also watch this video:<br />
<br />
http://i-marine.eu/Content/eTraining.aspx?id=e1777006-a08c-49ad-b2e6-c13e094f27d4<br />
<br />
Maven<ref>[https://maven.apache.org/guides/getting-started/index.html Maven Tutorial]</ref><br />
repository configuration: [[File:settings.xml|settings.xml]]<br />
<br />
Hello World Algorithm: [[File:hello-world-algorithm.zip|hello-world-algorithm.zip]]<br />
<br />
=Step by Step=<br />
Let's start by creating a project in the Eclipse IDE that is mavenized according to our [http://gcube.wiki.gcube-system.org/gcube/index.php/Creating_gCube_Maven_components:_How-To indications].<br />
After having mavenized the project in Eclipse, you have to declare the dependencies.<br />
====Maven coordinates ====<br />
The maven artifact coordinates are: <br />
<source lang="java"><br />
<dependencyManagement><br />
<dependencies><br />
<dependency><br />
<groupId>org.gcube.distribution</groupId><br />
<artifactId>gcube-bom</artifactId><br />
<version>LATEST</version><br />
<type>pom</type><br />
<scope>import</scope><br />
</dependency><br />
</dependencies><br />
</dependencyManagement><br />
<br />
<br />
<dependencies> <br />
<br />
<dependency><br />
<groupId>org.gcube.dataanalysis</groupId><br />
<artifactId>ecological-engine</artifactId><br />
<version>[1.6.0-SNAPSHOT,2.0.0-SNAPSHOT)</version><br />
<scope>provided</scope><br />
</dependency><br />
<br />
<dependency><br />
<groupId>junit</groupId><br />
<artifactId>junit</artifactId><br />
<version>[4.12,)</version><br />
<scope>test</scope><br />
</dependency><br />
<br />
<dependency><br />
<groupId>org.slf4j</groupId><br />
<artifactId>slf4j-api</artifactId><br />
</dependency><br />
<br />
<dependency><br />
<groupId>org.slf4j</groupId><br />
<artifactId>slf4j-log4j12</artifactId><br />
<version>1.7.5</version><br />
<scope>test</scope><br />
</dependency><br />
<br />
.....<br />
<br />
</dependencies><br />
</source><br />
<br />
And add [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/cfg/log4j.properties this log4j properties file] to your classpath (e.g. your Eclipse project folder).<br />
<br />
Let's start creating a new class which implements a basic algorithm; it will be executed by the DataMiner.<br />
The next step is to extend the base class <code>StandardLocalExternalAlgorithm</code>.<br />
The following snippet shows the unimplemented methods that we are going to implement.<br />
<source lang="java"><br />
public class SimpleAlgorithm extends StandardLocalExternalAlgorithm{<br />
<br />
@Override<br />
public void init() throws Exception {<br />
// TODO Auto-generated method stub <br />
}<br />
@Override<br />
public String getDescription() {<br />
// TODO Auto-generated method stub<br />
return null;<br />
}<br />
@Override<br />
protected void process() throws Exception {<br />
// TODO Auto-generated method stub<br />
<br />
}<br />
@Override<br />
protected void setInputParameters() {<br />
// TODO Auto-generated method stub<br />
<br />
}<br />
@Override<br />
public void shutdown() {<br />
// TODO Auto-generated method stub <br />
}<br />
@Override<br />
public StatisticalType getOutput() {<br />
return null;<br />
}<br />
}<br />
</source><br />
The <code>init()</code> method performs the initialization. In this simple example we only need to initialize the logging facility, and we use the logger from the ecological-engine library. In case the algorithm uses a database, we have to open its connection in this method.<br />
The <code>shutdown()</code> method closes the database connection, if any.<br />
In the <code>getDescription()</code> method we add a simple description for the algorithm.<br />
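<br />
A minimal sketch of these three methods (assuming no database is used) is:<br />
<br />
<source lang="java"><br />
@Override<br />
public void init() throws Exception {<br />
	//initialize the logging facility of the ecological-engine library<br />
	AnalysisLogger.getLogger().debug("Initialization");<br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
	return "A simple algorithm showing the DataMiner interface";<br />
}<br />
<br />
@Override<br />
public void shutdown() {<br />
	//close database connections here, if any were opened in init()<br />
	AnalysisLogger.getLogger().debug("Shutdown");<br />
}<br />
</source><br />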
<br />
=Customize input visualization =<br />
==== String input parameters ====<br />
The user's input is obtained by calling, from <code>setInputParameters()</code>, the method <code>addStringInput()</code> with the following parameters:<br />
* name of the variable;<br />
* description of the variable;<br />
* default value.<br />
<br />
The user input is retrieved using <code>getInputParameter()</code>, passing the same name used in <code>setInputParameters()</code>.<br />
<source lang="java"><br />
protected void setInputParameters() {<br />
addStringInput(NameOfVariable, "Description", "DefaultInput");<br />
<br />
}<br />
</source><br />
The input parameter will be automatically passed by the DataMiner to the procedure.<br />
In particular, in the <code>process()</code> method we can retrieve the parameter by the name we set in <code>addStringInput()</code>.<br />
<source lang="java"><br />
@Override<br />
protected void process() throws Exception {<br />
....<br />
String userInputValue = getInputParameter(NameOfVariable);<br />
}<br />
</source><br />
<br />
==== Combo box input parameter ====<br />
In order to obtain a combo box, we have to define an enumeration containing the possible<br />
choices selectable in the combo box, and pass its values to the method <code>addEnumerateInput</code> as follows:<br />
<br />
<source lang="java"><br />
public enum Enum {<br />
FIRST_ENUM,<br />
SECOND_ENUM<br />
}<br />
<br />
protected void setInputParameters() {<br />
addEnumerateInput(Enum.values(), variableName, "Description",<br />
Enum.FIRST_ENUM.name());<br />
}<br />
</source><br />
The parameters of <code>addEnumerateInput</code> are respectively:<br />
* the values of the declared enumeration;<br />
* the name of the variable used to retrieve the value selected by the user;<br />
* the description of the value;<br />
* the default value shown in the combo box.<br />
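<br />
The selected value then reaches <code>process()</code> as a string; a minimal sketch of its retrieval, reusing the <code>Enum</code> and <code>variableName</code> declared above:<br />
<br />
<source lang="java"><br />
@Override<br />
protected void process() throws Exception {<br />
	//the combo box selection arrives as the name of an enum constant<br />
	String selected = getInputParameter(variableName);<br />
	Enum choice = Enum.valueOf(selected);<br />
}<br />
</source><br />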
<br />
==== File input parameter ====<br />
Users can upload their data to the DataMiner as a file. After a file has been uploaded, it can be used as input for an algorithm.<br />
<source lang="java"><br />
@Override<br />
protected void setInputParameters() {<br />
inputs.add(new PrimitiveType(File.class.getName(), null, PrimitiveTypes.FILE, <br />
"inputFileParameterName", "Input File Description", "Input File Name")); <br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
String fileParameter = getInputParameter("inputFileParameterName");<br />
FileInputStream fileStream = new FileInputStream(fileParameter);<br />
}<br />
<br />
</source><br />
<br />
==== Import input from the DataMiner database ====<br />
Users can upload their data in the DataMiner "Access to the Data Space" section.<br />
After a file (for example a CSV file) has been uploaded, it can be used as input for an algorithm.<br />
In order to select the column values of the table extracted from the CSV file, an algorithm developer implements the methods in the following way:<br />
<source lang="java"><br />
<br />
@Override<br />
protected void setInputParameters() {<br />
List<TableTemplates> templates = new ArrayList<TableTemplates>();<br />
templates.add(TableTemplates.GENERIC);<br />
InputTable tinput = new InputTable(templates, "Table","Table Description");<br />
ColumnTypesList columns = new ColumnTypesList("Table","Columns", "Selected Columns Description", false);<br />
inputs.add(tinput);<br />
inputs.add(columns);<br />
DatabaseType.addDefaultDBPars(inputs);<br />
<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
	config.setParam("DatabaseDriver", "org.postgresql.Driver");<br />
	SessionFactory dbconnection = DatabaseUtils.initDBSession(config);<br />
	//retrieve the table name and the selected columns declared in setInputParameters()<br />
	String tablename = getInputParameter("Table");<br />
	String columnnames = getInputParameter("Columns");<br />
	String[] columnlist = columnnames.split(AlgorithmConfiguration.getListSeparator());<br />
	List<Object> speciesList = DatabaseFactory.executeSQLQuery("select " + columnlist[0] + " from " + tablename, dbconnection);<br />
}<br />
</source><br />
<br />
===Advanced Input===<br />
It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these data are reported in the [[Advanced_Input| Advanced Input page]].<br />
<br />
= Case of algorithms using databases =<br />
In order to use a database, it is required to call, inside <code>setInputParameters()</code>, the method <code>addRemoteDatabaseInput()</code>.<br />
An important step is to pass as first parameter the name of the Runtime Resource addressing the database. <br />
The DataMiner automatically retrieves the following parameters from the Runtime Resource: URL, user and password. In the process method, before connecting to the database, URL, user and password are retrieved using <code>getInputParameter()</code>, passing the same names given to <code>addRemoteDatabaseInput()</code> as parameters.<br />
<source lang="java"><br />
@Override<br />
protected void setInputParameters() { <br />
... <br />
addRemoteDatabaseInput("Obis2Repository", urlParameterName,userParameterName, passwordParameterName, "driver", "dialect");<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
...<br />
<br />
String databaseJdbc = getInputParameter(urlParameterName);<br />
String databaseUser = getInputParameter(userParameterName);<br />
String databasePwd = getInputParameter(passwordParameterName);<br />
<br />
connection = DriverManager.getConnection(databaseJdbc, databaseUser,databasePwd);<br />
...<br />
<br />
}<br />
<br />
</source><br />
<br />
= Customize output =<br />
The last step is to specify the output of the procedure.<br />
For this purpose we override the method <code>getOutput()</code>, which returns a <code>StatisticalType</code>.<br />
The first output parameter we instantiate is a <code>PrimitiveType</code> object that wraps a string; therefore, we set its type to string and associate a name and a description to the output value.<br />
We can instantiate a second output as another <code>PrimitiveType</code>.<br />
We then add both output objects to a map, which preserves the insertion order of the parameters.<br />
<br />
The DataMiner invokes <code>getOutput()</code> to understand the type of the output object; at this point the algorithm is indexed in the ecological-engine library with the name set in the properties file.<br />
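<br />
A minimal sketch of this pattern, with illustrative output names (the complete example below shows it in full):<br />
<br />
<source lang="java"><br />
//ordered container which preserves the insertion order of the outputs<br />
LinkedHashMap<String, StatisticalType> map = new LinkedHashMap<String, StatisticalType>();<br />
...<br />
map.put("Result", p);<br />
map.put("Images", images);<br />
//wrap the whole collection in a single MAP-typed output<br />
PrimitiveType output = new PrimitiveType(HashMap.class.getName(), map, PrimitiveTypes.MAP, "ResultsMap", "Results Map");<br />
</source><br />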
<br />
==== String Output ====<br />
<br />
In order to have a string as output you have to create a <code>PrimitiveType</code> as follows:<br />
<source lang="java"><br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
….<br />
PrimitiveType val = new PrimitiveType(String.class.getName(), myString , PrimitiveTypes.STRING, stringName, defaultValue);<br />
return val;<br />
<br />
}<br />
<br />
</source><br />
<br />
==== Bar Chart Output ====<br />
In order to create a histogram (bar) chart you have to fill a <code>DefaultCategoryDataset</code> object and use it to create the chart:<br />
<source lang="java"><br />
<br />
DefaultCategoryDataset dataset;<br />
…<br />
dataset.addValue(...); <br />
….<br />
<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
….<br />
HashMap<String, Image> producedImages = new HashMap<String, Image>();<br />
JFreeChart chart = HistogramGraph.createStaticChart(dataset);<br />
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));<br />
producedImages.put("Species Observations", image);<br />
…<br />
}<br />
<br />
</source><br />
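<br />
The <code>producedImages</code> map must then be wrapped in an IMAGES-typed output and returned by <code>getOutput()</code>; a minimal sketch with illustrative names:<br />
<br />
<source lang="java"><br />
PrimitiveType images = new PrimitiveType(HashMap.class.getName(), producedImages, PrimitiveTypes.IMAGES, "Charts", "Produced bar charts");<br />
map.put("Images", images);<br />
</source><br />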
<br />
==== Timeseries Chart Output ====<br />
<br />
In order to create a time-series chart you have to fill a <code>DefaultCategoryDataset</code> object and use it to create the chart.<br />
The second parameter of the <code>createStaticChart()</code> method is the time format.<br />
<source lang="java"><br />
<br />
DefaultCategoryDataset dataset;<br />
…<br />
dataset.addValue(...); <br />
….<br />
@Override<br />
public StatisticalType getOutput() {<br />
...<br />
HashMap<String, Image> producedImages = new HashMap<String, Image>();<br />
JFreeChart chart = TimeSeriesGraph.createStaticChart(dataset, "yyyy");<br />
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));<br />
producedImages.put("TimeSeries chart", image);<br />
... <br />
}<br />
<br />
<br />
</source><br />
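<br />
Note that the category keys of the dataset must be parseable with the time format passed to <code>createStaticChart</code>; a minimal sketch assuming yearly observations:<br />
<br />
<source lang="java"><br />
//category keys must match the "yyyy" time format passed above<br />
dataset.addValue(10.0, "Observations", "1997");<br />
dataset.addValue(14.0, "Observations", "1998");<br />
</source><br />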
<br />
==== File Output ====<br />
In order to create a results file that the user can download, algorithm developers have to add the following code:<br />
<source lang="java"><br />
protected String fileName;<br />
protected BufferedWriter out;<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
//note: add a timestamp to the file name to make it unique (see the example after this snippet)<br />
fileName = super.config.getPersistencePath() + "results.csv";<br />
out = new BufferedWriter(new FileWriter(fileName));<br />
out.write(results);<br />
out.newLine();<br />
}<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
...<br />
PrimitiveType file = new PrimitiveType(File.class.getName(), new File(fileName), PrimitiveTypes.FILE, "Description ", "Default value");<br />
map.put("Output",file);<br />
...<br />
}<br />
</source><br />
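<br />
For example, the timestamp can be appended to the file name as follows (a minimal sketch using the persistence path as in the snippet above):<br />
<br />
<source lang="java"><br />
fileName = super.config.getPersistencePath() + "results_" + System.currentTimeMillis() + ".csv";<br />
</source><br />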
<br />
= Test the algorithm = <br />
This is a template example to test an algorithm from Eclipse. Download the following folder https://goo.gl/r16rfF and put it next to the code (e.g. in the project folder).<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.ecoengine.test.regression;<br />
<br />
import org.gcube.dataanalysis.ecoengine.configuration.AlgorithmConfiguration;<br />
import org.gcube.dataanalysis.ecoengine.interfaces.ComputationalAgent;<br />
<br />
public class TestTransducers {<br />
<br />
public static void main(String[] args) throws Exception {<br />
System.out.println("TEST 1");<br />
ComputationalAgent computationalAgent = new yourClassName();<br />
computationalAgent.setConfiguration(testConfigLocal());<br />
computationalAgent.init();<br />
Regressor.process(computationalAgent);<br />
computationalAgent.shutdown();<br />
}<br />
<br />
private static AlgorithmConfiguration testConfigLocal() {<br />
<br />
AlgorithmConfiguration config = Regressor.getConfig();<br />
config.setAgent("OCCURRENCES_DUPLICATES_DELETER");<br />
<br />
config.setParam("longitudeColumn", "decimallongitude");<br />
config.setParam("latitudeColumn", "decimallatitude");<br />
config.setParam("recordedByColumn", "recordedby");<br />
config.setParam("scientificNameColumn", "scientificname");<br />
config.setParam("eventDateColumn", "eventdate");<br />
config.setParam("lastModificationColumn", "modified");<br />
config.setParam("OccurrencePointsTableName", "whitesharkoccurrences2");<br />
config.setParam("finalTableName", "whitesharkoccurrencesnoduplicates");<br />
config.setParam("spatialTolerance", "0.5");<br />
config.setParam("confidence", "80");<br />
<br />
return config;<br />
}<br />
<br />
}<br />
</source><br />
<br />
= Properties File and Deploy =<br />
In order to deploy an algorithm we must create:<br />
* the jar corresponding to the Eclipse Java project containing the algorithm;<br />
* a properties file containing the name under which the algorithm will be displayed in the GUI and the fully qualified class name of the algorithm, e.g. MY_ALGORITHM=org.gcube.cnr.Myalgorithm<br />
<br />
You must provide these two files to the i-Marine team. They will move the algorithm onto a DataMiner instance and the interface will be automatically generated.<br />
<br />
In the following example, the package <code>org.gcube.dataanalysis.myAlgorithms</code> inside the src/main/java folder contains the class <code>SimpleAlgorithm</code> implementing an algorithm.<br />
<code><br />
SIMPLE_ALGORITHM=org.gcube.dataanalysis.myAlgorithms.SimpleAlgorithm<br />
</code><br />
<br />
= Complete Example with multiple outputs =<br />
<source lang="java"><br />
public class AbsoluteSpeciesBarChartsAlgorithm extends<br />
StandardLocalExternalAlgorithm {<br />
LinkedHashMap<String, StatisticalType> map = new LinkedHashMap<String, StatisticalType>();<br />
static String databaseName = "DatabaseName";<br />
static String userParameterName = "DatabaseUserName";<br />
static String passwordParameterName = "DatabasePassword";<br />
static String urlParameterName = "DatabaseURL";<br />
private String firstSpeciesNumber="Species";<br />
private String yearStart="Starting_year";<br />
private String yearEnd="Ending_year";<br />
private int speciesNumber;<br />
private DefaultCategoryDataset defaultcategorydataset;<br />
@Override<br />
public void init() throws Exception {<br />
AnalysisLogger.getLogger().debug("Initialization"); <br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
return "Algorithm returning bar chart of most observed species in a specific years range (with respect to the OBIS database)";<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
defaultcategorydataset = new DefaultCategoryDataset();<br />
String driverName = "org.postgresql.Driver";<br />
String tmp=getInputParameter(firstSpeciesNumber);<br />
<br />
speciesNumber = Integer.parseInt(tmp);<br />
Class driverClass = Class.forName(driverName);<br />
Driver driver = (Driver) driverClass.newInstance();<br />
String databaseJdbc = getInputParameter(urlParameterName);<br />
String year_start = getInputParameter(yearStart);<br />
String year_end = getInputParameter(yearEnd);<br />
<br />
String databaseUser = getInputParameter(userParameterName);<br />
String databasePwd = getInputParameter(passwordParameterName);<br />
Connection connection = null;<br />
connection = DriverManager.getConnection(databaseJdbc, databaseUser,<br />
databasePwd);<br />
Statement stmt = connection.createStatement();<br />
String query = "SELECT tname, sum(count)AS count FROM public.count_species_per_year WHERE year::integer >="<br />
+ year_start<br />
+ "AND year::integer <="<br />
+ year_end<br />
+ "GROUP BY tname ORDER BY count desc;";<br />
ResultSet rs = stmt.executeQuery(query);<br />
int i =0;<br />
String s = "Species";<br />
while (rs.next()&& i<speciesNumber) {<br />
<br />
String tname = rs.getString("tname");<br />
String count = rs.getString("count");<br />
int countOcc=Integer.parseInt(count);<br />
<br />
// First output (list of string)<br />
PrimitiveType val = new PrimitiveType(String.class.getName(), count, PrimitiveTypes.STRING, tname, tname);<br />
map.put(tname, val); <br />
if(i<16)<br />
defaultcategorydataset.addValue(countOcc,s,tname); <br />
else<br />
break;<br />
i++;<br />
<br />
}<br />
connection.close();<br />
<br />
<br />
<br />
}<br />
<br />
@Override<br />
protected void setInputParameters() {<br />
addStringInput(firstSpeciesNumber,<br />
"Number of shown species", "10");<br />
addStringInput(yearStart, "Starting year of observations",<br />
"1800");<br />
addStringInput(yearEnd, "Ending year of observations", "2020");<br />
addRemoteDatabaseInput("Obis2Repository", urlParameterName,<br />
userParameterName, passwordParameterName, "driver", "dialect");<br />
<br />
<br />
}<br />
<br />
@Override<br />
public void shutdown() {<br />
AnalysisLogger.getLogger().debug("Shutdown"); <br />
}<br />
<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
PrimitiveType p = new PrimitiveType(Map.class.getName(), PrimitiveType.stringMap2StatisticalMap(outputParameters), PrimitiveTypes.MAP, "Species Counts","");<br />
AnalysisLogger.getLogger().debug("Producing the species counts list"); <br />
//build image:<br />
HashMap<String, Image> producedImages = new HashMap<String, Image>();<br />
<br />
JFreeChart chart = HistogramGraph.createStaticChart(defaultcategorydataset);<br />
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));<br />
producedImages.put("Species Observations", image);<br />
<br />
PrimitiveType images = new PrimitiveType(HashMap.class.getName(), producedImages, PrimitiveTypes.IMAGES, "SpeciesObservations", "Graphical representation of the most observed species");<br />
<br />
//end build image<br />
AnalysisLogger.getLogger().debug("Bar Charts Species Occurrences Produced");<br />
//collect all the outputs<br />
<br />
map.put("Result", p);<br />
map.put("Images", images);<br />
<br />
//generate a primitive type for the collection<br />
PrimitiveType output = new PrimitiveType(HashMap.class.getName(), map, PrimitiveTypes.MAP, "ResultsMap", "Results Map");<br />
<br />
<br />
return output;<br />
}<br />
<br />
}<br />
<br />
</source><br />
<br />
=Integrating R Scripts=<br />
DataMiner (DM) supports the integration of R scripts. This section explains how to integrate R scripts that will be executed sequentially on one single powerful machine. The computation will be assigned to one of the machines that make up the DataMiner system, and the DM will automatically manage multi-user requests. This section does not deal with enabling parallel processing for the script, which is discussed later.<br />
<br />
Download the following configuration folder into the Eclipse project: http://goo.gl/bNKrZK<br />
Then add the following Maven dependency:<br />
<br />
<source lang="java"><br />
<dependency><br />
<groupId>org.gcube.dataanalysis</groupId><br />
<artifactId>ecological-engine-smart-executor</artifactId><br />
<version>[1.0.0-SNAPSHOT,2.0.0)</version><br />
</dependency><br />
</source><br />
<br />
Then copy an R script inside the cfg folder. The DM framework assumes that the R file (i) accepts an input file whose name is hard-coded in the script, (ii) produces an output file whose name is hard-coded in the script, (iii) requires an R context made up of user's variables, (iv) possibly requires custom adjustment to the code.<br />
<br />
The DM framework facilitates the call to the script by adding context variables "on the fly" and managing multi-user synchronous calls. This mechanism works by generating a new temporary R script on the fly for each user. The DM is also responsible for dispatching the script to one powerful machine. Required packages are assumed to be preinstalled on the backend system.<br />
<br />
One example of an algorithm calling a complex interpolation model is the following:<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.executor.rscripts;<br />
<br />
import java.io.File;<br />
import java.util.HashMap;<br />
import java.util.LinkedHashMap;<br />
<br />
import org.gcube.contentmanagement.lexicalmatcher.utils.AnalysisLogger;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.PrimitiveType;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.StatisticalType;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.enumtypes.PrimitiveTypes;<br />
import org.gcube.dataanalysis.ecoengine.interfaces.StandardLocalExternalAlgorithm;<br />
import org.gcube.dataanalysis.executor.util.RScriptsManager;<br />
<br />
public class SGVMS_Interpolation extends StandardLocalExternalAlgorithm {<br />
<br />
private static int maxPoints = 10000;<br />
public enum methodEnum { cHs, SL};<br />
RScriptsManager scriptmanager;<br />
String outputFile;<br />
<br />
@Override<br />
public void init() throws Exception {<br />
AnalysisLogger.getLogger().debug("Initializing SGVMS_Interpolation");<br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
return "An interpolation method relying on the implementation by the Study Group on VMS (SGVMS). The method uses two interpolation approached to simulate vessels points at a certain temporal resolution. The input is a file in TACSAT format uploaded on the DataMiner. The output is another TACSAT file containing interpolated points." +<br />
"The underlying R code has been extracted from the SGVM VMSTools framework. This algorithm comes after a feasibility study (http://goo.gl/risQre) which clarifies the features an e-Infrastructure adds to the original scripts. Limitation: the input will be processed up to "+maxPoints+" vessels trajectory points.";<br />
}<br />
<br />
@Override<br />
protected void process() throws Exception {<br />
<br />
status = 0;<br />
//instantiate the R Script executor<br />
scriptmanager = new RScriptsManager();<br />
//this is the script name<br />
String scriptName = "interpolateTacsat.r";<br />
//absolute path to the input, provided by the DM <br />
String inputFile = config.getParam("InputFile");<br />
<br />
AnalysisLogger.getLogger().debug("Starting SGVM Interpolation-> Config path "+config.getConfigPath()+" Persistence path: "+config.getPersistencePath());<br />
//default input and outputs <br />
String defaultInputFileInTheScript = "tacsat.csv";<br />
String defaultOutputFileInTheScript = "tacsat_interpolated.csv";<br />
//input parameters: represent the context of the script. Values will be assigned in the R environment.<br />
LinkedHashMap<String,String> inputParameters = new LinkedHashMap<String, String>();<br />
inputParameters.put("npoints",config.getParam("npoints"));<br />
inputParameters.put("interval",config.getParam("interval"));<br />
inputParameters.put("margin",config.getParam("margin"));<br />
inputParameters.put("res",config.getParam("res"));<br />
inputParameters.put("fm",config.getParam("fm"));<br />
inputParameters.put("distscale",config.getParam("distscale"));<br />
inputParameters.put("sigline",config.getParam("sigline"));<br />
inputParameters.put("minspeedThr",config.getParam("minspeedThr"));<br />
inputParameters.put("maxspeedThr",config.getParam("maxspeedThr"));<br />
inputParameters.put("headingAdjustment",config.getParam("headingAdjustment"));<br />
inputParameters.put("equalDist",config.getParam("equalDist").toUpperCase());<br />
//add static context variables<br />
inputParameters.put("st", "c(minspeedThr,maxspeedThr)");<br />
inputParameters.put("fast", "TRUE");<br />
inputParameters.put("method", "\""+config.getParam("method")+"\"");<br />
<br />
AnalysisLogger.getLogger().debug("Starting SGVM Interpolation-> Input Parameters: "+inputParameters);<br />
//if other code injection is required, put the strings to substitute as keys and the substituting ones as values<br />
HashMap<String,String> codeInjection = null;<br />
//force the script to produce an output file, otherwise generate an exception <br />
boolean scriptMustReturnAFile = true;<br />
boolean uploadScriptOnTheInfrastructureWorkspace = false; //the DataMiner service will manage the upload<br />
AnalysisLogger.getLogger().debug("SGVM Interpolation-> Executing the script ");<br />
status = 10;<br />
//execute the script in multi-user mode<br />
scriptmanager.executeRScript(config, scriptName, inputFile, inputParameters, defaultInputFileInTheScript, defaultOutputFileInTheScript, codeInjection, scriptMustReturnAFile,uploadScriptOnTheInfrastructureWorkspace, config.getConfigPath());<br />
//assign the file path to an output variable for the DM<br />
outputFile = scriptmanager.currentOutputFileName;<br />
AnalysisLogger.getLogger().debug("SGVM Interpolation-> Output File is "+outputFile);<br />
status = 100;<br />
}<br />
<br />
@Override<br />
protected void setInputParameters() {<br />
//declare the input parameters the user will set: they will basically correspond to the R context<br />
inputs.add(new PrimitiveType(File.class.getName(), null, PrimitiveTypes.FILE, "InputFile", "Input file in TACSAT format. E.g. http://goo.gl/i16kPw"));<br />
addIntegerInput("npoints", "The number of pings or positions required between each real or actual vessel position or ping", "10");<br />
addIntegerInput("interval", "Average time in minutes between two adjacent datapoints", "120");<br />
addIntegerInput("margin", "Maximum deviation from specified interval to find adjacent datapoints (tolerance)", "10");<br />
addIntegerInput("res", "Number of points to use to create interpolation (including start and end point)", "100");<br />
addEnumerateInput(methodEnum.values(), "method","Set to cHs for cubic Hermite spline or SL for Straight Line interpolation", "cHs");<br />
addDoubleInput("fm", "The FM parameter in cubic interpolation", "0.5");<br />
addIntegerInput("distscale", "The DistScale parameter for cubic interpolation", "20");<br />
addDoubleInput("sigline", "The Sigline parameter in cubic interpolation", "0.2");<br />
addDoubleInput("minspeedThr", "A filter on the minimum speed to take into account for interpolation", "2");<br />
addDoubleInput("maxspeedThr", "A filter on the maximum speed to take into account for interpolation", "6");<br />
addIntegerInput("headingAdjustment", "Parameter to adjust the choice of heading depending on its own or previous point (0 or 1). Set 1 in case the heading at the endpoint does not represent the heading of the arriving vessel to that point but the departing vessel.", "0");<br />
inputs.add(new PrimitiveType(Boolean.class.getName(), null, PrimitiveTypes.BOOLEAN, "equalDist", "Whether the number of positions returned should be equally spaced or not", "true"));<br />
}<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
//return the output file by the procedure to the DM<br />
PrimitiveType o = new PrimitiveType(File.class.getName(), new File(outputFile), PrimitiveTypes.FILE, "OutputFile", "Output file in TACSAT format.");<br />
return o;<br />
}<br />
<br />
@Override<br />
public void shutdown() {<br />
//in the case of forced shutdown, stop the R process<br />
if (scriptmanager!=null)<br />
scriptmanager.stop();<br />
System.gc();<br />
}<br />
<br />
}<br />
</source><br />
<br />
In order to test the above algorithm, just modify the "transducerers.properties" file inside the cfg folder by adding the following string:<br />
<br />
SGVM_INTERPOLATION=org.gcube.dataanalysis.executor.rscripts.SGVMS_Interpolation<br />
<br />
which will assign a name to the algorithm. Then a test class for this algorithm will be the following:<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.executor.tests;<br />
<br />
import java.util.List;<br />
<br />
import org.gcube.dataanalysis.ecoengine.configuration.AlgorithmConfiguration;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.PrimitiveType;<br />
import org.gcube.dataanalysis.ecoengine.datatypes.StatisticalType;<br />
import org.gcube.dataanalysis.ecoengine.interfaces.ComputationalAgent;<br />
import org.gcube.dataanalysis.ecoengine.test.regression.Regressor;<br />
<br />
public class TestSGVMInterpolation {<br />
<br />
public static void main(String[] args) throws Exception {<br />
// setup the configuration<br />
AlgorithmConfiguration config = new AlgorithmConfiguration();<br />
// set the path to the cfg folder and to the PARALLEL_PROCESSING folder<br />
config.setConfigPath("./cfg/");<br />
config.setPersistencePath("./PARALLEL_PROCESSING");<br />
//set the user's inputs. They will be passed by the DM to the script in the following way:<br />
config.setParam("InputFile", "<absolute path to the file>/tacsatmini.csv"); //put the absolute path to the input file<br />
config.setParam("npoints", "10");<br />
config.setParam("interval", "120");<br />
config.setParam("margin", "10");<br />
config.setParam("res", "100");<br />
config.setParam("method", "SL");<br />
config.setParam("fm", "0.5");<br />
config.setParam("distscale", "20");<br />
config.setParam("sigline", "0.2");<br />
config.setParam("minspeedThr", "2");<br />
config.setParam("maxspeedThr", "6");<br />
config.setParam("headingAdjustment", "0");<br />
config.setParam("equalDist", "true");<br />
<br />
//set the scope and the user (optional for this test)<br />
config.setGcubeScope( "/gcube/devsec/devVRE");<br />
config.setParam("ServiceUserName", "test.user");<br />
<br />
//set the name of the algorithm to call, as it is in the transducerers.properties file<br />
config.setAgent("SGVM_INTERPOLATION");<br />
<br />
//recall the transducerer with the above name <br />
ComputationalAgent transducer = new SGVMS_Interpolation();<br />
transducer.setConfiguration(config);<br />
<br />
//init the transducer<br />
transducer.init();<br />
//start the process<br />
Regressor.process(transducer);<br />
//retrieve the output<br />
StatisticalType st = transducer.getOutput();<br />
System.out.println("st:"+((PrimitiveType)st).getContent());<br />
}<br />
<br />
}<br />
</source><br />
<br />
=Enabling Cloud Computing for R Scripts=<br />
In the case of a process running in the Infrastructure and using Cloud computing, you have to extend the ActorNode class and define how to set up the process, how to chunk the input space, how to run the script on each chunk, and how to perform the Reduce phase.<br />
These steps are performed using the following methods:<br />
<br />
* setup(AlgorithmConfiguration config)<br />
* getNumberOfRightElements() and getNumberOfLeftElements(), which define how the input space is chunked<br />
* executeNode(...), which runs the script on one chunk (shown in the example below)<br />
* postProcess(boolean manageDuplicates, boolean manageFault), which implements the Reduce phase<br />
<br />
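A minimal sketch of such an ActorNode subclass is the following (the class name is illustrative and only the methods relevant to these phases are shown, so the class is not complete); the full LWR example below implements them all:<br />
<br />
<source lang="java"><br />
package org.gcube.dataanalysis.executor.nodes.algorithms;<br />
<br />
//minimal sketch (not a complete class): only the Map-Reduce related methods are shown<br />
public class MyParallelAlgorithm extends ActorNode {<br />
<br />
	@Override<br />
	public void setup(AlgorithmConfiguration config) throws Exception {<br />
		//prepare shared resources, e.g. create the output table<br />
	}<br />
<br />
	@Override<br />
	public int getNumberOfRightElements() {<br />
		return 100; //illustrative: the total size of the right input space<br />
	}<br />
<br />
	@Override<br />
	public int getNumberOfLeftElements() {<br />
		return 1; //illustrative: the total size of the left input space<br />
	}<br />
<br />
	@Override<br />
	public int executeNode(int leftStartIndex, int numberOfLeftElementsToProcess, int rightStartIndex, int numberOfRightElementsToProcess, boolean duplicate, String sandboxFolder, String nodeConfigurationFileObject, String logfileNameToProduce) {<br />
		//process one chunk on a worker node: run the script on the assigned elements<br />
		return 0; //0 on success, -1 on error<br />
	}<br />
<br />
	@Override<br />
	public void postProcess(boolean manageDuplicates, boolean manageFault) {<br />
		//Reduce phase: merge the partial results<br />
	}<br />
}<br />
</source><br />
<br />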
<source lang="java"><br />
package org.gcube.dataanalysis.executor.nodes.algorithms;<br />
<br />
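//NOTE: imports and the private helpers used below (e.g. dbconnection, columnNames, createOutputTable, substituteInScript, addheader, readFromCSV) are omitted in this excerpt<br />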
public class LWR extends ActorNode {<br />
<br />
public String destinationTable;<br />
public String destinationTableLabel;<br />
public String originTable;<br />
public String familyColumn;<br />
public int count;<br />
<br />
public float status = 0;<br />
<br />
//specify the kind of parallel process: the following performs a matrix-to-matrix comparison<br />
@Override<br />
public ALG_PROPS[] getProperties() {<br />
ALG_PROPS[] p = { ALG_PROPS.PHENOMENON_VS_PARALLEL_PHENOMENON };<br />
return p;<br />
}<br />
<br />
@Override<br />
public String getName() {<br />
return "LWR";<br />
}<br />
<br />
@Override<br />
public String getDescription() {<br />
return "An algorithm to estimate Length-Weight relationship parameters for marine species, using Bayesian methods. Runs an R procedure. Based on the Cube-law theory.";<br />
}<br />
<br />
@Override<br />
public List<StatisticalType> getInputParameters() {<br />
List<TableTemplates> templateLWRInput = new ArrayList<TableTemplates>();<br />
templateLWRInput.add(TableTemplates.GENERIC);<br />
InputTable p1 = new InputTable(templateLWRInput, "LWR_Input", "Input table containing taxa and species information", "lwr");<br />
ColumnType p3 = new ColumnType("LWR_Input", "FamilyColumn", "The column containing Family information", "Family", false);<br />
ServiceType p4 = new ServiceType(ServiceParameters.RANDOMSTRING, "RealOutputTable", "name of the resulting table", "lwr_");<br />
PrimitiveType p2 = new PrimitiveType(String.class.getName(), null, PrimitiveTypes.STRING, "TableLabel", "Name of the table which will contain the model output", "lwrout");<br />
<br />
List<StatisticalType> parameters = new ArrayList<StatisticalType>();<br />
parameters.add(p1);<br />
parameters.add(p3);<br />
parameters.add(p2);<br />
parameters.add(p4);<br />
<br />
DatabaseType.addDefaultDBPars(parameters);<br />
<br />
return parameters;<br />
}<br />
<br />
@Override<br />
public StatisticalType getOutput() {<br />
List<TableTemplates> template = new ArrayList<TableTemplates>();<br />
template.add(TableTemplates.GENERIC);<br />
OutputTable p = new OutputTable(template, destinationTableLabel, destinationTable, "Output lwr table");<br />
return p;<br />
}<br />
<br />
@Override<br />
public void initSingleNode(AlgorithmConfiguration config) {<br />
<br />
}<br />
<br />
@Override<br />
public float getInternalStatus() {<br />
return status;<br />
}<br />
<br />
private static String scriptName = "UpdateLWR_4.R";<br />
<br />
	//the inputs delivered by the DM are: the start indices and the numbers of elements to take from the left and right tables, a flag indicating whether the same request was already assigned to another worker node (in the case of errors), the sandbox folder in which the script will be executed, and the configuration of the algorithm<br />
@Override<br />
public int executeNode(int leftStartIndex, int numberOfLeftElementsToProcess, int rightStartIndex, int numberOfRightElementsToProcess, boolean duplicate, String sandboxFolder, String nodeConfigurationFileObject, String logfileNameToProduce) {<br />
String insertQuery = null;<br />
try {<br />
status = 0;<br />
//reconstruct the configuration<br />
AlgorithmConfiguration config = Transformations.restoreConfig(nodeConfigurationFileObject);<br />
config.setConfigPath(sandboxFolder);<br />
System.out.println("Initializing DB");<br />
//take the parameters and possibly initialize connection to the DB<br />
dbconnection = DatabaseUtils.initDBSession(config);<br />
destinationTableLabel = config.getParam("TableLabel");<br />
destinationTable = config.getParam("RealOutputTable");<br />
System.out.println("Destination Table: "+destinationTable);<br />
System.out.println("Destination Table Label: "+destinationTableLabel);<br />
originTable = config.getParam("LWR_Input");<br />
familyColumn = config.getParam("FamilyColumn");<br />
System.out.println("Origin Table: "+originTable);<br />
<br />
// take the families to process<br />
List<Object> families = DatabaseFactory.executeSQLQuery(DatabaseUtils.getDinstictElements(originTable, familyColumn, ""), dbconnection);<br />
<br />
// transform the families into a string<br />
StringBuffer familiesFilter = new StringBuffer();<br />
familiesFilter.append("Families <- Fam.All[");<br />
<br />
int end = rightStartIndex + numberOfRightElementsToProcess;<br />
//build the substitution string<br />
for (int i = rightStartIndex; i < end; i++) {<br />
familiesFilter.append("Fam.All == \"" + families.get(i) + "\"");<br />
if (i < end - 1)<br />
familiesFilter.append(" | ");<br />
}<br />
familiesFilter.append("]");<br />
<br />
			//(unused) the equivalent sed command for the substitution performed below via substituteInScript<br />
			String substitutioncommand = "sed -i 's/Families <- Fam.All[Fam.All== \"Acanthuridae\" | Fam.All == \"Achiridae\"]/" + familiesFilter + "/g' " + "UpdateLWR_Test2.R";<br />
System.out.println("Preparing for processing the families names: "+familiesFilter.toString());<br />
<br />
substituteInScript(sandboxFolder+scriptName,sandboxFolder+"UpdateLWR_Tester.R","Families <- Fam.All[Fam.All== \"Acanthuridae\" | Fam.All == \"Achiridae\"]",familiesFilter.toString());<br />
//for test only<br />
<br />
System.out.println("Creating local file from remote table");<br />
// download the table in csv format to feed the procedure<br />
DatabaseUtils.createLocalFileFromRemoteTable(sandboxFolder+"RF_LWR.csv", originTable, ",", config.getDatabaseUserName(),config.getDatabasePassword(),config.getDatabaseURL());<br />
<br />
String headers = "Subfamily,Family,Genus,Species,FBname,SpecCode,AutoCtr,Type,a,b,CoeffDetermination,Number,LengthMin,Score,BodyShapeI";<br />
System.out.println("Adding headers to the file");<br />
<br />
			//(unused) the equivalent sed command for prepending the header line<br />
			String headerscommand = "sed -i '1s/^/"+headers+"\\n/g' "+"RF_LWR2.csv";<br />
			// prepend the header line to the CSV file<br />
			addheader(sandboxFolder+"RF_LWR.csv",sandboxFolder+"RF_LWR2.csv",headers);<br />
System.out.println("Headers added");<br />
System.out.println("Executing R script " + "R --no-save < UpdateLWR_Tester.R");<br />
// run the R code: it can be alternatively made with the methods of the previous example<br />
Process process = Runtime.getRuntime().exec("R --no-save");<br />
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));<br />
bw.write("source('UpdateLWR_Tester.R')\n");<br />
bw.write("q()\n");<br />
bw.close();<br />
BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()));<br />
			String line = br.readLine();<br />
			while (line != null) {<br />
				System.out.println(line);<br />
				line = br.readLine();<br />
			}<br />
process.destroy();<br />
System.out.println("Appending csv to table");<br />
// transform the output into table<br />
StringBuffer lines = readFromCSV("LWR_Test1.csv");<br />
insertQuery = DatabaseUtils.insertFromBuffer(destinationTable, columnNames, lines);<br />
DatabaseFactory.executeSQLUpdate(insertQuery, dbconnection);<br />
System.out.println("The procedure was successful");<br />
status = 1f;<br />
} catch (Exception e) {<br />
e.printStackTrace();<br />
System.out.println("warning: error in node execution " + e.getLocalizedMessage());<br />
System.out.println("Insertion Query: "+insertQuery);<br />
System.err.println("Error in node execution " + e.getLocalizedMessage());<br />
return -1;<br />
} finally {<br />
if (dbconnection != null)<br />
try {<br />
dbconnection.close();<br />
} catch (Exception e) {<br />
}<br />
}<br />
return 0;<br />
}<br />
<br />
//setup phase of the algorithm<br />
@Override<br />
public void setup(AlgorithmConfiguration config) throws Exception {<br />
<br />
destinationTableLabel = config.getParam("TableLabel");<br />
AnalysisLogger.getLogger().info("Table Label: "+destinationTableLabel);<br />
destinationTable = config.getParam("RealOutputTable");<br />
AnalysisLogger.getLogger().info("Uderlying Table Name: "+destinationTable);<br />
originTable = config.getParam("LWR_Input");<br />
AnalysisLogger.getLogger().info("Original Table: "+originTable);<br />
familyColumn = config.getParam("FamilyColumn");<br />
AnalysisLogger.getLogger().info("Family Column: "+familyColumn);<br />
haspostprocessed = false;<br />
<br />
AnalysisLogger.getLogger().info("Initializing DB Connection");<br />
dbconnection = DatabaseUtils.initDBSession(config);<br />
List<Object> families = DatabaseFactory.executeSQLQuery(DatabaseUtils.getDinstictElements(originTable, familyColumn, ""), dbconnection);<br />
count = families.size();<br />
<br />
		//create the table where the script will write the output<br />
DatabaseFactory.executeSQLUpdate(String.format(createOutputTable, destinationTable), dbconnection);<br />
AnalysisLogger.getLogger().info("Destination Table Created! Addressing " + count + " species");<br />
} <br />
<br />
@Override<br />
public int getNumberOfRightElements() {<br />
		return count; //the total number of elements in the right table (the families) to be processed<br />
}<br />
<br />
@Override<br />
public int getNumberOfLeftElements() {<br />
return 1; //each Worker node has to get only one element in the left table<br />
}<br />
<br />
@Override<br />
public void stop() {<br />
<br />
		//if the algorithm has not postprocessed, abort the computation by removing the database table<br />
		if (!haspostprocessed){<br />
			try{<br />
				AnalysisLogger.getLogger().info("The procedure did NOT postprocess correctly ... removing table "+destinationTable+" because the computation was stopped!");<br />
DatabaseFactory.executeSQLUpdate(DatabaseUtils.dropTableStatement(destinationTable), dbconnection);<br />
}catch (Exception e) {<br />
AnalysisLogger.getLogger().info("Table "+destinationTable+" did not exist");<br />
}<br />
}<br />
else<br />
AnalysisLogger.getLogger().info("The procedure has correctly postprocessed: shutting down the connection!");<br />
if (dbconnection != null)<br />
try {<br />
dbconnection.close();<br />
} catch (Exception e) {<br />
}<br />
}<br />
<br />
boolean haspostprocessed = false;<br />
@Override<br />
public void postProcess(boolean manageDuplicates, boolean manageFault) {<br />
haspostprocessed=true;<br />
}<br />
<br />
}<br />
</source><br />
<br />
=Video=<br />
<br />
We advise you to also watch this video, which shows in practice how to build an algorithm:<br />
<br />
http://i-marine.eu/Content/eTraining.aspx?id=e1777006-a08c-49ad-b2e6-c13e094f27d4<br />
<br />
= Related Links =<br />
[https://wiki.gcube-system.org/gcube/DataMiner_Manager DataMiner Tutorial]<br />
<br />
[https://wiki.gcube-system.org/gcube/Data_Mining_Facilities Data Mining Facilities]<br />
<br />
</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Pre_Installed_Packages&diff=30471Pre Installed Packages2017-12-13T22:18:16Z<p>Gianpaolo.coro: </p>
<hr />
<div><!-- CATEGORIES --><br />
[[Category:Developer's Guide]]<br />
<!-- END CATEGORIES --><br />
=Preamble=<br />
This Wiki page reports the packages pre-installed on the computational machines for a variety of languages.<br />
These machines also run the [[DataMiner Manager|DataMiner]] service with algorithms created through the [[Statistical Algorithms Importer|Statistical Algorithms Importer (SAI)]].<br />
<br />
==R Packages==<br />
A constantly updated list of installed R 3.4.0 Packages is available at the following [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt LINK].<br />
<br />
GitHub packages are shown here: [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_github_pkgs.txt LINK].<br />
<br />
==Linux Debian Packages==<br />
A constantly updated list of installed Debian 3.2.51-1 x86_64 Packages is available at the following [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_deb_pkgs.txt LINK]<br />
<br />
==Octave Packages==<br />
Octave 4.0.2 is installed on the computational machines with basic packages.<br />
<br />
==Knime Packages==<br />
Knime-Full 3.3.2 is installed on the computational machines.<br />
<br />
==Java Packages==<br />
Java 8 is installed on the computational machines, with the dependencies of the Ecological Engine framework, retrievable through Maven [http://maven.research-infrastructures.eu/nexus/index.html#nexus-search;gav~org.gcube.dataanalysis~ecological-engine-geospatial-extensions HERE] (refer to the latest version).<br />
<br />
==Python Packages==<br />
Python 2.7.6 is installed on the computational machines with basic packages.<br />
<br />
==Windows Packages==<br />
Windows .Net compiled programs are supported through the [http://www.mono-project.com/ Mono] 3.2.8 runtime.<br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=How_to_add_my_data_to_the_gCube_Spatial_Data_Infrastructure&diff=30469How to add my data to the gCube Spatial Data Infrastructure2017-12-12T23:38:11Z<p>Gianpaolo.coro: </p>
<hr />
<div>[[Category:User's Guide]]<br />
<br />
{|align=right<br />
||__TOC__<br />
|}<br />
<br />
== Overview == <br />
gCube offers a comprehensive set of [[Spatial_Data_Infrastructure_Facilities | facilities for managing geospatial data]]. <br />
<br />
This "How to" describes how a user can add its own data to the gCube SDI. In particular, the following options are supported:<br />
<br />
* [[#Import an external CSW catalog to the SDI Catalog|Import an external CSW catalog to the SDI Catalog]]<br />
* [[#Register your GeoSpatial data in the SDI Catalog|Register your GeoSpatial data in the SDI Catalog]]<br />
<br />
Once data are published into the gCube SDI they can be accessed and consumed by means of several approaches including: <br />
<br />
* [[GeoExplorer | Visualization and inspection]]<br />
* [[DataMiner_Manager | Processing]]<br />
<br />
'''NB:''' Please note that it is also possible to visualize and inspect external layers not published in the infrastructure. See [[GeoExplorer]] portlet user's guide for more details.<br />
<br />
For a more comprehensive list on gCube features, please refer to our [[User's_Guide| User's Guide]].<br />
<br />
==Import an external CSW catalog to the SDI Catalog==<br />
External OGC Web-Services can be harvested into the gCube SDI Catalog, making their entries available throughout the infrastructure while still physically hosted on remote services. To do so, please open a ticket here: [https://support.d4science.org], specifying:<br />
<br />
* The OGC interfaces exposed by your service<br />
* Possible credentials needed<br />
* The target VRE in which the harvested entries should be published<br />
* Any other detail of interest<br />
<br />
==Register your GeoSpatial data in the SDI Catalog==<br />
The gCube Framework offers several ways of registering your GeoSpatial data as '''Layers''' published inside the '''SDI Catalog''', making them available throughout the infrastructure.<br />
<br />
For this purpose, a set of processes are offered by our [[Spatial_Data_Processing | Spatial Data Analytics]] facilities:<br />
<br />
*Publish '''ShapeFile'''<br />
*Publish '''RasterFile'''<br />
*Publish Layer from '''coordinates-referenced data'''<br />
*Publish Layer from '''CSquareCode-referenced data'''<br />
*Publish Layer from '''GeoSpatial-polygons data'''<br />
<br />
Please refer to [[DataMiner_Manager|this guide]] on how to execute these algorithms.<br />
<br />
For a more comprehensive list of available processing algorithms, please refer to our [[DataMiner_Algorithms | default provided algorithms list]].<br />
<br />
===Ad-hoc Registration===<br />
If none of the existing processes suits your needs, the infrastructure lets you develop, test and run '''your own algorithms''', that can fulfill the registration of your specific datasets.<br />
<br />
Please refer to [[Statistical_Algorithms_Importer | this guide]] on how to import/develop your own algorithms.</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Pre-Installed_Project&diff=30466Statistical Algorithms Importer: Pre-Installed Project2017-12-12T14:40:23Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create a Pre-Installed project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_PreInstalledBlackBox0.png|thumb|center|250px|Pre-Installed Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_PreInstalledBlackBox1.png|thumb|center|800px|Pre-Installed Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .sh file)<br />
[[Image:StatisticalAlgorithmsImporter_PreInstalledBlackBox2.png|thumb|center|800px|Pre-Installed I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Linux OS version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_PreInstalledBlackBox4.png|thumb|center|800px|Pre-Installed Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_PreInstalledBlackBox3.png|thumb|center|800px|Pre-Installed Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
==Example Download==<br />
[[File:PreInstBlackBox.zip|PreInstBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Windows_Project&diff=30465Statistical Algorithms Importer: Windows Project2017-12-12T14:40:13Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create a Windows-compiled project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_WindowsBlackBox0.png|thumb|center|250px|Windows Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_WindowsBlackBox1.png|thumb|center|800px|Windows Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .exe file)<br />
[[Image:StatisticalAlgorithmsImporter_WindowsBlackBox2.png|thumb|center|800px|Windows I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Mono version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_WindowsBlackBox3.png|thumb|center|800px|Windows Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_WindowsBlackBox4.png|thumb|center|800px|Windows Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
==Example Download==<br />
[[File:WindowsBlackBox.zip|WindowsBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Python_Project&diff=30464Statistical Algorithms Importer: Python Project2017-12-12T14:40:02Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create a Python project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_PythonBlackBox0.png|thumb|center|250px|Python Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_PythonBlackBox1.png|thumb|center|800px|Python Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .py file)<br />
[[Image:StatisticalAlgorithmsImporter_PythonBlackBox2.png|thumb|center|800px|Python I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Python version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_PythonBlackBox3.png|thumb|center|800px|Python Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_PythonBlackBox4.png|thumb|center|800px|Python Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
== Example Code ==<br />
:The Python code in the example:<br />
<br />
<source lang='python'><br />
#<br />
# author Giancarlo Panichi<br />
#<br />
# HelloWorld<br />
# <br />
import sys<br />
<br />
out_file = open("helloworld.txt","w")<br />
for arg in sys.argv:<br />
    out_file.write("Hello World\n"+arg+"\n")<br />
out_file.close()<br />
</source><br />
<br />
==Example Download==<br />
[[File:PythonBlackBox.zip|PythonBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Octave_Project&diff=30463Statistical Algorithms Importer: Octave Project2017-12-12T14:39:46Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create an Octave project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_OctaveBlackBox0.png|thumb|center|250px|Octave Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_OctaveBlackBox1.png|thumb|center|800px|Octave Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .m file)<br />
[[Image:StatisticalAlgorithmsImporter_OctaveBlackBox2.png|thumb|center|800px|Octave I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Octave version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_OctaveBlackBox3.png|thumb|center|800px|Octave Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Taget folder are created<br />
[[Image:StatisticalAlgorithmsImporter_OctaveBlackBox4.png|thumb|center|800px|Octave Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
== Example Code ==<br />
:The Octave code in the example:<br />
<br />
<source lang='octave'><br />
#<br />
# author: Giancarlo Panichi<br />
#<br />
arg_list = argv()<br />
testin=arg_list{1}<br />
filename = "free.txt";<br />
fid = fopen (filename, "w");<br />
fputs (fid, testin);<br />
fclose (fid);<br />
</source><br />
<br />
==Example Download==<br />
[[File:OctaveBlackBox.zip|OctaveBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Linux-compiled_Project&diff=30462Statistical Algorithms Importer: Linux-compiled Project2017-12-12T14:39:37Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create a Linux-compiled project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_LinuxBlackBox0.png|thumb|center|250px|Linux Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_LinuxBlackBox1.png|thumb|center|800px|Linux Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .sh file)<br />
[[Image:StatisticalAlgorithmsImporter_LinuxBlackBox2.png|thumb|center|800px|Linux I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Linux version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_LinuxBlackBox3.png|thumb|center|800px|Linux Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Taget folder are created<br />
[[Image:StatisticalAlgorithmsImporter_LinuxBlackBox4.png|thumb|center|800px|Linux Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
==Example Download==<br />
[[File:LinuxBlackBox.zip|LinuxBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Knime-Workflow_Project&diff=30461Statistical Algorithms Importer: Knime-Workflow Project2017-12-12T14:39:25Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create a Knime-Workflow project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox0.png|thumb|center|250px|Knime Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox1.png|thumb|center|800px|Knime Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .knwf file)<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox2.png|thumb|center|800px|Knime I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Knime version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox3.png|thumb|center|800px|Knime Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_KnimeBlackBox4.png|thumb|center|800px|Knime Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
==Example Download==<br />
[[File:KnimeBlackBox.zip|KnimeBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Java_Project&diff=30460Statistical Algorithms Importer: Java Project2017-12-12T14:39:05Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
This page explains how to create a Java project using two alternative approaches: [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#Black_Box_Integration '''Black-box'''] and [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#White_Box_Integration '''White-box'''] integration. The next sections explain how these work and which cases the two approaches address.<br />
<br />
=Black Box Integration=<br />
<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox0.png|thumb|center|250px|Java Project, SAI]]<br />
<br />
This is the preferred way for developers who want their process executions distributed based on the load of the requests. Each process request will run on one dedicated machine and is allowed to use multi-core processing. Black-box processes usually do not use the e-Infrastructure resources but "live on their own". '''The Statistical Algorithms Importer (SAI) portlet must be used for this integration'''.<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox1.png|thumb|center|800px|Java Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .jar file)<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox2.png|thumb|center|800px|Java I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Java version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox3.png|thumb|center|800px|Java Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox4.png|thumb|center|800px|Java Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
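As an illustration, a black-box Java process can read this file at startup. The following is a minimal sketch, assuming only the file format shown above; the class name and the parsing strategy are illustrative, not part of the SAI contract:<br />
<br />
<source lang='java'><br />
import java.io.IOException;<br />
import java.nio.file.Files;<br />
import java.nio.file.Paths;<br />
import java.util.HashMap;<br />
import java.util.Map;<br />
<br />
//illustrative helper: reads ./globalvariables.csv into a map, skipping the header line<br />
public class GlobalVariablesReader {<br />
<br />
	public static Map<String, String> read() throws IOException {<br />
		Map<String, String> vars = new HashMap<String, String>();<br />
		for (String line : Files.readAllLines(Paths.get("globalvariables.csv"))) {<br />
			String[] kv = line.split(",", 2);<br />
			if (kv.length == 2 && !kv[0].equals("globalvariable"))<br />
				vars.put(kv[0], kv[1]);<br />
		}<br />
		return vars;<br />
	}<br />
<br />
	public static void main(String[] args) throws IOException {<br />
		Map<String, String> vars = read();<br />
		//e.g. use gcube_token to authorize calls to the e-Infrastructure services<br />
		System.out.println("User: " + vars.get("gcube_username") + ", VRE: " + vars.get("gcube_context"));<br />
	}<br />
}<br />
</source><br />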
<br />
== Example Code ==<br />
:The Java code in the example:<br />
<br />
<source lang='java'><br />
/**<br />
* <br />
* @author Giancarlo Panichi<br />
* <br />
*<br />
*/<br />
import java.io.File;<br />
import java.io.FileWriter;<br />
<br />
public class SimpleProducer<br />
{<br />
public static void main(String[] args)<br />
{<br />
try<br />
{<br />
FileWriter fw = new FileWriter(new File("program.txt"));<br />
fw.write("Check: " + args[0]);<br />
fw.close();<br />
}<br />
catch (Exception e)<br />
{<br />
e.printStackTrace();<br />
}<br />
}<br />
}<br />
</source><br />
<br />
==Example Download==<br />
[[File:JavaBlackBox.zip|JavaBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
=White Box Integration=<br />
This is the preferred way for developers who want their processes to fully exploit the e-Infrastructure resources, for example to implement Cloud computing using the e-Infrastructure computational resources. This integration modality also allows full reuse of the Java data mining frameworks integrated by DataMiner, i.e. [https://www.knime.com/ Knime], [https://rapidminer.com/ RapidMiner], [https://www.cs.waikato.ac.nz/ml/weka/ Weka], [https://wiki.gcube-system.org/gcube/Statistical_Manager_Algorithms gCube EcologicalEngine]. '''The Eclipse IDE should be used for this integration'''.<br />
<br />
[https://gcube.wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner Step-by-step guide to integrate Java processes as white boxes]<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_R-blackbox_Project&diff=30459Statistical Algorithms Importer: R-blackbox Project2017-12-12T14:38:39Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create an R-blackbox project using the Statistical Algorithms Importer (SAI) portlet.<br />
[[Image:StatisticalAlgorithmsImporter_RBlackBox0.png|thumb|center|250px|R-blackbox Project, SAI]]<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_RBlackBox1.png|thumb|center|800px|R-blackbox Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .r file)<br />
[[Image:StatisticalAlgorithmsImporter_RBlackBox2.png|thumb|center|800px|R-blackbox I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. R version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_RBlackBox3.png|thumb|center|800px|R-blackbox Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Taget folder are created<br />
[[Image:StatisticalAlgorithmsImporter_RBlackBox4.png|thumb|center|800px|R-blackbox Create, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
At each run of the process, the '''globalvariables.csv''' file is created locally to the process (i.e. it can be read as ./globalvariables.csv). It contains the following global variables, which are meant to allow the process to properly contact the e-Infrastructure services:<br />
<br />
* '''gcube_username''' (the user who ran the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
The format of the CSV file is like the one of the following example:<br />
<br />
<source lang='vim'><br />
globalvariable,globalvalue<br />
gcube_username,gianpaolo.coro<br />
gcube_context,/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab<br />
gcube_token,1234-567-890<br />
</source><br />
<br />
== Example Code ==<br />
:The R code in the example:<br />
<br />
<source lang='r'><br />
#<br />
# author Giancarlo Panichi<br />
#<br />
test<-"checkinput"<br />
write.csv(test,file="program.txt")<br />
</source><br />
<br />
==Example Download==<br />
[[File:RBlackBox.zip|RBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_R_Project&diff=30458Statistical Algorithms Importer: R Project2017-12-12T14:31:05Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to create an R project using the Statistical Algorithms Importer (SAI) portlet.<br />
<br />
[[Image:StatisticalAlgorithmsImporter_RBase1.png|thumb|center|250px|R Project, SAI]]<br />
<br />
==Project Folder==<br />
:Once an empty folder on the e-Infrastructure Workspace has been selected, the system creates an empty project in that folder.<br />
<br />
[[Image:StatisticalAlgorithmsImporter_CreateProject.png|thumb|center|800px|Create Project, SAI]]<br />
<br />
==Import Resources==<br />
:Any resource needed to run the script can be imported into the Project Folder. Resources can be added via the Workspace, using the Add Resource button in the main menu, or by dragging and dropping files into the folder window.<br />
[[Image:StatisticalAlgorithmsImporter_AddResource.png|thumb|center|800px|Add Resource, SAI]]<br />
<br />
:Thus, if the resources are on the user's local file system, the Drag and Drop facility can be used, which also works with multiple files.<br />
<br />
[[Image:StatisticalAlgorithmsImporter_ProjectExplorerDND.png|thumb|center|800px|Adding resources with Drag and Drop, SAI]]<br />
<br />
==Import Resources From GitHub==<br />
:If you have a project on GitHub, you can import it into SAI. After creating a new project, just click the GitHub button in the menu.<br />
<br />
[[Image:StatisticalAlgorithmsImporter_GitHubMenu.png|thumb|center|800px|GitHub on Menu, SAI]]<br />
<br />
:You then access the GitHub Connector wizard. Please read here to see how to use it: [[GitHub Connector|GitHub Connector]]<br />
<br />
==Set Main Code==<br />
:After adding the scripts and resources, one of the script files should be indicated as the Main code. The e-Infrastructure will run this code, which is supposed to import and orchestrate the other scripts. A script can be indicated as Main code by clicking the Set Main button in the Project Explorer. The file will be loaded in the Editor. In this phase the system also reads possible annotations inside the script (e.g. WPS4R annotations). At this point, the user can change the code and save it using the Save button on the Editor panel. Alternatively, the user can write the code directly in the editor (also via Copy and Paste) and then save it, still using the Save button in the Editor menu (a file name will be requested).<br />
<br />
[[Image:StatisticalAlgorithmsImporter_MainCodeFull.png|thumb|center|800px|Set Main Code facility, SAI]]<br />
<br />
==Input==<br />
:In this area the system collects all the information required to create software for the e-Infrastructure and to communicate with the e-Infrastructure team. Metadata, input/output information, global parameters and required packages are collected here.<br />
<br />
===Global Variables===<br />
:In this panel you can add any Global Variables that are used by the script as prerequisites.<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_GlobalV.png|thumb|center|800px|Global Variables indication, SAI]]<br />
<br />
===Input/Output===<br />
:In this area, the inputs and outputs selected from the script are collected. In order to add a new I/O, the user should select a row in the code (from the Editor) and then click the +Input (or +Output) button in the Editor menu. <br />
A new row is added to the Input/Output list. The system parses the code behind the scenes and guesses the best type, description and name of the parameter. Once a row has been created in the Input/Output window, the user can change this information by clicking on the row. At least one input is required for compiling the project. '''The name of the input variable and the default value should not be changed unless a parsing error occurred'''. The reason is that the infrastructure discovers the variables inside the script by using the name and the default value.<br />
<br />
'''Note: as a general rule, always set a default value for a variable, otherwise the execution of the algorithm may be compromised. Thus, do not use empty strings as default values.'''<br />
<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_InputOutput.png|thumb|center|800px|Input/Output window, SAI]]<br />
<br />
<br />
===Advanced Input===<br />
It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these data are reported in the [[Advanced Input| Advanced Input ]] page.<br />
<br />
===Interpreter Info===<br />
:You can add Version and Packages information in the Interpreter Info panel. The version number is mandatory for the project. Here, for example, a user should specify the version of the R interpreter and the packages needed to run the script. These will be installed on the e-Infrastructure machines during the first deployment session.<br />
<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_InterpreterInfo.png|thumb|center|800px|Interpreter Info, SAI]]<br />
<br />
===Project Info===<br />
:A name and a description of the project are mandatory. These will be displayed to the user of the e-Infrastructure and should also contain proper citation of the algorithm. Special characters are not allowed for the algorithm name. The user can include the category of the algorithm.<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_ProjectInfo.png|thumb|center|800px|Project Info, SAI]]<br />
<br />
==Save Project==<br />
:You can save the project by clicking the Save button in the main menu. A file called stat_algo.project is added to the Project Folder.<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_SaveProject.png|thumb|center|800px|Save Project, SAI]]<br />
<br />
==Inheritance of Global and Infrastructure Variables==<br />
<br />
The following global variables are inherited by all the R scripts running in the e-Infrastructure. They are meant to allow the scripts to contact the infrastructure services:<br />
<br />
* '''gcube_username''' (the user who run the computation, e.g. gianpaolo.coro)<br />
<br />
* '''gcube_context''' (the VRE the process was run in, e.g. d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab)<br />
<br />
* '''gcube_token''' (the token of the user for the VRE, e.g. 1234-567-890)<br />
<br />
<br />
==Using WPS4R Annotations==<br />
:SAI automatically parses R code containing [https://wiki.52north.org/bin/view/Geostatistics/WPS4R WPS4R annotations] and transforms them into Input/Output panel and Project Info panel information. The name of the algorithm is mandatory in the annotations. We report a full example of an annotated algorithm and attach the complete algorithm in a zip package:<br />
<br />
<source lang='r' style="display:block;font-family:monospace;white-space:pre;margin:1em 0;"><br />
############################################################################################################################<br />
############# Absence Generation Script - Gianpaolo Coro and Chiara Magliozzi, CNR 2015, Last version 06-07-2015 ###########<br />
############################################################################################################################<br />
#Modified 25-05-2017<br />
<br />
#52North WPS annotations<br />
# wps.des: id = Absence_generation_from_OBIS, title = Absence_generation_from_OBIS, abstract = A script to estimate absence records from OBIS;<br />
<br />
####REST API VERSION#####<br />
rm(list=ls(all=TRUE))<br />
graphics.off() <br />
<br />
## charging the libraries<br />
library(DBI)<br />
library(RPostgreSQL)<br />
library(raster)<br />
library(maptools)<br />
library("sqldf")<br />
library(RJSONIO)<br />
library(httr)<br />
library(data.table)<br />
<br />
# time<br />
t0<-Sys.time()<br />
<br />
## parameters <br />
# wps.in: id = list, type = text/plain, title = list of species beginning with the speciesname header,value="species.txt";<br />
list= "species.txt"<br />
specieslist<-read.table(list,header=T,sep=",") # my short dataset 2 species<br />
#attach(specieslist)<br />
# wps.in: id = res, type = double, title = resolution of the analysis,value=1;<br />
res=1;<br />
extent_x=180<br />
extent_y=90<br />
n=extent_y*2/res;<br />
m=extent_x*2/res;<br />
# wps.in: id = occ_percentage, type = double, title = percentage of observations occurrence of a viable survey,value=0.1;<br />
occ_percentage=0.05 #between 0 and 1<br />
<br />
#uncomment for time filtering<br />
<br />
#No time filter<br />
TimeStart<-"";<br />
TimeEnd<-"";<br />
<br />
TimeStart<-gsub("(^ +)|( +$)", "",TimeStart)<br />
TimeEnd<-gsub("(^ +)|( +$)", "", TimeEnd)<br />
<br />
#AUX function<br />
pos_id<-function(latitude,longitude){<br />
#latitude<-round(latitude, digits = 3)<br />
#longitude<-round(longitude, digits = 3)<br />
latitude<-latitude<br />
longitude<-longitude<br />
code<-paste(latitude,";",longitude,sep="")<br />
return(code)<br />
}<br />
<br />
## opening the connection with postgres<br />
cat("REST API VERSION\n")<br />
cat("PROCESS VERSION 6 \n")<br />
cat("Opening the connection with the catalog\n")<br />
#drv <- dbDriver("PostgreSQL")<br />
#con <- dbConnect(drv, dbname="obis", host="obisdb-stage.vliz.be", port="5432", user="obisreader", password="0815r3@d3r")<br />
<br />
cat("Analyzing the list of species\n")<br />
counter=0;<br />
overall=length(specieslist$scientificname)<br />
<br />
cat("Extraction from the different contributors the total number of obs per resource id...\n")<br />
<br />
timefilter<-""<br />
if (nchar(TimeStart)>0 && nchar(TimeEnd)>0)<br />
timefilter<-paste(" where datecollected>'",TimeStart,"' and datecollected<'",TimeEnd,"'",sep="");<br />
<br />
queryCache <- paste("select drs.resource_id, count(distinct position_id) as allcount from obis.drs", timefilter, " group by drs.resource_id",sep="")<br />
cat("Resources extraction query:",queryCache,"\n")<br />
<br />
allresfile="allresources.dat"<br />
if (file.exists(allresfile)){<br />
load(allresfile)<br />
} else{<br />
#allresources1<-dbGetQuery(con,queryCache)<br />
######QUERY 0 - REST CALL<br />
cat("Q0:querying for resources\n")<br />
<br />
getJsonQ0<-function(limit,offset){<br />
cat("Q0: offset",offset,"limit",limit,"\n")<br />
resources_query<-paste("http://api.iobis.org/resource?limit=",limit,"&offset=",offset,sep="")<br />
<br />
json_file <- fromJSON(resources_query)<br />
<br />
#res_count<-json_file$count<br />
res_count<-length(json_file$results)<br />
res_count_json<<-json_file$count<br />
cat("Q0:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
allresources1 <- data.frame(resource_id=integer(),allcount=integer())<br />
<br />
for (i in 1:res_count){<br />
#cat(i,"\n")<br />
if (is.null(json_file$results[[i]]$record_cnt))<br />
json_file$results[[i]]$record_cnt=0<br />
row<-data.frame(resource_id = json_file$results[[i]]$id, allcount = json_file$results[[i]]$record_cnt)<br />
allresources1 <- rbind(allresources1, row)<br />
}<br />
rm(json_file)<br />
return(allresources1)<br />
}<br />
objects = 1000<br />
allresources1<-getJsonQ0(objects,0)<br />
ceil<-ceiling(res_count_json/objects)<br />
if (ceil>1){<br />
for (i in 2:ceil){<br />
cat(">call n.",i,"\n")<br />
allresources1.1<-getJsonQ0(objects,objects*(i-1))<br />
allresources1<-rbind(allresources1,allresources1.1)<br />
}<br />
}<br />
######END REST CALL<br />
save(allresources1,file=allresfile)<br />
}<br />
<br />
<br />
cat("All resources saved\n")<br />
<br />
files<-vector()<br />
f<-0<br />
if (!file.exists("./data"))<br />
dir.create("./data")<br />
<br />
cat("About to analyse species\n")<br />
<br />
for (sp in specieslist$scientificname){<br />
f<-f+1<br />
t1<-Sys.time()<br />
graphics.off()<br />
grid=matrix(data=0,nrow=n,ncol=m)<br />
gridInfo=matrix(data="",nrow=n,ncol=m)<br />
outputfileAbs=paste("data/Absences_",sp,"_",res,"deg.csv",sep="");<br />
outputimage=paste("data/Absences_",sp,"_",res,"deg.png",sep="");<br />
<br />
counter=counter+1;<br />
cat("analyzing species",sp,"\n")<br />
cat("***Species status",counter,"of",overall,"\n")<br />
<br />
## first query: select the species<br />
cat("Extraction the species id from the OBIS database...\n")<br />
query1<-paste("select id from obis.tnames where tname='",sp,"'", sep="")<br />
#obis_id<- dbGetQuery(con,query1)<br />
<br />
######QUERY 1 - REST CALL<br />
cat("Q1:querying for the species",sp," \n")<br />
query1<-paste("http://api.iobis.org/taxa?scientificname=",URLencode(sp),sep="")<br />
cat("Q1:query: ",query1," \n")<br />
result_from_httr1<-GET(query1, timeout(1*3600))<br />
json_obis_taxa_id <- fromJSON(content(result_from_httr1, as="text"))<br />
<br />
#json_obis_taxa_id <- fromJSON(query1)<br />
cat("Q1:query done\n")<br />
res_count_json<-json_obis_taxa_id$count<br />
res_count<-length(json_obis_taxa_id$results)<br />
cat("Q1:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
obis_id<-json_obis_taxa_id$results[[1]]$id<br />
obis_id<-data.frame(id=obis_id)<br />
######END REST CALL<br />
<br />
cat("The ID extracted is ", obis_id$id, "for the species", sp, "\n", sep=" ")<br />
if (nrow(obis_id)==0) {<br />
cat("WARNING: there is no reference code for", sp,"\n")<br />
next;<br />
}<br />
<br />
## second query: select the contributors<br />
cat("Selection of the contributors in the database having recorded the species...\n")<br />
query2<- paste("select distinct resource_id from obis.drs where valid_id='",obis_id$id,"'", sep="")<br />
#posresource<-dbGetQuery(con,query2)<br />
<br />
######QUERY 2 - REST CALL<br />
cat("Q2:querying for obisid ",obis_id$id," \n")<br />
<br />
<br />
downlq<-paste("http://api.iobis.org/occurrence/download?obisid=",obis_id$id,"&sync=true",sep="")<br />
cat("Q2:query",downlq," \n")<br />
<br />
filezip<-paste("sp_",obis_id$id,".zip",sep="")<br />
dirzip<-paste("./sp_",obis_id$id,sep="")<br />
download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)<br />
cat("Q2:dirzip",dirzip," \n")<br />
<br />
if (!file.exists(dirzip))<br />
dir.create(dirzip)<br />
cat("Q2:unzipping",dirzip," \n")<br />
unzip(filezip,exdir=dirzip)<br />
<br />
csvfile<-dir(dirzip)<br />
csvfile<-paste(dirzip,"/",csvfile[1],sep="") <br />
cat("Q2:reading csv file",csvfile," \n")<br />
occurrences<-read.csv(csvfile)<br />
posresource<-sqldf("select resource_id from occurrences",drv="SQLite")<br />
tgtresources1<-sqldf("select resource_id, latitude || ';' || longitude as tgtcount from occurrences",drv="SQLite")<br />
posresource<-sqldf("select distinct * from posresource",drv="SQLite")<br />
rm(occurrences)<br />
######END REST CALL<br />
<br />
if (nrow(posresource)==0) {<br />
cat("WARNING: there are no resources for", sp,"\n")<br />
next;<br />
}<br />
<br />
<br />
## third query: select the distinct observations from the contributors<br />
merge(allresources1, posresource, by="resource_id")-> res_ids<br />
<br />
## fourth query: how many obs are contained in each contributor for the species<br />
cat("Extracting the number of obs for the species from the different contributors...\n")<br />
query4 <- paste("select drs.resource_id, count(distinct position_id) as tgtcount from obis.drs where valid_id='",obis_id$id,"' group by drs.resource_id ",sep="")<br />
#tgtresources1<-dbGetQuery(con,query4)<br />
<br />
######QUERY 4 - REST CALL<br />
cat("Q4:extracting obs from contributors ",obis_id$id," \n")<br />
getJsonQ4<-function(limit, offset){<br />
cat("Q4: offset",offset,"limit",limit,"\n")<br />
query4<-paste("http://api.iobis.org/occurrence?obisid=",obis_id$id,"&limit=",limit,"&offset=",offset,sep="")<br />
result_from_httr<-GET(query4, timeout(1*3600))<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
res_count_json<<-jsonDoc$count<br />
res_count<-length(jsonDoc$results)<br />
cat("Q4:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
tgtresources1 <- data.frame(resource_id=integer(),tgtcount=character())<br />
res_count<-length(jsonDoc$results)<br />
for (i in 1:res_count){<br />
positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)<br />
row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID , tgtcount=positionID)<br />
tgtresources1 <- rbind(tgtresources1, row)<br />
}<br />
#tgtresources1<-sqldf("select resource_id, count(distinct tgtcount) as tgtcount from tgtresources1 group by resource_id",drv="SQLite")<br />
<br />
return(tgtresources1)<br />
}<br />
<br />
#objects = 1500<br />
#tgtresources1<-getJsonQ4(objects,0)<br />
#ceil<-ceiling(res_count_json/objects)<br />
#if (ceil>1){<br />
#for (i in 2:ceil){<br />
# cat(">call n.",i,"\n")<br />
#tgtresources1.1<-getJsonQ4(objects,objects*(i-1))<br />
#tgtresources1<-rbind(tgtresources1,tgtresources1.1)<br />
#}<br />
#}<br />
<br />
tgtresources1<-sqldf("select resource_id, count(distinct tgtcount) as tgtcount from tgtresources1 group by resource_id",drv="SQLite")<br />
<br />
######END REST CALL<br />
<br />
<br />
merge(tgtresources1, posresource, by="resource_id")-> tgtresourcesSpecies <br />
<br />
## fifth query: select contributors that have at least the selected fraction of observations of the species<br />
#### we now have the full table: contributors, obs in each contributor for at least one species, and obs of the species in each contributor<br />
cat("Extracting the contributors containing at least the selected fraction of observations for the species\n")<br />
cat("Selected occurrence percentage: ",occ_percentage,"\n")<br />
<br />
tmp <- merge(res_ids, tgtresourcesSpecies, by= "resource_id",all.x=T)<br />
tmp["species_10"] <- NA <br />
as.numeric(tmp$tgtcount) / tmp$allcount -> tmp$species_10<br />
<br />
<br />
<br />
viable_res_ids <- subset(tmp,species_10 >= occ_percentage, select=c("resource_id","allcount","tgtcount", "species_10")) <br />
#cat(viable_res_ids)<br />
<br />
if (nrow(viable_res_ids)==0) {<br />
cat("WARNING: there are no viable points for", sp,"\n")<br />
next;<br />
}<br />
<br />
numericselres<-paste("'",paste(as.character(as.numeric(t(viable_res_ids["resource_id"]))),collapse="','"),"'",sep="")<br />
selresnumbers<-as.numeric(t(viable_res_ids["resource_id"]))<br />
<br />
## sixth query: select all the cells at 0.1 degrees resolution in the main contributors<br />
cat("Select the cells at 0.1 degrees resolution for the main contributors\n")<br />
query6 <- paste("select position_id, positions.latitude, positions.longitude, count(*) as allcount ", <br />
"from obis.drs ", <br />
"inner join obis.tnames on drs.valid_id=tnames.id ",<br />
"inner join obis.positions on position_id=positions.id ",<br />
"where resource_id in (", numericselres,") ",<br />
"group by position_id, positions.latitude, positions.longitude, resource_id")<br />
#all_cells <- dbGetQuery(con,query6)<br />
<br />
<br />
######QUERY 6 - REST CALL<br />
cat("Q6:extracting 0.1 cells from contributors \n")<br />
<br />
downlq<-paste("http://api.iobis.org/occurrence/download?resourceid=",gsub("'", "", numericselres),"&sync=true",sep="")<br />
cat("Q6:query",downlq," \n")<br />
filezip<-paste("rsp_",obis_id$id,".zip",sep="")<br />
dirzip<-paste("./rsp_",obis_id$id,sep="")<br />
download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)<br />
cat("Q6:dirzip",dirzip," \n")<br />
<br />
if (!file.exists(dirzip))<br />
dir.create(dirzip)<br />
cat("Q6:unzipping",dirzip," \n")<br />
unzip(filezip,exdir=dirzip)<br />
<br />
csvfile<-dir(dirzip)<br />
csvfile<-paste(dirzip,"/",csvfile[1],sep="") <br />
cat("Q6:reading csv file",csvfile," \n")<br />
occurrences<-read.csv(csvfile)<br />
<br />
all_cells_table<-sqldf("select resource_id, latitude || ';' || longitude as position, latitude ,longitude from occurrences",drv="SQLite")<br />
rm(occurrences)<br />
getJsonQ6<-function(limit,offset,selres){<br />
cat("Q6: offset",offset,"limit",limit,"\n")<br />
cat("Q6: resource",selres,"\n")<br />
#query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&limit=",limit,"&offset=",offset,sep="")<br />
if (offset>0)<br />
query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", selres),"&limit=",limit,"&skipid=",offset,sep="")<br />
else<br />
query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", selres),"&limit=",limit,sep="")<br />
<br />
cat("Q6:",query6," \n")<br />
<br />
<br />
jsonDoc = tryCatch({<br />
result_from_httr<-GET(query6, timeout(1*3600))<br />
cat("Q6: got answer\n")<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
}, warning = function(w) {<br />
cat("Warning: ",w,"\n")<br />
}, error = function(e) {<br />
cat("Error: Too small value for resolution for this species - the solution spaceis too large!\n")<br />
}, finally = {<br />
jsonDoc=NA<br />
})<br />
<br />
<br />
<br />
res_count_json<<-jsonDoc$count<br />
res_count<-length(jsonDoc$results)<br />
cat("Q6:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
all_cells2 <- data.frame(resource_id=integer(),position_id=character(),latitude=integer(),longitude=integer())<br />
for (i in 1:res_count){<br />
positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)<br />
row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID, position_id = positionID, latitude=jsonDoc$results[[i]]$decimalLatitude, longitude=jsonDoc$results[[i]]$decimalLongitude)<br />
all_cells2 <- rbind(all_cells2, row)<br />
}<br />
lastid<<-jsonDoc$results[[res_count]]$id<br />
return(all_cells2)<br />
}<br />
<br />
cat("All resources:",numericselres,"\n")<br />
<br />
all_cells<-sqldf("select position as position_id, latitude, longitude, count(*) as allcount from all_cells_table group by position, latitude, longitude, resource_id",drv="SQLite")<br />
<br />
######END REST CALL<br />
<br />
<br />
<br />
## seventh query: select all the cells at 0.1 degrees resolution in the main contributors for the selected species<br />
cat("Select the cells at 0.1 degrees resolution for the species in the main contributors\n")<br />
query7 <- paste("select position_id, positions.latitude, positions.longitude, count(*) as tgtcount ",<br />
"from obis.drs",<br />
"inner join obis.tnames on drs.valid_id=tnames.id ", <br />
"inner join obis.positions on position_id=positions.id ", <br />
"where resource_id in (", numericselres,") ",<br />
"and drs.valid_id='",obis_id$id,"'", <br />
"group by position_id, positions.latitude, positions.longitude")<br />
#presence_cells<-dbGetQuery(con,query7)<br />
<br />
######QUERY 7 - REST CALL<br />
cat("Q7:extracting 0.1 cells for the species ",obis_id$id,"\n")<br />
<br />
downlq<-paste("http://api.iobis.org/occurrence/download?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&sync=true",sep="")<br />
cat("Q7:query",downlq," \n")<br />
filezip<-paste("rspsp_",obis_id$id,".zip",sep="")<br />
dirzip<-paste("./rspsp_",obis_id$id,sep="")<br />
download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)<br />
cat("Q7:dirzip",dirzip," \n")<br />
<br />
if (!file.exists(dirzip))<br />
dir.create(dirzip)<br />
cat("Q7:unzipping",dirzip," \n")<br />
unzip(filezip,exdir=dirzip)<br />
<br />
csvfile<-dir(dirzip)<br />
csvfile<-paste(dirzip,"/",csvfile[1],sep="") <br />
cat("Q7:reading csv file",csvfile," \n")<br />
occurrences<-read.csv(csvfile)<br />
<br />
presence_cells2<-sqldf("select resource_id, latitude ,longitude, latitude || ';' || longitude as position from occurrences",drv="SQLite")<br />
rm(occurrences)<br />
getJsonQ7<-function(limit,offset){<br />
cat("Q7: offset",offset,"limit",limit,"\n")<br />
if (offset>0)<br />
query7<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&limit=",limit,"&skipid=",offset,sep="")<br />
else query7<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&limit=",limit,sep="")<br />
<br />
result_from_httr<-GET(query7, timeout(1*3600))<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
res_count_json<<-jsonDoc$count<br />
res_count<-length(jsonDoc$results)<br />
cat("Q7:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")<br />
<br />
presence_cells2 <- data.frame(resource_id=integer(),position_id=character(),latitude=integer(),longitude=integer())<br />
for (i in 1:res_count){<br />
<br />
positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)<br />
row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID, position_id = positionID, latitude=jsonDoc$results[[i]]$decimalLatitude, longitude=jsonDoc$results[[i]]$decimalLongitude)<br />
presence_cells2 <- rbind(presence_cells2, row)<br />
}<br />
<br />
lastid<<-jsonDoc$results[[res_count]]$id<br />
<br />
return(presence_cells2)<br />
}<br />
<br />
<br />
presence_cells<-sqldf("select position as position_id, latitude, longitude, count(*) as tgtcount from presence_cells2 group by position_id, latitude, longitude, resource_id",drv="SQLite")<br />
<br />
######END REST CALL<br />
<br />
## last query: for every cell from the sixth query, put 1 if there is a correspondent in the seventh query, otherwise 0<br />
#data.df<-merge(all_cells, presence_cells, by= "position_id",all.x=T)<br />
#data.df$longitude.y<-NULL <br />
#data.df$latitude.y<-NULL<br />
#data.df[is.na(data.df)] <- 0 <br />
<br />
######### Table resulting from the analysis<br />
#pres_abs_cells <- subset(data.df,select=c("latitude.x","longitude.x", "tgtcount","position_id"))<br />
#positions<-paste("'",paste(as.character(as.numeric(t(pres_abs_cells["position_id"]))),collapse="','"),"'",sep="")<br />
positions<-""<br />
query8<-paste("select position_id, resfullname,digirname,abstract,temporalscope,date_last_harvested",<br />
"from ((select distinct position_id,resource_id from obis.drs where position_id IN (", positions,<br />
") order by position_id ) as a",<br />
"inner join (select id,resfullname,digirname,abstract,temporalscope,date_last_harvested from obis.resources where id in (",<br />
numericselres,")) as b on b.id = a.resource_id) as d")<br />
<br />
#resnames<-dbGetQuery(con,query8)<br />
<br />
######QUERY 8 - REST CALL<br />
cat("Q8:extracting contributors details\n")<br />
data.df2<-merge(all_cells, presence_cells, by= "position_id",all.x=T)<br />
data.df2$longitude.y<-NULL <br />
data.df2$latitude.y<-NULL<br />
data.df2[is.na(data.df2)] <- 0 <br />
rm (all_cells)<br />
pres_abs_cells2 <- subset(data.df2,select=c("latitude.x","longitude.x", "tgtcount","position_id"))<br />
positions2<-paste("'",paste(as.character(as.character(t(pres_abs_cells2["position_id"]))),collapse="','"),"'",sep="")<br />
<br />
refofpositions<-sqldf(paste("select distinct resource_id from all_cells_table where position in (",positions2,")"),drv="SQLite")<br />
referencesn<-nrow(refofpositions)<br />
resnames_res2 <- data.frame(resource_id=integer(),resfullname=character(),digirname=character(),abstract=character(),temporalscope=character(),date_last_harvested=character())<br />
for (i in 1: referencesn){<br />
query8<-paste("http://api.iobis.org/resource/",refofpositions[i,1],sep="")<br />
result_from_httr<-GET(query8, timeout(1*3600))<br />
jsonDoc <- fromJSON(content(result_from_httr, as="text"))<br />
<br />
daterecord<-as.POSIXct(jsonDoc$date_last_harvested/1000, origin="1970-01-01")<br />
if (length(daterecord)==0)<br />
daterecord=""<br />
abstractst<-jsonDoc$abstract_str<br />
<br />
if (length(jsonDoc$abstract_str)==0)<br />
jsonDoc$abstract_str=""<br />
<br />
if (length(jsonDoc$id)==0)<br />
jsonDoc$id=""<br />
<br />
if (length(jsonDoc$fullname)==0)<br />
jsonDoc$fullname=""<br />
<br />
if (length(jsonDoc$temporalscope)==0)<br />
jsonDoc$temporalscope=""<br />
<br />
<br />
row<-data.frame(resource_id = jsonDoc$id, resfullname=jsonDoc$fullname, digirname=jsonDoc$digirname, abstract=jsonDoc$abstract_str,temporalscope=jsonDoc$temporalscope,date_last_harvested=daterecord)<br />
<br />
resnames_res2 <- rbind(resnames_res2, row) <br />
}<br />
<br />
resnames2<-sqldf(paste("select distinct position as position_id, resfullname, digirname, abstract, temporalscope, date_last_harvested from (select * from all_cells_table where position in (",positions2,")) as a inner join resnames_res2 as b on a.resource_id=b.resource_id"),drv="SQLite")<br />
resnames<-sqldf("select * from resnames2 order by position_id",drv="SQLite")<br />
pres_abs_cells<-sqldf("select * from pres_abs_cells2 order by position_id",drv="SQLite")<br />
rm(all_cells_table)<br />
######END REST CALL<br />
<br />
#sorting data df<br />
# pres_abs_cells<-pres_abs_cells[with(pres_abs_cells, order(position_id)), ]<br />
nrows = nrow(pres_abs_cells)<br />
######## FIRST Loop inside the rows of the dataset<br />
cat("Looping on the data\n")<br />
for(i in 1: nrows) {<br />
lat<-pres_abs_cells[i,1]<br />
long<-pres_abs_cells[i,2]<br />
value<-pres_abs_cells[i,3]<br />
resource_name<-paste("\"",paste(as.character(t(resnames[i,])),collapse="\",\""),"\"",sep="")#resnames[i,2]<br />
k=round((lat+90)*n/180)<br />
g=round((long+180)*m/360)<br />
if (k==0) k=1;<br />
if (g==0) g=1;<br />
if (k>n || g>m)<br />
next;<br />
if (value>=1){<br />
if (grid[k,g]==0){<br />
grid[k,g]=1<br />
gridInfo[k,g]=resource_name<br />
}<br />
else if (grid[k,g]==-1){<br />
grid[k,g]=-2<br />
gridInfo[k,g]=resource_name<br />
}<br />
}<br />
else if (value==0){<br />
if (grid[k,g]==0){<br />
grid[k,g]=-1<br />
#cat("resource abs",resource_name,"\n")<br />
gridInfo[k,g]=resource_name<br />
}<br />
else if (grid[k,g]==1){<br />
grid[k,g]=-2<br />
gridInfo[k,g]=resource_name<br />
}<br />
<br />
}<br />
}<br />
cat("End looping\n")<br />
<br />
cat("Generating image\n")<br />
absence_cells<-which(grid==-1,arr.ind=TRUE)<br />
presence_cells_idx<-which(grid==1,arr.ind=TRUE)<br />
latAbs<-((absence_cells[,1]*180)/n)-90<br />
longAbs<-((absence_cells[,2]*360)/m)-180<br />
latPres<-((presence_cells_idx[,1]*180)/n)-90<br />
longPres<-((presence_cells_idx[,2]*360)/m)-180<br />
resource_abs<-gridInfo[absence_cells]<br />
rm(gridInfo)<br />
rm(grid)<br />
absPoints <- cbind(longAbs, latAbs)<br />
absPointsData <- cbind(longAbs, latAbs,resource_abs)<br />
<br />
if (length(absPoints)==0)<br />
{<br />
cat("WARNING no viable point found for ",sp," after processing!\n")<br />
next;<br />
}<br />
data(wrld_simpl)<br />
projection(wrld_simpl) <- CRS("+proj=longlat")<br />
png(filename=outputimage, width=1200, height=600)<br />
plot(wrld_simpl, xlim=c(-180, 180), ylim=c(-90, 90), axes=TRUE, col="black")<br />
box()<br />
pts <- SpatialPoints(absPoints,proj4string=CRS(proj4string(wrld_simpl)))<br />
<br />
## Find which points do not fall over land<br />
cat("Retreiving the poing that do not fall on land\n")<br />
pts<-pts[which(is.na(over(pts, wrld_simpl)$FIPS))]<br />
points(pts, col="green", pch=1, cex=0.50)<br />
datapts<-as.data.frame(pts)<br />
colnames(datapts) <- c("longAbs","latAbs")<br />
<br />
abspointstable<-merge(datapts, absPointsData, by.x= c("longAbs","latAbs"), by.y=c("longAbs","latAbs"),all.x=F)<br />
<br />
<br />
header<-"longitude,latitude,resource_id,resource_name,resource_identifier,resource_abstract,resource_temporalscope,resource_last_harvested_date"<br />
write.table(header,file=outputfileAbs,append=F,row.names=F,quote=F,col.names=F)<br />
<br />
write.table(abspointstable,file=outputfileAbs,append=T,row.names=F,quote=F,col.names=F,sep=",")<br />
files[f]<-outputfileAbs<br />
cat("Elapsed: created imaged in ",Sys.time()-t1," sec \n")<br />
graphics.off()<br />
}<br />
<br />
# wps.out: id = zipOutput, type = text/zip, title = zip file containing absence records and images;<br />
zipOutput<-"absences.zip"<br />
zip(zipOutput, files=c("./data"), flags= "-r9X", extras = "",zip = Sys.getenv("R_ZIPCMD", "zip"))<br />
<br />
cat("Closing database connection")<br />
cat("Elapsed: overall process finished in ",Sys.time()-t0," min \n")<br />
#dbDisconnect(con)<br />
graphics.off()<br />
<br />
</source><br />
[[File:AbsencesSpeciesList_prod_annotated.zip|AbsencesSpeciesList_prod_annotated.zip]]<br />
<br />
<br />
:The following screenshots report the result of importing this script into SAI:<br />
<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_Annotations_Info.png|thumb|center|800px|Annotations Project Info, SAI]]<br />
[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_Annotations_InputOutput.png|thumb|center|800px|Annotations Input/Output, SAI]]<br />
<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Advanced_Input&diff=30457Advanced Input2017-12-11T15:09:10Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
:This page explains how to use advanced input definitions when developing an algorithm. These indications are valid for algorithms developed using both the [[Statistical Algorithms Importer|Statistical Algorithms Importer (SAI)]] and [[DataMiner Manager|DataMiner]].<br />
<br />
==Spatial Data==<br />
===Defining Spatial Data=== <br />
:Spatial Data inputs are passed to an algorithm using the Well-Known Text (WKT) format, i.e. a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems.<ref>[https://en.wikipedia.org/wiki/Well-known_text "Well-known text"], ''[[Wikipedia]]''</ref><br />
:When an algorithm input is defined, a WKT input can be indicated by using proper annotations in the input descriptions. In particular, the annotations highlighted in red in the following table correspond to different geometries. For example, a description for an input point could be "Location of an occurrence record [WKT_POINT]". A parsing sketch is reported after the table.<br />
<br />
'''The currently produced CRS is EPSG:4326 (Examples are still in EPSG:3857).'''<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name<br />
! Description<br />
! Type<br />
! Default<br />
! I/O<br />
|-<br />
| wktPoint<br />
| wktPoint <span style="color:red">[WKT_POINT]</span><br />
| String<br />
| POINT(-12993071.816027395 7729312.300197024)<br />
| Input<br />
|-<br />
| wktLineString<br />
| wktLineString <span style="color:red">[WKT_LINESTRING]</span><br />
| String<br />
| LINESTRING(-16476154.320926309 7161843.802207875,-14617205.79303082 7514065.628545967,-12777825.14437634 6790054.096628778,-12601714.231207293 8081534.126535116,-11271098.442818945 7416226.232340941,-11329802.08054196 6261721.357121639)<br />
| Input<br />
|-<br />
| wktPolygon<br />
| wktPolygon <span style="color:red">[WKT_POLYGON]</span><br />
| String<br />
| POLYGON((-15967389.460660174 7377090.473858931,-14264983.966692729 8257645.039704162,-12895232.41982237 7416226.232340941,-13423565.159329507 5811660.134578522,-13932330.019595642 7181411.681448881,-16123932.494588215 6750918.338146768,-15967389.460660174 7377090.473858931))<br />
| Input<br />
|-<br />
| wktTriangle<br />
| wktTriangle <span style="color:red">[WKT_TRIANGLE]</span><br />
| String<br />
| POLYGON((-16495722.200167313 7259683.198412901,-15855087.594353283 8572650.25102042,-17312467.71915039 8470972.56789504,-16495722.200167313 7259683.198412901))<br />
| Input<br />
|-<br />
| wktSquare<br />
| wktSquare <span style="color:red">[WKT_SQUARE]</span><br />
| String<br />
| POLYGON((-16495722.200167313 7259683.198412901,-15855087.594353283 8572650.25102042,-17312467.71915039 8470972.56789504,-16495722.200167313 7259683.198412901))<br />
| Input<br />
|-<br />
| wktPentagon<br />
| wktPentagon <span style="color:red">[WKT_PENTAGON]</span><br />
| String<br />
| POLYGON((-14284551.845933735 6437832.270290686,-13783375.299284551 7626538.339655062,-14759029.882416394 8470515.936910307,-15863194.122720666 7803416.7083931435,-15569950.56925907 6547149.114045457,-14284551.845933735 6437832.270290686))<br />
| Input<br />
|-<br />
| wktHexagon<br />
| wktHexagon <span style="color:red">[WKT_HEXAGON]</span><br />
| String<br />
| POLYGON((-11995109.97473614 8492459.590596221,-11950593.230512831 9941648.908612307,-13183369.622696154 10704796.199011508,-14460662.759102784 10018754.171394624,-14505179.503326092 8569564.853378538,-13272403.11114277 7806417.562979337,-11995109.97473614 8492459.590596221))<br />
| Input<br />
|-<br />
| wktBox<br />
| wktBox <span style="color:red">[WKT_BOX]</span><br />
| String<br />
| POLYGON((-11995109.97473614 8492459.590596221,-11950593.230512831 9941648.908612307,-13183369.622696154 10704796.199011508,-14460662.759102784 10018754.171394624,-14505179.503326092 8569564.853378538,-13272403.11114277 7806417.562979337,-11995109.97473614 8492459.590596221))<br />
| Input<br />
|-<br />
| wktCircle<br />
| wktCircle <span style="color:red">[WKT_CIRCLE]</span><br />
| String<br />
| POLYGON((-12582146.351966292 8101102.005776119,-12437487.643413674 8354330.862998396,-12345011.010682397 8630915.512769304,-12308270.27890683 8920226.962108605,-12328677.373804664 9211147.13426592,-12405448.062163413 9492496.130175157,-12535632.089482944 9753461.865705855,-12714226.55660141 9984015.573992845,-12934368.178304384 10175297.205322662,-13187597.035526661 10319955.91387528,-13464181.68529757 10412432.546606557,-13753493.13463687 10449173.278382126,-14044413.306794185 10428766.18348429,-14325762.302703422 10351995.495125541,-14586728.03823412 10221811.467806011,-14817281.74652111 10043217.000687543,-15008563.377850927 9823075.37898457,-15153222.086403545 9569846.521762293,-15245698.719134822 9293261.871991385,-15282439.45091039 9003950.422652084,-15262032.356012555 8713030.250494769,-15185261.667653807 8431681.254585532,-15055077.640334276 8170715.519054834,-14876483.173215808 7940161.810767845,-14656341.551512836 7748880.179438027,-14403112.694290558 7604221.470885409,-14126528.04451965 7511744.838154132,-13837216.59518035 7475004.106378564,-13546296.423023034 7495411.2012764,-13264947.427113798 7572181.8896351475,-13003981.691583099 7702365.9169546785,-12773427.983296111 7880960.384073146,-12582146.351966292 8101102.005776119))<br />
| Input<br />
|}<br />
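<br />
:As referenced above, a minimal sketch of how an algorithm can parse a WKT_POINT input using base R only (the input value is the default from the table):<br />
<br />
<source lang='r'><br />
# Parse a WKT point string into numeric coordinates.<br />
wktPoint<-"POINT(-12993071.816027395 7729312.300197024)"<br />
coords<-as.numeric(strsplit(gsub("POINT\\(|\\)","",wktPoint)," ")[[1]])<br />
longitude<-coords[1]<br />
latitude<-coords[2]<br />
cat("Parsed point:",longitude,latitude,"\n")<br />
</source><br />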
<br />
===Spatial Data on the DataMiner Interface===<br />
<br />
:By using a specific widget, DataMiner is able to build a selection interface suited for the WKT input. This is done by reading the annotation in the input description. The widget allows the user to easily select the area of interest and reports the selected WKT in a text area. Finally, the WKT string is passed to the algorithm:<br />
<br />
[[Image:DataMinerManager_SpatialData.png|thumb|center|555px|Spatial Data, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Point.png|thumb|center|525px|WKT Point, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_LineString.png|thumb|center|525px|WKT LineString, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Polygon.png|thumb|center|525px|WKT Polygon, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Triangle.png|thumb|center|525px|WKT Triangle, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Square.png|thumb|center|525px|WKT Square, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Pentagon.png|thumb|center|525px|WKT Pentagon, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Hexagon.png|thumb|center|525px|WKT Hexagon, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Box.png|thumb|center|525px|WKT Box, DataMiner]]<br />
<br />
[[Image:DataMinerManager_WKT_Circle.png|thumb|center|525px|WKT Circle, DataMiner]]<br />
<br />
==Temporal Data==<br />
===Temporal Data===<br />
:Temporal Data are automatically interpreted using specific tags in the description of the inputs. The tags highlighted in red in the following table should be used to indicate the kind of temporal information to associate with the input. Currently, dates and time stamps are supported. For example, a description for an input date could be "Date of the occurrence record [DATE]". A parsing sketch is reported after the table.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name<br />
! Description<br />
! Type<br />
! Default<br />
! I/O<br />
|-<br />
| dateParameter<br />
| dateParameter <span style="color:red">[DATE]</span><br />
| String<br />
| 2016-04-01<br />
| Input<br />
|-<br />
| timeParameter<br />
| timeParameter <span style="color:red">[TIME]</span><br />
| String<br />
| 15:22:01<br />
| Input<br />
|}<br />
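<br />
:As referenced above, a minimal sketch of how an algorithm can interpret the [DATE] and [TIME] default values as R date/time objects (the UTC time zone is an assumption):<br />
<br />
<source lang='r'><br />
# Combine a [DATE] and a [TIME] input into a single timestamp.<br />
dateParameter<-"2016-04-01"<br />
timeParameter<-"15:22:01"<br />
date<-as.Date(dateParameter)<br />
timestamp<-as.POSIXct(paste(dateParameter,timeParameter),tz="UTC")<br />
cat("Parsed timestamp:",format(timestamp),"\n")<br />
</source><br />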
<br />
===Temporal Data on DataMiner=== <br />
:A widget on DataMiner transforms the annotations into a selection panel. Two examples are reported below:<br />
<br />
<br />
[[Image:DataMinerManager_TemporalData.png|thumb|center|555px|Temporal Data, DataMiner]]<br />
<br />
==Long strings and TextArea==<br />
===TextArea in SAI===<br />
:By default, DataMiner uses a TextField widget for string parameters. In some cases a TextArea widget is more useful. To enable the TextArea, just add the [TEXTAREA] tag to the description of the parameter in SAI.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name<br />
! Description<br />
! Type<br />
! Default<br />
! I/O<br />
|-<br />
| PInput1<br />
| Long string parameter <span style="color:red">[TEXTAREA]</span><br />
| String<br />
| long string...<br />
| Input<br />
|}<br />
<br />
===TextArea in DataMiner===<br />
<br />
:The following shows how this is rendered in DataMiner:<br />
[[Image:DataMinerManager_TextArea.png|thumb|center|555px|Long string and TextArea, DataMiner]]<br />
<br />
==NetCDF==<br />
===NetCDF in SAI===<br />
:A preview of the information contained in NetCDF (.nc) files is supported, which is useful for inspecting a file before processing it. To enable the NetCDF Preview, just add the [NETCDF] tag to the description of the parameter in SAI. A sketch showing how the algorithm itself can read the file is reported after the table.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name<br />
! Description<br />
! Type<br />
! Default<br />
! I/O<br />
|-<br />
| netCDFFile<br />
| NetCDF file <span style="color:red">[NETCDF]</span><br />
| File<br />
| netcdffile.nc<br />
| Input<br />
|}<br />
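<br />
:As referenced above, a minimal sketch of how the algorithm itself could inspect the received NetCDF file (the availability of the ncdf4 R package in the runtime is an assumption):<br />
<br />
<source lang='r'><br />
# Open the NetCDF input and list the variables it contains.<br />
library(ncdf4)<br />
nc<-nc_open("netcdffile.nc")<br />
print(names(nc$var))<br />
nc_close(nc)<br />
</source><br />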
<br />
===NetCDF in DataMiner===<br />
<br />
:The following shows how this is rendered in DataMiner:<br />
[[Image:DataMinerManager_NetCDF1.png|thumb|center|700px|NetCDF file selection, DataMiner]]<br />
:After NetCDF file is selected:<br />
[[Image:DataMinerManager_NetCDF2.png|thumb|center|700px|NetCDF file selected, DataMiner]]<br />
:After info icon button is pressed: <br />
[[Image:DataMinerManager_NetCDF3.png|thumb|center|700px|NetCDF Preview, DataMiner]]<br />
<br />
==Other Inputs==<br />
<br />
'''Table Input:''' [an HTTP link to a table] (Example: [https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.clusterers.DBSCAN DBSCAN])<br />
<br />
'''File Input:''' [an HTTP link to a file] (Example: [https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.FEED_FORWARD_A_N_N_DISTRIBUTION FEED_FORWARD_A_N_N_DISTRIBUTION])<br />
<br />
'''List of Tables:''' [a sequence of HTTP links separated by |, each indicating a table] (Example: [https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIOCLIMATE_HCAF BIOCLIMATE_HCAF])<br />
<br />
'''List of columns from a table:''' [a sequence of names of columns from <NameOfTheTableParameter> separated by |] (Example: [https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.FEED_FORWARD_A_N_N_DISTRIBUTION FEED_FORWARD_A_N_N_DISTRIBUTION])<br />
<br />
'''Column name from a table:''' [the name of a column from <NameOfTheTableParameter>] (Example: [https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.OCCURRENCE_ENRICHMENT OCCURRENCE_ENRICHMENT])<br />
<br />
'''List of strings:''' [a sequence of values separated by |] (Example: [https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.OCCURRENCE_ENRICHMENT OCCURRENCE_ENRICHMENT]; see the sketch below)<br />
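<br />
For example, a minimal sketch of how an algorithm can split one of the |-separated list inputs above into an R vector:<br />
<br />
<source lang='r'><br />
# Split a "List of strings" input (values separated by |) into a vector.<br />
listParameter<-"value1|value2|value3"<br />
values<-strsplit(listParameter,"|",fixed=TRUE)[[1]]<br />
print(values)<br />
</source><br />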
<br />
==References==<br />
<!-- {{Reflist}} --><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Statistical_Algorithms_Importer:_Java_Project&diff=30330Statistical Algorithms Importer: Java Project2017-11-24T14:49:06Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
This page explains how to create a Java project using two alternative approaches: [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#Black_Box_Integration '''Black-box'''] and [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Java_Project#White_Box_Integration '''White-box'''] integration. The next sections explain how these work and which cases the two approaches address.<br />
<br />
=Black Box Integration=<br />
<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox0.png|thumb|center|250px|Java Project, SAI]]<br />
<br />
This is the preferred way for developers who want their process executions distributed based on the request load. Each process request will run on one dedicated machine and is allowed to use multi-core processing. Black-box processes usually do not use the e-Infrastructure resources but "live on their own". '''The Statistical Algorithms Importer (SAI) portlet must be used for this integration'''.<br />
<br />
==Project Configuration==<br />
:Define project's metadata<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox1.png|thumb|center|800px|Java Info, SAI]]<br />
<br />
:Add input and output parameters and click on "Set Code" to indicate the main file to execute (i.e. the .jar file)<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox2.png|thumb|center|800px|Java I/O, SAI]]<br />
<br />
:Add information about the running environment (e.g. Java version etc.) <br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox3.png|thumb|center|800px|Java Interpreter, SAI]]<br />
<br />
:After the [https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer:_Create_Software software creation phase] a Main.R file and a Target folder are created<br />
[[Image:StatisticalAlgorithmsImporter_JavaBlackBox4.png|thumb|center|800px|Java Create, SAI]]<br />
<br />
== Example Code ==<br />
:Java code in sample:<br />
<br />
<source lang='java'><br />
/**<br />
* <br />
* @author Giancarlo Panichi<br />
* <br />
*<br />
*/<br />
import java.io.File;<br />
import java.io.FileWriter;<br />
<br />
public class SimpleProducer<br />
{<br />
public static void main(String[] args)<br />
{<br />
try<br />
{<br />
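// args[0] receives the value of the first input parameter configured in the SAI Input/Output panel<br />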
FileWriter fw = new FileWriter(new File("program.txt"));<br />
fw.write("Check: " + args[0]);<br />
fw.close();<br />
}<br />
catch (Exception e)<br />
{<br />
e.printStackTrace();<br />
}<br />
}<br />
}<br />
</source><br />
<br />
==Example Download==<br />
[[File:JavaBlackBox.zip|JavaBlackBox.zip]]<br />
<br />
<!--<br />
==References==<br />
{{Reflist}} --><br />
<br />
=White Box Integration=<br />
This is the preferred way for developers who want their processes to fully exploit the e-Infrastructure resources, for example to implement Cloud computing using the e-Infrastructure computational resources. This integration modality also allows developers to fully reuse the Java data mining frameworks integrated by DataMiner, i.e. [https://www.knime.com/ Knime], [https://rapidminer.com/ RapidMiner], [https://www.cs.waikato.ac.nz/ml/weka/ Weka], [https://wiki.gcube-system.org/gcube/Statistical_Manager_Algorithms gCube EcologicalEngine]. '''The Eclipse IDE should be used for this integration'''.<br />
<br />
[https://gcube.wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner Step-by-step guide to integrate Java processes as white boxes]<br />
<!--<br />
[[Template:Statistical Algorithms Importer]] <br />
--><br />
<br />
[[Category:Statistical Algorithms Importer]]</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=DataMiner_Algorithms&diff=30037DataMiner Algorithms2017-11-09T10:06:57Z<p>Gianpaolo.coro: </p>
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
The complete list of currently active algorithms freely published by the [[DataMiner_Manager | DataMiner]] service is reported below. <br />
<br />
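DataMiner exposes these algorithms via a WPS interface; as an illustrative sketch (the host name and token are placeholders, not taken from this page), the signature of one of the listed algorithms can be inspected from R as follows:<br />
<br />
<source lang='r'><br />
# Illustrative only: WPS DescribeProcess request for one of the listed algorithms.<br />
library(httr)<br />
response<-GET("https://dataminer.d4science.org/wps/WebProcessingService",<br />
              query=list(service="WPS",version="1.0.0",request="DescribeProcess",<br />
                         identifier="org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.clusterers.DBSCAN",<br />
                         `gcube-token`="YOUR-TOKEN"))<br />
cat("HTTP status:",status_code(response),"\n")<br />
</source><br />
<br />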
{|border="1" cellpadding="5" cellspacing="0"<br />
! colspan=2 bgcolor=lightgrey | <div id="HCAF_INTERPOLATION">HCAF_INTERPOLATION</div><br />
|-<br />
|| Description<br />
||Evaluates the climatic changes impact on species presence<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_A_N_N_DISTRIBUTION">FEED_FORWARD_A_N_N_DISTRIBUTION</div><br />
|-<br />
|| Description<br />
||A Bayesian method using a Feed Forward Neural Network to simulate a function from the features space (R^n) to R. A modeling algorithm that relies on Neural Networks to simulate a real valued function. It accepts as input a table containing the training dataset and some parameters affecting the algorithm behaviour such as the number of neurons, the learning threshold and the maximum number of iterations.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="XYEXTRACTOR">XYEXTRACTOR</div><br />
|-<br />
|| Description<br />
||An algorithm to extract values associated to an environmental feature repository (e.g. NETCDF, ASC, GeoTiff files etc.). A grid of points at a certain resolution is specified by the user and values are associated to the points from the environmental repository. It accepts as input one geospatial repository ID (via the UUIDs in the infrastructure spatial data repository - recoverable through the Geoexplorer portlet) or a direct link to a file, plus the specification of time and space. The algorithm produces one table containing the values associated to the selected bounding box.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIOCLIMATE_HSPEN">BIOCLIMATE_HSPEN</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that generates a table containing species envelops (HSPEN) in time, i.e. models capturing species tolerance with respect to environmental parameters, used by the AquaMaps approach. Evaluates the climatic changes impact on the variation of the salinity values in several ranges of a set of species envelopes<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="HSPEN">HSPEN</div><br />
|-<br />
|| Description<br />
||The AquMaps HSPEN algorithm. A modeling algorithm that generates a table containing species envelops (HSPEN), i.e. models capturing species tolerance with respect to environmental parameters, to be used by the AquaMaps approach.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SUPPORT_VECTOR_MACHINE_REGRESSOR">SUPPORT_VECTOR_MACHINE_REGRESSOR</div><br />
|-<br />
|| Description<br />
||A simple algorithm for regression using and already trained Support Vector Machine<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_NATIVE">AQUAMAPS_NATIVE</div><br />
|-<br />
|| Description<br />
||Algorithm for Native Range by Aquamaps on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="POLYGONS_TO_MAP">POLYGONS_TO_MAP</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map of filled polygons associated to x,y coordinates and a certain resolution. A maximum of 259000 is allowed<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTDBNAMES">LISTDBNAMES</div><br />
|-<br />
|| Description<br />
||Algorithm that allows viewing the names of the available database resources in the Infrastructure<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIOCLIMATE_HSPEC">BIOCLIMATE_HSPEC</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that generates a table containing an estimate of species distributions per half-degree cell (HSPEC) in time. Evaluates the climatic changes impact on species presence.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LOF">LOF</div><br />
|-<br />
|| Description<br />
||Local Outlier Factor (LOF). A clustering algorithm for real valued vectors that relies on the Local Outlier Factor algorithm, i.e. an algorithm for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours. A Maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIME_GEO_CHART">TIME_GEO_CHART</div><br />
|-<br />
|| Description<br />
||An algorithm producing an animated gif displaying quantities as colors in time. The color indicates the sum of the values recorded in a country.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATIONS_PER_AREA">SPECIES_OBSERVATIONS_PER_AREA</div><br />
|-<br />
|| Description<br />
||An algorithm producing a bar chart for the distribution of a species along a certain type of marine area (e.g. LME or MEOW)<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LWR">LWR</div><br />
|-<br />
|| Description<br />
||An algorithm to estimate Length-Weight relationship parameters for marine species, using Bayesian methods. Runs an R procedure. Based on the Cube-law theory.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="DBSCAN">DBSCAN</div><br />
|-<br />
|| Description<br />
||A clustering algorithm for real valued vectors that relies on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. A maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIONYM">BIONYM</div><br />
|-<br />
|| Description<br />
||An algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows to activate several taxa names matching algorithms and to get the list of possible transcriptions for a list of input raw species names with possible authorship indication.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="DISCREPANCY_ANALYSIS">DISCREPANCY_ANALYSIS</div><br />
|-<br />
|| Description<br />
||An evaluator algorithm that compares two tables containing real valued vectors. It drives the comparison by relying on a geographical distance threshold and a threshold for K-Statistic.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_SUITABLE">AQUAMAPS_SUITABLE</div><br />
|-<br />
|| Description<br />
||The AquaMaps algorithm for suitable range, executed on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_NATIVE_NEURALNETWORK">AQUAMAPS_NATIVE_NEURALNETWORK</div><br />
|-<br />
|| Description<br />
||Aquamaps Native Algorithm calculated by a Neural Network. A distribution algorithm that relies on Neural Networks and AquaMaps data for native distributions to generate a table containing species distribution probabilities on half-degree cells.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ECOPATH_WITH_ECOSIM">ECOPATH_WITH_ECOSIM</div><br />
|-<br />
|| Description<br />
||Ecopath with Ecosim (EwE) is a free ecological/ecosystem modeling software suite. This algorithm implementation expects a model and a configuration file as inputs; the result of the analysis is returned as a zip archive. References: Christensen, V., &amp; Walters, C. J. (2004). Ecopath with Ecosim: methods, capabilities and limitations. Ecological modelling, 172(2), 109-139.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_INTERSECTOR">OCCURRENCES_INTERSECTOR</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a table of species occurrence points contained in both starting tables, where point equivalence is identified via user-defined comparison thresholds. Works with up to 10000 points per table. Between two occurrence sets, it keeps the elements of the Right Set that are similar to elements in the Left Set.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIONYM_LOCAL">BIONYM_LOCAL</div><br />
|-<br />
|| Description<br />
||A fast version of the algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows to activate several taxa names matching algorithms and to get the list of possible transcriptions for a list of input raw species names with possible authorship indication.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIME_SERIES_ANALYSIS">TIME_SERIES_ANALYSIS</div><br />
|-<br />
|| Description<br />
||An algorithm applying signal processing to a non-uniform time series. A maximum of 10000 distinct points in time can be processed. The process uniformly samples the series, then extracts hidden periodicities and signal properties. The sampling period is the shortest time difference between two points. Finally, by using Caterpillar-SSA the algorithm forecasts the Time Series. The output shows the detected periodicity, the forecasted signal and the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCE_ENRICHMENT">OCCURRENCE_ENRICHMENT</div><br />
|-<br />
|| Description<br />
||An algorithm performing occurrences enrichment. Takes as input one table containing occurrence points for a set of species and a list of environmental layers, taken either from the e-infrastructure GeoNetwork (through the GeoExplorer application) or from direct HTTP links. Produces one table reporting the set of environmental values associated to the occurrence points.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FAO_OCEAN_AREA_COLUMN_CREATOR_FROM_QUADRANT">FAO_OCEAN_AREA_COLUMN_CREATOR_FROM_QUADRANT</div><br />
|-<br />
|| Description<br />
||An algorithm that adds a column containing the FAO Ocean Area codes associated to longitude, latitude and quadrant columns.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="POINTS_TO_MAP">POINTS_TO_MAP</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map of points from a set of points with x,y coordinate indications. A maximum of 259000 is allowed<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIONYM_BIODIV">BIONYM_BIODIV</div><br />
|-<br />
|| Description<br />
||An algorithm implementing BiOnym oriented to Biodiversity Taxa Names Matching with a predefined and optimized workflow. This version applies in sequence the following Matchers: GSay (thr:0.6, maxRes:10), FuzzyMatcher (thr:0.6, maxRes:10), Levenshtein (thr:0.4, maxRes:10), Trigram (thr:0.4, maxRes:10). BiOnym is a flexible workflow approach to taxon name matching. The workflow allows to activate several taxa names matching algorithms and to get the list of possible transcriptions for a list of input raw species names with possible authorship indication.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="XMEANS">XMEANS</div><br />
|-<br />
|| Description<br />
||A clustering algorithm for occurrence points that relies on the X-Means algorithm, i.e. an extended version of the K-Means algorithm improved by an Improve-Structure part. A maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="QUALITY_ANALYSIS">QUALITY_ANALYSIS</div><br />
|-<br />
|| Description<br />
||An evaluator algorithm that assesses the effectiveness of a distribution model by computing the Receiver Operating Characteristic (ROC), the Area Under the Curve (AUC) and the Accuracy of a model (see the sketch after this table).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIME_SERIES_CHARTS">TIME_SERIES_CHARTS</div><br />
|-<br />
|| Description<br />
||An algorithm producing time series charts of attributes vs. quantities. Charts are displayed per quantity column and overlapping quantities are summed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ABSENCE_CELLS_FROM_AQUAMAPS">ABSENCE_CELLS_FROM_AQUAMAPS</div><br />
|-<br />
|| Description<br />
||An algorithm producing cells and features (HCAF) for a species, containing absence points taken from an AquaMaps distribution.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_SUITABLE_NEURALNETWORK">AQUAMAPS_SUITABLE_NEURALNETWORK</div><br />
|-<br />
|| Description<br />
||The AquaMaps algorithm for suitable environments computed by a neural network. A distribution algorithm that relies on Neural Networks and AquaMaps data for suitable distributions to generate a table containing species distribution probabilities on half-degree cells.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_MAP_FROM_POINTS">SPECIES_MAP_FROM_POINTS</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map from a probability distribution made up of x,y coordinates and a certain resolution. A maximum of 259000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MOST_OBSERVED_TAXA">MOST_OBSERVED_TAXA</div><br />
|-<br />
|| Description<br />
||An algorithm producing a bar chart for the most observed taxa in a certain range of years (with respect to the OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTTABLES">LISTTABLES</div><br />
|-<br />
|| Description<br />
||Algorithm that allows viewing the table names of a chosen database.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="CSQUARE_COLUMN_CREATOR">CSQUARE_COLUMN_CREATOR</div><br />
|-<br />
|| Description<br />
||An algorithm that adds a column containing the CSquare codes associated with longitude and latitude columns.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FAO_OCEAN_AREA_COLUMN_CREATOR">FAO_OCEAN_AREA_COLUMN_CREATOR</div><br />
|-<br />
|| Description<br />
||An algorithm that adds a column containing the FAO Ocean Area codes associated with longitude and latitude columns.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_ANN">FEED_FORWARD_ANN</div><br />
|-<br />
|| Description<br />
||A method to train a generic Feed Forward Artificial Neural Network in order to simulate a function from the features space (R^n) to R. It uses the Back-propagation method (a minimal training sketch appears after this table). Produces a trained neural network in the form of a compiled file which can be used in the FEED FORWARD NEURAL NETWORK DISTRIBUTION algorithm.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_MAP_FROM_CSQUARES">SPECIES_MAP_FROM_CSQUARES</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map from a probability distribution associated with a set of csquare codes. A maximum of 259000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="HRS">HRS</div><br />
|-<br />
|| Description<br />
||An evaluator algorithm that calculates the Habitat Representativeness Score, i.e. an indicator assessing whether a specific survey coverage, or another environmental features dataset, contains data that are representative of all available habitat variable combinations in an area.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ESRI_GRID_EXTRACTION">ESRI_GRID_EXTRACTION</div><br />
|-<br />
|| Description<br />
||An algorithm to extract values associated with an environmental feature repository (e.g. NetCDF, ASC, GeoTiff files, etc.). A grid of points at a certain resolution is specified by the user and values are associated with the points from the environmental repository. It accepts one geospatial repository ID (via its UUID in the infrastructure spatial data repository, recoverable through the Geoexplorer portlet) or a direct link to a file, plus the time and space specification. The algorithm produces one ESRI GRID ASCII file containing the values associated with the selected bounding box (a sketch of this output format appears after this table).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATION_MEOW_AREA_PER_YEAR">SPECIES_OBSERVATION_MEOW_AREA_PER_YEAR</div><br />
|-<br />
|| Description<br />
||Algorithm returning the most observed species in a specific range of years (data collected from the OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATION_LME_AREA_PER_YEAR">SPECIES_OBSERVATION_LME_AREA_PER_YEAR</div><br />
|-<br />
|| Description<br />
||Algorithm returning the most observed species in a specific range of years (data collected from the OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ABSENCE_GENERATION_FROM_OBIS">ABSENCE_GENERATION_FROM_OBIS</div><br />
|-<br />
|| Description<br />
||An algorithm to estimate absence records from survey data in OBIS. Based on the work in Coro, G., Magliozzi, C., Berghe, E. V., Bailly, N., Ellenbroek, A., &amp; Pagano, P. (2016). Estimating absence locations of marine species from data of scientific surveys in OBIS. Ecological Modelling, 323, 61-76.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="GEO_CHART">GEO_CHART</div><br />
|-<br />
|| Description<br />
||An algorithm producing a chart that displays quantities as colors of countries. The color indicates the sum of the values recorded in a country.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="WEB_APP_PUBLISHER">WEB_APP_PUBLISHER</div><br />
|-<br />
|| Description<br />
||This algorithm publishes a zip file containing a Web site, based on HTML and JavaScript, to the e-Infrastructure. It generates a public URL to the application that can be shared.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="KMEANS">KMEANS</div><br />
|-<br />
|| Description<br />
||A clustering algorithm for real valued vectors that relies on the k-means algorithm, i.e. a method aiming to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster (see the sketch after this table). A maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SGVM_INTERPOLATION">SGVM_INTERPOLATION</div><br />
|-<br />
|| Description<br />
||An interpolation method relying on the implementation by the Study Group on VMS (SGVMS). The method uses two interpolation approaches to simulate vessel points at a certain temporal resolution. The input is a file in TACSAT format uploaded on the Statistical Manager. The output is another TACSAT file containing the interpolated points. The underlying R code has been extracted from the SGVM VMSTools framework. This algorithm comes after a feasibility study (http://goo.gl/risQre) which clarifies the features an e-Infrastructure adds to the original scripts. Limitation: the input will be processed up to 10000 vessel trajectory points. Credits: Hintzen, N. T., Bastardie, F., Beare, D., Piet, G. J., Ulrich, C., Deporte, N., Egekvist, J., et al. 2012. VMStools: Open-source software for the processing, analysis and visualisation of fisheries logbook and VMS data. Fisheries Research, 115-116: 31-43. Hintzen, N. T., Piet, G. J., and Brunel, T. 2010. Improved estimation of trawling tracks using cubic Hermite spline interpolation of position registration data. Fisheries Research, 101: 108-115. VMStools is available as an add-on package for R; documentation at https://code.google.com/p/vmstools/ and build versions for Windows, Mac and Linux at https://docs.google.com/. Authors: Niels T. Hintzen, Doug Beare<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIMEEXTRACTION">TIMEEXTRACTION</div><br />
|-<br />
|| Description<br />
||An algorithm to extract a time series of values associated with a geospatial features repository (e.g. NetCDF, ASC, GeoTiff files, etc.). The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and possibly the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="XYEXTRACTOR_TABLE">XYEXTRACTOR_TABLE</div><br />
|-<br />
|| Description<br />
||An algorithm to extract values associated with a table containing geospatial features (e.g. vessel routes, species distribution maps, etc.). A grid of points at a certain resolution is specified by the user and values are associated with the points from the table. It accepts one geospatial table plus the time and space specification. The algorithm produces one table containing the values associated with the selected bounding box.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_NEURAL_NETWORK_REGRESSOR">FEED_FORWARD_NEURAL_NETWORK_REGRESSOR</div><br />
|-<br />
|| Description<br />
||The algorithm simulates a real-valued vector function using a trained Feed Forward Artificial Neural Network and returns a table containing the function actual inputs and the predicted outputs<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="GRID_CWP_TO_COORDINATES">GRID_CWP_TO_COORDINATES</div><br />
|-<br />
|| Description<br />
||An algorithm that adds longitude, latitude and resolution columns by analysing a column containing FAO Ocean Area codes (CWP format).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ESTIMATE_MONTHLY_FISHING_EFFORT">ESTIMATE_MONTHLY_FISHING_EFFORT</div><br />
|-<br />
|| Description<br />
||An algorithm that estimates fishing exploitation at 0.5 degree resolution from activity-classified vessel trajectories. It produces a table with csquare codes, latitudes, longitudes, resolution and the associated overall fishing hours in the time frame of the vessels' activity. It requires each activity point to be classified as Fishing or other. This algorithm is based on the paper 'Deriving Fishing Monthly Effort and Caught Species' (Coro et al. 2013, in proc. of OCEANS - Bergen, 2013 MTS/IEEE). Example of input table (NAFO anonymised data): http://goo.gl/3auJkM<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_NEURAL_NETWORK_TRAINER">FEED_FORWARD_NEURAL_NETWORK_TRAINER</div><br />
|-<br />
|| Description<br />
||The algorithm trains a Feed Forward Artificial Neural Network using an online Back-Propagation procedure and returns the training error and a binary file containing the trained network<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SPATIAL_REALLOCATION_GENERIC">FIGIS_SPATIAL_REALLOCATION_GENERIC</div><br />
|-<br />
|| Description<br />
||The Spatial Reallocation algorithm estimates statistics for areas other than those where they were reported. The algorithm is based on spatial disaggregation techniques and currently provides an area-weighted reallocation.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TAXONOMY_OBSERVATIONS_TREND_PER_YEAR">TAXONOMY_OBSERVATIONS_TREND_PER_YEAR</div><br />
|-<br />
|| Description<br />
||Algorithm returning the trend of taxonomy observations in a specific range of years (with respect to the OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="GENERIC_CHARTS">GENERIC_CHARTS</div><br />
|-<br />
|| Description<br />
||An algorithm producing generic charts of attributes vs. quantities. Charts are displayed per quantity column. Histogram, scatter and radar charts are produced for the top ten quantities. A Gaussian distribution reports overall statistics for the quantities.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIOCLIMATE_HCAF">BIOCLIMATE_HCAF</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that generates a Half-degree Cells Authority File (HCAF) dataset for a certain time frame, with the environmental parameters used by the AquaMaps approach. It evaluates the impact of climatic changes on the variation of the ocean features contained in HCAF tables.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SHAPEFILE_PUBLISHER">SHAPEFILE_PUBLISHER</div><br />
|-<br />
|| Description<br />
||An algorithm to publish shapefiles under the WMS and WFS standards in the e-Infrastructure. The produced WMS and WFS links are reported as output of this process. The map will be available in the VRE for consultation.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="CMSY_2">CMSY_2</div><br />
|-<br />
|| Description<br />
||The CMSY method for data-limited stock assessment. Described in Froese, R., Demirel, N., Coro, G., Kleisner, K. M., Winker, H. (2016). Estimating fisheries reference points from catch and resilience. Fish and Fisheries. Paper link: http://onlinelibrary.wiley.com/doi/10.1111/faf.12190/ Full Instructions and code: https://github.com/SISTA16/cmsy<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SDMX_DATA_CONVERTER">FIGIS_SDMX_DATA_CONVERTER</div><br />
|-<br />
|| Description<br />
||This tool allows an SDMX dataset to be easily converted into CSV, by calling the rsdmx package for R.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="HCAF_FILTER">HCAF_FILTER</div><br />
|-<br />
|| Description<br />
||An algorithm producing an HCAF table on a selected bounding box (the default identifies Indonesia).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="RASTER_DATA_PUBLISHER">RASTER_DATA_PUBLISHER</div><br />
|-<br />
|| Description<br />
||This algorithm publishes a raster file as a map or dataset in the e-Infrastructure. NetCDF-CF files are encouraged, as WMS and WCS maps will be produced using this format. For other types of files (GeoTiffs, ASC, etc.) only the raw datasets will be published. The resulting map or dataset will be accessible via the VRE GeoExplorer by the VRE participants.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_SUITABLE_2050">AQUAMAPS_SUITABLE_2050</div><br />
|-<br />
|| Description<br />
||Algorithm for Suitable Range in 2050 by Aquamaps on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIMEEXTRACTION_TABLE">TIMEEXTRACTION_TABLE</div><br />
|-<br />
|| Description<br />
||An algorithm to extract a time series of values associated with a table containing geospatial information. The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and possibly the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED_TABLE">FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED_TABLE</div><br />
|-<br />
|| Description<br />
||The Spatial Reallocation algorithm estimates statistics for areas other than those where they were reported. The algorithm is based on spatial disaggregation techniques and currently provides an area-weighted reallocation. This simplified algorithm specifically targets users from the FAO Fisheries and Aquaculture Department and aims to facilitate execution by abstracting away the intersections to provide.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_NATIVE_2050">AQUAMAPS_NATIVE_2050</div><br />
|-<br />
|| Description<br />
||Algorithm for Native Range in 2050 by Aquamaps on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATIONS_TREND_PER_YEAR">SPECIES_OBSERVATIONS_TREND_PER_YEAR</div><br />
|-<br />
|| Description<br />
||An algorithm producing the trend of the observations for a certain species in a certain range of years.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FAOMSY">FAOMSY</div><br />
|-<br />
|| Description<br />
||An algorithm to be used by fisheries managers for stock assessment. It estimates the Maximum Sustainable Yield (MSY) of a stock, based on a catch trend. The algorithm has been developed by the Resource Use and Conservation Division of the FAO Fisheries and Aquaculture Department (contact: Yimin Ye, yimin.ye@fao.org). It is applicable to a CSV file containing metadata and catch statistics for a set of marine species and produces MSY estimates for each species. The CSV must follow a FAO-defined format (e.g. http://goo.gl/g6YtVx). The output is made up of two (optional) files: one for successfully processed species and another one for species that could not be processed because the data were not sufficient to estimate MSY.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="PRESENCE_CELLS_GENERATION">PRESENCE_CELLS_GENERATION</div><br />
|-<br />
|| Description<br />
||An algorithm producing cells and features (HCAF) for a species containing presence points<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ZEXTRACTION">ZEXTRACTION</div><br />
|-<br />
|| Description<br />
||An algorithm to extract the Z values from a geospatial features repository (e.g. NetCDF, ASC, GeoTiff files, etc.). The algorithm analyses the repository and automatically extracts the Z values according to the resolution wanted by the user. It produces one chart of the Z values and one table containing the values.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MOST_OBSERVED_SPECIES">MOST_OBSERVED_SPECIES</div><br />
|-<br />
|| Description<br />
||An algorithm producing a bar chart for the most observed species in a certain range of years (with respect to the OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SEADATANET_INTERPOLATOR">SEADATANET_INTERPOLATOR</div><br />
|-<br />
|| Description<br />
||A connector for the SeaDataNet infrastructure. This algorithm invokes the Data-Interpolating Variational Analysis (DIVA) SeaDataNet service to interpolate spatial data. The model uses GEBCO bathymetry data and requires, among other parameters, an estimate of the maximum spatial span of the correlation between points and the signal-to-noise ratio. It can interpolate up to 10,000 points randomly taken from the input table. As output, it produces a NetCDF file with a uniform grid of values. This powerful interpolation model is described in Troupin et al. 2012, 'Generation of analysis and consistent error fields using the Data Interpolating Variational Analysis (Diva)', Ocean Modelling, 52-53, 90-101.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTDBINFO">LISTDBINFO</div><br />
|-<br />
|| Description<br />
||Algorithm that allows viewing information about a chosen resource of Database Type in the Infrastructure.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_MARINE_TERRESTRIAL">OCCURRENCES_MARINE_TERRESTRIAL</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a table containing occurrence points by filtering them by type of area, i.e. by recognising whether they are marine or terrestrial. Works with up to 10000 points per table.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_DUPLICATES_DELETER">OCCURRENCES_DUPLICATES_DELETER</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a duplicate-free table of species occurrence points, where duplicates have been identified via user-defined comparison thresholds. Works with up to 100000 points.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED">FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED</div><br />
|-<br />
|| Description<br />
||The Spatial Reallocation algorithm estimates statistics for areas other than those where they were reported. The algorithm is based on spatial disaggregation techniques and currently provides an area-weighted reallocation. This simplified algorithm specifically targets users from the FAO Fisheries and Aquaculture Department and aims to facilitate execution by abstracting away the intersections to provide.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTDBSCHEMA">LISTDBSCHEMA</div><br />
|-<br />
|| Description<br />
||Algorithm that allows viewing the schema names of a chosen database whose type is Postgres.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_SUBTRACTION">OCCURRENCES_SUBTRACTION</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a table resulting from the difference between two occurrence points tables, where point equivalence is identified via user-defined comparison thresholds. Works with up to 10000 points per table. Between two Occurrence Sets, it keeps the elements of the Left Set that are not similar to any element in the Right Set.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_MERGER">OCCURRENCES_MERGER</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a duplicate-free table resulting from the union of two occurrence points tables, where point equivalence is identified via user-defined comparison thresholds. Works with up to 10000 points per table. Between two Occurrence Sets, it enriches the Left Set with the elements of the Right Set that are not in the Left Set, and updates the elements of the Left Set with more recent elements from the Right Set. If one element in the Left Set corresponds to several more recent elements in the Right Set, they will all be substituted for the element of the Left Set (a sketch of this merge logic appears after this table).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ZEXTRACTION_TABLE">ZEXTRACTION_TABLE</div><br />
|-<br />
|| Description<br />
||An algorithm to extract the Z values associated with a table containing geospatial information. The algorithm analyses the table and extracts the Z values according to the resolution wanted by the user. It produces one chart of the Z values and one table containing the values.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ESTIMATE_FISHING_ACTIVITY">ESTIMATE_FISHING_ACTIVITY</div><br />
|-<br />
|| Description<br />
||An algorithm that estimates activity hours (fishing or other) from vessel trajectories, adds bathymetry information to the table and classifies (point-by-point) the fishing activity of the involved vessels according to two algorithms: one based on speed (activity_class_speed output column) and the other based on speed and bathymetry (activity_class_speed_bath output column). The algorithm produces new columns containing this information. This algorithm is based on the paper 'Deriving Fishing Monthly Effort and Caught Species' (Coro et al. 2013, in proc. of OCEANS - Bergen, 2013 MTS/IEEE). Example of input table (NAFO anonymised data): http://goo.gl/3auJkM<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPSNN">AQUAMAPSNN</div><br />
|-<br />
|| Description<br />
||The AquaMaps model trained using a Feed Forward Neural Network. This is a method to train a generic Feed Forward Artificial Neural Network to be used by the AquaMaps Neural Network algorithm. Produces a trained neural network in the form of a compiled file which can be used later.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MAX_ENT_NICHE_MODELLING">MAX_ENT_NICHE_MODELLING</div><br />
|-<br />
|| Description<br />
||A Maximum-Entropy model for species habitat modeling, based on the implementation by Schapire et al. v 3.3.3k, Princeton University, http://www.cs.princeton.edu/schapire/maxent/. In this adaptation for the D4Science infrastructure, the software accepts a table produced by the Species Product Discovery service and a set of environmental layers in various formats (NetCDF, WFS, WCS, ASC, GeoTiff) via direct links or GeoExplorer UUIDs. The user can also establish the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if it is higher than the native one. The output contains: a thumbnail map of the projected model, the ROC curve, the Omission/Commission chart, a table containing the raw assigned values, a threshold to transform the table into a 0-1 probability distribution, a report of the importance of the used layers in the model, and ASCII representations of the input layers to check their alignment. Other processes can be later applied to the raw values to produce a GIS map (e.g. the Statistical Manager Points-to-Map process) and results can be shared. Demo video: http://goo.gl/TYYnTO and instructions: http://wiki.i-marine.eu/index.php/MaxEnt<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MAPS_COMPARISON">MAPS_COMPARISON</div><br />
|-<br />
|| Description<br />
||An algorithm for comparing two OGC/NetCDF maps in a seamless way for the user. The algorithm assesses the similarities between two geospatial maps by comparing them in a point-to-point fashion. It accepts as input the two geospatial maps (via their UUIDs in the infrastructure spatial data repository, recoverable through the Geoexplorer portlet) and some parameters affecting the comparison, such as the z-index, the time index and the comparison threshold. Note: in the case of WFS layers it makes comparisons on the last feature column.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SUPPORT_VECTOR_MACHINE_TRAINER">SUPPORT_VECTOR_MACHINE_TRAINER</div><br />
|-<br />
|| Description<br />
||A simple algorithm to train a Support Vector Machine<br />
|-<br />
<br />
<br />
|}<br />
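<br />
== Example sketches ==<br />
The minimal Python sketch below illustrates the uniform-sampling and periodicity-detection steps described in the TIME_SERIES_ANALYSIS entry above. The toy data, the linear interpolation choice and the FFT-based peak picking are illustrative assumptions; the Caterpillar-SSA forecasting step performed by the service is omitted.<br />
<pre>
# Resample a non-uniform series at the shortest observed interval,
# then detect the dominant periodicity via the Fourier spectrum.
import numpy as np

t = np.array([0.0, 1.0, 2.5, 4.0, 4.5, 6.0, 7.5, 8.0, 9.5, 10.0])  # non-uniform times
v = np.sin(2 * np.pi * t / 5.0)                                    # observed values

dt = np.min(np.diff(t))                # sampling period = shortest time difference
tu = np.arange(t[0], t[-1] + dt, dt)   # uniform time grid
vu = np.interp(tu, t, v)               # uniformly resampled series

spectrum = np.abs(np.fft.rfft(vu - vu.mean()))
freqs = np.fft.rfftfreq(len(vu), d=dt)
peak = freqs[np.argmax(spectrum[1:]) + 1]  # dominant non-zero frequency
print('detected period:', 1.0 / peak)
</pre>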
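<br />
As referenced in the QUALITY_ANALYSIS entry, a minimal sketch of the ROC/AUC/Accuracy computation is shown below, using scikit-learn metrics on illustrative presence/absence data; the service's actual inputs are distribution model tables, not these toy lists.<br />
<pre>
# Compute ROC, AUC and Accuracy for a probabilistic presence/absence model.
from sklearn.metrics import roc_curve, roc_auc_score, accuracy_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]                  # observed presence/absence
y_prob = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.5]  # model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_prob)   # Receiver Operating Characteristic
auc = roc_auc_score(y_true, y_prob)                # Area Under the Curve
acc = accuracy_score(y_true, [int(p >= 0.5) for p in y_prob])  # accuracy at a 0.5 cut-off
print(auc, acc)
</pre>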
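<br />
The FEED_FORWARD_ANN entry above refers to the following minimal training sketch. It approximates a function from R^3 to R with a feed-forward network trained by gradient back-propagation and persists it to a file; scikit-learn, the toy target function and the file name are assumptions, and the service's own "compiled file" format is not reproduced here.<br />
<pre>
# Train a small feed-forward network by back-propagation and save it.
import numpy as np
from sklearn.neural_network import MLPRegressor
import joblib

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))        # features space R^3
y = X[:, 0] ** 2 - X[:, 1] + 0.5 * X[:, 2]   # real-valued target function

net = MLPRegressor(hidden_layer_sizes=(10,), solver='sgd',
                   learning_rate_init=0.01, max_iter=2000, random_state=0)
net.fit(X, y)                                # gradient-based back-propagation
joblib.dump(net, 'trained_network.bin')      # persist the trained network
</pre>
A companion step, analogous to FEED_FORWARD_NEURAL_NETWORK_REGRESSOR, would reload the file with joblib.load and call net.predict on new feature vectors.<br />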
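<br />
As noted in the ESRI_GRID_EXTRACTION entry, the algorithm emits an ESRI GRID ASCII (.asc) file; the sketch below shows this output format with dummy values standing in for the values the service samples from the chosen repository.<br />
<pre>
# Write a grid of values as an ESRI GRID ASCII file.
import numpy as np

values = np.random.default_rng(0).random((4, 5))  # nrows x ncols of sampled values

header = (
    "ncols 5\n"
    "nrows 4\n"
    "xllcorner -10.0\n"     # lower-left corner of the bounding box
    "yllcorner 35.0\n"
    "cellsize 0.5\n"        # grid resolution in decimal degrees
    "NODATA_value -9999\n"
)
with open("extraction.asc", "w") as f:
    f.write(header)
    for row in values:      # one text row per grid row, north to south
        f.write(" ".join(f"{v:.4f}" for v in row) + "\n")
</pre>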
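<br />
The KMEANS entry above points to this minimal sketch of the k-means partitioning idea, here via scikit-learn on toy vectors; the DataMiner implementation and its tabular I/O differ.<br />
<pre>
# Partition real-valued vectors into k clusters around their means.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)           # cluster index of each observation
print(model.cluster_centers_)  # cluster means serving as prototypes
</pre>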
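<br />
Finally, the OCCURRENCES_MERGER entry refers to the sketch below, which restates the described merge semantics in plain Python; the record fields and the threshold value are illustrative assumptions, not the service's schema.<br />
<pre>
# Threshold-based union of two occurrence sets with "more recent wins" updates.
from dataclasses import dataclass

@dataclass
class Occurrence:
    lat: float
    lon: float
    recordedat: int  # e.g. observation year

def similar(a, b, thr=0.01):
    # Two points are equivalent within the user-defined comparison threshold.
    return abs(a.lat - b.lat) <= thr and abs(a.lon - b.lon) <= thr

def merge(left, right, thr=0.01):
    merged = []
    for l in left:
        matches = [r for r in right if similar(l, r, thr)]
        newer = [r for r in matches if r.recordedat > l.recordedat]
        # more recent Right Set matches substitute the Left Set element
        merged.extend(newer if newer else [l])
    # Right Set elements with no Left Set counterpart enrich the result
    merged.extend(r for r in right if not any(similar(r, l, thr) for l in left))
    return merged

print(merge([Occurrence(1.0, 1.0, 2000)],
            [Occurrence(1.005, 1.0, 2010), Occurrence(3.0, 3.0, 2005)]))
</pre>
</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=DataMiner_Algorithms&diff=30036DataMiner Algorithms2017-11-09T10:06:02Z<p>Gianpaolo.coro: </p>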
<hr />
<div>{| align="right"<br />
||__TOC__<br />
|}<br />
<br />
The complete list of currently active algorithms published by the [[DataMiner_Manager | DataMiner]] service is reported below. <br />
<br />
{|border="1" cellpadding="5" cellspacing="0"<br />
! colspan=2 bgcolor=lightgrey | <div id="HCAF_INTERPOLATION">HCAF_INTERPOLATION</div><br />
|-<br />
|| Description<br />
||Evaluates the impact of climatic changes on species presence.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_A_N_N_DISTRIBUTION">FEED_FORWARD_A_N_N_DISTRIBUTION</div><br />
|-<br />
|| Description<br />
||A Bayesian method using a Feed Forward Neural Network to simulate a function from the features space (R^n) to R. A modeling algorithm that relies on Neural Networks to simulate a real valued function. It accepts as input a table containing the training dataset and some parameters affecting the algorithm behaviour such as the number of neurons, the learning threshold and the maximum number of iterations.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="XYEXTRACTOR">XYEXTRACTOR</div><br />
|-<br />
|| Description<br />
||An algorithm to extract values associated with an environmental feature repository (e.g. NetCDF, ASC, GeoTiff files, etc.). A grid of points at a certain resolution is specified by the user and values are associated with the points from the environmental repository. It accepts one geospatial repository ID (via its UUID in the infrastructure spatial data repository, recoverable through the Geoexplorer portlet) or a direct link to a file, plus the time and space specification. The algorithm produces one table containing the values associated with the selected bounding box.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIOCLIMATE_HSPEN">BIOCLIMATE_HSPEN</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that generates a table containing species envelopes (HSPEN) in time, i.e. models capturing species tolerance with respect to environmental parameters, used by the AquaMaps approach. It evaluates the impact of climatic changes on the variation of the salinity values in several ranges of a set of species envelopes.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="HSPEN">HSPEN</div><br />
|-<br />
|| Description<br />
||The AquaMaps HSPEN algorithm. A modeling algorithm that generates a table containing species envelopes (HSPEN), i.e. models capturing species tolerance with respect to environmental parameters, to be used by the AquaMaps approach.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SUPPORT_VECTOR_MACHINE_REGRESSOR">SUPPORT_VECTOR_MACHINE_REGRESSOR</div><br />
|-<br />
|| Description<br />
||A simple algorithm for regression using an already trained Support Vector Machine.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_NATIVE">AQUAMAPS_NATIVE</div><br />
|-<br />
|| Description<br />
||Algorithm for Native Range by Aquamaps on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="POLYGONS_TO_MAP">POLYGONS_TO_MAP</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map of filled polygons associated with x,y coordinates and a certain resolution. A maximum of 259000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTDBNAMES">LISTDBNAMES</div><br />
|-<br />
|| Description<br />
||Algorithm that allows viewing the names of the available database resources in the Infrastructure.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIOCLIMATE_HSPEC">BIOCLIMATE_HSPEC</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that generates a table containing an estimate of species distributions per half-degree cell (HSPEC) in time. It evaluates the impact of climatic changes on species presence.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LOF">LOF</div><br />
|-<br />
|| Description<br />
||Local Outlier Factor (LOF). A clustering algorithm for real valued vectors that relies on the Local Outlier Factor algorithm, i.e. an algorithm for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours. A maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIME_GEO_CHART">TIME_GEO_CHART</div><br />
|-<br />
|| Description<br />
||An algorithm producing an animated gif displaying quantities as colors in time. The color indicates the sum of the values recorded in a country.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATIONS_PER_AREA">SPECIES_OBSERVATIONS_PER_AREA</div><br />
|-<br />
|| Description<br />
||An algorithm producing a bar chart for the distribution of a species along a certain type of marine area (e.g. LME or MEOW)<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LWR">LWR</div><br />
|-<br />
|| Description<br />
||An algorithm to estimate Length-Weight relationship parameters for marine species, using Bayesian methods. Runs an R procedure. Based on the Cube-law theory.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="DBSCAN">DBSCAN</div><br />
|-<br />
|| Description<br />
||A clustering algorithm for real valued vectors that relies on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. A maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIONYM">BIONYM</div><br />
|-<br />
|| Description<br />
||An algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows the activation of several taxon name matching algorithms and returns the list of possible transcriptions for a list of raw input species names with possible authorship indications.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="DISCREPANCY_ANALYSIS">DISCREPANCY_ANALYSIS</div><br />
|-<br />
|| Description<br />
||An evaluator algorithm that compares two tables containing real valued vectors. It drives the comparison by relying on a geographical distance threshold and a threshold for K-Statistic.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_SUITABLE">AQUAMAPS_SUITABLE</div><br />
|-<br />
|| Description<br />
||Algorithm for Suitable Range by Aquamaps on a single node.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_NATIVE_NEURALNETWORK">AQUAMAPS_NATIVE_NEURALNETWORK</div><br />
|-<br />
|| Description<br />
||Aquamaps Native Algorithm calculated by a Neural Network. A distribution algorithm that relies on Neural Networks and AquaMaps data for native distributions to generate a table containing species distribution probabilities on half-degree cells.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ECOPATH_WITH_ECOSIM">ECOPATH_WITH_ECOSIM</div><br />
|-<br />
|| Description<br />
||Ecopath with Ecosim (EwE) is a free ecological/ecosystem modeling software suite. This algorithm implementation expects a model and a configuration file as inputs; the result of the analysis is returned as a zip archive. References: Christensen, V., &amp; Walters, C. J. (2004). Ecopath with Ecosim: methods, capabilities and limitations. Ecological modelling, 172(2), 109-139.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_INTERSECTOR">OCCURRENCES_INTERSECTOR</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a table of species occurrence points that are contained in both the two starting tables where points equivalence is identified via user defined comparison thresholds. Works with up to 10000 points per table. Between two ocurrence sets, it keeps the elements of the Right Set that are similar to elements in the Left Set.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIONYM_LOCAL">BIONYM_LOCAL</div><br />
|-<br />
|| Description<br />
||A fast version of the algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows to activate several taxa names matching algorithms and to get the list of possible transcriptions for a list of input raw species names with possible authorship indication.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIME_SERIES_ANALYSIS">TIME_SERIES_ANALYSIS</div><br />
|-<br />
|| Description<br />
||An algorithms applying signal processing to a non uniform time series. A maximum of 10000 distinct points in time is allowed to be processed. The process uniformly samples the series, then extracts hidden periodicities and signal properties. The sampling period is the shortest time difference between two points. Finally, by using Caterpillar-SSA the algorithm forecasts the Time Series. The output shows the detected periodicity, the forecasted signal and the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCE_ENRICHMENT">OCCURRENCE_ENRICHMENT</div><br />
|-<br />
|| Description<br />
||An algorithm performing occurrences enrichment. Takes as input one table containing occurrence points for a set of species and a list of environmental layer, taken either from the e-infrastructure GeoNetwork (through the GeoExplorer application) or from direct HTTP links. Produces one table reporting the set of environmental values associated to the occurrence points.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FAO_OCEAN_AREA_COLUMN_CREATOR_FROM_QUADRANT">FAO_OCEAN_AREA_COLUMN_CREATOR_FROM_QUADRANT</div><br />
|-<br />
|| Description<br />
||An algorithm that adds a column containing the FAO Ocean Area codes associated to longitude, latitude and quadrant columns.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="POINTS_TO_MAP">POINTS_TO_MAP</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map of points from a set of points with x,y coordinates indications. A maximum of 259000 is allowed<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIONYM_BIODIV">BIONYM_BIODIV</div><br />
|-<br />
|| Description<br />
||An algorithm implementing BiOnym oriented to Biodiversity Taxa Names Matching with a predefined and optimized workflow. This version applies in sequence the following Matchers: GSay (thr:0.6, maxRes:10), FuzzyMatcher (thr:0.6, maxRes:10), Levenshtein (thr:0.4, maxRes:10), Trigram (thr:0.4, maxRes:10). BiOnym is a flexible workflow approach to taxon name matching. The workflow allows to activate several taxa names matching algorithms and to get the list of possible transcriptions for a list of input raw species names with possible authorship indication.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="XMEANS">XMEANS</div><br />
|-<br />
|| Description<br />
||A clustering algorithm for occurrence points that relies on the X-Means algorithm, i.e. an extended version of the K-Means algorithm improved by an Improve-Structure part. A Maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="QUALITY_ANALYSIS">QUALITY_ANALYSIS</div><br />
|-<br />
|| Description<br />
||An evaluator algorithm that assesses the effectiveness of a distribution model by computing the Receiver Operating Characteristics (ROC), the Area Under Curve (AUC) and the Accuracy of a model<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIME_SERIES_CHARTS">TIME_SERIES_CHARTS</div><br />
|-<br />
|| Description<br />
||An algorithm producing time series charts of attributes vs. quantities. Charts are displayed per quantity column and superposing quantities are summed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ABSENCE_CELLS_FROM_AQUAMAPS">ABSENCE_CELLS_FROM_AQUAMAPS</div><br />
|-<br />
|| Description<br />
||An algorithm producing cells and features (HCAF) for a species containing absense points taken by an Aquamaps Distribution<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_SUITABLE_NEURALNETWORK">AQUAMAPS_SUITABLE_NEURALNETWORK</div><br />
|-<br />
|| Description<br />
||Aquamaps Algorithm for Suitable Environment calculated by Neural Network. A distribution algorithm that relies on Neural Networks and AquaMaps data for suitable distributions to generate a table containing species distribution probabilities on half-degree cells.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_MAP_FROM_POINTS">SPECIES_MAP_FROM_POINTS</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map from a probability distribution made upf of x,y coordinates and a certain resolution. A maximum of 259000 is allowed<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MOST_OBSERVED_TAXA">MOST_OBSERVED_TAXA</div><br />
|-<br />
|| Description<br />
||An algorithm producing a bar chart for the most observed taxa in a certain years range (with respect to the OBIS database)<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTTABLES">LISTTABLES</div><br />
|-<br />
|| Description<br />
||Algorithm that allows to view the table names of a chosen database<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="CSQUARE_COLUMN_CREATOR">CSQUARE_COLUMN_CREATOR</div><br />
|-<br />
|| Description<br />
||An algorithm that adds a column containing the CSquare codes associated to longitude and latitude columns.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FAO_OCEAN_AREA_COLUMN_CREATOR">FAO_OCEAN_AREA_COLUMN_CREATOR</div><br />
|-<br />
|| Description<br />
||An algorithm that adds a column containing the FAO Ocean Area codes associated to longitude and latitude columns.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_ANN">FEED_FORWARD_ANN</div><br />
|-<br />
|| Description<br />
||A method to train a generic Feed Forward Artifical Neural Network in order to simulate a function from the features space (R^n) to R. Uses the Back-propagation method. Produces a trained neural network in the form of a compiled file which can be used in the FEED FORWARD NEURAL NETWORK DISTRIBUTION algorithm.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_MAP_FROM_CSQUARES">SPECIES_MAP_FROM_CSQUARES</div><br />
|-<br />
|| Description<br />
||A transducer algorithm to produce a GIS map from a probability distribution associated to a set of csquare codes. A maximum of 259000 is allowed<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="HRS">HRS</div><br />
|-<br />
|| Description<br />
||An evaluator algorithm that calculates the Habitat Representativeness Score, i.e. an indicator of the assessment of whether a specific survey coverage or another environmental features dataset, contains data that are representative of all available habitat variable combinations in an area.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ESRI_GRID_EXTRACTION">ESRI_GRID_EXTRACTION</div><br />
|-<br />
|| Description<br />
||An algorithm to extract values associated to an environmental feature repository (e.g. NETCDF, ASC, GeoTiff files etc. ). A grid of points at a certain resolution is specified by the user and values are associated to the points from the environmental repository. It accepts as one geospatial repository ID (via their UUIDs in the infrastructure spatial data repository - recoverable through the Geoexplorer portlet) or a direct link to a file and the specification about time and space. The algorithm produces one ESRI GRID ASCII file containing the values associated to the selected bounding box.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATION_MEOW_AREA_PER_YEAR">SPECIES_OBSERVATION_MEOW_AREA_PER_YEAR</div><br />
|-<br />
|| Description<br />
||Algorithm returning most observed species in a specific years range (data collected from OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATION_LME_AREA_PER_YEAR">SPECIES_OBSERVATION_LME_AREA_PER_YEAR</div><br />
|-<br />
|| Description<br />
||Algorithm returning most observed species in a specific years range (data collected from OBIS database).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ABSENCE_GENERATION_FROM_OBIS">ABSENCE_GENERATION_FROM_OBIS</div><br />
|-<br />
|| Description<br />
||An algorithm to estimate absence records from survey data in OBIS. Based on the work in Coro, G., Magliozzi, C., Berghe, E. V., Bailly, N., Ellenbroek, A., &amp; Pagano, P. (2016). Estimating absence locations of marine species from data of scientific surveys in OBIS. Ecological Modelling, 323, 61-76.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="GEO_CHART">GEO_CHART</div><br />
|-<br />
|| Description<br />
||An algorithm producing a charts that displays quantities as colors of countries. The color indicates the sum of the values recorded in a country.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="WEB_APP_PUBLISHER">WEB_APP_PUBLISHER</div><br />
|-<br />
|| Description<br />
||This algorithm publishes a zip file containing a Web site, based on html and javascript in the e-Infrastructure. It generates a public URL to the application that can be shared.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="KMEANS">KMEANS</div><br />
|-<br />
|| Description<br />
||A clustering algorithm for real valued vectors that relies on the k-means algorithm, i.e. a method aiming to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. A Maximum of 4000 points is allowed.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SGVM_INTERPOLATION">SGVM_INTERPOLATION</div><br />
|-<br />
|| Description<br />
||An interpolation method relying on the implementation by the Study Group on VMS (SGVMS). The method uses two interpolation approached to simulate vessels points at a certain temporal resolution. The input is a file in TACSAT format uploaded on the Statistical Manager. The output is another TACSAT file containing interpolated points.The underlying R code has been extracted from the SGVM VMSTools framework. This algorithm comes after a feasibility study (http://goo.gl/risQre) which clarifies the features an e-Infrastructure adds to the original scripts. Limitation: the input will be processed up to 10000 vessels trajectory points. Credits: Hintzen, N. T., Bastardie, F., Beare, D., Piet, G. J., Ulrich, C., Deporte, N., Egekvist, J., et al. 2012. VMStools: Open-source software for the processing, analysis and visualisation of fisheries logbook and VMS data. Fisheries Research, 115-116: 31-43. Hintzen, N. T., Piet, G. J., and Brunel, T. 2010. Improved estimation of trawling tracks using cubic Hermite spline interpolation of position registration data. Fisheries Research, 101: 108-115. VMStools, available as an add-on package for R. Documentation available at https://code.google.com/p/vmstools/. Build versions of VMStools for Window, Mac, Linux available at https://docs.google.com/. Authors: Niels T. Hintzen, Doug Beare<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIMEEXTRACTION">TIMEEXTRACTION</div><br />
|-<br />
|| Description<br />
||An algorithm to extract a time series of values associated to a geospatial features repository (e.g. NETCDF, ASC, GeoTiff files etc. ). The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and possibly the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="XYEXTRACTOR_TABLE">XYEXTRACTOR_TABLE</div><br />
|-<br />
|| Description<br />
||An algorithm to extract values associated to a table containing geospatial features (e.g. Vessel Routes, Species distribution maps etc. ). A grid of points at a certain resolution is specified by the user and values are associated to the points from the environmental repository. It accepts as one geospatial table and the specification about time and space. The algorithm produces one table containing the values associated to the selected bounding box.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_NEURAL_NETWORK_REGRESSOR">FEED_FORWARD_NEURAL_NETWORK_REGRESSOR</div><br />
|-<br />
|| Description<br />
||The algorithm simulates a real-valued vector function using a trained Feed Forward Artificial Neural Network and returns a table containing the function actual inputs and the predicted outputs<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="GRID_CWP_TO_COORDINATES">GRID_CWP_TO_COORDINATES</div><br />
|-<br />
|| Description<br />
||An algorithm that adds longitude, latitude and resolution columns analysing a column containing FAO Ocean Area codes (CWP format).<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ESTIMATE_MONTHLY_FISHING_EFFORT">ESTIMATE_MONTHLY_FISHING_EFFORT</div><br />
|-<br />
|| Description<br />
||An algorithm that estimates fishing exploitation at 0.5 degrees resolution from activity-classified vessels trajectories. Produces a table with csquare codes, latitudes, longitudes and resolution and associated overall fishing hours in the time frame of the vessels activity. Requires each activity point to be classified as Fishing or other. This algorithm is based on the paper 'Deriving Fishing Monthly Effort and Caught Species' (Coro et al. 2013, in proc. of OCEANS - Bergen, 2013 MTS/IEEE). Example of input table (NAFO anonymised data): http://goo.gl/3auJkM<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FEED_FORWARD_NEURAL_NETWORK_TRAINER">FEED_FORWARD_NEURAL_NETWORK_TRAINER</div><br />
|-<br />
|| Description<br />
||The algorithm trains a Feed Forward Artificial Neural Network using an online Back-Propagation procedure and returns the training error and a binary file containing the trained network<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SPATIAL_REALLOCATION_GENERIC">FIGIS_SPATIAL_REALLOCATION_GENERIC</div><br />
|-<br />
|| Description<br />
||The Spatial Reallocaton algorithm allows to estimate statistics for other areas from those where they were reported. The algorithm is based on spatial disaggregation technics and provides at now an area-weighted reallocation.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TAXONOMY_OBSERVATIONS_TREND_PER_YEAR">TAXONOMY_OBSERVATIONS_TREND_PER_YEAR</div><br />
|-<br />
|| Description<br />
||Algorithm returning most observations taxonomy trend in a specific years range (with respect to the OBIS database)<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="GENERIC_CHARTS">GENERIC_CHARTS</div><br />
|-<br />
|| Description<br />
||An algorithm producing generic charts of attributes vs. quantities. Charts are displayed per quantity column. Histograms, Scattering and Radar charts are produced for the top ten quantities. A gaussian distribution reports overall statistics for the quantities.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="BIOCLIMATE_HCAF">BIOCLIMATE_HCAF</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that generates an Half-degree Cells Authority File (HCAF) dataset for a certain time frame, with environmental parameters used by the AquaMaps approach. Evaluates the climatic changes impact on the variation of the ocean features contained in HCAF tables<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SHAPEFILE_PUBLISHER">SHAPEFILE_PUBLISHER</div><br />
|-<br />
|| Description<br />
||An algorithm to publish shapefiles under WMS and WFS standards in the e-Infrastructure. The produced WMS, WFS links are reported as output of this process. The map will be available in the VRE for consultation.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="CMSY_2">CMSY_2</div><br />
|-<br />
|| Description<br />
||The CMSY method for data-limited stock assessment. Described in Froese, R., Demirel, N., Coro, G., Kleisner, K. M., Winker, H. (2016). Estimating fisheries reference points from catch and resilience. Fish and Fisheries. Paper link: http://onlinelibrary.wiley.com/doi/10.1111/faf.12190/ Full Instructions and code: https://github.com/SISTA16/cmsy<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SDMX_DATA_CONVERTER">FIGIS_SDMX_DATA_CONVERTER</div><br />
|-<br />
|| Description<br />
||This tool allows to convert easily a SDMX dataset into CSV, by callingthe rsdmx package for R<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="HCAF_FILTER">HCAF_FILTER</div><br />
|-<br />
|| Description<br />
||An algorithm producing a HCAF table on a selected Bounding Box (default identifies Indonesia)<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="RASTER_DATA_PUBLISHER">RASTER_DATA_PUBLISHER</div><br />
|-<br />
|| Description<br />
||This algorithm publishes a raster file as a maps or datasets in the e-Infrastructure. NetCDF-CF files are encouraged, as WMS and WCS maps will be produced using this format. For other types of files (GeoTiffs, ASC etc.) only the raw datasets will be published. The resulting map or dataset will be accessible via the VRE GeoExplorer by the VRE participants.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_SUITABLE_2050">AQUAMAPS_SUITABLE_2050</div><br />
|-<br />
|| Description<br />
||Algorithm for Suitable Range in 2050 by Aquamaps on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="TIMEEXTRACTION_TABLE">TIMEEXTRACTION_TABLE</div><br />
|-<br />
|| Description<br />
||An algorithm to extract a time series of values associated to a table containing geospatial information. The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and possibly the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED_TABLE">FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED_TABLE</div><br />
|-<br />
|| Description<br />
||The Spatial Reallocaton algorithm allows to estimate statistics for other areas from those where they were reported. The algorithm is based on spatial disaggregation technics and provides at now an area-weighted reallocation. This simplified algorithm is specifically targeting users from the FAO Fisheries and Aquaculture department, aims to facilitate its execution by doing abstraction of the intersections to provide.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPS_NATIVE_2050">AQUAMAPS_NATIVE_2050</div><br />
|-<br />
|| Description<br />
||Algorithm for Native Range in 2050 by Aquamaps on a single node<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SPECIES_OBSERVATIONS_TREND_PER_YEAR">SPECIES_OBSERVATIONS_TREND_PER_YEAR</div><br />
|-<br />
|| Description<br />
||An algorithm producing the trend of the observations for a certain species in a certain years range.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FAOMSY">FAOMSY</div><br />
|-<br />
|| Description<br />
||An algorithm to be used by Fisheries managers for stock assessment. Estimates the Maximum Sustainable Yield (MSY) of a stock, based on a catch trend. The algorithm has been developed by the Resource Use and Conservation Division of the FAO Fisheries and Aquaculture Department (contact: Yimin Ye, yimin.ye@fao.org). It is applicable to a CSV file containing metadata and catch statistics for a set of marine species and produces MSY estimates for each species. The CSV must follow a FAO-defined format (e.g. http://goo.gl/g6YtVx). The output is made up of two (optional) files: one for sucessfully processed species and another one for species that could not be processed because data were not sufficient to estimate MSY.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="PRESENCE_CELLS_GENERATION">PRESENCE_CELLS_GENERATION</div><br />
|-<br />
|| Description<br />
||An algorithm producing cells and features (HCAF) for a species containing presence points<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ZEXTRACTION">ZEXTRACTION</div><br />
|-<br />
|| Description<br />
||An algorithm to extract the Z values from a geospatial features repository (e.g. NETCDF, ASC, GeoTiff files etc. ). The algorithm analyses the repository and automatically extracts the Z values according to the resolution wanted by the user. It produces one chart of the Z values and one table containing the values.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MOST_OBSERVED_SPECIES">MOST_OBSERVED_SPECIES</div><br />
|-<br />
|| Description<br />
||An algorithm producing a bar chart for the most observed species in a certain years range (with respect to the OBIS database)<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SEADATANET_INTERPOLATOR">SEADATANET_INTERPOLATOR</div><br />
|-<br />
|| Description<br />
||A connector for the SeaDataNet infrastructure. This algorithms invokes the Data-Interpolating Variational Analysis (DIVA) SeaDataNet service to interpolate spatial data. The model uses GEBCO bathymetry data and requires an estimate of the maximum spatial span of the correlation between points and the signal-to-noise ratio, among the other parameters. It can interpolate up to 10,000 points randomly taken from the input table. As output, it produces a NetCDF file with a uniform grid of values. This powerful interpolation model is described in Troupin et al. 2012, 'Generation of analysis and consistent error fields using the Data Interpolating Variational Analysis (Diva)', Ocean Modelling, 52-53, 90-101.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTDBINFO">LISTDBINFO</div><br />
|-<br />
|| Description<br />
||Algorithm that allows to view information about one chosen resource of Database Type in the Infrastructure<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_MARINE_TERRESTRIAL">OCCURRENCES_MARINE_TERRESTRIAL</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a table containing occurrence points by filtering them by type of area, i.e. by recognising whether they are marine or terrestrial. Works with up to 10000 points per table.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_DUPLICATES_DELETER">OCCURRENCES_DUPLICATES_DELETER</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a duplicate-free table of species occurrence points, where duplicates are identified via user-defined comparison thresholds. It works with up to 100,000 points.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED">FIGIS_SPATIAL_REALLOCATION_SIMPLIFIED</div><br />
|-<br />
|| Description<br />
||The Spatial Reallocation algorithm allows users to estimate statistics for areas other than those where they were reported. The algorithm is based on spatial disaggregation techniques and currently provides an area-weighted reallocation. This simplified version specifically targets users from the FAO Fisheries and Aquaculture Department and facilitates execution by abstracting away the intersections to be provided.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="LISTDBSCHEMA">LISTDBSCHEMA</div><br />
|-<br />
|| Description<br />
||An algorithm that allows users to view the schema names of a chosen database whose type is Postgres<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_SUBTRACTION">OCCURRENCES_SUBTRACTION</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a table resulting from the difference between two occurrence point tables, where point equivalence is identified via user-defined comparison thresholds. It works with up to 10,000 points per table. Between two Occurrence Sets, it keeps the elements of the Left Set that are not similar to any element in the Right Set.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="OCCURRENCES_MERGER">OCCURRENCES_MERGER</div><br />
|-<br />
|| Description<br />
||A transducer algorithm that produces a duplicate-free table resulting from the union of two occurrence point tables, where point equivalence is identified via user-defined comparison thresholds. It works with up to 10,000 points per table. Between two Occurrence Sets, it enriches the Left Set with the elements of the Right Set that are not in the Left Set, and updates the elements of the Left Set with more recent elements from the Right Set. If one element in the Left Set corresponds to several more recent elements in the Right Set, these will all replace the element of the Left Set. A simplified sketch of this threshold-based matching is given after the table.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ZEXTRACTION_TABLE">ZEXTRACTION_TABLE</div><br />
|-<br />
|| Description<br />
||An algorithm to extract a time series of values associated with a table containing geospatial information. The algorithm analyses the time series and automatically searches for hidden periodicities. It produces one chart of the time series, one table containing the time series values and, possibly, the spectrogram.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="ESTIMATE_FISHING_ACTIVITY">ESTIMATE_FISHING_ACTIVITY</div><br />
|-<br />
|| Description<br />
||An algorithm that estimates activity hours (fishing or other) from vessel trajectories, adds bathymetry information to the table and classifies (point by point) the fishing activity of the involved vessels according to two algorithms: one based on speed (activity_class_speed output column) and the other based on speed and bathymetry (activity_class_speed_bath output column). The algorithm produces new columns containing this information; a simplified sketch of the speed-based classification is given after the table. This algorithm is based on the paper 'Deriving Fishing Monthly Effort and Caught Species' (Coro et al. 2013, in proc. of OCEANS - Bergen, 2013 MTS/IEEE). Example of input table (NAFO anonymised data): http://goo.gl/3auJkM<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="AQUAMAPSNN">AQUAMAPSNN</div><br />
|-<br />
|| Description<br />
||The AquaMaps model trained using a Feed-Forward Neural Network. This is a method to train a generic Feed-Forward Artificial Neural Network to be used by the AquaMaps Neural Network algorithm. It produces a trained neural network in the form of a compiled file that can be used later.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MAX_ENT_NICHE_MODELLING">MAX_ENT_NICHE_MODELLING</div><br />
|-<br />
|| Description<br />
||A Maximum-Entropy model for species habitat modelling, based on the implementation by Schapire et al. v 3.3.3k, Princeton University, http://www.cs.princeton.edu/schapire/maxent/. In this adaptation for the D4Science infrastructure, the software accepts a table produced by the Species Product Discovery service and a set of environmental layers in various formats (NetCDF, WFS, WCS, ASC, GeoTiff) via direct links or GeoExplorer UUIDs. The user can also establish the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one. The output contains: a thumbnail map of the projected model, the ROC curve, the Omission/Commission chart, a table containing the raw assigned values, a threshold to transform the table into a 0-1 probability distribution, a report of the importance of the used layers in the model, and ASCII representations of the input layers to check their alignment. Other processes can later be applied to the raw values to produce a GIS map (e.g. the Statistical Manager Points-to-Map process) and results can be shared. Demo video: http://goo.gl/TYYnTO and instructions: http://wiki.i-marine.eu/index.php/MaxEnt<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="MAPS_COMPARISON">MAPS_COMPARISON</div><br />
|-<br />
|| Description<br />
||An algorithm for comparing two OGC/NetCDF maps in a way that is seamless to the user. The algorithm assesses the similarities between two geospatial maps by comparing them in a point-to-point fashion. It accepts as input the two geospatial maps (via their UUIDs in the infrastructure spatial data repository, recoverable through the GeoExplorer portlet) and some parameters affecting the comparison, such as the z-index, the time index and the comparison threshold. Note: in the case of WFS layers, it makes comparisons on the last feature column.<br />
|-<br />
<br />
! colspan=2 bgcolor=lightgrey | <div id="SUPPORT_VECTOR_MACHINE_TRAINER">SUPPORT_VECTOR_MACHINE_TRAINER</div><br />
|-<br />
|| Description<br />
||A simple algorithm to train a Support Vector Machine<br />
|-<br />
<br />
<br />
|}
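The occurrence-point transducers above (duplicates deleter, subtraction and merger) share one matching idea: two records are considered equivalent when their attributes differ by less than user-defined comparison thresholds. The fragment below is a minimal Python sketch of that logic; the record layout, the plain coordinate-difference test and the default threshold are illustrative assumptions, not the service implementation.<br />
<source lang="python">
# Illustrative sketch of threshold-based occurrence matching.
# The fields compared and the thresholds are assumptions: the real
# transducers work on infrastructure tables and more attributes.

def equivalent(p, q, coord_threshold=0.01):
    """Two points match when the species name is equal (case-insensitive)
    and both coordinates differ by less than the threshold (decimal deg.)."""
    return (p["species"].lower() == q["species"].lower()
            and abs(p["lat"] - q["lat"]) < coord_threshold
            and abs(p["lon"] - q["lon"]) < coord_threshold)

def deduplicate(points):
    """Keep the first point of every group of equivalent points
    (the idea behind OCCURRENCES_DUPLICATES_DELETER)."""
    kept = []
    for p in points:
        if not any(equivalent(p, q) for q in kept):
            kept.append(p)
    return kept

def subtract(left, right):
    """Keep left-set points not similar to any right-set point
    (the idea behind OCCURRENCES_SUBTRACTION)."""
    return [p for p in left if not any(equivalent(p, q) for q in right)]

if __name__ == "__main__":
    pts = [{"species": "Thunnus alalunga", "lat": 42.100, "lon": -8.500},
           {"species": "Thunnus alalunga", "lat": 42.105, "lon": -8.502}]
    print(len(deduplicate(pts)))        # 1: the two records are equivalent
    print(len(subtract(pts, pts[:1])))  # 0: both match the first record
</source>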
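Along the same lines, the point-by-point, speed-based classification mentioned in the ESTIMATE_FISHING_ACTIVITY entry can be sketched as follows; the speed bounds and class labels are illustrative assumptions, not the thresholds of the published algorithm.<br />
<source lang="python">
# Illustrative sketch of a speed-based activity classifier for vessel
# trajectory points. Thresholds and labels are assumptions, not the
# values used by ESTIMATE_FISHING_ACTIVITY.

def classify_by_speed(speed_knots, fishing_min=2.0, fishing_max=4.5):
    """Label a trajectory point from its speed: very low speeds suggest
    little activity, a mid range suggests fishing, higher speeds steaming."""
    if speed_knots < fishing_min:
        return "low_activity"
    if speed_knots <= fishing_max:
        return "fishing"
    return "steaming"

def add_activity_column(points):
    """Mimic the addition of a per-point classification column."""
    for p in points:
        p["activity_class_speed"] = classify_by_speed(p["speed"])
    return points

if __name__ == "__main__":
    track = [{"speed": 0.5}, {"speed": 3.2}, {"speed": 10.0}]
    print([p["activity_class_speed"] for p in add_activity_column(track)])
    # ['low_activity', 'fishing', 'steaming']
</source>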
</div>Gianpaolo.corohttps://wiki.gcube-system.org/index.php?title=Pre_Installed_Packages&diff=29728Pre Installed Packages2017-10-26T08:54:41Z<p>Gianpaolo.coro: </p>
<hr />
<div><!-- CATEGORIES --><br />
[[Category:Developer's Guide]]<br />
<!-- END CATEGORIES --><br />
=Preamble=<br />
This Wiki page reports the packages pre-installed on the computational machines for a variety of languages.<br />
<br />
==R Packages==<br />
A constantly updated list of installed R 3.4.0 packages is available at the following [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_cran_pkgs.txt LINK].<br />
<br />
GitHub packages are shown here: [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_github_pkgs.txt LINK].<br />
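As an example of how these lists can be used, the following minimal Python sketch checks whether a given CRAN package appears in the published list; it assumes the file contains one package name per line, which is an assumption about the file layout rather than a documented format.<br />
<source lang="python">
# Illustrative sketch: look a package name up in the published list of
# pre-installed CRAN packages (assumes one package name per line).
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

PKG_LIST_URL = ("https://svn.research-infrastructures.eu/public/d4science/"
                "gcube/trunk/data-analysis/RConfiguration/"
                "RPackagesManagement/r_cran_pkgs.txt")

def is_preinstalled(package_name):
    """Return True when package_name appears in the published list."""
    listing = urlopen(PKG_LIST_URL).read().decode("utf-8", "replace")
    names = set(line.strip() for line in listing.splitlines())
    return package_name in names

if __name__ == "__main__":
    print(is_preinstalled("data.table"))
</source>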
<br />
==Linux Debian Packages==<br />
A constantly updated list of installed Debian 3.2.51-1 x86_64 packages is available at the following [https://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/RConfiguration/RPackagesManagement/r_deb_pkgs.txt LINK].<br />
<br />
==Octave Packages==<br />
Octave 4.0.2 is installed on the computational machines with basic packages.<br />
<br />
==Java==<br />
Java 8 is installed on the computational machines, with the dependencies of the Ecological Engine framework, retrievable through Maven [http://maven.research-infrastructures.eu/nexus/index.html#nexus-search;gav~org.gcube.dataanalysis~ecological-engine-geospatial-extensions HERE] (refer to the latest version).<br />
<br />
==Python Packages==<br />
Python 2.7.6 is installed on the computational machines with basic packages.<br />
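Scripts meant to run on these machines should therefore target the Python 2.7 series. A minimal, purely illustrative sketch of a defensive interpreter check:<br />
<source lang="python">
# Illustrative sketch: fail fast when the interpreter is not the
# Python 2.7 series provided on the computational machines.
import sys

REQUIRED = (2, 7)

def check_interpreter():
    if sys.version_info[:2] != REQUIRED:
        raise RuntimeError("expected Python %d.%d, found %s"
                           % (REQUIRED + (sys.version.split()[0],)))

if __name__ == "__main__":
    check_interpreter()
    print("interpreter OK")
</source>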
<br />
==Windows Packages==<br />
Windows .NET compiled programs are supported through the [http://www.mono-project.com/ Mono] 3.2.8 runtime (see the sketch below).
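A compiled .NET executable is typically launched by passing it to the mono command. The following minimal Python sketch shows such an invocation; program.exe and its argument are hypothetical placeholders.<br />
<source lang="python">
# Illustrative sketch: run a compiled .NET program under Mono.
# "program.exe" and "--help" are hypothetical placeholders.
import subprocess

def run_under_mono(exe_path, *args):
    """Invoke a .NET executable through the mono runtime and
    return its standard output."""
    return subprocess.check_output(["mono", exe_path] + list(args))

if __name__ == "__main__":
    print(run_under_mono("program.exe", "--help"))
</source>
</div>Gianpaolo.coro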