Statistical Algorithms Importer: Create Project
:This page explains how to create a project using the [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]] portlet.
  
  
== SAI Project Type ==

[[Image:StatisticalAlgorithmsImporter_CreateNewProject.png|thumb|center|800px|Create Project, SAI]]

:The first step is to select the project type. Using the Create Project button in the main menu, SAI allows you to create the following project types:
  
 
* [[Statistical Algorithms Importer: R Project|R Project]]
* [[Statistical Algorithms Importer: R-blackbox Project|R-blackbox Project]]
* [[Statistical Algorithms Importer: Java Project|Java Project]]
* [[Statistical Algorithms Importer: Knime-Workflow Project|Knime-Workflow Project]]
* [[Statistical Algorithms Importer: Linux-compiled Project|Linux-compiled Project]]
* [[Statistical Algorithms Importer: Octave Project|Octave Project]]
* [[Statistical Algorithms Importer: Python Project|Python Project]]
* [[Statistical Algorithms Importer: Windows Project|Windows Project]]
* [[Statistical Algorithms Importer: Pre-Installed Project|Pre-Installed Project]]
 
==Project Folder==

:The next step is to create or select an empty folder on the e-Infrastructure Workspace. Then, using the Create Project button in the main menu, the system creates an empty project in that folder.

[[Image:StatisticalAlgorithmsImporter_CreateProject.png|thumb|center|800px|Create Project, SAI]]

==Import Resources==

:Any resource needed to run the script can be imported into the Project Folder. Resources can be added via the Workspace, using the Add Resource button in the main menu, or by dragging and dropping files into the folder window.

[[Image:StatisticalAlgorithmsImporter_AddResource.png|thumb|center|800px|Add Resource, SAI]]

:In particular, if the resources are on the user's local file system, the Drag and Drop facility can be used, which also works with multiple files.

[[Image:StatisticalAlgorithmsImporter_ProjectExplorerDND.png|thumb|center|800px|Adding resources with Drag and Drop, SAI]]

==Import Resources From GitHub==

:If you have a project on GitHub, you can import it into SAI. After creating a new project, just click the GitHub button in the main menu.

[[Image:StatisticalAlgorithmsImporter_GitHubMenu.png|thumb|center|800px|GitHub on Menu, SAI]]

:This opens the GitHub Connector wizard. Please read here to see how to use it: [[GitHub Connector|GitHub Connector]]

==Set Main Code==

:After adding the scripts and resources, one of the script files should be indicated as the Main code. The e-Infrastructure will run this code, which is supposed to import and orchestrate the other scripts. A script can be indicated as Main code by clicking the Set Main button in the Project Explorer; the file will then be loaded in the Editor. In this phase the system also reads possible annotations inside the script (e.g. WPS4R annotations). At this point, the user can change the code and save it using the Save button on the Editor panel. Alternatively, the user can write the code directly in the Editor (e.g. via Copy and Paste) and then save it, still using the Save button in the Editor menu (a file name will be requested).
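
:As a minimal sketch of what a Main code file may look like (the file and function names below are hypothetical, not part of SAI), the Main code simply imports and orchestrates the other scripts in the Project Folder:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;">
# main.R - hypothetical Main code file run by the e-Infrastructure
source("helpers.R")                # import a secondary script from the Project Folder
result<-run_analysis("input.csv")  # call a function defined in helpers.R
cat("Analysis finished\n")
</pre>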

[[Image:StatisticalAlgorithmsImporter_MainCodeFull.png|thumb|center|800px|Set Main Code facility, SAI]]

==Input==

:In this area the system collects all the information required to create the software for the e-Infrastructure and to communicate with the e-Infrastructure team. Metadata, input/output information, global parameters and required packages are collected here.

===Global Variables===

:In this panel you can add any Global Variables that are used by the script as pre-requisites.
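
:For instance, fixed constants of the script can be exposed here; the following is only a sketch, based on the constants of the example script reported at the end of this page:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;">
# constants used by the script as pre-requisites of the computation
extent_x=180   # longitude extent of the analysis
extent_y=90    # latitude extent of the analysis
</pre>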

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_GlobalV.png|thumb|center|800px|Global Variables indication, SAI]]

===Input/Output===

:In this area, the inputs and outputs selected from the script are collected. To add a new I/O, the user should select a row in the code (in the Editor) and then click the +Input (or +Output) button in the Editor menu.

:A new row is added to the Input/Output list. The system parses the code behind the scenes and guesses the best type, description and name of the parameter. Once a row has been created in the Input/Output window, the user can change this information by clicking on the row. At least one input is required for compiling the project. '''The name of the input variable and the default value should not be changed unless a parsing error occurred'''. The reason is that the infrastructure will discover the variables inside the script by using the name and the default value.

'''Note: as a general rule, always set a default value for a variable, otherwise the execution of the algorithm may be compromised. Thus, do not use empty strings as default values.'''
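
:For example, in the script reported at the end of this page, the following line can be selected in the Editor and added via +Input; the system takes res as the parameter name and 1 as its default value, so both must be left unchanged:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;">
# selected in the Editor and added via +Input;
# the name "res" and the non-empty default value 1 are what the infrastructure parses
res=1
</pre>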

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_InputOutput.png|thumb|center|800px|Input/Output window, SAI]]

===Advanced Input===

:It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these inputs are reported in the [[Advanced Input|Advanced Input]] page.

===Interpreter Info===

:You can add Version and Packages information in the Interpreter Info panel. The version number is mandatory for the project. Here, for example, a user should specify the version of the R interpreter and the packages needed to run the script. These will be installed on the e-Infrastructure machines during the first deployment session.
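
:For example, the script reported at the end of this page loads the following packages; each of them should be declared in the Interpreter Info panel, together with the R version, so that they can be installed on the e-Infrastructure machines:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;">
# every library() call in the script corresponds to a package to declare in Interpreter Info
library(DBI)
library(RPostgreSQL)
library(raster)
library(maptools)
library("sqldf")
library(RJSONIO)
library(httr)
library(data.table)
</pre>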

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_InterpreterInfo.png|thumb|center|800px|Interpreter Info, SAI]]

===Project Info===

:A name and a description of the project are mandatory. These will be displayed to the users of the e-Infrastructure and should also contain a proper citation of the algorithm. Special characters are not allowed in the algorithm name. The user can also indicate the category of the algorithm.

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_ProjectInfo.png|thumb|center|800px|Project Info, SAI]]

==Save Project==

:You can save the project by clicking the Save button in the main menu. A file called stat_algo.project is added to the Project Folder.

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_SaveProject.png|thumb|center|800px|Save Project, SAI]]

==Using WPS4R Annotations==

:SAI automatically parses R code containing [https://wiki.52north.org/bin/view/Geostatistics/WPS4R WPS4R annotations] and transforms them into Input/Output panel and Project Info panel information. The name of the algorithm is mandatory in the annotations. We report a full example of an annotated algorithm below and attach the complete algorithm in a zip package:
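
:The annotations are plain R comments with a fixed structure: wps.des declares the algorithm (id, title, abstract), wps.in declares an input and wps.out declares an output. The following minimal sketch is extracted from the full example below:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;">
# wps.des: id = Absence_generation_from_OBIS, title = Absence_generation_from_OBIS, abstract = A script to estimate absence records from OBIS;
# wps.in: id = res, type = double, title = resolution of the analysis,value=1;
# wps.out: id = zipOutput, type = text/zip, title = zip file containing absence records and images;
</pre>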

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 0;">
############################################################################################################################
############# Absence Generation Script - Gianpaolo Coro and Chiara Magliozzi, CNR 2015, Last version 06-07-2015 ###########
############################################################################################################################
#Modified 25-05-2017

#52North WPS annotations
# wps.des: id = Absence_generation_from_OBIS, title = Absence_generation_from_OBIS, abstract = A script to estimate absence records from OBIS;

####REST API VERSION#####
rm(list=ls(all=TRUE))
graphics.off()

## loading the libraries
library(DBI)
library(RPostgreSQL)
library(raster)
library(maptools)
library("sqldf")
library(RJSONIO)
library(httr)
library(data.table)

# time
t0<-Sys.time()

## parameters
# wps.in: id = list, type = text/plain, title = list of species beginning with the speciesname header,value="species.txt";
list= "species.txt"
specieslist<-read.table(list,header=T,sep=",") # my short dataset 2 species
#attach(specieslist)
# wps.in: id = res, type = double, title = resolution of the analysis,value=1;
res=1;
extent_x=180
extent_y=90
n=extent_y*2/res;
m=extent_x*2/res;
# wps.in: id = occ_percentage, type = double, title = percentage of observations occurrence of a viable survey,value=0.1;
occ_percentage=0.05 #between 0 and 1

#uncomment for time filtering

#No time filter
TimeStart<-"";
TimeEnd<-"";

TimeStart<-gsub("(^ +)|( +$)", "",TimeStart)
TimeEnd<-gsub("(^ +)|( +$)", "", TimeEnd)

#AUX function
pos_id<-function(latitude,longitude){
  #latitude<-round(latitude, digits = 3)
  #longitude<-round(longitude, digits = 3)
  latitude<-latitude
  longitude<-longitude
  code<-paste(latitude,";",longitude,sep="")
  return(code)
}

## opening the connection with postgres
cat("REST API VERSION\n")
cat("PROCESS VERSION 6 \n")
cat("Opening the connection with the catalog\n")
#drv <- dbDriver("PostgreSQL")
#con <- dbConnect(drv, dbname="obis", host="obisdb-stage.vliz.be", port="5432", user="obisreader", password="0815r3@d3r")

cat("Analyzing the list of species\n")
counter=0;
overall=length(specieslist$scientificname)

cat("Extraction from the different contributors the total number of obs per resource id...\n")

timefilter<-""
if (nchar(TimeStart)>0 && nchar(TimeEnd)>0)
  timefilter<-paste(" where datecollected>'",TimeStart,"' and datecollected<'",TimeEnd,"'",sep="");

queryCache <- paste("select drs.resource_id, count(distinct position_id) as allcount from obis.drs", timefilter, " group by drs.resource_id",sep="")
cat("Resources extraction query:",queryCache,"\n")

allresfile="allresources.dat"
if (file.exists(allresfile)){
  load(allresfile)
} else{
  #allresources1<-dbGetQuery(con,queryCache)
  ######QUERY 0 - REST CALL
  cat("Q0:querying for resources\n")

  getJsonQ0<-function(limit,offset){
    cat("Q0: offset",offset,"limit",limit,"\n")
    resources_query<-paste("http://api.iobis.org/resource?limit=",limit,"&offset=",offset,sep="")

    json_file <- fromJSON(resources_query)

    #res_count<-json_file$count
    res_count<-length(json_file$results)
    res_count_json<<-json_file$count
    cat("Q0:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")

    allresources1 <- data.frame(resource_id=integer(),allcount=integer())

    for (i in 1:res_count){
      #cat(i,"\n")
      if (is.null(json_file$results[[i]]$record_cnt))
        json_file$results[[i]]$record_cnt=0
      row<-data.frame(resource_id = json_file$results[[i]]$id, allcount = json_file$results[[i]]$record_cnt)
      allresources1 <- rbind(allresources1, row)
    }
    rm(json_file)
    return(allresources1)
  }
  objects = 1000
  allresources1<-getJsonQ0(objects,0)
  ceil<-ceiling(res_count_json/objects)
  if (ceil>1){
    for (i in 2:ceil){
      cat(">call n.",i,"\n")
      allresources1.1<-getJsonQ0(objects,objects*(i-1))
      allresources1<-rbind(allresources1,allresources1.1)
    }
  }
  ######END REST CALL
  save(allresources1,file=allresfile)
}

cat("All resources saved\n")

files<-vector()
f<-0
if (!file.exists("./data"))
  dir.create("./data")

cat("About to analyse species\n")

for (sp in specieslist$scientificname){
  f<-f+1
  t1<-Sys.time()
  graphics.off()
  grid=matrix(data=0,nrow=n,ncol=m)
  gridInfo=matrix(data="",nrow=n,ncol=m)
  outputfileAbs=paste("data/Absences_",sp,"_",res,"deg.csv",sep="");
  outputimage=paste("data/Absences_",sp,"_",res,"deg.png",sep="");

  counter=counter+1;
  cat("analyzing species",sp,"\n")
  cat("***Species status",counter,"of",overall,"\n")

  ## first query: select the species
  cat("Extracting the species id from the OBIS database...\n")
  query1<-paste("select id from obis.tnames where tname='",sp,"'", sep="")
  #obis_id<- dbGetQuery(con,query1)

  ######QUERY 1 - REST CALL
  cat("Q1:querying for the species",sp," \n")
  query1<-paste("http://api.iobis.org/taxa?scientificname=",URLencode(sp),sep="")
  cat("Q1:query: ",query1," \n")
  result_from_httr1<-GET(query1, timeout(1*3600))
  json_obis_taxa_id <- fromJSON(content(result_from_httr1, as="text"))

  #json_obis_taxa_id <- fromJSON(query1)
  cat("Q1:query done\n")
  res_count_json<-json_obis_taxa_id$count
  res_count<-length(json_obis_taxa_id$results)
  cat("Q1:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")
  obis_id<-json_obis_taxa_id$results[[1]]$id
  obis_id<-data.frame(id=obis_id)
  ######END REST CALL

  cat("The ID extracted is ", obis_id$id, "for the species", sp, "\n", sep=" ")
  if (nrow(obis_id)==0) {
    cat("WARNING: there is no reference code for", sp,"\n")
    next;
  }

  ## second query: select the contributors
  cat("Selection of the contributors in the database having recorded the species...\n")
  query2<- paste("select distinct resource_id from obis.drs where valid_id='",obis_id$id,"'", sep="")
  #posresource<-dbGetQuery(con,query2)

  ######QUERY 2 - REST CALL
  cat("Q2:querying for obisid ",obis_id$id," \n")

    downlq<-paste("http://api.iobis.org/occurrence/download?obisid=",obis_id$id,"&sync=true",sep="")
    cat("Q2:query",downlq," \n")

    filezip<-paste("sp_",obis_id$id,".zip",sep="")
    dirzip<-paste("./sp_",obis_id$id,sep="")
    download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)
    cat("Q2:dirzip",dirzip," \n")

    if (!file.exists(dirzip))
      dir.create(dirzip)
    cat("Q2:unzipping",dirzip," \n")
    unzip(filezip,exdir=dirzip)

    csvfile<-dir(dirzip)
    csvfile<-paste(dirzip,"/",csvfile[1],sep="")
    cat("Q2:reading csv file",csvfile," \n")
    occurrences<-read.csv(csvfile)
    posresource<-sqldf("select resource_id from occurrences",drv="SQLite")
    tgtresources1<-sqldf("select resource_id, latitude || ';' || longitude as tgtcount from occurrences",drv="SQLite")
    posresource<-sqldf("select distinct * from posresource",drv="SQLite")
    rm(occurrences)
  ######END REST CALL

  if (nrow(posresource)==0) {
    cat("WARNING: there are no resources for", sp,"\n")
    next;
  }

  ## third query: select from the contributors different observations
  merge(allresources1, posresource, by="resource_id")-> res_ids

  ## fourth query: how many obs are contained in each contributor for the species
  cat("Extraction from the different contributors the number of obs for the species...\n")
  query4 <- paste("select drs.resource_id, count(distinct position_id) as tgtcount from obis.drs where valid_id='",obis_id$id,"'group by drs.resource_id ",sep="")
  #tgtresources1<-dbGetQuery(con,query4)

  ######QUERY 4 - REST CALL
  cat("Q4:extracting obs from contributors ",obis_id$id," \n")
  getJsonQ4<-function(limit, offset){
    cat("Q4: offset",offset,"limit",limit,"\n")
    query4<-paste("http://api.iobis.org/occurrence?obisid=",obis_id$id,"&limit=",limit,"&offset=",offset,sep="")
    result_from_httr<-GET(query4, timeout(1*3600))
    jsonDoc <- fromJSON(content(result_from_httr, as="text"))
    res_count_json<<-jsonDoc$count
    res_count<-length(jsonDoc$results)
    cat("Q4:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")

    tgtresources1 <- data.frame(resource_id=integer(),tgtcount=character())
    res_count<-length(jsonDoc$results)
    for (i in 1:res_count){
      positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)
      row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID , tgtcount=positionID)
      tgtresources1 <- rbind(tgtresources1, row)
    }
    #tgtresources1<-sqldf("select resource_id, count(distinct tgtcount) as tgtcount from tgtresources1 group by resource_id",drv="SQLite")

    return(tgtresources1)
  }

  #objects = 1500
  #tgtresources1<-getJsonQ4(objects,0)
  #ceil<-ceiling(res_count_json/objects)
  #if (ceil>1){
    #for (i in 2:ceil){
    # cat(">call n.",i,"\n")
      #tgtresources1.1<-getJsonQ4(objects,objects*(i-1))
      #tgtresources1<-rbind(tgtresources1,tgtresources1.1)
    #}
  #}

  tgtresources1<-sqldf("select resource_id, count(distinct tgtcount) as tgtcount from tgtresources1 group by resource_id",drv="SQLite")

  ######END REST CALL

  merge(tgtresources1, posresource, by="resource_id")-> tgtresourcesSpecies

  ## fifth query: select contributors that have at least a fraction occ_percentage of observations of the species
  #### we have the table all together: contributors, obs in each contributor for at least one species and obs of the species in each contributor
  cat("Extracting the contributors containing more than 10% of observations for the species\n")
  cat("Selected occurrence percentage: ",occ_percentage,"\n")

  tmp <- merge(res_ids, tgtresourcesSpecies, by= "resource_id",all.x=T)
  tmp["species_10"] <- NA
  as.numeric(tmp$tgtcount) / tmp$allcount -> tmp$species_10

  viable_res_ids <- subset(tmp,species_10 >= occ_percentage, select=c("resource_id","allcount","tgtcount", "species_10"))
  #cat(viable_res_ids)

  if (nrow(viable_res_ids)==0) {
    cat("WARNING: there are no viable points for", sp,"\n")
    next;
  }

  numericselres<-paste("'",paste(as.character(as.numeric(t(viable_res_ids["resource_id"]))),collapse="','"),"'",sep="")
  selresnumbers<-as.numeric(t(viable_res_ids["resource_id"]))

  ## sixth query: select all the cells at 0.1 degrees resolution in the main contributors
  cat("Select the cells at 0.1 degrees resolution for the main contributors\n")
  query6 <- paste("select position_id, positions.latitude, positions.longitude, count(*) as allcount ",
                  "from obis.drs ",
                  "inner join obis.tnames on drs.valid_id=tnames.id ",
                  "inner join obis.positions on position_id=positions.id ",
                  "where resource_id in (", numericselres,") ",
                  "group by position_id, positions.latitude, positions.longitude, resource_id")
  #all_cells <- dbGetQuery(con,query6)

  ######QUERY 6 - REST CALL
  cat("Q6:extracting 0.1 cells from contributors \n")

    downlq<-paste("http://api.iobis.org/occurrence/download?resourceid=",gsub("'", "", numericselres),"&sync=true",sep="")
    cat("Q6:query",downlq," \n")
    filezip<-paste("rsp_",obis_id$id,".zip",sep="")
    dirzip<-paste("./rsp_",obis_id$id,sep="")
    download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)
    cat("Q6:dirzip",dirzip," \n")

    if (!file.exists(dirzip))
      dir.create(dirzip)
    cat("Q6:unzipping",dirzip," \n")
    unzip(filezip,exdir=dirzip)

    csvfile<-dir(dirzip)
    csvfile<-paste(dirzip,"/",csvfile[1],sep="")
    cat("Q6:reading csv file",csvfile," \n")
    occurrences<-read.csv(csvfile)

    all_cells_table<-sqldf("select resource_id, latitude || ';' || longitude as position, latitude ,longitude from occurrences",drv="SQLite")
    rm(occurrences)
  getJsonQ6<-function(limit,offset,selres){
    cat("Q6: offset",offset,"limit",limit,"\n")
    cat("Q6: resource",selres,"\n")
    #query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&limit=",limit,"&offset=",offset,sep="")
    if (offset>0)
      query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", selres),"&limit=",limit,"&skipid=",offset,sep="")
    else
      query6<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", selres),"&limit=",limit,sep="")

    cat("Q6:",query6," \n")

    jsonDoc = tryCatch({
      result_from_httr<-GET(query6, timeout(1*3600))
      cat("Q6: got answer\n")
      jsonDoc <- fromJSON(content(result_from_httr, as="text"))
    }, warning = function(w) {
      cat("Warning: ",w,"\n")
    }, error = function(e) {
      cat("Error: Too small value for resolution for this species - the solution space is too large!\n")
    }, finally = {
      jsonDoc=NA
    })

    res_count_json<<-jsonDoc$count
    res_count<-length(jsonDoc$results)
    cat("Q6:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")

    all_cells2 <- data.frame(resource_id=integer(),position_id=character(),latitude=integer(),longitude=integer())
    for (i in 1:res_count){
      positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)
      row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID, position_id = positionID, latitude=jsonDoc$results[[i]]$decimalLatitude, longitude=jsonDoc$results[[i]]$decimalLongitude)
      all_cells2 <- rbind(all_cells2, row)
    }
    lastid<<-jsonDoc$results[[res_count]]$id
    return(all_cells2)
  }

  cat("All resources:",numericselres,"\n")

  all_cells<-sqldf("select position as position_id, latitude, longitude, count(*) as allcount from all_cells_table group by position, latitude, longitude, resource_id",drv="SQLite")

  ######END REST CALL

  ## seventh query: select all the cells at 0.1 degrees resolution in the main contributors for the selected species
  cat("Select the cells at 0.1 degrees resolution for the species in the main contributors\n")
  query7 <- paste("select position_id, positions.latitude, positions.longitude, count(*) as tgtcount ",
                  "from obis.drs",
                  "inner join obis.tnames on drs.valid_id=tnames.id ",
                  "inner join obis.positions on position_id=positions.id ",
                  "where resource_id in (", numericselres,") ",
                  "and drs.valid_id='",obis_id$id,"'",
                  "group by position_id, positions.latitude, positions.longitude")
  #presence_cells<-dbGetQuery(con,query7)

  ######QUERY 7 - REST CALL
  cat("Q7:extracting 0.1 cells for the species ",obis_id$id,"\n")

    downlq<-paste("http://api.iobis.org/occurrence/download?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&sync=true",sep="")
    cat("Q7:query",downlq," \n")
    filezip<-paste("rspsp_",obis_id$id,".zip",sep="")
    dirzip<-paste("./rspsp_",obis_id$id,sep="")
    download.file(downlq, filezip, method="wget", quiet = F, mode = "w",cacheOK = FALSE)
    cat("Q7:dirzip",dirzip," \n")

    if (!file.exists(dirzip))
      dir.create(dirzip)
    cat("Q7:unzipping",dirzip," \n")
    unzip(filezip,exdir=dirzip)

    csvfile<-dir(dirzip)
    csvfile<-paste(dirzip,"/",csvfile[1],sep="")
    cat("Q7:reading csv file",csvfile," \n")
    occurrences<-read.csv(csvfile)

    presence_cells2<-sqldf("select resource_id, latitude ,longitude, latitude || ';' || longitude as position from occurrences",drv="SQLite")
    rm(occurrences)
  getJsonQ7<-function(limit,offset){
    cat("Q7: offset",offset,"limit",limit,"\n")
    if (offset>0)
      query7<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&limit=",limit,"&skipid=",offset,sep="")
    else
      query7<-paste("http://api.iobis.org/occurrence?resourceid=",gsub("'", "", numericselres),"&obisid=",obis_id$id,"&limit=",limit,sep="")

    result_from_httr<-GET(query7, timeout(1*3600))
    jsonDoc <- fromJSON(content(result_from_httr, as="text"))
    res_count_json<<-jsonDoc$count
    res_count<-length(jsonDoc$results)
    cat("Q7:json count vs count",res_count_json,"vs",res_count,"\n",sep=" ")

    presence_cells2 <- data.frame(resource_id=integer(),position_id=character(),latitude=integer(),longitude=integer())
    for (i in 1:res_count){
      positionID<-pos_id(jsonDoc$results[[i]]$decimalLatitude,jsonDoc$results[[i]]$decimalLongitude)
      row<-data.frame(resource_id = jsonDoc$results[[i]]$resourceID, position_id = positionID, latitude=jsonDoc$results[[i]]$decimalLatitude, longitude=jsonDoc$results[[i]]$decimalLongitude)
      presence_cells2 <- rbind(presence_cells2, row)
    }

    lastid<<-jsonDoc$results[[res_count]]$id

    return(presence_cells2)
  }

  presence_cells<-sqldf("select position as position_id, latitude, longitude, count(*) as tgtcount from presence_cells2 group by position_id, latitude, longitude, resource_id",drv="SQLite")

  ######END REST CALL

  ## last query: for every cell in the sixth query, if there is a correspondent in the seventh query put 1, otherwise 0
  #data.df<-merge(all_cells, presence_cells, by= "position_id",all.x=T)
  #data.df$longitude.y<-NULL
  #data.df$latitude.y<-NULL
  #data.df[is.na(data.df)] <- 0

  ######### Table resulting from the analysis
  #pres_abs_cells <- subset(data.df,select=c("latitude.x","longitude.x", "tgtcount","position_id"))
  #positions<-paste("'",paste(as.character(as.numeric(t(pres_abs_cells["position_id"]))),collapse="','"),"'",sep="")
  positions<-""
  query8<-paste("select position_id, resfullname,digirname,abstract,temporalscope,date_last_harvested",
                "from ((select distinct position_id,resource_id from obis.drs where position_id IN (", positions,
                ") order by position_id ) as a",
                "inner join (select id,resfullname,digirname,abstract,temporalscope,date_last_harvested from obis.resources where id in (",
                numericselres,")) as b on b.id = a.resource_id) as d")

  #resnames<-dbGetQuery(con,query8)

  ######QUERY 8 - REST CALL
  cat("Q8:extracting contributors details\n")
  data.df2<-merge(all_cells, presence_cells, by= "position_id",all.x=T)
  data.df2$longitude.y<-NULL
  data.df2$latitude.y<-NULL
  data.df2[is.na(data.df2)] <- 0
  rm (all_cells)
  pres_abs_cells2 <- subset(data.df2,select=c("latitude.x","longitude.x", "tgtcount","position_id"))
  positions2<-paste("'",paste(as.character(as.character(t(pres_abs_cells2["position_id"]))),collapse="','"),"'",sep="")

  refofpositions<-sqldf(paste("select distinct resource_id from all_cells_table where position in (",positions2,")"),drv="SQLite")
  referencesn<-nrow(refofpositions)
  resnames_res2 <- data.frame(resource_id=integer(),resfullname=character(),digirname=character(),abstract=character(),temporalscope=character(),date_last_harvested=character())
  for (i in 1: referencesn){
    query8<-paste("http://api.iobis.org/resource/",refofpositions[i,1],sep="")
    result_from_httr<-GET(query8, timeout(1*3600))
    jsonDoc <- fromJSON(content(result_from_httr, as="text"))

    daterecord<-as.POSIXct(jsonDoc$date_last_harvested/1000, origin="1970-01-01")
    if (length(daterecord)==0)
      daterecord=""
    abstractst<-jsonDoc$abstract_str

    if (length(jsonDoc$abstract_str)==0)
      jsonDoc$abstract_str=""

    if (length(jsonDoc$id)==0)
      jsonDoc$id=""

    if (length(jsonDoc$fullname)==0)
      jsonDoc$fullname=""

    if (length(jsonDoc$temporalscope)==0)
      jsonDoc$temporalscope=""

    row<-data.frame(resource_id = jsonDoc$id, resfullname=jsonDoc$fullname, digirname=jsonDoc$digirname, abstract=jsonDoc$abstract_str,temporalscope=jsonDoc$temporalscope,date_last_harvested=daterecord)

    resnames_res2 <- rbind(resnames_res2, row)
  }

  resnames2<-sqldf(paste("select distinct position as position_id, resfullname, digirname, abstract, temporalscope, date_last_harvested from (select * from all_cells_table where position in (",positions2,")) as a inner join resnames_res2 as b on a.resource_id=b.resource_id"),drv="SQLite")
  resnames<-sqldf("select * from resnames2 order by position_id",drv="SQLite")
  pres_abs_cells<-sqldf("select * from pres_abs_cells2 order by position_id",drv="SQLite")
  rm(all_cells_table)
  ######END REST CALL

  #sorting data df
  #  pres_abs_cells<-pres_abs_cells[with(pres_abs_cells, order(position_id)), ]
  nrows = nrow(pres_abs_cells)
  ######## FIRST Loop inside the rows of the dataset
  cat("Looping on the data\n")
  for(i in 1: nrows) {
    lat<-pres_abs_cells[i,1]
    long<-pres_abs_cells[i,2]
    value<-pres_abs_cells[i,3]
    resource_name<-paste("\"",paste(as.character(t(resnames[i,])),collapse="\",\""),"\"",sep="")#resnames[i,2]
    k=round((lat+90)*n/180)
    g=round((long+180)*m/360)
    if (k==0) k=1;
    if (g==0) g=1;
    if (k>n || g>m)
      next;
    if (value>=1){
      if (grid[k,g]==0){
        grid[k,g]=1
        gridInfo[k,g]=resource_name
      }
      else if (grid[k,g]==-1){
        grid[k,g]=-2
        gridInfo[k,g]=resource_name
      }
    }
    else if (value==0){
      if (grid[k,g]==0){
        grid[k,g]=-1
        #cat("resource abs",resource_name,"\n")
        gridInfo[k,g]=resource_name
      }
      else if (grid[k,g]==1){
        grid[k,g]=-2
        gridInfo[k,g]=resource_name
      }
    }
  }
  cat("End looping\n")

  cat("Generating image\n")
  absence_cells<-which(grid==-1,arr.ind=TRUE)
  presence_cells_idx<-which(grid==1,arr.ind=TRUE)
  latAbs<-((absence_cells[,1]*180)/n)-90
  longAbs<-((absence_cells[,2]*360)/m)-180
  latPres<-((presence_cells_idx[,1]*180)/n)-90
  longPres<-((presence_cells_idx[,2]*360)/m)-180
  resource_abs<-gridInfo[absence_cells]
  rm(gridInfo)
  rm(grid)
  absPoints <- cbind(longAbs, latAbs)
  absPointsData <- cbind(longAbs, latAbs,resource_abs)

  if (length(absPoints)==0)
  {
    cat("WARNING no viable point found for ",sp," after processing!\n")
    next;
  }
  data(wrld_simpl)
  projection(wrld_simpl) <- CRS("+proj=longlat")
  png(filename=outputimage, width=1200, height=600)
  plot(wrld_simpl, xlim=c(-180, 180), ylim=c(-90, 90), axes=TRUE, col="black")
  box()
  pts <- SpatialPoints(absPoints,proj4string=CRS(proj4string(wrld_simpl)))

  ## Find which points do not fall over land
  cat("Retrieving the points that do not fall on land\n")
  pts<-pts[which(is.na(over(pts, wrld_simpl)$FIPS))]
  points(pts, col="green", pch=1, cex=0.50)
  datapts<-as.data.frame(pts)
  colnames(datapts) <- c("longAbs","latAbs")

  abspointstable<-merge(datapts, absPointsData, by.x= c("longAbs","latAbs"), by.y=c("longAbs","latAbs"),all.x=F)

  header<-"longitude,latitude,resource_id,resource_name,resource_identifier,resource_abstract,resource_temporalscope,resource_last_harvested_date"
  write.table(header,file=outputfileAbs,append=F,row.names=F,quote=F,col.names=F)

  write.table(abspointstable,file=outputfileAbs,append=T,row.names=F,quote=F,col.names=F,sep=",")
  files[f]<-outputfileAbs
  cat("Elapsed:  created image in ",Sys.time()-t1," sec \n")
  graphics.off()
}

# wps.out: id = zipOutput, type = text/zip, title = zip file containing absence records and images;
zipOutput<-"absences.zip"
zip(zipOutput, files=c("./data"), flags= "-r9X", extras = "",zip = Sys.getenv("R_ZIPCMD", "zip"))

cat("Closing database connection")
cat("Elapsed:  overall process finished in ",Sys.time()-t0," min \n")
#dbDisconnect(con)
graphics.off()
</pre>

[[File:AbsencesSpeciesList_prod_annotated.zip|AbsencesSpeciesList_prod_annotated.zip]]

:The following screenshots report the result of importing this script into SAI:

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_Annotations_Info.png|thumb|center|800px|Annotations Project Info, SAI]]

[[Image:StatisticalAlgorithmsImporter_AbsenceSpecies_Annotations_InputOutput.png|thumb|center|800px|Annotations Input/Output, SAI]]

:Please, read our best practices: [[Statistical Algorithms Importer: FAQ|F.A.Q.]]

== Installed Software ==

:A list of pre-installed software on the infrastructure machines is available at this page:

* [[Pre Installed Packages|Pre Installed Packages]]