Statistical Algorithms Importer: R Project FAQ

From Gcube Wiki

This is the FAQ of the Statistical Algorithms Importer (SAI). It lists common mistakes we have found in R Projects.

Best Practices to debug the code

To debug code that is not working as expected, the following approach can be useful:

Avoid switching the working directory in the code, because this makes the code prone to errors, especially for the services that need to work on the output of the process. To understand what is happening in the process, do the following:

1 - add cat() instructions throughout the code, e.g. to log the full path of the produced file

2 - log a check of existence for the file in the initial working directory. For example, add lines like these at the end of the code:

# check that the output exists before exiting
fexists <- file.exists(output_file)
cat("file exists?", fexists, "\n")
if (!fexists) {
  # stop() raises an error, so the failure shows up in the logs
  stop("Error, the output does not exist!")
}

3 - add an erroneous command at the end of the code to force the generation of an error at the point you want to investigate

4 - repackage the code and then download and read the logs after the execution of the algorithm

As a general rule, you should generate an error if the file was not produced by the algorithm due to some error in the execution.
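
The logging and the existence check can be gathered into a small helper. The following is a minimal sketch, not part of SAI: the function name check_output is hypothetical, and it only relies on base R functions (getwd, normalizePath, file.exists, stop).

```r
# Hypothetical helper: log where the script is running and whether the
# expected output file exists, then fail loudly if it is missing.
check_output <- function(output_file) {
  cat("working directory:", getwd(), "\n")
  # mustWork = FALSE avoids an error at this point when the file is missing
  resolved <- normalizePath(output_file, mustWork = FALSE)
  cat("expected output:", resolved, "\n")
  fexists <- file.exists(output_file)
  cat("file exists?", fexists, "\n")
  if (!fexists) {
    stop("the output file was not produced: ", resolved)
  }
  invisible(resolved)
}
```

Calling check_output(output_file) as the last instruction of the script would log the resolved path on success and raise an error otherwise.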

Options and Warning Messages

Do not change the default options. In particular, do not use the following command, which sets the handling of warning messages:

options(warn=2) # Warnings will be considered as errors

The previous command could produce errors on DataMiner.
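
If a particular call must fail when it raises a warning, one possible alternative that leaves the global options untouched is to handle the warning locally with tryCatch. This is a sketch of a general R technique; this page does not say how it behaves on DataMiner, and the helper name strict is hypothetical.

```r
# Hypothetical helper: evaluate an expression and turn any warning it
# raises into an error, without calling options(warn = 2).
strict <- function(expr) {
  tryCatch(expr, warning = function(w) {
    stop("warning treated as error: ", conditionMessage(w))
  })
}

# as.numeric("abc") normally returns NA with a warning; wrapped in
# strict() it raises an error instead, which is caught here for display
value <- tryCatch(strict(as.numeric("abc")),
                  error = function(e) conditionMessage(e))
cat(value, "\n")
```

Only the wrapped expression is affected, so the rest of the script keeps the default warning behaviour.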

In some cases, an algorithm works in RStudio but does not work via SAI

This kind of issue is usually related to the production of the output files:

  • The file was produced in a subfolder, but it was declared to be in the root folder. E.g. the file was produced in the ./data folder by the process, but in SAI the variable referring to the output was declared with the bare file name,
thus with no ./data indicated in the file name.
  • A forced switch of the working folder was done inside the code (e.g. with setwd()), which misled the service about the location of the produced file.
Switching the working folder inside the script should generally be avoided.
  • A process tried to overwrite another file that had already been produced on the processing machine, but which was corrupted due to an update of the machine. This conflicted with the newly generated files.
Generally, files with new names should be generated by a script that is being transformed into a web service. Generating output files with new names prevents errors due to several concurrent requests creating the same files, when the requests are managed by the same machine.
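For example, instead of declaring a fixed name for the output file, a timestamp should be added to the name of the generated file, and spaces and colons replaced so that the name is safe: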
output_file<-gsub(" ", "_", output_file)
output_file<-gsub(":", "_", output_file)
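
A complete version of this renaming might look as follows. This is a sketch: the base name "output", the .csv extension and the use of Sys.time() as the timestamp source are assumptions, since the original declaration is not shown on this page.

```r
# Base name is a hypothetical example
base_name <- "output"
# format(Sys.time()) gives a string such as "2024-01-31 10:15:42",
# which contains a space and colons
output_file <- paste(base_name, "_", format(Sys.time()), ".csv", sep = "")
# Replace the characters that are awkward in file names
output_file <- gsub(" ", "_", output_file)
output_file <- gsub(":", "_", output_file)
cat("output file:", output_file, "\n")
```

Two requests served within the same second would still get the same name, so a finer-grained or random suffix may be needed under heavy concurrency.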

An algorithm does not receive input from the interface

DataMiner searches the code for the declared default value and then substitutes it with the value the user provides through the interface.

This means that the default value in the code should correspond to the one declared in the annotations (and thus displayed in the Input/Output window) and vice versa. For example, if starting_point_latitude has -7.931 as declared default value, then DataMiner searches in the code for one of the following lines:
starting_point_latitude <- "-7.931"
starting_point_latitude = "-7.931"
whereas, in the case of numeric variables, it searches for
starting_point_latitude <- -7.931
starting_point_latitude = -7.931
Thus, if the initialization in the code is
starting_point_latitude <- "-13.548"
then DataMiner cannot find the default value to change. In other words, the default values in the code should correspond to the ones declared.
We use this approach because SAI could in theory also be applied to programming languages other than R; thus we do not rely on the R interpreter behind the scenes but on string substitution using regular expressions.
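
The effect of a mismatch can be illustrated with a small simulation. This is not DataMiner's implementation: the function substitute_default is hypothetical, and for simplicity it compares whole lines exactly instead of using regular expressions.

```r
# Hypothetical illustration: replace the declared default of a variable
# in the script text with the value provided by the user.
# code: script lines; name: variable; default: declared default (as text);
# value: user-provided value (as text)
substitute_default <- function(code, name, default, value) {
  for (op in c("<-", "=")) {
    for (mark in c("\"", "")) {   # quoted (string) and unquoted (numeric) forms
      old <- paste0(name, " ", op, " ", mark, default, mark)
      new <- paste0(name, " ", op, " ", mark, value, mark)
      code[code == old] <- new
    }
  }
  code
}

# The default in the code matches the declared one: the line is rewritten
ok <- substitute_default('starting_point_latitude <- "-7.931"',
                         "starting_point_latitude", "-7.931", "10.5")
# The default in the code differs from the declared one: nothing changes
stale <- substitute_default('starting_point_latitude <- "-13.548"',
                            "starting_point_latitude", "-7.931", "10.5")
cat(ok, "\n")
cat(stale, "\n")
```

In the second case the script would run with -13.548 regardless of what the user entered, which is the symptom described in this section.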

Managing Enumerated Types - Creating drop-down menus

To create drop-down menus containing enumerated choices from SAI, follow these steps:

[Screenshot: Enumerated, SAI]
  1. Declare a variable with a default value, e.g. enumerated<-"a"
  2. Indicate this variable as an input of Enumerated type and add the other possible values, separated by the | symbol: a|b|c
  3. The first choice should be the default value indicated in the code

Managing Boolean values

A Boolean variable can be managed by SAI, but this requires a trick to make R properly communicate with Java and vice versa, because R has many ways to declare Boolean variables. The screenshot shows how to declare and use a Boolean variable when integrating an algorithm:

[Screenshot: Boolean, SAI]
If removeZero is the Boolean variable, then a few lines in the R code (shown in the screenshot) help Java modify the code.
With these lines, the default value of the variable will be false.
Further, the Boolean variable should be declared as a Boolean input with default value false (or true), written in lower case.
This declaration will generate a Boolean choice on the user interface.
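
The lines that implement the trick are not reproduced on this page. One way it could look is sketched below, under the assumption that it consists of defining lower-case aliases for R's TRUE and FALSE, so that the lower-case value written by Java is valid R. The data vector is hypothetical.

```r
# Assumption: lower-case aliases, so that the literal substituted by Java
# (false or true) evaluates to a logical value in R
false <- FALSE
true <- TRUE

# Default value, matching the Boolean input declared with default false
removeZero <- false

# Example use of the flag on hypothetical data
values <- c(0, 2, 0, 5)
if (removeZero) {
  values <- values[values != 0]
}
cat("values:", values, "\n")
```

If the user selects true on the interface, the declaration would become removeZero <- true and the zeros would be dropped.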

Saving plots

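To save a plot as a PNG image, for example:

# Graph Name (example value, an assumption)
graphName <- "Cars"
# Define the cars vector with 5 values
cars <- c(1, 3, 6, 4, 9)
# Output File (example value, an assumption)
plot_file <- "cars.png"
# PNG Image: open the PNG device
png(filename=plot_file, width=1400, height=800)
# Graph cars using blue points overlaid by a line
plot(cars, type="o", col="blue")
# Create a title with a red, bold/italic font
title(main=graphName, col.main="red", font.main=4)
# Close the device so that the file is written
dev.off()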

In some cases, invoke dev.copy before the plot instructions and end with dev.off(), for example:

plot(c(0, 1), c(0, 1), ann = F, bty = 'n', type = 'n', xaxt = 'n', yaxt = 'n')