Statistical Algorithms Importer: FAQ
Frequently asked questions about the Statistical Algorithms Importer (SAI). The sections below collect the common mistakes we have found.
In some cases, an algorithm worked in RStudio but did not work via SAI
This kind of issue is usually related to the production of the output files:
- The file was produced in a subfolder, but it was declared to be in the root folder. E.g. the file output.zip was produced in the ./data folder by the process, but in SAI the variable referring to the output was declared as
output<-"output.zip"
- Thus, the ./data folder was missing from the declared file name. The declaration should have been output<-"./data/output.zip"
- A forced switch of the working folder was done inside the code, which misled the service about the location of the produced file. E.g.:
output<-"output.zip"
setwd("./data")
save(output)
- Switching the working folder inside the script should generally be avoided.
- A process tried to overwrite another file that had already been produced on the processing machine, but which was corrupted due to an update of the machine. This conflicted with the newly generated files.
- Generally, files with new names should be generated by a script that is being transformed into a web service. Generating output files with new names prevents errors due to several concurrent requests creating the same files, when the requests are managed by the same machine.
- For example, instead of declaring
zip_namefile <- "data_frame_result_query.zip"
- A timestamp should be added to the name of the generated file:
zip_namefile_random <- paste("data_frame_result_query_",Sys.time(),".zip",sep="")
zip_namefile <- zip_namefile_random
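The two recommendations above (declare the output with its subfolder, and give each output a new name) can be combined in a short sketch. This is only an illustration: the names out_dir, stamp and suffix are hypothetical, and the random suffix is an extra assumption for the case of two requests arriving within the same second. The timestamp is formatted explicitly because the default text form of Sys.time() contains a space and colons, which are awkward in file names.

```r
# Hypothetical names: out_dir, stamp and suffix are illustrative only.
out_dir <- "data"  # subfolder where the process writes its output

# Timestamp without spaces or colons, e.g. 20240131_153000
stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")

# Random suffix (assumption): separates requests arriving in the same second
suffix <- paste(sample(c(letters, 0:9), 6, replace = TRUE), collapse = "")

# The declared output includes the subfolder, so no setwd() is needed
zip_namefile <- file.path(
  out_dir,
  paste0("data_frame_result_query_", stamp, "_", suffix, ".zip")
)
cat("Output will be written to:", zip_namefile, "\n")
```

The script would then write the archive to zip_namefile, with the working directory left unchanged.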
An algorithm does not receive input from the interface
DataMiner searches the code for the declared default value and then replaces it with the value the user provides through the interface.
- This means that the default value in the code should correspond to the one declared in the annotations (and thus displayed in the Input/Output window) and vice versa. For example, if starting_point_latitude has -7.931 as declared default value, then DataMiner searches in the code for one of the following lines:
starting_point_latitude <- "-7.931"
starting_point_latitude = "-7.931"
- whereas, in the case of numeric variables, it searches for
starting_point_latitude <- -7.931
starting_point_latitude = -7.931
- Thus, if the initialization in the code is
starting_point_latitude <- "-13.548"
- then DataMiner cannot find the default value to change. In other words, the default values in the code should correspond to the ones declared.
- We use this approach because SAI could, in principle, also be applied to programming languages other than R. Thus we do not rely on the R interpreter behind the scenes, but on string substitution using regular expressions.
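The matching rule described above can be mimicked with a small sketch. This is not DataMiner's actual implementation: substitute_default is a hypothetical helper, and it compares whole lines after removing whitespace instead of using the service's own regular expressions. It only illustrates why a default in the code that differs from the declared one is never replaced.

```r
# Hypothetical helper, for illustration only (not DataMiner's code).
# Replaces the assignment of `name` when it carries the declared default.
substitute_default <- function(code_lines, name, default, new_value) {
  # Drop whitespace so that "x <- 1" and "x<-1" compare equal
  squeeze <- function(x) gsub("[[:space:]]+", "", x)
  candidates <- squeeze(c(
    paste0(name, "<-\"", default, "\""),  # quoted, arrow assignment
    paste0(name, "=\"", default, "\""),   # quoted, equals assignment
    paste0(name, "<-", default),          # numeric, arrow assignment
    paste0(name, "=", default)            # numeric, equals assignment
  ))
  hit <- which(squeeze(code_lines) %in% candidates)
  if (length(hit) == 0) {
    return(code_lines)  # declared default not found: nothing is changed
  }
  first <- hit[1]
  # Keep the quoting style of the matched line
  quoted <- grepl("\"", code_lines[first], fixed = TRUE)
  value <- if (quoted) paste0("\"", new_value, "\"") else new_value
  code_lines[first] <- paste0(name, " <- ", value)
  code_lines
}

matching <- c('starting_point_latitude <- "-7.931"', "print(starting_point_latitude)")
mismatch <- c('starting_point_latitude <- "-13.548"', "print(starting_point_latitude)")

replaced  <- substitute_default(matching, "starting_point_latitude", "-7.931", "42.5")
unchanged <- substitute_default(mismatch, "starting_point_latitude", "-7.931", "42.5")
print(replaced[1])   # starting_point_latitude <- "42.5"
print(unchanged[1])  # starting_point_latitude <- "-13.548"
```

In the second call the user's value never reaches the script, which is the symptom described in this section.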
To create drop-down menus containing enumerated choices from SAI, follow the process shown in the screenshot:
- Declare a variable with a default value, e.g. enumerated<-"a"
- Indicate this variable as an input of Enumerated type and add the other possible values, separated by the | symbol: a|b|c
- The first choice should be the default value indicated in the code
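As an optional safeguard, the script itself can check that the value it receives is one of the declared choices. The following is a minimal sketch, assuming the a|b|c example above; the allowed vector is a hypothetical name that simply repeats the choices declared in SAI.

```r
# Default value, matching the first choice declared in SAI (a|b|c)
enumerated <- "a"

# Hypothetical guard: repeats the declared choices inside the script
allowed <- c("a", "b", "c")
if (!(enumerated %in% allowed)) {
  stop("Unexpected value for 'enumerated': ", enumerated)
}
cat("Selected choice:", enumerated, "\n")
```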
Managing Boolean values
A Boolean variable can be managed by SAI, but this requires a trick to make R properly communicate with Java and vice-versa. In fact, R has many ways to declare boolean variables. The screenshot shows how to use and declare a Boolean variable when integrating an algorithm, i.e.:
- If removeZero is the Boolean variable, then these lines help Java modify the R code:
false<-F
true<-T
removeZero<-false
- Thus the default value of the variable will be false.
- Further, the Boolean variable should be declared as a Boolean input with default value false (or true), written in lower case.
- This declaration will generate a Boolean choice on the user interface.
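The following sketch shows the pattern in a complete script. The values vector and the filtering step are hypothetical; only the three Boolean lines come from the procedure above. Since lowercase true and false are not R keywords, the two aliases presumably exist so that the line still evaluates as valid R after the service rewrites it to removeZero<-true.

```r
# Aliases so that lowercase true/false are valid R names
false <- F
true <- T

# Default declared in SAI as a Boolean input with default value false
removeZero <- false

# Hypothetical use of the flag inside the algorithm
values <- c(0, 3, 0, 5)
if (removeZero) {
  values <- values[values != 0]
}
print(values)
```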
Best Practices to debug the code
To debug code that is not working as expected, the following approach can be useful:
Avoid switching the working directory in the code, because this makes the code prone to errors, especially for the services that need to work on the output of the process. To understand what is happening in the process, do the following:
1 - add cat() instructions throughout the code, e.g. log the full path of the produced file
2 - log a check of the existence of the file in the initial working directory
3 - add an erroneous command at the end of the code to force the generation of an error at the point you want to investigate
4 - repackage the code, then download and read the logs after the execution of the algorithm
As a general rule, you should generate an error if the file was not produced by the algorithm due to some error in the execution.
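The logging steps and the general rule can be sketched as one small helper. check_output is a hypothetical name, not part of SAI, and the demo file in the temporary folder only stands in for the real output of an algorithm.

```r
# Hypothetical helper: logs where the output is expected and
# raises an error if the file was not produced.
check_output <- function(path) {
  cat("Working directory:", getwd(), "\n")
  cat("Declared output file:", normalizePath(path, mustWork = FALSE), "\n")
  cat("Output exists:", file.exists(path), "\n")
  if (!file.exists(path)) {
    stop("Output file was not produced: ", path)
  }
  invisible(path)
}

# Stand-in for the real output of the algorithm
demo_file <- file.path(tempdir(), "output_demo.txt")
writeLines("demo content", demo_file)

# Logs the three lines above and returns silently because the file exists
check_output(demo_file)
```

Calling check_output as the last instruction of the script makes a missing file show up in the logs as an explicit error instead of a silent failure.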