Install and Configure WPS-Hadoop

From Gcube Wiki
Revision as of 16:28, 26 May 2014 by Fabrice.brito (Talk | contribs) (Create the directory structure)



Build

The project is available here: https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/wps-hadoop/

We assume the whole project is checked out into ~/wps-hadoop-source.

Tomcat Embedded Packaging

[{{host}} ~/wps-hadoop-source]$ mvn clean package -P release,linux-x86_64

After the build, the target/ directory contains wps-hadoop-{{version}}-tomcat-embedded.tar.gz, a compressed archive of the WPS-Hadoop server bundled with a pre-configured Tomcat.

Processes Packaging

This is useful for updating the processes of an already installed WPS-Hadoop Tomcat-embedded application.

[{{host}} ~/wps-hadoop-source]$ mvn clean package -P wps,linux-x86_64

After the build, the target/ directory contains wps-hadoop-{{version}}.jar, which replaces the jar of the same name in the lib directory of the WPS-Hadoop Tomcat-embedded application.


Install

To install the Tomcat Embedded package, extract the .tar.gz archive and start Tomcat.

  [{{host}} ~]$ tar xzf wps-hadoop-0.1-SNAPSHOT-tomcat-embedded.tar.gz 
 
  [{{host}} ~]$ wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/startup.sh 
  Using LD_LIBRARY_PATH: /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/lib/natives
  Using CATALINA_BASE:   /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded
  Using CATALINA_HOME:   /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded
  Using CATALINA_TMPDIR: /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/temp
  Using JRE_HOME:        /usr
  Using CLASSPATH:       /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/bootstrap.jar:/home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/tomcat-juli.jar


Getting started

In this section we present a full tutorial to integrate and configure the indicator_i1 process.

Create the application

We assume that the R script implementing the indicator_i1 algorithm has already been created, tested and configured (including the installation of the required R libraries). Let’s create the application folder inside hadoopApplications.

Note: in this tutorial we use an application work directory called hadoopApplications. A utility folder like this is useful to maintain, develop and test your applications (in R, bash or other languages); when an application is ready, you can package it as a jar and copy it into the Hadoop HDFS folder.


Create the directory structure

More info about legacy applications structure here.

cd
mkdir -p hadoopApplications/ird/indicator_i1      # we choose to put it inside an ird folder group
cd hadoopApplications/ird/indicator_i1
mkdir -p application/indicator_i1/bin/ application/indicator_i1/lib

so we have:

tree
.
|-- application
    `-- indicator_i1
        |-- bin
        |-- lib
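
The same layout can be created by a small helper, which may be handy when several applications follow this convention. This is only a sketch: the function name make_app_skeleton and its arguments are illustrative and not part of WPS-Hadoop.

```bash
#!/bin/bash
# Hypothetical helper (not part of WPS-Hadoop): create the application
# skeleton <base>/<group>/<name>/application/<name>/{bin,lib}.
make_app_skeleton() {
    local base="$1" group="$2" name="$3"
    local root="$base/$group/$name"
    mkdir -p "$root/application/$name/bin" "$root/application/$name/lib" || return 1
    # Print the application root so that callers can cd into it.
    echo "$root"
}

# Example:
#   cd "$(make_app_skeleton "$HOME/hadoopApplications" ird indicator_i1)"
```

The helper prints the application root, so the result can be passed straight to cd as in the commented example.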

Copy and integrate the .R script

Note: the header #!/usr/bin/Rscript lets the script run as an executable, so the run bash script can call it directly.

Note: the pre-processing statements set the parameters, some from the command-line arguments (args) and others from constants (for example year_attribute_name).

Note: the post-processing statements copy the result files to the current directory (Sys.getenv("PWD")).


#!/usr/bin/Rscript --vanilla --slave
# Francesco Cerasuolo - Terradue
# pre-processing 
 
args <- commandArgs(TRUE)
wfsUrl <- args[1]
typeName <- args[2]
species <- args[3]
 
connection_type <- "remote"
data_type <- "WFS"
url <- wfsUrl
layer <- typeName
ogc_filter <- paste('<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" xmlns:gml="http://www.opengis.net/gml">
	<ogc:PropertyIsEqualTo><ogc:PropertyName>species</ogc:PropertyName><ogc:Literal>',
	species,
	'</ogc:Literal></ogc:PropertyIsEqualTo></ogc:Filter>',
	sep="")
 
year_attribute_name <- "year"
ocean_attribute_name <- "ocean"
species_attribute_name <- "species"
value_attribute_name <- "value"
 
 
#Norbert Billet - IRD
#2014/01/27: Norbert - Multi sources edit
#2013/08/30: Norbert - Initial edit
#Atlas_i1_SpeciesByOcean : build a graph of catches by ocean and by year
 
#52North WPS annotations
# wps.des: id = Atlas_i1_SpeciesByOcean, title = IRD tuna atlas indicator i1, abstract = Graph of species catches by ocean;
 
# wps.in: id = data_type, type = string, title = Data type (csv or WFS or MDSTServer), value = "WFS";
# wps.in: id = url, type = string, title = Data URL, value = "http://mdst-macroes.ird.fr:8080/constellation/WS/wfs/tuna_atlas";
# wps.in: id = layer, type = string, title = Data layer name, minOccurs = 0, maxOccurs = 1, value = "ns11:i1i2_mv";
# wps.in: id = mdst_query, type = string, title = MDSTServer query. Only used with MDSTServer data type, minOccurs = 0, maxOccurs = 1;
# wps.in: id = ogc_filter, type = string, title = OGC filter to apply on a WFS datasource. Only used with WFS data type, minOccurs = 0, maxOccurs = 1;
# wps.in: id = year_attribute_name, type = string, title = Year attribute name in the input dataset, value = "year";
# wps.in: id = ocean_attribute_name, type = string, title = Ocean attribute name in the input dataset, value = "ocean";
# wps.in: id = species_attribute_name, type = string, title = Species attribute name in the input dataset, value = "species";
# wps.in: id = value_attribute_name, type = string, title = Value attribute name in the input dataset, value = "value";
# wps.in: id = connection_type, type = string, title = Data connection type (local or remote), value = "remote";
 
# wps.out: id = result, type = string, title = List of result files path;
 
 
if(! require(IRDTunaAtlas)) {
  stop("Missing IRDTunaAtlas library")
}
 
df <- readData(connectionType=connection_type,
        dataType=data_type,
        url=url,
        layer=layer,
        MDSTQuery=mdst_query,
        ogcFilter=ogc_filter)
 
result <- Atlas_i1_SpeciesByOcean(df=df,
                        yearAttributeName=year_attribute_name,
                        oceanAttributeName=ocean_attribute_name,
                        speciesAttributeName=species_attribute_name,
                        valueAttributeName=value_attribute_name)
 
 
# Francesco Cerasuolo - Terradue
# post-processing                        
apply(result, 1, function(x) file.copy(x, paste(Sys.getenv("PWD"), basename(x), sep="/")))


Create the run bash script

[{{host}} indicator_i1]$ vi application/indicator_i1/run 
 
 
#!/bin/bash
# INDICATOR I1
 
SUCCESS=0
ERR_NOINPUT=18
ERR_NOOUTPUT=19
ERR_CURL=30
DEBUG_EXIT=66
 
function cleanExit ()
{
    local retval=$?
    local msg=""
    case "$retval" in
        $SUCCESS)
            msg="Processing successfully concluded";;
        $ERR_NOINPUT)
            msg="Unable to retrieve an input file";;
        $ERR_NOOUTPUT)
            msg="No output results";;
        $ERR_CURL)
            msg="curl failed to download the GML from $wfsUrl";;
        $DEBUG_EXIT)
            msg="Breaking at debug exit";;
        *)
            msg="Unknown error";;
    esac
    [ "$retval" != 0 ] && echo "Error $retval - $msg, processing aborted" || echo "INFO - $msg"
    exit "$retval"
}
# trap an exit signal to exit properly
trap cleanExit EXIT
 
# evaluating the applicationPath, to resolve environments variable
eval "appPath=\"$applicationPath\""
# evaluating the outputFilesPath (hdfs path)
eval "outFilesPath=\"$outputFilesPath\""
 
# R library path 
export R_LIBS_USER=/application/share/rlibrary/
 
export PATH=$appPath/bin:$PATH
chmod 755 $appPath/bin/*
 
# data input file info
inputDatafileName="inputData.txt"
 
# create and entering work directory 
mkdir -p ./work
cd work
 
#counter (used as key)
count=0
 
type_name=ns11:i1i2_mv
 
# iterate each input (each input is a row and it’s a species identifier)
while read species
do
	# for debug
        echo "INPUT: species=$species"
 
	# call the .R script
        Atlas_i1_SpeciesByOcean.R $wfsUrl $type_name $species
 
        # saving produced output files on the hdfs (subfolder: exec<count>)
        keyDir="exec$count"
        path="$outFilesPath$keyDir"
 
        # hdfs output directory
        hadoop fs -mkdir $path
 
        # create an input info file
        echo "species=$species" > $inputDatafileName
 
        # copy all files to the hdfs path
        hadoop fs -copyFromLocal ./* $path/
 
        # cleanup
        rm -f $inputDatafileName
 
        let "count += 1"
done
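
To see how the loop above maps each input line to an HDFS subfolder (exec0, exec1, ...), a dry-run sketch such as the following may help. It executes no hadoop command and only prints the planned folders; the function name plan_outputs is a hypothetical helper, not part of the run script.

```bash
#!/bin/bash
# Dry-run sketch (hypothetical helper): read one species code per line from
# stdin and print the HDFS folder the run script would use for it.
# No hadoop command is executed.
plan_outputs() {
    local out_path="$1" count=0 species
    while read -r species; do
        # same naming scheme as the run script: <outputFilesPath>exec<count>
        echo "$species -> ${out_path}exec${count}"
        count=$((count + 1))
    done
}

# Example:
#   printf 'YFT\nSKJ\n' | plan_outputs /store/job/output/
```

As in the run script, the output path and the exec<count> key are concatenated directly, so the path is expected to end with a slash.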

Create the application jar and put it into the HDFS

Compress the application folder in a .jar file

Note: the jar must have the same name as the application folder. From the indicator_i1 folder, simply run:


[{{host}} indicator_i1]$ jar cvf indicator_i1.jar application
added manifest
adding: application/(in = 0) (out= 0)(stored 0%)
adding: application/indicator_i1/(in = 0) (out= 0)(stored 0%)
adding: application/indicator_i1/run(in = 1871) (out= 918)(deflated 50%)
adding: application/indicator_i1/bin/(in = 0) (out= 0)(stored 0%)
adding: application/indicator_i1/bin/Atlas_i1_SpeciesByOcean.R(in = 3020) (out= 1095)(deflated 63%)
adding: application/indicator_i1/lib/(in = 0) (out= 0)(stored 0%)
 
 
[{{host}} indicator_i1]$ ll
total 16
drwxr-xr-x 3 imarine-wp10 ciop 4096 Apr 29 12:25 application
-rw-r--r-- 1 imarine-wp10 ciop 6322 Apr 29 14:06 indicator_i1.jar
[{{host}} indicator_i1]
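
Since the jar name has to match the application folder name, it can be derived from the folder instead of being typed by hand. A minimal sketch, assuming exactly one folder under application/; the function name jar_name_for is illustrative only.

```bash
#!/bin/bash
# Hypothetical helper: derive the jar file name from the single folder found
# under <root>/application, so the jar name always matches the application name.
jar_name_for() {
    local app_root="$1" entry name="" n=0
    for entry in "$app_root"/application/*/; do
        [ -d "$entry" ] || continue
        name=$(basename "$entry")
        n=$((n + 1))
    done
    # Exactly one application folder is expected; fail otherwise.
    [ "$n" -eq 1 ] || return 1
    echo "$name.jar"
}

# Example (from the indicator_i1 folder):
#   jar cvf "$(jar_name_for .)" application
```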

Copy the jar into the HDFS

From the indicator_i1 folder:

# remove previous jar if present
hadoop fs -rm /algorithmRepository/indicator_i1.jar
# copy the jar
hadoop fs -copyFromLocal ./indicator_i1.jar /algorithmRepository/
# check if the jar is added
hadoop fs -ls /algorithmRepository
Found 2 items
-rw-r--r--   1 imarine-wp10 supergroup       2577 2014-04-28 17:26 /algorithmRepository/helloWorld.jar
-rw-r--r--   1 imarine-wp10 supergroup       6322 2014-04-29 14:21 /algorithmRepository/indicator_i1.jar

Now the application is stored in the HDFS repository and can be called from the wps-hadoop web-app.
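
The three HDFS commands can be wrapped in one function for repeated deployments. The sketch below is an assumption-laden example: the function name deploy_jar and the DRY_RUN variable are invented for illustration, and /algorithmRepository is the default only because it is the path used in this tutorial.

```bash
#!/bin/bash
# Sketch of a deployment helper. With DRY_RUN=1 it only prints the commands;
# otherwise it runs them. deploy_jar and DRY_RUN are illustrative names.
deploy_jar() {
    local jar="$1" repo="${2:-/algorithmRepository}"
    local target="$repo/$(basename "$jar")"
    local run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # remove a previous copy, ignoring the error when none exists
    $run hadoop fs -rm "$target" || true
    # copy the new jar, then list the repository to confirm
    $run hadoop fs -copyFromLocal "$jar" "$repo/" && $run hadoop fs -ls "$repo"
}

# Example:
#   DRY_RUN=1 deploy_jar ./indicator_i1.jar
```

Running it first with DRY_RUN=1 shows the exact commands before anything is changed on the HDFS.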


Create the process description xml file

The process description XML file must be created inside the ~/wps-hadoop-source/src/main/resources/com/terradue/wps_hadoop/processes/ directory. We choose to organise the XML under the subpath ird/indicator/.

Note: it’s important to keep this path aligned with the Java class package path.

Note: the path plus the process name forms the process identifier.
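
The relation between the file path and the process identifier can be sketched as a simple string transformation. The helper name process_id_from_xml is hypothetical; it only assumes that the description lives below src/main/resources/.

```bash
#!/bin/bash
# Hypothetical helper: turn the path of a process description into the
# process identifier, i.e. package path plus process name joined by dots.
process_id_from_xml() {
    local path="$1"
    path="${path#*src/main/resources/}"   # drop everything up to the resources root
    path="${path%.xml}"                   # drop the file extension
    echo "$path" | tr '/' '.'
}

# Example:
#   process_id_from_xml src/main/resources/com/terradue/wps_hadoop/processes/ird/indicator/IndicatorI1.xml
```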

[{{host}} ~]$ vi wps-hadoop-source/src/main/resources/com/terradue/wps_hadoop/processes/ird/indicator/IndicatorI1.xml
 
 
<?xml version="1.0" encoding="UTF-8"?>
<wps:ProcessDescriptions xmlns:wps="http://www.opengis.net/wps/1.0.0"
        xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsDescribeProcess_response.xsd"
        xml:lang="en-US" service="WPS" version="1.0.0">
        <ProcessDescription wps:processVersion="1.0.0"
                storeSupported="true" statusSupported="false">
                <ows:Identifier>IndicatorI1</ows:Identifier>
                <ows:Title>IRD Tuna Atlas Indicator i1</ows:Title>
                <ows:Abstract>Graph of catches of a given species.</ows:Abstract>
                <ows:Metadata xlink:title="Biodiversity"/>
                <DataInputs>
                        <Input minOccurs="1" maxOccurs="2147483647">
                                <ows:Identifier>species</ows:Identifier>
                                <ows:Title>Species Names</ows:Title>
                                <ows:Abstract>Species Names</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AllowedValues>
                                                <ows:Value>YFT</ows:Value>
                                                <ows:Value>SKJ</ows:Value>
                                                <ows:Value>BET</ows:Value>
                                                <ows:Value>ALB</ows:Value>
                                                <ows:Value>BFT</ows:Value>
                                                <ows:Value>SBF</ows:Value>
                                                <ows:Value>SFA</ows:Value>
                                                <ows:Value>BLM</ows:Value>
                                                <ows:Value>MLS</ows:Value>
                                                <ows:Value>BIL</ows:Value>
                                                <ows:Value>SWO</ows:Value>
                                                <ows:Value>SSP</ows:Value>
                                        </ows:AllowedValues>
                                </LiteralData>
                        </Input>
                        <Input minOccurs="0" maxOccurs="1">
                                <ows:Identifier>wfsUrl</ows:Identifier>
                                <ows:Title>WFS Url</ows:Title>
                                <ows:Abstract>WFS Url</ows:Abstract>
                                <LiteralData>
                                        <ows:DataType ows:reference="xs:string"></ows:DataType>
                                        <ows:AnyValue/>
                                </LiteralData>
                        </Input>
                </DataInputs>
                <ProcessOutputs>
                        <Output>
                                <ows:Identifier>result</ows:Identifier>
                                <ows:Title>result</ows:Title>
                                <ows:Abstract>result</ows:Abstract>
                                <ComplexOutput>
                                        <Default>
                                                <Format>
                                                        <MimeType>application/xml</MimeType>
                                                </Format>
                                        </Default>
                                        <Supported>
                                                <Format>
                                                        <MimeType>application/xml</MimeType>
                                                </Format>
                                        </Supported>
                                </ComplexOutput>
                        </Output>
                </ProcessOutputs>
        </ProcessDescription>
</wps:ProcessDescriptions>


Create the wps-hadoop process java class(es)

In this section we create the Java class (or classes, if needed) implementing a WPS process that can easily run on Hadoop. It’s important to define clearly:

  1. which parameters are taken from the WPS Execute request (as specified in the process description)
  2. how these parameters are processed (if they need to be transformed)
  3. which parameters are passed to Hadoop streaming
  4. which of the parameters from step 3 are set as fixed parameters
  5. which of the parameters from step 3 is set as inputResource (this determines the parallelism)

For this indicator_i1 example, we have:

  • speciesCodes, as inputResource
  • wfsUrl, as fixed parameter

The wfsUrl is taken as is from the WPS Execute request, while the speciesCodes are derived from the list of species names (also from the WPS Execute request) by a case-insensitive search in a default list of all species.

The primary class is, in this case, IndicatorI1.java, and it must extend StreamingAbstractAlgorithm. At least one class like this must be created. In this case two simple helper classes are created as well: Constants.java and SpeciesCodes.java (a simple species code database with a search function). Note: we choose to create the classes inside the package com.terradue.wps_hadoop.processes.ird.indicator, the same path structure as the process description created earlier.
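
The case-insensitive lookup performed by SpeciesCodes.java can be prototyped in the shell, for example to check which code a given name resolves to. This is a sketch, not the project's implementation: the function name species_code is invented, and only three rows of the species table are reproduced.

```bash
#!/bin/bash
# Shell prototype of the case-insensitive name-to-code lookup. Only three
# rows are reproduced here; the full list lives in SpeciesCodes.java.
species_code() {
    printf '%s\n' \
        "YFT,Thunnus albacares,Albacore,Rabil,Yellowfin tuna" \
        "SKJ,Katsuwonus pelamis,Listao,Listado,Ocean skipjack" \
        "SWO,Xiphias gladius,Espadon,Pez espada,Broadbill swordfish" |
    awk -F',' -v name="$1" '
        BEGIN { name = toupper(name) }
        {
            # compare every field of the row, including the code itself
            for (i = 1; i <= NF; i++)
                if (toupper($i) == name) { print $1; found = 1; exit }
        }
        END { if (!found) exit 1 }
    '
}

# Example:
#   species_code "yellowfin tuna"
```

Like the Java version, the first matching row wins and an unknown name is reported as a failure (here through a non-zero exit status).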

Here’s the list of java classes:


[{{host}} ~]$ ll wps-hadoop-source/src/main/java/com/terradue/wps_hadoop/processes/ird/indicator/
total 24
-rw-r--r-- 1 imarine-wp10 ciop  377 Apr 28 17:35 Constants.java
-rw-r--r-- 1 imarine-wp10 ciop 3026 Apr 28 17:35 IndicatorI1.java
-rw-r--r-- 1 imarine-wp10 ciop 1545 Apr 28 17:35 SpeciesCodes.java


IndicatorI1.java

[{{host}} ~]$ vi wps-hadoop-source/src/main/java/com/terradue/wps_hadoop/processes/ird/indicator/IndicatorI1.java
 
 
/**
 * 
 */
package com.terradue.wps_hadoop.processes.ird.indicator;
 
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
 
import org.apache.log4j.Logger;
import org.n52.wps.io.data.IData;
import org.n52.wps.io.data.binding.complex.GenericFileDataBinding;
import org.n52.wps.io.data.binding.literal.LiteralStringBinding;
 
import com.terradue.wps_hadoop.common.input.InputUtils;
import com.terradue.wps_hadoop.common.input.ListInputResource;
import com.terradue.wps_hadoop.streaming.ResultsInfo;
import com.terradue.wps_hadoop.streaming.StreamingAbstractAlgorithm;
import com.terradue.wps_hadoop.streaming.StreamingPackagedAlgorithm;
import com.terradue.wps_hadoop.streaming.WpsHadoopConfiguration;
 
/**
 * @author fcerasuolo
 *
 */
public class IndicatorI1 extends StreamingAbstractAlgorithm {
 
    protected final Logger logger = Logger.getLogger(getClass());
        private List<String> errors = new ArrayList<String>();
 
        @Override
        public Map<String, IData> run(Map<String, List<IData>> inputData) {
                List<String> names = InputUtils.getListStringInputParameter(inputData, "species", true);
                List<String> speciesCodes = getSpeciesCodesFromSpeciesNames(names);
                String wfsUrl = InputUtils.getStringInputParameter(inputData, "wfsUrl");
 
                logger.info("Running Job TUNA ATLAS INDICATOR I1...");
 
                // get the configuration with default values
                WpsHadoopConfiguration conf = new WpsHadoopConfiguration();
 
                // create a new hadoop streaming algorithm
                StreamingPackagedAlgorithm streaming = new StreamingPackagedAlgorithm(conf);
 
                // set algorithm name
                streaming.setAlgorithmName("indicator_i1");
 
		// used if you want to copy the jar at runtime into the hdfs, for quick tests
//              streaming.setAlgorithmPackage(new File("/home/imarine-wp10/hadoopApplications/ird/indicator_i1/indicator_i1.jar"), true);
 
		// set the input resource
                streaming.setInputResource(new ListInputResource(speciesCodes));
 
                // adding parameters
                streaming.addFixedParameter("wfsUrl", wfsUrl==null ? Constants.defaultWfsUrl : wfsUrl);
 
		// set verbose debug mode (default false)
                streaming.setDebugMode(true);
 
                try {
                        // let's run!
                        ResultsInfo result = streaming.runAsync(this);
 
                        Map<String, IData> wpsResultMap = new HashMap<String, IData>();
                        wpsResultMap.put("result", result.getXmlFileDataBinding());
                        return wpsResultMap;
 
                } catch (Exception e) {
                        // log the failure and keep the original exception as the cause
                        logger.error("Execution job failed!", e);
                        throw new RuntimeException("Execution job failed! " + e.getMessage(), e);
                }
        }
 
        @Override
        public List<String> getErrors() {
                return errors;
        }
 
        @Override
        public Class<?> getInputDataType(String id) {
                return LiteralStringBinding.class;
        }
 
        @Override
        public Class<?> getOutputDataType(String id) {
                if (id.contentEquals("result"))
                        return GenericFileDataBinding.class;
                else
                        return null;
        }
 
        /**
         * @param names
         * @return
         */
        private List<String> getSpeciesCodesFromSpeciesNames(List<String> names) {
                List<String> speciesCodes = new ArrayList<String>();
                for (String name: names)
                        speciesCodes.add(SpeciesCodes.getSpeciesCode(name));
                return speciesCodes;
        }
}


SpeciesCodes.java

/**
 * 
 */
package com.terradue.wps_hadoop.processes.ird.indicator;
 
/**
 * @author "Francesco Cerasuolo (francesco.cerasuolo@terradue.com)"
 *
 */
public class SpeciesCodes {
 
        private static String csv[] = {
                "YFT,Thunnus albacares,Albacore,Rabil,Yellowfin tuna",
                "SKJ,Katsuwonus pelamis,Listao,Listado,Ocean skipjack",
                "BET,Thunnus obesus,Thon obese,Patudo,Bigeye tuna",
                "ALB,Thunnus alalunga,Germon,Atun blanco,Albacore",
                "BFT,Thunnus thynnus thynnus,Thon rouge,Atun rojo,Bluefin tuna",
                "SBF,Thunnus maccoyii,Thon rouge du sud,Atun rojo del sur,Southern bluefin tuna",
                "SFA,Istiophorus platypterus,Voilier Indo-Pacifique,Pez vela del Indo-Pacifico,Indo-Pacific sailfish",
                "BLM,Makaira indica,Makaire noir,Aguja negra,Black marlin",
                "BUM,Makaira nigricans,Makaire bleu Atlantique,Aguja azul,Atlantic blue marlin",
                "MLS,Tetrapturus audax,Marlin raye,Marlin rayado,Striped marlin",
                "BIL,Istiophoridae spp.,Poissons a rostre non classes,,Unclassified marlin",
                "SWO,Xiphias gladius,Espadon,Pez espada,Broadbill swordfish",
                "SSP,Tetrapturus angustirostris,Makaire a rostre court,Marlin trompa corta,short-billed spearfish",
        };
 
        protected static String getSpeciesCode(String speciesName) {
 
                speciesName = speciesName.toUpperCase();
 
                for (String csvRow: csv) {
                        csvRow = csvRow.toUpperCase();
                        String[] words = csvRow.split(",");
                        String code = words[0];
                        for (String word: words)
                                if (word.contentEquals(speciesName))
                                        return code;
                }
 
                throw new RuntimeException("Species not found.");
        }
 
}


Constants.java

/**
 * 
 */
package com.terradue.wps_hadoop.processes.ird.indicator;
 
/**
 * @author "Francesco Cerasuolo (francesco.cerasuolo@terradue.com)"
 *
 */
public class Constants {
        protected static final String defaultWfsUrl =   "http://mdst-macroes.ird.fr:8080/constellation/WS/wfs/tuna_atlas";
}


Build and package the sources

Now you can compile, build and package everything with Maven from the ~/wps-hadoop-source directory:

[{{host}} ~]$ cd wps-hadoop-source/
[{{host}} wps-hadoop-source]$ mvn clean package -P wps,linux-x86_64
 
[INFO] Scanning for projects...
....
....
....
....
[INFO] Building tar : /home/imarine-wp10/wps-hadoop-source/target/wps-hadoop-1.2.0-SNAPSHOT-bin.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.225s
[INFO] Finished at: Tue Apr 29 15:34:45 CEST 2014
[INFO] Final Memory: 24M/184M
[INFO] ------------------------------------------------------------------------

After the Maven run, the target folder contains the generated artifacts:


[{{host}} wps-hadoop-source]$ ll target/
total 22072
drwxr-xr-x 2 imarine-wp10 ciop     4096 Apr 29 15:36 archive-tmp
drwxr-xr-x 4 imarine-wp10 ciop     4096 Apr 29 15:36 classes
drwxr-xr-x 2 imarine-wp10 ciop     4096 Apr 29 15:36 maven-archiver
drwxr-xr-x 2 imarine-wp10 ciop     4096 Apr 29 15:36 surefire
-rw-r--r-- 1 imarine-wp10 ciop 22394573 Apr 29 15:36 wps-hadoop-1.2.0-SNAPSHOT-bin.tar.gz
-rw-r--r-- 1 imarine-wp10 ciop   153429 Apr 29 15:36 wps-hadoop-1.2.0-SNAPSHOT.jar

The wps-hadoop-1.2.0-SNAPSHOT.jar library is all you need.


Configure the wps-hadoop web-app

A few more steps remain: copy the jar library just built, update wps_config.xml to include the new process, and restart the web application.


Copy the jar library

Simply copy the jar from the target folder inside wps-hadoop-source to the lib directory of the wps-hadoop web-app:

[{{host}} ~]$ cp wps-hadoop-source/target/wps-hadoop-1.2.0-SNAPSHOT.jar wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/webapps/wps/WEB-INF/lib/


Update the wps_config.xml

[{{host}} ~]$ vi wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/webapps/wps/config/wps_config.xml 

Simply add the property

<Property name="Algorithm" active="true">com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1</Property>
inside //AlgorithmRepositoryList/Repository[@name="UploadedAlgorithmRepository"]
 
 
...
        <AlgorithmRepositoryList>
                <Repository name="UploadedAlgorithmRepository" className="org.n52.wps.server.UploadedAlgorithmRepository" active="true">
                        <Property name="Algorithm" active="true">com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1</Property>
                </Repository>
 
        </AlgorithmRepositoryList>
        <RemoteRepositoryList/>
        <Server hostname="{{host}}" hostport="8888" includeDataInputsInResponse="false" computationTimeoutMilliSeconds="5"
                cacheCapabilites="false" webappPath="wps" repoReloadInterval="0">
                <Database/>
        </Server>
</WPSConfiguration>

Notice that the property's inner text is com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1, that is, exactly the package plus the class name (without the .java extension).
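
Before restarting, it may be worth checking that the property really is in the configuration file. A minimal sketch; the function name is_registered is hypothetical, and the check matches the property text literally, so it assumes the same attribute order and spacing as in the snippet above.

```bash
#!/bin/bash
# Hypothetical check: succeed only if the given class is registered as an
# active Algorithm property in the given wps_config.xml.
is_registered() {
    local config="$1" class="$2"
    # -F: fixed-string match, -q: quiet (the exit status is the answer)
    grep -qF "<Property name=\"Algorithm\" active=\"true\">$class</Property>" "$config"
}

# Example:
#   is_registered wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/webapps/wps/config/wps_config.xml \
#       com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1 && echo "registered"
```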


Restart the wps-hadoop web application

You can do this simply by touching web.xml:

  [{{host}} ~]$ touch wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/webapps/wps/WEB-INF/web.xml

or by restarting the tomcat (hard reload):

  [{{host}} ~]$ wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/shutdown.sh 
  Using LD_LIBRARY_PATH: /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/lib/natives
  Using CATALINA_BASE:   /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded
  Using CATALINA_HOME:   /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded
  Using CATALINA_TMPDIR: /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/temp
  Using JRE_HOME:        /usr
  Using CLASSPATH:       /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/bootstrap.jar:/home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/tomcat-juli.jar
  [{{host}} ~]$ wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/startup.sh 
  Using LD_LIBRARY_PATH: /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/lib/natives
  Using CATALINA_BASE:   /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded
  Using CATALINA_HOME:   /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded
  Using CATALINA_TMPDIR: /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/temp
  Using JRE_HOME:        /usr
  Using CLASSPATH:       /home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/bootstrap.jar:/home/imarine-wp10/wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/tomcat-juli.jar

Now the process is plugged into the wps-hadoop system. Note: the restart can take a few seconds; you can follow its progress in real time in the log (tail -f wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/logs/catalina.out).
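
After a full Tomcat restart, a script can wait for the server instead of watching the log by hand. The sketch below assumes the usual Tomcat message "Server startup in ... ms" appears in catalina.out once startup has finished; the function name wait_for_startup and its arguments are illustrative. The third argument is the number of log lines to skip, so that an older startup message is not mistaken for the new one.

```bash
#!/bin/bash
# Sketch: poll a log file until a Tomcat "Server startup in" line appears
# after the first <skip> lines, or give up after <timeout> seconds.
wait_for_startup() {
    local log="$1" timeout="${2:-60}" skip="${3:-0}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # look only at lines written after the first $skip lines
        if tail -n "+$((skip + 1))" "$log" 2>/dev/null | grep -q "Server startup in"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example:
#   log=wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/logs/catalina.out
#   lines=$(wc -l < "$log")
#   wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/bin/startup.sh
#   wait_for_startup "$log" 120 "$lines" && echo "Tomcat is up"
```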


Run, Test and Debug

Run & Test

getCapabilities: http://{{host}}:8888/wps/WebProcessingService?Request=GetCapabilities&Service=WPS

You should see the new process created:


<wps:Process wps:processVersion="1.0.0">
<ows:Identifier>
com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1
</ows:Identifier>
<ows:Title>IRD Tuna Atlas Indicator i1</ows:Title>
</wps:Process>


describeProcess: http://{{host}}:8888/wps/WebProcessingService?Service=WPS&Version=1.0.0&Request=DescribeProcess&Identifier=com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1

You should see the process description xml.

executeProcess (example):

  sync: http://{{host}}:8888/wps/WebProcessingService?service=wps&version=1.0.0&request=Execute&identifier=com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1&dataInputs=species=BFT;species=BFT;&ResponseDocument=result

  async: http://{{host}}:8888/wps/WebProcessingService?service=wps&version=1.0.0&request=Execute&identifier=com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1&dataInputs=species=BFT;species=BFT;&ResponseDocument=result&storeExecuteResponse=true&status=true

In WPS 1.0.0, storeExecuteResponse=true together with status=true requests asynchronous execution: the server answers immediately with a status document that can be polled.
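
The Execute URLs can be assembled by a small function, which avoids copy and paste errors when the species list changes. A sketch under stated assumptions: execute_url and its mode values plain and stored are invented names; stored simply appends storeExecuteResponse=true&status=true, and port 8888 is the one used in this tutorial.

```bash
#!/bin/bash
# Hypothetical helper: build an Execute URL for the IndicatorI1 process.
# Usage: execute_url <host> <plain|stored> <species>...
execute_url() {
    local host="$1" mode="$2" inputs="" s
    shift 2
    # one species=<code>; entry per remaining argument
    for s in "$@"; do
        inputs="${inputs}species=${s};"
    done
    local url="http://${host}:8888/wps/WebProcessingService?service=wps&version=1.0.0&request=Execute"
    url="${url}&identifier=com.terradue.wps_hadoop.processes.ird.indicator.IndicatorI1"
    url="${url}&dataInputs=${inputs}&ResponseDocument=result"
    # stored mode asks the server to store the response and report status
    [ "$mode" = "stored" ] && url="${url}&storeExecuteResponse=true&status=true"
    echo "$url"
}

# Example (quote the URL: it contains ';' and '&'):
#   curl "$(execute_url localhost stored BFT YFT)"
```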


The wps-hadoop web-app includes a simple wps-webclient, which you can access at http://{{host}}:8888/client.html


Debug

During the execution of a wps-hadoop process you can check the catalina.out log, which displays the Hadoop tracking URL and the job status. If you follow the URL, you can see all map/reduce attempts with their logs, in real time.


2014-04-29 16:19:24,008 [pool-21-thread-6] INFO  org.apache.hadoop.streaming.StreamJob: (by wps-hadoop) JobId: job_201404161623_0007
2014-04-29 16:19:24,008 [pool-21-thread-6] INFO  org.apache.hadoop.streaming.StreamJob: To kill this job, run:
2014-04-29 16:19:24,009 [pool-21-thread-6] INFO  org.apache.hadoop.streaming.StreamJob: UNDEF/bin/hadoop job  -Dmapred.job.tracker={{host}}:8021 -kill job_201404161623_0007
2014-04-29 16:19:24,037 [pool-21-thread-6] INFO  org.apache.hadoop.streaming.StreamJob: Tracking URL: http://{{host}}:50030/jobdetails.jsp?jobid=job_201404161623_0007
2014-04-29 16:19:24,129 [pool-21-thread-4] INFO  org.n52.wps.server.request.ExecuteRequest: Update received from Subject, state changed to : 73
2014-04-29 16:19:24,131 [pool-21-thread-4] INFO  org.apache.hadoop.streaming.StreamJob:  map 100%  reduce 33%
2014-04-29 16:19:25,039 [pool-21-thread-6] INFO  org.n52.wps.server.request.ExecuteRequest: Update received from Subject, state changed to : 0
2014-04-29 16:19:25,039 [pool-21-thread-6] INFO  org.apache.hadoop.streaming.StreamJob:  map 0%  reduce 0%
2014-04-29 16:19:25,137 [pool-21-thread-4] INFO  org.n52.wps.server.request.ExecuteRequest: Update received from Subject, state changed to : 100
2014-04-29 16:19:25,139 [pool-21-thread-4] INFO  org.apache.hadoop.streaming.StreamJob:  map 100%  reduce 100%
2014-04-29 16:19:28,146 [pool-21-thread-4] INFO  org.apache.hadoop.streaming.StreamJob: Job complete: job_201404161623_0005
2014-04-29 16:19:28,428 [pool-21-thread-4] INFO  org.apache.hadoop.streaming.StreamJob: Output: /store/a5cc3063-b8d8-4422-ad32-0c7f387e07dd/output/
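
The tracking URLs can be pulled out of such a log with a one-line filter. A minimal sketch, assuming the "Tracking URL: " prefix shown in the log lines above; the function name tracking_urls is illustrative.

```bash
#!/bin/bash
# Sketch: print the Hadoop tracking URLs found in catalina.out-style logs.
# Reads the files given as arguments, or stdin when none are given.
tracking_urls() {
    sed -n 's/.*Tracking URL: \(http[^ ]*\).*/\1/p' "$@"
}

# Example:
#   tracking_urls wps-hadoop-0.1-SNAPSHOT-tomcat-embedded/logs/catalina.out | tail -n 1
```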

Note: it’s convenient to set streaming.setDebugMode(true) in your Java process class. The run bash script then executes in debug mode, and the tracking URL shows the execution of each bash statement.