Difference between revisions of "Legacy applications integration"
Revision as of 13:03, 1 February 2013
The structure we propose is depicted below, using a processing step named "align" as an example.
The application directory follows a set of best practices
- for its folder and file structure
- for its descriptive metadata
so as to ease the subsequent deployment of the application to the WPS-hadoop environment.
The application.xml file has two main blocks:
- the job template section
- and the workflow template section.
The first block defines the job templates in the workflow XML application definition file. The second is not used for now; it is there only to pave the way, if needed, for supporting workflows with Oozie.
The single processing block of our workflow needs a job template.
A proposed example contains the XML lines below:
  <jobTemplate id="align">
    <streamingExecutable>/application/align/run</streamingExecutable> <!-- processing trigger -->
    <defaultParameters> <!-- default parameters of the job -->
      <!-- Default values are specified here, for testing purposes only! -->
      <parameter id="param1">2</parameter>
      <!-- no default value -->
      <parameter id="param2">4</parameter>
    </defaultParameters>
    <defaultJobconf>
      <property id="app.job.max.tasks">1</property> <!-- Maximum number of parallel tasks -->
    </defaultJobconf>
  </jobTemplate>
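To make the role of each element concrete, the job template above can be read with a few lines of Python. This is only an illustrative sketch: the function name `read_job_template` and the dictionary layout are assumptions made for this example, not part of WPS-hadoop, and the XML comments are left out of the embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

# Copy of the "align" job template shown above (comments omitted).
JOB_TEMPLATE = """
<jobTemplate id="align">
  <streamingExecutable>/application/align/run</streamingExecutable>
  <defaultParameters>
    <parameter id="param1">2</parameter>
    <parameter id="param2">4</parameter>
  </defaultParameters>
  <defaultJobconf>
    <property id="app.job.max.tasks">1</property>
  </defaultJobconf>
</jobTemplate>
"""


def read_job_template(xml_text):
    """Hypothetical helper: collect the fields of one <jobTemplate> element."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "executable": root.findtext("streamingExecutable"),
        # Element text is kept as a string; type conversion is left to the caller.
        "parameters": {p.get("id"): p.text for p in root.iter("parameter")},
        "jobconf": {p.get("id"): p.text for p in root.iter("property")},
    }


template = read_job_template(JOB_TEMPLATE)
print(template["executable"], template["parameters"], template["jobconf"])
```

A check of this kind can catch a misspelled element or a missing `id` attribute before the application is packaged.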
We could provide tools to test a job on the local workstation.
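As a rough idea of what such a local test tool might look like, the sketch below pipes a few input lines to an executable on standard input and collects what it writes to standard output, which is the way Hadoop Streaming feeds a job. The helper `run_job_locally` and the stand-in command are hypothetical; a real test would point at the application's own `run` script.

```python
import subprocess
import sys


def run_job_locally(command, input_lines):
    """Hypothetical helper: feed input_lines to command on stdin, return stdout lines."""
    result = subprocess.run(
        command,
        input="\n".join(input_lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the executable exits with a non-zero status
    )
    return result.stdout.splitlines()


# Stand-in for the legacy application (assumption): upper-cases each input line.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: print(line.strip().upper())",
]

output = run_job_locally(STAND_IN, ["a.dat", "b.dat"])
print(output)
```

Replacing `STAND_IN` with the path of the streaming executable would exercise the job without a cluster, assuming the executable needs nothing beyond its standard input.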
Once done, the application is packaged as a jar and stored in a repository accessible from the WPS-hadoop server. When a processing request is triggered, WPS-hadoop deploys that jar via Hadoop Streaming and invokes the legacy application.
Hadoop clusters serving legacy applications only need to have R (and/or IDL, MATLAB, Octave, etc.) installed. This is a known procedure, since SimpleTestNono has led to that scenario.
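For illustration, a streaming executable such as `/application/align/run` could be a thin wrapper that reads one input record per line from standard input and hands each record to the legacy tool. The sketch below assumes that contract; the function `stream_records` and the stand-in legacy command are hypothetical, and how WPS-hadoop passes the job parameters to the executable is not covered here.

```python
import io
import subprocess
import sys


def stream_records(stdin, stdout, legacy_command):
    """Run legacy_command once per non-empty input line, forwarding its output."""
    for line in stdin:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        result = subprocess.run(
            legacy_command + [record],
            capture_output=True,
            text=True,
            check=True,
        )
        stdout.write(result.stdout)


# Stand-in for the legacy tool (assumption); in practice this could be an
# R, IDL, MATLAB or Octave command line.
LEGACY_COMMAND = [sys.executable, "-c", "import sys; print('aligned ' + sys.argv[1])"]

# In a real job the wrapper would be called with sys.stdin and sys.stdout.
captured = io.StringIO()
stream_records(io.StringIO("scene1\n\nscene2\n"), captured, LEGACY_COMMAND)
print(captured.getvalue(), end="")
```

Keeping the wrapper this small means the legacy code itself stays unchanged, which is the point of the integration approach described here.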