IR Bootstrapper
Latest revision as of 15:57, 3 July 2014
The IR Bootstrapper portlet provides a graphical user interface for executing sets of tasks on various resources of the infrastructure after the data import phase is completed. These tasks lead to the creation of other resources such as:
- Indexes
- Open Search resources for open search collections
- SRU resources for SRU collections
This portlet is based on a configuration file that is saved as a generic resource on the IS (Information System). This file is in XML format and defines:
- The available tasks that can be executed
- The available JobTypes that can be used. These JobTypes define sequential and/or parallel task executions that turn a given type of input into a given output
- The available jobs, each of which belongs to one of the available JobTypes and provides all the specific inputs for that type.
  - Jobs are the units that are actually executed on the resources.
  - A job can be extended by another job to define a more restricted job (e.g. one that applies only to a collection with a given name).
  - The user can define a new job by using the portlet's graphical user interface.
- An easy way to learn how to use the IRBootstrapper portlet is to watch this video: Execute tasks using the IRBootstrapper (http://www.youtube.com/watch?v=8wCcMVUPubE)
Job Execution
The first tab of the portlet is divided into two main panels. The left panel contains a tree with all the available collections; clicking on a collection shows all the jobs that can be executed on it. Selecting a job displays its execution tree in the right panel. A lock icon on a task of the selected job means that this task has already been completed for the selected collection and will not be executed again. To execute the job, click the button located at the top of the tree, or check the job's checkbox and click the corresponding button. This button is enabled only when at least one job is checked.
Jobs Batch Submission
When you check more than one collection for the same job type, you can submit these jobs for batch execution. If the jobs require extra user input at runtime, a window appears asking for it. The same input is then used for all the jobs submitted in batch mode; alternatively, if the JobType is configured for chain execution (see Bootstrapper Static Configuration below), the output of the previous job is used as input to the next one.
When a job is submitted, it is added to the Submitted Jobs tree. Open the Submitted Jobs panel to check the state of each job.
- An icon on each task shows its current state: Running, Completed, Completed with warnings, Failed, or Fulfilled Task.
- You can see the execution log of each task by clicking on the '+' button.
- For each job, you can abort the execution or remove the job from the list.
Job Designer
The second tab shows a tree with all the job types and all the defined jobs for each type.
You can delete an existing job, display the execution tree of a job, and/or clone an existing job into a new one.
These changes update the bootstrapper portlet's configuration generic resource.
You can also create a new job using the graphical interface
- The job should have a name and be of a specific type
- For all the assignments of the specified type a value should be provided
For more information about jobs and JobTypes, please refer to the section below.
Bootstrapper Static Configuration
The portlet requires a static configuration in XML format in order to be initialized. This configuration is saved as a generic resource in the system's Information System. The administrator creates the configuration when the portlet is released, and it can be extended later using the portlet's Job Editor.
The configuration consists of two main parts:
- Types
- Jobs
Types
There are three different types that should be declared:
- Type: added by the administrator when the resource is created; it declares the data type classes defined in the portlet's source code.
The current implementation of the portlet defines the following data types:
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.TreeManagerCollectionDataType" name="TreeManagerCollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.GCUBECollectionDataType" name="GCUBECollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.OpenSearchDataType" name="OpenSearch" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.FullTextIndexNodeDataType" name="FullTextIndexNode" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.SRUCollectionDataType" name="SRUCollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.SRUDataType" name="SRUResource" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.RelationalDBDataType" name="RelationalDBDataSource" />
- TaskType: added by the administrator when the resource is created; it declares the tasks that can be executed using the portlet. For each task type, the input and the output must be defined, and their values must be among the data types declared above.
In the current implementation of the portlet, the following tasks can be executed:
<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.OpenSearchGenerationTaskType" name="OpenSearchGenerationTaskType">
  <input type="GCUBECollection" />
  <output type="OpenSearch" />
  <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.FullTextIndexNodeGenerationTaskType" name="FullTextIndexNodeGenerationTask">
  <input type="TreeManagerCollection" />
  <output type="FullTextIndexNode" />
  <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.FullTextIndexNodeUpdateTaskType" name="FullTextIndexNodeUpdateTask">
  <input type="TreeManagerCollection" />
  <output type="FullTextIndexNode" />
  <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.SRUGenerationTaskType" name="SRUGenerationTaskType">
  <input type="SRUCollection" />
  <output type="SRUResource" />
  <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.RelationalDBFullTextIndexNodeGenerationTaskType" name="RelationalDBFullTextIndexNodeGenerationTask">
  <input type="RelationalDBDataSource" />
  <output type="FullTextIndexNode" />
  <run>true</run>
</tasktype>
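The fragment below pairs one data type declaration with the task type that consumes it, to show how the two kinds of declarations are linked. It is built only from the declarations above: the type attribute of a tasktype's input and output is matched against the name attribute of a declared type. Presumably a tasktype that names an undeclared type cannot be resolved, but that validation behaviour is an assumption and is not documented here.

```xml
<!-- Data type declarations: the name attribute is the identifier that task types refer to -->
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.SRUCollectionDataType" name="SRUCollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.SRUDataType" name="SRUResource" />

<!-- Task type: input and output point to the data types above by name -->
<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.SRUGenerationTaskType" name="SRUGenerationTaskType">
  <input type="SRUCollection" />  <!-- matches type name="SRUCollection" -->
  <output type="SRUResource" />   <!-- matches type name="SRUResource" -->
  <run>true</run>
</tasktype>
```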
- JobType: JobTypes are added by the administrator and define a set of task types, and their initial assignments, that will be executed. For each task and for the task's internal assignments, a JobType also defines whether they are executed in parallel or sequentially.
In this example the JobType creates FullText indexes. It takes a TreeManagerCollection as input and executes the FullTextIndexNodeGenerationTask task type, which must already be defined in the configuration. It performs the assignments this task requires by providing the desired input and output. Notice that these assignments are run sequentially, in the order in which they appear in the XML.
<jobtype description="Creates the required fulltext indices for a collection." name="FTIndexNodeCollection">
  <input type="TreeManagerCollection" />
  <jobDefinition>
    <parallel>
      <sequential>
        <assign to="%Create_ft_node_index.input" value="%FTIndexNodeCollection.input" />
        <assign to="%Create_ft_node_index.output.IndexedCollectionID" value="%Create_ft_node_index.input.ColID" />
        <task name="Create_ft_node_index" tasktype="FullTextIndexNodeGenerationTask" />
      </sequential>
    </parallel>
  </jobDefinition>
</jobtype>
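The same pattern can presumably be reused for the other task types. The sketch below is a hypothetical JobType for SRU collections that wraps the SRUGenerationTaskType declared earlier. The JobType name SRUResourceCollection and the task name Create_sru_resource are invented for illustration and do not come from the shipped configuration. The sketch mirrors FTIndexNodeCollection: the JobType's input is handed to the task's input inside the same parallel/sequential structure. Any task-specific output assignments that SRUGenerationTaskType may need are not documented here, so they are omitted.

```xml
<!-- Hypothetical JobType: SRUResourceCollection and Create_sru_resource are illustrative names only -->
<jobtype description="Creates an SRU resource for an SRU collection." name="SRUResourceCollection">
  <input type="SRUCollection" />
  <jobDefinition>
    <parallel>
      <sequential>
        <!-- Pass the job's input collection to the task, as FTIndexNodeCollection does -->
        <assign to="%Create_sru_resource.input" value="%SRUResourceCollection.input" />
        <task name="Create_sru_resource" tasktype="SRUGenerationTaskType" />
      </sequential>
    </parallel>
  </jobDefinition>
</jobtype>
```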
- You can declare that instances of a specific JobType may be submitted in batch mode. In that case, multiple instances are executed sequentially, and every instance except the first one uses a specific output of the previously completed instance as its input.
To enable this functionality, add the following element as a child of the JobType element:
<ChainExecution>
  <ChainConnectionAssignments>
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" value="%Create_ft_node_index.output.IndexID" />
  </ChainConnectionAssignments>
</ChainExecution>
The assignment above declares that, in a batch mode execution, the target assignment takes as its value the specified output of the previous job: the IndexID of a Full Text Index is used as the Index Manager ID of the next job. This forces both indexes to be created under the same WS-resource Index Manager.
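Combining the two fragments, a chain-enabled version of FTIndexNodeCollection might look like the sketch below. The text only states that ChainExecution must be a direct child of the jobtype element, so placing it after jobDefinition is an assumption. The assignment target uses FullTextIndexNodeGenerationTask, the task type name declared in the task types and used in the job's own assignments.

```xml
<jobtype description="Creates the required fulltext indices for a collection." name="FTIndexNodeCollection">
  <input type="TreeManagerCollection" />
  <jobDefinition>
    <parallel>
      <sequential>
        <assign to="%Create_ft_node_index.input" value="%FTIndexNodeCollection.input" />
        <assign to="%Create_ft_node_index.output.IndexedCollectionID" value="%Create_ft_node_index.input.ColID" />
        <task name="Create_ft_node_index" tasktype="FullTextIndexNodeGenerationTask" />
      </sequential>
    </parallel>
  </jobDefinition>
  <!-- Direct child of jobtype; its position after jobDefinition is an assumption -->
  <ChainExecution>
    <ChainConnectionAssignments>
      <!-- Batch mode: the IndexID output of the previous instance becomes IdOfIndexManagerToAppend of the next -->
      <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" value="%Create_ft_node_index.output.IndexID" />
    </ChainConnectionAssignments>
  </ChainExecution>
</jobtype>
```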
Jobs
You can define as many jobs as you want using the portlet's visual Job Editor/Creator, which assists you by suggesting JobTypes and assignments.
The type of each job must be one of the declared JobTypes, and each job must declare the assignments that its type requires.
The XML of a job of the FTIndexNodeCollection JobType is shown below:
<job jobtype="FTIndexNodeCollection" name="FullText Index OAI Tree Collections">
  <initialization>
    <assign to="%FTIndexNodeCollection.input.Type" value="ns5:OAI" />
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IndexTypeID" value="ft_oai_dc_1.0" />
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.TransformationXSLTID" value="$BrokerXSLT_wrapperFT" />
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.XsltsIDs" value="[ $BrokerXSLT_FARM_dc_anylanguage_to_ftRowset_anylanguage ]" />
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" userInputLabel="ID of index node to append" value="%userInput" />
  </initialization>
</job>
In the above example the job is named FullText Index OAI Tree Collections. Through its assignments it declares that it can only be matched to TreeManagerCollections with Type "ns5:OAI", and it defines all the other assignments needed to create the Full Text Index. This job therefore applies to all OAI collections. To declare the same job for other types, update the Type and XsltsIDs assignments.
- If you want a job to ask for user input at runtime, set the value of the respective assignment to %userInput. In addition, the userInputLabel attribute should describe the kind of value you expect the user to provide; this label helps the user enter the expected value.
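Following that advice, a job for another collection type would copy the OAI job and change only Type and XsltsIDs. The sketch below is hypothetical: the Type value ns5:EXAMPLE, the XSLT identifier $BrokerXSLT_EXAMPLE_to_ftRowset and the job name are placeholders invented for illustration. The real values depend on the collections and XSLTs registered in your infrastructure. The sketch also keeps the runtime prompt for the index node ID through %userInput and userInputLabel.

```xml
<!-- Hypothetical job: ns5:EXAMPLE and $BrokerXSLT_EXAMPLE_to_ftRowset are placeholders, not real identifiers -->
<job jobtype="FTIndexNodeCollection" name="FullText Index EXAMPLE Tree Collections">
  <initialization>
    <!-- Changed: match only TreeManagerCollections whose Type is ns5:EXAMPLE -->
    <assign to="%FTIndexNodeCollection.input.Type" value="ns5:EXAMPLE" />
    <!-- Kept from the OAI job; check that this index type suits the new schema -->
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IndexTypeID" value="ft_oai_dc_1.0" />
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.TransformationXSLTID" value="$BrokerXSLT_wrapperFT" />
    <!-- Changed: XSLT(s) converting the new schema to the full text rowset -->
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.XsltsIDs" value="[ $BrokerXSLT_EXAMPLE_to_ftRowset ]" />
    <!-- %userInput makes the portlet prompt the user at runtime, showing userInputLabel -->
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" userInputLabel="ID of index node to append" value="%userInput" />
  </initialization>
</job>
```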
Jobs' Inheritance
You can create new jobs by extending existing ones. To extend an existing job, add the extends attribute to the job element:
<job extends="FullText Index OAI Tree Collections" jobtype="FTIndexNodeCollection" name="FullText Index OAI Tree Collections-Extended">
In the above example you declare that a new Job is created with name: "FullText Index OAI Tree Collections-Extended" that extends the existing Job: "FullText Index OAI Tree Collections".
Assignments that are already defined in the parent job do not need to be defined again, unless you want to override them.
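As an illustration, the hypothetical child job below restricts its parent to a single collection and replaces the runtime prompt with a fixed index node ID. Only the added or overridden assignments are listed; everything else is inherited from the parent. Matching on ColID is an assumption: ColID appears in the JobType as a property of the input collection, but the text does not say that it can be used for matching. The values my_collection_id and my_index_node_id are placeholders.

```xml
<!-- Hypothetical child job: my_collection_id and my_index_node_id are placeholders -->
<job extends="FullText Index OAI Tree Collections" jobtype="FTIndexNodeCollection" name="FullText Index OAI Tree Collections-Extended">
  <initialization>
    <!-- Added: match only one collection (assumes ColID can be matched like Type) -->
    <assign to="%FTIndexNodeCollection.input.ColID" value="my_collection_id" />
    <!-- Overridden: a fixed value instead of the parent's %userInput prompt -->
    <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" value="my_index_node_id" />
    <!-- Type, IndexTypeID, TransformationXSLTID and XsltsIDs are inherited from the parent job -->
  </initialization>
</job>
```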
- The latest Bootstrapper configuration can be downloaded from here: IRBootstrapperConfiguration.xml (https://gcube.wiki.gcube-system.org/gcube/images_gcube/b/ba/IRBootstrapperConfiguration.xml)