Process Optimisation

From Gcube Wiki

Revision as of 14:35, 24 February 2009

Introduction

The gCube Process Optimisation Services (POS) implement core functionality, in the form of libraries and web services, for process scheduling and execution planning. gCube POS is used by the CSEngine to deliver optimised process execution in the context of a VRE.

Implementation Overview

POS consists of a core optimisation library (POSLib) and two Web Services (RewriterService and PlannerService) that expose part of the library's functionality. POSLib implements three core components of process optimisation:

Rewriter

Provides structural optimisation of a process. It receives a BPEL process as input, analyzes its structure, identifies independent invocations and arranges them in parallel constructs (BPEL flow elements) in order to accelerate the overall process execution. It is the first optimisation step and takes place before the process reaches the execution engine.
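To illustrate what "identifying independent invocations" can mean, here is a minimal Java sketch; it is not taken from POSLib. It assumes a hypothetical Invocation model that records which BPEL variables each invoke reads and writes, and treats two invocations as independent when neither writes a variable the other reads or writes. A sequence is then packed greedily into stages: the invocations of one stage could be wrapped in a single BPEL flow, while the stages keep their original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a BPEL invoke: the variables it reads and writes. */
final class Invocation {
	final String name;
	final Set<String> reads;
	final Set<String> writes;

	Invocation(String name, List<String> reads, List<String> writes) {
		this.name = name;
		this.reads = new HashSet<>(reads);
		this.writes = new HashSet<>(writes);
	}

	/** Independent = no read/write or write/write conflict on any variable. */
	boolean independentOf(Invocation other) {
		return Collections.disjoint(writes, other.reads)
				&& Collections.disjoint(reads, other.writes)
				&& Collections.disjoint(writes, other.writes);
	}
}

final class FlowGrouper {
	/**
	 * Greedily packs invocations into stages of pairwise independent invocations.
	 * Each stage could become one BPEL flow; stages run in their original order.
	 */
	static List<List<String>> group(List<Invocation> sequence) {
		List<List<String>> stages = new ArrayList<>();
		List<Invocation> stage = new ArrayList<>();
		List<String> stageNames = new ArrayList<>();
		for (Invocation inv : sequence) {
			boolean fits = true;
			for (Invocation member : stage) {
				if (!inv.independentOf(member)) {
					fits = false;
					break;
				}
			}
			if (!fits) { // conflict: close the current stage and start a new one
				stages.add(stageNames);
				stage = new ArrayList<>();
				stageNames = new ArrayList<>();
			}
			stage.add(inv);
			stageNames.add(inv.name);
		}
		if (!stage.isEmpty()) {
			stages.add(stageNames);
		}
		return stages;
	}

	public static void main(String[] args) {
		List<Invocation> process = Arrays.asList(
				new Invocation("lookupA", Arrays.asList("query"), Arrays.asList("resA")),
				new Invocation("lookupB", Arrays.asList("query"), Arrays.asList("resB")),
				new Invocation("join", Arrays.asList("resA", "resB"), Arrays.asList("joined")),
				new Invocation("sort", Arrays.asList("joined"), Arrays.asList("sorted")));
		System.out.println(group(process)); // [[lookupA, lookupB], [join], [sort]]
	}
}
```

The greedy pass only merges neighbouring invocations; a full rewriter would work on the whole dependency graph and on the other BPEL control constructs, which this sketch ignores.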

Planner

Performs pre-planning of the process execution. It receives an abstract BPEL process and generates candidate scheduling plans for its execution. Generating an executable plan means replacing every reference to an abstract service with an invocation of a concrete, instantiated service in a gCube infrastructure. The Planner uses information provided by the IS, which holds up-to-date metrics for the resources employed in the grid (machines, services, etc.). This information is fed into various cost functions that calculate the individual execution cost of a candidate plan. The best plans are selected by a custom implementation of the Simulated Annealing algorithm. The outcome of planning is a set of executable BPEL processes that are passed to the gCube execution engine. Cost calculation can be guided by weighted optimisation policies that the author (human or application) of the BPEL process declares inside the BPEL description.
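The Planner's search can be pictured with a textbook simulated annealing loop. The Java sketch below is not the POS implementation (which the page describes as a custom variant): the cost matrix, the gHN assignment of each candidate and the co-location penalty, which mimics the Planner's reluctance to co-schedule invocations on one gHN, are hypothetical stand-ins for the IS-fed cost functions.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Textbook simulated annealing over plans. A plan assigns each partnerLink i
 * a candidate Running Instance index plan[i]. The cost model is hypothetical.
 */
final class AnnealingPlanner {
	private final double[][] cost;  // cost[i][c]: individual cost of candidate c for partnerLink i
	private final String[][] host;  // host[i][c]: gHN id of that candidate
	private final double colocationPenalty;

	AnnealingPlanner(double[][] cost, String[][] host, double colocationPenalty) {
		this.cost = cost;
		this.host = host;
		this.colocationPenalty = colocationPenalty;
	}

	/** Sum of individual costs plus a penalty for every pair of partnerLinks sharing a gHN. */
	double planCost(int[] plan) {
		double total = 0;
		for (int i = 0; i < plan.length; i++) {
			total += cost[i][plan[i]];
			for (int j = i + 1; j < plan.length; j++) {
				if (host[i][plan[i]].equals(host[j][plan[j]])) {
					total += colocationPenalty;
				}
			}
		}
		return total;
	}

	/** Returns the cheapest plan visited, starting from candidate 0 for every partnerLink. */
	int[] anneal(long seed, int iterations, double startTemperature, double coolingRate) {
		Random random = new Random(seed);
		int[] current = new int[cost.length];
		double currentCost = planCost(current);
		int[] best = current.clone();
		double bestCost = currentCost;
		double temperature = startTemperature;

		for (int k = 0; k < iterations; k++) {
			// Neighbour plan: move one random partnerLink to a random candidate
			int[] next = current.clone();
			int i = random.nextInt(next.length);
			next[i] = random.nextInt(cost[i].length);
			double nextCost = planCost(next);
			double delta = nextCost - currentCost;

			// Always accept improvements; accept worse plans with probability exp(-delta / T)
			if (delta <= 0 || random.nextDouble() < Math.exp(-delta / temperature)) {
				current = next;
				currentCost = nextCost;
				if (currentCost < bestCost) {
					best = current.clone();
					bestCost = currentCost;
				}
			}
			temperature *= coolingRate; // geometric cooling schedule
		}
		return best;
	}

	public static void main(String[] args) {
		double[][] cost = {{5, 1}, {4, 2}, {3, 3}};
		String[][] host = {{"gHN1", "gHN2"}, {"gHN3", "gHN2"}, {"gHN4", "gHN5"}};
		AnnealingPlanner planner = new AnnealingPlanner(cost, host, 10);
		int[] plan = planner.anneal(42L, 2000, 10.0, 0.995);
		System.out.println(Arrays.toString(plan) + " cost " + planner.planCost(plan));
	}
}
```

Accepting some worse plans is what lets the search leave local minima: in the example, the plan {0, 1, 0} (cost 10) cannot be improved by any single reassignment, while the global optimum {1, 0, x} costs 8.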

ActivePlanner

Provides run-time optimised scheduling of a gCube process. It is invoked by the execution engine before every invocation activity to ensure that the plan generated by the Planner during pre-planning is still valid (e.g. the selected service endpoint is still reachable) and still optimal according to the user-defined optimisation policies. If either criterion is violated, the ActivePlanner selects a new optimal service instance for the current invocation. It can also work when no pre-planning is available.
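The validity and optimality check described above can be sketched as follows. This is not the POSLib API: the reachability predicate and the cost function are hypothetical placeholders for the endpoint probe and the policy-driven cost that the ActivePlanner would derive from IS data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of the ActivePlanner's pre-invocation check. */
final class ActivePlanCheck {
	/**
	 * Keeps the pre-planned endpoint if it is still reachable and no reachable
	 * candidate is cheaper; otherwise returns the cheapest reachable candidate.
	 * plannedEndpoint may be null when no pre-planning is available.
	 * Returns null if no endpoint is reachable.
	 */
	static String chooseEndpoint(String plannedEndpoint, List<String> candidates,
			Predicate<String> reachable, ToDoubleFunction<String> cost) {
		String best = null;
		double bestCost = Double.POSITIVE_INFINITY;
		for (String candidate : candidates) {
			if (!reachable.test(candidate)) {
				continue; // unreachable instances are never selected
			}
			double c = cost.applyAsDouble(candidate);
			if (c < bestCost) {
				best = candidate;
				bestCost = c;
			}
		}
		boolean planStillValid = plannedEndpoint != null && reachable.test(plannedEndpoint);
		if (planStillValid && cost.applyAsDouble(plannedEndpoint) <= bestCost) {
			return plannedEndpoint; // valid and still optimal: no rescheduling
		}
		return best; // re-evaluated choice
	}

	public static void main(String[] args) {
		List<String> candidates = Arrays.asList("http://a.example/svc", "http://b.example/svc");
		// b would be cheaper but is unreachable, so the planned endpoint a is kept
		String chosen = chooseEndpoint("http://a.example/svc", candidates,
				url -> !url.contains("b.example"),
				url -> url.contains("a.example") ? 3.0 : 1.0);
		System.out.println(chosen); // http://a.example/svc
	}
}
```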


The Rewriter and the Planner are also available as Web Services; the ActivePlanner is available only as part of POSLib.

Optimisation Policies

The Planner and ActivePlanner components perform optimised scheduling of abstract BPEL processes based on user-defined policies. Optimisation policies are declared within the BPEL document and can apply either to individual partnerLinks or to the whole process.

In a BPEL document, partnerLinkTypes define the classes of Web Services that can participate in a process in one or more roles. A particular instantiation of a partnerLinkType inside the process is denoted by a partnerLink element. A process may include several partnerLinks of the same partnerLinkType, each participating in the process with a different role. In practice, a partnerLink corresponds to the Running Instance whose operations can be invoked during the execution of a process.

The selection of the specific Running Instance to be used in a particular invocation is driven by the optimisation policy applied either at the process level or at the partnerLink level. Currently POS supports six optimisation policies:

  • Host load: the gHN with the lowest system load is used for scheduling the invocation.
  • Fastest CPU: gHNs are ranked by their CPU capabilities and the best one is used for scheduling the invocation.
  • Memory Utilization: gHNs are ranked by the percentage of available memory as reported by the Java VM. The gHN with the highest percentage is selected.
  • Storage Utilization: gHNs are ranked by their total available space. The gHN with the most available space is preferred.
  • Reliability: gHNs are ranked by their total uptime. The gHN that has been up and running for the longest period is ranked highest. The idea behind this policy is that a gHN that has not gone offline for a long time is less likely to go down while a process invocation is taking place.
  • Network Utilization: this is a so-called "whole plan" optimisation policy. When the Planner evaluates multiple possible scheduling plans, it gives preference to plans whose Running Instances are located close to each other (based on the reported gHN locality information). Note, however, that the Planner tries to avoid co-scheduling invocations on the same gHN so as not to overload it; co-scheduling is used only as a last resort, when no other gHNs are available.
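The five per-gHN policies above each boil down to a ranking criterion. The following Java sketch assumes a hypothetical GhnMetrics snapshot of the values the IS reports; it is not the POS cost-function API. Network Utilization is left out because it scores whole plans rather than single gHNs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical snapshot of the gHN metrics that the IS might report. */
final class GhnMetrics {
	final String id;
	final double load;          // system load: lower is better
	final double cpuScore;      // CPU capability: higher is better
	final double freeMemoryPct; // % of JVM memory available: higher is better
	final long freeStorage;     // available space in bytes: higher is better
	final long uptimeSeconds;   // total uptime: higher is better

	GhnMetrics(String id, double load, double cpuScore, double freeMemoryPct,
			long freeStorage, long uptimeSeconds) {
		this.id = id;
		this.load = load;
		this.cpuScore = cpuScore;
		this.freeMemoryPct = freeMemoryPct;
		this.freeStorage = freeStorage;
		this.uptimeSeconds = uptimeSeconds;
	}
}

/** Per-gHN policies, each expressed as a "best first" ordering. */
enum NodePolicy {
	HOST_LOAD(Comparator.comparingDouble((GhnMetrics g) -> g.load)),
	FASTEST_CPU(Comparator.comparingDouble((GhnMetrics g) -> g.cpuScore).reversed()),
	MEMORY_UTILIZATION(Comparator.comparingDouble((GhnMetrics g) -> g.freeMemoryPct).reversed()),
	STORAGE_UTILIZATION(Comparator.comparingLong((GhnMetrics g) -> g.freeStorage).reversed()),
	RELIABILITY(Comparator.comparingLong((GhnMetrics g) -> g.uptimeSeconds).reversed());

	private final Comparator<GhnMetrics> bestFirst;

	NodePolicy(Comparator<GhnMetrics> bestFirst) {
		this.bestFirst = bestFirst;
	}

	/** Returns the gHN ranked highest by this policy. */
	GhnMetrics select(List<GhnMetrics> nodes) {
		return Collections.min(nodes, bestFirst);
	}

	public static void main(String[] args) {
		List<GhnMetrics> nodes = Arrays.asList(
				new GhnMetrics("gHN1", 0.5, 2.0, 30, 100, 1000),
				new GhnMetrics("gHN2", 0.2, 1.0, 60, 50, 5000),
				new GhnMetrics("gHN3", 0.9, 3.0, 10, 500, 10));
		for (NodePolicy policy : values()) {
			System.out.println(policy + " -> " + policy.select(nodes).id);
		}
	}
}
```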

One additional optimisation policy, Monetary Cost, is reserved but not yet fully implemented. It instructs the Planner to select the best Running Instances based on the monetary charge defined by the Running Instance provider. Currently, though, all services in the two user community infrastructures established with gCube (EM and FARM) are provided free of charge.

Abstract and Concrete Service References

In gCube we distinguish three categories of partnerLinkTypes:

  • concreteGCubeService: a static reference to a specific Running Instance. The Planner will not try to reschedule the assignment of this partnerLinkType. Whenever such a partnerLink is declared in the process, the static reference to the Running Instance is used in the relevant invocations. The user provides a static URL inside the partnerLink element, to be used in every process invocation that involves that partnerLink.
  • concreteExternalService: similar in practice to the above, but points to a Web Service outside the gCube infrastructure. Thus this service is not a Running Instance of any gCube service deployed in the VRE.
  • abstractGCubeService: partnerLinkTypes whose partnerLinks can be rescheduled at any time by either the Planner or the ActivePlanner. The selection of the specific Running Instance to use depends on the optimisation policies declared in the BPEL document and, of course, on the current state of the VRE infrastructure as reflected by the information provided by the IS.
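The practical difference between the three categories is whether the endpoint is fixed by the user or chosen by the planners. A minimal Java sketch, assuming the category names are used verbatim as attribute values and using a hypothetical scheduler callback in place of the Planner/ActivePlanner:

```java
import java.util.function.Supplier;

/** The three partnerLinkType categories, keyed by their (assumed) attribute values. */
enum ServiceCategory {
	CONCRETE_GCUBE("concreteGCubeService", false),
	CONCRETE_EXTERNAL("concreteExternalService", false),
	ABSTRACT_GCUBE("abstractGCubeService", true);

	final String attributeValue;
	final boolean reschedulable; // may the Planner/ActivePlanner choose the Running Instance?

	ServiceCategory(String attributeValue, boolean reschedulable) {
		this.attributeValue = attributeValue;
		this.reschedulable = reschedulable;
	}

	static ServiceCategory fromAttribute(String value) {
		for (ServiceCategory category : values()) {
			if (category.attributeValue.equals(value)) {
				return category;
			}
		}
		throw new IllegalArgumentException("Unknown service type: " + value);
	}

	/**
	 * Concrete references always use the static URL given in the partnerLink;
	 * abstract ones ask the (hypothetical) scheduler for a Running Instance.
	 */
	String resolveEndpoint(String staticUrl, Supplier<String> scheduler) {
		return reschedulable ? scheduler.get() : staticUrl;
	}

	public static void main(String[] args) {
		ServiceCategory category = fromAttribute("abstractGCubeService");
		System.out.println(category.resolveEndpoint("http://static.example/svc",
				() -> "http://scheduled.example/svc")); // http://scheduled.example/svc
	}
}
```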

BPEL Optimisation Extensions

gCube POS functionality depends heavily on the Business Process Execution Language (BPEL) standard. The notation used to represent processes is based on BPEL v1.1. The standard has been extended to include optimisation information such as per-partnerLink policy information, the definition of abstract or concrete services, allocation relationships between invocations, etc. The XML schema of the extended BPEL 1.1 can be found here.

Possible values of optimisation policy attribute list

Below is the XML schema of the policy values that can be used in a BPEL document. The policy names are self-explanatory and refer to the policies described in the previous paragraphs.

<simpleType name="policyValues">
        <restriction base="string">
		<enumeration value="host_load"/>
		<enumeration value="fastest_cpu"/>
		<enumeration value="network_utilization"/>
		<enumeration value="memory_utilization"/>
		<enumeration value="storage_utilization"/>
		<enumeration value="reliability"/>
		<enumeration value="monetary_cost"/>
	</restriction>
</simpleType>

Note that the order of the policy definitions is important. For instance, the policy list "network_utilization reliability" first tries to satisfy the Network Utilization requirement and then the Reliability one. Formally speaking, the order of the policies defines the weight of the respective individual execution cost when the Planner calculates the total plan cost.
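The page does not spell out how the order is turned into weights. The Java sketch below therefore assumes one simple scheme, linearly decreasing weights normalised to sum to 1, purely to illustrate how reordering the same policies changes the total plan cost; the per-policy costs are made-up inputs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedPlanCost {
	/**
	 * Combines the per-policy costs of a plan into a single total.
	 * ASSUMPTION: the k-th of n policies (k = 0 first) gets weight (n - k) / (1 + 2 + ... + n),
	 * so earlier policies weigh more and the weights sum to 1. POS may use another scheme.
	 */
	static double combine(List<String> orderedPolicies, Map<String, Double> costPerPolicy) {
		int n = orderedPolicies.size();
		double weightSum = n * (n + 1) / 2.0;
		double total = 0;
		for (int k = 0; k < n; k++) {
			Double cost = costPerPolicy.get(orderedPolicies.get(k));
			if (cost == null) {
				throw new IllegalArgumentException("No cost for policy " + orderedPolicies.get(k));
			}
			total += (n - k) / weightSum * cost;
		}
		return total;
	}

	public static void main(String[] args) {
		Map<String, Double> costs = new HashMap<>();
		costs.put("network_utilization", 3.0);
		costs.put("reliability", 6.0);
		// weights 2/3 and 1/3: 2/3 * 3.0 + 1/3 * 6.0 = 4.0
		System.out.println(combine(Arrays.asList("network_utilization", "reliability"), costs));
		// reversed order: 1/3 * 3.0 + 2/3 * 6.0 = 5.0
		System.out.println(combine(Arrays.asList("reliability", "network_utilization"), costs));
	}
}
```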

Process-wide policy definition

To define process-wide optimisation policies, use the optimisationPolicy attribute of the BPEL process element. The attribute is a space-separated list of optimisation policy names. For example, the process defined by the BPEL excerpt below will be scheduled according to the fastest_cpu policy (higher weight) and the storage_utilization policy (lower weight).

<process optimisationPolicy="fastest_cpu storage_utilization"
		xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
		xmlns:plnk="http://schemas.xmlsoap.org/ws/2003/05/partner-link/"
		xmlns:tns="http://diligentproject.org/searchservice/diligentprocess"
		targetNamespace="http://diligentproject.org/searchservice/diligentprocess"
		name="BPELDiligentProcessJAXB1826584379"
		abstractProcess="no"
		xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
		xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
		xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
		xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		xsi:schemaLocation="http://schemas.xmlsoap.org/ws/2003/03/business-process/ C:\Development\workspace\ProcessOptimisation\etc\schema\bpel+.xsd">
	<partnerLinkTypes>
		<partnerLinkType name="BPELDiligentProcess">
			<role name="BPELDiligentProcessProvider"> 
				<portType serviceType="concreteDiligentService" name="tns:BPELDiligentProcess"/>
			</role>
		</partnerLinkType>
		<partnerLinkType name="fulltextindexlookupserviceLT">
			<role name="fulltextindexlookupserviceRole">
				<portType xmlns:fulltextindexlookupservice="http://diligentproject.org/namespaces/index/FullTextIndexLookupService" serviceType="concreteDiligentService" name="fulltextindexlookupservice:FullTextIndexLookupPortType"/>
			</role>

...

If no policy is defined, the host_load optimisation policy is used by default.
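Reading the optimisationPolicy attribute amounts to splitting a space-separated list, checking each name against the policyValues enumeration and falling back to host_load. A minimal Java sketch under those assumptions; the class name is hypothetical, and rejecting unknown names is a choice of this sketch rather than documented POS behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PolicyListParser {
	/** The values allowed by the policyValues simpleType of the extended schema. */
	static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
			"host_load", "fastest_cpu", "network_utilization", "memory_utilization",
			"storage_utilization", "reliability", "monetary_cost"));

	/**
	 * Splits a space-separated policy list, preserving order (highest weight first).
	 * A missing or blank attribute yields the documented default, host_load.
	 */
	static List<String> parse(String attribute) {
		List<String> policies = new ArrayList<>();
		if (attribute != null) {
			for (String token : attribute.trim().split("\\s+")) {
				if (token.isEmpty()) {
					continue; // "".split(...) yields a single empty token
				}
				if (!ALLOWED.contains(token)) {
					throw new IllegalArgumentException("Unknown optimisation policy: " + token);
				}
				policies.add(token);
			}
		}
		if (policies.isEmpty()) {
			policies.add("host_load");
		}
		return policies;
	}

	public static void main(String[] args) {
		System.out.println(parse("fastest_cpu storage_utilization")); // [fastest_cpu, storage_utilization]
		System.out.println(parse(null)); // [host_load]
	}
}
```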

Service specific policy definition

The policies defined at the process level apply to the planning of all partnerLinks included in the process, unless a partnerLink-specific policy is defined on the partnerLink element. To define such a policy, use the partnerLinkPolicyType attribute of the BPEL partnerLink element; its usage is the same as that of the policy definition at the process level. Again, if no policy is defined, the Planner uses host_load as the default. If the network_utilization policy is specified here, it is ignored, because it applies only at the process level and not at the partnerLink level.
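These precedence rules can be summarised in a short Java sketch. It assumes that a partnerLink list reduced to nothing (for example one containing only network_utilization) falls back to the process-level list, and that network_utilization is also dropped from an inherited process-level list when ranking instances for a single partnerLink; both are interpretations, not documented POS behaviour.

```java
import java.util.ArrayList;
import java.util.List;

final class EffectivePolicy {
	/** Policies used to rank Running Instances for one partnerLink. */
	static List<String> forPartnerLink(String processPolicies, String partnerLinkPolicies) {
		List<String> own = perLink(partnerLinkPolicies);
		if (!own.isEmpty()) {
			return own; // a partnerLink-specific policy overrides the process-wide one
		}
		List<String> inherited = perLink(processPolicies);
		if (!inherited.isEmpty()) {
			return inherited;
		}
		List<String> fallback = new ArrayList<>();
		fallback.add("host_load"); // documented default
		return fallback;
	}

	/** Splits a policy list and drops network_utilization, a whole-plan policy. */
	private static List<String> perLink(String attribute) {
		List<String> result = new ArrayList<>();
		if (attribute == null) {
			return result;
		}
		for (String token : attribute.trim().split("\\s+")) {
			if (!token.isEmpty() && !token.equals("network_utilization")) {
				result.add(token);
			}
		}
		return result;
	}

	public static void main(String[] args) {
		System.out.println(forPartnerLink("fastest_cpu", "storage_utilization fastest_cpu"));
		System.out.println(forPartnerLink("reliability", "network_utilization"));
		System.out.println(forPartnerLink(null, null));
	}
}
```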

Below is an excerpt from a BPEL partnerLinks definition that demonstrates the above.

...
<partnerLinks>
	<partnerLink partnerLinkType="tns:BPELDiligentProcess" name="client" myRole="BPELDiligentProcessProvider"/>
	<partnerLink xmlns:fulltextindexlookupservice="http://diligentproject.org/namespaces/index/FullTextIndexLookupService" partnerRole="fulltextindexlookupserviceRole" partnerLinkType="fulltextindexlookupservice:fulltextindexlookupserviceLT" name="fulltextindexlookupservicePLfulltextindexlookupserviceLT0" partnerLinkPolicyType="fastest_cpu"/>
	<partnerLink xmlns:sortoperatorservice="http://diligentproject.org/namespaces/searchservice/SortOperatorService" partnerRole="sortoperatorserviceRole" partnerLinkType="sortoperatorservice:sortoperatorserviceLT" name="sortoperatorservicePLsortoperatorserviceLT" partnerLinkPolicyType="storage_utilization fastest_cpu"/>
	<partnerLink xmlns:keeptopoperatorservice="http://diligentproject.org/namespaces/searchservice/KeepTopOperatorService" partnerRole="keeptopoperatorserviceRole" partnerLinkType="keeptopoperatorservice:keeptopoperatorserviceLT" name="keeptopoperatorservicePLkeeptopoperatorserviceLT" partnerLinkPolicyType="fastest_cpu"/>
	<partnerLink xmlns:transformbyxsltoperatorservice="http://diligentproject.org/namespaces/searchservice/TransformByXSLTOperatorService" partnerRole="transformbyxsltoperatorserviceRole" partnerLinkType="transformbyxsltoperatorservice:transformbyxsltoperatorserviceLT" name="transformbyxsltoperatorservicePLtransformbyxsltoperatorserviceLT"/>
	<partnerLink xmlns:joininneroperatorservice="http://diligentproject.org/namespaces/searchservice/JoinInnerOperatorService" partnerRole="joininneroperatorserviceRole" partnerLinkType="joininneroperatorservice:joininneroperatorserviceLT" name="joininneroperatorservicePLjoininneroperatorserviceLT"/>
	<partnerLink xmlns:filterresultsetbyxpathoperatorservice="http://diligentproject.org/namespaces/searchservice/FilterResultSetByXPathOperatorService" partnerRole="filterresultsetbyxpathoperatorserviceRole" partnerLinkType="filterresultsetbyxpathoperatorservice:filterresultsetbyxpathoperatorserviceLT" name="filterresultsetbyxpathoperatorservicePLfilterresultsetbyxpathoperatorserviceLT"/>
</partnerLinks>
...
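POS itself uses JAXB to parse BPEL documents. As a lighter illustration only, the sketch below uses the JDK's built-in DOM parser to pull the two policy attributes out of a document shaped like the excerpts above; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Reads optimisation policy attributes from a BPEL document with the JDK DOM parser. */
final class BpelPolicyReader {
	static final String BPEL_NS = "http://schemas.xmlsoap.org/ws/2003/03/business-process/";

	private static Document parse(String bpelXml) {
		try {
			DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
			factory.setNamespaceAware(true); // required for getElementsByTagNameNS
			return factory.newDocumentBuilder().parse(new InputSource(new StringReader(bpelXml)));
		} catch (Exception e) {
			throw new IllegalArgumentException("Not a parsable BPEL document", e);
		}
	}

	/** The process-wide optimisationPolicy attribute ("" if absent). */
	static String processPolicy(String bpelXml) {
		return parse(bpelXml).getDocumentElement().getAttribute("optimisationPolicy");
	}

	/** Maps each partnerLink name to its partnerLinkPolicyType ("" if absent), in document order. */
	static Map<String, String> partnerLinkPolicies(String bpelXml) {
		NodeList links = parse(bpelXml).getElementsByTagNameNS(BPEL_NS, "partnerLink");
		Map<String, String> policies = new LinkedHashMap<>();
		for (int i = 0; i < links.getLength(); i++) {
			Element link = (Element) links.item(i);
			policies.put(link.getAttribute("name"), link.getAttribute("partnerLinkPolicyType"));
		}
		return policies;
	}

	public static void main(String[] args) {
		String bpel = "<process xmlns=\"" + BPEL_NS + "\" optimisationPolicy=\"fastest_cpu storage_utilization\">"
				+ "<partnerLinks>"
				+ "<partnerLink name=\"client\" myRole=\"BPELDiligentProcessProvider\"/>"
				+ "<partnerLink name=\"sort\" partnerLinkPolicyType=\"storage_utilization fastest_cpu\"/>"
				+ "</partnerLinks></process>";
		System.out.println(processPolicy(bpel)); // fastest_cpu storage_utilization
		System.out.println(partnerLinkPolicies(bpel)); // {client=, sort=storage_utilization fastest_cpu}
	}
}
```

An empty string for a partnerLink means no partnerLink-specific policy, so the process-wide list (or host_load) applies.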

Dependencies

POSLib depends on the following components:

  • ResourceManager - All queries to the IS are performed through the ResourceManager, taking advantage of the caching functionality that the component implements.
  • Java Architecture for XML Binding (JAXB) - Sun's reference implementation of JAXB is used by the Planner to parse BPEL documents and to extract optimisation-related information. It is also used by the Rewriter for reading and reconstructing BPEL processes in order to optimise them.
  • gCore - As with most gCube components, POSLib depends on gCore, not only because it provides the container where the PlannerService and RewriterService are deployed, but also because it indirectly provides access to a set of supporting libraries that are used extensively in various components of the library.

Usage Example

The following subsections contain usage examples for the three main POS components (PlannerService, RewriterService and ActivePlanner). Although in the current gCube architecture POS is used only by the CSEngine, the implemented classes and web services can be used by any other component that needs to optimise BPEL processes. Besides the three main components described below, the rest of the POS functionality, such as the Cost Functions and other utility classes, can also be used by other sub-systems of a VRE. Interested developers should consult the POS API documentation, included in the POS binaries distribution, for further information.

BPEL Static optimisation

Below is an example of a simple Web Service client that uses the RewriterService to structurally optimise a BPEL document.

package org.diligentproject.process.optimisation.clients;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.rmi.RemoteException;

import javax.xml.rpc.ServiceException;

import org.diligentproject.process.optimisation.services.rewriter.stubs.service.RewriterServiceLocator;
import org.diligentproject.process.optimisation.services.rewriter.stubs.RewriterPortType;
import org.diligentproject.process.optimisation.services.rewriter.stubs.BPELParsingErrorFaultType;

public class RewriterTestClient {

	/** Reads the whole BPEL file into a String (read() returns -1 at end of file). */
	private static String readFile(String fileName) throws IOException {
		Reader reader = new FileReader(fileName);
		try {
			StringBuilder contents = new StringBuilder();
			char[] buffer = new char[4096];
			int n;
			while ((n = reader.read(buffer)) != -1) {
				contents.append(buffer, 0, n);
			}
			return contents.toString();
		} finally {
			reader.close();
		}
	}

	public static void main(String[] args) {
		if (args.length != 2) {
			System.out.println("Usage: RewriterTestClient serviceUrl bpelFilePath");
			System.exit(-1);
		}

		String fileName = args[1];
		System.out.println("Processing file " + fileName + "...");

		try {
			String contents = readFile(fileName);
			System.out.println("Input:");
			System.out.println(contents);

			// Get a reference to the RewriterService at the given URL
			RewriterServiceLocator locator = new RewriterServiceLocator();
			RewriterPortType rws = locator.getRewriterPortTypePort(new URL(args[0]));

			System.out.println("Rewriting...");
			System.out.println("Output:");

			// Optimise the process and print the contents of the resulting BPEL document
			System.out.println(rws.optimiseProcess(contents));

		} catch (BPELParsingErrorFaultType e) {
			// Thrown by the stub if the input cannot be parsed as a BPEL process
			System.out.println("Error optimising BPEL process");
			e.printStackTrace();
		} catch (MalformedURLException e) {
			System.out.println("Invalid service URL: " + args[0]);
			e.printStackTrace();
		} catch (ServiceException e) {
			e.printStackTrace();
		} catch (RemoteException e) {
			// Any other communication failure with the service
			e.printStackTrace();
		} catch (IOException e) {
			// Includes FileNotFoundException when the BPEL file does not exist
			e.printStackTrace();
		}
	}
}

Process Pre-planning

The following client demonstrates how to use the PlannerService to produce a list of execution plans for a given abstract Process.

package org.diligentproject.process.optimisation.clients;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.rmi.RemoteException;
import java.util.Date;

import javax.xml.rpc.ServiceException;

import org.diligentproject.process.optimisation.services.planner.stubs.service.PlannerServiceLocator;
import org.diligentproject.process.optimisation.services.planner.stubs.PlannerPortType;

import org.diligentproject.process.optimisation.services.planner.stubs.PlanEntry;
import org.diligentproject.process.optimisation.services.planner.stubs.PlanList;
import org.diligentproject.process.optimisation.services.planner.stubs.Plan;
import org.diligentproject.process.optimisation.services.planner.stubs.RunningInstanceNotFoundFaultType;
import org.diligentproject.process.optimisation.services.planner.stubs.BPELParsingErrorFaultType;

/**
 * Client for testing the PlannerService
 */
public class PlannerTestClient {

	/** Reads the whole BPEL file into a String (read() returns -1 at end of file). */
	private static String readFile(String fileName) throws IOException {
		Reader reader = new FileReader(fileName);
		try {
			StringBuilder contents = new StringBuilder();
			char[] buffer = new char[4096];
			int n;
			while ((n = reader.read(buffer)) != -1) {
				contents.append(buffer, 0, n);
			}
			return contents.toString();
		} finally {
			reader.close();
		}
	}

	/**
	 * Usage: PlannerTestClient serviceUrl bpelFilePath
	 * 
	 * @param args command-line arguments: the 'serviceUrl' and the 'bpelFilePath'
	 */
	public static void main(String[] args) {
		if (args.length != 2) {
			System.out.println("Usage: PlannerTestClient serviceUrl bpelFilePath");
			System.exit(-1);
		}

		String fileName = args[1];

		Date startTime = new Date();
		System.out.println("Starting at " + startTime);
		System.out.println("Processing file " + fileName + "...");

		try {
			String contents = readFile(fileName);
			System.out.println("Input:");
			System.out.println(contents);

			// Get the service from the given location
			PlannerServiceLocator locator = new PlannerServiceLocator();
			PlannerPortType planner = locator.getPlannerPortTypePort(new URL(args[0]));

			System.out.println("Planning...");

			/* The result of the planner is a list of scheduling plans sorted
			 * by their execution cost and stored in a PlanList object.
			 */
			PlanList planList = planner.createPlan(contents);

			System.out.println("Output:");

			// The array of plans is a member field of the PlanList object
			Plan[] plans = planList.getPlan();

			// For each plan, print its cost and the timestamp of its creation
			for (int i = 0; i < plans.length; i++) {
				System.out.println("Cost of plan[" + i + "] is " + plans[i].getCost());
				System.out.println("Created on " + new Date(plans[i].getTime().getTimeInMillis()));
				System.out.println("Contents: ");

				/* Each entry in the plan is a partnerLink of the initial BPEL process
				 * that, after planning, points to a concrete web service (Running Instance).
				 */
				PlanEntry[] entries = plans[i].getPlanEntry();

				/* For each scheduled partnerLink, print the information stored by
				 * the Planner: the endpoint of the Web Service, the partnerLink name
				 * as it appears in the BPEL document, the individual execution cost
				 * calculated by the Planner, the name of the service portType taken
				 * from the WSDL and the ID of the hosting node where the service is
				 * instantiated.
				 */
				for (int j = 0; j < entries.length; j++) {
					System.out.println(entries[j].getEndPoint() + ", "
							+ entries[j].getPartnerLink() + ", "
							+ entries[j].getCost() + ", "
							+ entries[j].getPortTypeName() + ", "
							+ entries[j].getDhnID());
				}
			}

			Date endTime = new Date();
			System.out.println("\nFinished at " + endTime);

			long sec = (endTime.getTime() - startTime.getTime()) / 1000;
			System.out.println("Total time: " + sec + " sec");

		/* Below are all the exceptions that can be thrown while
		 * running the above statements (fault types before RemoteException,
		 * RemoteException and MalformedURLException before IOException).
		 */
		} catch (RunningInstanceNotFoundFaultType e) {
			e.printStackTrace();
		} catch (BPELParsingErrorFaultType e) {
			e.printStackTrace();
		} catch (MalformedURLException e) {
			System.out.println("Invalid service URL: " + args[0]);
			e.printStackTrace();
		} catch (ServiceException e) {
			e.printStackTrace();
		} catch (RemoteException e) {
			e.printStackTrace();
		} catch (IOException e) {
			// Includes FileNotFoundException when the BPEL file does not exist
			e.printStackTrace();
		}
	}
}

Dynamic planning using the ActivePlanner

[coming soon]