From Gcube Wiki

Revision as of 12:12, 23 August 2007

Search Operators

Introduction

The Search Operator family of services are the building blocks of any search operation. Together with services external to the Search framework, they handle the production, filtering and refinement of available data according to user queries, and the intermediate steps towards the final search output are carried out by Search Operator services. This section describes only the services internal to the Search Service, listed below, although the Search Operator Framework can also integrate, at a high level, other services that may be used within a search operation.

The following operators are implemented as stateless services. Each receives its input and produces its output within a single invocation, without holding any intermediate state. Whenever data must be transferred, either as input to a service or as output of its processing, the ResultSet Framework is used.

The search operators cover the basic functionality encountered in a typical search operation. A search can be decomposed into indivisible units consisting of these operators, and their interaction forms a workflow that produces the final result delivered to the requester. The external source search and service invocation services make the framework extensible to future operators. They can invoke a service unknown to the Search framework and import its results into the search operator workflow. The search operators currently available are listed below.

Example Code

Search Operators Usage Examples

Operators

BooleanOperator

Description

The Boolean Operator is used in conditional execution, specifically to evaluate the condition. It therefore makes it possible to choose between alternative execution plans. For example, one can follow one plan (say, a projection on a given field of a set of data) if a given precondition holds. Otherwise, one can follow an alternative plan (e.g. a projection on another field of the same data, followed by a sort on that field). Validating the precondition is the responsibility of this service.

The condition is a Boolean expression built from comparisons using the operators equal, not_equal, greater_than, lower_than, greater_equal and lower_equal. The operands are either literals (date, string, integer and double literals are supported) or aggregate functions over the results of a search service execution. The aggregate functions are max, min, average, size and sum. Each is applied to a given field of the result set of a search service execution, with the field referenced by an XPath expression.
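The following Java 8+ sketch shows how such a condition could be evaluated: an aggregate function is applied to the numeric values of a field, and the result is compared with a literal. The class, enum and method names are hypothetical, not the BooleanOperator's actual API. Only numeric operands are handled; date and string literals are left out. The sketch also assumes the field values have already been extracted (e.g. via XPath) into a list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the enums and methods below are illustrative, not the BooleanOperator API.
class BooleanConditionSketch {

    enum Aggregate { MAX, MIN, AVERAGE, SIZE, SUM }

    enum Comparison { EQUAL, NOT_EQUAL, GREATER_THAN, LOWER_THAN, GREATER_EQUAL, LOWER_EQUAL }

    // Applies an aggregate to the numeric values of one field, e.g. the values
    // an XPath expression selected from each record of a result set.
    static double aggregate(Aggregate fn, List<Double> values) {
        switch (fn) {
            case SIZE:
                return values.size();
            case SUM: {
                double sum = 0;
                for (double v : values) sum += v;
                return sum;
            }
            case AVERAGE:
                if (values.isEmpty()) throw new IllegalArgumentException("average of an empty field");
                return aggregate(Aggregate.SUM, values) / values.size();
            case MAX:
                return Collections.max(values); // throws NoSuchElementException if empty
            case MIN:
                return Collections.min(values);
            default:
                throw new IllegalArgumentException("unknown aggregate " + fn);
        }
    }

    // Evaluates "left op right" for numeric operands.
    static boolean compare(double left, Comparison op, double right) {
        switch (op) {
            case EQUAL: return left == right;
            case NOT_EQUAL: return left != right;
            case GREATER_THAN: return left > right;
            case LOWER_THAN: return left < right;
            case GREATER_EQUAL: return left >= right;
            case LOWER_EQUAL: return left <= right;
            default: throw new IllegalArgumentException("unknown comparison " + op);
        }
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(12.0, 30.0, 18.0);
        // Condition "average(price) greater_than 15" decides which plan to run.
        boolean usePlanA = compare(aggregate(Aggregate.AVERAGE, prices), Comparison.GREATER_THAN, 15.0);
        System.out.println(usePlanA ? "plan A" : "plan B"); // prints "plan A"
    }
}
```

The boolean result is all the caller needs to pick one of the two execution plans.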

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

FilterResultSetByXPathOperator

Description

The role of the FilterResultSetByXPath Operator is to filter data by evaluating an expression, such as an XPath query, against an XML structure. That structure is a ResultSet previously constructed by another operator or by a complete search execution. The result of the operation is a new ResultSet, whose endpoint reference is returned to the caller.
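As an illustration only, the sketch below uses the standard `javax.xml.xpath` API (Java 8+) to keep the records for which an XPath expression evaluates to true. Two assumptions are made here. First, records are represented as a `List` of XML strings instead of a ResultSet. Second, the operator is treated as a record-level filter; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: records are XML strings in a List, standing in for a ResultSet.
class XPathFilterSketch {

    // Returns a new list with the records for which the XPath expression is true;
    // the input list is not modified.
    static List<String> filter(List<String> records, String xpathExpression) {
        try {
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(xpathExpression);
            List<String> kept = new ArrayList<>();
            for (String record : records) {
                // Results are converted with XPath's boolean() rules,
                // so a non-empty node-set counts as true.
                Boolean matches = (Boolean) compiled.evaluate(
                        new InputSource(new StringReader(record)), XPathConstants.BOOLEAN);
                if (matches) kept.add(record);
            }
            return kept;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "<doc><title>Fish stocks</title><year>2005</year></doc>",
                "<doc><title>Sea ice</title><year>1999</year></doc>");
        System.out.println(filter(records, "/doc[year > 2000]").size()); // prints 1
    }
}
```

The expression is compiled once and reused for every record, which avoids re-parsing it per record.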

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

JoinInnerOperator

Description

The role of the JoinResultSetService is to perform a join on a specific field across a set of ResultSets whose endpoint references are provided. This operation produces a new ResultSet, leaving the input untouched. The newly created ResultSet is wrapped in a WS-Resource and its endpoint reference is returned to the caller. The join is implemented as an in-memory hash join.
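A minimal in-memory hash join could look like the Java 8+ sketch below. It builds a hash table over one input keyed on the join field, then probes it with the other input. Several details are assumptions, not the service's documented behaviour: records are modelled as `Map<String, String>`, more than two inputs are handled by folding the pairwise join, and right-hand values win on field-name clashes. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: records are field->value maps; the real service joins ResultSets.
class HashJoinSketch {

    // Inner hash join on `field`: build a hash table over `left`, then probe it with `right`.
    static List<Map<String, String>> innerJoin(List<Map<String, String>> left,
                                               List<Map<String, String>> right,
                                               String field) {
        // Build phase: group left records by their join-key value.
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> rec : left) {
            String key = rec.get(field);
            if (key == null) continue; // records without the field cannot join
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(rec);
        }
        // Probe phase: emit one combined record per matching pair.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> rec : right) {
            List<Map<String, String>> matches = table.get(rec.get(field));
            if (matches == null) continue;
            for (Map<String, String> m : matches) {
                Map<String, String> joined = new LinkedHashMap<>(m);
                joined.putAll(rec); // on name clashes the right-hand value wins (a sketch choice)
                out.add(joined);
            }
        }
        return out;
    }

    // Joins any number of inputs by applying the pairwise join left to right.
    static List<Map<String, String>> joinAll(List<List<Map<String, String>>> inputs, String field) {
        List<Map<String, String>> acc = inputs.get(0);
        for (int i = 1; i < inputs.size(); i++) acc = innerJoin(acc, inputs.get(i), field);
        return acc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("title", "Ocean data");
        Map<String, String> meta = new HashMap<>();
        meta.put("id", "42");
        meta.put("creator", "FAO");
        System.out.println(innerJoin(Collections.singletonList(doc), Collections.singletonList(meta), "id"));
    }
}
```

New maps are created for the output, so the input records are left untouched, matching the behaviour described above.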

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

KeepTopOperator

Description

The role of the KeepTop Operator is to perform a simple filtering operation on its input ResultSet and to produce as output a new ResultSet that holds a defined number of leading records.
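The semantics can be sketched in a few lines of Java. Here an in-memory `List` stands in for the ResultSet, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of KeepTop's semantics on an in-memory list of records.
class KeepTopSketch {

    // Returns a new list holding at most `n` leading records; the input is not modified.
    static <T> List<T> keepTop(List<T> records, int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        return new ArrayList<>(records.subList(0, Math.min(n, records.size())));
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("r1", "r2", "r3", "r4");
        System.out.println(keepTop(ranked, 2)); // prints [r1, r2]
    }
}
```

Only the leading records are needed, so a streaming version could stop reading its input after `n` records instead of materializing the whole set.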

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

MergeOperator

Description

The role of the Merge Operator is to merge a set of ResultSets whose endpoint references are provided. This operation produces a new ResultSet, leaving the input untouched. The newly created ResultSet is wrapped in a WS-Resource and its endpoint reference is returned to the caller.

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

QueryExtSourceOperatorGoogle

Description

The role of the QueryExtSourceOperatorGoogle is to redirect a query to the Google search engine through its Web Service interface and to wrap the output produced by the external service in a ResultSet, whose endpoint reference is returned to the caller. This functionality is supported by elements residing in the SearchLibrary.

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

QueryExtSourceOperatorJDBC

Description

The role of the QueryExtSourceOperatorJDBC is to redirect a query to an external search engine through a JDBC interface. This involves getting the attributes of the external service, submitting the query to it and wrapping its output in a ResultSet, whose endpoint reference is returned to the caller. This functionality is supported by elements residing in the SearchLibrary. Query string example:

  <root>
    <driverName>your jdbc driver</driverName>
    <connectionString>your jdbc connection string</connectionString>
    <query>your SQL query</query>
  </root>
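The following Java 8+ sketch shows how a query string of this shape could be consumed: the three fields are read with the standard DOM parser, and the SQL is then run through plain JDBC. The class and method names are hypothetical. The step that wraps the rows in a gCube ResultSet is omitted, and the `java.sql.ResultSet` used here is JDBC's, not the gCube one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet; // JDBC's ResultSet, not the gCube ResultSet
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of reading the <root> query string and executing it via JDBC.
class JdbcQuerySketch {

    // The three fields of the <root> query string.
    static final class JdbcQuery {
        final String driverName;
        final String connectionString;
        final String sql;

        JdbcQuery(String driverName, String connectionString, String sql) {
            this.driverName = driverName;
            this.connectionString = connectionString;
            this.sql = sql;
        }
    }

    static JdbcQuery parse(String queryXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(queryXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed query string", e);
        }
        return new JdbcQuery(text(doc, "driverName"), text(doc, "connectionString"), text(doc, "query"));
    }

    private static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) throw new IllegalArgumentException("missing <" + tag + ">");
        return nodes.item(0).getTextContent().trim();
    }

    // Runs the query and returns every row as a list of column values (read as strings).
    static List<List<String>> execute(JdbcQuery q) throws ClassNotFoundException, SQLException {
        Class.forName(q.driverName); // only strictly needed for pre-JDBC 4 drivers
        try (Connection conn = DriverManager.getConnection(q.connectionString);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(q.sql)) {
            int columns = rs.getMetaData().getColumnCount();
            List<List<String>> rows = new ArrayList<>();
            while (rs.next()) {
                List<String> row = new ArrayList<>(columns);
                for (int c = 1; c <= columns; c++) row.add(rs.getString(c));
                rows.add(row);
            }
            return rows;
        }
    }
}
```

Usage would be `execute(parse(queryXml))`, which requires the named JDBC driver on the classpath. Try-with-resources closes the connection, statement and JDBC result set even if reading a row fails.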
Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

QueryExtSourceOperatorOSIRIS

Description

The role of the QueryExtSourceOperatorOSIRIS is to redirect a query to the external content-based search engine provided by the ISIS/OSIRIS service through its HTTP interface. This involves getting the attributes of the external service, submitting the query to it and wrapping its output in a ResultSet, whose endpoint reference is returned to the caller. This functionality is supported by elements residing in the SearchLibrary. Query string example:

  <root>
    <collection>your osiris collection</collection>
    <imageURL>your image URL to be searched for similar images</imageURL>
    <numberOfResults>the number of results</numberOfResults>
  </root>
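Image URLs often contain characters such as `&` that must be escaped in XML. The Java 8+ sketch below builds the query string through the standard DOM and `javax.xml.transform` APIs, which escape such characters automatically. The element names come from the example above. The class name and the positive-count check are assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that builds the OSIRIS <root> query string shown above.
class OsirisQuerySketch {

    static String build(String collection, String imageUrl, int numberOfResults) {
        if (numberOfResults <= 0) throw new IllegalArgumentException("numberOfResults must be positive");
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            append(doc, root, "collection", collection);
            append(doc, root, "imageURL", imageUrl);
            append(doc, root, "numberOfResults", Integer.toString(numberOfResults));
            // Serializing the DOM escapes '&', '<' and '>' in the text values.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build query string", e);
        }
    }

    private static void append(Document doc, Element parent, String tag, String value) {
        Element e = doc.createElement(tag);
        e.setTextContent(value);
        parent.appendChild(e);
    }

    public static void main(String[] args) {
        System.out.println(build("fish", "http://example.org/img?id=1&size=2", 10));
    }
}
```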
Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

ScannerOperator

Description

The Scanner Operator provides a generic way of scanning through a result set produced by another search operation service. It can filter records, retrieve and update element/attribute values, and remove selected elements/attributes. For this purpose it uses a formal, function-like mathematical language to define the operation to apply to a given result set. Expressions are evaluated by JEP, an external parser for mathematical expressions that also allows new custom functions to be defined. Using this ability, we have introduced the functions do, like, filter, in and select, to provide a full-fledged filtering language. The do function takes an arbitrary number of arguments and evaluates them. The filter function receives a boolean expression; if it is true, the respective record is removed from the derived result set. The like function performs pattern matching and returns the boolean result of the match. The in function determines whether a variable lies within a given range of numeric values. Finally, the select function selects the specific elements/attributes to be included in the new result set.
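To make the described semantics of like, in, filter and select concrete, here is a plain Java 8+ sketch that does not use JEP. Records are modelled as `Map<String, Object>`, and all names are hypothetical. Two behaviours are assumptions: like is taken to require the whole value to match (`Pattern.matches`), and in is taken to be an inclusive range test. The filter semantics follow the description above: a record for which the condition is true is removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Plain-Java sketch of the custom functions' semantics; it does not use JEP.
class ScannerFunctionsSketch {

    // like(object, regex): assumed to require the whole value to match.
    static boolean like(Object value, String regex) {
        return value != null && Pattern.matches(regex, value.toString());
    }

    // in(object, lower, upper): assumed to be an inclusive range test.
    static boolean in(double value, double lower, double upper) {
        return value >= lower && value <= upper;
    }

    // filter(condition): records for which the condition is true are removed.
    static List<Map<String, Object>> filter(List<Map<String, Object>> records,
                                            Predicate<Map<String, Object>> condition) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> r : records) if (!condition.test(r)) kept.add(r);
        return kept;
    }

    // select(v1, v2, ...): keeps only the named elements/attributes of each record.
    static List<Map<String, Object>> select(List<Map<String, Object>> records, String... names) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : records) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String n : names) if (r.containsKey(n)) projected.put(n, r.get(n));
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("title", "Tuna catches");
        a.put("year", 2004.0);
        Map<String, Object> b = new HashMap<>();
        b.put("title", "draft notes");
        b.put("year", 1998.0);
        // Roughly: do(filter(like(title, ".*draft.*")), select(title))
        List<Map<String, Object>> kept = filter(Arrays.asList(a, b), r -> like(r.get("title"), ".*draft.*"));
        System.out.println(select(kept, "title")); // prints [{title=Tuna catches}]
    }
}
```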

Language Semantics: For the evaluator to produce a valid result, the filtering expression must contain at least one select or filter function. Besides that, the expression can contain any mathematical expression. The evaluator supports the most frequently used operators (!, +, -, *, /, ^, <, >, =, !=) and functions (sin, cos, tan, ln, log, exp, sqrt, abs, rand, mod, ...). Users are also free to define their own temporary variables. However, the variables named after leaf elements cannot be redefined. Leaf elements are XML elements that have no child elements, only plain text values. The evaluator defines these variables automatically and initializes them to the element values, which can be either strings or numerics (doubles). For further information about the available functions and operators, see org.nfunk.jep. The syntax of our custom functions is the following:

  BNF Syntax
    <custom_functions> ::= <filter_fun> | <do_fun> | <like_fun> | <in_fun> | <select_fun>
    <filter_fun> ::= <filter_fun_name> <left_parenthesis> <boolean_expression> <right_parenthesis>
    <filter_fun_name> ::= 'filter'
    <do_fun> ::= <do_fun_name> <left_parenthesis> <do_arguments> <right_parenthesis>
    <do_fun_name> ::= 'do'
    <do_arguments> ::= <do_argument> <do_args>
    <do_args> ::= <comma> <do_arguments> | EMPTY
    <do_argument> ::= <expression>
    <like_fun> ::= <like_fun_name> <left_parenthesis> <like_object> <comma> <regular_expression> <right_parenthesis>
    <like_fun_name> ::= 'like'
    <like_object> ::= <element> | <attribute>
    <regular_expression> ::= (see java.util.regex.Pattern)
    <element> ::= String
    <attribute> ::= <element> '_' <attribute_name>
    <attribute_name> ::= String
    <in_fun> ::= <in_fun_name> <left_parenthesis> <object> <comma> <lower_bound> <comma> <upper_bound> <right_parenthesis>
    <in_fun_name> ::= 'in'
    <lower_bound> ::= Numeric
    <upper_bound> ::= Numeric
    <object> ::= <user_defined_variable> | <bound_variable>
    <bound_variable> ::= <attribute> | <element>
    <user_defined_variable> ::= (any instantiated variable, e.g. a=2)
    <select_fun> ::= <select_fun_name> <left_parenthesis> <select_object_list> <right_parenthesis>
    <select_fun_name> ::= 'select'
    <select_object_list> ::= <bound_variable> <select_args>
    <select_args> ::= <comma> <select_object_list> | EMPTY
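The binding of record values to variables can be sketched with the standard DOM API (Java 8+). Each leaf element becomes a variable named after it. Each attribute becomes `<element>_<attribute_name>`, following the `<attribute>` rule. Values become doubles when they parse as numbers and strings otherwise. The class name is hypothetical. Two behaviours are assumptions, since the text does not document them: repeated names overwrite earlier ones, and attributes of non-leaf elements are bound too.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of how a record's values could be bound to evaluator variables.
class VariableBindingSketch {

    static Map<String, Object> bind(String recordXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(recordXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed record", e);
        }
        Map<String, Object> vars = new LinkedHashMap<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            NamedNodeMap attrs = el.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                vars.put(el.getTagName() + "_" + attr.getNodeName(), typed(attr.getNodeValue()));
            }
            if (isLeaf(el)) vars.put(el.getTagName(), typed(el.getTextContent()));
        }
        return vars;
    }

    // A leaf element has no child elements, only text.
    private static boolean isLeaf(Element el) {
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) return false;
        return true;
    }

    // Values become doubles when they parse as numbers, strings otherwise.
    private static Object typed(String raw) {
        String s = raw.trim();
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return s;
        }
    }

    public static void main(String[] args) {
        System.out.println(bind("<doc><title lang=\"en\">Tuna</title><year>2004</year></doc>"));
        // prints {title_lang=en, title=Tuna, year=2004.0}
    }
}
```

With these bindings, an expression such as `filter(in(year, 2000, 2010))` can refer to `year` directly, and `like(title_lang, "en")` to the attribute.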
Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

SortOperator

Description

The role of the Sort Operator is to sort the provided ResultSet using a specific field as the key. This operation produces a new ResultSet, leaving the input untouched. The newly created ResultSet is wrapped in a WS-Resource and its endpoint reference is returned to the caller. The sorting algorithm is merge sort, and the comparison rules depend on the type of the elements being sorted.
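Below is a Java 8+ sketch of a stable top-down merge sort whose comparator depends on the key type. The actual comparison rules are not given here, so the ones used are assumptions. Keys that both parse as numbers are compared numerically, and other keys are compared as strings. Numeric keys sort before non-numeric ones, which keeps the ordering total when a field mixes both. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a type-aware merge sort over sort keys.
class MergeSortSketch {

    // Assumed rule: numeric keys compare numerically and sort before non-numeric keys,
    // which compare lexicographically.
    static final Comparator<String> KEY_ORDER = (a, b) -> {
        Double x = asNumber(a);
        Double y = asNumber(b);
        if (x != null && y != null) return Double.compare(x, y);
        if (x != null) return -1;
        if (y != null) return 1;
        return a.compareTo(b);
    };

    private static Double asNumber(String s) {
        try {
            return Double.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Top-down merge sort; returns a new list and leaves the input untouched.
    static <T> List<T> mergeSort(List<T> items, Comparator<? super T> cmp) {
        if (items.size() <= 1) return new ArrayList<>(items);
        int mid = items.size() / 2;
        List<T> left = mergeSort(items.subList(0, mid), cmp);
        List<T> right = mergeSort(items.subList(mid, items.size()), cmp);
        List<T> merged = new ArrayList<>(items.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            // "<= 0" takes from the left on ties, which keeps the sort stable
            if (cmp.compare(left.get(i), right.get(j)) <= 0) merged.add(left.get(i++));
            else merged.add(right.get(j++));
        }
        while (i < left.size()) merged.add(left.get(i++));
        while (j < right.size()) merged.add(right.get(j++));
        return merged;
    }

    public static void main(String[] args) {
        List<String> years = Arrays.asList("2004", "998", "10");
        System.out.println(mergeSort(years, KEY_ORDER)); // prints [10, 998, 2004]
    }
}
```

To sort whole records by a field, the same routine could be called with a comparator such as `Comparator.comparing(r -> r.get(field), KEY_ORDER)`, assuming map-based records.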

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary

TransformByXSLTOperator

Description

The role of the TransformByXSLT Operator is to transform the ResultSet it receives as input from one schema to another, using a transformation technology such as XSLT. The transformations are supplied directly as input to the service. The output of the transformation, which may be a projection of the initial ResultSet, is a new ResultSet wrapped in a WS-Resource whose endpoint reference is returned to the caller.
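The sketch below applies one XSLT stylesheet to every record with the standard `javax.xml.transform` API (Java 8+). It compiles the stylesheet once into a reusable `Templates` object. Records are assumed to be XML strings in a `List`, and the class and method names are hypothetical. The usage example performs a projection that keeps only the `title` element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: records are XML strings standing in for a ResultSet.
class XsltTransformSketch {

    // Applies one XSLT stylesheet to every record, producing a new list of transformed records.
    static List<String> transformAll(List<String> records, String xslt) {
        try {
            // Compile the stylesheet once; a Templates object can create many Transformers.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
            List<String> out = new ArrayList<>();
            for (String record : records) {
                Transformer t = templates.newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter w = new StringWriter();
                t.transform(new StreamSource(new StringReader(record)), new StreamResult(w));
                out.add(w.toString());
            }
            return out;
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String projectTitle =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:template match=\"/doc\"><doc><xsl:copy-of select=\"title\"/></doc></xsl:template>"
              + "</xsl:stylesheet>";
        List<String> records = Arrays.asList("<doc><title>Tuna</title><year>2004</year></doc>");
        System.out.println(transformAll(records, projectTitle));
    }
}
```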

Dependencies
  • jdk 1.5
  • WS-Core 4.0.4
  • ResultSetClientLibrary
  • SearchLibrary