OpenSearch Framework

From Gcube Wiki

Description

The role of the OpenSearch operator is to support querying, and retrieving search results from, resources that expose an OpenSearch description document. The operator accepts a set of query terms and parameters, together with a reference to an OpenSearch Resource. That Resource contains the URL of an OpenSearch description document and various specifications relevant to the OpenSearch service to be queried. After performing as many OpenSearch queries as are needed to obtain the desired results, the operator returns these results wrapped in a ResultSet.

Extensibility Points

The operator introduces and uses a set of functionalities beyond those of the standard OpenSearch specification. Two things support these extensions: a special OpenSearch Resource structure, and the internal logic of the operator, which relies on a general-purpose OpenSearch library for standard OpenSearch functionality. The extra functionalities are summarized as follows:

  • Support for data transformation. If a transformation specification, in the form of an XPath-XSLT pair, is available for one of the MIME types of the results returned by an OpenSearch-enabled search service, the operator can return the obtained results in a form suitable for further processing.
  • Support for both direct and brokered results. Some OpenSearch-enabled services diverge from the common case of returning a set of direct results. Instead, they provide their results indirectly by returning a set of links to other OpenSearch-enabled services. Two things must be available: a transformation specification that extracts these links from the returned results, and the OpenSearch Resource for each of the brokered OpenSearch services. Given both, the operator returns the full set of results provided by the brokered services.
  • Support for one or more security schemes (planned).

OpenSearch Resource

An OpenSearch Resource object describes the specifications of an OpenSearch service and encapsulates the extensions described in the Extensibility Points section. Its attributes include:

  • The name of the resource
  • The URL of the OpenSearch Description Document of the service to be queried
  • Information about whether the service provides direct or brokered results
  • Data transformation specifications for a subset of the MIME types of the results that the service returns
  • Security specifications (to be added soon)

The OpenSearch Resource structure can be serialized to XML conforming to the following XML Schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="OpenSearchResource">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="DDUrl" type="xs:string"/>
    <xs:element name="brokeredResults" type="xs:boolean"/>
    <xs:element name="transformation" maxOccurs="unbounded">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="MIMEType" type="xs:string"/>
       <xs:element name="recordXPath" type="xs:string"/>
       <xs:element name="XSLTUrl" type="xs:string"/>						
      </xs:sequence>
     </xs:complexType>
    </xs:element>
    <xs:element name="security" minOccurs="0">
     <xs:complexType>
      <xs:sequence>
      </xs:sequence>
     </xs:complexType>
    </xs:element>		
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>

The name and DDUrl elements are straightforward. The former provides a textual name for the resource, and the latter holds the URL of the description document of the service to be queried. The brokeredResults element tells the operator whether the service returns direct or brokered results, so that the operator can adapt its behaviour to either kind of service.

When querying services that return direct results (that is, when brokeredResults is false), the transformation element specifies how result data is transformed into a form suitable for the user's needs. It can appear any number of times (at least once, according to the schema), one for each possible result MIME type. A single transformation specification is enough to retrieve results of a given form. If more are present, the operator has alternatives to fall back on when a transformation fails. Multiple specifications also express result type preference: the first one in the element sequence has the highest preference. Each transformation specification consists of three elements:
  • MIMEType names the result MIME type to which the specification applies.
  • recordXPath holds an XPath expression used to extract records from the results returned by the search service.
  • XSLTUrl points to an XSLT stylesheet used to transform each record into the desired form.
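The following Java sketch shows one way a list of transformation specifications could be applied in preference order, using the JDK's standard javax.xml.xpath and javax.xml.transform APIs. It is not the operator's actual code. The class and method names are hypothetical. The stylesheet is passed as text rather than fetched from XSLTUrl. The fetch callback, which stands for querying the service for results of a given MIME type, is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the gCube API.
final class DirectResultTransformer {

    // One <transformation> element. Simplification: `xslt` holds the stylesheet
    // text that the real operator would fetch from XSLTUrl.
    record Spec(String mimeType, String recordXPath, String xslt) {}

    // Tries the specifications in document order (first = highest preference).
    // `fetch` is a hypothetical callback returning the service's results in the
    // given MIME type. Returns the records produced by the first specification
    // that succeeds, or an empty list if every specification fails.
    static List<String> transform(Function<String, String> fetch, List<Spec> specs) {
        for (Spec spec : specs) {
            try {
                return apply(fetch.apply(spec.mimeType()), spec);
            } catch (Exception e) {
                // This specification failed; fall back to the next one.
            }
        }
        return List.of();
    }

    // Selects record elements with recordXPath, then runs the XSLT on each
    // record copied into a standalone document.
    static List<String> apply(String results, Spec spec) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(results)));
        NodeList records = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(spec.recordXPath(), doc, XPathConstants.NODESET);
        Transformer xslt = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(spec.xslt())));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            Document record = builder.newDocument();
            record.appendChild(record.importNode(records.item(i), true));
            StringWriter w = new StringWriter();
            xslt.transform(new DOMSource(record), new StreamResult(w));
            out.add(w.toString());
        }
        return out;
    }
}
```

Consider an RSS 2.0 result document, a specification with recordXPath /rss/channel/item, and a stylesheet that outputs each item's title. Together they yield one string per item. A specification whose XPath or XSLT is broken simply causes the next one in the list to be tried.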

When querying services that return brokered results, the transformation element instead specifies a data transformation that extracts the URLs of the Description Documents of the brokered OpenSearch services. It extracts them from the initial results returned by the OpenSearch service that acts as a broker.
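Extracting the brokered Description Document URLs can be sketched in the same way. This hypothetical Java helper reduces the broker's transformation specification to a single XPath expression that selects nodes whose text is a URL. A full specification would also apply an XSLT. The RSS-like broker result layout used below is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the gCube API.
final class BrokerLinkExtractor {

    // Returns the Description Document URLs found in a broker's results.
    // `urlXPath` stands in for the broker resource's transformation
    // specification (simplified here to XPath only, without an XSLT step).
    static List<String> extractDescriptionUrls(String brokerResults, String urlXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(brokerResults)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(urlXPath, doc, XPathConstants.NODESET);
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String url = nodes.item(i).getTextContent().trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
            return urls;
        } catch (Exception e) {
            // Unparsable broker output or a bad expression: no brokered services found.
            return List.of();
        }
    }
}
```

With broker results shaped like <rss><channel><item><link>http://a.example/osdd.xml</link></item></channel></rss>, the expression /rss/channel/item/link returns one URL per item.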

Note that the security element is currently empty. This is subject to change: the OpenSearch Resource schema will be updated as soon as the security specifications are decided on. Since the element is optional, its absence means that no security scheme is used.
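To make the schema concrete, the following Java sketch reads an OpenSearchResource document into immutable records using the JDK's DOM API. The record names are hypothetical and not taken from the gCube code base. The sketch does three things:
  • It keeps transformation elements in document order, since that order expresses preference.
  • It accepts both lexical forms of xs:boolean ("true"/"false" and "1"/"0").
  • It treats a missing security element as the absence of a security scheme.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical in-memory form of one <transformation> element.
record Transformation(String mimeType, String recordXPath, String xsltUrl) {}

// Hypothetical in-memory form of an <OpenSearchResource> document.
record OpenSearchResource(String name, String ddUrl, boolean brokeredResults,
                          List<Transformation> transformations, boolean hasSecurity) {

    static OpenSearchResource parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // Keep <transformation> elements in document order: first = most preferred.
            List<Transformation> transformations = new ArrayList<>();
            NodeList nodes = root.getElementsByTagName("transformation");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element t = (Element) nodes.item(i);
                transformations.add(new Transformation(
                        text(t, "MIMEType"), text(t, "recordXPath"), text(t, "XSLTUrl")));
            }

            // xs:boolean allows "true"/"false" as well as "1"/"0".
            String brokered = text(root, "brokeredResults");

            return new OpenSearchResource(
                    text(root, "name"),
                    text(root, "DDUrl"),
                    brokered.equals("true") || brokered.equals("1"),
                    List.copyOf(transformations),
                    // security is optional; its absence means no security scheme
                    root.getElementsByTagName("security").getLength() > 0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid OpenSearchResource document", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```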

OpenSearch Operator Logic

The functions performed by the operator are summarized in the following simplified diagram:

A simplified flowchart of the operations performed by the OpenSearch operator

As shown, the operator accepts a set of query terms and a set of parameters. A simple example of a query parameter is the required number of results. Support for other parameters will be considered in the future.
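In standard OpenSearch 1.1, query terms and parameters such as the number of results are passed by filling in the URL template published in the description document. Examples are {searchTerms}, {count} and {startIndex}, where a trailing "?" marks a parameter as optional. The operator delegates this step to its OpenSearch library. The hypothetical Java helper below only sketches the substitution itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the gCube API.
final class OpenSearchTemplate {

    // Matches {name} or {name?}; group 2 is present for optional parameters.
    private static final Pattern PARAM = Pattern.compile("\\{([^}?]+)(\\?)?\\}");

    // Fills an OpenSearch URL template. Values are URL-encoded; an optional
    // parameter without a value becomes empty, and a missing required
    // parameter is reported as an error.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PARAM.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            boolean optional = m.group(2) != null;
            String value = values.get(name);
            if (value == null && !optional) {
                throw new IllegalArgumentException("Missing required parameter: " + name);
            }
            String encoded = value == null ? "" : URLEncoder.encode(value, StandardCharsets.UTF_8);
            m.appendReplacement(sb, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For instance, take the template http://search.example/?q={searchTerms}&count={count?}. Filling it with searchTerms = "open search" and count = 20 gives http://search.example/?q=open+search&count=20.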

The operator's main loop formulates and sends queries for successive pages of search results. It continues as long as more results are available and the requested number of results has not been reached. For resources that return brokered results, the operator first retrieves the set of brokered OpenSearch services and reads their corresponding OpenSearch Resources, so that it can loop through these resources while retrieving results. This step is implied in the diagram rather than shown explicitly.
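The paging loop can be sketched in Java as follows. The page-fetching callback stands for one OpenSearch query for a given startIndex and count, and is hypothetical. The sketch assumes the OpenSearch default of 1-based result indices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper, not part of the gCube API.
final class PagedRetrieval {

    // `fetchPage` is a hypothetical callback: given (startIndex, count) it returns
    // up to `count` records; an empty page means the service has no more results.
    static List<String> retrieve(BiFunction<Integer, Integer, List<String>> fetchPage,
                                 int wanted, int pageSize) {
        List<String> results = new ArrayList<>();
        int startIndex = 1; // OpenSearch result indices start at 1 by default
        while (results.size() < wanted) {
            // Ask only for as many records as are still missing.
            int count = Math.min(pageSize, wanted - results.size());
            List<String> page = fetchPage.apply(startIndex, count);
            if (page.isEmpty()) {
                break; // no more results to return
            }
            for (String record : page) {
                if (results.size() == wanted) {
                    break; // the service returned more than requested
                }
                results.add(record);
            }
            startIndex += page.size();
        }
        return results;
    }
}
```

Requesting only the number of results still missing on the last page avoids fetching records that would be discarded.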

Furthermore, the OpenSearch Resource may be missing for one or more of the brokered services. In that case, the operator skips each service for which it cannot obtain this information and continues retrieving results from the next available brokered service. The same holds for a brokered service whose Resource specifies data transformations that all fail.
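The fault-tolerant loop over brokered services might look like the following Java sketch. Resources are represented as plain strings looked up by Description Document URL. The ServiceQuery callback is a hypothetical stand-in for the per-service retrieval and transformation logic; it returns an empty Optional when every transformation of a resource fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the gCube API.
final class BrokeredRetrieval {

    // Hypothetical per-service retrieval: returns the transformed records, or
    // Optional.empty() when every transformation in the resource failed.
    interface ServiceQuery {
        Optional<List<String>> query(String resource, int wanted);
    }

    // `resources` is a hypothetical lookup from Description Document URL to the
    // corresponding OpenSearch Resource (represented here as a string).
    static List<String> retrieve(List<String> brokeredDdUrls, Map<String, String> resources,
                                 ServiceQuery query, int wanted) {
        List<String> results = new ArrayList<>();
        for (String ddUrl : brokeredDdUrls) {
            if (results.size() >= wanted) {
                break;
            }
            String resource = resources.get(ddUrl);
            if (resource == null) {
                continue; // no OpenSearch Resource: skip this service
            }
            Optional<List<String>> records = query.query(resource, wanted - results.size());
            if (records.isEmpty()) {
                continue; // all transformations failed: skip this service
            }
            results.addAll(records.get());
        }
        // Trim in case a service returned more records than requested.
        return results.size() > wanted ? new ArrayList<>(results.subList(0, wanted)) : results;
    }
}
```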