OpenSearch Framework

From Gcube Wiki

Description

The role of the gCube OpenSearch Framework is to enable the gCube Framework to access external providers which publish their results through search engines conforming to the OpenSearch Specification. The framework consists of two components:

  • The OpenSearch Library, which includes a general-purpose library and the OpenSearch Operator, which uses functionality provided by the former; and
  • The OpenSearch Service, which binds collections to provider-specific information encapsulated in Generic Resources and invokes the OpenSearch Operator.

To resolve ambiguity, the name "OpenSearch Library" will be used when referring to the whole OpenSearch Library component and the name "General-Purpose Library" will be used when referring to the library constituent of the component.

The OpenSearch Library

The General-Purpose OpenSearch Library

The General-Purpose OpenSearch Library conforms to the latest OpenSearch specification and provides general OpenSearch-related functionality to any component that needs to query OpenSearch providers. The OpenSearch Operator, described in the following section, functions atop this library.
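The library's own API is not documented on this page. As an illustration of the kind of general-purpose functionality involved, the following minimal Python sketch expands an OpenSearch 1.1 URL template (the `template` attribute of a `<Url>` element in a description document): parameters such as `{searchTerms}` are replaced by URL-encoded values, and optional parameters, marked with a trailing `?`, are left empty when no value is supplied. The function name `expand_template` and the example URL are hypothetical and not part of the gCube library.

```python
import re
from urllib.parse import quote

# Matches template parameters such as {searchTerms}, {startPage?} or {geo:box?};
# group 1 is the parameter name, group 2 is "?" for optional parameters.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def expand_template(template: str, values: dict[str, str]) -> str:
    """Fill an OpenSearch URL template with URL-encoded values (hypothetical helper).

    Optional parameters without a value become empty strings; a required
    parameter without a value raises KeyError, since the query cannot be sent.
    """

    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2) is not None
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"required OpenSearch parameter {name!r} has no value")

    return _PARAM.sub(substitute, template)


if __name__ == "__main__":
    print(expand_template(
        "http://example.com/search?q={searchTerms}&page={startPage?}&n={count?}",
        {"searchTerms": "marine species", "count": "20"},
    ))
```

With these values the expanded URL is `http://example.com/search?q=marine%20species&page=&n=20`: the unset optional `startPage` is emptied, while an unset required parameter would make the query impossible to formulate.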

The OpenSearch Operator

Description

The role of the OpenSearch Operator is to support querying and retrieving search results via OpenSearch from providers which expose an OpenSearch description document. The operator accepts a set of query terms and parameters, together with a reference to an OpenSearch Resource (see the OpenSearch Resource section), which contains the URL of an OpenSearch description document and various specifications relevant to the OpenSearch provider to be queried. After performing as many OpenSearch queries as are required to obtain the desired results, it returns the results wrapped in a ResultSet.

Extensibility Points

The operator introduces and makes use of a set of functionalities beyond those of the standard OpenSearch specification. These extensions are supported by a special OpenSearch Resource structure (described in the OpenSearch Resource section) and by the internal logic of the operator, the latter using standard OpenSearch functionality provided by the General-Purpose Library. The extra functionalities are summarized as follows:

  • Data transformation. Provided that a transformation specification, in the form of an XPath-XSLT pair, is available for one of the MIME types of the results returned by an OpenSearch-enabled provider, the operator can return the obtained results transformed to the desired schema. Each record can optionally also be tagged with a unique identifier extracted from the results and described by an additional XPath expression.
  • Direct and brokered result processing. Some OpenSearch-enabled providers diverge from the common case of returning a set of direct results and instead provide their results indirectly, by returning a set of links to other OpenSearch-enabled providers. Provided that both a transformation specification for extracting these links from the returned results and the OpenSearch Resources for each of the brokered OpenSearch services are available, the operator returns the full set of results provided by the brokered OpenSearch services.
  • Fixed parameters. A set of fixed parameters can be specified, which override the user-provided parameters only at the level of the top provider, i.e. either the broker or, in the direct case, the single direct provider.

These fixed parameters serve either or both of two purposes. First, they facilitate the creation of dynamic collections from results obtained through brokers: the fixed parameters are taken into account only when querying the broker, while lower levels receive only the user-defined parameters. Second, they allow the behaviour of a provider to be customized to the needs of the gCube Framework.

  • Support for one or more security schemes is planned for a subsequent version of the OpenSearch Library.
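The override rule for fixed parameters can be sketched as a simple dictionary merge. This is a minimal Python illustration, assuming that query parameters are represented as name-value pairs keyed by their OpenSearch parameter names; `params_for_level` and the example values are hypothetical, not part of the operator's API.

```python
def params_for_level(user_params: dict[str, str],
                     fixed_params: dict[str, str],
                     is_top_provider: bool) -> dict[str, str]:
    """Return the parameters to send to a provider (hypothetical helper).

    Fixed parameters override user-provided ones only for the top provider,
    i.e. the broker or the single direct provider; providers reached through
    a broker receive the user-defined parameters unchanged.
    """
    if is_top_provider:
        # In a dict merge later keys win, so fixed values replace user values.
        return {**user_params, **fixed_params}
    return dict(user_params)


user = {"searchTerms": "tuna", "language": "fr"}
fixed = {"language": "en"}
print(params_for_level(user, fixed, is_top_provider=True))   # {'searchTerms': 'tuna', 'language': 'en'}
print(params_for_level(user, fixed, is_top_provider=False))  # {'searchTerms': 'tuna', 'language': 'fr'}
```

Returning a copy at lower levels keeps the user-defined parameters untouched, so each brokered provider sees exactly what the user asked for.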

OpenSearch Resource

The purpose of an OpenSearch Resource object is to describe the specifications of an OpenSearch provider. It encapsulates the extensions described in the Extensibility Points section. The attributes included are the following:

  • The name of the resource
  • The URL of the OpenSearch Description Document of the provider to be queried
  • Information about whether the provider returns direct or brokered results, used by the operator to adapt its operation to both kinds of providers.
  • Data transformation specifications for a subset of the MIME types of the results returned by the provider. Each data transformation consists of two or, optionally, three parts:
    • The RecordSplitXPath expression is used to split a page of search results into individual records. For example, for the RSS format, the <item> elements under rss/channel are the elements of interest.
    • The XSLTLink contains a pointer to an XSLT stylesheet which is used to transform the individual records to the target schema.
    • The optional RecordIdXPath expression can be used to tag each record with a unique identifier, extracted from the record itself.
  • Security specifications (planned for a future version, once the supported security specifications have been decided on). This element is optional; its absence implies the absence of a security scheme.
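The following minimal Python sketch shows how the three parts of a transformation specification could be applied to one page of RSS results. Two substitutions are assumed: Python's `xml.etree.ElementTree` supports only a subset of XPath, so the split expression is written relative to the `<rss>` root element (`channel/item`) rather than as an absolute path, and because the standard library has no XSLT processor, a Python function stands in for the stylesheet referenced by XSLTLink. The function names and the target `<record>` schema are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def split_and_transform(page_xml: str,
                        record_split_xpath: str,
                        transform: Callable[[ET.Element], str],
                        record_id_xpath: Optional[str] = None,
                        ) -> list[tuple[Optional[str], str]]:
    """Split a results page into records, tag each with an id, transform it."""
    root = ET.fromstring(page_xml)
    records = []
    for record in root.findall(record_split_xpath):      # RecordSplitXPath
        record_id = (record.findtext(record_id_xpath)     # optional RecordIdXPath
                     if record_id_xpath else None)
        records.append((record_id, transform(record)))   # stands in for the XSLT
    return records


def to_target_schema(item: ET.Element) -> str:
    """Hypothetical stand-in for an XSLT: keep only the item's title."""
    out = ET.Element("record")
    ET.SubElement(out, "title").text = item.findtext("title")
    return ET.tostring(out, encoding="unicode")


PAGE = """<rss version="2.0"><channel><title>Results</title>
  <item><title>First</title><guid>urn:example:1</guid></item>
  <item><title>Second</title><guid>urn:example:2</guid></item>
</channel></rss>"""

for record_id, record in split_and_transform(PAGE, "channel/item", to_target_schema, "guid"):
    print(record_id, record)
```

The channel's own `<title>` is not picked up, because the split expression selects only the `<item>` elements.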

The serialization of an OpenSearch Resource can easily be incorporated into a Generic Resource. In fact, in its default mode of operation, the OpenSearch Operator obtains the necessary OpenSearch Resources by retrieving the corresponding Generic Resources from the IS (Information System). The OpenSearch Operator uses two types of Generic Resources:

  • The OpenSearchResource which contains the body of the OpenSearch Resource as described below
  • The OpenSearchXSLT which contains the XSLT portion of a transformation specification

In this mode, the XSLTLink element contains the name of the OpenSearchXSLT Generic Resource it points to.

Note that, solely for testing purposes, the OpenSearch Operator also supports a local mode of operation, whereby all OpenSearch Resources are loaded from the local file system. In that case, the XSLTLink element contains a URL pointing to the corresponding XSLT file.
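The two ways of interpreting XSLTLink can be sketched as a small dispatch function. In this minimal Python illustration, which is an assumption rather than the operator's code, `fetch_generic_resource` is a hypothetical callable standing in for retrieving an OpenSearchXSLT Generic Resource from the IS by name, and local mode reads the XSLT from the given URL with `urllib`. The resource name `RSSToRecord` is likewise hypothetical.

```python
from typing import Callable
from urllib.request import urlopen


def load_from_url(url: str) -> str:
    """Read an XSLT from a URL, e.g. a file:// URL in local (testing) mode."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def resolve_xslt(xslt_link: str,
                 local_mode: bool,
                 fetch_generic_resource: Callable[[str], str],
                 fetch_url: Callable[[str], str] = load_from_url) -> str:
    """Return the XSLT text an XSLTLink element refers to (hypothetical helper).

    Default mode: the link is the name of an OpenSearchXSLT Generic Resource,
    looked up through `fetch_generic_resource` (a stand-in for an IS query).
    Local mode: the link is a URL pointing to an XSLT file.
    """
    if local_mode:
        return fetch_url(xslt_link)
    return fetch_generic_resource(xslt_link)


# Hypothetical in-memory registry standing in for the IS.
registry = {"RSSToRecord": "<xsl:stylesheet version='1.0'/>"}
print(resolve_xslt("RSSToRecord", local_mode=False,
                   fetch_generic_resource=registry.__getitem__))
```

Injecting both lookups as callables keeps the rest of the operator logic identical in the two modes.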

The XML Schema that all OpenSearch Resource serializations should conform to is the following:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="OpenSearchResource">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="descriptionDocumentURI" type="xs:string"/>
    <xs:element name="brokeredResults" type="xs:boolean"/>
    <xs:element name="transformation" maxOccurs="unbounded">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="MIMEType" type="xs:string"/>
       <xs:element name="recordSplitXPath" type="xs:string"/>
       <xs:element name="recordIdXPath" type="xs:string" minOccurs="0" maxOccurs="1"/>
       <xs:element name="XSLTLink" type="xs:string"/>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
    <xs:element name="security" minOccurs="0">
     <xs:complexType>
      <xs:sequence>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>
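Because the serialization is plain XML, it can be read with any XML parser. The following minimal Python sketch maps an OpenSearchResource document onto hypothetical data classes using `xml.etree.ElementTree`. It does not validate against the schema above (the Python standard library has no XML Schema validator) and ignores the `security` element, whose content is not yet defined. The class names, field names and sample values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transformation:
    """One <transformation> element (hypothetical representation)."""
    mime_type: str
    record_split_xpath: str
    xslt_link: str
    record_id_xpath: Optional[str] = None   # optional in the schema


@dataclass
class OpenSearchResource:
    """The body of an OpenSearchResource (hypothetical representation)."""
    name: str
    description_document_uri: str
    brokered_results: bool
    transformations: list[Transformation] = field(default_factory=list)


def parse_resource(xml_text: str) -> OpenSearchResource:
    """Parse a serialized OpenSearch Resource; no schema validation is done."""
    root = ET.fromstring(xml_text)
    transformations = [
        Transformation(
            mime_type=t.findtext("MIMEType"),
            record_split_xpath=t.findtext("recordSplitXPath"),
            xslt_link=t.findtext("XSLTLink"),
            record_id_xpath=t.findtext("recordIdXPath"),  # None when absent
        )
        for t in root.findall("transformation")
    ]
    return OpenSearchResource(
        name=root.findtext("name"),
        description_document_uri=root.findtext("descriptionDocumentURI"),
        # xs:boolean accepts "true"/"false" as well as "1"/"0".
        brokered_results=root.findtext("brokeredResults").strip() in ("true", "1"),
        transformations=transformations,
    )


SAMPLE = """<OpenSearchResource>
  <name>ExampleProvider</name>
  <descriptionDocumentURI>http://example.com/opensearch.xml</descriptionDocumentURI>
  <brokeredResults>false</brokeredResults>
  <transformation>
    <MIMEType>application/rss+xml</MIMEType>
    <recordSplitXPath>/rss/channel/item</recordSplitXPath>
    <recordIdXPath>guid</recordIdXPath>
    <XSLTLink>RSSToRecord</XSLTLink>
  </transformation>
</OpenSearchResource>"""

resource = parse_resource(SAMPLE)
print(resource.brokered_results, [t.mime_type for t in resource.transformations])
```

Keeping the transformations in a list preserves their document order, which matters when more than one MIME type is specified.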

The transformation element can appear multiple times within an OpenSearch Resource. Usually a single transformation element is specified per provider, but if transformation elements are present for more than one MIME type, the operator can fall back to the next available transformation, in order of appearance, if the transformation phase fails.
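The fallback behaviour can be sketched generically. In this minimal Python illustration, `apply` is a hypothetical callable that requests results in the MIME type of a given transformation element and transforms them, raising an exception on failure; the transformations are tried in the order in which they appear in the resource. `transform_with_fallback` is likewise a hypothetical name.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def transform_with_fallback(transformations: Sequence[T],
                            apply: Callable[[T], R]) -> R:
    """Return the result of the first transformation that succeeds."""
    errors: list[Exception] = []
    for transformation in transformations:        # document order
        try:
            return apply(transformation)
        except Exception as exc:                  # any failure: try the next one
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} transformations failed: {errors!r}")


# Hypothetical example: the Atom transformation fails, so RSS is used instead.
outputs = {"application/rss+xml": "<record/>"}
print(transform_with_fallback(["application/atom+xml", "application/rss+xml"],
                              outputs.__getitem__))
```

Collecting the individual errors means that, if every transformation fails, the final exception still reports why each one did.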

When querying providers which return brokered results, the transformation element is used to specify a data transformation that extracts the URLs of the Description Documents of the brokered OpenSearch services from the initial results returned by the OpenSearch service acting as a broker.

OpenSearch Operator Logic

The functions performed by the operator in order to retrieve a set of results are summarized in the following simplified diagram:

A simplified flowchart of the operations performed by the OpenSearch operator

As shown, the operator accepts a set of query terms and a set of query parameters.

The operator's main course of action is to formulate and send queries requesting pages of search results as long as results remain to be returned and the number of results requested by the user has not yet been reached. For resources which return brokered results, the operator first retrieves the endpoints of the underlying brokered OpenSearch providers and reads their corresponding OpenSearch Resources, so that it can loop through these resources while retrieving results. This step is implicit in the diagram.
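The paging loop for a single direct provider can be sketched as follows. This minimal Python illustration assumes a hypothetical `fetch_page(n)` callable that formulates and sends the query for page `n` and returns that page's records, with an empty page signalling that the provider has no more results; a real client could also consult the `totalResults` element of an OpenSearch response. Page numbering starts at 1, the OpenSearch default for `startPage`. The function name `retrieve` is hypothetical.

```python
from typing import Callable, Iterator, Sequence


def retrieve(fetch_page: Callable[[int], Sequence[str]],
             wanted: int,
             first_page: int = 1) -> Iterator[str]:
    """Yield up to `wanted` records from one provider, page by page."""
    returned = 0
    page = first_page
    while returned < wanted:                   # user requirement not yet met
        records = fetch_page(page)
        if not records:                        # no more results at the provider
            break
        batch = list(records)[: wanted - returned]
        yield from batch
        returned += len(batch)
        page += 1


pages = {1: ["r1", "r2"], 2: ["r3", "r4"], 3: ["r5"]}
print(list(retrieve(lambda n: pages.get(n, []), wanted=3)))  # ['r1', 'r2', 'r3']
```

Using a generator lets records be handed on (for example, into a ResultSet) as soon as each page arrives, instead of waiting for the whole retrieval to finish.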

If the OpenSearch Resource of a brokered service is missing, the operator ignores that service and continues retrieving results from the next available brokered service. The same holds if all query formulation attempts for a provider fail.
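This tolerance towards missing resources and failing providers can be illustrated with a minimal Python sketch. It assumes two hypothetical callables: `lookup_resource`, which returns the OpenSearch Resource for a brokered service's description document URL or `None` if none is available, and `query_provider`, which retrieves up to a given number of records from one provider and raises an exception once all of its query formulation attempts have failed. The function name and example URLs are also hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence


def retrieve_brokered(description_urls: Iterable[str],
                      lookup_resource: Callable[[str], Optional[object]],
                      query_provider: Callable[[object, int], Sequence[str]],
                      wanted: int) -> Iterator[str]:
    """Yield up to `wanted` records from the brokered providers, in order."""
    returned = 0
    for url in description_urls:
        if returned >= wanted:
            break
        resource = lookup_resource(url)
        if resource is None:
            continue          # no OpenSearch Resource: ignore this service
        try:
            records = query_provider(resource, wanted - returned)
        except Exception:
            continue          # all query attempts failed: ignore this provider
        batch = list(records)[: wanted - returned]
        yield from batch
        returned += len(batch)


resources = {"http://a.example/osd.xml": "A", "http://b.example/osd.xml": "B",
             "http://c.example/osd.xml": "C"}
results = {"A": ["a1", "a2"], "C": ["c1"]}   # provider B always fails
urls = ["http://a.example/osd.xml", "http://b.example/osd.xml",
        "http://x.example/osd.xml", "http://c.example/osd.xml"]
print(list(retrieve_brokered(urls, resources.get, lambda r, n: results[r][:n], wanted=10)))
```

Here the service at `x.example` has no resource and provider `B` fails, so both are skipped and the printed list is `['a1', 'a2', 'c1']`.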

The OpenSearch Service