Search Planner

From Gcube Wiki
Revision as of 17:03, 22 July 2011 by Vassilis.verroios (Talk | contribs) (Search Planner)

Search Planner

The outcome of a CQL query is the set of documents that satisfy the criteria defined by the query. In a gCube infrastructure, documents are hosted by Data Sources. Each Data Source can execute CQL queries over the information it hosts, and each supports different CQL capabilities. In most cases, the initial CQL query cannot be answered by a single Data Source. The sets of documents retrieved from various Data Sources must therefore be combined using Search Operators. The Search Planner detects which Data Sources must be involved in order to answer the initial query and produces a plan. This plan specifies a) the subquery of the initial query that must be answered by each Data Source and b) how the results of the Data Sources are combined with Search Operators to produce the final outcome.
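One way to picture such a plan is as a tree whose leaves are Data Sources with their subqueries and whose inner nodes are Search Operators. The following is only a minimal sketch of that idea, not the Search Planner's actual data model: the class names, the source identifiers `source_1` and `source_2`, the operator name `join`, and the sample subqueries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SourceNode:
    """Leaf: one Data Source answering a subquery of the initial query."""
    source: str    # hypothetical Data Source identifier
    subquery: str  # CQL subquery this source is asked to answer


@dataclass
class OperatorNode:
    """Inner node: a Search Operator combining the results of its children."""
    operator: str  # hypothetical operator name, for illustration only
    children: List["PlanNode"]


PlanNode = Union[SourceNode, OperatorNode]


def sources_in(plan: PlanNode) -> List[str]:
    """Collect the Data Sources a plan involves, left to right."""
    if isinstance(plan, SourceNode):
        return [plan.source]
    found: List[str] = []
    for child in plan.children:
        found.extend(sources_in(child))
    return found


# Two sources each answer part of the initial query; an operator combines them.
plan = OperatorNode("join", [
    SourceNode("source_1", 'title = "whale"'),
    SourceNode("source_2", 'type = "image"'),
])
print(sources_in(plan))
```

Walking the tree from the leaves upward mirrors the two parts of the plan: the leaves carry part a) (the subquery per Data Source) and the inner nodes carry part b) (how the results are combined).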

The Search Planner takes into account the following facts about the gCube environment:

  • Vertical and Horizontal partitioning of data
  • The CQL capabilities of the Data Sources
  • The costs implied by the execution of a search plan

The information for a single document is distributed across multiple Data Sources. We call this the Vertical Data Partitioning. Documents are also divided into collections (see OCMA), which forms a Horizontal Data Partitioning. A Data Source can host multiple Horizontal partitions. For the example of Figure 1, assume that the documents are divided into collections A, B, C, D and E, which constitute the Horizontal Data Partitioning. The header, type and location of each document are hosted in different Data Sources, forming the Vertical Data Partitioning. Assume also that Data Source ABh hosts the header information for the documents of collections A and B, ABt hosts the type information for collections A and B, and ABl hosts the location information for collections A and B. In the same fashion, Data Sources CDh, CDt, CDl, Eh, Et and El host the header, type and location information for the documents of collections C, D and E.

Figure 1. CQL query before the first stage of planning
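The partitioning in this example can be written down as a small lookup table, which makes it easy to see which Data Sources a query touches. This is an illustrative sketch only: the `LAYOUT` table and the `sources_for` helper are hypothetical names, and the table assumes, from the source names, that the CD sources cover collections C and D while the E sources cover collection E.

```python
from typing import Dict, List, Set, Tuple

# Data Source name -> (collections it hosts, field it hosts).
# Horizontal partitioning: the collections; vertical partitioning: the field.
LAYOUT: Dict[str, Tuple[Set[str], str]] = {
    "ABh": ({"A", "B"}, "header"),
    "ABt": ({"A", "B"}, "type"),
    "ABl": ({"A", "B"}, "location"),
    "CDh": ({"C", "D"}, "header"),
    "CDt": ({"C", "D"}, "type"),
    "CDl": ({"C", "D"}, "location"),
    "Eh": ({"E"}, "header"),
    "Et": ({"E"}, "type"),
    "El": ({"E"}, "location"),
}


def sources_for(fields: Set[str], collections: Set[str]) -> List[str]:
    """Return the Data Sources hosting any requested field
    for any requested collection, sorted by name."""
    return sorted(
        name
        for name, (hosted_collections, field) in LAYOUT.items()
        if field in fields and hosted_collections & collections
    )


# A query over header and type, restricted to collections A and E.
print(sources_for({"header", "type"}, {"A", "E"}))
# → ['ABh', 'ABt', 'Eh', 'Et']
```

A query that mentions only some fields and some collections involves only the matching Data Sources, which is the selection the planner has to make before deciding how to combine their results.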