Search Management

From Gcube Wiki
Revision as of 13:17, 23 August 2007 by Gpapanikos (Talk | contribs) (Query Language)

Search Management

Example Code

Search Management Services Usage Examples

Search Master

Introduction

The SearchMasterService is the main entry point to the functionality of the search engine. It contains the elements that organize the execution of the search operators for the tasks the search engine is responsible for.

The SearchMasterService is responsible for the first stage of query processing. This stage produces a query execution plan, which in the DILIGENT implementation is a directed acyclic graph of SearchOperator invocations. The service also gathers the information that the various search services are expected to need and attaches it to the processed query as context. The individual services then do not have to gather this information themselves, which significantly reduces delays and improves responsiveness.
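To illustrate what a plan shaped as a directed acyclic graph implies for execution, the following minimal sketch orders a small plan so that every operator runs only after the operators it consumes. The plan dictionary and the operator labels are assumptions made for illustration; they are not the DILIGENT plan format.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each operator maps to the set of operators whose
# output it consumes. The labels are illustrative only.
plan = {
    "search_a": set(),
    "search_b": set(),
    "merge": {"search_a", "search_b"},
    "keeptop": {"merge"},
}


def execution_order(plan):
    """Return the operators in an order where inputs precede consumers."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(plan)
print(order)
```

The two searches do not depend on each other, so either may come first; only the relative position of `merge` and `keeptop` is fixed by the graph.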

This information is produced by various components and services of the DILIGENT Infrastructure, including the Diligent Information Service (DIS), Content and Metadata Management, and the Indexing service. Gathering all of it is very time consuming, so the SearchMasterService keeps a cache of previously discovered information and state.
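The caching idea can be sketched as follows. This is a generic time-limited cache in front of a slow lookup, not the SearchMasterService implementation. The class name, the `fetch` callback and the expiry time are all assumptions; the source does not say how or when cached entries are refreshed.

```python
import time


class InfoCache:
    """Caches results of a slow lookup for a limited time (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # slow function: key -> value
        self._ttl = ttl_seconds      # assumed expiry policy
        self._clock = clock
        self._entries = {}           # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the slow lookup
        value = self._fetch(key)     # slow path: ask the remote service
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def slow_lookup(key):
    # Stand-in for a remote call to an information service.
    calls.append(key)
    return f"info-for-{key}"


cache = InfoCache(slow_lookup, ttl_seconds=60.0)
first = cache.get("collections")
second = cache.get("collections")   # served from the cache
```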

The SearchMaster validates each received query using Search Library elements, checking the user-supplied query against the elements of the specific Digital Library instance. This ensures, among other things, that the content collections are available, the metadata elements (e.g. fields) are present, and the operators (i.e. services) are accessible.
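The kind of check described here might look like the sketch below. It assumes the Digital Library instance is summarized as a plain dictionary of known collection ids and field names; that structure and the function name are hypothetical and do not reflect the Search Library API.

```python
def validate_query(collections, fields, config):
    """Return a list of problems; an empty list means the query is usable."""
    problems = []
    for collection_id in collections:
        if collection_id not in config["collections"]:
            problems.append(f"unknown collection: {collection_id}")
    for field in fields:
        if field not in config["fields"]:
            problems.append(f"unknown field: {field}")
    return problems


# Hypothetical description of one Digital Library instance.
config = {
    "collections": {"1fc1fbf0-fa3c-11db-82de-905c553f17c3"},
    "fields": {"title", "creator"},
}

problems = validate_query(
    ["1fc1fbf0-fa3c-11db-82de-905c553f17c3", "missing-id"],
    ["title", "abstract"],
    config,
)
print(problems)
```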

DL Description

Through the Search Master, external services can receive a structured overview of the Digital Library resources that are available and usable during a search operation. An example of this summary is shown below:

  <SearchConfig>
    <collections>
      <collection name="Example Collection Name 1" id="1fc1fbf0-fa3c-11db-82de-905c553f17c3">
        <TYPE>DATA</TYPE>
        <ASSOCIATEDWITH>d510a060-fa3c-11db-aa91-f715cb72c9ff</ASSOCIATEDWITH>
        <ASSOCIATEDWITH>g45612f7-dth5-23fg-45df-45dfg5b1r34s</ASSOCIATEDWITH>
      </collection>
      <collection name="Example Collection Name 2" id="c3f685b0-fdb6-11db-a573-e4518f2111ab">
        <TYPE>DATA</TYPE>
        <ASSOCIATEDWITH>7bb87410-fdb7-11db-8476-f715cb72c9ff</ASSOCIATEDWITH>
        <INDEX>FEATURE</INDEX>
      </collection>
      <collection name="Example Collection Name 3" id="d510a060-fa3c-11db-aa91-f715cb72c9ff">
        <TYPE>METADATA</TYPE>
        <LANGUAGE>en</LANGUAGE>
        <SCHEMA>dc</SCHEMA>
        <ASSOCIATEDWITH>1fc1fbf0-fa3c-11db-82de-905c553f17c3</ASSOCIATEDWITH>
        <INDEX>FTS</INDEX>
        <INDEX>XML</INDEX>
      </collection>
      <collection name="Example Collection Name 4" id="g45612f7-dth5-23fg-45df-45dfg5b1r34s">
        <TYPE>METADATA</TYPE>
        <LANGUAGE>en</LANGUAGE>
        <SCHEMA>tei</SCHEMA>
        <ASSOCIATEDWITH>1fc1fbf0-fa3c-11db-82de-905c553f17c3</ASSOCIATEDWITH>
        <INDEX>FTS</INDEX>
        <INDEX>XML</INDEX>
      </collection>
      <collection name="Example Collection Name 5" id="7bb87410-fdb7-11db-8476-f715cb72c9ff">
        <TYPE>METADATA</TYPE>
        <LANGUAGE>en</LANGUAGE>
        <SCHEMA>dc</SCHEMA>
        <ASSOCIATEDWITH>c3f685b0-fdb6-11db-a573-e4518f2111ab</ASSOCIATEDWITH>
        <INDEX>FTS</INDEX>
        <INDEX>XML</INDEX>
      </collection>
    </collections>
  </SearchConfig>
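
A client could read such a description with a standard XML parser. The sketch below uses a shortened copy of the example above and looks up the metadata collections linked to a data collection that offer a given index type. Reading ASSOCIATEDWITH as a link between a data collection and its metadata collections is an interpretation of the example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two collections copied from the example description.
SAMPLE = """
<SearchConfig>
  <collections>
    <collection name="Example Collection Name 2" id="c3f685b0-fdb6-11db-a573-e4518f2111ab">
      <TYPE>DATA</TYPE>
      <ASSOCIATEDWITH>7bb87410-fdb7-11db-8476-f715cb72c9ff</ASSOCIATEDWITH>
      <INDEX>FEATURE</INDEX>
    </collection>
    <collection name="Example Collection Name 5" id="7bb87410-fdb7-11db-8476-f715cb72c9ff">
      <TYPE>METADATA</TYPE>
      <LANGUAGE>en</LANGUAGE>
      <SCHEMA>dc</SCHEMA>
      <ASSOCIATEDWITH>c3f685b0-fdb6-11db-a573-e4518f2111ab</ASSOCIATEDWITH>
      <INDEX>FTS</INDEX>
      <INDEX>XML</INDEX>
    </collection>
  </collections>
</SearchConfig>
"""


def parse_config(xml_text):
    """Map each collection id to its properties."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for col in root.iter("collection"):
        result[col.get("id")] = {
            "name": col.get("name"),
            "type": col.findtext("TYPE"),
            "language": col.findtext("LANGUAGE"),   # None for DATA collections
            "schema": col.findtext("SCHEMA"),       # None for DATA collections
            "associated": [e.text for e in col.findall("ASSOCIATEDWITH")],
            "indexes": [e.text for e in col.findall("INDEX")],
        }
    return result


def metadata_for(config, data_id, index_type):
    """Ids of metadata collections linked to data_id that carry index_type."""
    return [
        cid
        for cid in config[data_id]["associated"]
        if cid in config
        and config[cid]["type"] == "METADATA"
        and index_type in config[cid]["indexes"]
    ]


config = parse_config(SAMPLE)
fts_sources = metadata_for(config, "c3f685b0-fdb6-11db-a573-e4518f2111ab", "FTS")
print(fts_sources)
```
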
Query Language

<function> ::= <project_fun> | <sort_fun> | <filter_fun> | <merge_fun> | <join_fun> | <keeptop_fun> | <fulltexts_fun> | <fieldedsearch_fun> | <extsearch_fun> | <read_fun> | <similsearch_fun> | <spatialsearch_fun> | <retrieve_metadata_fun>
<read_fun> ::= <read_fun_name> <epr>
<read_fun_name> ::= 'read'
<epr> ::= string
<project_fun> ::= <project_fun_name> <by> <project_key> <project_source>
<project_fun_name> ::= 'project'
<project_key> ::= string
<project_source> ::= <non_leaf_source>
<sort_fun> ::= <sort_fun_name> <sort_order> <by> <sort_key> <sort_source>
<sort_fun_name> ::= 'sort'
<sort_key> ::= string
<sort_order> ::= 'ASC' | 'DESC'
<sort_source> ::= <non_leaf_source>
<filter_fun> ::= <filter_fun_name> <filter_type> <by> <filter_statement> <filter_source>
<filter_fun_name> ::= 'filter'
<filter_type> ::= string
<filter_statement> ::= string
<filter_source> ::= <non_leaf_source> | <leaf_source>
<merge_fun> ::= <merge_fun_name> <on> <merge_sources>
<merge_fun_name> ::= 'merge'
<merge_sources> ::= <merge_source> <and> <merge_source> <merge_sources2>
<merge_sources2> ::= <and> <merge_source> <merge_sources2> | φ
<merge_source> ::= <left_parenthesis> <function> <right_parenthesis>
<join_fun> ::= <join_fun_name> <join_type> <by> <join_key> <on> <join_source> <and> <join_source>
<join_fun_name> ::= 'join'
<join_key> ::= string
<join_type> ::= 'inner' | 'fullOuter' | 'leftOuter' | 'rightOuter'
<join_source> ::= <left_parenthesis> <function> <right_parenthesis>
<keeptop_fun> ::= <keeptop_fun_name> <keeptop_number> <keeptop_source>
<keeptop_fun_name> ::= 'keeptop'
<keeptop_number> ::= integer
<keeptop_source> ::= <non_leaf_source>
<fulltexts_fun> ::= <fulltexts_fun_name> <by> <fulltexts_term> <fulltexts_terms> <in> <language> <on> <fulltexts_sources>
<fulltexts_fun_name> ::= 'fulltextsearch'
<fulltexts_terms> ::= <comma> <fulltexts_term> <fulltexts_terms> | φ
<fulltexts_sources> ::= <fulltexts_source> <fulltexts_sources_2>
<fulltexts_sources_2> ::= <comma> <fulltexts_source> <fulltexts_sources_2> | φ
<fulltexts_source> ::= string
<fieldedsearch_fun> ::= <fieldedsearch_fun_name> <by> <query> <fieldedsearch_source>
<fieldedsearch_fun_name> ::= 'fieldedsearch'
<query> ::= string
<fieldedsearch_source> ::= <non_leaf_source> | <leaf_source>
<extsearch_fun> ::= <extsearch_fun_name> <by> <extsearch_query> <on> <extsearch_source>
<extsearch_fun_name> ::= 'externalsearch'
<extsearch_query> ::= string
<extsearch_source> ::= string
<similsearch_fun> ::= <similsearch_fun_name> <as> <URL> <by> <pair> <pairs> <similarity_source>
<similsearch_fun_name> ::= 'similaritysearch'
<URL> ::= string
<pair> ::= <feature> <equal> <weight>
<pairs> ::= <and> <pair> <pairs> | φ
<similarity_source> ::= <leaf_source>
<if-syntax> ::= <if> <left_parenthesis> <function-st> <compare-sign> <function-st> <right_parenthesis> <then> <search-op> <else> <search-op>
<compare-sign> ::= '==' | '>' | '<' | '>=' | '<='
<function-st> ::= <left-op> <math-op> <right-op>
<math-op> ::= '+' | '-' | '*' | '/'
<left-op> ::= <function> <left_parenthesis> <left-op> <right_parenthesis> | <literal>
<function> ::= <max-fun> | <min-fun> | <sum-fun> | <av-fun> | <var-fun> | <size-fun>
<max-fun> ::= 'max' <left_parenthesis> <xpath> <comma> <search-op> <right_parenthesis>
<min-fun> ::= 'min' <left_parenthesis> <xpath> <comma> <search-op> <right_parenthesis>
<sum-fun> ::= 'sum' <left_parenthesis> <xpath> <comma> <search-op> <right_parenthesis>
<av-fun> ::= 'av' <left_parenthesis> <xpath> <comma> <search-op> <right_parenthesis>
<var-fun> ::= 'var' <left_parenthesis> <xpath> <comma> <search-op> <right_parenthesis>
<size-fun> ::= 'size' <left_parenthesis> <search-op> <right_parenthesis>
<right-op> ::= <function-st> | <left-op>
<xpath> ::= '<field selection through xpath>'
<retrieve_metadata_fun> ::= <rm_fun_name> <in> <language> <on> <rm_source> <as> <schema>
<rm_fun_name> ::= 'retrievemetadata'
<schema> ::= string
<rm_source> ::= <left_parenthesis> <function> <right_parenthesis>
<spatialsearch_fun> ::= <spatialsearch_fun_name> <relation> <geometry> [<timeBoundary>] <spatial_source>
<spatialsearch_fun_name> ::= 'spatialsearch'
<relation> ::= 'intersects' | 'contains' | 'isContained'
<geometry> ::= <polygon_name> <left_parenthesis> <points> <right_parenthesis>
<timeBoundary> ::= 'within' <startTime> <stopTime>
<startTime> ::= double
<stopTime> ::= double
<spatial_source> ::= <leaf_source>
<points> ::= <point> {<comma> <point>}+
<x> ::= integer
<y> ::= integer

<leaf_source>  ::= [<in> <language>] <on> <source> [<as> <schema>]
<non_leaf_source>  ::= <left_parenthesis> <function> <right_parenthesis>
<language>  ::= 'AFRIKAANS' | 'ARABIC' | 'AZERI' | 'BYELORUSSIAN' | 'BULGARIAN' | 'BANGLA' | 'BRETON' | 'BOSNIAN' | 'CATALAN' | 'CZECH' | 'WELSH' | 'DANISH' | 'GERMAN' | 'GREEK' | 'ENGLISH' | 'ESPERANTO' | 'SPANISH' | 'ESTONIAN' | 'BASQUE' | 'FARSI' | 'FINNISH' | 'FAEROESE' | 'FRENCH' | 'FRISIAN' | 'IRISH_GAELIC' | 'GALICIAN' | 'HAUSA' | 'HEBREW' | 'HINDI' | 'CROATIAN' | 'HUNGARIAN' | 'ARMENIAN' | 'INDONESIAN' | 'ICELANDIC' | 'ITALIAN' | 'JAPANESE' | 'GEORGIAN' | 'KAZAKH' | 'GREENLANDIC' | 'KOREAN' | 'KURDISH' | 'KIRGHIZ' | 'LATIN' | 'LETZEBURGESCH' | 'LITHUANIAN' | 'LATVIAN' | 'MAORI' | 'MONGOLIAN' | 'MALAY' | 'MALTESE' | 'NORWEGIAN_BOKMAAL' | 'DUTCH' | 'NORWEGIAN_NYNORSK' | 'POLISH' | 'PASHTO' | 'PORTUGUESE' | 'RHAETO_ROMANCE' | 'ROMANIAN' | 'RUSSIAN' | 'SAMI_NORTHERN' | 'SLOVAK' | 'SLOVENIAN' | 'ALBANIAN' | 'SERBIAN' | 'SWEDISH' | 'SWAHILI' | 'TAMIL' | 'THAI' | 'FILIPINO' | 'TURKISH' | 'UKRAINIAN' | 'URDU' | 'UZBEK' | 'VIETNAMESE' | 'SORBIAN' | 'YIDDISH' | 'CHINESE_SIMPLIFIED' | 'CHINESE_TRADITIONAL' | 'ZULU'
<source> ::= string
<schema>  ::= string
<left_parenthesis> ::= '('
<right_parenthesis> ::= ')'
<comma> ::= ','
<and> ::= 'and'
<on> ::= 'on'
<as> ::= 'as'
<by> ::= 'by'
<sort_by> ::= 'sort'
<from> ::= 'from'
<if> ::= 'if'
<then> ::= 'then'
<else> ::= 'else'
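
As a worked instance of the grammar, the sketch below composes a nested query from three of the productions (`<fulltexts_fun>`, `<sort_fun>`, `<keeptop_fun>`). The helper functions are hypothetical. The sketch also assumes that string values are written bare, that tokens are separated by single spaces, and that a source is a collection id; the grammar does not specify quoting or whitespace rules.

```python
def fulltextsearch(terms, language, sources):
    """<fulltexts_fun>: 'fulltextsearch' by t1, t2 in LANGUAGE on s1, s2"""
    if not terms or not sources:
        raise ValueError("at least one term and one source are required")
    return (
        f"fulltextsearch by {', '.join(terms)} "
        f"in {language} on {', '.join(sources)}"
    )


def sort_query(order, key, inner):
    """<sort_fun>: 'sort' ASC|DESC by key ( function )"""
    if order not in ("ASC", "DESC"):
        raise ValueError("order must be 'ASC' or 'DESC'")
    return f"sort {order} by {key} ({inner})"


def keeptop(number, inner):
    """<keeptop_fun>: 'keeptop' integer ( function )"""
    return f"keeptop {int(number)} ({inner})"


# Collection id taken from the DL Description example; 'title' is an
# assumed sort key.
query = keeptop(
    10,
    sort_query(
        "ASC",
        "title",
        fulltextsearch(
            ["gold"], "ENGLISH", ["7bb87410-fdb7-11db-8476-f715cb72c9ff"]
        ),
    ),
)
print(query)
```

The nesting mirrors the grammar: `sort` and `keeptop` take a `<non_leaf_source>`, that is, another function wrapped in parentheses.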

Search Manager

Introduction

An alternative entry point to the Search functionality is the SearchManager Service. This service provides an abstraction over the SearchMasterService that enables non-blocking query submission and result retrieval. Through it, a client can submit a query, check the execution progress of a specific query and finally retrieve the endpoint reference of the results.

Upon submission of a query, the SearchManager Service creates a resource that holds the query’s status. The endpoint reference (EPR) of this resource is returned to the client, which can use it later to retrieve information about the progress of the query. Internally, the SearchManager Service has to do two things at once:
*Communicate with a SearchMaster Service and wait for the search operation to terminate.
*Return the EPR of a status resource that initially shows the query as queued.
To this end, the service spawns a new thread that hides the blocking behaviour of the SearchMaster Service. The status resource contains the status of the search request and, once available, the endpoint reference of the final results. It is retrieved through its EPR each time a client asks for the status of its request.
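
The pattern described above (return a handle immediately, run the blocking call on a separate thread, let the client poll a status resource) can be sketched as follows. The class, the state names and the result string are assumptions made for illustration. A random UUID stands in for the EPR of the status resource, and `fake_master` stands in for the blocking SearchMaster call.

```python
import threading
import time
import uuid


class SearchManagerSketch:
    """Non-blocking front end over a blocking search function (hypothetical)."""

    def __init__(self, blocking_search):
        self._search = blocking_search
        self._status = {}                 # handle -> status record
        self._lock = threading.Lock()

    def submit(self, query):
        handle = str(uuid.uuid4())        # stand-in for the status resource EPR
        with self._lock:
            self._status[handle] = {"state": "QUEUED", "result_epr": None}
        worker = threading.Thread(
            target=self._run, args=(handle, query), daemon=True
        )
        worker.start()
        return handle                     # returned before the search finishes

    def _run(self, handle, query):
        with self._lock:
            self._status[handle]["state"] = "RUNNING"
        try:
            result = self._search(query)  # blocks only this worker thread
            update = {"state": "DONE", "result_epr": result}
        except Exception as exc:
            update = {"state": "FAILED", "result_epr": None, "error": str(exc)}
        with self._lock:
            self._status[handle] = update

    def status(self, handle):
        with self._lock:
            return dict(self._status[handle])


def fake_master(query):
    # Stand-in for the blocking SearchMaster call.
    time.sleep(0.05)
    return f"results-for:{query}"


manager = SearchManagerSketch(fake_master)
handle = manager.submit("read some-epr")
while manager.status(handle)["state"] not in ("DONE", "FAILED"):
    time.sleep(0.01)                      # the client polls at its own pace
final = manager.status(handle)
print(final)
```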