https://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&feed=atom&action=historyDistributed Information Retrieval Support Framework - Revision history2024-03-29T08:43:40ZRevision history for this page on the wikiMediaWiki 1.25.1https://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=25278&oldid=prevLeonardo.candela at 18:00, 6 July 20162016-07-06T18:00:27Z<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 18:00, 6 July 2016</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[Category:TO BE REMOVED]]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Distributed Information Retrieval (DIR) subsumes functionalities that seek to increase the efficiency and the effectiveness with which unstructured queries are evaluated across multiple collections.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Distributed Information Retrieval (DIR) subsumes functionalities that seek to increase the efficiency and the effectiveness with which unstructured queries are evaluated across multiple collections.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Given a set of such queries and two or more target collections for their evaluation, there are three main functionalities that fall squarely within the scope of DIR:</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Given a set of such queries and two or more target collections for their evaluation, there are three main functionalities that fall squarely within the scope of DIR:</div></td></tr>
</table>Leonardo.candelahttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6290&oldid=prevFabio.simeoni at 15:00, 23 March 20092009-03-23T15:00:28Z<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 15:00, 23 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L8" >Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The '''DIR Master''' service defines offers reference implementations of these functionalities <del class="diffchange diffchange-inline">and can be dynamically extended to expose additional implementations and to autonomically select among all the available implementations</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The '''DIR Master''' service defines offers reference implementations of these functionalities.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Architecture=</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Architecture=</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="L30" >Line 30:</td>
<td colspan="2" class="diff-lineno">Line 30:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Masters for collection sets of zero or more collections are created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]. After the binding, collections can be added and removed to and from the bound collection sets via operations of the Master interface. For each bound collection, Masters interact with Index services in an attempt to localise information about the content of the collection (interactions are based on best-effort and caching strategies). In the reference implementation, the information forms  a ''term histogram'' of the collection, i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Masters for collection sets of zero or more collections are created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]. After the binding, collections can be added and removed to and from the bound collection sets via operations of the Master interface. For each bound collection, Masters interact with Index services in an attempt to localise information about the content of the collection (interactions are based on best-effort and caching strategies). In the reference implementation, the information forms  a ''term histogram'' of the collection, i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The localised information is stored in a cross-collection inverted index, the ''master index'', and made available during selection and fusion. Collection selection is driven by input parameters, primarily a query and a selection criterion to apply over a ranking of the collections produced with respect to the query. In the reference implementation, the query is a bag of keywords and the ''collection retrieval inference network'' (CORI) is used for collection ranking <ref>J. Callan. <em>Advances in Information Retrieval</em>, chapter: Distributed Information Retrieval, pages 127–150.Kluwer Academic Publishers, 2000.</ref>. CORI is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents; in the model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return the first ''n'' target collections in the ranking. In the ''TopN'' criterion, ''n'' is an absolute constant. In the ''BestScores'' criterion, ''n'' is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, ''n'' is derived from a distribution of the number of results to be returned to the user.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The localised information is stored in a cross-collection inverted index, the ''master index'', and made available during selection and fusion. Collection selection is driven by input parameters, primarily a query and a selection criterion to apply over a ranking of the collections produced with respect to the query. In the reference implementation, the query is a bag of keywords and the <ins class="diffchange diffchange-inline">'</ins>''collection retrieval inference network<ins class="diffchange diffchange-inline">'</ins>'' (<ins class="diffchange diffchange-inline">'''</ins>CORI<ins class="diffchange diffchange-inline">'''</ins>) is used for collection ranking <ref>J. Callan. <em>Advances in Information Retrieval</em>, chapter: Distributed Information Retrieval, pages 127–150.Kluwer Academic Publishers, 2000.</ref>. CORI is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents; in the model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return the first ''n'' target collections in the ranking. In the ''TopN'' criterion, ''n'' is an absolute constant. In the ''BestScores'' criterion, ''n'' is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, ''n'' is derived from a distribution of the number of results to be returned to the user.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>As to result fusion, this is driven by an input query and two or more references to typically remote resultsets. The reference implementation guarantees a consistent merging of the resultsets by ''re-computing'' results scores with exact, non-heuristic techniques. In particular, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>As to result fusion, this is driven by an input query and two or more references to typically remote resultsets. The reference implementation guarantees a <ins class="diffchange diffchange-inline">'''</ins>consistent merging<ins class="diffchange diffchange-inline">''' </ins>of the resultsets by ''re-computing'' results scores with exact, non-heuristic techniques. In particular, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>At the time and within the scope of their creation, a selection of the state of Masters is published in the Information System. In WSRF terminology, these are the Resource Properties that identify the target collections and by which Masters can be discovered by clients. The reference implementation publishes only the identifiers of the bound collection set.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>At the time and within the scope of their creation, a selection of the state of Masters is published in the Information System. In WSRF terminology, these are the Resource Properties that identify the target collections and by which Masters can be discovered by clients. The reference implementation publishes only the identifiers of the bound collection set.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="L44" >Line 44:</td>
<td colspan="2" class="diff-lineno">Line 44:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>The public interface of the <code>Factory</code> port-type can be found [[Master Factory WSDL | here]].</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>The public interface of the <code>Factory</code> port-type can be found [[Master Factory WSDL | here]].</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">{{UnderUpdate}}</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><references/></ins></div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6289&oldid=prevFabio.simeoni: /* Masters */2009-03-23T14:45:15Z<p><span dir="auto"><span class="autocomment">Masters</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 14:45, 23 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L30" >Line 30:</td>
<td colspan="2" class="diff-lineno">Line 30:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Masters for collection sets of zero or more collections are created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]. After the binding, collections can be added and removed to and from the bound collection sets via operations of the Master interface. For each bound collection, Masters interact with Index services in an attempt to localise information about the content of the collection (interactions are based on best-effort and caching strategies). In the reference implementation, the information forms  a ''term histogram'' of the collection, i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Masters for collection sets of zero or more collections are created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]. After the binding, collections can be added and removed to and from the bound collection sets via operations of the Master interface. For each bound collection, Masters interact with Index services in an attempt to localise information about the content of the collection (interactions are based on best-effort and caching strategies). In the reference implementation, the information forms  a ''term histogram'' of the collection, i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The localised information is stored in a cross-collection inverted index, the ''master index'', and made available during selection and fusion. Collection selection is driven by input parameters, primarily a query and a selection criterion to apply over a ranking of the collections produced with respect to the query. In the reference implementation, the query is a bag of keywords and the ''collection retrieval inference network'' (CORI) is used for collection ranking. CORI is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents; in the model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return the first ''n'' target collections in the ranking. In the ''TopN'' criterion, ''n'' is an absolute constant. In the ''BestScores'' criterion, ''n'' is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, ''n'' is derived from a distribution of the number of results to be returned to the user.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The localised information is stored in a cross-collection inverted index, the ''master index'', and made available during selection and fusion. Collection selection is driven by input parameters, primarily a query and a selection criterion to apply over a ranking of the collections produced with respect to the query. In the reference implementation, the query is a bag of keywords and the ''collection retrieval inference network'' (CORI) is used for collection ranking <ins class="diffchange diffchange-inline"><ref>J. Callan. <em>Advances in Information Retrieval</em>, chapter: Distributed Information Retrieval, pages 127–150.Kluwer Academic Publishers, 2000.</ref></ins>. CORI is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents; in the model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return the first ''n'' target collections in the ranking. In the ''TopN'' criterion, ''n'' is an absolute constant. In the ''BestScores'' criterion, ''n'' is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, ''n'' is derived from a distribution of the number of results to be returned to the user.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>As to result fusion, this is driven by an input query and two or more references to typically remote resultsets. The reference implementation guarantees a consistent merging of the resultsets by ''re-computing'' results scores with exact, non-heuristic techniques. In particular, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>As to result fusion, this is driven by an input query and two or more references to typically remote resultsets. The reference implementation guarantees a consistent merging of the resultsets by ''re-computing'' results scores with exact, non-heuristic techniques. In particular, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6288&oldid=prevFabio.simeoni: /* Masters */2009-03-23T14:29:57Z<p><span dir="auto"><span class="autocomment">Masters</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 14:29, 23 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L34" >Line 34:</td>
<td colspan="2" class="diff-lineno">Line 34:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>As to result fusion, this is driven by an input query and two or more references to typically remote resultsets. The reference implementation guarantees a consistent merging of the resultsets by ''re-computing'' results scores with exact, non-heuristic techniques. In particular, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>As to result fusion, this is driven by an input query and two or more references to typically remote resultsets. The reference implementation guarantees a consistent merging of the resultsets by ''re-computing'' results scores with exact, non-heuristic techniques. In particular, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>At the time and within the scope of their creation, a selection of the state of Masters is published in the Information System. In WSRF terminology, these are the Resource Properties that identify the target collections and by which Masters can be discovered by clients.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>At the time and within the scope of their creation, a selection of the state of Masters is published in the Information System. In WSRF terminology, these are the Resource Properties that identify the target collections and by which Masters can be discovered by clients<ins class="diffchange diffchange-inline">. The reference implementation publishes only the identifiers of the bound collection set</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>==The Factory==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>==The Factory==</div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6287&oldid=prevFabio.simeoni at 14:10, 23 March 20092009-03-23T14:10:23Z<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 14:10, 23 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L8" >Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The '''DIR Master''' service <del class="diffchange diffchange-inline">implements </del>these  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The '''DIR Master''' service <ins class="diffchange diffchange-inline">defines offers reference implementations of </ins>these functionalities <ins class="diffchange diffchange-inline">and can be dynamically extended to expose additional implementations and to autonomically select among all the available implementations</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>functionalities.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Architecture=</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Architecture=</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In a gCube infrastructure, the service operates within the context of the search framework. Its role is twofold: as a ''query pre-processor'', it its invoked to perform collection selection prior to query evaluation; as a ''search operator'', it is invoked to merge results after the query has been evaluated across selected target collections.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In a gCube infrastructure, the service operates within the context of the search framework. Its role is twofold: as a ''query pre-processor'', it its invoked to perform collection selection prior to query evaluation; as a ''search operator'', it is invoked to merge results after the query has been evaluated across selected target collections. <ins class="diffchange diffchange-inline"> </ins>Within the implementation of the service, collection selection and result fusion are local processes. Collection description, however, relies on FullText Index services to localise and maintain statistics about the content of the target collections. The relationships between the Master service and the services that interact with it are illustrated in the following component diagram:</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Within the implementation of the service, collection selection and result fusion are local processes. Collection description, however, relies on FullText Index services to localise and maintain statistics about the content of the target collections.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The relationships between the Master service and the services that interact with it are illustrated in the following component diagram:</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>   </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>   </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterarchitecture.jpg]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterarchitecture.jpg]]</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="L22" >Line 22:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Design=</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Design=</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The design of the service is distributed across two port-types: the <code>Master</code> and the <code>Factory</code> port-<del class="diffchange diffchange-inline">types</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The design of the service is <ins class="diffchange diffchange-inline">conceptually </ins>distributed across two port-types: the <code>Master</code> <ins class="diffchange diffchange-inline">port-type </ins>and the <code>Factory</code> port-<ins class="diffchange diffchange-inline">type</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterdesign1.jpg]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterdesign1.jpg]]</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="L28" >Line 28:</td>
<td colspan="2" class="diff-lineno">Line 24:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== Masters ==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== Masters ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The <code>Master</code> port-type is stateful, in that it maintains summary information about the target collections, both in memory and on the local file system. Such local 'proxies' of the target collections are then grouped in potentially overlapping ''collection sets'' that represent the <del class="diffchange diffchange-inline">execution </del>scope of a class of distributed queries. Collections sets are then bound to the port-type interface on a per-request basis, in line with the implied resource pattern of WSRF. In particular, the pairing of the <code>Master</code> interface with collection sets <del class="diffchange diffchange-inline">identifies </del>WS-Resources referred to as a ''Masters''.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The <code>Master</code> port-type is stateful, in that it maintains summary information about the target collections, both in memory and on the local file system. Such local 'proxies' of the target collections are then grouped in potentially overlapping ''collection sets'' that represent the scope of <ins class="diffchange diffchange-inline">execution for </ins>a class of distributed queries. Collections sets are then bound to the port-type interface on a per-request basis, in line with the implied resource pattern of WSRF. In particular, the pairing of the <code>Master</code> interface with collection sets <ins class="diffchange diffchange-inline">yields </ins>WS-Resources referred to as a ''Masters''.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterdesign2.jpg]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterdesign2.jpg]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Masters for zero or more <del class="diffchange diffchange-inline">target </del>collections are <del class="diffchange diffchange-inline">first </del>created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]], <del class="diffchange diffchange-inline">although target </del>collections <del class="diffchange diffchange-inline">may </del>be <del class="diffchange diffchange-inline">also bound </del>to Master <del class="diffchange diffchange-inline">at a later time through their own </del>interface. <del class="diffchange diffchange-inline">In any case, the binding of </del>each <del class="diffchange diffchange-inline">target </del>collection <del class="diffchange diffchange-inline">to a Master requires interactions </del>with <del class="diffchange diffchange-inline">the </del>Index <del class="diffchange diffchange-inline">management </del>services<del class="diffchange diffchange-inline">. The </del>interactions are based on best-effort and caching strategies <del class="diffchange diffchange-inline">and aim at localising statistical information about </del>the <del class="diffchange diffchange-inline">content of </del>the <del class="diffchange diffchange-inline">collections. This </del>information <del class="diffchange diffchange-inline">is locally stored in </del>a <del class="diffchange diffchange-inline">cross-collection index, the </del>''<del class="diffchange diffchange-inline">master index</del>'', <del class="diffchange diffchange-inline">and made available during selection and fusion</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Masters for <ins class="diffchange diffchange-inline">collection sets of </ins>zero or more collections are created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]<ins class="diffchange diffchange-inline">. After the binding</ins>, collections <ins class="diffchange diffchange-inline">can </ins>be <ins class="diffchange diffchange-inline">added and removed </ins>to <ins class="diffchange diffchange-inline">and from the bound collection sets via operations of the </ins>Master interface. <ins class="diffchange diffchange-inline">For </ins>each <ins class="diffchange diffchange-inline">bound </ins>collection<ins class="diffchange diffchange-inline">, Masters interact </ins>with Index services <ins class="diffchange diffchange-inline">in an attempt to localise information about the content of the collection (</ins>interactions are based on best-effort and caching strategies<ins class="diffchange diffchange-inline">). In </ins>the <ins class="diffchange diffchange-inline">reference implementation, </ins>the information <ins class="diffchange diffchange-inline">forms  </ins>a ''<ins class="diffchange diffchange-inline">term histogram</ins>'' <ins class="diffchange diffchange-inline">of the collection</ins>, <ins class="diffchange diffchange-inline">i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection</ins>.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">At </del>the <del class="diffchange diffchange-inline">time </del>and <del class="diffchange diffchange-inline">within the scope of their creation</del>, a selection of the <del class="diffchange diffchange-inline">state </del>of <del class="diffchange diffchange-inline">Masters </del>is <del class="diffchange diffchange-inline">published </del>in the <del class="diffchange diffchange-inline">Information System</del>. In <del class="diffchange diffchange-inline">WSRF terminology</del>, <del class="diffchange diffchange-inline">these are the Resource Properties that identify </del>the target collections <del class="diffchange diffchange-inline">and by which Masters can </del>be <del class="diffchange diffchange-inline">discovered by clients</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">The localised information is stored in a cross-collection inverted index, </ins>the <ins class="diffchange diffchange-inline">''master index'', </ins>and <ins class="diffchange diffchange-inline">made available during selection and fusion. Collection selection is driven by input parameters</ins>, <ins class="diffchange diffchange-inline">primarily a query and </ins>a selection <ins class="diffchange diffchange-inline">criterion to apply over a ranking </ins>of the <ins class="diffchange diffchange-inline">collections produced with respect to the query. In the reference implementation, the query is a bag </ins>of <ins class="diffchange diffchange-inline">keywords and the ''collection retrieval inference network'' (CORI) </ins>is <ins class="diffchange diffchange-inline">used for collection ranking. CORI is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents; </ins>in the <ins class="diffchange diffchange-inline">model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval</ins>. In <ins class="diffchange diffchange-inline">particular</ins>, <ins class="diffchange diffchange-inline">term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return </ins>the <ins class="diffchange diffchange-inline">first ''n'' </ins>target collections <ins class="diffchange diffchange-inline">in the ranking. In the ''TopN'' criterion, ''n'' is an absolute constant. In the ''BestScores'' criterion, ''n'' is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, ''n'' is derived from a distribution of the number of results to </ins>be <ins class="diffchange diffchange-inline">returned to the user</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Adding a target collection </del>to <del class="diffchange diffchange-inline">the Master requires interactions with WS-Resources </del>and <del class="diffchange diffchange-inline">Running Instances of Index Management services</del>.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">As </ins>to <ins class="diffchange diffchange-inline">result fusion, this is driven by an input query </ins>and <ins class="diffchange diffchange-inline">two or more references to typically remote resultsets</ins>. The <ins class="diffchange diffchange-inline">reference implementation guarantees a consistent merging of </ins>the <ins class="diffchange diffchange-inline">resultsets by </ins>''<ins class="diffchange diffchange-inline">re-computing</ins>'' <ins class="diffchange diffchange-inline">results scores with exact</ins>, <ins class="diffchange diffchange-inline">non-heuristic techniques. In particular, Masters rely on: (</ins>i<ins class="diffchange diffchange-inline">) global collection </ins>content <ins class="diffchange diffchange-inline">statistics available in </ins>the <ins class="diffchange diffchange-inline">master index</ins>, <ins class="diffchange diffchange-inline">and (ii) term occurrence statistics for </ins>each <ins class="diffchange diffchange-inline">result (such as the number </ins>of <ins class="diffchange diffchange-inline">terms in the corresponding document and </ins>the <ins class="diffchange diffchange-inline">frequency with which the query terms occur </ins>in the <ins class="diffchange diffchange-inline">document). Effectively</ins>, the <ins class="diffchange diffchange-inline">availability </ins>of <ins class="diffchange diffchange-inline">collection-wide and result-wide statistical information allows </ins>the <ins class="diffchange diffchange-inline">service to consistently re-rank the virtual collection comprised </ins>of all the <ins class="diffchange diffchange-inline">documents identified in the result set</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The <del class="diffchange diffchange-inline">interactions exhibit inherit best-effort and caching strategies from it. A successful interaction results in </del>the <del class="diffchange diffchange-inline">local availability of a </del>''<del class="diffchange diffchange-inline">term histogram</del>'' <del class="diffchange diffchange-inline">of the collection</del>, i<del class="diffchange diffchange-inline">.e. a dictionary of the stems of the most </del>content<del class="diffchange diffchange-inline">-bearing words of </del>the <del class="diffchange diffchange-inline">collection content</del>, each <del class="diffchange diffchange-inline">annotated with their frequency </del>of <del class="diffchange diffchange-inline">occurrence within </del>the <del class="diffchange diffchange-inline">collection. The histogram is then ingested </del>in the <del class="diffchange diffchange-inline">''master index''</del>, <del class="diffchange diffchange-inline">a local inverted index of </del>the <del class="diffchange diffchange-inline">union </del>of the <del class="diffchange diffchange-inline">content </del>of all the <del class="diffchange diffchange-inline">target associated with Masters</del>.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">From </del>the <del class="diffchange diffchange-inline">master index, collection content statics can then be retrieved for </del>the <del class="diffchange diffchange-inline">application </del>of <del class="diffchange diffchange-inline">selection and fusion strategies. These are identified dynamically on a per-request basis. For example</del>, a <del class="diffchange diffchange-inline">collection </del>selection <del class="diffchange diffchange-inline">strategy begins with the application </del>of <del class="diffchange diffchange-inline">an algorithm to rank </del>the <del class="diffchange diffchange-inline">target collections with respect to a client-specified query. This is then followed by the application </del>of <del class="diffchange diffchange-inline">a selection criterion, also specified by clients, against the resulting collection ranking. Differently from selection criteria, however, ranking algorithms are identified implicitly, by comparing characteristics of the individual request (e.g. the type of the query and query terms) with those of prototypical examples of the inputs expected by the available algorithms. This dynamic approach allows the port-type to present a very extensible interface and thus: (i) support multiple algorithms at any one time, and (ii) evolve to support more algorithms whilst maintaining backwards compatibility. Similar considerations apply to merging strategies.</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">At </ins>the <ins class="diffchange diffchange-inline">time and within </ins>the <ins class="diffchange diffchange-inline">scope </ins>of <ins class="diffchange diffchange-inline">their creation</ins>, a selection of the <ins class="diffchange diffchange-inline">state </ins>of Masters is <ins class="diffchange diffchange-inline">published in </ins>the <ins class="diffchange diffchange-inline">Information System</ins>. In <ins class="diffchange diffchange-inline">WSRF terminology</ins>, <ins class="diffchange diffchange-inline">these </ins>are the <ins class="diffchange diffchange-inline">Resource Properties </ins>that <ins class="diffchange diffchange-inline">identify </ins>the target collections and <ins class="diffchange diffchange-inline">by </ins>which <ins class="diffchange diffchange-inline">Masters can be discovered by clients</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">In the current version of the service, </del>Masters <del class="diffchange diffchange-inline">rank target collections by estimating the likelihood that their content will prove relevant to the information needs that underlies queries. The ''collection retrieval inference network'' (CORI) </del>is <del class="diffchange diffchange-inline">a collection-level generalisation of </del>the <del class="diffchange diffchange-inline">Bayesian Inference Network probabilistic model of retrieval for text documents</del>. In <del class="diffchange diffchange-inline">the model</del>, <del class="diffchange diffchange-inline">probabilities </del>are <del class="diffchange diffchange-inline">based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return </del>the <del class="diffchange diffchange-inline">first n target collections in the ranking. In the ''TopN'' criterion, n is an absolute constant. In the ''BestScores'' criterion, n is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, n is derived from a distribution of the number of results to be returned to the user.</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">As to fusion, the current version of the service guarantees a ''consistent'' merging of the results sets </del>that <del class="diffchange diffchange-inline">emanate from </del>the target collections <del class="diffchange diffchange-inline">in response to the evaluation of a given query. In particular, Masters ''re-compute'' the relevance estimates of the documents identified in the result sets using non-heuristic techniques. To do so, Masters rely on: (i) global collection content statistics available in the master index, </del>and <del class="diffchange diffchange-inline">(ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with </del>which <del class="diffchange diffchange-inline">the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>==The Factory==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>==The Factory==</div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6286&oldid=prevFabio.simeoni at 13:37, 23 March 20092009-03-23T13:37:14Z<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 13:37, 23 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Support for </del>Distributed Information Retrieval (DIR) subsumes functionalities that seek to increase the efficiency and the effectiveness with which unstructured queries are <del class="diffchange diffchange-inline">evauated </del>across <del class="diffchange diffchange-inline">a number of independent </del>collections.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Distributed Information Retrieval (DIR) subsumes functionalities that seek to increase the efficiency and the effectiveness with which unstructured queries are <ins class="diffchange diffchange-inline">evaluated </ins>across <ins class="diffchange diffchange-inline">multiple </ins>collections.  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Given a set of such queries and two or more target collections for their evaluation, there are three main functionalities that fall squarely within the scope of DIR <del class="diffchange diffchange-inline">support</del>:</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Given a set of such queries and two or more target collections for their evaluation, there are three main functionalities that fall squarely within the scope of DIR:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* '''collection selection''': the identification of the subset of target collections that appear to be the most promising candidates for the evaluation of a given query. The process relies upon ''goodness criteria'' and ''selection criteria'': the first are used to rank the target collections from the most promising to the least promising, while the second are used to select collections <del class="diffchange diffchange-inline">for query evaluation </del>based on their rank. Depending on the choice of goodness and selection criteria, the process may promote the efficiency or the effectiveness of query evaluation. In the first case, the output of the service is used to limit the <del class="diffchange diffchange-inline">costs </del>of query evaluation only to the selected collections. In the second case, the output of the service is used to regulate the number of results retrieved from each collection, and thus to limit the number of irrelevant results that are ultimately presented to the user. The two usage models may well coexist within a single collection selection strategy.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* '''collection selection''': the identification of the subset of target collections that appear to be the most promising candidates for the evaluation of a given query. The process relies upon ''goodness criteria'' and ''selection criteria'': the first are used to rank the target collections from the most promising to the least promising, while the second are used to select collections based on their rank. Depending on the choice of goodness and selection criteria, the process may promote the efficiency or the effectiveness of query evaluation. In the first case, the output of the service is used to limit the <ins class="diffchange diffchange-inline">cost </ins>of query evaluation only to the selected collections. In the second case, the output of the service is used to regulate the number of results retrieved from each collection, and thus to limit the number of irrelevant results that are ultimately presented to the user. The two usage models may well coexist within a single collection selection strategy.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* '''result fusion''': the integration of the partial results obtained by evaluating <del class="diffchange diffchange-inline">the queries </del>against (<del class="diffchange diffchange-inline">selections </del>of) the target collections. Typically, the process is one of harmonisation of result scores that have been assigned with respect to different retrieval models and content statistics. Accordingly, the process promotes the effectiveness of query evaluation.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* '''result fusion''': the integration of the partial results obtained by evaluating <ins class="diffchange diffchange-inline">a given query </ins>against (<ins class="diffchange diffchange-inline">a selection </ins>of) the target collections. Typically, the process is one of harmonisation of result scores that have been assigned with respect to different retrieval models and content statistics. Accordingly, the process promotes the effectiveness of query evaluation.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6285&oldid=prevFabio.simeoni: /* Architecture */2009-03-23T13:30:55Z<p><span dir="auto"><span class="autocomment">Architecture</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 13:30, 23 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L14" >Line 14:</td>
<td colspan="2" class="diff-lineno">Line 14:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>In a gCube infrastructure, the service operates within the context of the search framework. Its role is twofold: as a ''query pre-processor'', it its invoked to perform collection selection prior to query evaluation; as a ''search operator'', it is invoked to merge results after the query has been evaluated across selected target collections.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>In a gCube infrastructure, the service operates within the context of the search framework. Its role is twofold: as a ''query pre-processor'', it its invoked to perform collection selection prior to query evaluation; as a ''search operator'', it is invoked to merge results after the query has been evaluated across selected target collections.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Within the implementation of the service, collection selection and result fusion are local processes. Collection description, however, relies on FullText Index <del class="diffchange diffchange-inline">Management services </del>services to localise and maintain statistics about the content of the target collections.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Within the implementation of the service, collection selection and result fusion are local processes. Collection description, however, relies on FullText Index services to localise and maintain statistics about the content of the target collections.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>The relationships between the Master service and the services that interact with it are illustrated in the following component diagram:</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>The relationships between the Master service and the services that interact with it are illustrated in the following component diagram:</div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=6235&oldid=prevLeonardo.candela at 11:41, 13 March 20092009-03-13T11:41:47Z<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:41, 13 March 2009</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">The '''</del>DIR <del class="diffchange diffchange-inline">Master''' service optimises </del>the <del class="diffchange diffchange-inline">evaluation of </del>unstructured queries across two or more <del class="diffchange diffchange-inline">autonomous collections, the ''</del>target collections<del class="diffchange diffchange-inline">''.  In particular</del>, <del class="diffchange diffchange-inline">the service offers the following </del>functionalities:</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Support for Distributed Information Retrieval (</ins>DIR<ins class="diffchange diffchange-inline">) subsumes functionalities that seek to increase </ins>the <ins class="diffchange diffchange-inline">efficiency and the effectiveness with which </ins>unstructured queries <ins class="diffchange diffchange-inline">are evauated </ins>across <ins class="diffchange diffchange-inline">a number of independent collections. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Given a set of such queries and </ins>two or more target collections <ins class="diffchange diffchange-inline">for their evaluation</ins>, <ins class="diffchange diffchange-inline">there are three main </ins>functionalities <ins class="diffchange diffchange-inline">that fall squarely within the scope of DIR support</ins>:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection selection''': the identification of the subset of target collections that appear to be the most promising candidates for the evaluation of a given query. The process relies upon ''goodness criteria'' and ''selection criteria'': the first are used to rank the target collections from the most promising to the least promising, while the second are used to select collections for query evaluation based on their rank. Depending on the choice of goodness and selection criteria, the process may promote the efficiency or the effectiveness of query evaluation. In the first case, the output of the service is used to limit the costs of query evaluation only to the selected collections. In the second case, the output of the service is used to regulate the number of results retrieved from each collection, and thus to limit the number of irrelevant results that are ultimately presented to the user. The two usage models may well coexist within a single collection selection strategy.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection selection''': the identification of the subset of target collections that appear to be the most promising candidates for the evaluation of a given query. The process relies upon ''goodness criteria'' and ''selection criteria'': the first are used to rank the target collections from the most promising to the least promising, while the second are used to select collections for query evaluation based on their rank. Depending on the choice of goodness and selection criteria, the process may promote the efficiency or the effectiveness of query evaluation. In the first case, the output of the service is used to limit the costs of query evaluation only to the selected collections. In the second case, the output of the service is used to regulate the number of results retrieved from each collection, and thus to limit the number of irrelevant results that are ultimately presented to the user. The two usage models may well coexist within a single collection selection strategy.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="L7" >Line 7:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* '''collection description''': the synthesis and maintenance over time of summary information about the content of target collections, from partial inverted indices and language models to result traces for training or past queries. Collection description is only of indirect interest in the context of query evaluation and thus is not exposed by the Master service. Its output forms the basis upon which collections are ranked before being selected for query evaluation. It may also be required to normalise the scores with which the partial results of query evaluation are finally merged.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Collectively, these functionalities characterise the field known as (unstructured) </del>''<del class="diffchange diffchange-inline">Distributed Information Retrieval</del>'<del class="diffchange diffchange-inline">' (</del>DIR<del class="diffchange diffchange-inline">).</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">The </ins>'''DIR <ins class="diffchange diffchange-inline">Master''' service implements these </ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">functionalities.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Architecture=</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>=Architecture=</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
</table>Leonardo.candelahttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=5316&oldid=prevFabio.simeoni: /* Masters */2008-11-12T17:21:09Z<p><span dir="auto"><span class="autocomment">Masters</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 17:21, 12 November 2008</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L27" >Line 27:</td>
<td colspan="2" class="diff-lineno">Line 27:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== Masters ==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== Masters ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The <code>Master</code> port-type is stateful, in that it maintains <del class="diffchange diffchange-inline">(</del>in memory and on the local file system<del class="diffchange diffchange-inline">) summary information about the target collections</del>. Such 'proxies' of the target collections are grouped in <del class="diffchange diffchange-inline">(</del>potentially overlapping<del class="diffchange diffchange-inline">) </del>''collection sets'' that represent the execution scope of a class of distributed queries. Collections sets are then bound to the port-type interface on a per-request basis, in line with the implied resource pattern of WSRF. In particular, pairing the <code>Master</code> interface with collection sets identifies WS-Resources referred to as a ''Masters''.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The <code>Master</code> port-type is stateful, in that it maintains <ins class="diffchange diffchange-inline">summary information about the target collections, both </ins>in memory and on the local file system. Such <ins class="diffchange diffchange-inline">local </ins>'proxies' of the target collections are <ins class="diffchange diffchange-inline">then </ins>grouped in potentially overlapping ''collection sets'' that represent the execution scope of a class of distributed queries. Collections sets are then bound to the port-type interface on a per-request basis, in line with the implied resource pattern of WSRF. In particular, <ins class="diffchange diffchange-inline">the </ins>pairing <ins class="diffchange diffchange-inline">of </ins>the <code>Master</code> interface with collection sets identifies WS-Resources referred to as a ''Masters''.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterdesign2.jpg]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:Masterdesign2.jpg]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Masters are created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]<del class="diffchange diffchange-inline">. At the </del>time <del class="diffchange diffchange-inline">and within the scope of </del>their <del class="diffchange diffchange-inline">creation</del>, <del class="diffchange diffchange-inline">a selection of </del>the <del class="diffchange diffchange-inline">state </del>of <del class="diffchange diffchange-inline">Masters is published in </del>the <del class="diffchange diffchange-inline">Information System</del>. <del class="diffchange diffchange-inline">In WSRF terminology, these </del>are the <del class="diffchange diffchange-inline">Resource Properties that identify </del>the <del class="diffchange diffchange-inline">target </del>collections and <del class="diffchange diffchange-inline">by which Masters can be discovered by clients</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Masters <ins class="diffchange diffchange-inline">for zero or more target collections </ins>are <ins class="diffchange diffchange-inline">first </ins>created in response to client requests to the <code>Factory</code> [[#The Factory|port-type]]<ins class="diffchange diffchange-inline">, although target collections may be also bound to Master at a later </ins>time <ins class="diffchange diffchange-inline">through </ins>their <ins class="diffchange diffchange-inline">own interface. In any case</ins>, the <ins class="diffchange diffchange-inline">binding </ins>of <ins class="diffchange diffchange-inline">each target collection to a Master requires interactions with </ins>the <ins class="diffchange diffchange-inline">Index management services</ins>. <ins class="diffchange diffchange-inline">The interactions </ins>are <ins class="diffchange diffchange-inline">based on best-effort and caching strategies and aim at localising statistical information about </ins>the <ins class="diffchange diffchange-inline">content of </ins>the collections<ins class="diffchange diffchange-inline">. This information is locally stored in a cross-collection index, the ''master index'', </ins>and <ins class="diffchange diffchange-inline">made available during selection and fusion</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">At the time and within the scope of their creation, a selection of the state of Masters is published in the Information System. In WSRF terminology, these are the Resource Properties that identify the target collections and by which Masters can be discovered by clients.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Adding a collection to the Master requires interactions with WS-Resources and Running Instances of Index <del class="diffchange diffchange-inline">Management and Collection </del>Management services. The interactions <del class="diffchange diffchange-inline">are entirely based on instantiations of the Handler’s framework included in the gCF, and thus </del>inherit best-effort and caching strategies from it. A successful interaction results in the local availability of a ''term histogram'' of the collection, i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection. The histogram is then ingested in the ''master index'', a local inverted index of the union of the content of all the target associated with Masters.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Adding a <ins class="diffchange diffchange-inline">target </ins>collection to the Master requires interactions with WS-Resources and Running Instances of Index Management services.  </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The interactions <ins class="diffchange diffchange-inline">exhibit </ins>inherit best-effort and caching strategies from it. A successful interaction results in the local availability of a ''term histogram'' of the collection, i.e. a dictionary of the stems of the most content-bearing words of the collection content, each annotated with their frequency of occurrence within the collection. The histogram is then ingested in the ''master index'', a local inverted index of the union of the content of all the target associated with Masters.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>From the master index, collection content statics can then be retrieved for the application of selection and fusion strategies. These are identified dynamically on a per-request basis. For example, a collection selection strategy begins with the application of an algorithm to rank the target collections with respect to a client-specified query. This is then followed by the application of a selection criterion, also specified by clients, against the resulting collection ranking. Differently from selection criteria, however, ranking algorithms are identified implicitly, by comparing characteristics of the individual request (e.g. the type of the query and query terms) with those of prototypical examples of the inputs expected by the available algorithms. This dynamic approach allows the port-type to present a very extensible interface and thus: (i) support multiple algorithms at any one time, and (ii) evolve to support more algorithms whilst maintaining backwards compatibility. Similar considerations apply to merging strategies.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>From the master index, collection content statics can then be retrieved for the application of selection and fusion strategies. These are identified dynamically on a per-request basis. For example, a collection selection strategy begins with the application of an algorithm to rank the target collections with respect to a client-specified query. This is then followed by the application of a selection criterion, also specified by clients, against the resulting collection ranking. Differently from selection criteria, however, ranking algorithms are identified implicitly, by comparing characteristics of the individual request (e.g. the type of the query and query terms) with those of prototypical examples of the inputs expected by the available algorithms. This dynamic approach allows the port-type to present a very extensible interface and thus: (i) support multiple algorithms at any one time, and (ii) evolve to support more algorithms whilst maintaining backwards compatibility. Similar considerations apply to merging strategies.</div></td></tr>
</table>Fabio.simeonihttps://wiki.gcube-system.org/index.php?title=Distributed_Information_Retrieval_Support_Framework&diff=5315&oldid=prevFabio.simeoni: /* Masters */2008-11-12T17:10:29Z<p><span dir="auto"><span class="autocomment">Masters</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 17:10, 12 November 2008</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L40" >Line 40:</td>
<td colspan="2" class="diff-lineno">Line 40:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>In the current version of the service, Masters rank target collections by estimating the likelihood that their content will prove relevant to the information needs that underlies queries. The ''collection retrieval inference network'' (CORI) is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents. In the model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return the first n target collections in the ranking. In the ''TopN'' criterion, n is an absolute constant. In the ''BestScores'' criterion, n is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, n is derived from a distribution of the number of results to be returned to the user.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>In the current version of the service, Masters rank target collections by estimating the likelihood that their content will prove relevant to the information needs that underlies queries. The ''collection retrieval inference network'' (CORI) is a collection-level generalisation of the Bayesian Inference Network probabilistic model of retrieval for text documents. In the model, probabilities are based upon statistics that are analogous to term occurrence frequency (tf) and inverse document frequency (idf) in classic document retrieval. In particular, term frequency is replaced with ''document frequency'' (df) and ''inverse document frequency'' is replaced with ''inverse collection frequency'' (icf). Three selection criteria may then be chosen to return the first n target collections in the ranking. In the ''TopN'' criterion, n is an absolute constant. In the ''BestScores'' criterion, n is derived with respect to a threshold on the relevance score. In the ''ResultDistribution'' criterion, n is derived from a distribution of the number of results to be returned to the user.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>As to fusion, the current version of the service guarantees a ''consistent'' merging of the results sets that emanate from the target collections in response to the evaluation of a given query. In particular, Masters ''re-compute'' the relevance estimates of the documents identified in the result sets using non-heuristic techniques. To do so, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>As to fusion, the current version of the service guarantees a ''consistent'' merging of the results sets that emanate from the target collections in response to the evaluation of a given query. In particular, Masters ''re-compute'' the relevance estimates of the documents identified in the result sets using non-heuristic techniques. To do so, Masters rely on: (i) global collection content statistics available in the master index, and (ii) term occurrence statistics for each result (such as the number of terms in the corresponding document and the frequency with which the query terms occur in the document). Effectively, the availability of collection-wide and result-wide statistical information allows the service to consistently re-rank the virtual collection comprised of all the documents identified in the result set.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>==The Factory==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>==The Factory==</div></td></tr>
</table>Fabio.simeoni