Data Retrieval Facilities
From Gcube Wiki
Latest revision as of 17:47, 13 January 2014
Overview
gCube provides Information Retrieval facilities over large heterogeneous environments. Sources of information that use different technologies, data representations and semantics can be integrated and exploited by gCube's Data Retrieval framework. The architecture and mechanisms provided by the framework ensure flexibility, scalability, high performance and high availability.
The gCube Data Retrieval Framework aims to hide the complexity of the underlying environment by:
- providing a declarative approach for querying the hosted information
- scaling with the number of hosted information sources
- dynamically integrating external sources of information
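To illustrate the declarative approach, a CQL query combines search clauses of the form index, relation, term with boolean operators such as `and`. The helpers below are a minimal sketch, not part of any gCube API: the names `cql_term`, `cql_clause` and `cql_and` are hypothetical, and the index names `dc.title` and `dc.date` are assumed for the example. They show how a client might assemble a CQL string, quoting each term and escaping embedded double quotes and backslashes as the CQL specification requires.

```python
def cql_term(value: str) -> str:
    """Quote a CQL search term, escaping backslashes and double quotes.

    Unescaped * and ? inside the quotes remain CQL masking (wildcard) characters.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cql_clause(index: str, relation: str, value: str) -> str:
    """Build a single search clause such as dc.title = "tuna"."""
    return f"{index} {relation} {cql_term(value)}"


def cql_and(*clauses: str) -> str:
    """Join clauses with the CQL boolean 'and', parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)


if __name__ == "__main__":
    # Hypothetical indexes: dc.title and dc.date are illustrative only.
    query = cql_and(
        cql_clause("dc.title", "=", "tuna fisheries"),
        cql_clause("dc.date", ">", "2000"),
    )
    print(query)  # (dc.title = "tuna fisheries") and (dc.date > "2000")
```

Wrapping each clause in parentheses is legal CQL and keeps precedence explicit if the clauses are later combined with `or` or `not`.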
Key Features
- Declarative Query Language over a heterogeneous environment
  - The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL (Contextual Query Language) standard.
- On-the-fly Integration of Data Sources
  - A Data Source that publishes its Information Retrieval capabilities can be involved in the IR process on the fly.
- Scalability in the number of Data Sources
  - Planning and Optimization mechanisms detect the minimum number of Sources that need to be involved in answering a query, along with an optimal execution plan.
- Direct Integration of External Information Providers
  - Through the OpenSearch standard, external Information Providers can be queried dynamically, and the results they return can be aggregated with the rest of the results during query answering.
- Indexing Capabilities for Replication and High Availability
  - Multidimensional and full-text indexing capabilities, built on an architecture that efficiently supports replication and high availability.
- Distributed Execution Environment offering High Performance and Flexibility
  - Efficient execution of search plans over a large heterogeneous environment.
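The OpenSearch integration relies on the URL templates that an OpenSearch description document publishes, such as `{searchTerms}` for the query and optional parameters marked with a trailing `?` like `{count?}`. The sketch below is a hypothetical client-side illustration, not the gCube implementation: `expand_template` fills in a template following the OpenSearch 1.1 conventions (values URL-encoded, optional parameters without a value replaced by an empty string, a missing required parameter treated as an error), and `aggregate` merges ranked result lists by keeping each document's best score. The merge assumes scores from different providers are already on a comparable scale, which a real system would have to ensure; the template URL is a placeholder.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; names may carry a namespace prefix, e.g. {geo:box?}.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def expand_template(template: str, values: dict[str, str]) -> str:
    """Substitute parameter values into an OpenSearch URL template."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode everything, including '/' and spaces.
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters without a value become empty strings.
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)


def aggregate(*result_lists: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Merge ranked (doc_id, score) lists, keeping the best score per doc_id.

    Assumes all scores are already normalized to a common scale.
    """
    best: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first; ties ordered by doc_id for a stable result.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    template = "http://example.org/search?q={searchTerms}&count={count?}&start={startIndex?}"
    print(expand_template(template, {"searchTerms": "tuna fisheries", "count": "10"}))
    # http://example.org/search?q=tuna%20fisheries&count=10&start=
    internal = [("doc-a", 0.9), ("doc-b", 0.5)]
    external = [("doc-b", 0.7), ("doc-c", 0.6)]
    print(aggregate(internal, external))
    # [('doc-a', 0.9), ('doc-b', 0.7), ('doc-c', 0.6)]
```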
Subsystems
The Data Retrieval framework comprises the following two subsystems:
- Search Planning and Execution Specification
  - enables the integration of CQL-compliant Data Sources and is responsible for answering queries by combining Data Source capabilities and Search Operators
- Data Sources Specification
  - aims to integrate data from different data providers into the gCube infrastructure
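The Search Planning and Execution subsystem answers queries by combining Data Source capabilities, and the planner aims to involve as few Sources as possible. The actual gCube planning algorithm is not described on this page. Purely as a hypothetical illustration, the sketch below models source selection as a set-cover problem: each Source advertises the set of collections it can serve (an assumed capability model), and `select_sources` (a hypothetical name) greedily picks the Source that covers the most still-uncovered collections.

```python
def select_sources(required: set[str], capabilities: dict[str, set[str]]) -> list[str]:
    """Greedily choose sources until every required collection is covered.

    capabilities maps a (hypothetical) source name to the collections it serves.
    """
    uncovered = set(required)
    chosen: list[str] = []
    while uncovered:
        # Source covering the most uncovered collections; iterating in sorted
        # order makes max() break ties by name, so plans are deterministic.
        best = max(
            sorted(capabilities),
            key=lambda source: len(capabilities[source] & uncovered),
            default=None,
        )
        if best is None or not (capabilities[best] & uncovered):
            raise ValueError(f"no source can serve: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= capabilities[best]
    return chosen


if __name__ == "__main__":
    sources = {
        "A": {"c1", "c2"},
        "B": {"c2", "c3"},
        "C": {"c3"},
        "D": {"c1"},
    }
    print(select_sources({"c1", "c2", "c3"}, sources))  # ['A', 'B']
```

The greedy heuristic is only an approximation: it stays within a logarithmic factor of the optimum but does not always find the true minimum, whereas an exact search over subsets of Sources would guarantee minimality at exponential planning cost.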