Difference between revisions of "InspireIndexing"

Latest revision as of 16:30, 16 June 2011

INSPIRE Parallel Indexing

The parallel indexing module is designed for indexing very large collections of full-text documents (hundreds of thousands or millions). The implementation uses the Hadoop and Lucene libraries and is meant to be executed on a Hadoop facility. Both the input documents and the computed indexes are located on the Hadoop DFS. An indexing job takes the following arguments:

Input directory
Output directory
Number of workers
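
The module's own code is not shown on this page. To illustrate the general idea of parallel indexing, the following stdlib-only Java sketch splits a document collection across a given number of workers, lets each worker build its own inverted-index shard, and merges the hits from every shard at query time. Everything in it is hypothetical: the class ShardedIndexSketch and its members JobArgs, shardFor, buildShards and search are illustrative names, not the module's API. The hash-based assignment follows the same formula as Hadoop's default HashPartitioner, and the in-memory maps stand in for Lucene indexes stored on the Hadoop DFS; the sketch does not call the Hadoop or Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical, stdlib-only sketch of sharded indexing: N workers each build
 * an inverted index over their share of the documents. Not the INSPIRE code,
 * and not the Hadoop or Lucene API.
 */
public class ShardedIndexSketch {

    /** Hypothetical holder for the three job arguments: input dir, output dir, workers. */
    record JobArgs(String inputDir, String outputDir, int workers) {
        static JobArgs parse(String[] args) {
            if (args.length != 3) {
                throw new IllegalArgumentException(
                    "usage: <input directory> <output directory> <number of workers>");
            }
            int workers = Integer.parseInt(args[2]);
            if (workers < 1) {
                throw new IllegalArgumentException("number of workers must be >= 1");
            }
            return new JobArgs(args[0], args[1], workers);
        }
    }

    /** Assign a document to a worker with the same formula as Hadoop's HashPartitioner. */
    static int shardFor(String docId, int workers) {
        return (docId.hashCode() & Integer.MAX_VALUE) % workers;
    }

    /** Build one inverted index (term -> sorted doc ids) per worker. */
    static List<Map<String, Set<String>>> buildShards(Map<String, String> docs, int workers) {
        List<Map<String, Set<String>>> shards = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            shards.add(new TreeMap<>());
        }
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Map<String, Set<String>> shard = shards.get(shardFor(doc.getKey(), workers));
            // Naive tokenisation: lower-case, then split on anything that is not a letter or digit.
            for (String term : doc.getValue().toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    shard.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return shards;
    }

    /** Query every shard and merge the hits, since any shard may hold matching documents. */
    static Set<String> search(List<Map<String, Set<String>>> shards, String term) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> shard : shards) {
            hits.addAll(shard.getOrDefault(term.toLowerCase(), Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Fall back to demo arguments when the three job arguments are not supplied.
        JobArgs job = JobArgs.parse(args.length == 3 ? args : new String[] {"in", "out", "2"});
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Parallel indexing with Hadoop");
        docs.put("doc2", "Lucene builds the index");
        docs.put("doc3", "Hadoop DFS stores the index");
        List<Map<String, Set<String>>> shards = buildShards(docs, job.workers());
        System.out.println(job + " -> " + search(shards, "index"));
    }
}
```

The sketch takes its arguments in the order listed above (for example: java ShardedIndexSketch in out 4); the directories are only echoed, because it indexes a small in-memory collection. In this sketch, giving each worker its own shard lets the workers index independently, at the cost of having to query every shard at search time.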
