Digital Library Administration
Contents
- 1 VDL Creation and Management
- 2 Resources Management
- 3 VO and Users Management
- 4 Content & Storage Management
- 4.1 Simple Setup of Storage Management using Apache Derby
- 4.2 Advanced Setup of Storage Management using an arbitrary relational JDBC database
- 4.3 Advanced Setup of Storage Management for protocol handlers
- 4.4 Advanced Setup of Storage Management for storing raw content outside a database
- 4.5 Setup of Content & Collection Management
- 5 Metadata Management
- 6 Index Management
- 7 Search Management
- 8 Feature Extraction
- 9 Process Management
VDL Creation and Management
Resources Management
THIS SECTION OF GCUBE DOCUMENTATION IS CURRENTLY UNDER UPDATE.
Generic Resources Management
In order to properly set up a VDL, several Generic Resources need to be published on the DIS. The VDL Administrator can create them using the Generic Resource Portlet. Additionally, every time a new schema appears in the VDL, a MetadataSchemaInfo, a PresentationXSLT_<schemaName>_<xsltName> and a MetadataXSLT_<schemaName>_<xsltName> Generic Resource must be created for that schema.
DefaultUserProfile
The DefaultUserProfile Generic Resource contains information about the mandatory elements that all user profiles must have.
The VDL Administrator must create this Generic Resource named "DefaultUserProfile".
The body of this resource must be in the following form:
<userprofile>
  <userinfo>
    <username></username>
    <fullname></fullname>
    <email></email>
  </userinfo>
  <userpreferences>
    <language></language>
    <langcolpairs></langcolpairs>
    <xslts>
      <metadataxslt></metadataxslt>
      <presentationxslt></presentationxslt>
    </xslts>
  </userpreferences>
</userprofile>
If the DefaultUserProfile Generic Resource needs to be changed, the Profile Administration Portlet can be used to apply the changes.
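For illustration, a filled-in profile following this template might look as follows (all values are hypothetical):
<userprofile>
  <userinfo>
    <username>jsmith</username>
    <fullname>Jane Smith</fullname>
    <email>jsmith@example.org</email>
  </userinfo>
  <userpreferences>
    <language>en</language>
    <langcolpairs></langcolpairs>
    <xslts>
      <metadataxslt>default</metadataxslt>
      <presentationxslt>default</presentationxslt>
    </xslts>
  </userpreferences>
</userprofile>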
ScenarioCollectionInfo
This Generic Resource contains information about the available collections for a specific VDL and their hierarchical structure.
The collections can be clustered into groups so as to help end users identify similar collections and to present the collections in a human-manageable way.
The VDL Administrator must create a Generic Resource named: "ScenarioCollectionInfo" whose body must be in the following form:
<DL name="<AbsoluteDLName>">
  <collections name="collection group 1 name" shortname="short name" description="description of group of collections">
    <collection name="collection 1.1 name" reference="reference url for this collection" shortname="short name for the collection" description="collection description"/>
    <collection name="collection 1.2 name" reference="reference url for this collection" shortname="short name for the collection" description="collection description"/>
    <collection name="collection 1.3 name" reference="reference url for this collection" shortname="short name for the collection" description="collection description"/>
    ...
  </collections>
  <collections name="collection group 2 name" shortname="short name" description="description of group of collections">
    <collection name="collection 2.1 name" reference="reference url for this collection" shortname="short name for the collection" description="collection description"/>
    <collection name="collection 2.2 name" reference="reference url for this collection" shortname="short name for the collection" description="collection description"/>
    ...
  </collections>
  ...
</DL>
The root element is DL and it has an attribute named "name". The name attribute is very important and has to be in the form: /<VO>/<Community>/<DLName>
Additionally, the DL element contains an arbitrary number of "collections" elements. Each of these elements represents a group of collections.
Its attributes are:
- name: The name of the group
- shortname: The shortname of the group
- description: Its description
Furthermore, each "collections" element contains an arbitrary number of "collection" elements. Each of these elements represents an actual collection.
Its attributes are:
- name: The name of the collection. This name must exactly match the collection name as it exists in the Collection Management Service.
- shortname: The shortname of the collection
- description: Its description
- reference: A reference URL for this collection
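Putting this together, a hypothetical ScenarioCollectionInfo body could look like the following (all names and URLs are invented):
<DL name="/myVO/myCommunity/myDL">
  <collections name="Earth Observation" shortname="EO" description="Collections of satellite imagery">
    <collection name="Landsat Scenes" reference="http://example.org/landsat" shortname="Landsat" description="Scenes acquired by Landsat satellites"/>
    <collection name="Weather Maps" reference="http://example.org/weather" shortname="Weather" description="Daily weather maps"/>
  </collections>
</DL>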
MetadataSchemaInfo
One such Generic Resource must exist for each schema of the VDL.
It contains information about which fields are searchable and which are browsable, as well as what type of search must be applied.
The VDL Administrator must create one Generic Resource for each schema, named "MetadataSchemaInfo".
The body of this resource must be in the following form:
<schemaName>
  <option>
    <option-name>displayed name in search fields</option-name>
    <option-value>actual xml-element name in metadata or an XPath expression</option-value>
    <option-type>type of search to apply</option-type>
    <option-sort>XPath expression to be used for sorting (exists only for browsable fields)</option-sort>
  </option>
  ...
</schemaName>
The root element is the name of the corresponding schema. This node contains an arbitrary number of "option" elements.
Each option element contains the following elements:
- option-name: This is the displayed name in the search fields.
- option-type: It can be either fielded or xpath. If it is fielded, the fielded search operator must be used in search. If it is xpath, the filter-by-xpath search operator must be used instead.
- option-value: If the type is fielded, this must contain the name of the field (xml element). Otherwise, it must contain the XPath expression that identifies the field in the XML schema. (XPath is usually used when the corresponding field is not an element but an attribute in the metadata.)
- option-sort: This is an optional element. If it exists, the field is browsable (the user can browse the collection and receive the results sorted by this field), and it must contain the XPath expression that identifies the field.
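As a hypothetical example for a Dublin Core-like schema named dc, one fielded (and browsable) option and one xpath option might be defined as follows (element names and expressions are illustrative):
<dc>
  <option>
    <option-name>Title</option-name>
    <option-value>title</option-value>
    <option-type>fielded</option-type>
    <option-sort>//*[local-name()='title']</option-sort>
  </option>
  <option>
    <option-name>Identifier Scheme</option-name>
    <option-value>//identifier/@scheme</option-value>
    <option-type>xpath</option-type>
  </option>
</dc>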
TitleXSLT
This Generic Resource contains an XSLT that extracts the title field from every Result-Record, regardless of the schema or where the record came from (Google, full text, quick search, etc.).
The VDL Administrator must create a Generic Resource named "TitleXSLT" whose body must be an XSLT. Below you can find a template XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" method="html" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="//*[local-name()='title'][1]">
        <xsl:value-of select="//*[local-name()='title'][1]"/>
      </xsl:when>
      <xsl:when test="//*[local-name()='resTitle'][1]">
        <xsl:value-of select="//*[local-name()='resTitle'][1]"/>
      </xsl:when>
      <xsl:when test="//titleStmt/title[@type='main'][1]">
        <xsl:value-of select="//titleStmt/title[@type='main'][1]"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="/root/docFields/*[1]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
GenericXSLT
This Generic Resource contains an XSLT that transforms QuickSearch Result-Records into HTML records to be presented to the end user.
The VDL Administrator must create a Generic Resource named "GenericXSLT" whose body must be an XSLT. Below you can find a template XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" method="html" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <table width="100%">
      <xsl:for-each select="root/docFields/*">
        <tr>
          <td align="right" class="window-title-inactive" width="120">
            <b><xsl:value-of select="local-name()"/>:</b>
          </td>
          <td>
            <xsl:value-of select="substring(self::node(),1,100)"/>
            <xsl:if test="string-length(self::node()) > 100">
              <i>... (more)</i>
            </xsl:if>
          </td>
        </tr>
      </xsl:for-each>
      <tr>
        <td align="right" class="window-title-inactive" width="120">
          <b>Collection:</b>
        </td>
        <td>collection-name-here</td>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>
GoogleXSLT
This Generic Resource contains an XSLT that transforms GoogleSearch Result-Records into HTML records to be presented to the end user.
The VDL Administrator must create a Generic Resource named "GoogleXSLT" whose body must be an XSLT. Below you can find a template XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" method="html" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <a target="_blank">
      <xsl:attribute name="href"><xsl:value-of select="//*[local-name()='URL']"/></xsl:attribute>
      <i>
        <xsl:value-of select="//*[local-name()='title']"/>
      </i>
    </a>
    <br/>
    <xsl:value-of select="//*[local-name()='snippet']"/>
  </xsl:template>
</xsl:stylesheet>
PresentationXSLT_<schemaName>_<xsltName>
At least one such Generic Resource must exist for each schema of the VDL.
It contains an XSLT that transforms the Result-Record into an HTML record to be presented to the end user.
The VDL Administrator must create at least one Generic Resource for each schema, named "PresentationXSLT_<schemaName>_<xsltName>", whose body must be an XSLT; schemaName is the name of the corresponding schema and xsltName is a name for the XSLT.
Notice: There must be at least one XSLT per schema, and it must be named "default"; it is the one that will be used in the user profile as the selected XSLT for this schema when the profile is created.
Below you can see a template XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" method="html" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <xsl:if test="//*[local-name()='creator']">
      <xsl:value-of select="//*[local-name()='creator']"/>
      ,
    </xsl:if>
    <xsl:if test="//*[local-name()='title']">
      <i>
        <xsl:value-of select="//*[local-name()='title']"/>
      </i>
    </xsl:if>
    (
    <xsl:if test="//*[local-name()='date']">
      <xsl:value-of select="//*[local-name()='date']"/>
      ,
    </xsl:if>
    <xsl:if test="//*[local-name()='language']">
      <xsl:value-of select="//*[local-name()='language']"/>
      ,
    </xsl:if>
    collection-short-here)
  </xsl:template>
</xsl:stylesheet>
MetadataXSLT_<schemaName>_<xsltName>
At least one such Generic Resource must exist for each schema of the VDL.
It contains an XSLT that transforms the metadata record into HTML to be presented to the end user.
The VDL Administrator must create at least one Generic Resource for each schema, named "MetadataXSLT_<schemaName>_<xsltName>", whose body must be an XSLT; schemaName is the name of the corresponding schema and xsltName is a name for the XSLT.
Notice: There must be at least one XSLT per schema, and it must be named "default"; it is the one that will be used in the user profile as the selected XSLT for this schema when the profile is created.
Below you can see a template XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output encoding="UTF-8" indent="yes" method="html" version="1.0"/>
  <xsl:template match="/">
    <table border="1" style="border-collapse: collapse;" width="60%">
      <xsl:if test="//*[local-name()='title']">
        <tr><th align="left" class="diligent-header">Title</th></tr>
        <xsl:for-each select="//*[local-name()='title']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='creator']">
        <tr><th align="left" class="diligent-header">Creator</th></tr>
        <xsl:for-each select="//*[local-name()='creator']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='subject']">
        <tr><th align="left" class="diligent-header">Subject</th></tr>
        <xsl:for-each select="//*[local-name()='subject']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='description']">
        <tr><th align="left" class="diligent-header">Description</th></tr>
        <xsl:for-each select="//*[local-name()='description']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='publisher']">
        <tr><th align="left" class="diligent-header">Publisher</th></tr>
        <xsl:for-each select="//*[local-name()='publisher']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='contributor']">
        <tr><th align="left" class="diligent-header">Contributor</th></tr>
        <xsl:for-each select="//*[local-name()='contributor']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='date']">
        <tr><th align="left" class="diligent-header">Date</th></tr>
        <xsl:for-each select="//*[local-name()='date']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='type']">
        <tr><th align="left" class="diligent-header">Type</th></tr>
        <xsl:for-each select="//*[local-name()='type']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='format']">
        <tr><th align="left" class="diligent-header">Format</th></tr>
        <xsl:for-each select="//*[local-name()='format']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='identifier']">
        <tr><th align="left" class="diligent-header">Identifier</th></tr>
        <xsl:for-each select="//*[local-name()='identifier']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='source']">
        <tr><th align="left" class="diligent-header">Source</th></tr>
        <xsl:for-each select="//*[local-name()='source']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='language']">
        <tr><th align="left" class="diligent-header">Language</th></tr>
        <xsl:for-each select="//*[local-name()='language']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='relation']">
        <tr><th align="left" class="diligent-header">Relation</th></tr>
        <xsl:for-each select="//*[local-name()='relation']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='coverage']">
        <tr><th align="left" class="diligent-header">Coverage</th></tr>
        <xsl:for-each select="//*[local-name()='coverage']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
      <xsl:if test="//*[local-name()='rights']">
        <tr><th align="left" class="diligent-header">Rights</th></tr>
        <xsl:for-each select="//*[local-name()='rights']">
          <tr><td><xsl:value-of select="self::node()"/></td></tr>
        </xsl:for-each>
      </xsl:if>
    </table>
  </xsl:template>
</xsl:stylesheet>
VO and Users Management
Content & Storage Management
Content Management strictly relies on Storage Management. It is therefore a prerequisite to set up a running instance of Storage Management before Content Management can be successfully started. There are two ways to set up Storage Management: a simple one using Apache Derby as the database backend, and an advanced one where an existing database is used via JDBC.
Simple Setup of Storage Management using Apache Derby
Apache Derby is an open-source relational database implemented entirely in Java, available under the Apache License, Version 2.0, with a small footprint of about 2 megabytes. It is sufficient as a database backend for getting started with Storage Management. However, when large amounts of data are stored or more elaborate backup & recovery strategies are required, a traditional (full-scale) RDBMS might be a better choice.
If Storage Management is deployed dynamically or manually from the GAR, its default installation places a configuration file at $GLOBUS_LOCATION/etc/<Service-Gar-Filename>/StorageManager.properties that expects Derby to be available and to have permission to write files under ./StorageManagementService/db/storage_db. Derby is started in embedded mode, for which it does not even need a username or password. Multiple connections from the same Java Virtual Machine are possible and quite fast, but no two Java VMs can access the database at the same time.
If all dependencies have been installed correctly, the container should start and create a new database if needed.
The lines defining the JDBC connection to the database in the above-mentioned configuration file are:
DefaultRawFileContentManager=jdbc\:derby
DefaultRelationshipAndPropertyManager=jdbc\:derby
# derby settings (Default)
jdbc\:derby.class=org.diligentproject.contentmanagement.baselayer.rdbmsImpl.GenericJDBCDatabase
jdbc\:derby.params.count=4
jdbc\:derby.params.0=local_derby_storage_db
jdbc\:derby.params.1=org.apache.derby.jdbc.EmbeddedDriver
jdbc\:derby.params.2=jdbc\:derby\:./StorageManagementService/db/storage_db;create\=true
jdbc\:derby.params.3=5
By changing the line
jdbc\:derby.params.2=jdbc\:derby\:./StorageManagementService/db/storage_db;create\=true
you can choose another place to store the database: simply replace the path after derby\:.
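For example, to keep the database in a dedicated directory (the path below is purely illustrative), the line could become:
jdbc\:derby.params.2=jdbc\:derby\:/var/local/gcube/storage_db;create\=true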
In this setting, all relationships and properties as well as the raw file content are stored inside the Derby database. This is defined in the first two lines of the configuration snippet shown above.
Advanced Setup of Storage Management using an arbitrary relational JDBC database
Storage Management depends on the following external components:
- Apache Jakarta Commons Database Connection Pooling, which itself requires Commons Pool and therefore also Commons Collections
- a JDBC-driver for the database to use.
The first one should get dynamically deployed; the second one you will have to install yourself, since it depends on the RDBMS you want to use. The most common choice is MySQL, since it is already used by many of the gLite components such as DPM or LFC, so there is no need to set up another RDBMS. The corresponding JDBC driver is named Connector/J and is released under a dual-licensing strategy like the MySQL RDBMS itself: a commercial license and the GNU General Public License. For this reason, neither the RDBMS nor the JDBC driver is directly distributed with the gCube software. The JDBC driver must be available to the container, and therefore its .jar file(s) may need to be stored in $GLOBUS_LOCATION/lib/.
You will have to prepare the DBMS manually by creating a new database that will be used for Storage Management. For this, you may also want to install mysql-client, MySQL Administrator, and MySQL Query Browser - or a database-independent tool like ExecuteQuery. On Scientific Linux 3, the following steps need to be performed:
apt-get install mysql-server mysql-client
mysqladmin create <dbname>
mysql --user=root <dbname>
The first line installs the MySQL server (if not already present) and the corresponding command-line client. The next line creates a new, empty database. The last line connects to this database using the command-line client. If the RDBMS has been set up to require a password for the local root account, use the option -p to be prompted for the password. Once you are logged in, you have to create a new user with sufficient rights to connect, create and alter tables, and perform all kinds of selects, inserts, updates, and deletes in this database.
The easiest way to achieve this in MySQL is:
GRANT ALL PRIVILEGES ON <dbname>.* TO '<username>'@'%' IDENTIFIED BY '<password>';
(MySQL has its very own syntax instead of CREATE USER here until version 5.0 - see [1] for more details.)
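For instance, with a hypothetical database storage_db and user storage_user, the complete sequence inside the MySQL client could look like this:
GRANT ALL PRIVILEGES ON storage_db.* TO 'storage_user'@'%' IDENTIFIED BY 'mySecretPass';
FLUSH PRIVILEGES;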
If you use a MySQL version < 5, individual database files are by default limited to 4GB, or even 2GB on some filesystems. This might become a problem if you either store very many files or just a couple of huge files, and MySQL might start to complain "Table is full". In this case, execute the SQL command
ALTER TABLE Raw_Object_Content MAX_ROWS=1000000000 AVG_ROW_LENGTH=1000;
to allocate pointers for bigger tables. See [2] for details.
Due to some inconvenience in the MySQL protocol for transferring BLOBs of several megabytes, you might have to increase the max_allowed_packet variable in the my.cnf. On Scientific Linux this file is located in /var/lib/mysql/ - see [3] for details.
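A minimal sketch of such a change in the [mysqld] section of my.cnf, assuming a limit of 32 megabytes is sufficient for the BLOBs to be stored, would be:
[mysqld]
max_allowed_packet=32M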
For using MySQL, you can use the following lines in the above-mentioned configuration file:
# local mysql settings (template for MySQL instances)
jdbc\:mysql_local.class=org.diligentproject.contentmanagement.baselayer.rdbmsImpl.GenericJDBCDatabase
jdbc\:mysql_local.params.count=4
jdbc\:mysql_local.params.0=local_mysql_db
jdbc\:mysql_local.params.1=com.mysql.jdbc.Driver
jdbc\:mysql_local.params.2=jdbc\:mysql\://127.0.0.1/storage_db?user\=THE_USER&password\=THE_PASS
jdbc\:mysql_local.params.3=100
You will have to change the line
jdbc\:mysql_local.params.2=jdbc\:mysql\://127.0.0.1/storage_db?user\=THE_USER&password\=THE_PASS
in order to use the correct IP address of your server, the database name, the username, and the password. This is nothing else than a regular JDBC connection string (plus \ in front of each : to escape it in the Java property file), so if you are familiar with that, it should be quite simple to use; otherwise there is plenty of documentation on how to make sense of it, e.g. [4].
In addition, you have to set the Storage Manager to use this database by default. To do so, simply edit the lines at the top of the file to:
DefaultRawFileContentManager=jdbc\:mysql_local
DefaultRelationshipAndPropertyManager=jdbc\:mysql_local
As long as you rename consistently when copying and pasting, there is no need to stick to the name mysql_local.
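As a purely illustrative example, assuming a MySQL server at 192.0.2.10 with database storage_db, user storage_user, and password mySecretPass, the connection line would become:
jdbc\:mysql_local.params.2=jdbc\:mysql\://192.0.2.10/storage_db?user\=storage_user&password\=mySecretPass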
Other databases, like PostgreSQL in a version > 8, should also work, depending on their compliance with the ANSI SQL92 standard and their handling of BLOBs. However, this has not been extensively tested yet.
Advanced Setup of Storage Management for protocol handlers
Storage Management is able to use a couple of other protocols to retrieve and store files. The default configuration contains the following entries:
# handlers
protocol.handler.count=4
protocol.handler.0.class=org.diligentproject.contentmanagement.baselayer.inMessageImpl.InMemoryContentManager
protocol.handler.1.class=org.diligentproject.contentmanagement.baselayer.networkFileTransfer.FTPPseudoContentManager
protocol.handler.2.class=org.diligentproject.contentmanagement.baselayer.networkFileTransfer.HTTPPseudoContentManager
## Alternative for HTTPPseudoContentManager: Commons HTTPClient, requires additional libraries
# protocol.handler.2.class=org.diligentproject.contentmanagement.baselayer.networkFileTransfer.CommonsHTTPClientPseudeContentManager
protocol.handler.3.class=org.diligentproject.contentmanagement.baselayer.networkFileTransfer.GridFTPContentManager
## WARNING: do not run LocalFilesystemStorage handler on productive service, unless it is really, really secure!
# protocol.handler.4.class=org.diligentproject.contentmanagement.baselayer.filesystemImpl.LocalFilesystemStorage
The handlers are used in the order in which they are defined in the configuration file. If several handlers claim to handle the same protocol, only the one with the lowest number is used. The count must be in line with the defined handlers, which are numbered from 0 to count-1.
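For example, to additionally enable the LocalFilesystemStorage handler that is commented out in the template (see the warning above), its line would be uncommented and the count raised to match the handlers numbered 0 to 4:
protocol.handler.count=5
protocol.handler.4.class=org.diligentproject.contentmanagement.baselayer.filesystemImpl.LocalFilesystemStorage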
The inmessage:// protocol is convenient since it can transfer raw content directly inside the SOAP message and therefore does not require an additional communication protocol, which would require another handshake between client and server and - in secure environments - separate authentication & authorization and possibly a completely separate user management. Unfortunately, this protocol does not work for files bigger than approximately 2 megabytes due to limitations of the container. There is no simple modification that would enable GT4 and its underlying Axis to cope with big base64-encoded message parts. Big files have to be transferred using other protocols, like FTP, HTTP, or GridFTP. The only workaround for downloads from SMS is to use chunked downloads. On the other hand, the aforementioned well-established protocols may also provide much better performance for big files, where the handshaking takes comparatively little time.
The next two lines set up handlers for downloading from FTP and HTTP locations using the built-in clients of the Java Class Library. For better performance, and also to deal with some security issues in Sun's implementation [5], Apache Jakarta Commons HTTPClient can be used instead. For this, simply comment out
protocol.handler.2.class=org.diligentproject.contentmanagement.baselayer.networkFileTransfer.HTTPPseudoContentManager
and uncomment
protocol.handler.2.class=org.diligentproject.contentmanagement.baselayer.networkFileTransfer.CommonsHTTPClientPseudeContentManager
This requires that the correct .jar file from the above-mentioned location (or from the Service Archive) is also installed in $GLOBUS_LOCATION/lib/, together with its dependency Apache Jakarta Commons Codec.
Advanced Setup of Storage Management for storing raw content outside a database
A template for using the file system instead of the RDBMS is presented in the configuration file:
# filesystem settings (another template)
file\:fileStorage.class=org.diligentproject.contentmanagement.baselayer.filesystemImpl.LocalFilesystemStorage
file\:fileStorage.params.count=1
file\:fileStorage.params.0=/usr/local/globus/etc/StorageManagementService/Stored_Content
To make this the default location for storing content, you have to set in the first lines of the configuration file:
DefaultRawFileContentManager=file\:fileStorage
Another option would be to use GridFTP here to store the files in a Storage Element on the Grid.
Setup of Content & Collection Management
Content & Collection Management entirely rely on Storage Management and interact heavily with it. It is therefore a good choice to deploy them on the same node that is hosting Storage Management, to avoid network communication becoming the performance bottleneck.
The only parameter that might need adjustment can be found in both configuration files at $GLOBUS_LOCATION/etc/<CMS-GAR-Filename>/ContentManager.properties and $GLOBUS_LOCATION/etc/<ColMS-GAR-Filename>/CollectionManager.properties, respectively.
StorageManagementService=http\://127.0.0.1\:8080/wsrf/services/diligentproject/contentmanagement/StorageManagementServiceService
This line must point to the EPR of the Storage Management Service that should be used. If the GT4 container is running on its default port 8080 and all three services are deployed on the same node, there should be no need to adjust this. Otherwise the port might need to be corrected (in both configuration files).
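For instance, if the Storage Management container listens on port 9090 instead (the port here is only an example), the line would become:
StorageManagementService=http\://127.0.0.1\:9090/wsrf/services/diligentproject/contentmanagement/StorageManagementServiceService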
Metadata Management
Metadata Management aims at modelling arbitrary metadata relationships (IDB relationships). The only assumption it makes is that the metadata objects are serialized as well-formed XML documents. The service has a two-fold role:
- to manage Metadata Objects and Metadata Collections
- to establish secondary role-typed links. Such relationships can be in place between any types of Information Objects, within the scope of a Collection or not
The Metadata Management Components
The main functionality of the Metadata Management components is the management of Metadata Objects, Metadata Collections, and their relationships. To operate over Metadata Collections, Metadata Management instantiates a Collection Manager for each collection. A Collection Manager is the access point to all the possible operations over a specific Metadata Collection. From an architectural point of view, the Metadata Manager adopts the Factory pattern, and Collection Managers are implemented as GCUBEWSResources. Physically, the service is composed of:
- the MetadataManagerFactory, a factory service that creates new Collection Managers and offers some cross-Collection operations
- the MetadataManagerService, a service that operates over Metadata Collections (MCs) and on Metadata Objects as Elements, i.e. members of a specific Metadata Collection
The MetadataManagerFactory
The MetadataManagerFactory service creates new Collection Managers and offers some cross-Collection operations. Moreover, it operates on Metadata Objects as Information Objects related to other Information Objects, not as members of Metadata Collections.
- createManager(CollectionID, params): This operation takes a Collection ID and a set of creation parameters and creates a new Manager in order to manage a Metadata Collection bound to such a Collection. If a Metadata Collection with the specified Metadata characteristics does not exist, the Manager creates the Metadata Collection, binds it with the Document Collection with the given secondary role relationship and publishes its profile in the Information System.
The creation parameters are a set of key-value pairs; the following keys are defined in the MMLibrary (a hypothetical filled-in set is sketched after this list of operations). The mandatory parameters accepted by the operation are:
- COLLECTIONNAME -> name of the collection
- DESCRIPTION -> description
- ISUSERCOLLECTION -> whether the collection is a user collection ("True"/"False")
- ISINDEXABLE -> whether the collection is indexable ("True"/"False")
- RELATEDCOLLECTION -> the related Document Collection
- METADATAFORMAT -> the metadata name and the metadata language, as specified in ISO 639-2
- SECONDARYROLE -> the secondary role
The optional parameters accepted by the operation are:
- GENERATEDBY -> the source Metadata Collection from which the current one has been generated (by the Metadata Broker), if any
- ISEDITABLE -> whether the collection is editable ("True"/"False")
- CREATOR -> the name of the creator of the Metadata Collection
- createManagerFromCollection(MetadataCollectionID): This operation takes a Metadata Collection ID. It returns:
- the related CollectionManager, if it already exists
- a newly created CollectionManager, if the Metadata Collection exists but no manager does yet
- an error, if the Collection ID is not valid
- addMetadata(ObjectID, MO, SecondaryRole): This operation takes a new non-collectable Metadata Object and
- completes the metadata header information (e.g. the MOID, if it is not specified)
- stores (or updates if the MOID is already included in the MO header) the object on the Storage Management Service as Information Object
- creates a <is-described-by, <SecondaryRole>> binding in the Storage Management Service between the Metadata Object and the Information Object identified by the given Object ID
- returns the assigned MOID
- deleteMetadata(MOID): This operation deletes from the Storage Management Service the Metadata Object identified by the given ID.
- getMetadata((ObjectID, SecondaryRole, CollectionID, Rank)[]): For each given ObjectID, this operation returns the Metadata Objects that are:
- bound with the specified secondary role (the primary role is, of course, is-described-by) to the Information Object identified by that ObjectID
- members of the specified Metadata Collection. The operation relies on the String[] retrieveReferred(String targetObjectID, String role, String secondaryrole) operation of the Storage Management Service.
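As mentioned above, here is a purely hypothetical set of creation parameters for createManager; the key names are those defined in the MMLibrary, while all values are invented placeholders:
COLLECTIONNAME = Historical Maps
DESCRIPTION = Scanned historical maps with descriptive metadata
ISUSERCOLLECTION = False
ISINDEXABLE = True
RELATEDCOLLECTION = <ID of the related Document Collection>
METADATAFORMAT = <metadata name and ISO 639-2 language>
SECONDARYROLE = <secondary role name>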
Index Management
Search Management
Each of the Search Framework Services, once deployed along with its dependencies, is designed to be autonomous and needs no user parametrization or supervision. Two issues that may come up and should be mentioned are the following:
- The user under which the services run must have write permissions to the /tmp directory.
- The execution of the plan produced by any Search Master Service is determined by the presence of a lock file ($GLOBUS_LOCATION/etc/SearchMaster/BPEL.lock). If this file exists, the plan is forwarded to the Process Execution Service. Otherwise, the plan is executed internally by an embedded execution engine component (which does not support secure calls).
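For example, forwarding to the Process Execution Service can be switched on by simply creating the lock file, and switched off again by deleting it:
touch $GLOBUS_LOCATION/etc/SearchMaster/BPEL.lock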
Feature Extraction
Currently, feature extraction reuses existing feature extractors that were developed and used in the ISIS/OSIRIS prototype system. These have been implemented in C++ using many libraries that are not easily portable to any platform other than Windows, on which the ISIS system is running. The Feature Extraction Service wraps a demo installation hosted at UNIBAS, and the configuration of the service contains the URL of this installation. Since this ISIS service is not a DILIGENT service, it cannot be dynamically retrieved from the DIS. The other configuration parameter is the EPR of Content Management; this is configured in the service for debugging and performance reasons, since it allows assigning the Feature Extraction Service to the closest CMS instance to reduce network traffic. In subsequent releases, the default is expected to change to dynamic retrieval of the CMS to contact, with optional configuration only for using dedicated instances. The configuration file can be found at $GLOBUS_LOCATION/etc/<FE-GAR-Filename>/FeatureExtraction.properties
# This file configures the feature extraction service.
contentmanagement.contentmanagementservice.epr=http\://dil03.cs.unibas.ch\:8080/wsrf/services/diligentproject/contentmanagement/ContentManagementServiceService
isis.fee.endpoint=http\://isisdemo.cs.unibas.ch\:9700/ARTE/FEE/ExtractFeature
Process Management
Most Process Management Services do not require any manual configuration after deployment, with the exception of the GLite Job Wrapper Service. The required configuration steps are outlined below.
GLite Job Wrapper Service configuration
There are two settings that must be defined in the JNDI configuration file of the service (usually $GLOBUS_LOCATION/etc/org_diligentproject_glite_jobwrapper/jndi-config.xml). These settings are used to specify the WMProxy endpoint to use for job submissions, and the user certificate for running the jobs. The format in the JNDI configuration file is as follows:
<environment name="proxyCredentialsFile" type="java.lang.String" value="/tmp/x509up_u1000"/>
<environment name="WMProxyURL" type="java.lang.String" value="https://dil01.cs.unibas.ch:7443/glite_wms_wmproxy_server"/>
The rest of the configuration should not be modified.
The proxyCredentialsFile is a VOMS proxy file on the local file system. The administrator of the node is responsible for making sure that this proxy certificate is valid (i.e. not expired) at all times, and that it is a certificate accepted by the WMProxy server pointed to by the WMProxyURL.
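A sketch of how such a proxy could be created (the VO name is a placeholder, and the exact option names may vary with the installed VOMS client version):
voms-proxy-init -voms <vo-name> -out /tmp/x509up_u1000 -valid 24:00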