Difference between revisions of "Digital Library Administration"

From Gcube Wiki

Revision as of 18:33, 23 July 2007

VDL Creation and Management

Resources Management

VO and Users Management

Content & Storage Management

Content Management relies strictly on Storage Management, so a running instance of Storage Management must be set up before Content Management can be started successfully. There are two ways to set up Storage Management: a simple one using Apache Derby as the database backend, and an advanced one in which an existing database is used via JDBC.

Simple Setup of Storage Management using Apache Derby

Apache Derby is an open source relational database implemented entirely in Java and available under the Apache License, Version 2.0, with a small footprint of about 2 megabytes. It is sufficient as a database backend for getting started with Storage Management. However, when large amounts of data are stored or more elaborate backup and recovery strategies are required, a traditional, full-scale RDBMS might be a better choice.

If Storage Management is deployed dynamically or manually from the GAR, its default installation places a configuration file in $GLOBUS_LOCATION/etc/<Service-Gar-Filename>/StorageManager.properties. This file expects Derby to be available and to have permission to write files under ./StorageManagementService/db/storage_db. Derby is started in embedded mode, for which it does not need a username or password. Multiple connections from the same Java Virtual Machine are possible and quite fast, but no two Java VMs can access the database at the same time.

If all dependencies have been installed correctly, the container should start and create a new database if needed.

The lines defining the JDBC connection to the database in the above-mentioned configuration file are:

DefaultRawFileContentManager=jdbc\:derby
DefaultRelationshipAndPropertyManager=jdbc\:derby
# derby settings (Default)
jdbc\:derby.class=org.diligentproject.contentmanagement.baselayer.rdbmsImpl.GenericJDBCDatabase
jdbc\:derby.params.count=4
jdbc\:derby.params.0=local_derby_storage_db
jdbc\:derby.params.1=org.apache.derby.jdbc.EmbeddedDriver
jdbc\:derby.params.2=jdbc\:derby\:./StorageManagementService/db/storage_db;create\=true
jdbc\:derby.params.3=5
By changing the path after derby\: in the line
jdbc\:derby.params.2=jdbc\:derby\:./StorageManagementService/db/storage_db;create\=true
you can choose another place to store the database.

In this setting, all relationships and properties as well as the raw file content are stored inside the Derby database. This is defined in the first two lines of the configuration snippet shown above.
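To make the escaping in the snippet easier to follow, here is a minimal sketch that parses the same lines with the standard java.util.Properties class and prints the unescaped values. The class name StorageConfigReader and its helper methods are hypothetical and not part of Storage Management; the sketch only illustrates how a \: or \= in the file turns into a plain : or = once the file is loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the Storage Management code base.
public class StorageConfigReader {

    // The Derby snippet from above. Inside a Java string literal every
    // backslash of the properties file has to be doubled.
    static final String SNIPPET = String.join("\n",
        "DefaultRawFileContentManager=jdbc\\:derby",
        "DefaultRelationshipAndPropertyManager=jdbc\\:derby",
        "jdbc\\:derby.class=org.diligentproject.contentmanagement.baselayer.rdbmsImpl.GenericJDBCDatabase",
        "jdbc\\:derby.params.count=4",
        "jdbc\\:derby.params.0=local_derby_storage_db",
        "jdbc\\:derby.params.1=org.apache.derby.jdbc.EmbeddedDriver",
        "jdbc\\:derby.params.2=jdbc\\:derby\\:./StorageManagementService/db/storage_db;create\\=true",
        "jdbc\\:derby.params.3=5");

    // Parses properties-file text; the parser removes the backslash escapes.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collects <prefix>.params.0 .. <prefix>.params.(count-1) in order.
    static List<String> params(Properties props, String prefix) {
        int count = Integer.parseInt(props.getProperty(prefix + ".params.count"));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(props.getProperty(prefix + ".params." + i));
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = load(SNIPPET);
        // The default entry names the prefix of the backend definition.
        String backend = props.getProperty("DefaultRawFileContentManager");
        System.out.println("backend prefix: " + backend);
        for (String p : params(props, backend)) {
            System.out.println("  " + p);
        }
    }
}
```

After loading, the third parameter is the regular JDBC URL jdbc:derby:./StorageManagementService/db/storage_db;create=true, without any backslashes.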

Advanced Setup of Storage Management using an arbitrary relational JDBC-Database

Storage Management depends on the following external components:

  1. Apache Commons Database Connection Pooling, which itself requires Commons Pool and therefore also Commons Collections
  2. a JDBC-driver for the database to use.

The first should get deployed dynamically; the second you will have to install yourself, since it depends only on the RDBMS you want to use. The most common choice is MySQL, since it is also used by many of the gLite components such as DPM or LFC, so there is no need to set up another RDBMS. The corresponding JDBC driver is named Connector/J and is released under a dual-licensing strategy like the MySQL RDBMS itself: a commercial license and the GNU General Public License. For this reason, neither the RDBMS nor the JDBC driver is distributed directly with the gCube software.

You will have to prepare the DBMS manually by creating a new database for Storage Management. For this, you may also want to install mysql-client, MySQL Administrator, and MySQL Query Browser, or a database-independent tool like ExecuteQuery. On Scientific Linux 3, the following steps need to be performed:

apt-get install mysql-server mysql-client
mysqladmin create <dbname>
mysql --user=root <dbname>

The first line installs the MySQL server (if not already present) and the corresponding command-line client. The second line creates a new, empty database. The last line connects to this database using the command-line client. If the RDBMS has been set up to require a password for the local root account, use the option -p to be prompted for the password. Once you are logged in, you have to create a new user with sufficient rights in this database to connect, to create and alter tables, and to perform all kinds of selects, inserts, updates and deletes on them.

The easiest way to achieve this in MySQL is
GRANT ALL PRIVILEGES ON <dbname>.* TO '<username>'@'%' IDENTIFIED BY '<password>';
(Until version 5.0, MySQL uses this syntax of its own instead of CREATE USER; see http://dev.mysql.com/doc/refman/4.1/en/adding-users.html for more details.)

MySQL versions < 5 limit the size of individual database files by default to 4GB, or even 2GB on some filesystems. This might become a problem if you store either a very large number of files or just a few huge ones, and MySQL might start to complain "Table is full". In this case, execute the SQL command

ALTER TABLE Raw_Object_Content MAX_ROWS=1000000000 AVG_ROW_LENGTH=1000;

to allocate pointers for bigger tables. See http://dev.mysql.com/doc/refman/5.0/en/full-table.html for details.
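The arithmetic behind these two values can be sketched as follows. The sketch assumes that the table engine addresses rows by byte offset using a whole number of pointer bytes and derives the pointer width from the product MAX_ROWS * AVG_ROW_LENGTH; this is a simplification for illustration, and the class TableSizeHint is hypothetical.

```java
// Hypothetical illustration of the size hint; not MySQL code.
public class TableSizeHint {

    // Table size in bytes that the two ALTER TABLE options announce.
    static long hintedBytes(long maxRows, long avgRowLength) {
        return Math.multiplyExact(maxRows, avgRowLength);
    }

    // Smallest number of whole bytes whose address space (2^(8*n) bytes)
    // covers the given table size. Capped at 8 bytes.
    static int pointerBytes(long tableBytes) {
        int bytes = 1;
        while (bytes < 8 && (1L << (8 * bytes)) < tableBytes) {
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        long hinted = hintedBytes(1_000_000_000L, 1_000L);
        System.out.println("hinted size in bytes: " + hinted);
        System.out.println("pointer bytes needed: " + pointerBytes(hinted));
        // A 4-byte pointer addresses 2^32 bytes, i.e. the 4GB limit.
        System.out.println("4-byte pointer range: " + (1L << 32));
    }
}
```

Under these assumptions, the values from the command above announce a table of 10^12 bytes, which no longer fits into the 2^32 bytes that a 4-byte pointer can address.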

Due to a limitation in the MySQL protocol when transferring BLOBs of several megabytes, you might have to increase the MAX_ALLOWED_PACKET variable in my.cnf. On Scientific Linux this file is located in /var/lib/mysql/; see http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html for details.

To use MySQL, you can add the following lines to the above-mentioned configuration file:

# local mysql settings (template for MySQL instances)
jdbc\:mysql_local.class=org.diligentproject.contentmanagement.baselayer.rdbmsImpl.GenericJDBCDatabase
jdbc\:mysql_local.params.count=4
jdbc\:mysql_local.params.0=local_mysql_db
jdbc\:mysql_local.params.1=com.mysql.jdbc.Driver
jdbc\:mysql_local.params.2=jdbc\:mysql\://127.0.0.1/storage_db?user\=THE_USER&password\=THE_PASS
jdbc\:mysql_local.params.3=100
You will have to change the line
jdbc\:mysql_local.params.2=jdbc\:mysql\://127.0.0.1/storage_db?user\=THE_USER&password\=THE_PASS
so that it contains the correct IP address of your server, the database name, the username and the password. This is nothing but a regular JDBC connection string, with a \ in front of each : and = to escape them in the Java properties file. If you are familiar with JDBC, it should be quite simple to use; otherwise there is plenty of documentation on how to construct it, e.g. http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html.

In addition, you have to make the Storage Manager use this database by default. To do so, edit the lines at the top of the file to read:
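As an illustration of the escaping described above, the following sketch builds a connection string from its parts and escapes it in the way the template line does. The class JdbcUrlEscaper and its methods are hypothetical helpers, and the sketch assumes that the username and password contain no characters that would need URL encoding (such as & or ?).

```java
// Hypothetical helper for preparing the params.2 value; not part of gCube.
public class JdbcUrlEscaper {

    // Plain JDBC URL in the form used by the template above.
    static String mysqlUrl(String host, String database, String user, String password) {
        return "jdbc:mysql://" + host + "/" + database
                + "?user=" + user + "&password=" + password;
    }

    // Puts a backslash in front of ':', '=' and '\' so that the value can be
    // pasted into a Java properties file.
    static String escapeForProperties(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == ':' || c == '=') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = mysqlUrl("127.0.0.1", "storage_db", "THE_USER", "THE_PASS");
        // Prints the complete line for the configuration file.
        System.out.println("jdbc\\:mysql_local.params.2=" + escapeForProperties(url));
    }
}
```

Called with the placeholder values of the template, the sketch reproduces the template line; replace the four arguments with the values of your own server.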

DefaultRawFileContentManager=jdbc\:mysql_local
DefaultRelationshipAndPropertyManager=jdbc\:mysql_local

If you rename consistently (for example by copy & paste), there is no need to stick to the name mysql_local.
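A renaming is only consistent if the value of each Default... entry matches the prefix of a complete backend definition. The following sketch checks this for an already loaded configuration; the class BackendNameCheck and the name jdbc:my_storage are hypothetical and serve only as an example of a half-finished rename.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical consistency check; not part of Storage Management.
public class BackendNameCheck {

    // Returns the keys that are expected for the backend named by
    // defaultKey but are absent from the configuration.
    static List<String> missingKeys(Properties props, String defaultKey) {
        List<String> missing = new ArrayList<>();
        String prefix = props.getProperty(defaultKey);
        if (prefix == null) {
            missing.add(defaultKey);
            return missing;
        }
        if (props.getProperty(prefix + ".class") == null) {
            missing.add(prefix + ".class");
        }
        String count = props.getProperty(prefix + ".params.count");
        if (count == null) {
            missing.add(prefix + ".params.count");
            return missing;
        }
        for (int i = 0; i < Integer.parseInt(count.trim()); i++) {
            String key = prefix + ".params." + i;
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Keys and values as they look after loading, i.e. without backslashes.
        Properties props = new Properties();
        props.setProperty("DefaultRawFileContentManager", "jdbc:my_storage");
        props.setProperty("jdbc:mysql_local.class", "SomeDatabaseClass");
        props.setProperty("jdbc:mysql_local.params.count", "1");
        props.setProperty("jdbc:mysql_local.params.0", "local_mysql_db");
        // The default was renamed, the definition was not: keys are reported.
        System.out.println(missingKeys(props, "DefaultRawFileContentManager"));
    }
}
```

An empty list means that the default entry points to a definition with a class and all announced parameters.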

Metadata Management

Index Management

Search Management

Process Management