Storage Management

Revision as of 17:21, 4 May 2010 by Federico.defaveri (Talk | contribs) (Reference Model)


gCube is strongly data-oriented. Unlike other grid environments, gCube manages files and other resources with fine-grained control not only over the files themselves, but also over the metadata that describes them and over their relationships. Resources available in the infrastructure are described using a custom data model, the so-called Information-Object model. Even at the lowest level of interaction, this model makes it possible to store not only raw content files but also a wealth of additional meta-information, such as properties and inter-relationships. More abstract and richer data models can be built on top of this simple but flexible model.

The Storage Management Service is a fundamental component of the gCube architecture. Its role is to take care of data storage and to expose, through its interface, an abstraction of storage based on the info-object model. Building on this basic data model, other services in the Information Organization family offer gCube services more sophisticated data models for managing complex documents, document collections, metadata and annotations.

Reference Model

The elementary constructs of the Information Object Model are information-objects and object references. They can be visualized respectively as nodes and arcs in a graph. An ER model clarifying how these constructs fit together is shown in Figure 1.

Figure 1. The Info Object Model

An information object (IO) represents an elementary unit of information. It is uniquely identified by an Object Identifier (OID), is labelled with a name and a type, and is optionally annotated with a number of properties. These properties are simple key-type-value associations. Finally, it can be associated with raw content, which may be content of any kind. The model hides how the content of an object is actually stored: it can, for instance, be stored as a file in gLite or as a BLOB field in a database, or be kept in storage facilities not under the direct control of the Information Organization Services, e.g. as a file stored on a remote server and accessible through a protocol such as http, ftp or gridftp.

An object reference "links" two information objects. Each object may (i) reference many other objects and (ii) be referenced by many objects (an m-n relationship). A reference is directed. It is labelled with a type attribute, called the primary role; a secondary role, which may optionally further specify the function of the primary role; and a position attribute, which makes it possible to build ordered graph structures. It can also be associated with a number of other properties.

The generality of this simple information model makes it possible to build complex data structures. Services within the Information Organization stack build on top of this model to offer a more structured and specific view of data. In particular, they can specialize the semantics attached to the labels used to annotate information objects and references, and thus create new connections and properties used to construct custom data structures.
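The constructs described above can be illustrated with a minimal in-memory sketch. It is not the gCube implementation or its API: the class names, field names and the `targets` helper are hypothetical, chosen only to mirror the model (objects as nodes, directed references as arcs, a position attribute for ordering).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
import uuid

# A property is a key-type-value association: key -> (type, value).
Properties = Dict[str, Tuple[str, object]]


@dataclass
class InfoObject:
    """A node of the graph: OID, name, type, properties, optional raw content."""
    name: str
    type: str
    oid: str = field(default_factory=lambda: uuid.uuid4().hex)
    properties: Properties = field(default_factory=dict)
    raw_content: Optional[bytes] = None  # where it really lives is hidden by the model


@dataclass
class Reference:
    """A directed arc from a source object to a target object."""
    source_oid: str
    target_oid: str
    primary_role: str
    secondary_role: Optional[str] = None
    position: int = 0
    properties: Properties = field(default_factory=dict)


class InfoObjectGraph:
    """Hypothetical container holding objects and the references among them."""

    def __init__(self) -> None:
        self.objects: Dict[str, InfoObject] = {}
        self.references: List[Reference] = []

    def add(self, obj: InfoObject) -> InfoObject:
        self.objects[obj.oid] = obj
        return obj

    def link(self, source: InfoObject, target: InfoObject, primary_role: str,
             secondary_role: Optional[str] = None, position: int = 0) -> Reference:
        for obj in (source, target):
            if obj.oid not in self.objects:
                raise KeyError(f"unknown object: {obj.oid}")
        ref = Reference(source.oid, target.oid, primary_role, secondary_role, position)
        self.references.append(ref)
        return ref

    def targets(self, source: InfoObject, primary_role: str) -> List[InfoObject]:
        """Objects referenced by `source` in the given role, ordered by position."""
        refs = [r for r in self.references
                if r.source_oid == source.oid and r.primary_role == primary_role]
        refs.sort(key=lambda r: r.position)
        return [self.objects[r.target_oid] for r in refs]
```

A higher-level service could, for example, model a compound document by linking a document object to its parts with a role such as "has-part" (an invented role name) and reading them back in position order with `targets`.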

Detailed Service Description

The Storage Management Service provides the operations defined on Information Objects by the Information-Object model introduced above. This includes assigning storage properties, setting up inter-object relationships, and connecting an information object with raw content (i.e. a file). Part of this information is maintained internally by the Storage Management Service in a relational DBMS, whose logical schema is depicted in Figure 2. The raw content of information objects can be stored internally, using a file-system-based mechanism, or stored in a distributed way on the grid (for example, it may reside at an ftp site).

Figure 2. ER Schema Used to Instantiate the Information Object Model

This schema models the essential features of the information-object model introduced in the previous section. An Information Object is characterized by an OID, comprises a number of storage properties, can be seen as an abstraction over different types of content, and can reference other Information Objects via relationships. Note that the type and the name of the Information Object are not modelled explicitly in this schema. These attributes are managed as if they were ordinary storage properties, and are constrained (when necessary) only at the application level. The other properties attached to an Information Object (and their intended semantics) are in general not defined in advance. However, the Storage Management Service itself attaches semantics to a number of properties, which are used to handle or optimize the storage of information objects and the access to them. These are:

  • Owner identifies a gCube user who owns that Information Object. Typically, the user who has created an Information Object becomes the owner.
  • Permission is a plain access/update/removal directive for non-owning users. It states whether other users may access (read), update or remove this Information Object. It has been introduced to support a plain security mechanism.
  • URL is an access pattern for external documents that physically reside in archives and are only reflected by a placeholder Information Object in storage management. It is essential for archive import, in particular when those archives host content that is not imported into gCube storage but remains in the archive.
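As a rough illustration of how these reserved properties might be consulted, the sketch below treats an Information Object as a plain dictionary of storage properties. The encoding is an assumption made for the example: the property keys, the representation of Permission as a set of allowed actions, and the rule that the owner is always allowed are not specified in the description above.

```python
from typing import Dict

ACTIONS = ("read", "update", "remove")


def is_allowed(properties: Dict[str, object], user: str, action: str) -> bool:
    """Decide whether `user` may perform `action` on an object.

    Assumptions (hypothetical encoding):
    - "owner" holds the owning user's name; the owner is never restricted.
    - "permission" holds the set of actions open to non-owning users.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if properties.get("owner") == user:
        return True
    return action in properties.get("permission", frozenset())


def is_placeholder(properties: Dict[str, object]) -> bool:
    """An object carrying a "url" property only mirrors external archive content."""
    return "url" in properties


# Name and type are kept as ordinary storage properties, as in the schema.
report = {
    "name": "report",
    "type": "document",
    "owner": "alice",
    "permission": frozenset({"read"}),
}
```

With this encoding, `is_allowed(report, "bob", "read")` holds while `is_allowed(report, "bob", "remove")` does not, and `alice`, as owner, may perform any of the three actions.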

It is also possible to attach general-purpose properties to relationships, in the form of type and role. The Storage Management Service also supports attaching a delete-propagation rule to a relationship; the rule is consulted upon removal of the referring object to determine whether the referred object should be deleted automatically. This facility provides efficient support for integrity constraints in models built on top of the info-object model that use references to represent complex information. Several propagation rules have been defined. As mentioned above, all references are directed. Hence it is necessary to specify not only that deletion cascades, but also the direction in which it cascades. The following rules are defined:

  • delete-target-propagate – indicates that whenever the source object is deleted, the target object of the reference (and the reference itself) will also be deleted, and the deletion cascades. Similar behaviour is triggered if only the reference, rather than the source object, is removed. It should be used carefully;
  • delete-target-if-single-appearance – indicates that whenever the source object is deleted and the target object does not appear in the same role in another reference, then this target object is deleted, thus minimizing the danger of accidental object deletions;
  • delete-source-propagate – indicates that whenever the target object is deleted, the source object of the reference (and the reference itself) will also be deleted, and the deletion cascades. Similar behaviour is triggered if only the reference, rather than the target object, is removed. It should be used with extreme caution;
  • delete-source-if-single-appearance – indicates that whenever the target object is deleted, the source object is deleted only if it does not appear in the same role in another reference. For instance, if a document is a member of several collections, it will not be deleted until the last collection that it references has been deleted.
  • no-delete-propagate – indicates that no additional deletion is triggered when this reference is deleted.
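The effect of these rules on object deletion can be sketched with a small in-memory store. This is an illustration under stated assumptions rather than the service's actual algorithm: the `Store` and `Ref` classes are hypothetical, "appearing in the same role" is read as "being the target (or source) of another remaining reference with the same role", and the case where only a reference is removed is not covered.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Ref:
    source: str
    target: str
    role: str
    rule: str = "no-delete-propagate"


class Store:
    """Hypothetical store: a set of OIDs plus directed, rule-carrying references."""

    def __init__(self) -> None:
        self.objects: Set[str] = set()
        self.refs: List[Ref] = []

    def add(self, *oids: str) -> None:
        self.objects.update(oids)

    def link(self, source: str, target: str, role: str,
             rule: str = "no-delete-propagate") -> None:
        self.refs.append(Ref(source, target, role, rule))

    def _appears(self, oid: str, role: str, as_target: bool) -> bool:
        """True if `oid` still takes part in another reference with this role."""
        return any(r.role == role and (r.target if as_target else r.source) == oid
                   for r in self.refs)

    def delete(self, oid: str) -> None:
        if oid not in self.objects:
            return  # already gone; this also stops cascades on cycles
        self.objects.discard(oid)
        # References touching the deleted object are removed together with it.
        touching = [r for r in self.refs if oid in (r.source, r.target)]
        self.refs = [r for r in self.refs if r not in touching]
        for r in touching:
            if r.source == oid:  # the source was deleted: look at target rules
                if r.rule == "delete-target-propagate":
                    self.delete(r.target)
                elif (r.rule == "delete-target-if-single-appearance"
                      and not self._appears(r.target, r.role, as_target=True)):
                    self.delete(r.target)
            if r.target == oid:  # the target was deleted: look at source rules
                if r.rule == "delete-source-propagate":
                    self.delete(r.source)
                elif (r.rule == "delete-source-if-single-appearance"
                      and not self._appears(r.source, r.role, as_target=False)):
                    self.delete(r.source)
```

For the document/collection example given above: if a document references two collections with delete-source-if-single-appearance, deleting the first collection leaves the document in place, and deleting the second one removes it. If the document also references a part with delete-target-propagate, the part is removed in the same cascade.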

It is important to note, however, that the storage manager is not responsible for maintaining the consistency of references among objects: the service ignores the semantics of roles introduced by higher-level services. This is the responsibility of the services that manage specific kinds of objects (like complex documents). The Storage Management Service considers roles only in relation to the propagation of deletions. Whenever data needs to be transferred, e.g. to download a document, it is necessary to define how the raw file content should be made available.

Resources and Properties

The Storage Management Service is a WSRF-compliant stateless web service. It does not publish any resource on the IS.

Service Port Types

Currently the Storage Management Service exposes two port types:

  • porttype1, the old port type.
  • porttype, a new port type with enhanced functionality.