Resource Registry



Introduction

Resource Registry is a component that aggregates and manages information about the resources that exist in the Information System (IS). It provides a high-level abstraction for accessing, creating and modifying resources, and it optimizes these operations by minimizing communication with the IS. Resource Registry can be regarded as an environment-independent intermediate layer for accessing the IS.

It is important to note that Resource Registry is not a replacement for the Information System, since in the context of gCube all information ultimately resides in the gCube IS. However, IS functionality such as statistics gathering and node registration can be incorporated through Resource Registry plugins.

The following components are currently using the Resource Registry to access and manage information from IS:


Resource Registry consists of several components that are available in our Maven repositories under the following coordinates:

<!-- resource registry main components -->
<artifactId>rraggregator</artifactId>
<groupId>org.gcube.execution</groupId>
<version>...</version>
 
<artifactId>rraggregator-no-deps</artifactId>
<groupId>org.gcube.execution</groupId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrmodel</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrgcubebridge</artifactId>
<version>...</version>
 
<!-- plugins -->
<groupId>org.gcube.execution</groupId>
<artifactId>rrplugins</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrgcubeplugins</artifactId>
<version>...</version>
 
<!-- configuration providers for portal/service-client -->
<groupId>org.gcube.execution</groupId>
<artifactId>rrconfprovider-service</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrconfprovider-portal</artifactId>
<version>...</version>
 
 
<!-- configuration for each component -->
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-default</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-dts</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-execution</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-index</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-portal</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-search</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-workflow</artifactId>
<version>...</version>
 
<!-- in case the component runs on multiple VOs and we want to exclude some VOs from being updated -->
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-vo-nonupdaters</artifactId>
<version>...</version>

Architecture

Resource Registry is essentially a library that sits on top of a layer of datastores holding the actual data. The datastore layer is what the user sees as the state of the Information System, and it is where the user performs management operations on the data. Periodically, Resource Registry contacts the IS in order to (i) get new data (Retrieve phase) and (ii) apply the local changes so that they are visible to other components (Update phase). This operation is the basic working cycle of Resource Registry and is called a Bridging Iteration. The datastore layer consists of the following (DataNucleus) data stores:

  • Local Buffer Data Store
    • Used to collect information during bridging iterations
    • Internal use
    • Current implementation: Derby
  • Local Data Store
    • Contains the aggregated image produced by the latest iteration
    • Guarantees consistent data
    • Current implementation: Derby
  • Remote Data Store
    • Used whenever the remote repository can be modeled as a DataNucleus data store
    • E.g. LDAP
    • Optional use

Persistence for this layer is provided by DataNucleus JDO.
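
The bridging iteration described above can be pictured with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the class name `BridgingSketch`, the plain maps that stand in for the Derby-backed stores and for the IS, and in particular the rule that pending local changes override retrieved values, which this page does not specify.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these names are not part of the Resource Registry API.
class BridgingSketch {
    // Stand-in for the remote Information System.
    final Map<String, String> informationSystem = new HashMap<>();
    // Local buffer store: filled during an iteration, for internal use only.
    final Map<String, String> buffer = new HashMap<>();
    // Local store: the consistent image that users read and modify.
    final Map<String, String> local = new HashMap<>();
    // Ids changed locally since the last iteration.
    final Set<String> dirty = new HashSet<>();

    // A user-side modification touches only the local store.
    void put(String id, String value) {
        local.put(id, value);
        dirty.add(id);
    }

    void bridgingIteration() {
        // Retrieve phase: collect the remote state in the buffer.
        buffer.clear();
        buffer.putAll(informationSystem);
        // Assumption: pending local changes win over retrieved values.
        for (String id : dirty) {
            buffer.put(id, local.get(id));
        }
        // Publish the buffer content as the new local image.
        local.clear();
        local.putAll(buffer);
        // Update phase: make the local changes visible to other components.
        for (String id : dirty) {
            informationSystem.put(id, local.get(id));
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        BridgingSketch rr = new BridgingSketch();
        rr.informationSystem.put("node-1", "hosting node");
        rr.put("field-1", "title");
        rr.bridgingIteration();
        System.out.println("local image: " + rr.local);
        System.out.println("IS stand-in: " + rr.informationSystem);
    }
}
```

Between two iterations the user works only against the local store, which is why the number of round trips to the IS stays small.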


The core logic of Resource Registry resides in the Resource Registry Aggregator component and can be summarized in the following six operations:

  • Environment initialization
  • Bridging iterations execution
  • Data store replication handling and coarse grained locking
  • Graceful shutdown
  • Automatic clear of datastores on error
  • Manual clear of datastores

Provider-specific logic is included in repository provider implementations:

  • Implementations handle the actual interfacing with the remote repository
  • E.g.: RRGCubeBridge interfaces with gCube IS
  • Implementations are easily replaceable
    • Easy adaptation to different environments
    • Ability to change implementation in order to perform different tasks
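
To illustrate why replaceable provider implementations make adaptation easy, here is a small sketch in which the core logic depends only on an interface. The interface `RepositoryProvider`, its two methods and both implementations are hypothetical and are not taken from the Resource Registry code base; the real provider contract may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual provider API of Resource Registry.
class ProviderSketch {
    // Contract between the aggregator core and a remote repository.
    interface RepositoryProvider {
        List<String> retrieve();            // used in the Retrieve phase
        void update(List<String> changes);  // used in the Update phase
    }

    // Provider backed by a list, standing in for a real remote repository.
    static class InMemoryProvider implements RepositoryProvider {
        final List<String> remote = new ArrayList<>();

        public List<String> retrieve() {
            return new ArrayList<>(remote);
        }

        public void update(List<String> changes) {
            remote.addAll(changes);
        }
    }

    // Same contract, different task: this variant never writes back.
    static class ReadOnlyProvider extends InMemoryProvider {
        @Override
        public void update(List<String> changes) {
            // intentionally ignored
        }
    }

    // The core sees only the interface, so providers can be swapped freely.
    static List<String> iterate(RepositoryProvider provider, List<String> localChanges) {
        List<String> image = provider.retrieve();
        provider.update(localChanges);
        image.addAll(localChanges);
        return image;
    }

    public static void main(String[] args) {
        InMemoryProvider provider = new InMemoryProvider();
        provider.remote.add("node-1");
        List<String> changes = new ArrayList<>();
        changes.add("field-1");
        System.out.println(iterate(provider, changes));
    }
}
```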

The roles of Resource Registry

In the context of gCube the role of Resource Registry is three-fold:

  • Aggregator of all information related to the resources available in the infrastructure such as:
    • Hosting Nodes
    • Search System endpoints
    • PE2ng execution nodes
    • Workflow Engine Endpoints
    • Data Sources
      • Full Text, Forward Indexes
      • OpenSearch
    • Search Fields
      • Fields, Searchables, Presentables
    • Data Collections
      • Tree Collections
  • Supporting component that provides a management abstraction for creating, accessing and modifying resources, through the entity model or through predefined queries over a local database cache, for a variety of components
  • Component that exposes and manipulates the configuration of the Search System
    • Resource Registry can be configured to update any of the entities in its model
    • In the gCube System the updated entities are: Fields, Searchables, Presentables and Static Configuration
    • All updates are performed through the Search Manager portlet

Data Management

Currently there are two different ways of accessing, creating and/or modifying data through Resource Registry: either directly through the API of the respective model entities, which expose a DAO pattern API, or by using predefined queries from the QueryHelper utility, which have been created to support some common complex queries.

Manage through model entities

  • Example:
Field f = new Field();
f.setID("73025ae1-d15c-4735-9756-31ba019bc714");
f.load(true);
f.getName();
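
For readers unfamiliar with this style of API, the following self-contained sketch shows the general idea of an entity that loads and stores itself. It is not the `Field` class of the Resource Registry model: the class `FieldSketch`, its static map and the `store()` method are invented for illustration, and the boolean argument of `load` in the example above is omitted because its semantics are not described on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the Field entity of Resource Registry.
class FieldSketch {
    // Stand-in for the local data store, keyed by entity id.
    static final Map<String, String> STORE = new HashMap<>();

    private String id;
    private String name;

    void setID(String id) {
        this.id = id;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    // Fills this object from the store; returns false when the id is unknown.
    boolean load() {
        if (!STORE.containsKey(id)) {
            return false;
        }
        name = STORE.get(id);
        return true;
    }

    // Writes the current state of this object back to the store.
    void store() {
        STORE.put(id, name);
    }

    public static void main(String[] args) {
        FieldSketch created = new FieldSketch();
        created.setID("example-id");
        created.setName("title");
        created.store();

        FieldSketch loaded = new FieldSketch();
        loaded.setID("example-id");
        System.out.println(loaded.load() + " " + loaded.getName());
    }
}
```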

Manage through predefined queries

  • A multitude of queries are provided
Set<String> getCapabilitiesByFieldCollection(field,collection)
  • Through queries related to the entities themselves
Field.getSearchableFieldsOfCollectionByCapabilities(loadDetails,collection, capabilities)


Plugins

Resource Registry has been designed to support extensions through plugins. There are currently 6 different types of supported plugins:

  • based on the phase of the bridging iteration in which they are executed
    • PRE_RETRIEVE
    • POST_RETRIEVE
    • PRE_UPDATE
    • POST_UPDATE
  • independent of bridging iteration cycles
    • PERIODIC: Periodic execution, independent of bridging cycle period
    • ONE_OFF: Executed only once

The order in which plugins are executed is configurable and is significant only among plugins of the same type.

Bridging iteration dependent plugins usually require loading of all available entities of certain types:

  • In order to ensure maximum performance, the already retrieved entities should be passed to the plugin chain
  • Plugins declare the entity types they need
  • Repository provider implementations should ideally pre-fetch these entities and pass them to the plugin chain
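
The ordering and pre-fetching rules above can be sketched as follows. The six type names are the ones listed in this section, and the plugin names used in `main` appear in the plugin list further down this page; the class `PluginChainSketch`, the order values and the entity type names attached to each plugin are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the actual plugin API of Resource Registry.
class PluginChainSketch {
    enum Type { PRE_RETRIEVE, POST_RETRIEVE, PRE_UPDATE, POST_UPDATE, PERIODIC, ONE_OFF }

    static class Plugin {
        final String name;
        final Type type;
        final int order;                // configured position within its type
        final List<String> entityTypes; // entity types the plugin declares it needs

        Plugin(String name, Type type, int order, String... entityTypes) {
            this.name = name;
            this.type = type;
            this.order = order;
            this.entityTypes = Arrays.asList(entityTypes);
        }
    }

    final List<Plugin> plugins = new ArrayList<>();

    // Plugins of one type in configured order; order is irrelevant across types.
    List<Plugin> chain(Type type) {
        List<Plugin> chain = new ArrayList<>();
        for (Plugin p : plugins) {
            if (p.type == type) {
                chain.add(p);
            }
        }
        chain.sort(Comparator.comparingInt((Plugin p) -> p.order));
        return chain;
    }

    // Union of the declared entity types, so a provider can pre-fetch them once.
    Set<String> entityTypesToPrefetch(Type type) {
        Set<String> needed = new TreeSet<>();
        for (Plugin p : chain(type)) {
            needed.addAll(p.entityTypes);
        }
        return needed;
    }

    public static void main(String[] args) {
        PluginChainSketch registry = new PluginChainSketch();
        // Order values and entity type names below are invented.
        registry.plugins.add(new Plugin("PresentationInfoManagerPlugin", Type.PRE_UPDATE, 2, "Presentable"));
        registry.plugins.add(new Plugin("DataSourceManagerPlugin", Type.PRE_UPDATE, 1, "DataSource"));
        registry.plugins.add(new Plugin("FieldUpdaterPlugin", Type.POST_RETRIEVE, 1, "Field"));
        for (Plugin p : registry.chain(Type.PRE_UPDATE)) {
            System.out.println(p.name);
        }
        System.out.println(registry.entityTypesToPrefetch(Type.PRE_UPDATE));
    }
}
```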


Resource Registry has been extended to support a number of management tasks by using small pluggable components that encapsulate the logic of each task. The following tasks have been implemented using plugins:

  • Automatic field creation/update
  • Data Source and presentable management
  • Field annotation management


Apart from those already presented there is also another group of plugins:

  • Technology specific Plugins
  • Tied to specific environments
    • E.g. Unix, Tomcat, Axis2
  • Packaged in separate components
  • Used to perform tasks such as
    • Environment data gathering: e.g. local gHN host name, port
    • IS-like functionality (not in the gCube environment)


Currently Resource Registry supports the following plugins inside and outside the gCube environment:

  • Inside gCube environment:
    • FieldUpdaterPlugin
      • POST_RETRIEVE
    • DataSourceManagerPlugin
      • PRE_UPDATE
    • PresentationInfoManagerPlugin
      • PRE_UPDATE
    • GCubeHostFinderPlugin
      • ONE_OFF
  • Outside gCube environment:
    • Axis2ServiceRegistrationPlugin
      • PERIODIC
    • HostingNodeRegistrationPlugin
      • PERIODIC
    • ServiceUnregistrationPlugin
      • PERIODIC
    • TomcatHostFinderPlugin
      • ONE_OFF

Resource Registry Configuration

Resource Registry can be tailored to different needs by configuring several properties. The user can configure which Datasource types, Datasource services and entities from the IS she wants to manage, and how (readonly/update). The bridging period can also be configured, as well as the plugins to run along with Resource Registry. The database location and credentials can likewise be changed. All these properties are set in the configuration files that ship with Resource Registry; for some common cases, components with these configurations already set have been created. For example, there is a Resource Registry configuration for the Search System component that can be used without any changes. The following configuration files are needed by Resource Registry and are included in the ready-made configuration components:

  • config.gcubebridge.properties: configuration of the Datasource types and services
  • datanucleus.buffer.properties: configuration of the location and credentials for the "buffer" database
  • datanucleus.derby.properties: configuration of the location and credentials for the "local" database
  • resourceregistry.properties: configuration of the bridging period, plugins etc.
  • targets.model.properties: configuration of readonly, update and inmemory entities from the IS

The following components have been created to support common cases:

  • resourceregistry-configuration-dts: configuration for the Data Transformation
  • resourceregistry-configuration-execution: configuration for the Execution Engine
  • resourceregistry-configuration-index: configuration for the Index
  • resourceregistry-configuration-portal: configuration for the Portal
  • resourceregistry-configuration-search: configuration for the Search System
  • resourceregistry-configuration-workflow: configuration for the Workflow Engine
  • resourceregistry-configuration-default: default (not updater) configuration

In some cases the user may desire to run Resource Registry in multiple VOs but only update some of them. This can be done by specifying the list of the VOs she wants to exclude in the property file nonupdatescopes.properties in the component resourceregistry-configuration-vo-nonupdaters.

Note that the configuration only requires that the property files mentioned above are available on the classpath, so it is possible to configure Resource Registry without the specific configuration components, simply by placing the property files on the classpath.
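
As a quick way to check that a property file is actually visible on the classpath, a helper along the following lines can be used. This is a generic Java sketch, not the loading code of Resource Registry; the class name `ClasspathConfigSketch` is hypothetical, and no property keys are shown because they are not documented on this page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper; Resource Registry's own loading mechanism may differ.
class ClasspathConfigSketch {
    // Loads a property file by name from the classpath, failing loudly if it is missing.
    static Properties load(String resourceName) throws IOException {
        Properties props = new Properties();
        ClassLoader loader = ClasspathConfigSketch.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException(resourceName + " not found on the classpath");
            }
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) {
        try {
            Properties props = load("resourceregistry.properties");
            System.out.println("Loaded " + props.size() + " properties");
        } catch (IOException e) {
            System.out.println("Configuration problem: " + e.getMessage());
        }
    }
}
```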

Provider Configuration

Since gCube 2.17.0 Resource Registry has been refactored to be integrated with the FeatherWeight Stack. Although most of the dependencies on gcf have been removed, some of them were completely gCore specific. Because of that, the refactoring led to the creation of 2 different components for provider configuration:

  • rrconfprovider-portal: without dependencies on gcf, used by portals
  • rrconfprovider-service: with few dependencies on gcf, used by services and clients

One of these components should be loaded in the classpath depending on the use case.