Resource Registry

Introduction

Resource Registry is a component that aggregates and manages information about the resources that exist in the Information System (IS). It provides a high-level abstraction for accessing, creating and modifying resources, and it optimizes these operations by minimizing communication with the IS. Resource Registry can be considered an environment-independent intermediate layer for accessing the IS.

It is important to note that Resource Registry is not a replacement for the Information System, since in the context of gCube all information ultimately resides in the gCube IS. However, IS functionality such as statistics gathering and node registration can be incorporated through Resource Registry plugins.

The following components currently use Resource Registry to access and manage information from the IS:

Search System
Data Sources
ASL
Search Manager portlet
PE2ng
- Workflow Engine
- Execution Engine
gCube Data Transformation Service
Data Access

Resource Registry consists of a few components that are available in our Maven repositories with the following coordinates:

<!-- resource registry main components -->
<groupId>org.gcube.execution</groupId>
<artifactId>rraggregator</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rraggregator-no-deps</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrmodel</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrgcubebridge</artifactId>
<version>...</version>
 
<!-- plugins -->
<groupId>org.gcube.execution</groupId>
<artifactId>rrplugins</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrgcubeplugins</artifactId>
<version>...</version>
 
<!-- configuration providers for portal/service-client -->
<groupId>org.gcube.execution</groupId>
<artifactId>rrconfprovider-service</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>rrconfprovider-portal</artifactId>
<version>...</version>
 
 
<!-- configuration for each component -->
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-default</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-dts</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-execution</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-index</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-portal</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-search</artifactId>
<version>...</version>
 
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-workflow</artifactId>
<version>...</version>
 
<!-- in case the component runs on multiple VOs and we want to exclude some VOs from being updated -->
<groupId>org.gcube.execution</groupId>
<artifactId>resourceregistry-configuration-vo-nonupdaters</artifactId>
<version>...</version>

Architecture

Resource Registry is essentially a library that sits on top of a layer of data stores holding the actual data. The data store layer is what the user sees as the state of the Information System, and it is where the user performs management operations on the data. Periodically, Resource Registry contacts the IS in order to (i) get new data (Retrieve phase) and (ii) apply the local changes so that they are visible to other components (Update phase). This operation is the basic working cycle of Resource Registry and is called a Bridging Iteration. The data store layer consists of the following (DataNucleus) data stores:

Local Buffer Data Store
- Used to collect information during bridging iterations
- Internal use
- Current implementation: Derby
Local Data Store
- Contains the aggregated image produced by the latest iteration
- Guarantees consistent data
- Current implementation: Derby
Remote Data Store
- Used whenever the remote repository can be modeled as a DataNucleus data store
- E.g. LDAP
- Optional use

Persistence for this layer is provided by DataNucleus JDO.
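The working cycle described above can be illustrated with a small sketch. This is not the aggregator's actual code: `RemoteRepository`, `InMemoryRepository` and `BridgingIteration` are hypothetical names, plain maps stand in for the Derby-backed stores, and the rule that unpublished local edits take precedence over retrieved values is an assumption made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the remote repository (for example the IS).
interface RemoteRepository {
    Map<String, String> retrieve();            // Retrieve phase: read the remote state
    void update(Map<String, String> changes);  // Update phase: publish local changes
}

// Trivial in-memory repository, used only to exercise the sketch.
class InMemoryRepository implements RemoteRepository {
    final Map<String, String> state = new HashMap<>();

    @Override
    public Map<String, String> retrieve() {
        return new HashMap<>(state);
    }

    @Override
    public void update(Map<String, String> changes) {
        state.putAll(changes);
    }
}

// One bridging iteration over in-memory stand-ins for the data stores.
class BridgingIteration {
    private final Map<String, String> buffer = new HashMap<>();   // plays the Local Buffer Data Store
    private final Map<String, String> pending = new HashMap<>();  // local edits not yet published
    private Map<String, String> local = new HashMap<>();          // plays the Local Data Store

    // A management operation performed against the local image.
    void modifyLocally(String id, String value) {
        local.put(id, value);
        pending.put(id, value);
    }

    // Read-only copy of the image produced by the latest iteration.
    Map<String, String> localImage() {
        return Collections.unmodifiableMap(new HashMap<>(local));
    }

    void run(RemoteRepository remote) {
        // (i) Retrieve phase: collect remote data in the buffer so that readers
        // keep seeing the previous image until the iteration completes.
        buffer.clear();
        buffer.putAll(remote.retrieve());
        buffer.putAll(pending); // assumption: unpublished local edits win

        // (ii) Update phase: make local changes visible to other components.
        if (!pending.isEmpty()) {
            remote.update(new HashMap<>(pending));
            pending.clear();
        }

        // Replace the local image with the aggregated result of this iteration.
        local = new HashMap<>(buffer);
    }
}
```

In a long-running process, `run` would be invoked repeatedly at the configured bridging period, for instance from a `java.util.concurrent.ScheduledExecutorService`.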

The core logic of Resource Registry is included in the Resource Registry Aggregator component and can be summarized in the following six operations:

Environment initialization
Bridging iterations execution
Data store replication handling and coarse grained locking
Graceful shutdown
Automatic clear of datastores on error
Manual clear of datastores

Provider-specific logic is included in repository provider implementations:

Implementations handle the actual interfacing with the remote repository
E.g.: RRGCubeBridge interfaces with gCube IS
Implementations are easily replaceable
- Easy adaption to different environments
- Ability to change implementation in order to perform different tasks
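The replaceability of provider implementations can be pictured as ordinary programming against an interface. The sketch below is an illustration only: `RepositoryProvider`, `FixedListProvider` and `Aggregator` are hypothetical names and do not reflect the real signatures used by Resource Registry or RRGCubeBridge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical contract between the core logic and a remote repository.
interface RepositoryProvider {
    List<String> retrieveResourceIds();
}

// A provider backed by a fixed list, e.g. for an environment without an IS.
class FixedListProvider implements RepositoryProvider {
    private final List<String> ids;

    FixedListProvider(List<String> ids) {
        this.ids = new ArrayList<>(ids);
    }

    @Override
    public List<String> retrieveResourceIds() {
        return Collections.unmodifiableList(ids);
    }
}

// The core logic depends only on the interface, so providers can be swapped.
class Aggregator {
    private final RepositoryProvider provider;

    Aggregator(RepositoryProvider provider) {
        this.provider = provider;
    }

    List<String> refresh() {
        return new ArrayList<>(provider.retrieveResourceIds());
    }
}
```

Because `Aggregator` never names a concrete provider, moving to a different environment only requires passing a different `RepositoryProvider` to its constructor.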

The roles of Resource Registry

In the context of gCube the role of Resource Registry is three-fold:

Aggregator of all information related to the resources available in the infrastructure such as:
- Hosting Nodes
- Search System endpoints
- PE2ng execution nodes
- Workflow Engine Endpoints
- Data Sources
  - Full Text, Forward Indexes
  - OpenSearch
- Search Fields
  - Fields, Searchables, Presentables
- Data Collections
  - Tree Collections

Supporting component that provides a management abstraction for creating, accessing and modifying resources, either through the entity model or through predefined queries over a local database cache, for a variety of components

Component that exposes and manipulates the configuration of the Search System
- Resource Registry can be configured to update any of the entities in its model
- In the gCube System the updated entities are: Fields, Searchables, Presentables and Static Configuration
- All updates are performed through the Search Manager portlet

Data Management

Currently there are two ways of accessing, creating and/or modifying data through Resource Registry: either directly through the API of the respective model entities, which expose a DAO pattern API, or through predefined queries from the QueryHelper utility, which have been created to support some common complex queries.

Manage through model entities

Example:

// Load an existing Field by its ID and read its name
Field f = new Field();
f.setID("73025ae1-d15c-4735-9756-31ba019bc714");
f.load(true);
String name = f.getName();

Manage through predefined queries

A multitude of queries are provided, for example:

Set<String> getCapabilitiesByFieldCollection(field,collection)

Queries are also available through the entities themselves, for example:

Field.getSearchableFieldsOfCollectionByCapabilities(loadDetails,collection, capabilities)
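The two access styles can be contrasted with a self-contained sketch. Nothing in it is the real Resource Registry API: `LocalCache`, `FieldRecord` and `Queries` are hypothetical classes over an in-memory map, and only the general shape (an entity that loads itself by ID, and a helper that answers a field/collection capability question in one call) echoes the examples above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the local database cache.
class LocalCache {
    final Map<String, String> fieldNames = new HashMap<>();
    final Map<String, Set<String>> capabilities = new HashMap<>();

    void addCapability(String fieldId, String collectionId, String capability) {
        capabilities
            .computeIfAbsent(fieldId + "|" + collectionId, k -> new TreeSet<>())
            .add(capability);
    }
}

// Style 1: the entity exposes its own load operation (DAO-like pattern).
class FieldRecord {
    private final LocalCache cache;
    private String id;
    private String name;

    FieldRecord(LocalCache cache) {
        this.cache = cache;
    }

    void setID(String id) {
        this.id = id;
    }

    String getName() {
        return name;
    }

    // Returns true when a record with the current ID exists in the cache.
    boolean load() {
        name = cache.fieldNames.get(id);
        return name != null;
    }
}

// Style 2: a helper bundles a common multi-entity lookup into a single call.
class Queries {
    static Set<String> capabilitiesByFieldAndCollection(
            LocalCache cache, String fieldId, String collectionId) {
        Set<String> result = new TreeSet<>();
        Set<String> found = cache.capabilities.get(fieldId + "|" + collectionId);
        if (found != null) {
            result.addAll(found);
        }
        return result;
    }
}
```

The entity style suits single-object reads and writes, while the helper style spares callers from assembling a lookup that spans several entities.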

Plugins

Resource Registry has been designed to support extensions through plugins. There are currently six types of supported plugins:

based on the phase of the bridging iteration in which they are executed
- PRE_RETRIEVE
- POST_RETRIEVE
- PRE_UPDATE
- POST_UPDATE
independent of bridging iteration cycles
- PERIODIC: Periodic execution, independent of bridging cycle period
- ONE_OFF: Executed only once

The order in which plugins are executed is configurable, and it is significant only between plugins of the same type.

Plugins that depend on the bridging iteration usually require loading all available entities of certain types:

In order to ensure maximum performance, the already retrieved entities should be passed to the plugin chain
Plugins declare the entity types they need
Repository provider implementations should ideally pre-fetch these entities and pass them to the plugin chain
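The plugin types, the per-type ordering and the declaration of required entity types can be sketched as follows. The enum constants are the six types listed above; everything else (`Plugin`, `RecordingPlugin`, `PluginChain`, the integer `order`, and the use of strings for entity types) is hypothetical and chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum PluginType { PRE_RETRIEVE, POST_RETRIEVE, PRE_UPDATE, POST_UPDATE, PERIODIC, ONE_OFF }

// Hypothetical plugin contract.
interface Plugin {
    PluginType type();
    int order();                        // position among plugins of the same type
    Set<String> requiredEntityTypes();  // entity types the plugin needs loaded
    void execute(Map<String, List<String>> entitiesByType);
}

// Plugin that only records its name when executed, to make ordering visible.
class RecordingPlugin implements Plugin {
    private final String name;
    private final PluginType type;
    private final int order;
    private final Set<String> needs;
    private final List<String> log;

    RecordingPlugin(String name, PluginType type, int order, Set<String> needs, List<String> log) {
        this.name = name;
        this.type = type;
        this.order = order;
        this.needs = needs;
        this.log = log;
    }

    @Override public PluginType type() { return type; }
    @Override public int order() { return order; }
    @Override public Set<String> requiredEntityTypes() { return needs; }
    @Override public void execute(Map<String, List<String>> entitiesByType) { log.add(name); }
}

class PluginChain {
    private final Map<PluginType, List<Plugin>> byType = new EnumMap<>(PluginType.class);

    void register(Plugin plugin) {
        List<Plugin> sameType = byType.computeIfAbsent(plugin.type(), t -> new ArrayList<>());
        sameType.add(plugin);
        // Ordering only matters among plugins of the same type.
        sameType.sort(Comparator.comparingInt(Plugin::order));
    }

    // Union of declared entity types, so a provider can pre-fetch each type once.
    Set<String> entityTypesNeeded(PluginType type) {
        Set<String> needed = new HashSet<>();
        for (Plugin p : byType.getOrDefault(type, Collections.emptyList())) {
            needed.addAll(p.requiredEntityTypes());
        }
        return needed;
    }

    // Runs all plugins of one type, handing each the already retrieved entities.
    void run(PluginType type, Map<String, List<String>> prefetched) {
        for (Plugin p : byType.getOrDefault(type, Collections.emptyList())) {
            p.execute(prefetched);
        }
    }
}
```

A provider would call `entityTypesNeeded` before a phase, load those entities in one pass, and hand the same map to every plugin in the chain instead of letting each plugin query the store on its own.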

Resource Registry has been extended to support a number of management tasks by using small pluggable components that encapsulate the logic of each task. The following tasks have been implemented using plugins:

Automatic field creation/update
Data Source and presentable management
Field annotation management

Apart from those already presented there is also another group of plugins:

Technology specific Plugins
Tied to specific environments
- E.g. Unix, Tomcat, Axis2
Packaged in separate components
Used to perform tasks such as
- Environment data gathering: e.g. local gHN host name, port
- IS-like functionality (not in the gCube environment)

Currently Resource Registry supports the following plugins inside and outside the gCube environment:

Inside gCube environment:
- FieldUpdaterPlugin
  - POST_RETRIEVE
- DataSourceManagerPlugin
  - PRE_UPDATE
- PresentationInfoManagerPlugin
  - PRE_UPDATE
- GCubeHostFinderPlugin
  - ONE_OFF
Outside gCube environment:
- Axis2ServiceRegistrationPlugin
  - PERIODIC
- HostingNodeRegistrationPlugin
  - PERIODIC
- ServiceUnregistrationPlugin
  - PERIODIC
- TomcatHostFinderPlugin
  - ONE_OFF

Resource Registry Configuration

Resource Registry can be tailored by configuring several properties. The user can configure which Datasource types, Datasource services and entities from the IS to manage, and how (readonly/update). The bridging period can also be configured, as well as the plugins to run along with Resource Registry. The database location and credentials can be changed as well. All these properties are set in the configuration files that come with Resource Registry, but for some common cases components have been created with these configurations already set. For example, there is a Resource Registry configuration for the Search System component that can be used without any changes. The configuration files that Resource Registry needs, and that are included in the ready-made configuration components, are:

config.gcubebridge.properties: configuration of the Datasource types and services
datanucleus.buffer.properties: configuration of the location and credentials for the "buffer" database
datanucleus.derby.properties: configuration of the location and credentials for the local (Derby) database
resourceregistry.properties: configuration of the bridging period, plugins etc.
targets.model.properties: configuration of readonly, update and inmemory entities from the IS

The following components have been created to support common cases:

resourceregistry-configuration-dts: configuration for the Data Transformation Service
resourceregistry-configuration-execution: configuration for the Execution Engine
resourceregistry-configuration-index: configuration for the Index
resourceregistry-configuration-portal: configuration for the Portal
resourceregistry-configuration-search: configuration for the Search System
resourceregistry-configuration-workflow: configuration for the Workflow Engine
resourceregistry-configuration-default: default (non-updater) configuration

In some cases the user may desire to run Resource Registry in multiple VOs but only update some of them. This can be done by specifying the list of the VOs she wants to exclude in the property file nonupdatescopes.properties in the component resourceregistry-configuration-vo-nonupdaters.

Note that the configuration only requires that the property files mentioned above are on the classpath, so it is possible to configure Resource Registry without the specific configuration components, simply by placing the property files on the classpath.
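To check which values a deployment actually picks up, a property file can be read from the classpath with standard Java calls. The sketch below shows only that generic mechanism; it does not describe how Resource Registry loads its files internally, `ClasspathConfig` is a hypothetical helper, and no property keys are assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for inspecting a property file found on the classpath.
class ClasspathConfig {

    // Returns the properties of the named resource, or an empty set of
    // properties when no such resource is on the classpath.
    static Properties load(String resourceName) {
        Properties props = new Properties();
        ClassLoader loader = ClasspathConfig.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

For example, `ClasspathConfig.load("resourceregistry.properties").stringPropertyNames()` lists the keys defined by whichever copy of that file comes first on the classpath, which helps when both a configuration component and a hand-placed file are present.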

Provider Configuration

Since gCube 2.17.0, Resource Registry has been refactored to integrate with the FeatherWeight Stack. Although most of the dependencies on gcf have been removed, some of them were completely gCore specific. Because of that, the refactoring led to the creation of two different components for provider configuration:

rrconfprovider-portal: without dependencies to the gcf, used by portals
rrconfprovider-service: with few dependencies to the gcf, used by services and clients

One of these components should be loaded in the classpath depending on the use case.
