Difference between revisions of "GCore Based Information System"

From Gcube Wiki
(Reference Architecture)
m (Luca.frosini moved page Information System to GCore Based Information System: Creating new Page for smartgear based IS)
 
(40 intermediate revisions by 5 users not shown)
Line 1: Line 1:
== Information System ==
+
[[Category: Developer's Guide]][[Category:Information System]]
The Information System (IS) plays a central role in a gCube-based infrastructure by implementing the features that support the publishing, discovery and ‘real-time’ monitoring of the resources forming that infrastructure. It acts as the registry of the infrastructure, i.e. all resources are registered there and every service taking part in the infrastructure must refer to it to dynamically discover the other infrastructure constituents. For each resource, two kinds of information are published:
+
{| align="right"
 +
|| __TOC__
 +
|}
 +
The gCube Information System (IS for short) provides functionality for publishing, discovering, and monitoring the resources that form the infrastructure. It acts as the registry of the infrastructure, i.e. all resources are registered in the IS and every service taking part in the infrastructure must refer to it to dynamically discover the other infrastructure constituents. Moreover, the IS underpins the dynamic deployment capabilities of gCube.
  
* the ''profile'', statically characterising the resource, e.g. its type;  
+
In this context, a resource can be:
* the ''status'', characterising the operational status of the resource, e.g. indicators of the size of the resource currently managed.
+
* a [[Reference_Model#Resource_Domain|''gCube resource'']], supporting the deployment and operation of a gCube infrastructure;  
 +
* an ''instance state'', characterizing the operational state of an instance of a gCube service
 +
* a ''generic resource'', i.e. any well-formed XML document (a text that follows all the well-formedness rules of the [http://www.w3.org/TR/REC-xml/ XML specification])
  
 
Because of its central role, the key quality-of-service requirements for this subsystem are ''performance'', ''scalability'', ''freshness'' and ''availability''. Moreover, facilities supporting interaction with this subsystem have been included in the gCore Framework.
 
  
=== Reference Architecture ===
+
== Reference Architecture ==
The functions requested to an information system fall in one of the following three phases: production/publishing, collection/storage, consumption/query. Similarly, the components forming this subsystem (presented in Figure 1) contribute to implement it with respect to one of these three functions.  
+
Architecturally, the IS is composed of a group of services and libraries that simplify the work of potential clients. The central role is played by the '''InformationCollector''' (IC) service, which collects and stores information about the infrastructure (or a subset of it) and answers discovery requests.
 +
There are two ways to feed the IC, depending on the nature of the information published. If the information is a gCube Resource profile, a request for publication must be sent to the '''Registry''' service. This service validates and filters profiles in order to decide whether or not a resource is accepted as part of the infrastructure (other gCube services are in charge of regulating access to the accepted resources).
 +
On the other hand, if the information to publish is an instance state or a generic resource, it does not need to pass through the Registry service's acceptance procedure and can be directly sent to the IC.
  
[[Image:IS-Architecture.png|frame|center|Figure 1. Information System Reference Architecture]]
+
The third service belonging to the IS is the '''Notifier''', which offers a subscription/notification mechanism for events related to the lifetime of gCube Resources. Relying on the [http://www.ibm.com/developerworks/library/specification/ws-notification/ WS-Notification] specification and in cooperation with the Registry service, this service sends notifications to subscribed consumers about events happening in the Registry service (such as the registration of a new resource).
  
The components supporting the production/publishing phase are:
+
Each of the three services has a related client library that abstracts over the details of the service's interface:
 +
* IS-Client: for interacting with the IC service for discovering
 +
* IS-Publisher: for interacting with the IC and Registry services for publication
 +
* IS-Notification: for becoming a consumer of gCube's notification events sent by the Notifier
  
* '''[[IS-Registry]]''' – this Service supports the publishing/un-publishing of ''gCube resources''; a gCube resource is advertised through its ''profile'', i.e. the resource profile represents the existence of a resource;
+
Finally, the Information System subsystem is equipped with an optional service named '''gLiteBridge'''. Its role is to foster the interoperability with gLite-based infrastructures by publishing in the IS computing elements, storage elements and sites harvested from their information systems (mainly BDII).
* '''[[IS-gLiteBridge]]''' – this Service supports the publishing/un-publishing of ''resources'' gathered from a gLite-based infrastructure; a gCube-based infrastructure can include the resources forming a gLite-based infrastructure;
+
* '''[[IS-Publisher]]''' – this Library supports services in publishing/un-publishing groups of ''resource properties'' as well as registering/un-registering groups of ''topics''. Actually, this library is an ''interface'' other Services will rely on. Because of this fundamental role in supporting Services operation in a gCube-based infrastructure, a reference implementation of such an interface (''gCubePublisher'') is part of the gCore Framework;
+
  
The components supporting the collection/storage phase are:
+
Figure 1 presents the components of the Information System and their main interactions:  
  
* '''[[IS-IC]]''' – this Service aggregates the information published in the IS; from a logical point of view it is a global registry but, because of the expected quality of service, it has been designed to support a federation model, i.e. chains of [[IS-IC]] instances can be configured to collectively implement the global registry function;
+
[[Image:IS-Architecture.jpg|frame|center|Figure 1. Information System Architecture and Main Interactions]]
  
The components supporting the consumption/query phase are:
+
They globally deliver the following functionalities with respect to the information handled:
 +
* production and publication
 +
* collection and storage
 +
* discovery and consumption
  
* '''[[IS-Client]]''' – this Library supports Services in retrieving information published in the IS; it supports the discovery of both ''profiles'' and ''properties''. Actually, this library is an ''interface'' Services will rely on. Because of this fundamental role in supporting Services operation in a gCube-based infrastructure, a reference implementation of such an interface (''[[ExistClient|ExistLibrary]]'') is part of the gCore Framework;
+
The Information System supports two deployment scenarios: Standard Configuration and Advanced Configuration.
* '''[[IS-Notifier]]''' – this Service supports other Services in subscribing/unsubscribing to ''topics'' produced by the various Services; this service decouples the actual producer of the topic from the actual consumer allowing for producers re-location;
+
== Standard Configuration ==
* '''[[IS-Manager]]''' – this Service supports other Services and clients in observing, checking, or keeping a continuous record of the status of the resources forming the infrastructure. Because of this role, it can also be classified as a component supporting the collection/storage phase but it is preferable to have it in the components supporting consumption/query phase because it is considered closer to this area.  
+
The Standard Configuration supports the new [[Featherweight Stack| Featherweight Client Stack]], designed to make it easier for clients to interact with web services. It does not yet provide support for subscription and notification.
 +
=== Server Side ===
 +
* '''[[IS-Collector|IS-InformationCollector]]''' – gCube Web Service: collect, store, and make available information related to the actual state of a gCube infrastructure and/or of an assigned subset of it;
 +
* '''[[IS-Registry]]''' – gCube Web Service: support the publishing/un-publishing of profiles describing gCube resources;  
 +
* '''[[IS-gLiteBridge]]''' – Optional - gCube Web Service: support the publishing/un-publishing of ''resources'' gathered from a gLite-based infrastructure that gCube services may access;
 +
=== Client Side ===
 +
* [[ic-client|'''ic-client''']] - NEW gCube [[Featherweight Stack| Featherweight Client Stack ]] Library: build on the API of <code>discovery-client</code> to support resource discovery over the [[IS-Collector|Information Collector]] service.
  
The subsystem has been conceived to rely on standards, in particular the WS-ServiceGroup[http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ServiceGroup-1.2-draft-02.pdf] and WS-ResourceProperty specifications. From a technical point of view it exploits the ''Aggregator Framework''[http://www.globus.org/toolkit/docs/4.0/info/aggregator/] produced by the Globus Project[http://www.globus.org/]. The Aggregator Framework is a software framework that collects data from ''aggregator sources'' and sends data to ''aggregator sinks''. It also allows pluggable, customized sources and sinks to be implemented and connected together following the WS-ServiceGroup specification. These capabilities have been exploited in the IS by implementing some gCube services as aggregator sinks (e.g. the ''[[IS-IC]]'') and allowing any gCube service to become an aggregator source (through the ''gCubePublisher''); the [[IS-Registry]] is another example of an aggregator source. The data exchanged over IS connections are always ''WS-ResourceProperty documents''. This allows the IS to be as generic as possible and to accept new aggregator sources at any time.
+
* [[Registry-Publisher|'''registry-publisher''']] - NEW gCube [[Featherweight Stack| Featherweight Client Stack ]] Library: API to publish resources with the [[IS-Registry|Registry]] service.
  
Another principle has been followed during the design of the IS: ''program to an interface, not an implementation''. This means that we tried to keep the IS consumers and producers decoupled from its implementation as much as possible. Concretely, this is reflected in mechanisms that dynamically load the IS-Client, IS-Notifier and IS-Publisher at runtime and expose only their abstract interfaces. Since their clients depend on interfaces only, they are decoupled from the implementation.
+
== Advanced Configuration ==
 +
The Advanced Configuration provides support for subscription and notification. However, it imposes constraints on the client side.
 +
=== Server Side ===
 +
* '''[[IS-Collector|IS-InformationCollector]]''' –  gCube Web Service: collect, store, and make available information related to the actual state of a gCube infrastructure and/or of an assigned subset of it;
 +
* '''[[IS-Registry]]''' – gCube Web Service: support the publishing/un-publishing of profiles describing gCube resources;
 +
* '''[[IS-gLiteBridge]]''' – Optional - gCube Web Service: support the publishing/un-publishing of ''resources'' gathered from a gLite-based infrastructure that gCube services may access;
 +
* '''[[IS-Notifier]]''' – gCube Web Service: support other services in subscribing/unsubscribing to ''topics'' produced by the various Services; this service decouples the actual producer of a topic from its actual consumer, allowing producers to be relocated;
 +
=== Client Side ===
 +
* '''[[IS-Publisher]]''' – gCube Library: support services in publishing/un-publishing information in the Information Collector service. It's the gateway for any information going to the IS;
 +
* '''[[IS-Client]]''' – gCube Library: support services in discovering information published in the IS;
 +
* '''[[IS-Notification]]''' – gCube Library: provide a publication/subscription/notification mechanism for Topics produced and consumed by services.
 +
* '''[[IS-Cache]]''' - gCube Library: provide caching functionality for the information published in the IS;
  
[[Category:Information System]]
+
== Design Notes ==
 +
 
 +
The IS has been conceived to rely on standards, most notably:
 +
 
 +
* [http://www.ibm.com/developerworks/library/specification/ws-notification/ WS-Notifications]
 +
* [http://docs.oasis-open.org/wsrf/wsrf-ws_service_group-1.2-spec-pr-01.pdf WS-ServiceGroup 1.2]
 +
* [http://docs.oasis-open.org/wsrf/wsrf-ws_resource_properties-1.2-spec-os.pdf WS-ResourceProperty 1.2]
 +
* [http://www.ogf.org/documents/GFD.75.pdf Web Services Data Access and Integration – The XML Realization (WS-DAIX) Specification, Version 1.0]
 +
* [http://www.w3.org/TR/xpath-functions/ XQuery 1.0]
 +
 +
Early versions mostly exploited WS-ServiceGroup and WS-ResourceProperty  specifications. Starting from version 2.0 (released in Feb 2011), the IS is designed around the WS-DAIX specification for publishing.
 +
WS-Notifications is at the heart of the functionality delivered by the IS-Notifier service. Finally, the queries accepted by the IS have to be compliant with the XQuery language.
 +
 
 +
It is worth mentioning that the following principle was widely adopted during the design of the IS: ''program to an interface, not an implementation''. This means that we tried to keep the IS consumers and producers decoupled from its implementation as much as possible. More concretely, a gCube service only has to know the IS-Client, IS-Notifier and IS-Publisher interfaces. It does not need to care about their implementation (mechanisms to dynamically load the IS-Client, IS-Notifier and IS-Publisher at runtime have been put in place) nor about the actual IS deployment scenario (completely abstracted by the IS client libraries).
 +
 
 +
== QoS ==
 +
Every design aspect of the IS has been tackled bearing in mind that if the IS does not work, works slowly, or offers a poor service, the whole infrastructure suffers the same fate.
 +
The chain of operations involved in the discovery phase is carefully designed and implemented to reduce the waiting time of callers. In this phase the IC service works in a stateless manner, only executing the query against the underlying XML indexing system. The SOAP messages sent and received are also kept as simple as possible in order to reduce marshalling and unmarshalling time.
 +
To avoid interfering with the discovery phase, publications are performed in bulk, which reduces the number of incoming calls to the IC that compete with query invocations. The IS-Publisher collects and queues requests for publication and sends them to the Registry and then to the IC, cutting the number of competing calls as much as possible.
 +
From the deployment point of view, IS services can be distributed and partially replicated in a gCube infrastructure to manage subsets of resources (usually belonging to different scopes). Different scenarios can be set up in order to meet the performance and scalability requirements according to the extent of the infrastructure itself (e.g. how many resources are to be managed, how many nodes are available, and so on).

Latest revision as of 13:14, 19 October 2016

The gCube Information System (IS for short) provides functionality for publishing, discovering, and monitoring the resources that form the infrastructure. It acts as the registry of the infrastructure, i.e. all resources are registered in the IS and every service taking part in the infrastructure must refer to it to dynamically discover the other infrastructure constituents. Moreover, the IS underpins the dynamic deployment capabilities of gCube.

In this context, a resource can be:

  • a gCube resource, supporting the deployment and operation of a gCube infrastructure;
  • an instance state, characterizing the operational state of an instance of a gCube service;
  • a generic resource, i.e. any well-formed XML document (a text that follows all the well-formedness rules of the XML specification).
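
As a minimal sketch of what "well-formed" means in practice for a generic resource, a document can be checked by parsing it with a standard, non-validating XML parser. The class and method names below are illustrative assumptions and are not part of the gCube API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of gCube: checks XML well-formedness only.
final class WellFormedCheck {

    // Returns true if the text parses as XML (no schema or DTD validation).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A well-formedness violation is reported as a fatal parse error.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<Resource><Name>demo</Name></Resource>"));
        System.out.println(isWellFormed("<Resource><Name>demo</Resource>"));
    }
}
```

The check says nothing about whether the document conforms to any particular profile schema; it only confirms that the text satisfies the XML syntax rules.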

Because of its central role, the key quality-of-service requirements for this subsystem are performance, scalability, freshness and availability. Moreover, facilities supporting interaction with this subsystem have been included in the gCore Framework.

Reference Architecture

Architecturally, the IS is composed of a group of services and libraries that simplify the work of potential clients. The central role is played by the InformationCollector (IC) service, which collects and stores information about the infrastructure (or a subset of it) and answers discovery requests. There are two ways to feed the IC, depending on the nature of the information published. If the information is a gCube Resource profile, a request for publication must be sent to the Registry service. This service validates and filters profiles in order to decide whether or not a resource is accepted as part of the infrastructure (other gCube services are in charge of regulating access to the accepted resources). On the other hand, if the information to publish is an instance state or a generic resource, it does not need to pass through the Registry service's acceptance procedure and can be sent directly to the IC.
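
The two publication paths described above can be sketched as a small routing function. This is only an illustration: the enum and method names are hypothetical, and the sketch assumes that a profile accepted by the Registry is subsequently forwarded to the IC.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: these types are hypothetical and not part of the gCube API.
final class PublicationRouting {

    // The three kinds of information that can be published in the IS.
    enum Kind { GCUBE_RESOURCE_PROFILE, INSTANCE_STATE, GENERIC_RESOURCE }

    // The services a publication request passes through.
    enum Service { REGISTRY, INFORMATION_COLLECTOR }

    // Profiles go through the Registry's acceptance procedure first
    // (assumed here to forward accepted profiles to the IC);
    // everything else is sent straight to the IC.
    static List<Service> route(Kind kind) {
        if (kind == Kind.GCUBE_RESOURCE_PROFILE) {
            return Arrays.asList(Service.REGISTRY, Service.INFORMATION_COLLECTOR);
        }
        return Collections.singletonList(Service.INFORMATION_COLLECTOR);
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> " + route(kind));
        }
    }
}
```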

The third service belonging to the IS is the Notifier, which offers a subscription/notification mechanism for events related to the lifetime of gCube Resources. Relying on the WS-Notification specification and in cooperation with the Registry service, this service sends notifications to subscribed consumers about events happening in the Registry service (such as the registration of a new resource).
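
The subscription/notification idea can be illustrated with a tiny in-memory broker. This is a conceptual sketch only: it does not implement WS-Notification, and the class name, method names and topic string are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Conceptual sketch, not the Notifier's real interface.
final class TopicBroker {

    // Consumers registered per topic name.
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Registers a consumer for all future events on the given topic.
    void subscribe(String topic, Consumer<String> consumer) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(consumer);
    }

    // Removes a consumer; returns true if it was subscribed.
    boolean unsubscribe(String topic, Consumer<String> consumer) {
        List<Consumer<String>> list = subscribers.get(topic);
        return list != null && list.remove(consumer);
    }

    // Delivers the event to every subscriber; returns how many were notified.
    int publish(String topic, String event) {
        List<Consumer<String>> list =
                subscribers.getOrDefault(topic, Collections.emptyList());
        // Iterate over a copy so that consumers may unsubscribe while notified.
        for (Consumer<String> consumer : new ArrayList<>(list)) {
            consumer.accept(event);
        }
        return list.size();
    }

    public static void main(String[] args) {
        TopicBroker broker = new TopicBroker();
        broker.subscribe("resource-registered", e -> System.out.println("got " + e));
        broker.publish("resource-registered", "new resource profile");
    }
}
```

Because producers only call `publish` and consumers only call `subscribe`, neither side needs to know where the other is deployed.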

Each of the three services has a related client library that abstracts over the details of the service's interface:

  • IS-Client: for interacting with the IC service for discovery;
  • IS-Publisher: for interacting with the IC and Registry services for publication;
  • IS-Notification: for becoming a consumer of the gCube notification events sent by the Notifier.

Finally, the Information System subsystem is equipped with an optional service named gLiteBridge. Its role is to foster interoperability with gLite-based infrastructures by publishing in the IS the computing elements, storage elements and sites harvested from their information systems (mainly BDII).

Figure 1 presents the components of the Information System and their main interactions:

Figure 1. Information System Architecture and Main Interactions

Together, these components deliver the following functionality with respect to the information handled:

  • production and publication
  • collection and storage
  • discovery and consumption

The Information System supports two deployment scenarios: Standard Configuration and Advanced Configuration.

Standard Configuration

The Standard Configuration supports the new Featherweight Client Stack, designed to make it easier for clients to interact with web services. It does not yet provide support for subscription and notification.

Server Side

  • IS-InformationCollector – gCube Web Service: collect, store, and make available information related to the actual state of a gCube infrastructure and/or of an assigned subset of it;
  • IS-Registry – gCube Web Service: support the publishing/un-publishing of profiles describing gCube resources;
  • IS-gLiteBridge – Optional - gCube Web Service: support the publishing/un-publishing of resources gathered from a gLite-based infrastructure that gCube services may access;

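Client Side

  • ic-client – new gCube Featherweight Client Stack library: build on the API of discovery-client to support resource discovery over the Information Collector service;
  • registry-publisher – new gCube Featherweight Client Stack library: API to publish resources with the Registry service.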

Advanced Configuration

The Advanced Configuration provides support for subscription and notification. However, it imposes constraints on the client side.

Server Side

  • IS-InformationCollector – gCube Web Service: collect, store, and make available information related to the actual state of a gCube infrastructure and/or of an assigned subset of it;
  • IS-Registry – gCube Web Service: support the publishing/un-publishing of profiles describing gCube resources;
  • IS-gLiteBridge – Optional - gCube Web Service: support the publishing/un-publishing of resources gathered from a gLite-based infrastructure that gCube services may access;
  • IS-Notifier – gCube Web Service: support other services in subscribing/unsubscribing to topics produced by the various Services; this service decouples the actual producer of a topic from its actual consumer, allowing producers to be relocated;

Client Side

  • IS-Publisher – gCube Library: support services in publishing/un-publishing information in the Information Collector service. It's the gateway for any information going to the IS;
  • IS-Client – gCube Library: support services in discovering information published in the IS;
  • IS-Notification – gCube Library: provide a publication/subscription/notification mechanism for Topics produced and consumed by services;
  • IS-Cache – gCube Library: provide caching functionality for the information published in the IS.

Design Notes

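The IS has been conceived to rely on standards, most notably:

  • WS-Notifications
  • WS-ServiceGroup 1.2
  • WS-ResourceProperty 1.2
  • Web Services Data Access and Integration – The XML Realization (WS-DAIX) Specification, Version 1.0
  • XQuery 1.0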

Early versions mostly exploited the WS-ServiceGroup and WS-ResourceProperty specifications. Starting from version 2.0 (released in Feb 2011), the IS is designed around the WS-DAIX specification for publishing. WS-Notifications is at the heart of the functionality delivered by the IS-Notifier service. Finally, the queries accepted by the IS have to be compliant with the XQuery language.

It is worth mentioning that the following principle was widely adopted during the design of the IS: program to an interface, not an implementation. This means that we tried to keep the IS consumers and producers decoupled from its implementation as much as possible. More concretely, a gCube service only has to know the IS-Client, IS-Notifier and IS-Publisher interfaces. It does not need to care about their implementation (mechanisms to dynamically load the IS-Client, IS-Notifier and IS-Publisher at runtime have been put in place) nor about the actual IS deployment scenario (completely abstracted by the IS client libraries).
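
The principle can be illustrated with a short sketch in which a caller depends only on an interface while the implementation is loaded by name at runtime. All names below are hypothetical stand-ins; the actual loading mechanism used by the gCore Framework is not shown here and may differ.

```java
// Hypothetical stand-in for an IS client interface; not the real IS-Client API.
interface DiscoveryClient {
    String describe();
}

// One possible implementation; callers never reference this type directly.
final class InMemoryDiscoveryClient implements DiscoveryClient {
    @Override
    public String describe() {
        return "in-memory discovery client";
    }
}

// Loads an implementation by class name, exposing only the interface.
final class ClientLoader {

    static DiscoveryClient load(String className) {
        try {
            Class<?> type = Class.forName(className);
            // asSubclass fails fast if the class does not implement the interface.
            return type.asSubclass(DiscoveryClient.class)
                       .getDeclaredConstructor()
                       .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real deployment the class name would come from configuration.
        DiscoveryClient client = load(InMemoryDiscoveryClient.class.getName());
        System.out.println(client.describe());
    }
}
```

Swapping the implementation then only requires changing the configured class name; code written against `DiscoveryClient` stays untouched.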

QoS

Every design aspect of the IS has been tackled bearing in mind that if the IS does not work, works slowly, or offers a poor service, the whole infrastructure suffers the same fate.

The chain of operations involved in the discovery phase is carefully designed and implemented to reduce the waiting time of callers. In this phase the IC service works in a stateless manner, only executing the query against the underlying XML indexing system. The SOAP messages sent and received are also kept as simple as possible in order to reduce marshalling and unmarshalling time.

To avoid interfering with the discovery phase, publications are performed in bulk, which reduces the number of incoming calls to the IC that compete with query invocations. The IS-Publisher collects and queues requests for publication and sends them to the Registry and then to the IC, cutting the number of competing calls as much as possible.

From the deployment point of view, IS services can be distributed and partially replicated in a gCube infrastructure to manage subsets of resources (usually belonging to different scopes). Different scenarios can be set up in order to meet the performance and scalability requirements according to the extent of the infrastructure itself (e.g. how many resources are to be managed, how many nodes are available, and so on).
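
The bulk publication strategy can be sketched as a queue that forwards requests in batches rather than one call per document. This is a simplified illustration under stated assumptions: the class name, the batch-size threshold and the sink callback are hypothetical, and the real IS-Publisher may trigger flushes differently (for example on a timer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified sketch of batching; not the actual IS-Publisher implementation.
final class BulkPublisher {

    private final int batchSize;
    private final Consumer<List<String>> sink; // stands in for one remote call
    private final List<String> pending = new ArrayList<>();

    BulkPublisher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queues a document and sends a batch once enough have accumulated.
    synchronized void publish(String document) {
        pending.add(document);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is queued as a single call; does nothing if empty.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand over a copy
        pending.clear();
    }

    public static void main(String[] args) {
        BulkPublisher publisher =
                new BulkPublisher(2, batch -> System.out.println("sending " + batch));
        for (int i = 0; i < 5; i++) {
            publisher.publish("doc-" + i);
        }
        publisher.flush(); // send the remaining document
    }
}
```

With a batch size of 2, five publication requests result in three calls to the sink instead of five, which is the kind of reduction in competing calls the text describes.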