Difference between revisions of "Developer's Guide: Introduction"

From Gcube Wiki
m
m (Related Documents)
 
(9 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
[[Category: Developer's Guide]]
 +
{|align=right
 +
||__TOC__
 +
|}
 +
 
== Overview ==
 
Welcome to the gCube Developer's Guide. The purpose of this document is to provide instructions for developers wishing to exploit a gCube-based grid infrastructure. gCube is a versatile, feature-rich grid platform that has been developed in the context of the D4Science European IST research project [http://www.d4science.eu].
+
gCube is a software system specifically conceived to enable the creation and operation of an innovative type of infrastructure - a [http://en.wikipedia.org/wiki/Hybrid_Data_Infrastructure Hybrid Data Infrastructure] - that, by leveraging Grid, Cloud, Digital Library and Service-orientation principles and approaches, delivers a number of data management facilities '''as-a-Service'''.
 +
One of its distinguishing features is its orientation towards serving the needs of diverse Communities of Practice by providing each of them with a dedicated, flexible, ready-to-use, web-based working environment, named [http://en.wikipedia.org/wiki/Virtual_research_environment Virtual Research Environment].
  
The platform follows the Service Oriented paradigm and exploits and extends various existing grid middleware and collaborative tools such as the Globus Toolkit 4 [http://www.globus.org], gLite [http://www.cern.ch/glite], the GridSphere Portal Framework [http://www.gridsphere.org], etc. gCube offers a feature-rich platform for distributed hosting, management and retrieval of data and information, and a framework for extending state-of-the-art indexing, selection, fusion, extraction, description, annotation, transformation, and presentation of content.
+
gCube offers a feature-rich platform for distributed hosting, management and retrieval of data and information, and a framework for extending state-of-the-art processing, indexing, selection, fusion, extraction, description, annotation, transformation, and presentation of "data".
  
== gCube Architecture ==
+
The [[Developer's Guide | gCube Developer's Guide]] describes how to develop software components capable of interfacing with gCube to either be part of it or to reuse some of its facilities.
A ''Reference Architecture'' is an architectural design pattern that indicates how an abstract set of mechanisms and relationships realises a predetermined set of requirements.
+
 
+
The gCube system captured by the Reference Architecture in Figure 1 is the software system that results from combining a number of subsystems in a Service Oriented Architecture.
+
 
+
[[Image:D4Science_architecture.png|frame|center|Figure 1. D4Science System Reference Architecture]]
+
 
+
Such subsystems are organised in a three-tier architecture consisting of:
+
 
+
* the '''''gCube run-time environment''''', named gCube Hosting Node environment or simply gHN, is the set of subsystems equipping each gCube-empowered machine and forming the platform for the hosting and operation of the rest of the system constituents. Namely, it consists of the ''gCube Container'' (to run gCube Services), the ''gCore Framework'', named gCF [https://wiki.gcore.research-infrastructures.eu/gCube/index.php/Main_Page] (to reinforce the gCube Container in supporting the operation of gCube Services), a number of local services, namely [[Deployer]], [[gHNManager]], and Delegation, and a number of libraries and stubs needed to manage the communication with all other gCube services;
+
* the '''''[[gCube Infrastructure Enabling Services]]''''' is the set of subsystems constituting the backbone of the gCube system and responsible for implementing (''i'') the operation of an ''e-Infrastructure'' supporting ''resource sharing'' and (''ii'') the definition and operation of ''Virtual Research Environments'';
+
* the '''''gCube Application Services''''' is the set of subsystems implementing facilities for (''i'') ''storage'', ''organisation'', ''description'' and ''annotation'' of information in a VRE (''[[gCube Information Organisation Services|Information Organisation Services]]''), (''ii'') ''retrieval'' of information in the context of a VRE (''[[gCube Information Retrieval Services|Information Retrieval Services]]'') and (''iii'') provision of VO and VRE users with an interface for ''accessing'' such an e-Infrastructure.
+
 
+
The overall architecture has been designed following the Service Oriented Architecture principles:
+
 
+
* the main constituents of each subsystem are expected to be ''loosely-coupled'' Web Services (actually WSRF services);
+
* the constituents of the gCube-based e-Infrastructure will be discovered thanks to the ''[[Information System]]'' subsystem that, as usual, becomes fundamental to guarantee the operation of the rest;
+
* such loosely-coupled Services can be organised in workflows so as to form compound services whose orchestration is guaranteed by the ''[[Process Management]]'' subsystem.
+
 
+
It is worth noting in this reference architecture that the runtime environment is an integral part of the overall system because the management of the environment hosting the services and the management of the service lifetime are part of the gCube business logic. Thanks to the gHN capabilities, other gCube services can be dynamically deployed on remote gHNs to serve the needs of Virtual Research Environments. Figure 2 presents the '''''gCube Hosting Node (gHN)''''' Reference Architecture.
+
 
+
[[Image:ghn_architecture.png|frame|center|Figure 2. gHN Reference Architecture]]
+
 
+
In the remainder of this section the constituents of the Reference Architecture are introduced starting from the lower layer.
+
 
+
=== The gCore Framework ===
+
The ''gCore Framework'' (gCF) is a Java framework for the development of high-quality gCube services and service clients. It provides an application framework that allows gCube services to abstract over functionality lower in the web services stack (WSRF, WS Notification, WS Addressing, etc.) and to build on top of advanced features for the management of state, scope, events, security, configuration, fault, service lifetime, and publication and discovery.
+
 
+
=== The gCube Infrastructure Enabling Services ===
+
The ''[[gCube Infrastructure Enabling Services]]'' is the family of subsystems implementing the foundational services that guarantee the operation of the e-Infrastructure. Such functions are organised in four main areas: (''i'') organisation and execution of Virtual Research Environments (''[[VRE Management]]'') by guaranteeing an optimal consumption of the available resources (''[[Broker and Matchmaker]]''); (''ii'') registration of the infrastructure constituents (''[[Information System|Information Service]]''); (''iii'') the authentication and authorization policy enforcement enabling the highly controlled sharing of infrastructure constituents (''[[Virtual Organisation Management]]''); and (''iv'') definition and orchestration of complex workflows (''[[Process Management]]'') by guaranteeing an optimal consumption of the available resources (''[[Process Optimisation]]''). In particular:
+
 
+
* the '''[[VRE Management]]''' services are responsible for: (''i'') the definition of VREs and (''ii'') the dynamic deployment of VRE resources across the infrastructure. VRE definitions are declaratively specified in a dedicated language through a user-friendly interface and inform the derivation of an optimal deployment plan. The plan is based on availability, QoS requirements, resource inter-dependencies, and VRE sharing policies, but also on monitoring of failures (resources are dynamically redeployed) and load (resources are dynamically replicated). Three distinguished services (''[[Software Repository]]'', ''[[Deployer]]'', ''[[gHNManager|gCube Hosting Node Manager]]'') support VRE definition and dynamic deployment by, respectively, collecting service implementations, deploying service implementations and their dependencies on gHNs, and hosting such service implementations at selected nodes.
+
* the '''[[Broker and Matchmaker]]''' service identifies the set of gHNs on which to deploy a set of services. In particular, given a set of packages to be deployed and their requirements on the environment and/or on other services, it identifies the set of gHNs to be used as target hosts for the deployment action.
+
* the '''[[Information System|Information Service]]''' allows the publication of descriptive information about VRE resources, the discovery of VRE resources based on descriptive information, and the real-time monitoring of VRE resources based on subscription/notification mechanisms. Heavily relied upon by all the functional layers of gCube, the [[Information System|Information Service]] is a replicated service in which instances communicate in peer-to-peer fashion to maximize availability, response time, and fault tolerance.
+
* the '''[[Virtual Organisation Management]]''' services are responsible for equipping gCube with a robust and flexible security framework for managing Virtual Organizations (VOs). gCube exploits the VO mechanism to enforce a trusted and controlled environment in each dynamically created VRE. The main features consist of user and group management, authentication support, authorization definition, delegation, and enforcement of security credentials. These services rely on and integrate VOMS and Globus Security Infrastructure (GSI) to provide gCube with a security framework supporting various configurations. The actors of this framework are humans as well as services.
+
* the '''[[Process Management]]''' and '''[[Process Optimisation]]''' services support the definition and execution of processes, i.e., workflows combining gCube services, external services and gLite jobs to deliver new functionalities (also known as ''programming in the large''). In particular, these services provide the basic functionality for ''(i)'' creating processes either via a graphical ''process modelling'' tool or via a BPEL definition, ''(ii)'' reliably ''executing'' processes in a fully distributed and decentralized, thus highly scalable, way and ''(iii)'' ''optimizing'' processes both at build-time and at run-time. The process execution facility has been designed and implemented to take full advantage of the Grid, i.e. process steps are outsourced to the resources forming the e-Infrastructure and the process is executed in a distributed peer-to-peer modality. In particular, the [[Process Management]] service integrates the gLite software, thus enabling gCube to run such processes on EGEE resources. A monitoring front-end provides information on individual process instances, which are not materialized on a single host because of their distributed execution. This monitoring front-end allows administrators to follow the state of execution of a process instance online and also shows where the different parts are being executed.
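
The matchmaking step performed by the Broker and Matchmaker can be pictured with a small sketch. This is only an illustration in Python, not gCube code: the <code>Package</code> and <code>HostingNode</code> classes, the capability labels, the slot counter and the greedy strategy are all assumptions made for the example, and the real service may well weigh QoS, load and sharing policies differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    """Hypothetical model of a deployable package and the capabilities it needs."""
    name: str
    requires: frozenset = field(default_factory=frozenset)


@dataclass
class HostingNode:
    """Hypothetical stand-in for a gHN: what it offers and how much room it has."""
    name: str
    offers: frozenset
    free_slots: int


def match(packages, nodes):
    """Greedily assign each package to the first node that satisfies it.

    Returns a dict mapping package name -> node name.
    Raises ValueError if some package cannot be placed anywhere.
    """
    remaining = {node.name: node.free_slots for node in nodes}
    plan = {}
    # Place the most demanding packages first so that flexible packages
    # do not occupy the few nodes able to host them.
    for package in sorted(packages, key=lambda p: len(p.requires), reverse=True):
        for node in nodes:
            if package.requires <= node.offers and remaining[node.name] > 0:
                plan[package.name] = node.name
                remaining[node.name] -= 1
                break
        else:
            raise ValueError(f"no hosting node satisfies {package.name}")
    return plan


packages = [
    Package("index-service", frozenset({"java", "large-disk"})),
    Package("search-operator", frozenset({"java"})),
]
nodes = [
    HostingNode("ghn-a", frozenset({"java"}), free_slots=1),
    HostingNode("ghn-b", frozenset({"java", "large-disk"}), free_slots=1),
]
plan = match(packages, nodes)
print(plan)
```

With these sample inputs the package that needs a large disk is placed on the only node offering one, and the remaining package takes the other node. A greedy pass is the simplest strategy that respects requirements and capacity; it does not search for a globally optimal plan.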
+
 
+
=== The gCube Application Services ===
+
The gCube Application Services is the family of subsystems delivering three core functions of any Virtual Research Environment: (''i'') ''storage'', ''description'' and ''annotation'' of information in a VRE (''[[gCube Information Organisation Services]]''), (''ii'') ''retrieval'' of information in the context of a VRE (''[[gCube Information Retrieval Services]]'') and (''iii'') provision of VRE users with an interface for ''accessing'' such information and the rest of the functions equipping a VRE (''[[gCube Presentation Services]]'').
+
 
+
==== The gCube Information Organisation Services  ====
+
 
+
The ''[[gCube Information Organisation Services]]'' is the family of subsystems implementing the foundational services guaranteeing the management (storage, organisation, description and annotation) of information by implementing the notion of ''Information Objects'', i.e. logical units of information potentially consisting of and linked to other Information Objects so as to form ''compound objects''. Such functions are organised in three main areas: (''i'') the storage and organisation of Information Objects and their constituents (''[[Content Management|Content]] and [[Storage Management]]''); (''ii'') the management of the metadata objects equipping each Information Object (''[[Metadata Management]]''); and (''iii'') the management of the annotation objects potentially enriching each Information Object (''[[Annotation Management]]''). In particular:
+
 
+
* the '''[[Content Management|Content]] & [[Storage Management]]''' services provide transparent access to Information Objects managed through gCube. In particular, they provide basic functionality for: ''(i)'' manipulating Information Objects and/or collections, i.e. creating, accessing, storing, and removing; ''(ii)'' orchestrating distributed storage nodes and providing transparent access to them; ''(iii)'' a notification mechanism to maintain derived data upon changes in content; and ''(iv)'' importing Information Objects from different content providers through ''wrappers''. The kind of Information Object manageable by the [[Content Management|Content]] & [[Storage Management]] services is generic and flexible enough to model and thus support several content types. To fully exploit Grid storage facilities, the ''[[Storage Management]]'' service provides an abstract interface to the underlying distributed and heterogeneous storage interfaces and technologies (e.g. DPM via SRM and the GFAL interface). Thanks to the gCube replication management subsystem and by integrating gLite, the gCube ''[[Storage Management]]'' service is capable of exploiting the storage capacity of the EGEE infrastructure and of maintaining multiple copies of the Information Objects so as to maximise their availability.
+
* the '''[[Metadata Management]]''' services provide functionality for managing metadata objects, i.e. additional data attached to Information Objects. In particular, these services support ''(i)'' the manipulation of metadata objects and metadata collections, i.e. creating, accessing, storing, and removing metadata objects compliant with one or more metadata formats, ''(ii)'' the definition of metadata formats, ''(iii)'' the transformation of metadata objects into diverse formats via user-defined transformation programs, and ''(iv)'' the search for metadata objects. These characteristics make the services capable of managing multiple formats of metadata. Moreover, the support for diverse metadata formats and the related transformation programs is an important feature for dealing with heterogeneity issues. To store metadata objects the services exploit the storage facilities provided by the [[Content Management|Content]] & [[Storage Management]] services and thus guarantee improvement in reliability and access of managed objects.
+
* the '''[[Annotation Management]]''' services are responsible for cross-model and cross-media back-end management of annotations, a manually authored and subjective specialisation of metadata objects. The services mediate between interactive annotation front-ends and [[Metadata Management]] services by: ''(i)'' enforcing a consistent modelling of annotation relationships between Information Objects, and ''(ii)'' increasing the simplicity, granularity, and flexibility with which annotations are created, collected, deleted, updated, and interrelated as specific forms of metadata objects.
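
The notions of compound Information Objects and of metadata transformation programs can be made concrete with a short sketch. It is written in Python purely for illustration and does not reflect the gCube data model: the <code>InformationObject</code> class, its field names, the format names <code>simple</code> and <code>brief</code>, and the sample record are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class InformationObject:
    """Hypothetical logical unit of information that may contain other units."""
    object_id: str
    payload: bytes = b""
    parts: List["InformationObject"] = field(default_factory=list)
    # One metadata record per format name, e.g. {"simple": {...}}.
    metadata: Dict[str, dict] = field(default_factory=dict)

    def walk(self) -> Iterator["InformationObject"]:
        """Yield this object and, depth-first, every object it is composed of."""
        yield self
        for part in self.parts:
            yield from part.walk()

    def derive_metadata(self, source: str, target: str,
                        program: Callable[[dict], dict]) -> dict:
        """Apply a user-defined transformation program to one metadata record
        and store the result under the target format name."""
        self.metadata[target] = program(self.metadata[source])
        return self.metadata[target]


def simple_to_brief(record: dict) -> dict:
    """Example transformation program: keep only title and creator."""
    return {"title": record["title"], "creator": record["creator"]}


report = InformationObject(
    "report",
    metadata={"simple": {"title": "Survey", "creator": "A. Author", "language": "en"}},
)
report.parts.append(InformationObject("report/figure-1", payload=b"image bytes"))
report.parts.append(InformationObject("report/body", payload=b"text"))

ids = [obj.object_id for obj in report.walk()]
brief = report.derive_metadata("simple", "brief", simple_to_brief)
print(ids, brief)
```

Keeping one record per format alongside the object is one simple way to support several metadata formats at once; an annotation could be modelled in the same way as a further, manually authored record.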
+
 
+
==== The gCube Information Retrieval Services ====
+
The [[gCube Information Retrieval Services]] is the family of components offering Information Retrieval (IR) facilities to the gCube infrastructure, i.e. allowing searching over data and information by a wide range of techniques. The IR family of services can be decomposed into three major categories, which are presented below and are called “frameworks” because they are not standalone services. Instead, they are large collaborating systems based on protocols, specifications and software, which bring remarkable extensibility to the gCube system they empower:
+
 
+
* '''[[Search Framework]]''': This category includes all services focused on the search-specific aspects of the gCube platform. More specifically, it consists of the ''search orchestrator component'', ''search operators'', ''query processor components'' and the ''data transfer mechanism''. The workflow required for computing a user query is the following: the search orchestrator receives queries from the gCube portal and communicates with the gCore IS service to retrieve environment information. In the next step, the orchestrator feeds this information along with the query to the query processor components, which ultimately produce an execution plan. This plan is forwarded to a gCube execution engine (one of which is the [[Process Management|Process Management Service]]), which orchestrates the execution by invoking the search operators, as dictated by the plan. The data transfer is performed by the ''[[GCube ResultSet (gRS)|ResultSet]]'' component of the [[Search Framework]]; due to its importance, it is analysed in a distinct section. The final results are then forwarded to the user (portal). The Search Operators cover most of the traditional relational algebra operations, as well as some advanced ones, such as geospatial search and similarity search, thus providing a full-fledged set of capabilities to the final user. The [[Index Management Framework|Index Management]] and [[Distributed Information Retrieval Support Framework|DIR]] frameworks provide a major part of the Search Operators and are analysed in distinct sections.
+
* '''[[Index Management Framework]]''': This category includes all services that are involved in the creation and management of gCube indices. Management refers to all aspects of an index lifecycle as well as support for search capabilities. In gCube a rich set of indices, such as full text, forward, feature, geospatial indices, is employed, offering a full-fledged set of storage and search capabilities regarding various data types and models. The services of [[Index Management Framework]] communicate with the [[Content Management|Content]] and [[Storage Management]] services in order to acquire the data set to be indexed and also to preserve their state. They also employ the gCore [[Information System|IS]] capabilities so as to publish themselves and therefore be used by clients.
+
* '''[[Distributed Information Retrieval Support Framework]]''': This category includes all services which enhance and support the [[gCube Information Retrieval Services|IR system]]. This framework provides higher-level IR capabilities which include content ranking, source selection and result set fusion (ranked merging of various data sets). Components of this framework communicate with the [[Index Management Framework|Index Management Services]] for statistics extraction and the [[Information System|IS service]] for information publication. The [[Search Framework]] employs the advanced capabilities offered by the [[Distributed Information Retrieval Support Framework|DIR framework]] in order to enhance its search capabilities, by refining queries, enhancing produced search results and finally exhibiting a higher level of services.
+
* An additional component which does not belong to any of the frameworks mentioned above, but acts independently and improves the search quality, is the '''[[Personalisation|Personalisation Service]]'''. It is indirectly invoked by the [[Search Framework]], through an appropriate wrapper, and used for enhancing user queries, by injecting additional “personalized” information.
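
Result set fusion, described above as the ranked merging of various data sets, can be sketched as follows. The example is an assumption-laden illustration in Python rather than the DIR framework's algorithm: the <code>fuse</code> function, the <code>(identifier, score)</code> tuple layout and the sample scores are invented, and the scores from different sources are assumed to be directly comparable, which a real fusion component would have to ensure by normalisation.

```python
import heapq
from itertools import islice


def fuse(result_sets, limit):
    """Merge result sets, each already sorted by descending score,
    into a single ranked list of at most `limit` hits.

    Each hit is a hypothetical (identifier, score) tuple.
    """
    # heapq.merge consumes its inputs lazily, so only as many hits as
    # needed are pulled from each source.
    merged = heapq.merge(*result_sets, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, limit))


full_text_hits = [("doc-7", 0.92), ("doc-2", 0.61), ("doc-9", 0.15)]
geospatial_hits = [("doc-4", 0.88), ("doc-1", 0.40)]

top = fuse([full_text_hits, geospatial_hits], limit=3)
print(top)
```

Because the merge is lazy, the cost of producing the first few fused results does not depend on the total size of the individual result sets, which matters when results are streamed from remote sources.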
+
 
+
==== The gCube Presentation Services ====
+
The [[gCube Presentation services]] form the logical top layer of a gCube-powered infrastructure. Their objective is twofold:
+
 
+
* To provide the means to build user interfaces for interacting with and exploiting the gCube system and infrastructure.
+
* To provide a full range of user interfaces for achieving interaction with the system, out-of-the-box.
+
 
+
The [[gCube presentation layer]] is based on the [[ASL_Library|Application Support Layer]] (ASL), which is a framework that abstracts the complexity of the underlying infrastructure so that the front-end developer focuses on the objectives of presentation rather than the details of the protocols and rules for interacting with the underlying (WSRF) services. The ASL exposes to the developer well-known tools such as session and credential management and is accessible through various interfaces (currently HTTP and Java-native).
+
 
+
On top of the [[ASL_Library|ASL]] the developer can develop the user interface components needed for a particular application, depending on the execution environment that will host them (e.g. php web server, desktop application, application server etc).
+
 
+
The execution environment is normally provided by existing systems and can be powered by bare Operating Systems / Virtual Machines (e.g. desktop applications), plain html pages, dynamic web-sites (php, asp, jsp etc), portals, application servers etc.
+
 
+
The [[gCube presentation layer]] offers an initial set of components currently running under the JSR 168 specification, hosted by the GridSphere portlet container, while, apart from gCube core services, it is based on Java and servlet technologies for offering its services.
+
  
 
== Intended Readership ==
 
Line 78: Line 16:
  
 
* Those who want to reuse the code – Programmers who will use gCube’s libraries to build their own services and middleware components, without needing to access the source code.
* Those who want to modify/extend the source code – Programmers who will use the platform's source code to enhance it, correct it, and adapt it to different environments and application domains.
 +
 
 +
It assumes fluency and familiarity with either the [https://wiki.gcore.research-infrastructures.eu/gCube/index.php/Main_Page gCore Framework] or [https://gcube.wiki.gcube-system.org/gcube/index.php/SmartGears gCube SmartGears].
  
 
== Related Documents ==
 
Apart from this Developer's Guide, D4Science has also made available two additional support documents:
+
Apart from this Developer's Guide, gCube provides two additional support documents:
* the [[User's Guide]], which provides usage information and guidelines for the end users of the two user communities that currently exploit the platform, namely ImpECt and ARTE.
+
* the [[User's Guide]], which provides usage information and guidelines for the end-user.  
* the [[Administrator's Guide]], which provides information and guidelines for the installation, configuration and daily administration of a gCube-based computational grid infrastructure.
+
* the [[Administrator's Guide]], which provides information and guidelines for the installation, configuration and daily administration of a gCube-based computational grid infrastructure.
 
+
Additional material that will help potential gCube developers includes:
+
* gLite 3.0 Manuals Series User Guide [http://glite.web.cern.ch/glite/documentation/default.asp]
+
* Globus Toolkit 4.0 Developer's Guide [http://www.globus.org/toolkit/docs/4.0/common/javawscore/developer-index.html]
+
 
+
Regarding the architecture and inner details of gCube, the interested reader can visit the official gCube platform web site [http://www.gcube-system.org].
+
 
+
== Lexical Abbreviations ==
+
The following abbreviations are used extensively throughout the document:
+
 
+
{| border="1" cellpadding="5" cellspacing="0" style="text-align:left"
+
| ABE
+
| Annotation Back End
+
 
+
|-
+
| ADL
+
| Advanced Distributed Learning
+
 
+
|-
+
| AFE
+
| Annotation Front End
+
 
+
|-
+
| AIS
+
| Archive Import Service
+
 
+
|-
+
| API
+
| Application Programming Interface
+
 
+
|-
+
| ASL
+
| Application Support Layer
+
 
+
|-
+
| BM
+
| Broker and Matchmaker
+
 
+
|-
+
| BPEL
+
| Business Process Execution Language
+
 
+
|-
+
| CE
+
| Computing Element
+
 
+
|-
+
| CMS
+
| Content Management Service
+
 
+
|-
+
| CORI
+
| Collection Retrieval Inference Network
+
 
+
|-
+
| CS
+
| Compound Service
+
 
+
|-
+
| D4Science
+
| DIstributed colLaboratories Infrastructure on Grid Enabled Technology 4 Science
+
 
+
|-
+
| DIR
+
| Distributed Information Retrieval
+
 
+
|-
+
| DPM
+
| Disk Pool Manager
+
 
+
|-
+
| DTS
+
| Data Transformation Service
+
 
+
|-
+
| EC
+
| European Commission
+
 
+
|-
+
| EPR
+
| EndPoint Reference
+
 
+
|-
+
| gCF
+
| gCube Core Framework
+
 
+
|-
+
| GFAL
+
| Grid File Access Library
+
 
+
|-
+
| gHN
+
| gCube Hosting Node
+
 
+
|-
+
| GPL
+
| General Public Licence
+
 
+
|-
+
| GUI
+
| Graphical User Interface
+
 
+
|-
+
| GUID
+
| Global Unique Identifier
+
 
+
|-
+
| GWT
+
| Google Web Toolkit
+
 
+
|-
+
| IO
+
| Input / Output
+
 
+
|-
+
| IR
+
| Information Retrieval
+
 
+
|-
+
| IS
+
| Information Service
+
 
+
|-
+
| JDBM
+
| Java DataBase Manager
+
 
+
|-
+
| JSP
+
| Java Server Pages
+
 
+
|-
+
| JSR
+
| Java Specification Request
+
 
+
|-
+
| LGPL
+
| Lesser General Public Licence
+
 
+
|-
+
| LMS
+
| Learning Management System
+
 
+
|-
+
| MD5
+
| Message-Digest algorithm 5
+
 
+
|-
+
| OASIS
+
| Organization for the Advancement of Structured Information Standards
+
 
+
|-
+
| PES
+
| Process Execution Service
+
 
+
|-
+
| POS
+
| Process Optimisation Service
+
 
+
|-
+
| SCORM
+
| Shareable Content Object Reference Model
+
 
+
|-
+
| SE
+
| Storage Element
+
 
+
|-
+
| SRM
+
| Storage Resource Manager
+
 
+
|-
+
| URI
+
| Uniform Resource Identifier
+
 
+
|-
+
| URL
+
| Uniform Resource Locator
+
 
+
|-
+
| VO
+
| Virtual Organisation
+
 
+
|-
+
| VOMS
+
| Virtual Organisation Membership Service
+
 
+
|-
+
| VRE
+
| Virtual Research Environment
+
 
+
|-
+
| WS
+
| Web Service
+
 
+
|-
+
| WSRF
+
| Web Services Resource Framework
+
 
+
|-
+
| XENA
+
| eXecution ENgine API
+
 
+
|-
+
| XSL
+
| Extensible Stylesheet Language
+
 
+
|-
+
| XSLT
+
| XSL Transformations
+
 
+
|}
+
 
+
+
  
 
== Problem Reporting ==
 
 
For problem reporting or any other enquiries regarding this document please contact the Support Team (support[[Image:At symbol.gif]]d4science.research-infrastructures.eu).
 

Latest revision as of 15:45, 21 October 2013

Overview

gCube is a software system specifically conceived to enable the creation and operation of an innovative type of infrastructure - a Hybrid Data Infrastructure - that, by leveraging Grid, Cloud, Digital Library and Service-orientation principles and approaches, delivers a number of data management facilities as-a-Service. One of its distinguishing features is its orientation towards serving the needs of diverse Communities of Practice by providing each of them with a dedicated, flexible, ready-to-use, web-based working environment, named Virtual Research Environment.

gCube offers a feature-rich platform for distributed hosting, management and retrieval of data and information, and a framework for extending state-of-the-art processing, indexing, selection, fusion, extraction, description, annotation, transformation, and presentation of "data".

The gCube Developer's Guide describes how to develop software components capable of interfacing with gCube to either be part of it or to reuse some of its facilities.

Intended Readership

The document targets two classes of programmers:

  • Those who want to reuse the code – Programmers who will use gCube’s libraries to build their own services and middleware components, without needing to access the source code.
  • Those who want to modify/extend the source code – Programmers who will use the platform's source code to enhance it, correct it, and adapt it to different environments and application domains.

It assumes fluency and familiarity with either the gCore Framework or gCube SmartGears.

Related Documents

Apart from this Developer's Guide, gCube provides two additional support documents:

  • the User's Guide, which provides usage information and guidelines for the end-user.
  • the Administrator's Guide, which provides information and guidelines for the installation, configuration and daily administration of a gCube-based computational grid infrastructure.

Problem Reporting

For problem reporting or any other enquiries regarding this document please contact the Support Team (support [at] d4science.research-infrastructures.eu).