Difference between revisions of "Information System"

From Gcube Wiki

Latest revision as of 09:36, 2 July 2021


The gCube Information System (IS) has been designed to support Research Infrastructure federation.

Definition

Several definitions of Information System (henceforth IS) exist. Each definition aims to capture either a specific role or a specific behavior in systems managing some kind of information.

It is quite common to define an IS as "any organized system for the collection, organization, storage and communication of information".

The Encyclopaedia Britannica defines an IS as "an integrated set of components for collecting, storing, and processing data and for providing information, knowledge, and digital products".

All the definitions converge on the characteristics of information. Information consists of data that:

  • is accurate and timely,
  • is specific and organized for a purpose,
  • is presented within a context that gives it meaning and relevance,
  • can increase understanding and decrease uncertainty.

According to the Business Dictionary, an IS is "a combination of hardware, software, infrastructure and trained personnel organized to facilitate planning, control, coordination, and decision making in an organization". In this context, trained personnel consists of:

  • human resources
  • procedures for using, operating, and maintaining the information system
  • a set of basic principles and associated guidelines (a.k.a. policies), formulated and enforced to direct and limit actions in pursuit of long-term goals.

According to the MIT Press, an IS is "a software system to capture, transmit, store, retrieve, and manipulate data produced by software systems to provide access to information, thereby supporting people, organizations, or other software systems". This definition makes it evident that software systems become both producers and consumers of the Information System, placing it at the core of their business activities.

In the context of research infrastructures [1] and systems of systems, we can define an information system (IS) as:

A software system

  • to capture, transmit, store, retrieve, and manipulate data produced by software systems
  • to provide access to information, organized for a purpose and within a contextual domain
    • used, accessed, and maintained according to well-known procedures operated within the limits of the (evolving) organization policies
  • to support people within an organization and other software systems

Requirements

The analysis of the requirements of an IS capable of supporting Research Infrastructures led us to identify the functionality the system has to provide (functional requirements) and the constraints and performance it has to respect (non-functional requirements).

Functional Requirements

Functional Requirements have been defined as "A requirement that specifies a function that a system or system component must be able to perform" [2].

From a functional point of view, we identified the following requirements:

  • Data Definition Language (DDL) for schema definition (entities and relations);
  • Entity and Relation instances must be:
    • Uniquely identifiable;
    • Selectively/partially updatable;
    • Validated against the Schema.
  • Referential Integrity: a property of data stating that all references within it are valid [3]. A referential integrity constraint is defined as part of an association between two entity types; its purpose is to ensure that valid associations always exist [4];
  • Dynamic Query (no pre-defined queries): the capability of a system to let clients build their own queries and submit them with no long-term impact on the information system. For relational databases this capability seems obvious, since SQL provides it. However, especially with the rise of NoSQL, some types of databases and information systems lack it and require queries to be pre-defined;
    • Standard Abstraction (desideratum): just as relational databases adhere to the standard SQL dialect, it is desirable that the information system supports a standard family of query languages;
  • Subscription Notification support allows "full decoupling of the communicating entities in time, space, and synchronization" [5], which reflects the loosely coupled nature of distributed interaction in large-scale applications (such as a Research Infrastructure). Providing this functionality makes it possible to build event-based services and improves the scalability of the system.
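
The functional requirements above can be illustrated with a minimal in-memory sketch. It is not the gCube Resource Registry API: the Registry class, its method names, and the "Service", "Host" and "hostedOn" names are hypothetical and chosen only for illustration. The sketch declares types (a DDL step), assigns a UUID to every instance, validates instances and partial updates against the schema, refuses relations or deletions that would break referential integrity, accepts client-supplied queries, and notifies subscribers of changes.

```python
import uuid
from collections import defaultdict


class SchemaError(ValueError):
    """Raised when an instance does not match its declared type."""


class Registry:
    """Hypothetical in-memory registry; names are illustrative assumptions."""

    def __init__(self):
        self.schemas = {}      # type name -> {attribute name: Python type}
        self.entities = {}     # entity id -> {"type": ..., "attrs": {...}}
        self.relations = {}    # relation id -> (name, source id, target id)
        self.subscribers = defaultdict(list)  # type name -> callbacks

    def define_type(self, name, attributes):
        """DDL step: declare an entity type and its typed attributes."""
        self.schemas[name] = dict(attributes)

    def _validate(self, type_name, attrs, partial=False):
        schema = self.schemas.get(type_name)
        if schema is None:
            raise SchemaError(f"unknown type: {type_name}")
        for key, value in attrs.items():
            if key not in schema or not isinstance(value, schema[key]):
                raise SchemaError(f"invalid attribute: {key}")
        if not partial and set(attrs) != set(schema):
            raise SchemaError("missing attributes")

    def subscribe(self, type_name, callback):
        self.subscribers[type_name].append(callback)

    def _notify(self, event, type_name, entity_id):
        for callback in self.subscribers[type_name]:
            callback(event, entity_id)

    def create(self, type_name, attrs):
        self._validate(type_name, attrs)
        entity_id = str(uuid.uuid4())  # unique identification
        self.entities[entity_id] = {"type": type_name, "attrs": dict(attrs)}
        self._notify("created", type_name, entity_id)
        return entity_id

    def update(self, entity_id, changes):
        """Partial update: only the supplied attributes are replaced."""
        entity = self.entities[entity_id]
        self._validate(entity["type"], changes, partial=True)
        entity["attrs"].update(changes)
        self._notify("updated", entity["type"], entity_id)

    def relate(self, name, source_id, target_id):
        """Referential integrity: both endpoints must already exist."""
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("relation endpoint does not exist")
        relation_id = str(uuid.uuid4())
        self.relations[relation_id] = (name, source_id, target_id)
        return relation_id

    def delete(self, entity_id):
        """Referential integrity: a referenced entity cannot be removed."""
        if any(entity_id in (s, t) for _, s, t in self.relations.values()):
            raise ValueError("entity is still referenced by a relation")
        entity = self.entities.pop(entity_id)
        self._notify("deleted", entity["type"], entity_id)

    def query(self, predicate):
        """Dynamic query: the client supplies the condition at call time."""
        return [eid for eid, e in self.entities.items() if predicate(e)]


registry = Registry()
registry.define_type("Service", {"name": str, "status": str})
registry.define_type("Host", {"hostname": str})

events = []
registry.subscribe("Service", lambda event, entity_id: events.append(event))

service = registry.create("Service", {"name": "catalogue", "status": "ready"})
host = registry.create("Host", {"hostname": "node1.example.org"})
registry.relate("hostedOn", service, host)
registry.update(service, {"status": "down"})

down = registry.query(
    lambda e: e["type"] == "Service" and e["attrs"]["status"] == "down"
)
```

In this sketch the subscriber callback is invoked synchronously; a real publish/subscribe system would typically deliver notifications asynchronously to obtain the decoupling in time and synchronization described above.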

Non-Functional Requirements

Wikipedia defines Non-Functional Requirements as "requirements that specify criteria that can be used to judge the operation of a system, rather than specific behaviors" [6]. Unfortunately, there is no consensus in the scientific community on the definition of non-functional requirements. Martin Glinz [7] has defined a taxonomy for classifying them. In this taxonomy, a non-functional requirement can be one of the following:

  • An attribute is a performance requirement or a specific quality requirement;
    • A performance requirement is a requirement that pertains to a performance concern;
    • A specific quality requirement is a requirement that pertains to a quality concern other than the quality of meeting the functional requirements.
  • A constraint is a requirement that constrains the solution space beyond what is necessary for meeting the given functional, performance, and specific quality requirements.

The following requirements fall under the above definition and taxonomy:

  • High Availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. [8]
  • Eventual Consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value [9].
  • Horizontal Scalability. Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth [10]. To scale horizontally (or scale out/in) means to add more nodes to (or remove nodes from) a system, such as adding a new computer to a distributed software application.
  • Multi-Tenancy, i.e. a single instance of the technology should be able to serve many "independent" contexts (within the same Application Domain) [11];
  • EUPL licence compatibility of all its components.
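
Eventual consistency and horizontal scaling can be illustrated with a small sketch. It is not how the gCube IS or its backend database replicates data: the Replica class, the synchronize function, and the last-writer-wins rule with unique logical timestamps are assumptions made for illustration. Each replica accepts writes independently, a read may return stale data until the replicas have exchanged state, and a node added later converges to the same value.

```python
class Replica:
    """One node holding a last-writer-wins copy of each data item."""

    def __init__(self):
        self.store = {}  # key -> (logical timestamp, value)

    def write(self, key, value, timestamp):
        # Keep the entry with the highest timestamp (assumed unique per write).
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy step: pull every entry the other replica holds."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


def synchronize(replicas):
    """One round in which every replica pulls from every other replica."""
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.merge_from(source)


replicas = [Replica() for _ in range(3)]
replicas[0].write("service-42/status", "ready", timestamp=1)
replicas[1].write("service-42/status", "down", timestamp=2)

# Before synchronization the third replica has not seen any update.
stale = replicas[2].read("service-42/status")

# Scale out: add a node, then let the replicas exchange state.
replicas.append(Replica())
synchronize(replicas)
values = [replica.read("service-42/status") for replica in replicas]
```

Once no new updates arrive and a synchronization round has completed, every replica returns the value with the highest timestamp, which is the informal guarantee stated in the definition above.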

Architecture

[Figure: Information System architecture (Information-system-architecture.png)]

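The constituent components are:

  • Facet Based Resource Model
    • IS Model
    • gCube Model
  • Information System Resource Registry
    • Resource Registry Service
    • Resource Registry Context Client
    • Resource Registry Schema Client
    • Resource Registry Publisher
    • Resource Registry Client
  • Backend Database (i.e. OrientDB, https://orientdb.org/, as Graph Database)
  • Information System Subscription Notification Service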

Notes

  1. The term ‘research infrastructures’ refers to facilities, resources and related services used by the scientific community to conduct top-level research in their respective fields, ranging from social sciences to astronomy, genomics to nanotechnologies https://ec.europa.eu/research/infrastructures/index_en.cfm?pg=about
  2. IEEE (1990). Standard Glossary of Software Engineering Terminology. IEEE Standard 610.12-1990.
  3. https://en.wikipedia.org/wiki/Referential_integrity
  4. https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/referential-integrity-constraint
  5. Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131. DOI=http://dx.doi.org/10.1145/857076.857078
  6. https://en.wikipedia.org/wiki/Non-functional_requirement
  7. M. Glinz. On non-functional requirements. In Proc. 15th IEEE Int. Requirements Eng. Conf., 2007.
  8. https://en.wikipedia.org/wiki/High_availability
  9. Werner Vogels. 2009. Eventually consistent. Commun. ACM 52, 1 (January 2009), 40-44. DOI: https://doi.org/10.1145/1435417.1435432
  10. André B. Bondi. 2000. Characteristics of scalability and their impact on performance. In Proceedings of the 2nd international workshop on Software and performance (WOSP '00). ACM, New York, NY, USA, 195-203. DOI=http://dx.doi.org/10.1145/350391.350432
  11. Please note that different Application Domains must be managed by completely separate instances of the whole IS.