Difference between revisions of "Information System"
Luca.frosini (Talk | contribs) (→Architecture) |
Latest revision as of 09:36, 2 July 2021
The gCube Information System (IS) has been designed to support Research Infrastructure federation.
Definition
Several definitions of Information System (henceforth IS) exist. Each definition aims to capture either a specific role or a specific behavior in systems managing some kind of information.
It is quite common to define an IS as "any organized system for the collection, organization, storage and communication of information".
The Encyclopaedia Britannica defines an IS as "an integrated set of components for collecting, storing, and processing data and for providing information, knowledge, and digital products".
All the definitions converge on the characteristics of information. Information consists of data that:
- is accurate and timely,
- is specific and organized for a purpose,
- is presented within a context that gives it meaning and relevance,
- can increase understanding and decrease uncertainty
According to the Business Dictionary, an IS is "a combination of hardware, software, infrastructure and trained personnel organized to facilitate planning, control, coordination, and decision making in an organization". In this context, trained personnel consists of:
- human resources
- procedures for using, operating, and maintaining the information system
- a set of basic principles and associated guidelines (a.k.a. policies), formulated and enforced to direct and limit actions in pursuit of long-term goals.
According to the MIT Press, an IS is "a software system to capture, transmit, store, retrieve, and manipulate data produced by software systems to provide access to information, thereby supporting people, organizations, or other software systems". This definition makes evident that software systems become both producers and consumers of the information held by the Information System, which places it at the core of their business activities.
In the context of research infrastructures [1] and systems of systems, we can define an information system (IS) as:
A software system
- to capture, transmit, store, retrieve, and manipulate data produced by software systems
- to provide access to information, organized for a purpose and within a contextual domain
  - used, accessed, and maintained according to well-known procedures, operated within the limits of the (evolving) organization policies
- to support people within an organization and other software systems
Requirements
The analysis of the requirements of an IS capable of supporting Research Infrastructures identified the functionality the system has to provide (functional requirements) and the constraints and performance levels it has to respect (non-functional requirements).
Functional Requirements
A functional requirement has been defined as "A requirement that specifies a function that a system or system component must be able to perform" [2].
From a functional point of view, we identified the following requirements:
- Data Definition Language (DDL) for schema definition (entities and relations);
- Entity and Relation instances must be:
  - Uniquely identifiable;
  - Selectively/partially updatable;
  - Validated against the schema.
- Referential Integrity is a property of data stating that the references within it are valid [3]. A referential integrity constraint is defined as part of an association between two entity types. The purpose of referential integrity constraints is to ensure that valid associations always exist [4];
- Dynamic Query (no pre-defined queries): the capability of a system to let clients build their own queries and submit them to the system with no long-term impact on the information system. For relational databases this characteristic seems obvious (it is provided by SQL). Unfortunately, especially with the NoSQL trend, some types of databases or information systems lack this functionality, and queries need to be pre-defined;
  - Standard Abstraction (desideratum): just as relational databases adhere to the SQL standard, it is desirable that the information system supports a standard family of query languages;
- Subscription Notification support allows "full decoupling of the communicating entities in time, space, and synchronization" [5], which reflects the loosely coupled nature of distributed interaction in large-scale applications (such as a Research Infrastructure). Providing this functionality makes it possible to construct event-based services and to improve the scalability of the system.
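The functional requirements listed above can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions: the class `TinyRegistry` and all of its method names are hypothetical and are not the API of the gCube Resource Registry. The sketch shows a DDL-like type definition, uniquely identified instances validated against their schema, partial updates, a referential integrity check on relations, client-built (dynamic) queries, and subscription notifications.

```python
import uuid
from typing import Any, Callable, Dict, List, Tuple


class TinyRegistry:
    """Hypothetical in-memory registry; NOT the gCube Resource Registry API."""

    def __init__(self) -> None:
        self.schemas: Dict[str, Dict[str, type]] = {}
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.relations: List[Tuple[str, str, str]] = []
        self.subscribers: List[Callable[[str, str], None]] = []

    def define_type(self, name: str, properties: Dict[str, type]) -> None:
        # DDL-like step: register a type and the properties it requires.
        self.schemas[name] = dict(properties)

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        # Subscribers are called with (event name, entity identifier).
        self.subscribers.append(callback)

    def _notify(self, event: str, entity_id: str) -> None:
        for callback in self.subscribers:
            callback(event, entity_id)

    def _validate(self, type_name: str, values: Dict[str, Any], partial: bool) -> None:
        schema = self.schemas[type_name]
        for key, value in values.items():
            if key not in schema:
                raise ValueError(f"unknown property: {key}")
            if not isinstance(value, schema[key]):
                raise ValueError(f"wrong type for property: {key}")
        # A full instance must provide every property; a partial update need not.
        if not partial and set(schema) - set(values):
            raise ValueError("missing properties")

    def create(self, type_name: str, values: Dict[str, Any]) -> str:
        self._validate(type_name, values, partial=False)
        entity_id = str(uuid.uuid4())  # unique identifier for the instance
        self.entities[entity_id] = {"type": type_name, "properties": dict(values)}
        self._notify("created", entity_id)
        return entity_id

    def update(self, entity_id: str, changes: Dict[str, Any]) -> None:
        # Selective update: only the supplied properties are validated and changed.
        entity = self.entities[entity_id]
        self._validate(entity["type"], changes, partial=True)
        entity["properties"].update(changes)
        self._notify("updated", entity_id)

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        # Referential integrity: both endpoints must already exist.
        for endpoint in (source_id, target_id):
            if endpoint not in self.entities:
                raise KeyError(f"dangling reference: {endpoint}")
        self.relations.append((source_id, relation, target_id))

    def query(self, predicate: Callable[[str, Dict[str, Any]], bool]) -> List[str]:
        # Dynamic query: the client supplies the predicate at call time.
        return [
            entity_id
            for entity_id, entity in self.entities.items()
            if predicate(entity["type"], entity["properties"])
        ]
```

As a usage example, a client could call `define_type("Host", {"name": str, "cores": int})`, create an instance, subscribe a callback, and later run `query(lambda type_name, props: props["cores"] > 4)` without the query having been registered in advance. A real deployment would delegate storage and querying to the backend database rather than to Python dictionaries.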
Non-Functional Requirements
Wikipedia defines Non-Functional Requirements as "requirements that specify criteria that can be used to judge the operation of a system, rather than specific behaviors" [6]. Unfortunately, there is no consensus in the scientific community on a definition of non-functional requirements. Martin Glinz [7] has defined a taxonomy to classify non-functional requirements. In particular, a non-functional requirement can be:
- An attribute is a performance requirement or a specific quality requirement;
  - A performance requirement is a requirement that pertains to a performance concern;
  - A specific quality requirement is a requirement that pertains to a quality concern other than the quality of meeting the functional requirements.
- A constraint is a requirement that constrains the solution space beyond what is necessary for meeting the given functional, performance, and specific quality requirements.
The following requirements fall under the above definition and taxonomy:
- High Availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period [8];
- Eventual Consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value [9];
- Horizontal Scalability. Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth [10]. To scale horizontally (or scale out/in) means to add more nodes to (or remove nodes from) a system, such as adding a new computer to a distributed software application;
- Multi-Tenancy, i.e. a single instance of the technology should be able to serve many “independent” contexts (within the same Application Domain) [11];
- EUPL licence compatibility of all its components.
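The eventual consistency and horizontal scalability requirements above can be sketched with a toy last-writer-wins replica set. This is a minimal illustration under assumptions, not how the gCube IS or its backend database replicates data: the `Replica` class, the `anti_entropy` function, and the caller-supplied integer versions are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class Replica:
    """Hypothetical replica holding key -> (version, writer name, value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Tuple[int, str, Any]] = {}

    def write(self, key: str, value: Any, version: int) -> None:
        # The version is a caller-supplied logical timestamp (an assumption).
        self._apply(key, (version, self.name, value))

    def _apply(self, key: str, record: Tuple[int, str, Any]) -> None:
        # Last writer wins: keep the record with the highest (version, writer).
        current = self.store.get(key)
        if current is None or record[:2] > current[:2]:
            self.store[key] = record

    def read(self, key: str) -> Optional[Any]:
        record = self.store.get(key)
        return None if record is None else record[2]

    def sync_from(self, other: "Replica") -> None:
        # Pull every record from another replica and merge it locally.
        for key, record in other.store.items():
            self._apply(key, record)


def anti_entropy(replicas: List[Replica]) -> None:
    # Every replica pulls from every other one; with no new writes in
    # between, all replicas hold the same winning record afterwards.
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.sync_from(source)
```

In this sketch, two replicas that accepted different writes for the same key may answer differently until `anti_entropy` runs; afterwards they all return the value with the highest version. Scaling out corresponds to appending a new `Replica` to the list, which catches up on the next synchronization round.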
Architecture
The constituent components are:
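- Facet Based Resource Model
  - IS Model
  - gCube Model
- Information System Resource Registry
  - Resource Registry Service
  - Resource Registry Context Client
  - Resource Registry Schema Client
  - Resource Registry Publisher
  - Resource Registry Client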
- Backend Database (i.e. OrientDB as Graph Database)
- Information System Subscription Notification Service
Notes
- ↑ The term ‘research infrastructures’ refers to facilities, resources and related services used by the scientific community to conduct top-level research in their respective fields, ranging from social sciences to astronomy, genomics to nanotechnologies https://ec.europa.eu/research/infrastructures/index_en.cfm?pg=about
- ↑ IEEE (1990). Standard Glossary of Software Engineering Terminology. IEEE Standard 610.12-1990.
- ↑ https://en.wikipedia.org/wiki/Referential_integrity
- ↑ https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/referential-integrity-constraint
- ↑ Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131. DOI=http://dx.doi.org/10.1145/857076.857078
- ↑ https://en.wikipedia.org/wiki/Non-functional_requirement
- ↑ M. Glinz. On non-functional requirements. In Proc. 15th IEEE Int. Requirements Eng. Conf., 2007.
- ↑ https://en.wikipedia.org/wiki/High_availability
- ↑ Werner Vogels. 2009. Eventually consistent. Commun. ACM 52, 1 (January 2009), 40-44. DOI: https://doi.org/10.1145/1435417.1435432
- ↑ André B. Bondi. 2000. Characteristics of scalability and their impact on performance. In Proceedings of the 2nd international workshop on Software and performance (WOSP '00). ACM, New York, NY, USA, 195-203. DOI=http://dx.doi.org/10.1145/350391.350432
- ↑ Please note that different Application Domains must be managed by completely separate instances of the whole IS.