Information System
The gCube Information System (IS) has been designed to support Research Infrastructure federation.
Definition
Several definitions of Information System (henceforth IS) exist. Each definition aims to capture either a specific role or a specific behavior in systems managing some kind of information.
It is quite common to define an IS as "any organized system for the collection, organization, storage and communication of information".
The Encyclopaedia Britannica defines an IS as "an integrated set of components for collecting, storing, and processing data and for providing information, knowledge, and digital products".
All the definitions converge on the characteristics of information. Information consists of data that:
- is accurate and timely,
- is specific and organized for a purpose,
- is presented within a context that gives it meaning and relevance,
- can increase understanding and decrease uncertainty.
According to the Business Dictionary, an IS is "a combination of hardware, software, infrastructure and trained personnel organized to facilitate planning, control, coordination, and decision making in an organization". In this context, trained personnel consists of:
- human resources
- procedures for using, operating, and maintaining the information system
- a set of basic principles and associated guidelines, a.k.a. policies, formulated and enforced to direct and limit actions in pursuit of long-term goals.
According to the MIT Press, an IS is "a software system to capture, transmit, store, retrieve, and manipulate data produced by software systems to provide access to information, thereby supporting people, organizations, or other software systems". This definition makes evident that software systems become both producers and consumers of the Information System, which places it at the core of their business activities.
In the context of research infrastructures [1] and systems of systems, we can define an information system (IS) as:
A software system
- to capture, transmit, store, retrieve, and manipulate data produced by software systems
- to provide access to information, organized for a purpose and within a contextual domain
- used, accessed, and maintained according to well-known procedures operated within the limits of the (evolving) organization policies
- to support people within an organization and other software systems
Requirements
The analysis of the requirements for an IS capable of supporting Research Infrastructures led us to identify the functionality the system has to provide (functional requirements) and the constraints and performance levels it has to respect (non-functional requirements).
Functional Requirements
A functional requirement has been defined as "A requirement that specifies a function that a system or system component must be able to perform" [2].
From a functional point of view, we identified the following requirements:
- Data Definition Language (DDL) for schema definition (entities and relations);
- Entity and Relation instances must be:
  - univocally identifiable;
  - selectively/partially updatable;
  - validated against the schema;
- Referential Integrity is a property of data stating that references within it are valid [3]. A referential integrity constraint is defined as part of an association between two entity types. The purpose of referential integrity constraints is to ensure that valid associations always exist [4];
- Dynamic Query (no pre-defined queries): the capability of a system to let clients build their own queries and submit them with no long-term impact on the information system. For relational databases this characteristic seems obvious, since SQL provides it. However, some types of databases and information systems, especially in the recent NoSQL trend, lack this functionality and require queries to be pre-defined;
- Standard Abstraction (desideratum): just as relational databases adhere to the SQL standard, it is desirable that the information system support a standard family of query languages;
- Subscription Notification support allows "full decoupling of the communicating entities in time, space, and synchronization" [5], which reflects the loosely coupled nature of distributed interaction in large-scale applications (such as a Research Infrastructure). Providing this functionality makes it possible to build event-based services and to improve the scalability of the system.
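The functional requirements above can be illustrated with a small in-memory sketch. It is not the gCube Resource Registry API: the class `TinyRegistry`, its method names, and the event names ("created", "updated", "deleted") are all hypothetical, chosen only to show schema validation, unique identifiers, partial updates, referential integrity, dynamic queries, and subscription notification in one place.

```python
import uuid
from collections import defaultdict


class TinyRegistry:
    """Hypothetical in-memory registry; illustrative only, not the gCube API."""

    def __init__(self):
        self.schemas = {}                     # type name -> {field: python type}
        self.entities = {}                    # id -> {"type": ..., "data": {...}}
        self.relations = {}                   # id -> (source id, name, target id)
        self.subscribers = defaultdict(list)  # event name -> callbacks

    def define_type(self, name, fields):
        """Schema definition step (stands in for a DDL statement)."""
        self.schemas[name] = dict(fields)

    def subscribe(self, event, callback):
        """Register a callback; publishers never reference subscribers directly."""
        self.subscribers[event].append(callback)

    def _notify(self, event, entity_id):
        for callback in self.subscribers[event]:
            callback(event, entity_id)

    def _validate(self, type_name, data, partial=False):
        schema = self.schemas[type_name]
        unknown = set(data) - set(schema)
        # A partial update may omit fields; a full instance may not.
        missing = set() if partial else set(schema) - set(data)
        if unknown or missing:
            raise ValueError(f"unknown={sorted(unknown)}, missing={sorted(missing)}")
        for field, value in data.items():
            if not isinstance(value, schema[field]):
                raise TypeError(f"{field} must be {schema[field].__name__}")

    def create_entity(self, type_name, data):
        self._validate(type_name, data)
        entity_id = str(uuid.uuid4())         # univocal identifier
        self.entities[entity_id] = {"type": type_name, "data": dict(data)}
        self._notify("created", entity_id)
        return entity_id

    def update_entity(self, entity_id, changes):
        """Selective update: only the supplied fields are validated and changed."""
        entity = self.entities[entity_id]
        self._validate(entity["type"], changes, partial=True)
        entity["data"].update(changes)
        self._notify("updated", entity_id)

    def create_relation(self, source_id, name, target_id):
        # Referential integrity: both endpoints must already exist.
        for endpoint in (source_id, target_id):
            if endpoint not in self.entities:
                raise KeyError(f"dangling reference: {endpoint}")
        relation_id = str(uuid.uuid4())
        self.relations[relation_id] = (source_id, name, target_id)
        return relation_id

    def delete_entity(self, entity_id):
        # Referential integrity: refuse to leave a relation pointing nowhere.
        if any(entity_id in (s, t) for s, _, t in self.relations.values()):
            raise ValueError("entity is still referenced by a relation")
        del self.entities[entity_id]
        self._notify("deleted", entity_id)

    def query(self, type_name, predicate):
        """Dynamic query: the caller supplies the predicate at call time."""
        return [eid for eid, e in self.entities.items()
                if e["type"] == type_name and predicate(e["data"])]


registry = TinyRegistry()
registry.define_type("Service", {"name": str, "port": int})
events = []
registry.subscribe("updated", lambda event, eid: events.append((event, eid)))
service = registry.create_entity("Service", {"name": "registry", "port": 8080})
registry.update_entity(service, {"port": 9090})   # partial update
matches = registry.query("Service", lambda d: d["port"] > 9000)
```

In this sketch the query predicate is an ordinary function, so nothing has to be registered in advance, and the subscriber learns about the update without the caller of `update_entity` knowing it exists. A real IS would delegate these concerns to its backend database and notification service.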
Non-Functional Requirements
Wikipedia defines Non-Functional Requirements as "requirements that specify criteria that can be used to judge the operation of a system, rather than specific behaviors" [6]. Unfortunately, there is no consensus in the scientific community on the definition of non-functional requirements. Martin Glinz [7] has defined a taxonomy to classify non-functional requirements. In particular, a non-functional requirement can be:
- An attribute is a performance requirement or a specific quality requirement:
  - A performance requirement is a requirement that pertains to a performance concern;
  - A specific quality requirement is a requirement that pertains to a quality concern other than the quality of meeting the functional requirements;
- A constraint is a requirement that constrains the solution space beyond what is necessary for meeting the given functional, performance, and specific quality requirements.
The following requirements fall under the above definition and taxonomy:
- High Availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period [8].
- Eventual Consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value [9].
- Horizontal Scalability. Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth [10]. To scale horizontally (or scale out/in) means to add more nodes to (or remove nodes from) a system, such as adding a new computer to a distributed software application.
- Multi-Tenancy, i.e. a single instance of the technology should be able to serve many "independent" contexts (within the same Application Domain) [11];
- EUPL licence compatibility of all its components.
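The interplay between eventual consistency and horizontal scalability can be sketched with a few in-memory replicas. This is a minimal illustration under stated assumptions, not how the IS backend replicates data: the `Replica` class and `converge` function are hypothetical, conflicts are resolved by "highest version wins", and version numbers are assumed to be unique per key.

```python
class Replica:
    """Hypothetical replica holding key -> (version, value); illustrative only."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, version):
        # Highest version wins (assumes versions are unique per key).
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Anti-entropy step: pull every entry the other replica holds."""
        for key, (version, value) in other.store.items():
            self.write(key, value, version)


def converge(replicas):
    """Exchange state pairwise until no replica changes any more."""
    changed = True
    while changed:
        before = [dict(r.store) for r in replicas]
        for a in replicas:
            for b in replicas:
                if a is not b:
                    a.sync_from(b)
        changed = before != [r.store for r in replicas]


replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[0].write("endpoint", "http://host-1", version=1)
replicas[1].write("endpoint", "http://host-2", version=2)
stale = [r.read("endpoint") for r in replicas]    # replicas disagree here

converge(replicas)                                # no new updates arrive
settled = [r.read("endpoint") for r in replicas]  # replicas now agree

# Scaling out: a new node joins and catches up from any existing replica.
new_node = Replica("d")
new_node.sync_from(replicas[0])
replicas.append(new_node)
```

Before `converge` runs, each replica answers reads from its own local state, which keeps it available but possibly stale. Once updates stop and the replicas have exchanged state, every read of `"endpoint"` returns the value written with the highest version, which is the informal guarantee quoted above.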
Architecture
The constituent components are:
- Facet Based Resource Model
- Information System Resource Registry
- Backend Database (i.e. OrientDB as Graph Database)
- Information System Subscription Notification Service
Notes
- [1] The term 'research infrastructures' refers to facilities, resources and related services used by the scientific community to conduct top-level research in their respective fields, ranging from social sciences to astronomy, genomics to nanotechnologies. https://ec.europa.eu/research/infrastructures/index_en.cfm?pg=about
- [2] IEEE (1990). Standard Glossary of Software Engineering Terminology. IEEE Standard 610.12-1990.
- [3] https://en.wikipedia.org/wiki/Referential_integrity
- [4] https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/referential-integrity-constraint
- [5] Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131. DOI: http://dx.doi.org/10.1145/857076.857078
- [6] https://en.wikipedia.org/wiki/Non-functional_requirement
- [7] M. Glinz. On non-functional requirements. In Proc. 15th IEEE Int. Requirements Eng. Conf., 2007.
- [8] https://en.wikipedia.org/wiki/High_availability
- [9] Werner Vogels. 2009. Eventually consistent. Commun. ACM 52, 1 (January 2009), 40-44. DOI: https://doi.org/10.1145/1435417.1435432
- [10] André B. Bondi. 2000. Characteristics of scalability and their impact on performance. In Proceedings of the 2nd international workshop on Software and performance (WOSP '00). ACM, New York, NY, USA, 195-203. DOI: http://dx.doi.org/10.1145/350391.350432
- [11] Please note that different Application Domains must be managed by completely separate instances of the whole IS.