Integration and Interoperability Facilities Framework: Client Libraries Design Model
Contents
- 1 Objective
- 2 Design Model for Client Libraries
- 3 Management Model For Client Libraries
Objective
The scope of the activities can be confined by determining how Client Libraries and Clients are perceived within this framework layer and by defining the goals targeted within its evolution.
Client Libraries
Work within the CL framework focuses on a subset of the client libraries found within the system: those that mediate access to some of the system services. The objective of the task does not involve the evolution of services, nor of client libraries that offer functions other than access to services. For convenience, the client libraries within the sphere of influence of the task are referred to as CLs.
Clients
The framework targets clients written in Java. It is expected that most such clients will be other components within the system, but the framework will also address external clients that may find it convenient to use the CLs over generic REST/WS client libraries. In either case, no assumptions are made on the clients, allowing them to range from pure clients (standalone applications within a dedicated JVM) to other managed services that run within some container.
Goals
The task aims at promoting consistency across CLs in all aspects that transcend the semantics of individual target services. For each cross-cutting concern the steps towards the framework integration are as follows:
- Identify best practices
- Codify practices in guidelines
- Document guidelines
- Monitor the adoption of guidelines across the CLs
Models
The framework distinguishes concerns that relate to CL design from those that relate to CL management and evolves two separate models for their structuring: the Design Model and the Management Model.
- Design Model: The Design Model for CLs addresses cross-cutting design concerns within the system libraries, including at least the following issues: scoped calls (how scope information is to be added to client calls); secure calls (how security information is to be added to client calls); endpoint management (how services ought to be referred to, discovered, and selected), which covers addressing, discovery, replica management, and caching; asynchronous operations (how asynchronous operations ought to be implemented), which covers callbacks, futures, and notifications; streamed/bulk operations (how streamed/bulk operations ought to be implemented); fault handling (how faults ought to be handled).
- Throughout these concerns, driving design principles are: simplicity, testability, evolvability and, where appropriate, standards compliance.
- Consistency is more readily and conveniently achieved through shared implementations of common solutions. In this sense, the work within the framework evolution will also be concerned with the delivery of new system components that support the development of CLs. It is expected that these Support Libraries will form a framework for CL development.
- Management Model: The model for CL management will address at least the following (inter-related) issues:
- module structure: relationship between CL modules, stub modules, and service modules
- build outputs: what secondary artifacts are associated with CLs
- release cycle: how are CLs released with respect to target services
- change management: how changes in target service API should be handled
- profiling and deployment: how should CLs be profiled for dynamic deployment
- distribution: how should CLs be packaged for distribution
Design Model for Client Libraries
Let foo be a service within the system. In what follows, we discuss the design of a client-side API for foo. In the process, we outline a generic model for similar APIs based on a small number of classes and interfaces. These compilation units are found in an org.gcube.common.clients.api package and are implemented in a common-clients-api library. Since we expect the client APIs to be packaged as libraries, we refer to them in the following simply as CLs.
Assumptions and Terminology
We work under the following assumptions and using the following terminology:
- services: foo is an HTTP service, in that it uses HTTP at least as its transport protocol [1]. At the time of writing, all system services are more specifically WS RPC services, i.e. use SOAP over HTTP to invoke service-specific APIs. In the future, system services may also be REST services, in the broad sense of stateless services that use HTTP as their application protocol.
- publication & discovery: deploying and starting foo at a given network address yields a foo endpoint [2]. Deployment may be static or, more commonly within the system, dynamic. In both cases, clients are neither coded nor configured with static knowledge of foo endpoints. Rather, they obtain them at runtime by sending queries to discovery services available within the system. Symmetrically, foo is coded or configured to send information about its endpoints to publishing services available within the system;
- state: foo may be stateless, in that its endpoints may not maintain any form of state on behalf of clients. More commonly, however, foo is stateful, i.e. its endpoints encapsulate data which they use and change upon processing client requests. At the time of writing, no service within the system is conversational, i.e. maintains the state of ongoing sessions with individual clients. Rather, foo endpoints encapsulate longer-lived datasets on behalf of open-ended classes of clients [3]. When foo is a stateful SOAP service, in particular, its endpoints can be modelled as collections of one or more service instances, where all instances have the API of foo but are bound to different datasets [4]. A foo instance is typically created by an endpoint of a companion factory service, e.g. a foo-factory service, and its logical address is the address of the corresponding foo endpoint qualified with a reference to the dataset bound to the instance. foo-factory is coded or configured to publish the addresses of foo instances, along with any properties of their bound datasets whereby the instances may be discovered by clients.
- replication: foo may be deployed at multiple network addresses for increased availability and scalability. The multiple endpoints may be deployed within a single administrative domain (site), and kept hidden behind a single public endpoint which invokes them with load-balancing and fault-tolerant strategies (single-site replication). Alternatively, they may be deployed at multiple sites and be independently published; load balancing is then performed at query time by discovery services (multi-site replication) [5]. If foo is stateful, it may replicate its endpoints without replicating its instances, i.e. different foo endpoints collect different instances (endpoint replication). This separates workloads on different instances, but it does not help with balancing the load across instances and it does not increase their availability. Alternatively, foo may employ mechanisms to directly replicate its instances, hence their state (instance replication). Immutable state may be replicated when instances are created. Alternatively, foo may be designed to autonomically create replicas of instances at existing endpoints (e.g. using subscription services and notification services available within the system). Mutable state is synchronised over time across instances, with variable guarantees of consistency (e.g. partial, eventual). Alternatively, mutable state can be shared across instances via remote references into networked file-systems, databases, or access/storage services. The references can then be replicated as immutable state. In multi-site replication, clients discover replicas with queries for references to shared state.
- scoped requests: foo endpoints may operate in multiple scopes, where each scope partitions the local resources that are visible to its clients, as well as the remote resources that are visible to the operations that the endpoint executes on behalf of its clients. In particular, the operations of the endpoint may result in the creation of state in a given scope, locally to the endpoint or else remotely to it, by interaction with other services that create state on behalf of the endpoint and/or its clients. Service scoping requires that requests to the endpoints are scoped, i.e. marked with the scope within which they are intended to occur. Unscoped requests or requests made outside one of the endpoint’s scopes are rejected by it.
- secure requests: foo endpoints may perform a range of authentication and authorisation checks, including scope checks, in order to restrict access to their operations to distinguished clients. Service security requires that requests made to the endpoints be marked with adequate credentials. Unsecure requests or secure requests that fail authorisation checks are rejected by the endpoints.
- clients: a client of foo may be internal to the system (i.e. a system component in turn) or external to it. Clients often operate within a dedicated runtime, and in this case we refer to them as pure clients. In other cases, they share a common runtime and, like foo, they may be managed by some container. In particular, clients may be services in turn, and in this case we refer to them as client services. Whether pure or services, clients make a number of calls to foo over their lifetime. Some of these calls may share the same scope and contribute to a broader task. In this case we say that the calls form a session and that the client is session-based, otherwise we say that the client is session-less [6]. Whether session-based or session-less, client calls may all occur in the same scope or else span multiple scopes. Single-scope clients are normally pure, whereas multi-scope clients tend to be client services.
- faults: clients that interact with foo endpoints may observe a wide range of failures, including: failures that occur in the client runtime, before remote calls to foo endpoints are issued; failures that occur in the attempt to communicate with foo endpoints; failures that occur in the runtime of foo endpoints. We distinguish between the following types of failures:
  - errors are violations of the contract defined by foo which can be imputed to faulty code or faulty configuration. Malformed inputs are examples of client-side errors, while bugs in the implementation of foo are examples of service-side errors;
  - contingencies are predictable violations of contract pre-conditions. There may be no bugs in either client or service code, but the foo endpoint is in a state that prevents it from carrying out the client’s request. Data that cannot be found or cannot be created are examples of contingencies;
  - outages are I/O failures of the external environment and include network failures, database failures and disk failures.
- If foo is designed for multi-site replication, contingencies and outages implicitly acquire additional semantics. We say that one such failure has retry-equivalent semantics if it does not exclude successful interaction with other foo endpoints or foo instances. We say otherwise that the failure is fatal for the interaction, i.e. no other foo endpoint will be able to process the request successfully. Unavailable endpoints and other forms of connection timeouts are retry-equivalent outages, while lack of connectivity at the client-side is a fatal outage.
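The fault taxonomy above could be mirrored in client code along the following lines. This is only a minimal sketch: the names FaultKind and ServiceFault are hypothetical and are not claimed to exist in common-clients-api, and the rule that errors are never retry-equivalent is an assumption derived from the fact that only contingencies and outages are said to acquire retry-equivalent semantics.

```java
// Hypothetical names throughout: a sketch of the taxonomy, not an existing API.
enum FaultKind { ERROR, CONTINGENCY, OUTAGE }

class ServiceFault extends Exception {

    private final FaultKind kind;
    private final boolean retryEquivalent;

    ServiceFault(String message, FaultKind kind, boolean retryEquivalent) {
        super(message);
        // Assumption: errors stem from faulty code or configuration,
        // so trying another endpoint cannot make the call succeed.
        if (kind == FaultKind.ERROR && retryEquivalent)
            throw new IllegalArgumentException("errors cannot be retry-equivalent");
        this.kind = kind;
        this.retryEquivalent = retryEquivalent;
    }

    FaultKind kind() { return kind; }

    // true if another foo endpoint or instance may still serve the request
    boolean isRetryEquivalent() { return retryEquivalent; }

    // true if no other foo endpoint can be expected to succeed
    boolean isFatal() { return !retryEquivalent; }
}
```

Under this sketch, an unavailable endpoint would be reported as a retry-equivalent OUTAGE, while lack of client-side connectivity would be reported as a fatal OUTAGE.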
- ↑ terminology: services have often been described within the system as collections of one or more “port-types”, following the terminology endorsed by WSDL 1.x standards, and then abandoned in WSDL 2.x standards. For its wider adoption and technological independence, we prefer here to follow common terminology whereby a WSDL port-type defines a service in its own right.
- ↑ terminology: the term “running instance” has been used within the system to indicate the deployment and activation of a service at a given network address. We prefer here the term “service endpoint” to avoid confusion with “service instance”, which is more commonly associated with stateful services.
- ↑ terminology: statefulness is often understood only in relation to conversational state. Under this view, all system services are currently stateless. We prefer to interpret the notion of statefulness so as to include back-end state, because all forms of state have related implications under service replication. Under this broader view, most system services are stateful.
- ↑ terminology: the system has traditionally used a different terminology for its stateful services. Service instances are called “WS-Resources”, as WSRF is the set of standards with which they are uniformly exposed at the time of writing. We prefer here the term “service instance” for its wider usage. Note also that WS-Resources use WS-Lifetime, WS-ResourceProperties and WS-Notification protocols to expose, respectively, lifetime operations, the values of distinguished properties of their state, and subscriptions for/notifications of changes to the values of those properties. Some services capitalise on these standards and become stateful even when they expose a single instance. These stateful services are known as singleton services.
- ↑ Single-site replication is common in enterprise computing and normally requires static deployment and configuration. Multi-site replication reconciles with dynamic deployment and is more common within the system.
- ↑ The use of sessions client-side does not imply that foo endpoints maintain state for them, i.e. that foo is a conversational service. In fact, it does not even imply that calls within a session are processed by a single foo endpoint.
Goals and Principles
Within the previous assumptions, our model is motivated by a goal of consistency across different client APIs. In particular, the model will:
- decrease the overall learning curve associated with using the system;
- increase CL quality via sharing of best design practices;
- decrease CL development costs, both first-time costs and maintenance costs, via shared libraries;
- decrease CL documentation costs by reference to shared patterns and design components;
To achieve our goals, we base the model on a set of design principles. In no particular order, these include:
- generality: the model will endorse design solutions that do not limit its applicability to the range of services and clients outlined above;
- coverage: the model will address a wide range of issues that transcend the semantics of individual services, including scoping issues, security issues, replica discovery and management issues, and fault management issues;
- transparency: the model will endorse design solutions that simplify client usage, particularly with respect to requirements that are specific to our system;
- testability: the model will not endorse design solutions that reduce or unduly complicate the possibility of unit testing for clients;
Service Proxies
The design approach we consider for CLs is service-centric [1]. foo is represented in client code with a single abstraction and clients invoke its methods to interact with remote foo endpoints or foo instances [2].
In common jargon, this abstraction is understood as a service proxy [3].
The proxy for foo is defined by an interface:
interface Foo {...}
The interface lists methods used to interact with service endpoints, and the methods are implemented in a class DefaultFoo to be used in production:
class DefaultFoo implements Foo {...}
The interface encourages clients to separate the use of proxies from their instantiation, without expecting clients to write adaptors for this purpose. A client component may use injected proxies created elsewhere in client code. During testing, the component may be injected with a fake implementation of Foo which produces outputs and failures as required to drive the tests, e.g. a mock implementation or a stubbed implementation. Alternatively, the component may look up proxies from a factory, and in this case it is the factory that may be configured during test setup so as to return fakes.
There are many well-known ways to design client components based on dependency injection and lookup (constructor injection, setter injection, manual injection, container-managed injection, concrete factories, abstract factories, ...). In all cases the availability of an interface enables clients to test their code independently from the network.
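As an illustration of constructor injection with a fake proxy, consider the following minimal sketch. Only Foo is taken from the text; the ping method, the ReportGenerator component and the FakeFoo stub are hypothetical names introduced for the example.

```java
// Foo comes from the text; ping(), ReportGenerator and FakeFoo are hypothetical.
interface Foo {
    String ping(String message) throws Exception;
}

// A client component that uses an injected proxy (constructor injection).
class ReportGenerator {

    private final Foo foo;

    ReportGenerator(Foo foo) { this.foo = foo; }

    String report() {
        try {
            return "foo says: " + foo.ping("hello");
        } catch (Exception e) {
            return "foo is unavailable: " + e.getMessage();
        }
    }
}

// A stubbed implementation for tests: it produces outputs or failures
// on demand and never touches the network.
class FakeFoo implements Foo {

    private final Exception failure;

    FakeFoo(Exception failure) { this.failure = failure; }

    @Override
    public String ping(String message) throws Exception {
        if (failure != null) throw failure;
        return message.toUpperCase();
    }
}
```

In production the component would receive a DefaultFoo instead; the component itself does not change.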
- ↑ An alternative approach is operation-centric, in that the service is represented indirectly by local models of the operations that comprise its API. We choose a service-centric approach for the familiarity of its programming model, and because it is simpler to implement and use against large service APIs.
- ↑ In the following, we avoid unnecessary distinctions between endpoints and instances, and use the term endpoint to refer to both.
- ↑ terminology: technically, we are dealing with a service façade rather than a proxy. This is because its API may differ substantially from the API of the service, as we discuss in detail later. We choose nonetheless the term proxy because it is more widely understood.
Proxy Lifetime
Proxies may be instantiated in either one of two modes:
- in direct mode, the proxies are bound to endpoints explicitly addressed by clients. This mode serves clients that obtain addressing information from interactions with other APIs. It may also be used to point tools towards statically known endpoints, or else during integration testing, typically to interact with endpoints deployed on local hosts.
- in discovery mode, the proxies are configured with a query for endpoints provided by clients. They are then responsible for submitting the query to the directory services of the system on behalf of clients, and for negotiating bindings to the resulting endpoint(s). This mode serves clients that have information which characterises the target endpoints and from which addressing information can be derived.
The binding mode of proxies is defined with a BindingMode instance, which carries configuration directives that control how proxies mediate access to the bound endpoint(s). By encapsulating configuration in a dedicated object, the CL simplifies the construction of proxies, as well as the evolution of their configuration.
BindingMode is an abstract and package-protected class that gathers configuration common to both binding modes. Placed within the same package as DefaultFoo, it expects mandatory configuration in constructors, offers public setters for default overrides, and exposes package-protected getters towards proxies.
For example, BindingMode is defined as:
abstract class BindingMode {
    public void setTimeout(long timeout, TimeUnit unit) {...}
    long timeout() {...}
}
Two concrete subclasses of BindingMode, DirectMode and DiscoveryMode, carry instead mode-specific configuration:
public class DirectMode extends BindingMode {...}
public class DiscoveryMode extends BindingMode {...}
We discuss below how clients instantiate DirectMode and DiscoveryMode. Clients pass DirectMode and DiscoveryMode instances to the only two public constructors of DefaultFoo:
public DefaultFoo(DirectMode mode) {...}
public DefaultFoo(DiscoveryMode mode) {...}
The following holds true:
- instantiation is a local operation. Calls to the bound endpoints will be issued only when clients invoke the Foo methods implemented by the instances;
- the lifetime of a proxy terminates when the proxy becomes eligible for garbage collection. In this respect, proxies behave like standard Java objects and do not require any explicit termination signal from clients.
- the CL makes no assumption on the lifetime of proxies and allows proxies with:
  - call lifetime: begins before a call to a foo endpoint and terminates immediately thereafter;
  - session lifetime: begins before the first session call to a foo endpoint, and terminates after the last session call to the same or another foo endpoint;
  - global lifetime: begins before the first call to a foo endpoint and terminates when the client does.
- Thus clients may dedicate proxies to different calls, or else reuse the same proxy for an arbitrary number of calls to foo endpoints, in multiple scopes and/or across multiple sessions.
- the flexibility discussed in the previous point does not come at the expense of safe and efficient bindings. There are two exceptions, however. The first is that proxies created in direct mode may only be safely used for calls in one of the scopes of their bound endpoints. Clients that operate in multiple scopes should avoid reusing such proxies to call foo endpoints in different scopes. The second exception concerns proxies created in discovery mode for a stateful foo, and we will discuss it in more detail below.
- since clients may reuse proxies, these retain only their configuration and treat it for the most part as immutable state. The CL gives this guarantee by making the configuration immutable, or by cloning the configuration with which proxies are instantiated. Proxies offer no methods to change their initial configuration. Immutability requirements may be relaxed for default overrides, such as timeouts, which clients may change after proxy instantiation on a per-call basis. This is particularly useful when proxies have session lifetime or global lifetime, i.e. subsume a number of calls which may require different configuration tuning. Clients are responsible for maintaining configuration changes across proxy lifetimes.
- since proxies are mostly immutable, clients may safely use any proxy from multiple threads. The CL is responsible for synchronising access to configuration components that may change, such as default overrides.
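The immutability and thread-safety guarantees discussed above could be obtained roughly as follows. This is a sketch under stated assumptions: the default timeout value, the copy constructor, and the host/port fields are hypothetical, DefaultFoo is shown without its Foo interface to keep the example self-contained, and the classes are left package-private here only so that they fit in a single file.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

abstract class BindingMode {

    // the only mutable component: a default override, held in an atomic
    // so that it can be read and written safely from multiple threads
    private final AtomicLong timeoutMillis = new AtomicLong(10_000); // assumed default

    public void setTimeout(long timeout, TimeUnit unit) {
        long millis = unit.toMillis(timeout);
        if (millis <= 0) throw new IllegalArgumentException("timeout must be at least 1 ms");
        timeoutMillis.set(millis);
    }

    long timeout() { return timeoutMillis.get(); }
}

class DirectMode extends BindingMode {

    private final String host; // mandatory configuration, immutable
    private final int port;

    DirectMode(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // hypothetical copy constructor, used by proxies to clone their configuration
    DirectMode(DirectMode other) {
        this(other.host, other.port);
        setTimeout(other.timeout(), TimeUnit.MILLISECONDS);
    }
}

class DefaultFoo {

    private final DirectMode mode;

    // a purely local operation: no call is issued to the bound endpoint here
    DefaultFoo(DirectMode mode) { this.mode = new DirectMode(mode); }

    long timeout() { return mode.timeout(); }
}
```

Because the proxy clones its configuration, later changes that a client makes to the original DirectMode instance do not leak into proxies that are already in use.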
Direct Mode
DirectMode has one or more public constructors that take the address of a foo endpoint or, depending on the design of the service, a reference to a foo instance available at a given endpoint.
The resulting DirectMode instance can be used to create proxies that are bound for their entire lifetime to the addressed endpoint, i.e. cannot be used as proxies for other foo endpoints.
Endpoint Addresses
If foo is a REST service, a stateless WS service, or a singleton WS service, the address of its endpoints can be univocally derived from the name and port of their network hosts. DirectMode complements this information with service-specific constants (e.g. context paths) and obtains the complete address of the endpoint. DirectMode validates the complete address and raises issues of well-formedness with an IllegalArgumentException:
DirectMode(String host, int port) throws IllegalArgumentException {...}
DirectMode may also be instantiated with a java.net.URL which subsumes the required addressing information:
DirectMode(URL address) throws IllegalArgumentException {...}
This constructor is used when clients obtain endpoint addresses from other APIs. DirectMode remains responsible for validating or complementing the address for the target service. It may also be responsible for translating the address into the model expected by lower-level communication APIs which proxies may use in turn.
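The validation and completion of endpoint addresses might look as follows. This is a hedged sketch: the context path /foo/service, the address() getter and the private complete helper are hypothetical, and the choice to accept only the http and https schemes is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class DirectMode {

    // service-specific constant that clients never see (hypothetical value)
    private static final String CONTEXT_PATH = "/foo/service";

    private final URL address;

    DirectMode(String host, int port) throws IllegalArgumentException {
        this.address = complete("http", host, port);
    }

    DirectMode(URL address) throws IllegalArgumentException {
        if (address == null) throw new IllegalArgumentException("address is null");
        int port = address.getPort() == -1 ? address.getDefaultPort() : address.getPort();
        this.address = complete(address.getProtocol(), address.getHost(), port);
    }

    // validates host coordinates and complements them with the context path
    private static URL complete(String scheme, String host, int port) {
        if (!"http".equals(scheme) && !"https".equals(scheme))
            throw new IllegalArgumentException("unsupported scheme: " + scheme);
        if (host == null || host.trim().isEmpty())
            throw new IllegalArgumentException("host is null or empty");
        if (port < 1 || port > 65535)
            throw new IllegalArgumentException("port out of range: " + port);
        try {
            return new URI(scheme, null, host, port, CONTEXT_PATH, null, null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("malformed endpoint address", e);
        }
    }

    // package-protected getter exposed towards proxies
    URL address() { return address; }
}
```

Both constructors funnel into the same helper, so validation rules and service constants are defined once.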
Endpoint References
If foo is a stateful WS service, DirectMode has a constructor that accepts host coordinates as well as an instance identifier. The API will solicit the identifier under the semantics which is most appropriate to service instances (e.g. sourceId if instances encapsulate state about some data source):
DirectMode(String host, int port, String id) throws IllegalArgumentException {...}
DirectMode also has a second constructor that accepts a javax.xml.ws.wsaddressing.W3CEndpointReference whose reference parameters identify a service instance at a given address:
DirectMode(W3CEndpointReference reference) throws IllegalArgumentException {...}
As above, DirectMode is responsible for validating the reference and for translating it into the addressing model of any lower-level communication API that proxies may use in turn (e.g. EndpointReferenceType in Axis’ generated stubs API).
Discovery Mode
DiscoveryMode is instantiated with the information required to synthesise a query for foo endpoints or foo instances. DiscoveryMode instances may then be used to create proxies that attempt to bind to query results.
If foo is stateless, a default constructor suffices because the query is statically known (in no case should clients be exposed to service constants such as service class and name):
DiscoveryMode() {...}
If foo is stateful, however, the query depends on state-related properties of the target instances, and different clients may need different queries. To ease evolution, the CL models queries with dedicated FooQuery instances and DiscoveryMode exposes a single constructor that takes such instances:
DiscoveryMode(FooQuery query) {...}
Instance Queries
FooQuery contains no explicit reference to the concrete query syntax which the discovery services used by proxies may require, nor any reference to their query submission API. DefaultFoo instances are responsible for synthesising a concrete query from the properties of foo instances which are specified by the client.
The following holds true about queries:
- if clients create multiple DiscoveryMode instances, they may create multiple query instances or else share a single instance. As for proxies, the CL makes no assumption on the lifetime of individual queries;
- since clients may reuse them, queries retain only immutable state. The instance properties specified for queries cannot be altered after their creation. The CL gives this guarantee by making queries immutable, and the queries offer no methods to change the instance properties with which they have been created.
At their simplest, queries may be immutable beans, e.g.:
FooQuery query = new FooQuery(...);
If there are many possibilities for query customisation, queries may declare a package-protected interface of mutators and expose it to a builder class, or to more sophisticated forms of fluent APIs. As an example:
class FooQueryBuilder {
    ...
    public static FooQueryBuilder query() { return new FooQueryBuilder(); }
    public FooQueryBuilder forXXX(...) {...}
    public FooQueryBuilder withYYY(...) {...}
    ...
    public FooQuery build() {...access package-protected constructor...}
}
...
FooQuery query = query().forXXX(...).withYYY(...)....build();
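A compilable variant of the builder outlined above is sketched below. FooQuery and FooQueryBuilder come from the text, whereas the sourceId and minSize properties and the forSource and withMinSize methods are hypothetical stand-ins for forXXX and withYYY.

```java
import java.util.Objects;

// An immutable query: all fields are final and there are no mutators.
final class FooQuery {

    private final String sourceId;
    private final int minSize;

    // package-protected: clients go through the builder
    FooQuery(String sourceId, int minSize) {
        this.sourceId = Objects.requireNonNull(sourceId, "sourceId is mandatory");
        this.minSize = minSize;
    }

    String sourceId() { return sourceId; }
    int minSize() { return minSize; }
}

final class FooQueryBuilder {

    private String sourceId;
    private int minSize = 0; // assumed default

    private FooQueryBuilder() {}

    public static FooQueryBuilder query() { return new FooQueryBuilder(); }

    public FooQueryBuilder forSource(String sourceId) {
        this.sourceId = sourceId;
        return this;
    }

    public FooQueryBuilder withMinSize(int minSize) {
        this.minSize = minSize;
        return this;
    }

    // mutable state stays in the builder; the query it returns is immutable
    public FooQuery build() { return new FooQuery(sourceId, minSize); }
}
```

With a static import of query(), a client would write query().forSource("src-1").withMinSize(10).build().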
Endpoint Management
Proxies attempt to bind to the service endpoints that satisfy the query with which they have been instantiated. They do so combining:
- a binding strategy: if foo uses multi-site replication for its endpoints or instances, queries may return more than one result. The discovery services will sort results by increasing load and the proxies exploit the availability of multiple endpoints towards fault tolerance, i.e. to increase the chances of a successful call;
- a caching strategy: issuing queries to discovery services adds costs to client interactions with foo, and it is the responsibility of proxy instances to reduce these costs whenever possible. In the absence of optimisations, the proxies would issue queries before each and every call. Under full optimisation, the proxies issue queries only when they really have to, possibly never.
The binding strategy of proxies is driven by the range of faults discussed above. In particular it is defined by the following rules:
- BR1: submit the query to the directory services in the current scope and process the list of discovered endpoints as follows:
  - BR2: if the current endpoint raises a retry-equivalent failure (contingency or outage) then attempt to bind the next endpoint if one exists, otherwise return the failure;
  - BR3: if the current binding raises an unrecoverable contingency, an error, or a general outage then return it;
- BR4: log all the previous actions at INFO level.
Notice that proxies may not always be able to determine fault semantics. The semantics of contingencies is usually known to proxies, but differences between errors and outages, and between retry-equivalent outages and general outages, may not transpire through the underlying communication APIs. Ambiguous cases should be dealt with in proportion to their number and nature. If the API can disambiguate most faults, then an optimistic approach is appropriate and the proxies should default to applying BR2. If instead the API does not provide enough information, then a pessimistic approach is more indicated and the proxies should default to applying BR3.
- the default number of retry attempts required in BR3 may be overridden by clients by invoking the setMaxRetries(int) method.
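The binding rules could be realised with a loop over discovered endpoints, as in the following sketch. The Failure, Call and Binder types are hypothetical, endpoints are represented as plain strings, the query submission of BR1 is assumed to have already produced the list, and the maxRetries parameter stands in for the value a client would set with setMaxRetries(int). Logging (BR4) is omitted for brevity.

```java
import java.util.List;

// Hypothetical failure type that carries retry-equivalent semantics.
class Failure extends Exception {
    final boolean retryEquivalent;

    Failure(String message, boolean retryEquivalent) {
        super(message);
        this.retryEquivalent = retryEquivalent;
    }
}

// Hypothetical model of a remote call issued against a given endpoint.
interface Call<T> {
    T at(String endpoint) throws Failure;
}

final class Binder {

    private Binder() {}

    // 'endpoints' is the outcome of BR1, in the order returned by discovery
    static <T> T bind(List<String> endpoints, Call<T> call, int maxRetries) throws Failure {
        if (maxRetries < 0) throw new IllegalArgumentException("negative retries");
        if (endpoints.isEmpty()) throw new Failure("no endpoint discovered", false);
        Failure last = null;
        int attempts = 0;
        for (String endpoint : endpoints) {
            if (attempts++ > maxRetries) break; // first attempt plus maxRetries retries
            try {
                return call.at(endpoint);
            } catch (Failure f) {
                if (!f.retryEquivalent) throw f; // BR3: return the failure as is
                last = f;                        // BR2: try the next endpoint, if any
            }
        }
        throw last; // BR2: no endpoint left, return the last failure
    }
}
```

A pessimistic proxy, in the sense discussed above, would map ambiguous faults to failures that are not retry-equivalent before they reach this loop.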
The caching strategy of proxies is defined by the following rules:
- CR1: record the address of a bound endpoint, the so-called Last Good Endpoint (LGE), in a scope-indexed and query-indexed cache shared by all proxies;
- CR2: if the LGE is defined, attempt to bind;
- CR3: if the LGE raises a failure, then remove it from the cache and:
  - CR4: if the failure is an unrecoverable contingency or a general outage then return the failure;
  - CR5: if the failure is a retry-equivalent failure (contingency or outage) then apply BR1 but exclude the LGE from the list of results;
- CR6: log all previous actions at DEBUG level;
Since the LGE cache is shared across proxy instances, a new instance may find an LGE in it and apply CR2 before BR1, i.e. avoid query submission altogether. This optimisation is safe only if the LGE is a plausible result of the query defined by the new instance, hence the requirement of a cache indexed by scopes and queries.
Notice that:
- since queries are keys into the cache, they may need to be value objects, i.e. implement hashCode() and equals() towards a notion of equivalence. This is not necessary when foo is stateless, provided that the constant query shared by all proxies is implemented as a singleton object. It may be necessary when foo is stateful, however, as clients may initialise different proxies with different instances of the same query. In this case, failing to implement queries as value objects may bypass the caching strategy, hence reduce the efficiency of proxies.
- if foo is stateful, the combination of binding and querying strategies may have undesired effects for session-based clients. This occurs when foo instances are replicated across sites but the query used by the proxy returns instances that are not exact replicas. The calls issued by clients may then yield inconsistencies if, through caching, the binding strategy transparently rebinds instances in the middle of a session. Thus the problem may emerge only for session-based clients and under particular combinations of queries and calls. When the CL may not exclude these combinations through design, it may allow clients to disable fault-tolerance on given proxies by invoking a setSticky(boolean) method on their DiscoveryMode configuration. In this case, the proxies would treat LGE failures as unrecoverable, i.e. apply CR3 and CR4. This trades off a degree of fault-tolerance for safety. We return to this point later, in relation to instance-specific operations that are particularly prone to this problem.
It should also be noted that binding and caching strategies remain largely opaque to clients. Clients limit their involvement to:
- if foo is stateful, providing queries for service endpoints;
- observing and reacting to discovery faults, such as the lack of suitable endpoints or the occurrence of faults in the interaction with the directory services;
We discuss failure handling in detail later on in the document.
Proxy API
After creating proxies, clients invoke their methods to call the foo endpoints that are bound to the proxies. Calls may take zero or more inputs, produce zero or one output, and raise one or more faults. Foo models inputs, outputs, and faults with the types that seem most convenient for its clients. The local types may differ substantially from those defined in the remote API of the service. Proxies are responsible for converting between local types and remote types. Even when the remote types seem adequate for Foo clients, adapting them to equivalent local forms helps Foo to insulate its clients from future changes to the remote API.
Local types are virtually unconstrained from a design perspective. For example, they may:
- be constructed in a variety of patterns, including standard constructors, copy constructors, factories, builders, and more sophisticated forms of fluent APIs. When useful, they may be deserialised from various representations, from language serialisation formats to, say, XML formats;
- exhibit arbitrary behaviour, including validation behaviour at creation time or at any other point in their lifetime;
- implement arbitrary interfaces and participate in arbitrary hierarchies;
- use type parameters for type-safe reuse;
- be arbitrarily annotated;
- have non-trivial notions of equivalence, cloning behaviour, and useful String serialisations;
Similar freedom extends to the design of Foo
. Foo
may implement any interface, participate in any hierarchy, be arbitrarily annotated and parameterised. Furthermore, Foo may use method name overloading for calls that have related semantics but require a different number of inputs, or inputs of different types.
The API uses this freedom towards the goals of:
- clarity and fluency, by choosing types that simplify client programming;
- correctness, by choosing types that detect locally, and often even statically, constraint violations which would be only enforced remotely and dynamically by foo;
- standardisation, by choosing types that are formal or de-facto standards for the semantics of the data, either in the context of the language (common Java interfaces, appropriate Exceptions, naming conventions, etc.) or in a broader context.
We discuss below how the methods of Foo
are designed to model calls to foo
endpoints. In particular, we look at choices of local types for inputs, outputs, and faults for prototypical calls,
including calls that require or produce data collections, asynchronous calls, and calls that access the state of stateful foo
instances.
Example
The possibilities for the design of Foo
are open-ended. We illustrate some of the options here using a fictional example. The example is intentionally convoluted to illustrate a wider range of options.
Assume foo exposes an operation bar which:
- expects a rather complex and potentially recursive XML data structure Baz in input;
- returns a simpler data structure Qux whenever Baz satisfies a set of constraints, from simple constraints (some attributes must not be null, others must be null) to complex constraints (some simple elements must have correlated values);
- raises an InvalidBazFault when the input structure is null, is syntactically or structurally malformed, or does not satisfy the expected set of constraints;
Foo
mediates calls to bar
with the following method:
Qux bar(Baz baz) throws IllegalArgumentException, ServiceException;
where:
- Baz is a class that uses the annotations of JSR 222 (JAXB 2.0) to bind its instances to XML, and the annotations of JSR 303 to declare validity constraints upon them which cannot be detected by the type-checker. The API offers a BazBuilder to fluently construct Baz instances across its plethora of mandatory and optional parameters, and Baz instances expose a set of sophisticated methods that allow clients to flexibly navigate its potentially very deep and recursive structure, including a query method based on XPath expressions. Baz instances override equals(), hashCode() and toString() to facilitate assertions in tests as well as debugging;
- proxies throw:
  - an IllegalArgumentException if the input is null or invalid, enforcing JSR 303 annotations for the purpose. Proxies short-circuit a remote call that would certainly fail and throw a local exception instead. They make sure that a null attribute violation is detected before the call (direct mode) or the query (discovery mode) is issued;
  - a generic ServiceException in correspondence with any other form of remote failure. We discuss below the semantics of this exception and more generically the rationale for Foo’s approach to failure reporting.
- Qux is a fairly simple bean class, also decorated with JAXB annotations so that where the XML representation included a collection of uniquely named values, Qux exposes instead a Map of String keys. Furthermore, the Qux instances returned by bar() have been proxied, so that the invocations of some of its key methods can be intercepted, to some particular end. It also exposes methods that accept subscriptions and produce notifications in response to some key events of its lifetime.
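The short-circuit behaviour described above can be sketched as follows. This is only an illustration under simplifying assumptions: Baz is reduced to a single hypothetical attribute (name), the constraint check is hand-written in place of JSR 303 validation, and RemoteBar is an assumed stand-in for the remote call rather than part of any real API.

```java
// Hypothetical, heavily simplified local model of Baz: 'name' must not be null.
final class Baz {
    final String name;
    Baz(String name) { this.name = name; }
}

// Simplified local model of the output.
final class Qux {
    final String value;
    Qux(String value) { this.value = value; }
}

// Assumed stand-in for the remote bar operation of a foo endpoint.
interface RemoteBar {
    String call(String name);
}

// Sketch of a proxy that validates its input locally before calling remotely.
final class ValidatingFoo {
    private final RemoteBar remote;

    ValidatingFoo(RemoteBar remote) { this.remote = remote; }

    Qux bar(Baz baz) {
        // Short-circuit: fail locally rather than issue a call that would certainly fail.
        if (baz == null)
            throw new IllegalArgumentException("input is null");
        if (baz.name == null)
            throw new IllegalArgumentException("name must not be null");
        return new Qux(remote.call(baz.name));
    }
}
```

An invalid Baz never reaches the endpoint, so the client learns about the violation without paying for a remote round trip.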
Faults
Foo
proxies may need to report to clients the range of failures introduced above. The proxies cannot, and indeed should not, predict the strategies that clients will adopt to handle such failures. However, they may assume that:
- in production, clients will at least contain all forms of failure, i.e. fully log them and conveniently report them to users or clients further upstream. Silencing failures or thread terminations are typically undesirable outcomes. Failure containment is normally dealt within error handlers that act as ‘barriers’ or ‘points-of-last-defence’ high-up in the call stack.
- clients may have coping strategies for contingencies that go beyond simple failure containment. They may be able to actually recover from the failures, e.g. by retrying with different inputs or by selecting an alternative execution path, including calling another service or falling back to defaults. Typically, clients will recover as close as possible to the observation of the failure, though not necessarily in the immediate caller.
- clients are more likely to recover from contingencies than from outages. This is because contingencies are specific expectations set forth by foo that clients should be prepared to handle somehow.
Based on these assumptions, Foo
aligns with modern practices in:
- using unchecked exceptions to report errors and outages. Clients that may only contain such failures in generic error handlers will be dispensed from the noisy, error-prone, brittle, and ultimately pointless task of explicitly catching and/or re-throwing exceptions along the call stack.
- using checked exceptions to report contingencies. Clients may then avail themselves of the services of the typechecker to be alerted of failures that they should have prepared for.
In any case, Foo
documents all the exceptions that its methods may throw, regardless of their type.
More specifically, Foo
’s methods report:
- all the errors that may be detected in the client runtime prior to calling a foo endpoint. In its bar() method above, for example, Foo declares an IllegalArgumentException in lieu of the InvalidBazFault that the service would raise if proxies actually called its bar operation;
- all the contingencies that foo declares to raise. If the service declares an UnknownBazFault for its bar operation, for example, then Foo declares a corresponding checked exception for its method bar(), and proxies throw the exception upon receiving the fault from a foo endpoint. If foo declares a base class for a number of related contingencies, and if its operation bar may throw all the subclasses of the base class, then Foo declares only the base class for its method bar();
- a single ServiceException for any outage, or for any error that cannot be detected in the client runtime prior to calling a foo endpoint.
ServiceException
marks the non-local semantics of Foo
’s methods and serves as a base class or else as a wrapper for any other exception that proxies may observe. In particular, ServiceException
is defined as follows:
package org.gcube.common.clients.api;

class ServiceException extends RuntimeException {

    private static final long serialVersionUID = 1L;

    public ServiceException(Exception cause) {
        super(cause);
    }
}
Proxies wrap in ServiceException
s any exception thrown by the underlying communication API. For example, if foo
is a JAX-WS Web Service, proxies wrap in a ServiceException
any WebServiceException
thrown by their JAX-WS-compliant API of choice. If foo
is a JAX-RPC Web Service, then proxies wrap in a ServiceException
any RemoteException
or SOAPFaultException
thrown by their JAX-RPC-compliant API of choice. In all cases, DefaultFoo
documents what exceptions may cause the ServiceException
s that its proxies may throw.
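The wrapping strategy can be sketched as follows. The sketch is not tied to JAX-WS or JAX-RPC: Transport is a hypothetical stand-in for the communication API, java.io.IOException plays the role of its low-level exception, and ServiceException is repeated locally only so that the example compiles on its own.

```java
import java.io.IOException;

// Local copy of the exception defined above, so that the sketch compiles on its own.
class ServiceException extends RuntimeException {
    private static final long serialVersionUID = 1L;
    public ServiceException(Exception cause) { super(cause); }
}

// Hypothetical stand-in for the communication API used by the proxy.
interface Transport {
    String send(String request) throws IOException;
}

// Sketch of a proxy that hides transport exceptions behind ServiceException.
final class WrappingFoo {
    private final Transport transport;

    WrappingFoo(Transport transport) { this.transport = transport; }

    String bar(String input) {
        try {
            return transport.send(input);
        } catch (IOException e) {
            // Any low-level failure surfaces as an unchecked ServiceException,
            // with the original exception preserved as its cause.
            throw new ServiceException(e);
        }
    }
}
```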
Clients that may only contain errors and outages may conveniently catch ServiceException
s in their error handlers. Clients that wish to customise their containment strategies for particular outages, or that can even recover from them, may inspect the cause of ServiceException
s and/or directly catch the general subclasses of ServiceException
that we discuss next.
Common Faults
There are a number of ServiceException
s which do not specifically relate to foo
’s remote API but may arise in the interaction with any system service, including:
- the inability to bind to a given foo endpoint. This may be the endpoint configured on a proxy created in direct mode. It may also be an endpoint that the discovery services return to a proxy created in discovery mode, as the discovery services are not immediately notified of endpoints that become unavailable after their publication;
- the possibility of calls that are unscoped, or else issued in a scope which is not legal for a given foo endpoint;
- the possibility of arbitrary failures in queries for foo endpoints, such as those issued by proxies created in discovery mode;
Proxies that observe the unavailability of an endpoint throw NoSuchEndpointException
s, which are defined as follows:
package org.gcube.common.clients.api;

class NoSuchEndpointException extends ServiceException {

    private static final long serialVersionUID = 1L;

    public NoSuchEndpointException(Exception cause) {
        super(cause);
    }
}
Proxies that observe scope-related failures throw IllegalScopeException
s, which are defined as follows:
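package org.gcube.common.clients.api;

class IllegalScopeException extends ServiceException {

    private static final long serialVersionUID = 1L;

    public IllegalScopeException(String msg) {
        // relies on a message-based constructor in ServiceException, which the definition shown earlier omits
        super(msg);
    }
}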
where the message of the exception indicates the lack of scope or the faulty scope.
Proxies that cannot interact with the discovery services throw DiscoveryException
s, which are defined as follows:
package org.gcube.common.clients.api;

class DiscoveryException extends ServiceException {

    private static final long serialVersionUID = 1L;

    public DiscoveryException(Exception cause) {
        super(cause);
    }
}
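From the client's perspective, this hierarchy allows recovery from specific outages while everything else is contained generically. The following sketch assumes a minimal, hypothetical Foo interface with a single bar(String) method, and repeats two of the exceptions locally so that it compiles on its own. Falling back to a second proxy is just one possible recovery strategy.

```java
class ServiceException extends RuntimeException {
    private static final long serialVersionUID = 1L;
    public ServiceException(Exception cause) { super(cause); }
}

class NoSuchEndpointException extends ServiceException {
    private static final long serialVersionUID = 1L;
    public NoSuchEndpointException(Exception cause) { super(cause); }
}

// Hypothetical, minimal proxy interface.
interface Foo {
    String bar(String input);
}

final class FallbackClient {

    // Recovers from an unavailable endpoint by trying a second proxy;
    // any other ServiceException propagates to a handler further up the stack.
    static String callWithFallback(Foo primary, Foo secondary, String input) {
        try {
            return primary.bar(input);
        } catch (NoSuchEndpointException e) {
            return secondary.bar(input);
        }
    }

    // A 'barrier' high up the call stack that contains whatever failure is left.
    static String callSafely(Foo primary, Foo secondary, String input) {
        try {
            return callWithFallback(primary, secondary, input);
        } catch (ServiceException e) {
            return "failed: " + e.getCause().getMessage();
        }
    }
}
```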
Bulk Inputs and Outputs
Proxies may need to call foo
operations that take or return collections of values. Foo
may then rely in its API on custom interfaces or classes that encapsulate the collection values required or provided by foo
, e.g.:
Nodes nodes(Paths paths) throws ... ;
where Nodes
and Paths
are ad-hoc models of nodes and paths to nodes of some tree-like data structure.
More commonly, however, Foo
defines methods that rely on the standard Java Collections API
. When methods return collections of values, Foo
chooses List
s:
List<Node> nodes(...) throws ... ;
In returning List
s, Foo
is not necessarily conveying to clients that the order of Node
s is meaningful, or that the same Node
may occur twice within the List
. Rather, Foo
is following two principles: a) the type that best models a collection of values may only be defined by its consumers, on the basis of their own processing requirements; b) some types are more versatile than others in adapting to a wider range of processing requirements. In its ignorance of how clients will consume the collection, Foo
returns it as a List
for the versatility of the List
API, and in the assumption that when its clients are better served by other, more constrained Collection
types they can easily and cheaply derive them from List
s.
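As an illustration of how cheaply clients can derive more constrained collections from a returned List, consider the following sketch. The class and method names are hypothetical, and only the standard Java Collections API is used.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ListConsumers {

    // A client that needs uniqueness copies the List into a Set,
    // preserving the order in which the elements were returned.
    static <T> Set<T> unique(List<T> values) {
        return new LinkedHashSet<>(values);
    }

    // A client that needs sorted, unique elements uses a TreeSet instead.
    static <T extends Comparable<T>> Set<T> sortedUnique(List<T> values) {
        return new TreeSet<>(values);
    }
}
```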
For methods that take collections however, Foo
acts as a consumer and chooses the Collection
type that most closely captures the required constraints at compile-time, e.g. a Set
if Foo
expects no duplicates:
List<Node> nodes(Set<Path> paths) throws ... ;
On the other hand, Foo
does not restrict the semantics of inputs more than it should. For example, if there are no particular requirements on input collections, Iterator
or Iterable
are the most flexible choices, as they make the API immediately usable with a broader set of abstractions than Collection
s:
List<Node> nodes(Iterable<Path> paths) throws ... ;
The choice between Iterable
and Iterator
is not clearcut. Iterable
can improve the fluency of both client and implementation code, but requires materialised collections. This may be desirable in itself as an indication that the collections will be materialised in memory and that very large streams coming from secondary storage or network are not expected. When streams are not large, however, Iterable
forces clients to accumulate their elements before they can use the API.
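The flexibility of Iterable inputs can be illustrated with a small sketch. The method below is hypothetical and merely labels each path, but it shows that a List, a Set, or any other Iterable can be passed without prior conversion.

```java
import java.util.ArrayList;
import java.util.List;

final class IterableInput {

    // Hypothetical method: accepts any Iterable, so callers are not forced
    // to convert their data into a particular Collection type first.
    static List<String> nodes(Iterable<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            result.add("node@" + path);
        }
        return result;
    }
}
```

An Iterator-based signature would also accept one-shot sources, at the price of losing the for-each syntax in the implementation.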
Asynchronous Methods
Calls to foo
may be synchronous or asynchronous:
- synchronous calls block clients until they have been fully processed by
foo
endpoints and their output, or just an acknowledgement of completion, is returned to clients. This temporal coupling between clients and endpoints forces both to relinquish some control over their computational resources. Clients must suspend execution in the calling thread and endpoints cannot schedule their availability to answer. It also requires calls to be fully processed within communication timeouts. Synchronous calls are thus preferred when endpoints can process them quickly, i.e. when the time in which clients and endpoints synchronise is short. This is the case when calls generate short-lived processes and require the exchange of limited amounts of data;
- asynchronous calls do not block clients, either because they return no output (i.e. the operations are one-way) or because their output can be produced and returned to clients at a later time. This leaves clients and endpoints in control of their computational resources, but it complicates the programming model at both sides. Asynchronous calls are preferred when endpoints can fully answer only after long-lived processes, including those required to exchange large datasets;
foo
may pursue the benefits of asynchrony by designing and implementing its operations for it. One-way operations return immediately with an acknowledgement of reception. Operations that produce output may return the endpoint of another service that clients can poll to obtain the output when this becomes available (polling). Alternatively, foo may require that clients indicate an endpoint that foo endpoints can call back to deliver the output (callbacks). In all cases, foo executes the operations in background threads.
Foo
may pursue the benefits of asynchrony even if foo
does not. In other words, Foo
may offer asynchronous calls over synchronous remote operations. In practice, this amounts to calling endpoints in background threads. Polling and callbacks remain available as patterns for the delivery of output between threads, though their implementation is now local to clients. The approach does not cater for communication timeouts, hence for calls that generate long-lived processes at foo
endpoints. However, it allows clients to make further progress while the endpoints are busy processing their calls.
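Offering an asynchronous call over a synchronous operation can be sketched with the standard java.util.concurrent executors. SyncFoo and AsyncFoo are hypothetical names, and the choice of a cached thread pool is an assumption made for brevity rather than a recommendation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical synchronous proxy interface.
interface SyncFoo {
    String bar(String input);
}

// Sketch of an asynchronous facade: the blocking call runs in a background thread.
final class AsyncFoo {
    private final SyncFoo delegate;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    AsyncFoo(SyncFoo delegate) { this.delegate = delegate; }

    Future<String> barAsync(String input) {
        // submit() returns immediately; the Future completes when the call returns.
        return executor.submit(() -> delegate.bar(input));
    }

    // Releases the background threads once the facade is no longer needed.
    void shutdown() { executor.shutdown(); }
}
```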
Polling And Callbacks
An asynchronous call that induces a long-lived process at the service endpoint may return immediately with a reference to the ongoing process. Clients may then use the reference to wait for the process to complete only when they need its outcome to make further progress. They may also poll the status of the process and perform other work while it is still ongoing.
In Java, the standard model for such references is provided by Future
s. For example, Foo
defines the following method:
Future<String> barAsync(...) throws ... ;
which promises to return a String
when this becomes available. Clients use Future.get()
methods to block for the output, indefinitely or for a given amount of time. They can use Future.isDone()
to poll the availability of any output. They can also use Future.cancel()
to revoke the submission of a call (in case this has been scheduled but not issued yet) or, if the service allows it, to cancel the remote process.
We assume that barAsync()
declares failures following the strategy discussed previously, with the understanding that these are failures that may occur only before foo
starts processing calls (including failures thrown by DefaultFoo
before calls are actually issued). Failures raised by foo in the context of processing calls will instead be delivered in Future.get()
methods, in accordance with the Future API
. In particular, unchecked ServiceException
s and checked contingencies will be found as the cause of ExecutionException
s thrown by Future.get()
methods.
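On the client side, consuming such a Future may look like the following sketch. The helper name and the decision to cancel on timeout are assumptions. The point is that failures raised while the call is processed are recovered from the cause of the ExecutionException.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureClient {

    // Waits up to the given timeout for the outcome of an asynchronous call and
    // rethrows failures raised during processing with their original type.
    static String await(Future<String> future, long timeoutMillis) throws Exception {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The failure raised while processing the call is the cause.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) throw (Exception) cause;
            throw e;
        } catch (TimeoutException e) {
            // Give up on the call: revoke it if it has not completed yet.
            future.cancel(true);
            throw e;
        }
    }
}
```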
If the underlying remote operation is one-way, Foo
defines barAsync()
as follows:
Future<?> barAsync(...) throws ...;
which returns a wildcard Future
that clients may use to cancel submissions/processes, as above, or that they may ignore altogether in case barAsync() is conceptually fire-and-forget.
In addition to polling, Foo
may also rely on callbacks to deliver call outputs to its clients. In this case, Foo
requires clients to provide a Callback
instance at call time, i.e. an instance of the following interface:
package org.gcube.common.clients.api;

interface Callback<T> {

    public void onFailure(Throwable failure);

    public void done(T result);
}
Specifically, Foo
may overload barAsync
as follows:
Future<?> barAsync(..., Callback<String> callback) throws ... ;
The method promises to return immediately with a wildcard Future
, which clients can use as above, and to deliver the outcome to the Callback
instance as soon as this becomes available. The delivery occurs through two different callbacks, depending on whether the outcome is a success (done()
) or a failure (onFailure()
).
Clients may entirely consume the output in the Callback
instance. Alternatively, they are responsible for exposing it directly or indirectly to other components.
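A callback-based barAsync() over a synchronous operation might be implemented along the following lines. This is a sketch under assumptions: SyncFoo is a hypothetical blocking proxy, a single background thread is used for simplicity, and only runtime failures of the call are routed to onFailure().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// The callback interface introduced above, repeated so that the sketch compiles on its own.
interface Callback<T> {
    void onFailure(Throwable failure);
    void done(T result);
}

// Hypothetical synchronous proxy interface.
interface SyncFoo {
    String bar(String input);
}

final class CallbackFoo {
    private final SyncFoo delegate;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    CallbackFoo(SyncFoo delegate) { this.delegate = delegate; }

    Future<?> barAsync(String input, Callback<String> callback) {
        // Runs the call in the background and routes its outcome to the callback.
        return executor.submit(() -> {
            String result;
            try {
                result = delegate.bar(input);
            } catch (RuntimeException e) {
                callback.onFailure(e);
                return;
            }
            // Delivered outside the try block, so that a failure inside done()
            // is not mistaken for a failure of the call itself.
            callback.done(result);
        });
    }

    void shutdown() { executor.shutdown(); }
}
```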
Streams
With polling and callbacks, Foo
lets its clients perform useful work as they wait for the output of long-lived processes that execute at foo
endpoints. The approach however does not directly address the case in which the output itself is a large dataset.
In this case, clients must still block waiting for the whole dataset to be transferred before they can start processing it. They also need to allocate enough local resources to contain the dataset in its entirety. Similar demands are faced by foo
, which needs to produce and hold the entire dataset before it can pass it to its clients. Thus large datasets may reduce the responsiveness of clients and the capacity of service endpoints.
foo
and its clients may avoid these issues if they produce and consume data as streams. A stream is a lazily-evaluated sequence of data elements. Clients consume the elements as these become available, and discard them as soon as they are no longer required. Similarly, endpoints produce the elements as clients consume them, i.e. on demand.
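The notion of a lazily-evaluated sequence can be illustrated locally with a plain java.util.Iterator. The sketch below has nothing to do with remote transfers or with the gRS2 API. It only shows that elements are computed when the consumer asks for them and are never accumulated.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// A lazily-evaluated sequence of squares: each element is computed only
// when the consumer asks for it, and nothing is retained afterwards.
final class Squares implements Iterator<Long> {
    private final long limit;
    private long next = 1;
    long produced = 0; // counts the elements actually computed, for illustration

    Squares(long limit) { this.limit = limit; }

    @Override
    public boolean hasNext() { return next <= limit; }

    @Override
    public Long next() {
        if (!hasNext()) throw new NoSuchElementException();
        produced++;
        long value = next * next;
        next++;
        return value;
    }
}
```

A consumer that reads only the first two elements of a million-element sequence triggers only two computations.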
Streaming is used heavily throughout the system as the preferred method of asynchronous data transfers between clients and services. The gRS2 library provides the required API and the underlying implementation mechanisms, including paged transfers and memory buffers which avoid the cumulative latencies of many fine-grained interactions. The API allows services to “publish” streams, i.e. to make them available at a network endpoint through a given protocol. Clients obtain references to such endpoints, i.e. stream locators, and resolve locators to iterate over the elements of the streams. Services produce elements as clients require them, i.e. on demand.
Data streaming is used in a number of use cases, including:
-
foo
streams the elements of a persistent dataset; -
foo
streams the results of a query over a persistent dataset; -
foo
derives a stream from a stream provided by the client;
The last is a case of circular streaming. The client consumes a stream which is produced by the service by iterating over another stream, which is produced by the client. Examples of circular streaming include:
- bulk lookups, e.g.
foo
streams the elements of a dataset which have the identifiers streamed by the client; - bulk updates, e.g.
foo
adds a stream of elements to a dataset and streams the outcomes back to the client;
More complex use cases involve multiple streams, producers, and consumers.
The advantages of data streaming are offset by an increased complexity in the programming model. Consuming a stream can be relatively simple, but:
- the assumption of remote data puts more emphasis on correct failure handling at foo and its clients;
- since streams may be partially consumed, resources allocated by foo for streaming need to be explicitly released;
- consumers that act also as producers need to remain within the stream paradigm, i.e. avoid the accumulation of data in main memory as they transform elements of input streams into elements of output streams;
- implementing streams is typically more challenging than consuming streams. Filtering out some elements or absorbing some failures requires look-ahead implementations. Look-ahead implementations are notoriously error-prone;
- stream implementations are typically hard to reuse (particularly look-ahead implementations);
Thus streaming raises significant opportunities as well as non-trivial programming challenges. The gRS2 API provides sophisticated primitives for data transfer, but it remains fairly low-level when it comes to producing and consuming streams.
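To give a flavour of why look-ahead implementations are delicate, here is a minimal filtering iterator written against java.util.Iterator. It is a sketch for illustration, not code from gRS2 or the streams library. Note that hasNext() has to consume the source in order to answer, and must remain idempotent.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Filters another iterator without accumulating its elements. hasNext() must
// look ahead in the source to find out whether a matching element exists.
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> filter;
    private T lookAhead;
    private boolean hasLookAhead;

    FilteringIterator(Iterator<T> source, Predicate<T> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Idempotent: repeated calls must not skip elements.
        while (!hasLookAhead && source.hasNext()) {
            T candidate = source.next();
            if (filter.test(candidate)) {
                lookAhead = candidate;
                hasLookAhead = true;
            }
        }
        return hasLookAhead;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        hasLookAhead = false;
        T result = lookAhead;
        lookAhead = null; // do not retain consumed elements
        return result;
    }
}
```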
The streams
library provides the abstractions required to simplify further stream-based programming in simple and complex scenarios. It implements a DSL for stream manipulation which is built around the Stream
interface, an extension of the familiar Iterator
interface. The DSL simplifies a range of stream transformations, making it easy to change, filter, group, and expand the elements of input streams into elements of output streams. The DSL also allows clients to configure failure handling policies and event notifications for stream consumption, and it simplifies the publication of streams as gCube ResultSets
.
Foo
relies on the DSL of the Streams
API whenever its methods need to take and/or return streams.
For example, if foo can stream the results of a given query, Foo may provide its clients with the following method:
Stream<Item> query(Query query) throws ... ;
where Item and Query model, respectively, the elements of a remote dataset and a query issued against that dataset, and where the output Stream gives access to a remote gCube ResultSet produced by foo. Clients are free to access the locator of the stream with Stream.locator() and consume it with the lower-level gRS2 API, if required.
Similarly, if foo
can stream the elements with given identifiers, Foo
may define the following method:
Stream<Item> lookup(Stream<Key> ids) throws ... ;
where Key
models Item
identifiers. By taking a Stream
as input, Foo
promises to publish the stream on behalf of clients and to send the corresponding locator to foo
.
Since clients may want to remain in charge of publication, Foo
overloads lookup()
as follows:
Stream<Item> lookup(URI idRs) throws ... ;
i.e. accepts directly the locator of a gCube ResultSet
of keys which has already been published by the client, or by some other party further upstream.
Both query()
and lookup()
model failures according to the strategy outlined above, with the understanding that these are failures that may occur only before foo starts producing streams (including failures thrown by DefaultFoo
before calls are actually issued). Failures raised by foo
in the context of producing streams will instead be delivered during Stream iteration, in accordance with the specification of the Stream
API. In particular, unchecked ServiceException
s and checked contingencies will be found as the cause of StreamException
s.
Finally note that Foo
may return streams through polling and callbacks if foo
can start producing them only at the end of long-lived processes, e.g.:
Future<Stream> pollStream(...) throws ... ;
or
Future<?> callbackStream(...,Callback<Stream> callback) throws ... ;
Service Instances
As discussed above, stateful services share a number of design elements:
- there is a companion factory service, typically stateless, which creates service instances;
- service instances have a lifetime which may be independent from their endpoint’s;
- service instances have properties whereby they can be discovered;
If foo
is stateful its CL includes a proxy interface to its companion foo-factory
service, e.g. FooFactory
. FooFactory
and its implementation are designed with the same patterns and principles discussed so far for Foo
, including direct and discovery modes for its proxies. The relationship between FooFactory
and Foo is explicitly captured by one or more factory methods in FooFactory
, e.g.:
public Foo create(...) throws ...;
create()
triggers the creation of a foo
instance and returns a Foo
proxy bound to that instance. Its design is otherwise governed by the principles already discussed for other proxy methods. In particular, create()
may be overloaded, may be asynchronous, and may return a collection if it results in the creation of multiple foo
instances. If and when appropriate, it may also be named to reflect more adequately the semantics of the operation (e.g. newSource()
or startJob()
).
As it is explicitly created, a foo
instance may be explicitly destroyed. This can be accomplished through a method destroy()
in the API of Foo
:
void destroy() throws ServiceException;
The method takes no input and has no outcome other than a potential failure. Its semantics is ultimately service-specific, though its side-effects typically include the release of computational resources at the target foo instance. In all cases, it is likely that foo
will place tight security requirements on its invocations.
Besides a lifetime, foo
instances have properties and, while the main role of such properties is to characterise instances for discovery purposes, there may be a requirement for exposing them to clients directly through proxies. In this case, one obvious option is to extend Foo
with accessor methods and, where applicable, mutator methods with appropriate bindings for property values. Often, a better option is to factor out accessors and mutators in a separate FooProperties
class and extend Foo
with the following methods:
FooProperties properties() throws ServiceException;
void setProperties(FooProperties properties) throws ServiceException;
This capitalises on the standards adopted within the system to retrieve and update instance properties in bulk. Clients would obtain all the instance properties with a single remote call, inspect them or change them locally as required, and then commit any change with another single remote call. FooProperties
will not define mutators for read-only properties, and setProperties()
can be excluded altogether if all the properties are read-only.
Notice that lifetime-related and property-related methods operate directly on the state of foo
instances. As such, they may generate undesired side-effects when clients invoke them on proxies created in discovery mode in the middle of a session. We have discussed the issue above in general terms, and indicated a minimal solution in the terms of ‘sticky session’ configuration on the proxies. Other, more structured and explicit solutions may be preferred if a large class of use cases assumes session-based clients and access to instance properties. For example, destroy()
, as well as property accessor and mutator methods, may be collected in a dedicated FooInstance
interface and Foo
may be extended with a method that returns a (private) implementation of the interface:
FooInstance toInstance() throws ServiceException;
Having to invoke toInstance()
on a proxy clarifies to clients that the operations they may invoke on the returned value are conceptually different from the other operations declared by Foo
, in that they explicitly operate on the state of the foo
instance currently bound to the proxy. Thus FooInstance
makes clients more aware of the binding and caching strategies of proxies, i.e. that the proxy may be bound to another instance during the session. This reduces the chance that clients overlook possible inconsistencies, and improves the readability of their code. Of course, FooInstance also makes their code more verbose, and introduces noise in the use of proxies in direct mode.
Context Management
Proxies invoke the remote operations of foo
in a context which encompasses more information than the target service endpoint and the input parameters of the calls. In particular, calls occur always in a given scope and conditionally to the provision of credentials about the caller. An attempt to call foo
in no particular scope, or in a scope in which the target endpoint does not exist, as well as calls that are issued anonymously will be rejected, either by proxies or by their bound endpoints.
We discuss below how this contextual information is made available to proxies.
Scope Management
One way of providing DefaultFoo
instances with scope information is to require their immediate callers to specify one when the instances are created. Making scope explicit, however, induces clients to propagate scope information across their call stack, and this may easily prove intrusive for their design.
A less intrusive approach is to bind scope information to the threads in which DefaultFoo
instances issue remote calls. Clients remain responsible for making the binding, but they can do so further up the call stack, as early as scope information becomes available to them. Client components that execute on the stack thereafter need have no design dependencies on scope.
To implement this scheme, DefaultFoo
relies on the common-scope
library, which provides the tools required to bind and propagate scope as thread-local information. In particular, common-scope
models scope as plain String
s and includes a ScopeProvider
interface with methods to bind a scope with the current thread (ScopeProvider.set(String)
), obtain the scope bound to the current thread (ScopeProvider.get()
), and remove the scope bound to the current thread (ScopeProvider.remove()
). ScopeProvider
gives also access to a single instance of its default implementation, which can be shared between clients and DefaultFoo
(the constant ScopeProvider.instance
).
Thus a client component high up the call stack binds a scope to the current thread as follows:
String scope = ...;
ScopeProvider.instance.set(scope);
and, lower down the call stack, DefaultFoo
obtains the same scope as follows:
String scope = ScopeProvider.instance.get();
Note that:
- since the shared ScopeProvider is based on an InheritableThreadLocal, DefaultFoo may execute in any child thread of the bound thread;
- if the current thread and its ancestors are unbound, the shared ScopeProvider attempts to resolve scope from the system property gcube.scope. When clients operate in a single scope, this property can be set when the JVM is launched and clients can avoid compile-time dependencies on ScopeProvider altogether;
- clients that reuse threads to call foo in different scopes will need to explicitly unbind threads, and typically will do so in the same component that binds them;
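The behaviour described in these notes can be approximated with the following sketch. It is not the common-scope implementation: SketchScopeProvider is a hypothetical class that only mirrors the set/get/remove methods named above, on the assumption that an InheritableThreadLocal with a system-property fallback is all that is needed.

```java
// Sketch of a thread-bound scope holder; not the common-scope implementation.
final class SketchScopeProvider {

    // Single shared instance, in the spirit of ScopeProvider.instance.
    static final SketchScopeProvider instance = new SketchScopeProvider();

    // Child threads inherit the value bound in their parent at creation time.
    private final InheritableThreadLocal<String> scope = new InheritableThreadLocal<>();

    void set(String value) { scope.set(value); }

    String get() {
        String bound = scope.get();
        // Fall back to the system property when no thread in the lineage is bound.
        return bound != null ? bound : System.getProperty("gcube.scope");
    }

    void remove() { scope.remove(); }
}
```

Clients that reuse threads would typically call set() before issuing calls and remove() in a finally block, so that a pooled thread never carries a stale scope into its next task.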
Security Management
The gCube security model is based on two pillars:
- the SOA3 Security Framework, which implements the username/password security factor;
- Transport Layer Security (TLS), in particular HTTP over TLS (HTTPS) based on a Public Key Infrastructure.
The SOA3 Security Framework provides authentication and authorization based on the username/password security factor and Attribute Based Authorization. Transport Layer Security is a cryptographic protocol derived from Secure Sockets Layer (SSL): it is based on public key cryptography [1], symmetric encryption [2] and message authentication codes [3], and provides digital signatures, privacy and integrity.
Clients can access gCube resources with different security mechanisms, in particular:
- username/password previously registered on the infrastructure
- client HTTPS authentication
- both
The client sends username and password to the server in the HTTP Authentication Header of the message: this mechanism is very similar to the Scope Propagation described in the previous section. For this reason it is possible to adopt a similar design, based on a DefaultFoo proxy which decouples the client's code from gCube-based code.
Username and password can be set at client level with the method:
CredentialManager.instance.set(username, password);
Since CredentialManager is a shared singleton class based on InheritableThreadLocal, DefaultFoo can retrieve the credentials with the method:
String headerString = CredentialManager.instance.getAuthHeader();
and add the Authentication Header to the message. To obtain total decoupling with zero dependencies, username and password can instead be set in two JVM properties:
gcube.username
gcube.password
which CredentialManager uses as a last resort when no credentials have been set in the current thread or in its parent threads. In this way non-gCube Java clients can behave as secure gCube clients using only a proxy and without any modification to client code.
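The credential lookup described above can be sketched along the same lines as the scope provider. The class name, the instance constant, set(username, password), getAuthHeader() and the two JVM properties come from the text; everything else is an assumption made for illustration. In particular, the text does not specify the header encoding, so the sketch assumes HTTP Basic encoding, and remove() is a hypothetical addition for clients that reuse threads.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: names follow the text, the body is an assumption.
final class CredentialManager {
    static final CredentialManager instance = new CredentialManager();

    // Holds a {username, password} pair, inherited by child threads.
    private final InheritableThreadLocal<String[]> credentials = new InheritableThreadLocal<>();

    private CredentialManager() {
    }

    void set(String username, String password) {
        credentials.set(new String[] {username, password});
    }

    // Hypothetical: lets clients that reuse threads unbind credentials.
    void remove() {
        credentials.remove();
    }

    // Returns the header value, or null when no credentials can be found.
    String getAuthHeader() {
        String[] pair = credentials.get();
        // Last resort: the JVM properties named in the text.
        String username = pair != null ? pair[0] : System.getProperty("gcube.username");
        String password = pair != null ? pair[1] : System.getProperty("gcube.password");
        if (username == null || password == null) {
            return null;
        }
        // ASSUMPTION: HTTP Basic encoding; the actual format is not given in the text.
        byte[] token = (username + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(token);
    }
}
```

With this shape, a non-gCube client launched with -Dgcube.username=... and -Dgcube.password=... never references CredentialManager itself: only the proxy does.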
The second pillar is classical HTTPS, which is strongly recommended for accessing protected resources on the gCube infrastructure: ideally, gCube Nodes should expose resources to external clients only over HTTPS. A gCube HTTPS client is configured like any JSSE-based client, with the same features and options [4]. The following table summarises the most important JSSE properties:
| Property | Description |
|---|---|
| javax.net.ssl.keyStore | keystore location |
| javax.net.ssl.keyStorePassword | keystore password |
| javax.net.ssl.keyStoreType | keystore file format |
| javax.net.ssl.trustStore | truststore location |
| javax.net.ssl.trustStorePassword | truststore password |
| javax.net.ssl.trustStoreType | truststore file format |
The client's JVM truststore must contain the gCube CA certificate in order to trust the server.
As an alternative to (or in combination with) the username/password approach, gCube supports certificate-based login: in other words, the gCube Security Module can trust a request and grant access privileges on the basis of a digital signature. The basic requirement is that the Distinguished Name of the client's certificate has been associated with a valid user of the infrastructure at registration time. In this case, a valid client certificate (preferably a user certificate, although a host certificate also works) should be added to the client's JVM keystore.
If the client does not present a valid certificate, even when HTTPS is used, a hybrid approach must be taken: in particular, username and password should be set as described above.
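The JSSE properties in the table can be passed on the command line (as -D options) or set programmatically before the first HTTPS connection is opened. The sketch below takes the programmatic route. JsseSettings is a hypothetical helper that is not part of gCube or JSSE, and the locations, passwords and store types used with it are placeholders; only the property names are taken from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: it only collects and sets the standard JSSE
// system properties listed in the table.
final class JsseSettings {

    // Truststore settings: let the client trust the server's CA certificate.
    static Map<String, String> trustStore(String location, String password, String type) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("javax.net.ssl.trustStore", location);
        props.put("javax.net.ssl.trustStorePassword", password);
        props.put("javax.net.ssl.trustStoreType", type);
        return props;
    }

    // Keystore settings: only needed for certificate-based login.
    static Map<String, String> keyStore(String location, String password, String type) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("javax.net.ssl.keyStore", location);
        props.put("javax.net.ssl.keyStorePassword", password);
        props.put("javax.net.ssl.keyStoreType", type);
        return props;
    }

    // Should run before the first HTTPS connection, because the default
    // SSL context reads these properties when it is first initialised.
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }
}
```

A client that relies on username and password only would apply the truststore settings alone; a client that logs in with a certificate would apply both maps.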
- [1] http://en.wikipedia.org/wiki/Public-key_cryptography
- [2] http://en.wikipedia.org/wiki/Symmetric-key_algorithm
- [3] http://en.wikipedia.org/wiki/Message_authentication_code
- [4] http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html
Session Management
Coding Guidelines
Naming Conventions
@TODO: introduce and motivate naming conventions for interfaces, classes, packages.
Appendix A: Specifications
@TODO: briefly summarise the model in terms of “may”, “should”, “must” specifications.
Appendix B: API
@TODO: list interfaces and classes defined by the model.
Appendix C: Framework Requirement and Guidelines
@TODO: identify scope for framework support.