Difference between revisions of "Archive Import Service"

From Gcube Wiki
The Archive Import Service (AIS) is in charge of defining collections and importing their content into the gCube infrastructure, by interacting with the collection management service, the content management service and other services at the content management layer, such as the Metadata Manager Service. While the functionality it offers is, from a logical point of view, well defined and rather confined, the AIS must be able to deal with a large number of different ways to define collections, and must offer extensibility features to accommodate types of collections and ways of describing them that were not known or required at the time of its initial design. These needs affect both the architecture of the service and its interface. From an architectural point of view, the AIS allows new functionality to be added through pluggable software modules. At the interface level, high flexibility is ensured by relying on a scripting language, AISL, to define import tasks, rather than on an interface with fixed parameters. The language is designed to support the most common tasks needed during the definition of an import task, is based on a flexible data model used to describe the resources to import, and is itself extensible with new features. Besides the interface based on AISL, the AIS also offers other interfaces, called adapters, that ease the specification of common tasks and are also pluggable (i.e. new adapters can be added to the AIS). As importing collections might be an expensive task, resource-wise, the AIS offers features that can be used to optimize import tasks. In particular, it supports incremental import of collections. The description that follows first introduces the rationale behind the functionality of the AIS and its overall architecture, then describes its scripting language AISL, its extensibility features, and the concepts related to incremental import. Finally, it presents the interface of the service.
 
  
== Introduction ==

== Importing Resources ==

The role of the Archive Import Service is to import resources that are external to the gCube infrastructure into the infrastructure itself. "Importing", here and in the following, refers to the description of such resources inside the Information Organization stack of services, and not necessarily to the fact that the content of such resources is actually stored within facilities that belong to the infrastructure. Similarly, the kinds of resources that can be imported are not necessarily objects that exist physically. The association between an image and a file containing metadata referring to it might not exist physically, but can still be considered a resource that can be imported. The same holds for a collection of images. The term "external resource" is used to denote any resource outside the gCube infrastructure (regardless of whether it has already been imported or not). The term "internal resource" is used to denote the entity that represents an external resource inside the gCube infrastructure. This is normally an object or a relationship of the info-object model, and is identified, inside the gCube infrastructure, by an Object Identifier.
  
Overall, the execution of an import task can be thought of as divided into two main phases: 1) the representation phase and 2) the import phase. During the first phase, the resources to be imported and their relationships are modeled using a graph-based model. This representation is used during the import phase by appropriate "importers", software modules that encapsulate the logic needed to import specific kinds of resources, interfacing with the appropriate services.
 
  
'''Representation phase'''
The service should support the import by providing a way to specify which resources should be imported and how, and offer facilities to automate this task whenever possible. During this phase, the resources to be imported are modeled using a graph-based data model. The model contains three main constructs: collection, object and relationship, thus resembling the collection and information object models on which the content management services are based. The constructs in the modeling language can be used to assemble a graph of resources, specifying their content and relationships. Each construct has a type, and can be annotated with a number of properties. The type of a resource is a marker used by importers to select the resources they are dedicated to. The properties are just name-value pairs, whose values can be of any Java type. Notice that only some types of resources and their properties are fixed in advance. In particular, the existing types are used to allow the import of content and metadata. To support the extension of the functionality of the AIS, it is possible to define new types, annotated with different properties.
  
'''Import phase'''
 
During this phase, an import engine dynamically loads and invokes a number of importers, modules that are responsible for importing specific kinds of resources. Each importer inspects the representation graph, identifies the resources for which it is responsible, and imports them. Besides performing the import, an importer may also further annotate the resources in the graph. These annotations are used in later executions of tasks that involve the same resources, and may also be exploited by other importers. For example, when importing metadata related to some content, the content objects to which the metadata objects refer should already have been stored in the content management service, and should have been annotated with the identifiers used by that service. Similar considerations hold for content collections. Importers able to handle content objects and metadata objects are already provided by the AIS. Additional importers, dedicated to storing specific kinds of objects, can be added.
 
  
At the end of the import phase, the resource graph created during the representation phase and annotated during the import phase is stored persistently.
 
  
  
== Import procedure ==

The task of importing a set of external resources is articulated in two major steps. First, a description of the resources to import is built. This description is based on a custom data model, which is described later on in this document. The resulting description is essentially a graph whose nodes and arcs are labelled with key-value pairs. This description is called a graph of resources (GoR). Second, the GoR built during the first step is processed by a chain of software modules called "importers". Each importer is dedicated to importing resources by interacting with a specific subpart of the Information Organization set of services. For instance, the metadata importer is responsible for the import of metadata, and handles it by interacting with the Metadata Management Service. The precise way in which importers handle the import task, and in particular how they define the description of the resources they need to consume inside a GoR, is left to the individual importers. Details about the importers already included in the Archive Import Service are given below in this document. The creation of a graph of resources is done through the execution of a script written in the AISL language, which is described later on in this document.
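
The two-step procedure can be pictured with a small Java sketch. It is an illustration only: the names `GraphOfResources`, `Importer` and `runChain` are hypothetical and are not the actual AIS classes, the graph is reduced to a map from external identifiers to property maps, and the identifier `oid-1` is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a graph of resources:
// each node is a bag of key-value properties keyed by its external identifier.
class GraphOfResources {
    final Map<String, Map<String, Object>> nodes = new LinkedHashMap<>();

    Map<String, Object> node(String externalId) {
        return nodes.computeIfAbsent(externalId, id -> new LinkedHashMap<>());
    }
}

// Hypothetical importer contract: inspect the graph, import what you own,
// and optionally annotate nodes for the importers that follow.
interface Importer {
    void process(GraphOfResources gor);
}

class ImporterChainSketch {
    // Second step: run every importer over the same graph, in order.
    static void runChain(GraphOfResources gor, List<Importer> chain) {
        for (Importer importer : chain) {
            importer.process(gor);
        }
    }

    static GraphOfResources demo() {
        // First step: build the description (normally done by an AISL script).
        GraphOfResources gor = new GraphOfResources();
        gor.node("http://mydomain.org/myimage.jpg").put("subtype", "content");
        gor.node("http://myotherdomain.org/mymetadata.jpg").put("subtype", "metadata");

        List<Importer> chain = new ArrayList<>();
        // Content importer: pretends to store content and records a made-up id.
        chain.add(g -> g.nodes.values().stream()
                .filter(n -> "content".equals(n.get("subtype")))
                .forEach(n -> n.put("documentId", "oid-1")));
        // Metadata importer: only marks metadata nodes as handled.
        chain.add(g -> g.nodes.values().stream()
                .filter(n -> "metadata".equals(n.get("subtype")))
                .forEach(n -> n.put("imported", true)));

        runChain(gor, chain);
        return gor;
    }

    public static void main(String[] args) {
        System.out.println(demo().nodes);
    }
}
```

Each importer only touches the nodes whose subtype it recognizes, which is the division of labour the text describes.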
  
== Interface to the AIS ==

== Data Model ==

The data model handled by AISL features three main types of constructs:

- Collection
- Resource
- Relationship

A set of objects of these three main types, built by an AISL script, forms a so-called Graph of Resources (GoR). A graph of resources is a graph composed of nodes (resources) and edges (relationships). Furthermore, the collection construct allows nodes (resources) to be grouped into sets. All constructs of the model can be annotated with properties, which are name/value pairs.

The three main types of constructs cannot be instantiated directly. Instead, objects of specific subtypes of these constructs must be instantiated. These subtypes are defined by specific plugins called importers, which manage the import of different kinds of resources inside the gCube infrastructure. The precise semantics of the properties attached to types and the precise use of the constructs are not fixed in advance, but are determined by the specific importer that defines and manages a specific subtype. In particular, there is no direct correspondence between constructs in the GoR and how the resources are represented inside the gCube Information Organization facilities.
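
As a rough mental model, the three constructs can be sketched as Java classes. This is an assumption-laden illustration: the class names (`ModelObject`, `ResourceNode`, `RelationshipEdge`, `CollectionSet`) and the `from`/`to` fields are hypothetical and do not reflect the actual AIS implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three constructs; not the actual AIS classes.
abstract class ModelObject {
    final String subtype;  // defined by an importer, e.g. "content"
    final Map<String, Object> properties = new LinkedHashMap<>();  // name/value pairs

    ModelObject(String subtype) {
        if (subtype == null || subtype.isEmpty()) {
            // The basic constructs cannot be instantiated without a subtype.
            throw new IllegalArgumentException("a subtype is required");
        }
        this.subtype = subtype;
    }
}

// A node of the graph.
class ResourceNode extends ModelObject {
    ResourceNode(String subtype) { super(subtype); }
}

// An edge of the graph, linking two nodes.
class RelationshipEdge extends ModelObject {
    final ResourceNode from;
    final ResourceNode to;

    RelationshipEdge(String subtype, ResourceNode from, ResourceNode to) {
        super(subtype);
        this.from = from;
        this.to = to;
    }
}

// A set of nodes.
class CollectionSet extends ModelObject {
    final List<ResourceNode> members = new ArrayList<>();

    CollectionSet(String subtype) { super(subtype); }
}

class DataModelSketch {
    static RelationshipEdge demo() {
        CollectionSet images = new CollectionSet("content");
        ResourceNode image = new ResourceNode("content");
        image.properties.put("documentName", "myimage");
        images.members.add(image);

        ResourceNode description = new ResourceNode("metadata");
        return new RelationshipEdge("metadata", description, image);
    }

    public static void main(String[] args) {
        RelationshipEdge edge = demo();
        System.out.println(edge.from.subtype + " -> " + edge.to.subtype);
    }
}
```

Making the base class abstract mirrors the rule that only subtypes of the three constructs can be instantiated.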

Besides being annotated with properties, each construct of the model must be assigned an external identifier. An external identifier is a string that uniquely identifies a certain external resource and the model object that refers to it. This identification must hold across multiple, different invocations of the AIS.

For instance, consider the two files:
1) http://mydomain.org/myimage.jpg
2) http://myotherdomain.org/mymetadata.jpg

representing respectively an image and a file of metadata describing it. Here, we have three external resources to import: the files 1) and 2) and the association between the two. This set of resources can be represented by instantiating two model objects of type "resource", with subtypes "content" and "metadata" respectively, and a model object of type "relationship" and subtype "metadata". Furthermore, the GoR should contain two model objects of type collection, with subtypes "content" and "metadata" respectively, which will contain the objects referring to resources 1) and 2).

When all these resources are created, they must be assigned an external identifier, which specifies their identity in the real world. So, for instance, it would be possible to choose the string "http://mydomain.org/myimage.jpg" to identify resource 1). The first time an import task that specifies this GoR is run, the AIS will create an internal resource referring to the external resource 1). This internal resource will receive an (internal) object identifier. If another import task is executed, and another resource with external identifier "http://mydomain.org/myimage.jpg" is created, the AIS will treat it as the same resource, so it will not create another internal resource but will modify, if needed, the one already created.
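
The create-once, update-afterwards behaviour can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the class `IdentityRegistrySketch`, its methods and the `oid-N` identifier format are hypothetical, and an in-memory map stands in for whatever persistent storage the AIS actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry that outlives a single import task; the sketch
// keeps everything in memory instead of storing it persistently.
class IdentityRegistrySketch {
    private final Map<String, String> oidByExternalId = new HashMap<>();
    private final Map<String, Map<String, Object>> internalByOid = new HashMap<>();
    private int nextOid = 1;

    // Returns the internal object identifier for the external identifier,
    // creating the internal resource on first sight and updating it afterwards.
    String importResource(String externalId, Map<String, Object> properties) {
        String oid = oidByExternalId.get(externalId);
        if (oid == null) {
            oid = "oid-" + nextOid++;  // made-up identifier format
            oidByExternalId.put(externalId, oid);
            internalByOid.put(oid, new HashMap<>());
        }
        internalByOid.get(oid).putAll(properties);
        return oid;
    }

    int internalResourceCount() {
        return internalByOid.size();
    }

    Object property(String oid, String name) {
        return internalByOid.get(oid).get(name);
    }

    public static void main(String[] args) {
        IdentityRegistrySketch registry = new IdentityRegistrySketch();
        String url = "http://mydomain.org/myimage.jpg";
        String first = registry.importResource(url, Map.of("documentName", "old name"));
        String second = registry.importResource(url, Map.of("documentName", "new name"));
        System.out.println(first.equals(second));
        System.out.println(registry.internalResourceCount());
    }
}
```

Running the same external identifier through the registry twice yields the same internal identifier and a single internal resource, whose properties reflect the second import.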

== The AISL Language ==

AISL is a scripting language intended to create graphs of resources for subsequent import. Its main features are a tight integration with the AIS, in the sense that the creation of model objects is a first-class construct of the language, and the ability to handle tasks that are frequent during import, like accessing files at remote locations, in a way that is as transparent as possible to the user. Besides avoiding the complexity of full-fledged programming languages like Java, its limited expressivity - tailored to import tasks only - prevents security issues related to the execution of code in remote systems. The language does not allow the definition of new functions or types of objects inside a program, but can easily be extended with new functionality by defining new functions as plugin modules. This feature is detailed later on in this document.

=== Type System ===

The language is currently untyped. Variables can be assigned any kind of object from the Java type system. However, it is planned to enforce at least partial static type checking in the future, to allow early detection of errors in scripts. For this reason, the grammar already requires variables to be declared before their use. Even though the language processor does not currently check for type compliance, script authors are strongly advised to use the appropriate types, in order to reduce the effort of converting scripts later on. The types supported by the language (i.e. those for which the language allows an explicit variable declaration) are:

Primitive types:
- integer
- float
- boolean
- string
- list
- file

Model object types:
- collection
- resource
- relationship

Notice that AISL is not an object-oriented language. Even if some of the types correspond to Java types, there are no methods or fields. Access to the properties of objects is instead possible through appropriate functions. So, for example, the size of a list value can be obtained by invoking the function listsize() on it. However, model object types have "properties" that resemble fields in object-oriented programming languages and that can be accessed through the familiar dot (".") notation.

Values of the primitive types are treated as in the Java language. As in Java, strings can be created by supplying a value between double quotes, and can be concatenated using the "+" operator. Values of the "list" type are similar to lists in the Java language, but array-like selection of their elements is supported, and the language provides a built-in constructor for lists (see the section on type constructors below). As in Java, list indexes are zero-based. Lists are currently heterogeneous (i.e. they can contain values of different types). In future releases, it is planned to provide facilities to enforce list-type checking. The "file" type represents local or remote external resources in a way that is independent of the specific protocol used to access them.

The types collection, resource and relationship correspond to the model object constructs introduced in the previous sections. Variables of these types are used to hold instances of objects of the resource graph data model.

=== Syntax ===

This section briefly describes the syntax and semantics of AISL constructs, focusing especially on aspects in which the language differs from the Java programming language, to which it is close. A formal description of the syntax can be found in the appendix following this document.

An AISL script is a sequence of instructions, each of which is a variable declaration, a conditional statement, a loop statement or one of certain kinds of expressions, such as variable assignments and function invocations. Values for the various AISL types can be built through appropriate type constructors. Values can then be manipulated and composed using expressions of various kinds, which are in most cases similar to the corresponding expressions in the Java programming language.

==== Type constructors ====

'''Primitive type constructors'''

'''integer'''

Integer values are sequences of digits, starting with a non-zero digit, or a single '0' digit. In other words, they match the production

 #DECIMAL_LITERAL: "0"|(["1"-"9"] (["0"-"9"])*)

'''float'''

Floating point values are expressed by an integer part, followed by a decimal point and an optional fractional part:

 FLOATING_POINT_LITERAL: (["0"-"9"])+ "." (["0"-"9"])*

'''boolean'''

The words true and false are reserved words in AISL, and they are interpreted as the corresponding boolean values.

'''string'''

String literals are sequences of characters between double quotes. Special characters like newline and tab are escaped and treated as in the Java programming language.

'''list'''

Lists are built by enclosing a list of expressions separated by commas in curly brackets. For example:

 list myList = {3+4, 56, "a"};

'''file'''

File objects are built by invoking the constructor functions getFile(string locator) and getFile(string locator, list<string> accessinformation). A locator is a string encoding a location and a protocol to be used to access the file. For instance, the instruction:

 file f = getFile("ftp://ftp.example.org/pub/share/myfile.xml");

builds a file object that accesses its content through the ftp protocol at the given location. The format of the locator string is not defined in advance, as it depends on the specific protocol used. Currently supported protocols are ftp, http and file. They all accept a URL as locator. Different formats may be provided by different subtypes of the AISLFile class. This is described in more detail later on in this document. The two-argument constructor allows login information that might be needed to access remote resources to be passed in.

'''Model Object Constructors'''

These constructors create elements of the resource graph that is later used for import by the service. Once created, the properties of an object can be modified, but the object itself cannot be deleted. In other words, it is sufficient to invoke one of these constructors for the object to be in the final graph of resources. In general, all constructors require:

- the type of construct to be created (i.e. collection, resource or relationship)
- the specific subtype of the object. This subtype should be defined in the context of a specific importer, by appropriately subclassing the class defining the basic construct. This allows checks to be performed during the parsing of the script, e.g. on the properties of constructs. For example, the type collection::content is a subtype of the type collection that defines the properties collectionName, isVirtual and isUser.
- a unique "external identifier". This string value uniquely identifies a certain construct, so that it can be recognized during subsequent import phases.
- in the case of resources, optionally, a list of collections to which the resource must belong
- in the case of relationships, the resources that the relationship links.

Furthermore, the body of the constructor allows one or more of the properties defined by the construct, if any, to be initialized. The names, types and precise semantics of these properties are described in the section about importers.

Examples of constructors are as follows:

 collection metadatacollection = collection::metadata["medspiration_test_metadata"]{
     collectionName = "medspiration_test_metadata",
     collectionDescription = "test for the AIS with medspiration data",
     relatedContentCollection = contentcollection,
     isUser = true,
     isIndexable = false,
     metadataName = "dc",
     metadataLanguage = "en",
     metadataSchemaURI = "http://www.opendlib.com/resources/schemas/metadata_dc.xsd"
 };

This constructor defines a metadata collection. Here the type is "collection", the subtype is "metadata", and the external identifier is given by a static string ("medspiration_test_metadata"). The body of the constructor initializes a number of properties specific to the "metadata" subtype. Notice that the object created by the constructor is then assigned to a variable of the appropriate type (collection).

 resource::content[url] in ccoll{
     isVirtualImport = false,
     contentSourceLocator = url,
     documentName = name,
     hasMaterializedContent = false
 };

This constructor defines a resource of subtype content. The external identifier here is a variable (url) that must evaluate to a string. Furthermore, the object is specified to belong to a specific collection, again using a variable (ccoll) holding an instance of a (content) collection.

 relationship rel = relationship::metadata(metadata, content)["metadatarel"+url]{};

This constructor specifies a relationship of subtype metadata. The pair of variables (metadata, content) specifies the resources to and from which the relationship holds. The external identifier is computed as an expression evaluating to a string.

==== Expressions ====

'''Arithmetic Expressions'''

Numeric types (integer and float) can be combined using the same operators available in the Java programming language, i.e. the unary operators + and - and the binary operators +, -, /, * and %. These operators have the same precedence and semantics as in Java. If the operands of a binary operator have different types, the type of the result is always "float".

'''Relational Expressions'''

The relational operators ==, !=, <, <=, >, >= have the same precedence and semantics as in Java. They all evaluate to a boolean value and they can all be applied to numeric values. Furthermore, the operators == and != can be applied to all other types.

'''Boolean Expressions'''

Boolean expressions are built from boolean values by applying the unary operator ! (not) and the binary operators | (or), & (and) and ^ (exclusive or), with the same precedence and semantics as in Java. Notice that, unlike Java, AISL does not support the conditional boolean operators || and &&.
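
One practical consequence, assuming AISL really follows the Java semantics of | and & as stated, is that both operands are always evaluated. The Java sketch below shows that behaviour in Java itself; the class `EagerBooleanSketch` and the helper `check` are hypothetical names used only for the demonstration.

```java
class EagerBooleanSketch {
    static int calls = 0;

    // Counts how many times an operand is actually evaluated.
    static boolean check(boolean value) {
        calls++;
        return value;
    }

    public static void main(String[] args) {
        calls = 0;
        boolean eager = check(false) & check(true);   // both operands are evaluated
        System.out.println(eager + " after " + calls + " calls");

        calls = 0;
        boolean lazy = check(false) && check(true);   // Java only: right operand is skipped
        System.out.println(lazy + " after " + calls + " calls");
    }
}
```

With the non-conditional operators, a function call on the right-hand side is executed even when the left operand already determines the result, which matters if that call is expensive (for example, one that accesses a remote file).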

'''Selectors'''

The elements of list-typed values can be obtained with the same syntax that is used in Java to access the elements of arrays. E.g.

 list myList = {3+4, 56, "a"};
 integer myInt = myList[1];

Lists can be nested, and selectors can be combined:

 list myList = {3+4, {45, 10}, "a"};
 integer myInt = myList[1][0];

The properties of model-object-typed values can be accessed by name with a dot notation, e.g.

 resource myContentResource = resource::content[url] in ccoll{
     isVirtualImport = false,
     contentSourceLocator = url,
     documentName = name,
     hasMaterializedContent = false
 };

 myContentResource.documentName = "test";

==== Variable Declarations ====

Variable declarations contain a specification of an AISL type, a variable identifier and an optional initializer. E.g.

 list myList = {3+4, 56, "a"};

==== Functions ====

Function invocation in AISL is analogous to function invocation in Java, except that all functions have global visibility and there are no objects or classes through which to invoke a function. An example is:

 ...
 string mystring = "test";
 boolean matches = match(mystring, "t.*t");
 print(matches);
 ...

This code snippet contains two function invocations, namely of the functions match and print (it prints "true"). AISL comes with a set of predefined functions, described below. New functions can be added to the language. This is described later on in this document.

==== Predefined AISL Functions ====

'''Functions on file'''

These predefined functions provide access to the properties of objects of type file.

- string filename(file f): returns the name of the file
- integer filesize(file f): returns the size of the file
- boolean isdirectory(file f): returns true if this file is a directory
- boolean isfile(file f): returns true if this file is a regular file
- list<file> children(file f): returns a list containing a file object for each of the children of f. The list returned is empty if this file is a regular file (i.e. not a directory) or if the protocol used to access the file does not model a hierarchical filesystem.
- list<file> descendants(file f): returns a list containing a file object for each of the descendants of f, obtained by recursively exploring all subdirectories. Notice that the list contains all files in the subtree rooted at f, not only its leaves (i.e. it also contains all directories that are descendants of f). The list returned is empty if this file is a regular file (i.e. not a directory) or if the protocol used to access the file does not model a hierarchical filesystem.

'''Functions on string'''

- boolean match(string str, string pattern): returns true if the given string matches the given regular expression pattern, false otherwise.
- boolean print(string str): prints str.

'''Functions on list'''

- integer listsize(list l): returns the size of the list l

'''Functions on dom objects'''

- xpath()
- xslt()
- string toString(object o): converts a given dom object into a string (i.e. to its XML serialization).

==== Control Flow Statements ====

AISL contains three control flow statements: if, switch and foreach. The major syntactic difference between these statements and the corresponding ones in the Java language is that the instructions inside the constructs must be enclosed in curly brackets (even when they consist of a single instruction). Notice that these statements are not terminated by a ";" character. Otherwise, if and switch statements are equivalent to their Java counterparts, while the foreach statement has a special syntax.

'''Conditional Statements'''

'''If statements'''

This statement has the same syntax and semantics as in Java, and takes the two forms:

 if( conditional expression ){
     ...
 }

and

 if( conditional expression ){
     ...
 }
 else{
     ...
 }

'''Switch Statement'''

This statement has the same syntax and semantics as in Java:

 switch( expression ){
     case expression1:
         ...
         break;

     ...

     case expressionN:
         ...
         break;

     default:
         ...
         break;
 }

'''Loop statements'''

The AISL language tries to avoid unbounded loops as much as possible. For this reason it does not have a while statement, and its foreach statement only allows bounded loops. In particular, foreach iterates over a range of integer values, with a fixed increment (or decrement).

 foreach loopvariable in [ expression to expression by expression ]{
     ...
 }

The three expressions appearing in the statement correspond to the minimum and maximum value of the range and to the increment. If no increment is given, its value is assumed to be one. The variable loopvariable is defined inside the foreach loop block only, and its value can be read but not assigned. For example:

 foreach i in [0 to listsize(mylist)-1]{
     print(mylist[i]);
 }

This code snippet will print the value of all objects in the list "mylist".
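
For readers more familiar with Java, the range semantics can be approximated as below. The sketch assumes that both bounds are inclusive (as the listsize(mylist)-1 example suggests) and only covers positive increments; the class `ForeachRangeSketch` and its `range` methods are hypothetical helpers, not part of AISL.

```java
import java.util.ArrayList;
import java.util.List;

class ForeachRangeSketch {
    // Values visited by "foreach i in [from to to by step]", assuming both
    // bounds are inclusive and the step is positive.
    static List<Integer> range(int from, int to, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("this sketch only covers positive increments");
        }
        List<Integer> values = new ArrayList<>();
        for (int i = from; i <= to; i += step) {
            values.add(i);
        }
        return values;
    }

    // Omitted increment: assumed to be one, as stated in the text.
    static List<Integer> range(int from, int to) {
        return range(from, to, 1);
    }

    public static void main(String[] args) {
        List<String> mylist = List.of("a", "b", "c");
        // Equivalent of: foreach i in [0 to listsize(mylist)-1]{ print(mylist[i]); }
        for (int i : range(0, mylist.size() - 1)) {
            System.out.println(mylist.get(i));
        }
        System.out.println(range(0, 10, 5));
    }
}
```

Because the bounds and the increment are fixed before the loop starts, the number of iterations is known in advance, which is what makes the loop bounded.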

== Extension Mechanisms ==

The language can be extended by adding functions. Adding a new function amounts to two steps:
1) Creating a new Java class implementing the AISLFunction interface. This is most easily done by subclassing the AbstractAISLFunction class. See below for further details.
2) Registering the function in the "Functions" class. This step will be removed in later releases, which will implement automatic plugin-like registration.

The AISLFunction interface provides a way to specify a number of signatures (number and type of arguments and return type) for an AISL function. It is designed to allow for overloaded functions. The number and types of the parameters are used to perform a number of static checks on the invocation of the function. The method evaluate provides the main code to be evaluated during an invocation of the function in an AISL script. In the case of overloaded functions, this method should redirect to appropriate methods based on the number and types of the arguments.

 public interface AISLFunction {
     public String getName();

     public void setFunctionDefinitions(FunctionDefinition ... defs);
     public FunctionDefinition[] getFunctionDefinitions();

     public Object evaluate(Object[] args) throws Exception;

     public interface FunctionDefinition{
         Class<?>[] getArgTypes();
         Class<?> getReturnType();
     }
 }

A partial implementation of the AISLFunction interface is provided by the AbstractAISLFunction class. A developer can simply extend this class, provide an appropriate constructor and implement the appropriate evaluate method. An example is given below. The function match returns a boolean value according to whether a string value matches a given regular expression pattern. Its signature is thus:

 boolean match(string str, string pattern)

The class Match below is an implementation of this function. In the constructor, the function declares its name and its parameters. The method evaluate(Object[] args), which must be implemented to comply with the interface AISLFunction, performs some casting of the parameters and then redirects the evaluation to another evaluate method. (Note that in this case, as the function is not overloaded, there is no actual need for a separate evaluate method; it has been added here for clarity.)
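
 public class Match extends AbstractAISLFunction{

     public Match(){
         // Name under which the function is invoked in AISL scripts.
         setName("match");
         // Signature of the function: boolean match(string, string).
         setFunctionDefinitions(
             new FunctionDefinitionImpl(Boolean.class, String.class, String.class)
         );
     }

     // Entry point required by AISLFunction: cast the arguments and delegate.
     public Object evaluate(Object[] args) throws Exception{
         return evaluate((String)args[0], (String)args[1]);
     }

     // Typed implementation: true if str matches the regular expression pattern.
     private Boolean evaluate(String str, String pattern){
         return str.matches(pattern);
     }

 }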
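
For an overloaded function, the redirection inside evaluate(Object[] args) could look like the following self-contained sketch. Everything in it is an assumption made for illustration: `EvaluableFunction` is a minimal stand-in for the evaluate contract of AISLFunction (the real interface also carries the name and the function definitions), and the `size` function with its two overloads is hypothetical and not a predefined AISL function.

```java
import java.util.List;

// Minimal stand-in for the evaluate contract of AISLFunction.
interface EvaluableFunction {
    Object evaluate(Object[] args) throws Exception;
}

// Hypothetical overloaded function "size":
//   integer size(string s)  and  integer size(list l)
class SizeFunctionSketch implements EvaluableFunction {

    @Override
    public Object evaluate(Object[] args) {
        // Redirect to the right overload based on number and type of arguments.
        if (args.length == 1 && args[0] instanceof String) {
            return evaluate((String) args[0]);
        }
        if (args.length == 1 && args[0] instanceof List) {
            return evaluate((List<?>) args[0]);
        }
        throw new IllegalArgumentException("no overload of size matches the arguments");
    }

    // Overload for strings: number of characters.
    private Integer evaluate(String s) {
        return s.length();
    }

    // Overload for lists: number of elements.
    private Integer evaluate(List<?> l) {
        return l.size();
    }

    public static void main(String[] args) {
        SizeFunctionSketch size = new SizeFunctionSketch();
        System.out.println(size.evaluate(new Object[] {"test"}));
        System.out.println(size.evaluate(new Object[] {List.of(1, 2, 3)}));
    }
}
```

Keeping one private, typed method per signature keeps the casts in a single place and makes an unsupported argument combination fail with an explicit error.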

=== Importers ===

The Archive Import Service performs the import of external resources by representing them in a Graph of Resources and passing this graph to a chain of software modules called "importers". Each importer is responsible for treating a specific kind of resource (e.g. metadata), and is essentially the bridge between the Archive Import Service and the services of the Information Organization stack responsible for managing certain kinds of internal resources (collections, metadata, documents etc.). The precise way in which the importer performs the import thus depends on the specific subsystem the importer interacts with.

Similarly, different importers need to obtain different information about the resources to import. For instance, to import a document it is necessary to have its content or a URL at which the content can be accessed. To create a metadata collection, it is necessary to specify some properties like the location of the schema of the objects contained in the collection. These values are passed to an importer by annotating objects in a graph of resources with appropriate properties. In order to constrain the kind of properties that the model objects it manipulates must have, an importer must define a set of subtypes of the model object types. For instance, the metadata importer (described below) defines a subtype for each basic type of the Resource Model:

 collection::metadata,
 resource::metadata and
 relationship::metadata

Each of these subtypes has specific properties that are understood, used and manipulated by that importer. The way subtyping is accomplished is described in more detail later in the section "writing new importers". The types defined by an importer and their properties must be publicly available, as AISL script developers must know which properties are available to them and what their semantics are. Furthermore, subsequent importers in the chain may also need to access some properties. For example, an importer for metadata also needs to access model objects representing content, to get their object id (internal identifier). Notice that in general an AISL script will not necessarily assign all properties defined by a subtype. Some of these properties may be conditionally needed, while some will only be written by an importer at import time. For example, when a new content object is imported, the importer must record the OID of the newly created object in the GoR object. For this reason, the specification of the subtypes defined by importers must also provide information about which properties are mandatory (i.e. they must be assigned during the creation of a GoR) and which properties are private (i.e. they should NOT be assigned during the creation of the GoR).
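
The mandatory/private distinction lends itself to a simple check on the properties a script assigns to an object. The Java sketch below is only an illustration of that idea: the class `PropertyRulesSketch`, the `Rule` enum and the `validate` method are hypothetical, and the rules used are those listed in this document for collection::content.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PropertyRulesSketch {
    enum Rule { MANDATORY, OPTIONAL, PRIVATE }

    // Rules for collection::content, as listed for the content importer.
    static Map<String, Rule> collectionContentRules() {
        Map<String, Rule> rules = new LinkedHashMap<>();
        rules.put("collectionName", Rule.MANDATORY);
        rules.put("isUser", Rule.MANDATORY);
        rules.put("collectionId", Rule.PRIVATE);
        return rules;
    }

    // Returns the problems found in the properties a script assigned to one object.
    static List<String> validate(Map<String, Rule> rules, Map<String, Object> assigned) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Rule> entry : rules.entrySet()) {
            boolean present = assigned.containsKey(entry.getKey());
            if (entry.getValue() == Rule.MANDATORY && !present) {
                problems.add("missing mandatory property " + entry.getKey());
            }
            if (entry.getValue() == Rule.PRIVATE && present) {
                problems.add("private property assigned by script: " + entry.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> assigned = new LinkedHashMap<>();
        assigned.put("collectionName", "test collection");
        assigned.put("collectionId", "42");
        // isUser is missing and collectionId should not be set by a script.
        System.out.println(validate(collectionContentRules(), assigned));
    }
}
```

A check of this kind could run when the GoR is built, before any importer is invoked, so that an incomplete description is rejected early.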
 +
 
 +
 
 +
Built-in importers
The AIS already comes with the capability to import documents and metadata. This is provided by two importers, called the content importer and the metadata importer. The types defined by these importers are described below.

Content Importer
This importer defines two subtypes: a new collection type and a new resource type:
collection::content
resource::content

The properties of these subtypes, their types and semantics are as follows:

[collection::content]
collectionName : string  : mandatory - The name of the collection.
isUser         : boolean : mandatory - Denotes whether the collection is a user collection.
collectionId   : string  : private   - The id assigned to the collection by the collection management service.

[resource::content]
isVirtualImport        : boolean : mandatory
documentName           : string  : mandatory
hasMaterializedContent : boolean : mandatory
contentSourceLocator   : string  :
content                : file    :
documentId             : string  : private - The id assigned to the document by the storage management service.

Note: the fields contentSourceLocator and content are alternatives. Which one is used depends on the value of the field hasMaterializedContent.
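The mandatory/private distinction can be illustrated with a small validation sketch. It is written in Python purely for illustration and is not part of the AIS, which is Java-based. The `PropertySpec` class and the `validate_assignment` function are hypothetical names; only the property names and flags of collection::content are taken from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    """Declared shape of one property of an importer-defined subtype (hypothetical helper)."""
    type_name: str
    mandatory: bool = False
    private: bool = False


# Property names and flags copied from the [collection::content] listing.
COLLECTION_CONTENT = {
    "collectionName": PropertySpec("string", mandatory=True),
    "isUser": PropertySpec("boolean", mandatory=True),
    "collectionId": PropertySpec("string", private=True),
}


def validate_assignment(spec, assigned):
    """Return a list of problems found in the properties a script assigned."""
    problems = []
    for name, prop in spec.items():
        if prop.mandatory and name not in assigned:
            problems.append(f"missing mandatory property: {name}")
        if prop.private and name in assigned:
            problems.append(f"private property must not be assigned: {name}")
    return problems


# A script that assigns both mandatory properties and leaves the private one alone.
ok = validate_assignment(COLLECTION_CONTENT, {"collectionName": "Test", "isUser": True})

# A script that forgets isUser and tries to set the importer-owned collectionId.
bad = validate_assignment(COLLECTION_CONTENT, {"collectionName": "Test", "collectionId": "42"})
```

In this sketch `ok` is an empty list, while `bad` reports one missing mandatory property and one illegally assigned private property.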
 +
 
 +
 
 +
Metadata Importer
This importer defines three subtypes, one for each basic construct in the Resource Model. They are:

collection::metadata
resource::metadata
relationship::metadata

[collection::metadata]
relatedContentCollection : collection : mandatory - The content collection containing the objects to which this metadata collection refers
collectionName           : string     : mandatory - The name of the collection
collectionDescription    : string     : mandatory - A description of the collection
isUser                   : boolean    : mandatory - Indicates whether this is a user collection
isIndexable              : boolean    : mandatory - Indicates whether this collection is indexable
metadataName             : string     : mandatory - Name of the metadata schema in this collection
metadataLanguage         : string     : mandatory - Language of the metadata in this collection
metadataSchemaURI        : string     : mandatory - URI of the schema of the metadata in this collection
collectionId             : string     : private   - The id assigned to the metadata collection during the import

[resource::metadata]
content  : string : mandatory - The content of this metadata object
objectID : string : private   - The id assigned to the metadata object during the import
 
   
 
   
[relationship::metadata]
This subtype does not define any properties. It denotes an edge from a metadata resource object to a content resource object.

The client interface to the AIS is based on a scripting language called AISL (Archive Import Service Language). The language can be used directly to submit tasks to the service, but also to create parametric task managers that are in charge of well-defined tasks requiring only a fixed number of parameters. The language is designed to create a representation graph, according to the model described above, in a user-friendly way.
  
'''AISL'''
 
AISL is an interpreted scripting language with XML-based syntax. Like most programming languages, AISL supports various flow-control structures and allows scripts to manipulate variables and evaluate various kinds of expressions. However, the goal of an AISL script is not to perform arbitrary computations, but to create a graph of resources. Representation objects (collections, objects and relationships) are first-class entities in the language, which provides constructs to build them and assign them properties. Representation objects resemble objects in object-oriented languages, in that their properties can be accessed as fields and assigned values, and references to representation objects themselves can be stored in variables. A fundamental difference is that, once created, representation objects are never destroyed, even when the control flow exits the scope in which they were created.
 
  
Rather than describing in detail the syntax of the language, for which we refer to the documentation of the AIS, we provide here an example of script to define and import a collection.
 
  
<pre>
<?xml version="1.0" ?>
<program>
  <createCollection type="content" name="contentCollection">
    <property name="collectionName" expr="'TestContentCollection'"/>
    <property name="isVirtual" expr="false"/>
    <property name="isUser" expr="true"/>
    [...]
  </createCollection>

  <createCollection type="metadata" name="metadataCollection">
    <property name="collectionName" expr="'TestMetadataCollection'"/>
    <property name="collectionDescription" expr="'A test collection'"/>
    [...]
    <property name="relatedContentCollection" expr="$contentCollection"/>
  </createCollection>

  <define name="file" expr="first(file(http:some.web.site/docs/metadata.xml))"/>
  <foreach name="a" in="xpath(dom($file),'//record')">
    <createObject type="content" name="contentObject" collections="contentCollection">
      <property name="documentName" expr="'testDocument'"/>
      <property name="content"
                expr="file(string(first(xpath($a,'//uri/text()'))))"/>
      <property name="hasMaterializedContent" expr="false"/>
      <property name="isVirtualImport" expr="false"/>
    </createObject>

    <createObject type="metadata" name="metadataObject"
                  collections="metadataCollection">
      <property name="content" expr="string($a)"/>
      [...]
    </createObject>
    <createRelationship type="metadata" from="metadataObject"
                        to="contentObject" name="rel"/>
  </foreach>
</program>
</pre>
 
  
Here, the instructions createCollection, createObject and createRelationship define collections, objects and relationships respectively, and assign them properties whose values are either constants or obtained by evaluating expressions. When a representation construct is created this way, it is possible to specify a name for a variable that is initialized to its value and can be used (while in scope) to refer to the construct, for instance to pass it to other constructs (as when creating a relationship, which requires references to two objects) or to alter its properties. The define instruction defines a variable and initializes it to the result of evaluating the file function, which returns a remote file (see below for more details on this). The foreach instruction executes a bounded loop over a list of values resulting from the evaluation of an expression, which in this case is a list of DOM nodes obtained by evaluating an XPath expression over a document. Another XPath expression is evaluated inside the loop, this time over a DOM fragment whose value is stored in a variable.
 
  
The script downloads an XML file at a given URI, parses it, and for each <record> element present in it:
 
a) Creates a content object whose content is the file at the address stored in the <uri> sub-element of the <record> element
 
b)Creates a metadata object whose content is the <record> element.
 
c)Associates them with a relationship
 
d)Stores the objects in appropriate collections.
 
  
  
AISL is not typed, i.e. variables can be assigned values of any kind, and it is the responsibility of the programmer to ensure proper type concordance. AISL does not even define its own type system, but exploits the type system of the underlying Java language. The language offers constructs for building objects of some basic types, like strings, sets, files, and integers. However, expressions in AISL can return any Java object. Specific functions can thus be used to build objects of given types in a controlled way. For instance, the built-in "dom()" function accepts as input a file object and produces a DOM document object. Similarly, the "xpath" function takes a DOM object and an expression, and returns the result of evaluating the XPath expression over the DOM node object as a set of DOM Node objects. In general, then, types cannot be directly manipulated from the language, except in a few cases, but the variables of the language can be assigned values of any Java type, and objects of a given type can be built using functions. This means that add-on functions can produce objects of user-defined types as their result. These objects can be stored in representation objects as properties, and later used by specific add-on importers.

However, AISL itself provides a few special data types not present in the standard Java libraries. In particular, the type AISLFile is designed to work in combination with some AISL built-in functions and to optimize the management of files during import, in particular with regard to memory and disk storage consumption. An AISLFile object encapsulates information about a given file, such as file length, and allows access to its content. However, the file content is not necessarily stored locally. If the file is obtained through the file() function, then the download of its actual content is deferred, and only performed when needed. When describing a large number of resources, as in the case of large collections, it is not feasible (or at least not efficient) to store locally on the archive import service all the content that needs to be imported. This is especially true for content that is imported without being processed in any way beforehand. Even for files that might need some processing (for instance for extracting information), it might be desirable for the AISL script writer to be able to import the file, use it for as long as needed (e.g. pass it to the xslt() function) and then free the memory resources used to hold it, without having to deal directly with the details. The AISLFile type offers a transparent mechanism to handle access to the content of remote files accessible through a variety of protocols, by having a placeholder (an AISLFile object) that can be treated as a file. Internally, an AISLFile implements a caching strategy for documents whose content has to be accessed at resource-description time. For files whose contents have to be handled only at import time, it offers an easy way to encapsulate inside a representation object all the information that needs to be passed to other services, like the storage management or the content management service.
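The deferred-download idea can be sketched as follows. This is a Python illustration of the general pattern only, not the actual AISLFile class: the `LazyFile` name, its methods, the injected `fetch` callable and the example.org URL are all assumptions made for the example.

```python
class LazyFile:
    """Placeholder for a remote file whose content is fetched only on first access."""

    def __init__(self, locator, fetch):
        self.locator = locator      # where the content lives (URL or other identifier)
        self._fetch = fetch         # callable(locator) -> bytes, supplied by the caller
        self._cache = None          # holds the content once it has been fetched

    @property
    def is_loaded(self):
        return self._cache is not None

    def content(self):
        """Fetch the content on first use, then serve it from the local cache."""
        if self._cache is None:
            self._cache = self._fetch(self.locator)
        return self._cache

    def release(self):
        """Drop the cached content; the placeholder itself stays usable."""
        self._cache = None


calls = []


def fake_fetch(locator):
    # Stand-in for a real download, so the sketch runs without network access.
    calls.append(locator)
    return b"<record/>"


f = LazyFile("http://example.org/metadata.xml", fake_fetch)
loaded_before = f.is_loaded          # nothing has been downloaded yet
first = f.content()                  # triggers the (fake) download
second = f.content()                 # served from the cache, no second download
f.release()                          # frees the cached bytes, keeps the locator
```

A placeholder that is only ever handed over to an importer never needs its `content()` called during the description phase, which is the saving the paragraph above describes.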
 
  
  
'''Functions'''
 
To encapsulate complex operations, AISL provides functions, which work as in most programming languages. A number of built-in functions are already defined in the language. These functions cover common needs arising during the definition of resource graphs, and are detailed below. The language can be expanded by adding new functions. Furthermore, AISL provides extensible functions, meaning that their functionality can be extended to handle special kinds of arguments. For instance, the file() function retrieves a list of files given a set of arguments that define their location. The built-in function is already able to deal with a number of protocols, including http, ftp and gridftp. However, the function can be extended to handle additional protocols, as explained in more detail later on. The motivation behind extensible functions is to keep the syntax of AISL as lean and transparent for the user as possible.
 
  
<table>
 
<tr>
 
<td>
 
'''Built-in functions:'''
 
</td>
 
</tr>
 
<tr>
 
<td>
 
first(),
 
get(index),
 
add(index, object),
 
remove(index),
 
size()
 
</td>
 
<td>
 
These functions all take as argument an object of type List and perform some operation on its elements, like getting the first object in the list, adding or removing an object, or returning the size of the list.
 
</td>
 
</tr>
 
<tr>
 
<td>
 
dom()
 
</td>
 
<td>
 
This function takes as input an AISLFile object and returns a DOM document obtained by parsing the file (or null if the parsing fails, e.g. if the file is not a valid XML file).
 
</td>
 
</tr>
 
<tr>
 
<td>
 
'''Built-in extensible functions:'''
 
</td>
 
</tr>
 
<tr>
 
<td>
 
file()
 
</td>
 
<td>
 
Accepts as input a string used to describe the location of some file, and returns a set of AISLFile objects. There are no constraints on the format of the string, which might be a URL or some other kind of identifier. Internally, the file function is able to deal with a number of communication protocols, including http, ftp, and gridftp.
 
</td>
 
</tr>
 
<tr>
 
<td>
 
string()
 
</td>
 
<td>
 
This function provides a way to perform a custom string serialization of objects. It accepts as input an object of any type and returns an object of type String. If a suitable extension is provided for a given Java type, then it will be used to process that type of object. Otherwise, the function will just return the result of invoking the Java toString() method on its argument. As built-in functionality, the string() function recognizes DOM objects and serializes them as XML documents.
 
</td>
 
</tr>
 
</table>
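The behaviour described for string(), a default conversion that type-specific extensions can override, corresponds to a type-dispatch pattern. The sketch below shows that pattern in Python using `functools.singledispatch` and the standard `xml.dom.minidom` module. It is an analogy only: the real extension mechanism is Java-based, and the function name `to_string` is an assumption.

```python
from functools import singledispatch
from xml.dom.minidom import Node, parseString


@singledispatch
def to_string(value):
    """Fallback: behave like a plain toString() call."""
    return str(value)


@to_string.register(Node)
def _(node):
    """Extension for DOM nodes: serialize them as XML text."""
    return node.toxml()


document = parseString("<record><uri>a.txt</uri></record>")
as_xml = to_string(document.documentElement)   # handled by the DOM extension
as_text = to_string(42)                        # handled by the fallback
```

Registering a further type does not touch the callers of `to_string`, which is the property that keeps the script-level syntax unchanged when an extensible function gains new capabilities.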
 
  
== Extensibility features ==

The entire architecture of the AIS is designed to offer a high degree of extensibility at various levels. All these mechanisms are based on a plug-in style approach: the service does not have to be recompiled, redeployed or even stopped whenever an extension is added to it.

At the level of the interface with the service, "adapters" can be incorporated in the service. In the simplest case, such adapters will just offer a simplified interface to certain functionality that has to be invoked often without many variations. For instance, when the files to import are stored in a certain directory accessible through ftp, or when a single file available at some location describes the location of files and related metadata, it would be desirable to invoke the AIS with the bare minimum of parameters needed to perform the task, rather than having to write an AISL script from scratch.

The AISL language itself provides extension mechanisms. New functions can be defined for the language, and existing extensible functions can be extended to treat a larger number of argument types. The representation model used to describe resources is fully flexible: new types can be defined for collections, objects and relationships, and all these representation constructs can be annotated with arbitrary properties of arbitrary types.

Regarding the importing phase, new object types can be handled by defining pluggable importers. These importers will be invoked together with all other importers available to the AIS, and handle specific types of representation objects. Note that the definition of new representation constructs and that of new importers are not disjoint activities. In order for new object types to have meaning, there must be an appropriate importer which is able to handle them (otherwise they will just be ignored). In order for an importer to work with a specific kind of object type, the importer must be aware of its interface, i.e. its properties. While properties attached to representation objects can always be accessed by name, to ease the development of importers it is possible to define new representation constructs so that their properties can be accessed via getters and setters. During the description phase, the AISL interpreter will recognize whether a new type is connected to a subclass of the related representation construct and build an object of the most specific class, so that later on the importers can work directly on objects of the types they recognize.

'''Plugin mechanism'''

In order to support these extensibility mechanisms, the AIS exploits a simple plugin pattern. The modules (Java classes) that have to be made available must extend appropriate interfaces/classes defined inside the AIS framework, and be defined in specific Java packages. To be compiled properly, these classes must of course be linked against the rest of the code of the AIS. After compilation, the resulting .class files can be made available to an AIS instance by putting them into a special folder under the control of the AIS instance. The classes will be dynamically loaded and used (partially using the Java reflection facilities).

In general, the main interfaces/classes that can be extended to produce plugins are the following:

To extend the AISL language:
AbstractRule
CompositeRule
Function
Processor

To extend the representation model:
RepresentationObject
RepresentationObjectCollection
Relationship

To define a new Importer:
Importer

For a full description of these interfaces/classes, especially regarding the internal contracts/policies that subclasses must follow, we refer the reader to the documentation of the AIS.

== Support for incremental import ==

The archive import service must fully support incremental import. By incremental import we mean that if the same collection is imported twice, the second import task should only perform the actions needed to update the collection with the changes that occurred between the two imports, and not re-import the entire collection. Incremental import requires two features. First, it must be possible to specify that a collection is the same as another collection already imported. For the existing importers, this is achieved by specifying inside the description of a collection the unique identifier of the collection in the related service. For instance, for content collections, this would be the collectionID attached to each collection by the Collection Management Service. After the description of the new collection has been created, the service will compare this description with that resulting from the previous import, and decide which objects must be imported, which must be substituted, and which must be deleted from the collection.

Second, when comparing two collections, the import service must know how to decide whether two objects present inside a collection are actually the same object. In order to support this behaviour, the AIS must support two concepts, that of external-object-identity and that of content-equality.

External object identity is based on an external object-identifier. Two objects are considered to be the same if and only if they have the same external identifier. Notice that the external identifier is distinct from the internal object identity that is used by services to distinguish between objects. Of course, the AIS must ensure a correspondence between internal and external identifiers. Thus, if an object with a given external identifier has been stored with a given internal identifier, and another object is then imported with the same external identifier, the AIS will not create a new object, but (possibly) update the contents/properties of the existing object.

External identity is sufficient to decide whether an object to be imported already exists in a given collection. If an object already exists, and it has changed, then it will be updated. Deciding whether an object has changed, however, requires additional knowledge. For many properties attached to an object, the comparison is straightforward. However, for the actual content of an object this is less immediate. The content of an object (that is, a file) might reside at the same location, but differ from that previously imported. Furthermore, comparing the old content with the new content is an expensive task: it requires at least fetching the entire content of the new object, and is thus as expensive as just reimporting the content itself. For this reason the AIS also supports the concept of content-identity. A content-identifier can be specified. If two identical objects (i.e. having the same external-object-identity) have the same content-identifier, then the content is not reimported. If, on the other hand, the identifier differs, then the new content is re-imported. A content identifier can be a signature of the actual content, or any other string.

If a content identifier is not specified, the AIS derives one from the following information: the location of the content, the size of the content (when available, depending on the protocol used), the date of last modification of the content (when available, depending on the protocol used), and a hash signature of the content (when available from the server, depending on the protocol used; for instance, it is possible to get an MD5 signature of a file from an HTTP server).

== Service Operations ==

The operations of the Archive Import Service are described in the following:

<table>
<tr>
<td>
'''Signature'''
</td>
<td>
'''Description'''
</td>
</tr>
<tr>
<td>
ImportTaskResponse submitImportTask(String AISLScript)
</td>
<td>
This operation submits an import task defined via the AISL scripting language.
</td>
</tr>
<tr>
<td>
ImportTaskResponse submitAdapterTask(String adapter, String[] adapterParameters)
</td>
<td>
This operation submits an import task to a specific adapter, which must have been previously registered as a plugin. The parameters needed by the adapter to perform the task are specified via an array of string parameters. The number and semantics of these parameters are not defined in advance, and may differ for each adapter.
</td>
</tr>
<tr>
<td>
RegisterPluginResponse registerPlugin(String PluginURL)
</td>
<td>
This operation remotely loads a new plugin component into the AIS. The plugin is loaded by specifying a URL from which the compiled Java class or jar file can be downloaded.
</td>
</tr>
</table>

== Appendix - Complete AISL Grammar in EBNF ==

Program ::= ( Instruction )*

Instruction ::= ( VariableDeclaration ";" | Statement )

Statement ::= StatementExpression ";"  |  SwitchStatement  |  IfStatement  |  ForeachStatement

SwitchStatement ::= "switch" "(" Expression ")" "{" ( SwitchBlock )* "}"

SwitchBlock ::= ( "case" Expression ":" ( Instruction )* "break;" | "default" ":" ( Instruction )* "break;" )

IfStatement ::= "if" "(" Expression ")" IfBlock ( "else" ElseBlock )?

ElseBlock ::= "{" ( Instruction )* "}"

IfBlock ::= "{" ( Instruction )* "}"

ForeachStatement ::= "foreach" <IDENTIFIER> "in" ( Expression | ForRange ) ForBlock

ForRange ::= "[" Expression "to" Expression ( "," Expression )? "]"

ForBlock ::= "{" ( Instruction )* "}"

VariableDeclaration ::= Type VariableDeclarator ( "," VariableDeclarator )*

VariableDeclarator ::= <IDENTIFIER> ( "=" Expression )?

Type ::= BuiltinType

BuiltinType ::= ( "boolean" | "int" | "float" | "list" ( "<" Type ">" )? | "file" | "string" | "collection" | "resource" | "relationship" )

StatementExpression ::= PrimaryExpression ( "=" Expression )?

PrimaryExpression ::= ( Literal | Function | Variable | Constructor ) ( Selection )*

Literal ::= ( <INTEGER_LITERAL> | <FLOATING_POINT_LITERAL> | <STRING_LITERAL> | BooleanLiteral )

BooleanLiteral ::= ( "true" | "false" )

Variable ::= Name

Name ::= <IDENTIFIER>

Function ::= Name Arguments

Arguments ::= "(" ( Expression )? ( "," Expression )* ")"

Constructor ::= ModelObjectConstructor  |  ListConstructor

ModelObjectConstructor ::= CollectionConstructor  |  ResourceConstructor  |  RelationshipConstructor

CollectionConstructor ::= "collection" "::" <IDENTIFIER> "[" Expression "]" "{" ( PropertyAssignment ( "," PropertyAssignment )* )? "}"

ResourceConstructor ::= "resource" "::" <IDENTIFIER> "[" Expression "]" "in" CollectionsList "{" ( PropertyAssignment ( "," PropertyAssignment )* )? "}"

CollectionsList ::= Expression ( "," Expression )*

RelationshipConstructor ::= "relationship" "::" <IDENTIFIER> "(" Expression "," Expression ")" "[" Expression "]" "{" ( PropertyAssignment ( "," PropertyAssignment )* )? "}"

PropertyAssignment ::= <IDENTIFIER> "=" Expression

ListConstructor ::= "{" ( Expression )? ( "," Expression )* "}"

Selection ::= PropertySelection  |  ElementSelection

PropertySelection ::= "." <IDENTIFIER>

ElementSelection ::= "[" Expression "]"

Expression ::= OrExpression

OrExpression ::= ExclusiveOrExpression ( "|" ExclusiveOrExpression )*

ExclusiveOrExpression ::= AndExpression ( "^" AndExpression )*

AndExpression ::= EqualityExpression ( "&" EqualityExpression )*

EqualityExpression ::= RelationalExpression ( ( "==" | "!=" ) RelationalExpression )*

RelationalExpression ::= AdditiveExpression ( ( "<" | ">" | "<=" | ">=" ) AdditiveExpression )*

AdditiveExpression ::= MultiplicativeExpression ( ( "+" | "-" ) MultiplicativeExpression )*

MultiplicativeExpression ::= UnaryExpression ( ( "*" | "/" | "%" ) UnaryExpression )*

UnaryExpression ::= ( ( "+" | "-" | "!" ) PrimaryExpression | PrimaryExpression )
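The last productions of the grammar encode operator precedence through nesting: each level delegates to the next, tighter-binding one. The Python sketch below shows how such productions map onto a recursive-descent evaluator. It is a deliberately reduced illustration and not the AISL interpreter: it accepts integer literals only, supports just "+", "-" and "*", assumes ordinary integer arithmetic, and all class and function names are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*])")


def tokenize(text):
    """Split the input into integer literals and operator symbols."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def additive(self):
        # AdditiveExpression ::= MultiplicativeExpression ( ( "+" | "-" ) MultiplicativeExpression )*
        value = self.multiplicative()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.multiplicative()
            else:
                value -= self.multiplicative()
        return value

    def multiplicative(self):
        # MultiplicativeExpression ::= UnaryExpression ( "*" UnaryExpression )*   (reduced subset)
        value = self.unary()
        while self.peek() == "*":
            self.take()
            value *= self.unary()
        return value

    def unary(self):
        # UnaryExpression ::= ( ( "+" | "-" ) PrimaryExpression | PrimaryExpression )   (reduced subset)
        sign = 1
        if self.peek() in ("+", "-"):
            sign = -1 if self.take() == "-" else 1
        return sign * self.primary()

    def primary(self):
        # Only integer literals are supported in this sketch.
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected an integer literal, got {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.additive()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Because `additive` calls `multiplicative` for each operand, `evaluate("1 + 2 * 3")` multiplies before adding, and the `while` loops make both levels left-associative, mirroring the `( ... )*` repetition in the productions.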

Revision as of 21:40, 14 October 2008

Introduction

The role of the Archive Import Service (AIS) is to import resources which are external to the gCube infrastructure into the infrastructure itself. "Importing", here and in the following, refers to the description of such resources inside the Information Organization stack of services, and not necessarily to the fact that the content of such resources is actually stored within facilities that belong to the infrastructure. Similarly, the kinds of resources that can be imported are not necessarily objects that exist physically. The association between an image and a file containing metadata referring to it might not exist physically, but can still be considered a resource that can be imported. The same holds for a collection of images. The term "external resource" is used to denote any resource outside the gCube infrastructure (regardless of whether it has already been imported or not). The term "internal resource" is used to denote the entity that represents an external resource inside the gCube infrastructure. This is normally an object or a relationship of the info-object model, and is identified, inside the gCube infrastructure, by an Object Identifier.


The service should support the import by providing a way to specify which resources should be imported and how, and offer facilities to automate this task whenever possible.



Import procedure

The task of importing a set of external resources is divided into two major steps. First, a description of the resources to import is built. This description is based on a custom data model, which is described later on in this document. The resulting description is essentially a graph whose nodes and arcs are labelled with key-value pairs. This description is called a graph of resources (GoR). Second, the GoR built during the first phase is processed by a chain of software modules called "importers". Each importer is dedicated to importing resources by interacting with a specific subpart of the Information Organization set of services. For instance, the metadata importer is responsible for the import of metadata, and it handles their import by interacting with the Metadata Management Service. The precise way in which importers handle the import task, and in particular how they define the specific description of the resources they need to consume inside a GoR, is left to the individual importers. Details about the importers which are already included in the Archive Import Service are given below in this document. The creation of a graph of resources is done through the execution of a script written in the AISL language, which is described later on in this document.
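The second step, a chain of importers walking the same GoR, can be sketched as follows. This Python fragment is an illustration under simplifying assumptions, not the AIS implementation: the GoR is reduced to a list of dictionaries, the importer classes only mimic service calls, and the "describes" key and the "describedDocumentId" property are hypothetical names introduced for the example.

```python
class Importer:
    """Base class for the sketch: an importer handles one subtype of GoR objects."""
    subtype = None

    def run(self, gor):
        for obj in gor:
            if obj["subtype"] == self.subtype:
                self.import_object(obj)

    def import_object(self, obj):
        raise NotImplementedError


class ContentImporter(Importer):
    subtype = "content"

    def __init__(self):
        self._next_id = 0

    def import_object(self, obj):
        # Stand-in for a call to a storage service; records the new internal id.
        self._next_id += 1
        obj["properties"]["documentId"] = f"doc-{self._next_id}"


class MetadataImporter(Importer):
    subtype = "metadata"

    def import_object(self, obj):
        # Reads the id written by the content importer earlier in the chain.
        target = obj["describes"]
        obj["properties"]["describedDocumentId"] = target["properties"]["documentId"]


def run_chain(gor, importers):
    """Step two of the import procedure: each importer processes the whole GoR in turn."""
    for importer in importers:
        importer.run(gor)


# Step one (building the description) is reduced here to two hand-written objects.
image = {"subtype": "content", "properties": {"documentName": "myimage"}}
record = {"subtype": "metadata", "properties": {}, "describes": image}
run_chain([image, record], [ContentImporter(), MetadataImporter()])
```

The order of the chain matters in this sketch: the metadata importer relies on an identifier that only exists after the content importer has run.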



Data Model

The data model handled by AISL features three main types of constructs:

- Collection
- Resource
- Relationship

A set of objects of these three main types, built by an AISL script, forms a so-called Graph of Resources (GoR). A graph of resources is a graph composed of nodes (resources) and edges (relationships). Furthermore, the collection construct allows nodes (resources) to be grouped into sets. All constructs of the model can be annotated with properties, which are name/value pairs.

The three main types of constructs cannot be instantiated directly. Instead, objects of specific subtypes of these constructs must be instantiated. These subtypes are defined by specific plugins called importers, which manage the import of different kinds of resources into the gCube infrastructure. The precise semantics of the properties attached to types and the precise use of the constructs are not fixed in advance, but are determined by the specific importer that defines and manages a specific subtype. In particular, there is no direct correspondence between constructs in the GoR and how the resources are represented inside the gCube Information Organization facilities.


Besides being annotated with properties, each construct of the model must be assigned an external identifier. An external identifier is a string that uniquely identifies a certain external resource, and the model object that refers to it. This identification must hold across multiple, different invocations of the AIS.

For instance, consider the two files: 1) http://mydomain.org/myimage.jpg 2) http://myotherdomain.org/mymetadata.jpg

representing respectively an image and a file of metadata describing it. Here, we have three external resources to import: the files 1) and 2) and the association between the two. This set of resources can be represented by instantiating two model objects of type "resource", with subtypes "content" and "metadata" respectively, and a model object of type "relationship" and subtype "metadata". Furthermore, the GoR should contain two model objects of type collection, with subtypes "content" and "metadata" respectively, which will contain the objects referring to resources 1) and 2).
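The graph for this example can be written down with a few plain data classes. The sketch below is in Python and only mirrors the data model described in this section. The class layout is an assumption, not the AIS class hierarchy, and the `urn:example:` identifiers given to the collections and to the relationship are invented for the example (the text only fixes the two file URLs).

```python
from dataclasses import dataclass, field


@dataclass
class ModelObject:
    subtype: str                      # e.g. "content" or "metadata"
    external_id: str                  # identity of the external resource
    properties: dict = field(default_factory=dict)


@dataclass
class Collection(ModelObject):
    members: list = field(default_factory=list)


@dataclass
class Resource(ModelObject):
    pass


@dataclass
class Relationship(ModelObject):
    source: Resource = None
    target: Resource = None


# Two collections, one per subtype.
images = Collection("content", "urn:example:images")
records = Collection("metadata", "urn:example:records")

# Two resources, identified here by the URLs of the external files.
image = Resource("content", "http://mydomain.org/myimage.jpg")
metadata = Resource("metadata", "http://myotherdomain.org/mymetadata.jpg")
images.members.append(image)
records.members.append(metadata)

# One relationship: an edge from the metadata resource to the content resource.
link = Relationship("metadata", "urn:example:image-metadata-link",
                    source=metadata, target=image)

gor = [images, records, image, metadata, link]
```

The resulting list holds the five model objects named in the paragraph above: two collections, two resources and one relationship.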

When all these resources are created, they must be assigned an external identifier, which specifies their identity in the real world. So, for instance, it would be possible to choose the string "http://mydomain.org/myimage.jpg" to identify the resource 1). The first time an import task that specifies this GoR is run, the AIS will create an internal resource referring to the external resource 1). This internal resource will receive an (internal) object identifier. If another import task is executed, and another resource with external identifier "http://mydomain.org/myimage.jpg" is created, the AIS will treat this as the same resource, and so it will not create another internal resource but modify, if needed, the one already created.
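The mapping between external and internal identifiers across import tasks can be sketched as a small registry. This Python fragment is illustrative only: the `IdentityRegistry` class, the `oid-N` identifier format and the returned action labels are assumptions, and the "if needed" decision is modelled with a content identifier in the sense of the incremental-import discussion in this document.

```python
class IdentityRegistry:
    """Remembers which internal object represents each external resource."""

    def __init__(self):
        self._internal_ids = {}   # external id -> internal object id
        self._content_ids = {}    # external id -> content identifier seen at last import
        self._counter = 0

    def import_resource(self, external_id, content_id):
        """Return (action, internal id) for one resource of an import task."""
        if external_id not in self._internal_ids:
            # First time this external resource is seen: create an internal one.
            self._counter += 1
            self._internal_ids[external_id] = f"oid-{self._counter}"
            self._content_ids[external_id] = content_id
            return "created", self._internal_ids[external_id]
        oid = self._internal_ids[external_id]
        if self._content_ids[external_id] == content_id:
            # Same resource, same content identifier: nothing to re-import.
            return "unchanged", oid
        # Same resource, different content identifier: update in place.
        self._content_ids[external_id] = content_id
        return "updated", oid


registry = IdentityRegistry()
url = "http://mydomain.org/myimage.jpg"
first = registry.import_resource(url, "sig-a")    # first import task
second = registry.import_resource(url, "sig-a")   # re-run, nothing changed
third = registry.import_resource(url, "sig-b")    # re-run, content changed
```

In all three calls the internal identifier stays the same; only the action differs.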


The AISL Language

AISL is a scripting language intended to create graphs of resources for subsequent import. Its main features are a tight integration with the AIS, in the sense that model objects are first-class citizens in the language, and the ability to handle, as transparently as possible for the user, some tasks which are frequent during import, like accessing files at remote locations. Besides avoiding the complexity of full-fledged programming languages like Java, its limited expressivity - tailored to import tasks only - prevents security issues related to the execution of code in remote systems. The language does not allow the definition of new functions/types of objects inside a program, but can be easily extended with new functionality by defining new functions as plugin modules. This feature is detailed later on in this document.


Type System

The language is currently untyped. Variables can be assigned any kind of object from the Java type system. However, it is planned to enforce at least partial static type checking in the future, to allow early detection of errors in scripts. For this reason, the grammar already requires variables to be declared before use. Even though the language processor does not currently check for type compliance, it is strongly suggested that implementors use the appropriate types, in order to reduce the effort of converting scripts later on. The types supported by the language (i.e. for which the language allows an explicit variable declaration) are:

Primitive types: integer, float, boolean, string, list, file

Model object types: collection, resource, relationship

Notice that AISL is not an object-oriented language. Even though some of the types correspond to Java types, they have neither methods nor fields. Properties of objects are instead accessed through appropriate functions. So, for example, the size of a list value can be obtained by invoking the function listsize() on it. However, model object types have "properties" that resemble fields in object-oriented programming languages and that can be accessed through the familiar dot (".") notation.

Values of the primitive types are treated as in the Java language. As in Java, strings can be created by supplying a value between double quotes, and can be concatenated using the "+" operator. Values of the "list" type are similar to lists in the Java language, but array-like selection of their elements is supported, and the language provides a built-in constructor for lists (see the section on type constructors below). As in Java, list indexes are zero-based. Lists are currently heterogeneous (i.e. they can contain values of different types). In future releases, it is planned to provide facilities to enforce list-type checking. The "file" type represents local or remote external resources in a way that is independent of the specific protocol used to access these resources.

The types collection, resource and relationship correspond to the model object constructs introduced in the previous sections. Variables of these types are used to hold instances of objects of the resource graph data model.


Syntax

This section briefly describes the syntax and semantics of AISL constructs, focusing especially on aspects in which the language differs from the Java programming language, to which it is close. A formal description of the syntax can be found in the appendix at the end of this document.

An AISL script is a sequence of instructions, which are either variable declarations, conditional statements, loop statements or certain kinds of expressions, like variable assignments and function invocations. Values for the various types in AISL can be built through appropriate type constructors. Values can then be manipulated and composed using expressions of various kinds, which are in most cases similar to the corresponding expressions in the Java programming language.


Type constructors

Primitive type constructors

integer: integer values are sequences of digits, starting with a non-zero digit, or a single '0' digit. In other words, they match the production

DECIMAL_LITERAL: "0"|(["1"-"9"] (["0"-"9"])*)


float: floating point values are written as a sequence of digits followed by a decimal point and an optional fractional part. They match the production

FLOATING_POINT_LITERAL: (["0"-"9"])+ "." (["0"-"9"])*
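As a quick way to check which strings the two productions accept, they can be transcribed into Java regular expressions. This is only a sketch: the class AislLiterals and its method names are illustrative and do not belong to the AISL parser, which may tokenize its input differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the AISL implementation.
final class AislLiterals {
    // DECIMAL_LITERAL: "0" | (["1"-"9"] (["0"-"9"])*)
    static final Pattern DECIMAL_LITERAL = Pattern.compile("0|[1-9][0-9]*");
    // FLOATING_POINT_LITERAL: (["0"-"9"])+ "." (["0"-"9"])*
    static final Pattern FLOATING_POINT_LITERAL = Pattern.compile("[0-9]+\\.[0-9]*");

    // True if the whole string is an integer literal.
    static boolean isInteger(String text) {
        return DECIMAL_LITERAL.matcher(text).matches();
    }

    // True if the whole string is a floating point literal.
    static boolean isFloat(String text) {
        return FLOATING_POINT_LITERAL.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInteger("42"));   // accepted by DECIMAL_LITERAL
        System.out.println(isInteger("007"));  // rejected: leading zeros
        System.out.println(isFloat("3."));     // accepted: fraction is optional
        System.out.println(isFloat(".5"));     // rejected: integer part is required
    }
}
```

Read this way, the productions reject integers with leading zeros and floats without an integer part, while a trailing decimal point with no fraction is accepted.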


boolean: the words true and false are reserved words in AISL, and they are interpreted as the corresponding boolean values.


string: string literals are sequences of characters between double quotes. Special characters like newline and tab are escaped and treated as in the Java programming language.


list: lists are built by enclosing a comma-separated sequence of expressions in curly brackets. For example:

list myList = {3+4, 56, "a"};


file: file objects are built by invoking the constructor functions getFile(string locator) and getFile(string locator, list<string> accessinformation). A locator is a string encoding a location and a protocol to be used to access the file. For instance, the instruction:

file f= getFile("ftp://ftp.example.org/pub/share/myfile.xml");

builds a file object that accesses its content through the ftp protocol at the given location. The format of the locator string is not defined in advance, as it depends on the specific protocol used. The currently supported protocols are ftp, http and file; they all accept a URL as locator. Different formats may be provided by different subtypes of the AISLFile class. This is described in more detail later on in this document. The two-argument constructor allows login information to be passed in, which might be needed to access remote resources.


Model Object Constructors

These constructors create elements of the resource graph which is later used for import by the service. Once created, the properties of an object can be modified, but the object itself cannot be deleted. In other words, it is sufficient to invoke one of these constructors for the object to be in the final graph of resources. In general, all constructors require:

- the type of construct to be created (i.e. collection, resource or relationship)
- the specific subtype of the object. This subtype should be defined in the context of a specific importer, by appropriately subclassing the class defining the basic construct. This allows checks to be performed during the parsing of the script, e.g. on the properties of constructs. For example, the type collection::content is a subtype of the type collection that defines the properties collectionName, isVirtual and isUser.
- a unique "external identifier". This string value uniquely identifies a certain construct, so that it can be recognized during subsequent import phases.
- in the case of resources, it is possible to supply to the constructor a list of collections to which the resource must belong
- in the case of relationships, the resources that the relationship links must be specified.

Furthermore, the body of the constructor allows one or more of the properties defined by the construct, if any, to be initialized. The names, types and precise semantics of these properties are described in the section about importers.


Examples of constructors are as follows:


collection metadatacollection = collection::metadata["medspiration_test_metadata"]{
    collectionName = "medspiration_test_metadata",
    collectionDescription = "test for the AIS with medspiration data",
    relatedContentCollection = contentcollection,
    isUser = true,
    isIndexable = false,
    metadataName = "dc",
    metadataLanguage = "en",
    metadataSchemaURI = "http://www.opendlib.com/resources/schemas/metadata_dc.xsd"
};


This constructor defines a metadata collection. Here the type is "collection", the subtype is "metadata", and the external identifier is given by a static string ("medspiration_test_metadata"). The body of the constructor initializes a number of properties specific to the "metadata" subtype. Notice that the object created by the constructor is then assigned to a variable of the appropriate type (collection).


resource::content[url] in ccoll{
    isVirtualImport = false,
    contentSourceLocator = url,
    documentName = name,
    hasMaterializedContent = false
};

This constructor defines a resource of type content. The external identifier here is a variable (url) that must evaluate to a string. Furthermore, the object is specified to belong to a specific collection again using a variable (ccoll) holding an instance of a (content) collection.


relationship rel= relationship::metadata(metadata, content)["metadatarel"+url]{};

This constructor specifies a relationship of subtype metadata. The pair of variables (metadata, content) specifies the resources from and to which the relationship holds. The external identifier is computed as an expression evaluating to a string.


Expressions

Arithmetic Expressions

Numeric types (integer and float) can be combined using the same operators available in the Java programming language, i.e. the unary operators + and - and the binary operators +, -, /, * and %. These operators have the same precedence and semantics as in Java. If the operands of a binary operator have different types, the type of the result is always "float".

Relational Expressions

The relational operators ==, !=, <, <=, >, >= have the same precedence and semantics as in Java. They all evaluate to a boolean value and they can all be applied to numeric values. Furthermore, the operators == and != can be applied to all other types.

Boolean Expressions

Boolean expressions are built from boolean values by applying the unary operator ! (not) and the binary operators | (or), & (and) and ^ (exclusive or), with the same precedence and semantics as in Java. Notice that, unlike Java, AISL does not support the conditional boolean operators || and &&.
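Since the | and & operators are stated to follow Java semantics, one practical consequence is that both operands are always evaluated. The Java sketch below shows the difference from the conditional operators that AISL lacks; the class EagerBooleanOperators is purely illustrative, and that the AISL interpreter behaves identically is an assumption based on the statement above.

```java
// Illustrative only: counts how many operands get evaluated.
final class EagerBooleanOperators {
    private int evaluations = 0;

    // Records that an operand was evaluated and returns its value unchanged.
    boolean operand(boolean value) {
        evaluations++;
        return value;
    }

    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        EagerBooleanOperators eager = new EagerBooleanOperators();
        boolean eagerResult = eager.operand(true) | eager.operand(false);
        // The non-conditional operator evaluates both operands.
        System.out.println(eagerResult + " after " + eager.evaluations() + " evaluations");

        EagerBooleanOperators lazy = new EagerBooleanOperators();
        boolean lazyResult = lazy.operand(true) || lazy.operand(false);
        // The conditional operator stops after the first operand here.
        System.out.println(lazyResult + " after " + lazy.evaluations() + " evaluations");
    }
}
```

In a script this matters when an operand is a function invocation with side effects or a costly remote access, because it cannot be skipped by placing a cheaper test first.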

Selectors

The elements of list-typed values can be obtained with the same syntax that Java uses to access the elements of arrays. E.g.

list myList = {3+4, 56, "a"};
integer myInt = myList[1];

Lists can be nested, and selectors can be combined:

list myList = {3+4, {45, 10}, "a"};
integer myInt = myList[1][0];


The properties of model object typed values can be accessed by name with a dot notation. E.g.


resource myContentResource = resource::content[url] in ccoll{
    isVirtualImport = false,
    contentSourceLocator = url,
    documentName = name,
    hasMaterializedContent = false
};

myContentResource.documentName="test";


Variable Declarations

Variable declarations contain a specification of an AISL type, a variable identifier and an optional initializer. E.g.

list myList = {3+4, 56, "a"};


Functions

Function invocation in AISL is analogous to function invocation in Java, except that all functions have global visibility and there are no objects or classes through which to invoke a function. An example is:

...
string mystring= "test";
boolean matches= match(mystring, "t.*t");
print(matches);
...

This code snippet contains two function invocations, namely of the functions match and print (it prints "true"). AISL comes with a set of predefined functions, described below. New functions can be added to the language. This is described later on in this document.


Predefined AISL Functions

Functions on file

These predefined functions provide access to the properties of objects of type file.

string filename(file f) - returns the name of the file
integer filesize(file f) - returns the size of the file
boolean isdirectory(file f) - returns true if this file is a directory
boolean isfile(file f) - returns true if this file is a regular file
list<file> children(file f) - returns a list containing a file object for each of the children of f. The list returned is empty if this file is a regular file (i.e. not a directory) or if the protocol used to access the file does not model a hierarchical filesystem.
list<file> descendants(file f) - returns a list containing a file object for each of the descendants of f, obtained by recursively exploring all subdirectories. Notice that the list contains all files in the subtree rooted at f, not only its leaves (i.e. it also contains all directories that are descendants of f). The list returned is empty if this file is a regular file (i.e. not a directory) or if the protocol used to access the file does not model a hierarchical filesystem.
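The relationship between children and descendants can be sketched for the local filesystem with java.io.File. This is an assumption-laden illustration, not the AIS implementation: the class FileTreeSketch is hypothetical, it ignores remote protocols and symbolic link cycles, and the order of the returned list is whatever the platform yields.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local-filesystem analogue of the AISL functions children and descendants.
final class FileTreeSketch {
    // Direct children of f; empty for regular files and for paths that cannot be listed.
    static List<File> children(File f) {
        List<File> result = new ArrayList<>();
        File[] entries = f.listFiles(); // null when f is not a readable directory
        if (entries != null) {
            for (File entry : entries) {
                result.add(entry);
            }
        }
        return result;
    }

    // Every file and directory in the subtree rooted at f, excluding f itself.
    static List<File> descendants(File f) {
        List<File> result = new ArrayList<>();
        for (File child : children(f)) {
            result.add(child);                  // directories are included, not only leaves
            result.addAll(descendants(child));  // recurse into subdirectories
        }
        return result;
    }

    public static void main(String[] args) {
        File start = new File(args.length > 0 ? args[0] : ".");
        for (File f : descendants(start)) {
            System.out.println(f.getPath());
        }
    }
}
```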


Functions on string

boolean match(string str, string pattern) - returns true if the given string matches the given regular expression pattern, false otherwise.
boolean print(string str) - prints str.

Functions on list

integer listsize(list l) - returns the size of the list l

Functions on dom objects

xpath()
xslt()
string toString(object o) - converts a given dom object into a string (i.e. to its XML serialization).


Control Flow Statements

AISL contains three control flow statements: if, switch and foreach. The major syntactic difference between these statements and the corresponding ones in the Java language is that the instructions inside the constructs must be enclosed in curly brackets (even when there is a single instruction). Notice that these statements are not terminated by a ";" character. Otherwise, if and switch statements are entirely analogous to their Java counterparts, while the foreach statement has a special syntax.

Conditional Statements

If Statement

This statement has the same syntax and semantics as in Java, and takes the two forms:

if( conditional expression ){ ... }

and

if( conditional expression ){ ... } else{ ... }


Switch Statement

This statement has the same syntax and semantics as in Java:

switch( expression ){
    case expression1: ... break;
    ...
    case expressionN: ... break;
    default: ... break;
}


Loop Statements

The AISL language tries to avoid unbounded loops as much as possible. For this reason it has no while statement, and its foreach statement only allows bounded loops. In particular, foreach iterates over a range of integer values, with a fixed increment (or decrement).


foreach loopvariable in [ expression to expression by expression]{ ... }

The three expressions appearing in the statement correspond to the minimum and maximum value of the range and to the increment. If no increment is given, its value is assumed to be one. The variable loopvariable is defined inside the foreach loop block only, and its value can be read but not assigned. For example:

foreach i in [0 to listsize(mylist)-1]{

    print(mylist[i]);

}

This code snippet will print the value of all objects in the list "mylist".
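The values visited by a foreach range can be spelled out with a small Java sketch. It assumes that both bounds are inclusive, as the example above suggests, and that a negative increment counts downwards from the first expression to the second; neither point is stated explicitly, and the class ForeachRange is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the values taken by the loop variable of a foreach range.
final class ForeachRange {
    // Values for: foreach i in [from to to by step]; both bounds are assumed inclusive.
    static List<Integer> values(int from, int to, int step) {
        if (step == 0) {
            throw new IllegalArgumentException("the increment must not be zero");
        }
        List<Integer> result = new ArrayList<>();
        if (step > 0) {
            for (int i = from; i <= to; i += step) {
                result.add(i);
            }
        } else {
            for (int i = from; i >= to; i += step) {
                result.add(i);
            }
        }
        return result;
    }

    // The increment defaults to one when it is omitted.
    static List<Integer> values(int from, int to) {
        return values(from, to, 1);
    }

    public static void main(String[] args) {
        System.out.println(values(0, 3));     // indexes of a four-element list
        System.out.println(values(0, -1));    // empty list: the loop body never runs
        System.out.println(values(0, 6, 2));  // every second value
    }
}
```

Under these assumptions the idiom [0 to listsize(mylist)-1] is safe for an empty list, since the range [0 to -1] contains no values.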




Extension Mechanisms

The language can be extended by adding functions. Adding a new function takes two steps:

1) Creating a new Java class implementing the AISLFunction interface. This is most easily done by subclassing the AbstractAISLFunction class. See below for further details.
2) Registering the function in the "Functions" class. This step will be removed in later releases, which will implement automatic plugin-like registration.


The AISLFunction interface provides a way to specify a number of signatures (number and type of arguments and return type) for an AISL function. It is designed to allow for overloaded functions. The number and types of the parameters are used to perform static checks on invocations of the function. The method evaluate provides the main code to be evaluated when the function is invoked in an AISL script. In the case of overloaded functions, this method should redirect to appropriate methods based on the number and types of the arguments.

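public interface AISLFunction {

    // Name under which the function is invoked in AISL scripts.
    public String getName();

    // Signatures of the function; more than one for overloaded functions.
    public void setFunctionDefinitions(FunctionDefinition ... defs);
    public FunctionDefinition[] getFunctionDefinitions();

    // Code executed when the function is invoked.
    public Object evaluate(Object[] args) throws Exception;

    public interface FunctionDefinition {
        Class<?>[] getArgTypes();
        Class<?> getReturnType();
    }

}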

A partial implementation of the AISLFunction interface is provided by the AbstractAISLFunction class. A developer can simply extend this class and then provide an appropriate constructor and implement the appropriate evaluate method. An example is given below. The function match returns a boolean value according to the match of a string value with a given regular expression pattern. Its signature is thus:

boolean match(string str, string pattern)

The class Match below is an implementation of this function. In the constructor, the function declares its name and its parameters. The method evaluate(Object[] args), which must be implemented to comply with the interface AISLFunction, casts the parameters and then redirects the evaluation to another evaluate method. (Note: in this case, as the function is not overloaded, there is no actual need for a separate evaluate method; it has been added here for clarity.)


public class Match extends AbstractAISLFunction {

    public Match() {
        setName("match");
        setFunctionDefinitions(
            new FunctionDefinitionImpl(Boolean.class, String.class, String.class)
        );
    }

    // Generic entry point required by AISLFunction: cast the arguments and delegate.
    public Object evaluate(Object[] args) throws Exception {
        return evaluate((String) args[0], (String) args[1]);
    }

    // Typed implementation of match(string str, string pattern).
    private Boolean evaluate(String str, String pattern) {
        return str.matches(pattern);
    }

}
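For an overloaded function, the evaluate method has to pick the right implementation from the number and types of the arguments. The sketch below illustrates one way to do this. It is self-contained, so it declares a reduced local stand-in for AISLFunction (without the FunctionDefinition part) instead of using the real interface, and the function repeat, the helper call and the class OverloadSketch are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical, self-contained sketch of dispatching inside an overloaded function.
final class OverloadSketch {

    // Reduced local stand-in for the AISLFunction interface shown above.
    interface AISLFunction {
        String getName();
        Object evaluate(Object[] args) throws Exception;
    }

    // Invented function with two signatures: repeat(string) and repeat(string, integer).
    static final class Repeat implements AISLFunction {
        public String getName() {
            return "repeat";
        }

        // Redirects to the typed method matching the number and types of the arguments.
        public Object evaluate(Object[] args) throws Exception {
            if (args.length == 1 && args[0] instanceof String) {
                return evaluate((String) args[0]);
            }
            if (args.length == 2 && args[0] instanceof String && args[1] instanceof Integer) {
                return evaluate((String) args[0], (Integer) args[1]);
            }
            throw new IllegalArgumentException(
                    "no signature of repeat matches " + Arrays.toString(args));
        }

        private String evaluate(String str) {
            return evaluate(str, 2);
        }

        private String evaluate(String str, int times) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < times; i++) {
                out.append(str);
            }
            return out.toString();
        }
    }

    // Invokes a function the way an interpreter might, wrapping checked exceptions.
    static Object call(AISLFunction function, Object... args) {
        try {
            return function.evaluate(args);
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("evaluation of " + function.getName() + " failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Repeat(), "ab"));
        System.out.println(call(new Repeat(), "ab", 3));
    }
}
```

In the real service the declared function definitions would presumably allow such mismatches to be reported by the static checks before evaluate is reached; the runtime check here only keeps the sketch safe on its own.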


Importers

The Archive Import Service performs the import of external resources by representing them in a Graph of Resources and passing this graph to a chain of software modules called "importers". Each importer is responsible for treating a specific kind of resource (e.g. metadata), and is essentially the bridge between the Archive Import Service and the services of the Information Organization stack responsible for managing certain kinds of internal resources (collections, metadata, documents etc). The precise way in which the importer performs the import thus depends on the specific subsystem the importer will interact with. Similarly, different importers will need to obtain different information about the resources to import. For instance, to import a document it is necessary to have its content or a URL at which the content can be accessed. To create a metadata collection, it is necessary to specify some properties like the location of the schema of the objects contained in the collection. These values are passed to an importer by annotating objects in a graph of resources with appropriate properties. In order to constrain the kind of properties that the model objects it manipulates must have, an importer must define a set of subtypes of the model object types. For instance, the metadata importer (described below) defines a subtype for each basic type of the Resource Model:

collection::metadata, resource::metadata and relationship::metadata

Each of these subtypes has specific properties that are understood, used and manipulated by that importer. The way subtyping is accomplished is described in more detail later in the section "writing new importers". The types defined by an importer and their properties must be publicly available, as AISL script developers must know which properties are available to them and what their semantics are. Furthermore, subsequent importers in the chain may also need to access some properties. For example, an importer for metadata also needs to access model objects representing content to get their object id (internal identifier). Notice that in general an AISL script will not necessarily assign all properties defined by a subtype. Some of these properties may be conditionally needed, while some will only be written by an importer at import time. For example, when a new content object is imported, the importer must record into the GoR object the OID of the newly created object. For this reason, the specification of the subtypes defined by importers must also state which properties are mandatory (i.e. they must be assigned during the creation of a GoR) and which properties are private (i.e. they should NOT be assigned during the creation of the GoR).


Built-in importers

The AIS already comes with the capability to import documents and metadata. This is provided by two importers called the content importer and the metadata importer. The types defined by these importers are described below:


Content Importer

This importer defines two subtypes. In particular, it defines a new collection type and a new resource type: collection::content and resource::content

The properties of these subtypes, their type and semantics are as follows:

[collection::content]
collectionName : string : mandatory - The name of the collection.
isUser : boolean : mandatory - Denotes whether a collection is a user collection
collectionId : string : private - The id assigned to the collection by the collection management service


[resource::content]
isVirtualImport : boolean : mandatory
documentName : string : mandatory
hasMaterializedContent : boolean : mandatory
contentSourceLocator : string :
content : file :
documentId : string : private - The id assigned to the document by the storage management service

Note: the fields contentSourceLocator and content are alternatives. Which one is used depends on the value of the field hasMaterializedContent
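The distinction between mandatory and private properties lends itself to a simple check on the properties a script assigns. The Java sketch below shows such a check for collection::content, using the property names listed above; the class SubtypeSpecSketch and its methods are hypothetical, and the AIS may well validate scripts differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical description of the mandatory and private properties of a subtype.
final class SubtypeSpecSketch {
    private final SortedSet<String> mandatory;
    private final SortedSet<String> privateProperties;

    SubtypeSpecSketch(List<String> mandatory, List<String> privateProperties) {
        this.mandatory = new TreeSet<>(mandatory);
        this.privateProperties = new TreeSet<>(privateProperties);
    }

    // Property names of collection::content as listed in this section.
    static SubtypeSpecSketch collectionContent() {
        return new SubtypeSpecSketch(
                Arrays.asList("collectionName", "isUser"),
                Arrays.asList("collectionId"));
    }

    // Returns the problems found; an empty list means the assignment is acceptable.
    List<String> check(Map<String, Object> assigned) {
        List<String> problems = new ArrayList<>();
        for (String name : mandatory) {
            if (!assigned.containsKey(name)) {
                problems.add("missing mandatory property: " + name);
            }
        }
        for (String name : new TreeSet<>(assigned.keySet())) {
            if (privateProperties.contains(name)) {
                problems.add("private property assigned: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> assigned = Map.of("collectionName", "test", "collectionId", "42");
        for (String problem : collectionContent().check(assigned)) {
            System.out.println(problem);
        }
    }
}
```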


Metadata Importer This importer defines three subtypes, one for each basic construct in the Resource Model. They are:

collection::metadata resource::metadata relationship::metadata


[collection::metadata]
relatedContentCollection : collection : mandatory - The content collection containing the objects to which this metadata collection refers
collectionName : string : mandatory - The name of the collection
collectionDescription : string : mandatory - A description of the collection
isUser : boolean : mandatory - Indicates whether this is a user collection
isIndexable : boolean : mandatory - Indicates whether this collection is indexable
metadataName : string : mandatory - Name of the metadata schema in this collection
metadataLanguage : string : mandatory - Language of the metadata in this collection
metadataSchemaURI : string : mandatory - URI of the schema of the metadata in this collection
collectionId : string : private - The id assigned to the metadata collection during the import


[resource::metadata]
content : string : mandatory - the content of this metadata object
objectID : string : private - the id assigned to the metadata object during the import


[relationship::metadata]
This subtype does not define any property. It denotes an edge from a metadata resource object to a content resource object









Appendix - Complete AISL Grammar in EBNF

Program ::= ( Instruction )*

Instruction ::= ( VariableDeclaration ";" | Statement )

Statement ::= StatementExpression ";" | SwitchStatement | IfStatement | ForeachStatement

SwitchStatement ::= "switch" "(" Expression ")" "{" ( SwitchBlock )* "}"

SwitchBlock ::= ( "case" Expression ":" ( Instruction )* "break;" | "default" ":" ( Instruction )* "break;" )

IfStatement ::= "if" "(" Expression ")" IfBlock ( "else" ElseBlock )?

ElseBlock ::= "{" ( Instruction )* "}"

IfBlock ::= "{" ( Instruction )* "}"

ForeachStatement ::= "foreach" <IDENTIFIER> "in" ( Expression | ForRange ) ForBlock

ForRange ::= "[" Expression "to" Expression ( "," Expression )? "]"

ForBlock ::= "{" ( Instruction )* "}"

VariableDeclaration ::= Type VariableDeclarator ( "," VariableDeclarator )*

VariableDeclarator ::= <IDENTIFIER> ( "=" Expression )?

Type ::= BuiltinType

BuiltinType ::= ( "boolean" | "int" | "float" | "list" ( "<" Type ">" )? | "file" | "string" | "collection" | "resource" | "relationship" )

StatementExpression ::= PrimaryExpression ( "=" Expression )?

PrimaryExpression ::= ( Literal | Function | Variable | Constructor ) ( Selection )*

Literal ::= ( <INTEGER_LITERAL> | <FLOATING_POINT_LITERAL> | <STRING_LITERAL> | BooleanLiteral )

BooleanLiteral ::= ( "true" | "false" )

Variable ::= Name

Name ::= <IDENTIFIER>

Function ::= Name Arguments

Arguments ::= "(" ( Expression )? ( "," Expression )* ")"

Constructor ::= ModelObjectConstructor | ListConstructor

ModelObjectConstructor ::= CollectionConstructor | ResourceConstructor | RelationshipConstructor

CollectionConstructor ::= "collection" "::" <IDENTIFIER> "[" Expression "]" "{" ( PropertyAssignment ( "," PropertyAssignment )* )? "}"

ResourceConstructor ::= "resource" "::" <IDENTIFIER> "[" Expression "]" "in" CollectionsList "{" ( PropertyAssignment ( "," PropertyAssignment )* )? "}"

CollectionsList ::= Expression ( "," Expression )*

RelationshipConstructor ::= "relationship" "::" <IDENTIFIER> "(" Expression "," Expression ")" "[" Expression "]" "{" ( PropertyAssignment ( "," PropertyAssignment )* )? "}"

PropertyAssignment ::= <IDENTIFIER> "=" Expression

ListConstructor ::= "{" ( Expression )? ( "," Expression )* "}"

Selection ::= PropertySelection | ElementSelection

PropertySelection ::= "." <IDENTIFIER>

ElementSelection ::= "[" Expression "]"

Expression ::= OrExpression

OrExpression ::= ExclusiveOrExpression ( "|" ExclusiveOrExpression )*

ExclusiveOrExpression ::= AndExpression ( "^" AndExpression )*

AndExpression ::= EqualityExpression ( "&" EqualityExpression )*

EqualityExpression ::= RelationalExpression ( ( "==" | "!=" ) RelationalExpression )*

RelationalExpression ::= AdditiveExpression ( ( "<" | ">" | "<=" | ">=" ) AdditiveExpression )*

AdditiveExpression ::= MultiplicativeExpression ( ( "+" | "-" ) MultiplicativeExpression )*

MultiplicativeExpression ::= UnaryExpression ( ( "*" | "/" | "%" ) UnaryExpression )*

UnaryExpression ::= ( ( "+" | "-" | "!" ) PrimaryExpression | PrimaryExpression )