Common-accounting-model ABANDONED

Latest revision as of 13:36, 19 October 2016


Scope

This library contains the definition of the resource accounting record.

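Search for common-accounting-model on the Nexus Repository Browser (http://maven.research-infrastructures.eu/nexus/index.html#nexus-search;quick~common-accounting-model) to find the artifact with the following coordinates: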

<dependency>
	<groupId>org.gcube.accounting</groupId>
	<artifactId>common-accounting-model</artifactId>
</dependency>

Data-model

A generic accounting record (Usage Record, UR) consists of a set of common, mandatory fields shared by all resource types:

  • id : a unique identifier for the UR
  • consumerId : the user actually consuming the resource (or the Consumer Identity, which in service-to-service (S2S) communication is another service)
  • createTime : when the UR was created
  • startTime, endTime : the time window the UR refers to
  • resourceType : the type of resource the UR tracks, e.g. Job, Task, Service, StorageUsage, StorageStatus
  • resourceScope : the scope of the resource
  • resourceOwner : who owns the resource and/or who creates the UR

In addition, each UR has a section that holds the properties specific to its resource type, expressed as key-value pairs.
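
As a minimal sketch of the structure described above, the common fields and the key-value section could be modelled as follows. This is not the library's actual API: the class name `UsageRecord`, the field types, the generated UUID for the id, and the plain dictionary used for the resource-specific properties are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict
import uuid


@dataclass
class UsageRecord:
    """Hypothetical model of a generic Usage Record (UR)."""

    consumer_id: str       # consumerId
    resource_type: str     # resourceType, e.g. "Job", "Task", "Service"
    resource_scope: str    # resourceScope
    resource_owner: str    # resourceOwner
    start_time: datetime   # startTime
    end_time: datetime     # endTime
    # Assumption: the id is a random UUID generated when the record is built.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    create_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Resource-type-specific properties, stored as key-value pairs.
    properties: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The time window must not be inverted.
        if self.end_time < self.start_time:
            raise ValueError("endTime must not precede startTime")


# Example values below are invented placeholders.
record = UsageRecord(
    consumer_id="some.user",
    resource_type="Service",
    resource_scope="/example/scope",
    resource_owner="example-owner",
    start_time=datetime(2016, 10, 19, 13, 0, tzinfo=timezone.utc),
    end_time=datetime(2016, 10, 19, 13, 30, tzinfo=timezone.utc),
    properties={"serviceClass": "ExampleClass", "serviceName": "ExampleName"},
)
```

Keeping the specific properties in a separate map means the common fields stay identical across resource types, while each specialization only adds its own keys.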

Resource Types

The resource types we've identified are: Execution, Service, StorageUsage, StorageStatus and Portlet.

Execution

This specialization accounts for services that run jobs on the infrastructure (Workflow Engine, Execution Engine, Statistical Manager, Aquamaps).

For this resource type, there are two sub-types:

Job

Contains the information about the overall job, which is partitioned into N Tasks.

Specific Job properties:

  • jobId : a unique identifier for the job
  • jobQualifier : qualifies the job in terms of algorithm type or job type (e.g. search, data-transformation, etc.)
  • jobName : name of the job
  • jobStart : the instant the job starts running
  • jobEnd : the instant the job ends its execution
  • jobStatus: completed/failed
  • vmsUsed : number of VMs (gHNs) used by the job
  • wallDuration : time elapsed between the instant the job starts running and the instant it ends its execution
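
The relation between jobStart, jobEnd and wallDuration can be illustrated with a short sketch. The helper name `wall_duration_seconds`, the choice of seconds as the unit, and the sample values are assumptions; the page does not state the unit or the serialization format of these properties.

```python
from datetime import datetime, timezone


def wall_duration_seconds(job_start: datetime, job_end: datetime) -> float:
    """Return the elapsed time between job start and job end, in seconds."""
    if job_end < job_start:
        raise ValueError("jobEnd must not precede jobStart")
    return (job_end - job_start).total_seconds()


# Invented sample instants: the job runs for 12 minutes and 30 seconds.
job_start = datetime(2016, 10, 19, 10, 0, 0, tzinfo=timezone.utc)
job_end = datetime(2016, 10, 19, 10, 12, 30, tzinfo=timezone.utc)

# Job-specific properties as key-value pairs (hypothetical values).
job_properties = {
    "jobId": "job-0001",
    "jobName": "example-job",
    "jobStart": job_start.isoformat(),
    "jobEnd": job_end.isoformat(),
    "jobStatus": "completed",
    "wallDuration": wall_duration_seconds(job_start, job_end),
}

print(job_properties["wallDuration"])  # 750.0
```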

Task

Contains the information about one slice of the overall Job.

Specific Task properties:

  • jobId : reference to the Job that generated this Task
  • refHost : hostname of the virtual machine (gHN)
  • refVM : virtual machine id (gHN)
  • domain : domain of the virtual machine (gHN)
  • usageStart : the earliest usage time of the Task
  • usageEnd: the latest usage time of the Task
  • usagePhase: completed/failed
  • inputFilesNumber : number of input files to the Task
  • inputFilesSize : total size of the input files to the Task
  • outputFilesNumber : number of output files from the Task
  • outputFilesSize : total size of the output files from the Task
  • overallNetworkIn : overhead of the input traffic over the network to the Task
  • overallNetworkOut : overhead of the output traffic over the network from the Task
  • cores : number of cores per Task.
  • processors : number of processors per Task.

Service

This specialization accounts for service invocations.

Specific service attributes:

  • callerIP : IP address that originated the service call
  • callerScope : the service scope (for the Service specialization, the resourceScope field holds the infrastructure scope)
  • refHost : hostname of the virtual machine (gHN)
  • refVM : virtual machine id (gHN)
  • domain : domain of the virtual machine (gHN)
  • invocationCount : number of invocations (aggregated information)
  • averageInvocationTime : average invocation time (aggregated information)
  • serviceClass : name of the service class
  • serviceName : name of the service
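
Because invocationCount and averageInvocationTime are aggregated values, combining two records for the same service requires a weighted average rather than a plain mean. The sketch below shows one way to do this; the function name `merge_aggregates` and the idea of merging records at all are assumptions for illustration, not behaviour documented for the library.

```python
from typing import Tuple


def merge_aggregates(
    count_a: int, avg_a: float, count_b: int, avg_b: float
) -> Tuple[int, float]:
    """Combine two (invocationCount, averageInvocationTime) pairs.

    The merged average weights each input average by its invocation count.
    """
    total = count_a + count_b
    if total == 0:
        # No invocations at all: the average is undefined, report 0.0.
        return 0, 0.0
    merged_avg = (count_a * avg_a + count_b * avg_b) / total
    return total, merged_avg


# 10 calls averaging 120.0 and 30 calls averaging 80.0 (invented numbers).
count, average = merge_aggregates(10, 120.0, 30, 80.0)
print(count, average)  # 40 90.0
```

A plain mean of the two averages would give 100.0 here, which overstates the contribution of the smaller record.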

StorageUsage

This specialization accounts for the operations performed against the different storage backends.

Specific storage usage attributes:

  • providerId: the identifier of the provider which is the target of a read/write operation
  • objectURI : the identifier of the item within the data source that is the target of a given read/write operation
  • operationType : the name of the read/write operation performed on a given source, i.e. GET, PUT, UPDATE, DELETE
  • qualifier : qualifies the data (e.g. MIME type for the Storage, domain for a database)
  • dataType : type of data accessed, i.e. STORAGE, TREE, GEO, DATABASE
  • dataVolume : quantity of data, in KB
  • dataCount : the number of objects within the data provider that are accessed or written
  • callerIP : IP address that originated the service call (if appropriate)
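
The storage usage attributes listed above could be assembled and checked as in the following sketch. The enum and function names are hypothetical, and the conversion of bytes to KB with a factor of 1024 is an assumption; the page only says that dataVolume is expressed in KB.

```python
from enum import Enum
from typing import Dict, Union


class OperationType(Enum):
    GET = "GET"
    PUT = "PUT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


class DataType(Enum):
    STORAGE = "STORAGE"
    TREE = "TREE"
    GEO = "GEO"
    DATABASE = "DATABASE"


def storage_usage_properties(
    provider_id: str,
    object_uri: str,
    operation_type: str,
    data_type: str,
    size_bytes: int,
    data_count: int = 1,
) -> Dict[str, Union[str, float, int]]:
    """Build the StorageUsage key-value pairs, rejecting unknown values."""
    # Enum lookup by value raises ValueError for anything outside the lists.
    operation = OperationType(operation_type)
    kind = DataType(data_type)
    return {
        "providerId": provider_id,
        "objectURI": object_uri,
        "operationType": operation.value,
        "dataType": kind.value,
        # Assumption: 1 KB = 1024 bytes.
        "dataVolume": size_bytes / 1024,
        "dataCount": data_count,
    }


# Invented sample values.
usage = storage_usage_properties(
    "example-provider", "example://bucket/object", "PUT", "STORAGE", 2048
)
print(usage["dataVolume"])  # 2.0
```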

StorageStatus

Describes how much of a storage resource type (in terms of storage volume) is used by an identifiable entity.

Specific storage status attributes:

  • providerId: the identifier of the provider which is the target of a read/write operation
  • qualifier : qualifies the data (e.g. MIME type for the Storage, domain for a database)
  • dataType : type of data accessed, i.e. STORAGE, TREE, GEO, DATABASE
  • dataVolume : quantity of data, in KB, measured at the time the record was created
  • dataCount : number of objects

Portlet

In some cases the user identity is not available on the service side, so the operations performed by the portlets must be mapped to the users who triggered them. Portlets use this specialization to record the mapping between the portal user and the reference to the operation that user performed.

Specific portlet attributes:

  • user : the user actually performing the operation
  • operationId : a unique identifier for the operation
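
To illustrate how the user-to-operation mapping might be used, the sketch below indexes Portlet records by operationId so that a user can be looked up for a given operation. It assumes that the same operationId is also known on the service side, which this page does not state; the function names and sample values are hypothetical.

```python
from typing import Dict, Iterable, Optional


def index_users_by_operation(
    portlet_records: Iterable[Dict[str, str]]
) -> Dict[str, str]:
    """Map each operationId to the portal user who performed it."""
    return {r["operationId"]: r["user"] for r in portlet_records}


def resolve_user(operation_id: str, index: Dict[str, str]) -> Optional[str]:
    """Return the user for an operation, or None if no Portlet record exists."""
    return index.get(operation_id)


# Invented Portlet records.
portlet_records = [
    {"user": "alice", "operationId": "op-1"},
    {"user": "bob", "operationId": "op-2"},
]

index = index_users_by_operation(portlet_records)
print(resolve_user("op-2", index))  # bob
```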