Infrastructure Monitoring
Messaging Infrastructure
The main purpose of this monitoring architecture is to enable project members to:
- Perform local tests on the distributed components of the infrastructure and keep a history of the results
- Store historical information about changes to the infrastructure topology (nodes or services added, removed or updated)
- Receive alarms/notifications in case of problems in the distributed environment
- Help Site/VRE/VO administrators resolve problems
The D4Science monitoring infrastructure is composed of three main components:
- gCube Local Monitor: deployed on each node (GHN) of the gCube infrastructure
- Message Broker: receives and dispatches monitoring/notification messages
- gCube Consumer Monitor: subscribes to messages from the Message Broker, checks metrics, stores messages and notifies administrators
At the base of a [1] MoM (Message Oriented Middleware) infrastructure, however, are the messages exchanged between components.
A GCUBEMessage (defined as a Java object) is composed of the following base fields:
- Topic Name
- Source GHN
- Scope
- Result (optional)
The Topic Name corresponds to the events that consumers subscribe to.
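The base fields can be pictured as a small Java class. The sketch below is hypothetical: the class shape, the accessor names and the choice of `Serializable` (so that the object could travel inside a JMS `ObjectMessage`) are assumptions for illustration, not the actual gCF code.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of the base message fields; not the actual gCF class.
// Serializable so that it could be carried by a JMS ObjectMessage (assumption).
class GCUBEMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String topicName;     // the event consumers subscribe to
    private final String sourceGHN;     // GHN that produced the message
    private final String scope;         // e.g. "/gcube/devsec"
    private final Serializable result;  // optional, may be null

    GCUBEMessage(String topicName, String sourceGHN, String scope, Serializable result) {
        this.topicName = Objects.requireNonNull(topicName, "topicName");
        this.sourceGHN = Objects.requireNonNull(sourceGHN, "sourceGHN");
        this.scope = Objects.requireNonNull(scope, "scope");
        this.result = result;
    }

    String getTopicName() { return topicName; }
    String getSourceGHN() { return sourceGHN; }
    String getScope() { return scope; }

    // The optional Result field is exposed as an Optional.
    Optional<Serializable> getResult() { return Optional.ofNullable(result); }
}
```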
gCube Local Monitor
From a package point of view, the Local Monitor has been split into two components:
- An abstract Local Monitor interface, developed in the context of the [2]gCore Framework (gCF). The org.gcube.common.core.monitoring package models a local monitor, a local probe and the base message format described above.
- An implementation of the abstract LocalMonitor (GCUBELocalMonitor), composed of a set of probes (which can be loaded and extended dynamically):
- GHNDiskProbe
- GHNLastUpdateProbe
- GHNLoadProbe
- GHNMemoryProbe
- GHNInformationProbe
- GHNNotificationProbe
- RINotificationProbe
All the probes above use the [3] JMS (Java Message Service) standard to connect to the Message Brokers and send messages.
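A minimal sketch of how a local monitor could drive a set of pluggable probes, assuming a hypothetical `Probe` interface and a plain `Consumer<String>` publisher standing in for the JMS producer. `LocalMonitor`, `Probe` and `DiskProbe` are illustrative names only, not the org.gcube.common.core.monitoring API.

```java
import java.io.File;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical probe contract: run one test and return its result as text.
interface Probe {
    String name();
    String execute();
}

// Hypothetical disk probe: reports usable bytes on the given file system root.
class DiskProbe implements Probe {
    private final File root;
    DiskProbe(File root) { this.root = root; }
    public String name() { return "DISK_QUOTA"; }
    public String execute() { return Long.toString(root.getUsableSpace()); }
}

// Hypothetical local monitor: runs every registered probe and hands each
// result to a publisher (a stand-in for the JMS message producer).
class LocalMonitor {
    private final List<Probe> probes = new CopyOnWriteArrayList<>(); // safe to extend while running
    private final Consumer<String> publisher;
    private ScheduledExecutorService scheduler;

    LocalMonitor(Consumer<String> publisher) { this.publisher = publisher; }

    // Probes can be added at any time ("loaded and extended dynamically").
    void register(Probe probe) { probes.add(probe); }

    // One round of tests; a failing probe does not stop the others.
    void runOnce() {
        for (Probe p : probes) {
            try {
                publisher.accept(p.name() + "=" + p.execute());
            } catch (RuntimeException e) {
                publisher.accept(p.name() + "=ERROR " + e.getMessage());
            }
        }
    }

    // Repeats runOnce() every intervalSeconds (cf. testInterval, default 1800).
    void start(long intervalSeconds) {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::runOnce, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        if (scheduler != null) scheduler.shutdownNow();
    }
}
```

Keeping the publisher behind a functional interface lets the probes be exercised without a running broker; a real deployment would plug in a JMS `MessageProducer` at that point.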
The GHN and RI probes exchange a particular type of message with the Broker (extensions of the base GCUBEMessage), named GHNMessage and RIMessage respectively. Both contain an object named Test, which represents either the test performed on the GHN (together with its result) or a Notification:
- TestType: corresponds to the probe that performs the test {DISK_QUOTA, CPU_LOAD, MEMORY_AVAILABLE, LAST_UPDATE, NOTIFICATION, CPUINFO}
- Description: the test description (or the Notification description)
- TestNumber: a unique test identifier
- TestResult: object that stores the test result (no TestResult is expected when TestType is NOTIFICATION)
- Priority: the test priority (HIGH or LOW)
The RIMessage also contains information (ServiceClass and ServiceName) about the Running Instance where the probe is running.
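The Test object can be modelled as below. The enum values and field meanings come from the list above; the class names `MonitoringTest` and `RunningInstanceRef`, and the choice to reject a result on NOTIFICATION tests, are assumptions made for illustration.

```java
import java.io.Serializable;
import java.util.Objects;

// Values as listed in the text.
enum TestType { DISK_QUOTA, CPU_LOAD, MEMORY_AVAILABLE, LAST_UPDATE, NOTIFICATION, CPUINFO }

enum Priority { HIGH, LOW }

// Hypothetical model of the "Test" object carried by GHNMessage and RIMessage.
final class MonitoringTest implements Serializable {
    private static final long serialVersionUID = 1L;

    final TestType type;
    final String description;   // test or notification description
    final long testNumber;      // unique test identifier
    final Serializable result;  // null for NOTIFICATION
    final Priority priority;

    MonitoringTest(TestType type, String description, long testNumber,
                   Serializable result, Priority priority) {
        // One possible way to enforce "no TestResult for NOTIFICATION".
        if (type == TestType.NOTIFICATION && result != null) {
            throw new IllegalArgumentException("NOTIFICATION carries no TestResult");
        }
        this.type = Objects.requireNonNull(type, "type");
        this.description = description;
        this.testNumber = testNumber;
        this.result = result;
        this.priority = Objects.requireNonNull(priority, "priority");
    }
}

// Hypothetical extra payload of an RIMessage: the Running Instance identity.
final class RunningInstanceRef implements Serializable {
    private static final long serialVersionUID = 1L;

    final String serviceClass;
    final String serviceName;

    RunningInstanceRef(String serviceClass, String serviceName) {
        this.serviceClass = Objects.requireNonNull(serviceClass, "serviceClass");
        this.serviceName = Objects.requireNonNull(serviceName, "serviceName");
    }
}
```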
Topic Structure and Message Selectors
Logically, probes can be grouped according to the type of message they produce (GHN or RI messages) and according to their behavior:
- Notification Probes, which exploit the gCF local event mechanism to consume events related to GHN/RI actions (GHN ready, RI/GHN scope changed)
- Test Probes, which perform local tests on the GHN and send messages containing the test results
At message creation time, depending on the message type and the probe type, a different combination of Topic name and [4] Message Selector is used, as described below (each cell shows a topicName / MessageSelector pair):
Probe type | GHNMessage | RIMessage |
---|---|---|
TestProbe | scope.GHN.sourceGHN / MessageType='TEST' | scope.RI.sourceGHN / MessageType='TEST' |
NotificationProbe | scope.GHN.sourceGHN / MessageType='NOTIFICATION' | scope.RI.sourceGHN / MessageType='NOTIFICATION' |
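On the producer side, a probe would attach the selector property to each message (with JMS, typically via `Message.setStringProperty("MessageType", "TEST")`), and the broker then filters deliveries per subscriber. The toy evaluator below only handles a single `Property='value'` comparison, whereas real JMS selectors are an SQL-92 subset evaluated by the broker; `EqualitySelector` and `ProbeKind` are hypothetical names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified selector: only  Property='value'  is supported.
final class EqualitySelector {
    private static final Pattern FORM =
            Pattern.compile("\\s*(\\w+)\\s*=\\s*'([^']*)'\\s*");
    private final String property;
    private final String value;

    EqualitySelector(String expression) {
        Matcher m = FORM.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported selector: " + expression);
        }
        property = m.group(1);
        value = m.group(2);
    }

    // True when the message carries the property with exactly this value.
    boolean matches(Map<String, String> messageProperties) {
        return value.equals(messageProperties.get(property));
    }
}

// Probe behaviour -> MessageType property, following the table above.
enum ProbeKind {
    TEST, NOTIFICATION;

    Map<String, String> properties() { return Map.of("MessageType", name()); }
}
```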
Following the above topic structure, the monitoring probes send one message for each scope of the GHN/RI. For example, on the GHN running on host pcd4science.cern.ch, port 8080, which belongs to both the /gcube and /gcube/devsec scopes, a GHNDiskProbe will send two messages with the following topic names:
- gcube.GHN.pcd4science_cern_ch:8080
- gcube.devsec.GHN.pcd4science_cern_ch:8080
and the Message Selector:
- MessageType='TEST'
When topic names are created, any '.' character in the host name (which in the ActiveMQ destination syntax is the hierarchy separator) is replaced by "_".
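The topic naming rule can be sketched as follows, assuming scopes are written as slash-separated paths such as `/gcube/devsec`. `TopicNames` is a hypothetical helper, not part of gCF.

```java
import java.util.ArrayList;
import java.util.List;

final class TopicNames {
    // "/gcube/devsec" -> "gcube.devsec": scope levels become hierarchy elements.
    static String scopePrefix(String scope) {
        String trimmed = scope.startsWith("/") ? scope.substring(1) : scope;
        return trimmed.replace('/', '.');
    }

    // Dots in the host name are escaped so they are not read as separators.
    static String topicFor(String scope, String type, String hostAndPort) {
        return scopePrefix(scope) + "." + type + "." + hostAndPort.replace('.', '_');
    }

    // One topic per scope the GHN belongs to.
    static List<String> topicsFor(List<String> scopes, String type, String hostAndPort) {
        List<String> topics = new ArrayList<>();
        for (String scope : scopes) {
            topics.add(topicFor(scope, type, hostAndPort));
        }
        return topics;
    }
}
```

For the example above, `topicsFor(List.of("/gcube", "/gcube/devsec"), "GHN", "pcd4science.cern.ch:8080")` yields the two topic names listed.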
Configuration
To configure a GHN to run the gCube Local Monitor, at least one MessageBroker (an ActiveMQ endpoint) must be configured in one of the ServiceMaps related to the GHN scopes, as follows:
<ServiceMap>
  <Service name="ISICAllQueryPT" endpoint="http://dlib01.isti.cnr.it:8080/wsrf/services/diligentproject/informationservice/disic/DISICService"/>
  <Service name="ISICAllRegistrationPT" endpoint="http://dlib01.isti.cnr.it:8080/wsrf/services/diligentproject/informationservice/disic/DISICRegistrationService"/>
  ......................
  <Service name="MessageBroker" endpoint="tcp://ui.grid.research-infrastructures.eu:6166"/>
</ServiceMap>
One parameter can also be added to the [5]GHN configuration:
- testInterval: the interval in seconds between test executions (default = 1800)
If none of the GHN ServiceMaps contains a MessageBroker entry, the gCube Local Monitor is not enabled on the GHN.
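A sketch of the enabling rule using only the JDK DOM parser: it looks for a `Service` element named `MessageBroker` in each ServiceMap and falls back to the 1800-second default for `testInterval`. `LocalMonitorConfig` and its methods are hypothetical; how the GHN actually reads its ServiceMaps is not described here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class LocalMonitorConfig {
    static final long DEFAULT_TEST_INTERVAL_SECONDS = 1800;

    // Endpoint of the first <Service name="MessageBroker"> in a ServiceMap, if any.
    static Optional<String> brokerEndpoint(String serviceMapXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(serviceMapXml.getBytes(StandardCharsets.UTF_8)));
            NodeList services = doc.getElementsByTagName("Service");
            for (int i = 0; i < services.getLength(); i++) {
                Element service = (Element) services.item(i);
                if ("MessageBroker".equals(service.getAttribute("name"))) {
                    return Optional.of(service.getAttribute("endpoint"));
                }
            }
            return Optional.empty();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable ServiceMap", e);
        }
    }

    // Enabled only if at least one ServiceMap of the GHN scopes declares a broker.
    static boolean isEnabled(Iterable<String> serviceMaps) {
        for (String map : serviceMaps) {
            if (brokerEndpoint(map).isPresent()) return true;
        }
        return false;
    }

    // testInterval is optional; fall back to the documented default.
    static long testIntervalSeconds(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_TEST_INTERVAL_SECONDS;
        }
        return Long.parseLong(configured.trim());
    }
}
```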
Message Broker
Following the work done by the [6]WLCG Monitoring group at CERN on monitoring with MoM systems, and to make the EGEE and D4Science monitoring solutions potentially interoperable, the [7]EGEE MSG Broker component has been adopted in D4Science as the standard Message Broker service.
The EGEE MSG Broker is based on the [8]Apache ActiveMQ message broker, a powerful open-source solution with the following main features:
- Message Channels
  - Publish-Subscribe (Topics)
  - Point-to-Point (Queues)
- Virtual Destinations, Wildcards
- Synchronous and asynchronous sending
- Wide range of supported client protocols
  - OpenWire for high-performance clients
  - STOMP
  - REST, JMS
- Extremely good performance and reliability
  - See the [9] performance tests executed by the WLCG Monitoring group.
Installation
Installation instructions for the EGEE MSG Broker can be found on the EGEE MSG Wiki [10].
gCube Consumer Monitor
The Consumer Monitor is a gCube WSRF service that can be deployed on a gCube-enabled infrastructure to consume monitoring messages coming from Message Brokers. The main features of the service are:
- Subscribe to monitoring messages at various [11] scopes (Infrastructure/VO/VRE)
- Check the test results in monitoring messages against metrics
- Store monitoring messages in a database
- Notify administrators of notifications or abnormal test results
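The metric check can be sketched as a table of per-test predicates. The `MetricChecker` class and the thresholds in the usage below are purely hypothetical, since the text does not state which metrics the service applies; the sketch only shows the flow: notifications are always forwarded, abnormal test results raise an alert, and normal results are just stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.DoublePredicate;

// Hypothetical consumer-side check; rules and thresholds are illustrative.
final class MetricChecker {
    // testType -> "is this value abnormal?"
    private final Map<String, DoublePredicate> abnormal;

    MetricChecker(Map<String, DoublePredicate> abnormal) {
        this.abnormal = new HashMap<>(abnormal);
    }

    // Returns the text of an alert when administrators should be notified.
    Optional<String> evaluate(String testType, String sourceGHN, Double result) {
        if ("NOTIFICATION".equals(testType)) {
            return Optional.of("Notification from " + sourceGHN); // always forwarded
        }
        DoublePredicate rule = abnormal.get(testType);
        if (rule != null && result != null && rule.test(result)) {
            return Optional.of(testType + " on " + sourceGHN + " is abnormal: " + result);
        }
        return Optional.empty(); // normal result: store only
    }
}
```

For instance, rules such as `CPU_LOAD -> v > 4.0` or `MEMORY_AVAILABLE -> v < 256.0` (illustrative values) would flag an overloaded or memory-starved GHN.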
In conjunction with the service, a GUI is going to be implemented that will let administrators view the DB information and graphs.
The WSRF service does not currently expose any public operation; in the future, public operations may be introduced to query the underlying DB and export test results outside the infrastructure.
Following the GHN Local Monitor topic structure, at startup the Consumer Service creates so-called durable subscriptions to these topics: the ActiveMQ server holds messages for a subscriber once it has formally subscribed. Durable topic subscriptions receive messages published while the subscriber is not active. Subsequent subscriber objects that specify the identity of the durable subscription can resume the subscription in the state left by the previous subscriber.
This means that, by reusing the same Subscription ID, the Consumer Service can resume receiving messages from the ActiveMQ server (a very powerful feature, and a fundamental one in case of a node crash or service re-deployment).
Configuration
The Consumer Service can be configured to run ( and to subscribe for ) in one or more scopes. Following the Topic Structure described above, at start time the service subscribes for the following topics:
- <scope>.GHN.*
- <scope>.RI.*
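These subscriptions rely on ActiveMQ destination wildcards, where `.` separates path elements and `*` matches exactly one element. The simplified matcher below (hypothetical `TopicWildcards`, ignoring ActiveMQ's other wildcard `>`) shows why host dots are escaped: `gcube.GHN.*` matches `gcube.GHN.pcd4science_cern_ch:8080`, but would not match an unescaped `gcube.GHN.pcd4science.cern.ch:8080`, nor a topic of the `gcube.devsec` scope.

```java
import java.util.Arrays;
import java.util.List;

final class TopicWildcards {
    // Subscription patterns for one scope, e.g. "/gcube" -> gcube.GHN.*, gcube.RI.*
    static List<String> patternsFor(String scope) {
        String prefix = (scope.startsWith("/") ? scope.substring(1) : scope).replace('/', '.');
        return Arrays.asList(prefix + ".GHN.*", prefix + ".RI.*");
    }

    // Simplified matching: '.' separates elements and '*' matches exactly
    // one element; the '>' wildcard is not modelled.
    static boolean matches(String pattern, String topic) {
        String[] p = pattern.split("\\.", -1);
        String[] t = topic.split("\\.", -1);
        if (p.length != t.length) return false;
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(t[i])) return false;
        }
        return true;
    }
}
```

Because `*` spans a single element, a consumer configured for both /gcube and /gcube/devsec needs its own pair of subscriptions for each scope.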
In addition, the service can be configured (from its configuration file) to use JMS message selectors. In that case, for each scope, 2*nofSelectors durable subscribers are created, using the wildcard (.*) syntax for topic names (all topic names of the same scope and type are subscribed to). For each durable subscriber, the Subscription ID takes the value of the related topic name, so that the subscription can easily be resumed after a service crash or re-deployment.
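Durable subscription semantics can be illustrated with a toy in-memory broker: messages published while a subscriber is offline are retained under its subscription ID and delivered when a subscriber resumes with the same ID. `ToyDurableBroker` is a hypothetical model for explanation only; with a real JMS provider this is obtained through a fixed client ID and `Session.createDurableSubscriber(topic, name)`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy in-memory model of durable topic subscriptions (not ActiveMQ code).
final class ToyDurableBroker {
    private final Map<String, String> patternById = new LinkedHashMap<>();
    private final Map<String, Deque<String>> backlogById = new HashMap<>();
    private final Map<String, Consumer<String>> activeById = new HashMap<>();

    // Creates the subscription, or resumes it if the ID already exists,
    // first delivering everything retained while the subscriber was away.
    void subscribe(String subscriptionId, String pattern, Consumer<String> listener) {
        patternById.put(subscriptionId, pattern);
        Deque<String> backlog =
                backlogById.computeIfAbsent(subscriptionId, id -> new ArrayDeque<>());
        while (!backlog.isEmpty()) listener.accept(backlog.poll());
        activeById.put(subscriptionId, listener);
    }

    // The subscriber goes away (crash, re-deployment); the subscription survives.
    void disconnect(String subscriptionId) { activeById.remove(subscriptionId); }

    void publish(String topic, String message) {
        for (Map.Entry<String, String> e : patternById.entrySet()) {
            if (!matches(e.getValue(), topic)) continue;
            Consumer<String> listener = activeById.get(e.getKey());
            if (listener != null) {
                listener.accept(message);
            } else {
                backlogById.get(e.getKey()).add(message); // retained for later
            }
        }
    }

    // '*' matches exactly one dot-separated element.
    private static boolean matches(String pattern, String topic) {
        String[] p = pattern.split("\\.", -1);
        String[] t = topic.split("\\.", -1);
        if (p.length != t.length) return false;
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(t[i])) return false;
        }
        return true;
    }
}
```

Using the topic pattern itself as the subscription ID, as described above, means a restarted service can recompute its IDs from configuration alone.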
The Consumer Service can be configured (like any other gCube service) by adding or changing configuration parameters in the [12]JNDI service file. The following table describes the service parameters.
Parameter | Type | Description |
---|---|---|
DBFile | String | The name of the file containing the DB structure |
MailRecipients | String | The file containing a fixed list of administrator mail addresses; if present, the list of admin addresses is not downloaded periodically from VOMS |
NotifiybyMail | Boolean | Specifies whether the mail notification feature is turned on |
startScopes | String | List of scopes the service belongs to |
httpServerBasePath | String | The container-relative base path for the embedded [13]Jetty web server |
httpServerPort | String | The port of the embedded [14]Jetty web server |
monitorRoleString | String | The VOMS role of Site/VO admins (used when the service downloads info from VOMS) |
UseEmbeddedBroker | Boolean | Whether the service runs an embedded ActiveMQ instance (for testing purposes only; not recommended for production environments) |
DailySummary | Boolean | Specifies whether the service creates a daily report of the messages received for each scope |
MessageSelectors | String | The message selectors to use on Broker subscriptions |
A sample JNDI configuration:
<!-- DB Structure file -->
<environment name="DBFile" value="dbqueries.file" type="java.lang.String" override="false" />
<environment name="MailRecipients" value="recipients.txt" type="java.lang.String" override="false" />
<!-- Notify By Mail-->
<environment name="NotifiybyMail" value="true" type="java.lang.Boolean" override="false" />
<environment name="startScopes" value="/gcube/devsec" type="java.lang.String" override="false" />
<environment name="httpServerBasePath" value="jetty/webapps" type="java.lang.String" override="false" />
<environment name="httpServerPort" value="6900" type="java.lang.String" override="false" />
<environment name="monitorRoleString" value="Role=VO-Admin" type="java.lang.String" override="false" />
<environment name="UseEmbeddedBroker" value="false" type="java.lang.Boolean" override="false" />
<environment name="MailSummary" value="true" type="java.lang.Boolean" override="false" />
<environment name="MessageSelectors" value="MessageType = 'TEST', MessageType = 'NOTIFICATION'" type="java.lang.String" override="false" />
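A sketch of how the `MessageSelectors` value in the sample could be split into individual selectors, and how many durable subscribers that implies per scope. The comma-separated format is inferred from the sample value; `SelectorConfig` is hypothetical, and treating an empty selector list as one unfiltered subscription per message type is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class SelectorConfig {
    // Naive split on commas; assumes no single selector contains a comma
    // (true for the sample value above).
    static List<String> parse(String value) {
        List<String> selectors = new ArrayList<>();
        if (value == null) return selectors;
        for (String part : value.split(",")) {
            String s = part.trim();
            if (!s.isEmpty()) selectors.add(s);
        }
        return selectors;
    }

    // Per scope: one durable subscriber for each (GHN | RI) x selector pair.
    // Assumption: with no selectors, one unfiltered subscriber per type.
    static int durableSubscribersPerScope(List<String> selectors) {
        return 2 * Math.max(1, selectors.size());
    }
}
```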
DB Structure
The database that stores message information is composed of three tables:
- VO
- GHNMESSAGE
- RIMESSAGE
The VO table stores the VOs the service monitors. The RIMESSAGE table stores information about Running Instance messages, which can be of either NOTIFICATION or TEST type:
- MessageId
- ServiceName
- ServiceClass
- GHNName
- description
- testType
- result
- scope
- date
- time
The GHNMESSAGE table contains the same fields, except for ServiceClass and ServiceName.
Software Dependencies
The service depends on the following third-party libraries: