Monitoring a gCube infrastructure With Nagios

From Gcube Wiki
Revision as of 10:46, 5 June 2012 by Andrea.manzi (Talk | contribs) (GHN monitoring plugins)


Overview

Nagios [1] is a popular open source application for monitoring computer systems, networks and infrastructure. It offers monitoring and alerting for servers, switches, applications and services, and is widely regarded as the de facto industry standard in IT infrastructure monitoring.

Nagios components

Nagios consists of two main components: the Nagios server and the Nagios plugins.

Nagios Server

The Nagios server is the application that runs the tests across the infrastructure. It offers a web interface that administrators can use to view and configure test executions.

The installation instructions for Ubuntu, Fedora and openSUSE can be found at [2].

Nagios Plugins

Nagios plugins are applications that can be executed either by the Nagios server or directly on the monitored host. When plugins are executed on the monitored host, the Nagios server needs a way to retrieve the test results. This capability is available through three different Nagios add-ons:

  • NRPE [3] allows Nagios plugins to be executed remotely on other Linux/Unix machines, which makes it possible to monitor remote machine metrics (disk usage, CPU load, etc.).
  • NRDP [4] is a flexible data transport mechanism and processor for Nagios. Its simple architecture can be extended and customized to fit individual users' needs. It uses standard ports and protocols (HTTP(S) and XML).
  • NSCA [5] allows passive alerts and checks from remote machines and applications to be integrated with Nagios. It is useful for processing security alerts, as well as for deploying redundant and distributed Nagios setups.


At the moment the Nagios monitoring plugins in gCube are executed directly by the Nagios server, so none of the methods described above is currently exploited. The usage of an NRPE daemon on each node of the infrastructure is under investigation.
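To make the plugin model more concrete, the following is a minimal sketch of a server-side plugin written in Python. It is not one of the gCube plugins. It assumes only the standard Nagios plugin convention: a single status line on standard output and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The names check_tcp_port and main, and the default thresholds, are hypothetical.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp_port(host, port, warn=1.0, crit=5.0):
    """Open a TCP connection and return (exit_code, status_line).

    warn and crit are connection-time thresholds in seconds (hypothetical
    defaults); crit is also used as the connection timeout.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit):
            elapsed = time.monotonic() - start
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} not reachable ({exc})"
    if elapsed >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the pipe is performance data: label=value[UOM];warn;crit
    perfdata = f"time={elapsed:.6f}s;{warn};{crit}"
    return code, f"TCP {label} - {elapsed:.3f}s response time on port {port}|{perfdata}"


def main(argv):
    """Command line entry point; returns the plugin exit code."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("TCP UNKNOWN - usage: check_tcp_port.py <host> <port>")
        return UNKNOWN
    code, line = check_tcp_port(argv[1], int(argv[2]))
    print(line)
    return code
```

A real plugin file would end with sys.exit(main(sys.argv)), so that the Nagios server can read the result from the process exit code.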

PNP4Nagios

PNP is an extension for Nagios that plots the performance data provided by the probes, as long as they follow the Nagios plug-in development guidelines, which the LCGDM probes do.
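As an illustration of what "performance data" means here, the plug-in development guidelines describe it as the text after the "|" character of a plugin's output, in the form label=value[UOM];[warn];[crit];[min];[max]. The sketch below parses that format in Python. It is a simplified reading of the guidelines and not the parser used by PNP: labels containing spaces are not handled, and the names PerfDatum and parse_perfdata are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PerfDatum(NamedTuple):
    label: str
    value: float
    uom: str
    warn: Optional[str]
    crit: Optional[str]
    minimum: Optional[str]
    maximum: Optional[str]


# A number followed by an optional unit of measurement such as s, MB or %.
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([a-zA-Z%]*)$")


def parse_perfdata(plugin_output):
    """Return the PerfDatum entries found after the '|' of a plugin output line."""
    if "|" not in plugin_output:
        return []
    _, perf = plugin_output.split("|", 1)
    result = []
    for token in perf.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value pair
        fields = rest.split(";")
        match = _VALUE_RE.match(fields[0])
        if match is None:
            continue  # non-numeric values are skipped in this sketch
        # Pad so that warn, crit, minimum and maximum always exist.
        extra = [field or None for field in fields[1:5]]
        extra += [None] * (4 - len(extra))
        result.append(
            PerfDatum(label.strip("'"), float(match.group(1)), match.group(2), *extra)
        )
    return result
```

Calling parse_perfdata on a line without a "|" returns an empty list, which is the case where PNP would have nothing to plot.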

Installation and configuration

This document uses the version available in the EPEL repositories (0.4). pnp4nagios already provides some documentation for version 0.4, but since it is not always clear, all steps are detailed here.

Manual installation of the pnp4nagios and php packages used to be required, but they are now dependencies of nagios-plugins-lcgdm.

Configuring RRD

In nagios.cfg, set the following parameters:

process_performance_data=1
enable_environment_macros=1
service_perfdata_command=process-service-perfdata

These settings:

  • Enable the processing of performance data
  • Enable the passing of environment variables (only for Nagios 3.x)
  • Specify the command used to process the service performance data
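Before restarting Nagios, it can help to confirm that the three parameters are really set. The following Python sketch does this on the text of a nagios.cfg file. The helper names read_directives and wrong_settings are hypothetical, and the parsing assumes only the plain key=value layout shown above.

```python
# Values this page asks for in nagios.cfg.
REQUIRED = {
    "process_performance_data": "1",
    "enable_environment_macros": "1",
    "service_perfdata_command": "process-service-perfdata",
}


def read_directives(cfg_text):
    """Return a dict of key=value directives, ignoring comments and blank lines."""
    directives = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            directives[key.strip()] = value.strip()
    return directives


def wrong_settings(cfg_text):
    """Return {directive: found_value} for every required directive that is
    missing (found_value is None) or has a different value."""
    found = read_directives(cfg_text)
    return {
        key: found.get(key)
        for key, expected in REQUIRED.items()
        if found.get(key) != expected
    }
```

An empty result means that all three directives have the values listed above.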

process-service-perfdata is already defined in /etc/nagios/objects/commands.cfg (or a similarly named file), but the default definition has to be changed to:

define command {
  command_name    process-service-perfdata
  command_line    /usr/bin/perl /usr/libexec/pnp4nagios/process_perfdata.pl
}

Once these modifications are done, restart Nagios.

# service nagios restart

Link between Nagios and pnp4nagios

Just make sure that this line

action_url                      /nagios/html/pnp4nagios/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$

is present in the generic-service.cfg definition. Reload Nagios and you will see a small star icon linking to the graph next to each service, and in the detailed view as well.

Nagios Configuration in gCube

As mentioned above, the current Nagios monitoring architecture in gCube does not require the installation of plugins on the monitored machines. The tests are executed only by the Nagios server, with some configuration required for each monitored service or host.

Base Configuration

Nagios configuration is stored in the so-called 'object configuration files'. These files contain the definitions of hosts, host groups, contacts, services, etc. Object definitions can be split across several configuration files, each of which has to be declared inside /etc/nagios/nagios.cfg as follows:

cfg_file=/etc/nagios/objects/myobjects.cfg

If configuration files are stored in a dedicated folder, the whole folder can be declared instead:

cfg_dir=/etc/nagios/objects/myobjectsfolder
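The difference between the two directives can be sketched in Python: cfg_file names a single file, while cfg_dir pulls in every file ending in .cfg found in the directory and its subdirectories. This is a simplified model of the lookup and not Nagios code. The function name list_object_files is hypothetical.

```python
import os


def list_object_files(main_cfg_text):
    """Return the object configuration files referenced by the given
    main configuration text, in the order they are declared."""
    files = []
    for raw in main_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "cfg_file":
            files.append(value)
        elif key == "cfg_dir":
            # Walk the directory tree and keep only *.cfg files.
            for root, dirs, names in os.walk(value):
                dirs.sort()  # deterministic traversal order
                for name in sorted(names):
                    if name.endswith(".cfg"):
                        files.append(os.path.join(root, name))
    return files
```

Running it on the text of the main configuration file gives a quick view of which object files a given set of cfg_file and cfg_dir lines would cover.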

Given that, we prepared a base configuration for a gCube infrastructure that can be checked out from [6].

In detail, the following two configuration files, which group specific hosts and services, need to be installed under /etc/nagios/objects:

servicegroups.cfg

define servicegroup{

        servicegroup_name       mysql
        alias                   MYSQL Database Services

}

define servicegroup{

        servicegroup_name       psql
        alias                   PostgreSQL Database Services

}

define servicegroup{

        servicegroup_name       ghn
        alias                   ghn hosting node

}

define servicegroup{

        servicegroup_name       message broker
        alias                   message broker

}

define servicegroup{

        servicegroup_name       umd service
        alias                   umd service

}


and hostgroups.cfg

define hostgroup{

        hostgroup_name          GHN

        alias                   gCube Hosting Node
        }

define hostgroup{

        hostgroup_name          gCube Infra node

        alias                   gCube Infrastructural node
        }

define hostgroup{

        hostgroup_name          UMD node

        alias                   UMD node
        }


Both files need to be included in the Nagios configuration (/etc/nagios/nagios.cfg) as follows:

cfg_file=/etc/nagios/objects/hostgroups.cfg
cfg_file=/etc/nagios/objects/servicegroups.cfg

GHN monitoring plugins

The base configuration files for GHN monitoring are available from the same svn location [7].

For each monitored GHN, the following host object needs to be created inside the /etc/nagios/objects/gcube-hosts folder:

define host {
        use             linux-server
        host_name       nodexx.domain
        alias           nodexx.domain
        address         xx.xx.xx.xx
        hostgroups      GHN
}

For each monitored GHN, a service object also needs to be configured inside the /etc/nagios/objects/gcube-services folder. Each service corresponds to a container running on the host and listening on <port> (multiple containers can run on a single host, in which case multiple services need to be configured):

 

define service{
use     local-service
host_name       nodexx.domain
service_description     checkWSRF<port>
check_command   check_tcp!<port>
servicegroups ghn
notifications_enabled   1
}


Both folders have to be included in the Nagios configuration as follows:

cfg_dir=/etc/nagios/objects/gcube-hosts
cfg_dir=/etc/nagios/objects/gcube-services
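Since one host object and one service object per container have to be written for every GHN, the files can be generated rather than typed by hand. The following Python sketch renders the two definitions shown above from a node name, an address and a list of container ports. It is only an illustration: the function render_ghn and the sample node name, address and ports are hypothetical and do not come from the gCube svn repository.

```python
HOST_TEMPLATE = """define host {{
        use             linux-server
        host_name       {name}
        alias           {name}
        address         {address}
        hostgroups      GHN
}}
"""

SERVICE_TEMPLATE = """define service {{
        use                     local-service
        host_name               {name}
        service_description     checkWSRF{port}
        check_command           check_tcp!{port}
        servicegroups           ghn
        notifications_enabled   1
}}
"""


def render_ghn(name, address, ports):
    """Return (host_definition, service_definitions) for one GHN.

    One service is rendered per container port, as required when several
    containers run on the same host.
    """
    host = HOST_TEMPLATE.format(name=name, address=address)
    services = "\n".join(
        SERVICE_TEMPLATE.format(name=name, port=port) for port in ports
    )
    return host, services


# Hypothetical node with two containers.
host_cfg, services_cfg = render_ghn("node01.example.org", "192.0.2.10", [8080, 8081])
print(host_cfg)
print(services_cfg)
```

The two returned strings would then be saved as .cfg files in the gcube-hosts and gcube-services folders respectively.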

Other services monitoring plugins

DB monitoring plugins

The plugins currently exploited in the infrastructure are the MySQL [8] and PostgreSQL [9] plugins. They are installed together with the Nagios server.

In order to properly configure the execution of these plugins, the following commands have to be defined in the configuration file /etc/nagios/objects/commands.cfg:

################################################################################
# MYSQL Commands
################################################################################

# command 'check_mysql_health'
define command {
    command_name    check_mysql_health
    command_line    <PATH>/check_mysql_health -H $HOSTADDRESS$ --user $ARG1$ --password $ARG2$ --mode $ARG3$
}
# command 'check_mysql_health_tresholds'
define command {
    command_name    check_mysql_health_tresholds
    command_line    <PATH>/check_mysql_health -H $HOSTADDRESS$ --user $ARG1$ --password $ARG2$ --mode $ARG3$ --warning $ARG4$ --critical $ARG5$
}
################################################################################
# PostgreSQL Commands
################################################################################


define command {
    command_name    check_postgres_size
    command_line    <PATH>/check_postgres.pl -H $HOSTADDRESS$ -u $ARG1$ -db $ARG2$ --action database_size -w $ARG3$ -c $ARG4$
}
define command {
    command_name    check_postgres_locks
    command_line    <PATH>/check_postgres.pl -H $HOSTADDRESS$ -u $ARG1$ -db $ARG2$ --action locks -w $ARG3$ -c $ARG4$
}