SMSCG Monitoring System

Author(s): Placi
Reviewer: Sergio
Last Modified: 20.05.2011 (by Placi)
Review Date: 31.08.2010

Table of Contents

  • Introduction
  • Installation and Configuration of Nagios for the Computing Element
  • Installation and Configuration of Nagios Server (at the site)
  • Testing
  • Updating Nagios Plugins (CE)

Introduction

The smscg_monitor_architecture.pdf document describes the monitoring system architecture of the SMSCG project. We use Nagios together with a handful of plugins that we developed and customized for monitoring the grid infrastructure. In addition to Nagios, we are developing a centralized 'Grid Monitor' (still in progress) so that operational monitoring aspects are covered as well. The installation and configuration of the latter is not (yet) described here.

This document describes what sites need to install and configure in order to be monitored. For the moment this only requires Nagios to be set up at each SMSCG site; everything else is done centrally.

Overview of the Nagios Setting of the SMSCG Monitor

[Figure: nagios_setup]

We suggest the setup shown in the figure. It consists of the following three components:

  • smscg Nagios Server: the central Nagios server operated by the SMSCG project. It collects information from all sites. (Its installation is not described here.)
  • site Nagios Server: each site operates its own Nagios server. This server probes the Computing Element (CE) for its current state by executing Nagios plugins at the CE and having the results returned to the site Nagios Server. Note that the site Nagios Server can be behind a firewall, because it pushes its results to the smscg Nagios Server via the NSCA protocol.
  • Computing Element (CE): the Computing Element (aka ARC frontend) is currently the only grid component monitored at a plain site. To be monitored, the CE must support the execution of Nagios plugins.

Communication between the components uses the following two protocols:

  • NRPE (Nagios Remote Plugin Executor): the protocol used by the site Nagios Server to trigger the execution of the Nagios plugins at the CE and to collect their results.
  • NSCA (Nagios Service Check Acceptor): the protocol used to push the results collected by the site Nagios Server to the central smscg Nagios Server. Sites can configure which subset of information is pushed (they control what they disclose), and since it is a push mechanism, no firewall settings need to be adapted.
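To make the push step more concrete, the following minimal sketch builds one passive check result in the semicolon-delimited form (host;service;return code;output) that the test command in the Testing section feeds to send_nsca. The helper name format_passive_result is hypothetical and not part of the SMSCG packages; the host and service names are placeholders.

```shell
# Build one passive service check result line:
#   host;service;return_code;plugin_output
# (format_passive_result is a hypothetical helper for illustration.)
format_passive_result() {
    printf '%s;%s;%s;%s\n' "$1" "$2" "$3" "$4"
}

# Return code 0 means OK in the Nagios plugin convention.
format_passive_result testhost.smscg.ch testservice 0 'SERVICE OK'
```

Because the fields are separated by semicolons rather than tabs, send_nsca has to be called with -d ';' when such lines are piped into it.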

In the following sections we guide you through the installation and configuration of the components at a site. We start with the Computing Element (CE) and proceed later to the Nagios Server (site).

Installation and Configuration of Nagios for the Computing Element (CE)

If you have an RPM-based system, add the following yum repository (preferably as /etc/yum.repos.d/smscg.repo):

[SMSCG]
name=SMSCG
baseurl=http://repo.smscg.ch/SMSCG_2.0/rpms
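As a sketch, the repository file could be created from the shell as follows. The helper name write_smscg_repo is hypothetical; it writes exactly the three lines shown above and nothing else, and writing to the default path requires root privileges.

```shell
# Write the SMSCG repository definition to the given file
# (default: /etc/yum.repos.d/smscg.repo).
# write_smscg_repo is a hypothetical helper for illustration.
write_smscg_repo() {
    repo_file="${1:-/etc/yum.repos.d/smscg.repo}"
    cat > "$repo_file" <<'EOF'
[SMSCG]
name=SMSCG
baseurl=http://repo.smscg.ch/SMSCG_2.0/rpms
EOF
}
```

Calling write_smscg_repo without an argument (as root) writes the default file; passing a path writes the file elsewhere so it can be inspected first.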

 

Don't forget to run yum update before continuing.

Installation at CE (RPM):

The installation is done via:

yum install nagios-plugins  
yum install nagios-nrpe (*)
yum --nogpgcheck install ngce_nagios_plugins

 

Note: the SMSCG-specific Nagios plugins are installed under /opt/smscg/nagios/plugins, together with a default configuration at /opt/smscg/nagios/etc/smscg_nrpe.cfg

(*) The package name may differ depending on the Linux distribution. For example, on CentOS it is:

yum install nrpe 
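The package names mentioned in this document can be summarized in a small lookup, sketched below. The helper name nrpe_package_for is hypothetical, and the distribution IDs follow the /etc/os-release convention, which is an assumption about your system; check your distribution's repository if in doubt.

```shell
# Map a distribution ID to the NRPE package name used in this document.
# nrpe_package_for is a hypothetical helper for illustration.
nrpe_package_for() {
    case "$1" in
        centos) echo nrpe ;;                 # CentOS package name
        debian) echo nagios-nrpe-server ;;   # Debian package name (installed via apt-get)
        *)      echo nagios-nrpe ;;          # default name used in the yum commands above
    esac
}
```

On a yum-based CE this could be used as: yum install "$(nrpe_package_for centos)"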

Installation at CE (non-RPM):

First make sure that the NRPE server is installed on your CE; if it is not, install it first
(e.g. on Debian: apt-get install nagios-nrpe-server).

The plugins are installed via:

wget http://repo.smscg.ch/SMSCG/nagios/ngce_nagios_plugins
tar xvfz ngce_nagios_plugins
cd ngce_nagios_plugins-<version>
python setup.py install
   

Note that the SMSCG-specific Nagios plugins are installed under /opt/smscg/nagios/plugins, together with a default configuration at /opt/smscg/nagios/etc/smscg_nrpe.cfg

Configuration at CE:

Add the following line to the nrpe.cfg file on your system. The default locations of nrpe.cfg are /etc/nagios/nrpe.cfg (CentOS, Debian, Scientific Linux) and /usr/local/nagios/etc/nrpe.cfg (RHEL).

include=/opt/smscg/nagios/etc/smscg_nrpe.cfg

 

Also check that allowed_hosts in nrpe.cfg contains the IP address of your site Nagios Server.
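The two configuration steps at the CE can be sketched as shell helpers. Both function names are hypothetical; pass the path of the nrpe.cfg that applies to your system.

```shell
# Append the SMSCG include line to an nrpe.cfg file unless it is already there.
# ensure_smscg_include is a hypothetical helper for illustration.
ensure_smscg_include() {
    nrpe_cfg="$1"
    include_line='include=/opt/smscg/nagios/etc/smscg_nrpe.cfg'
    grep -qxF "$include_line" "$nrpe_cfg" ||
        printf '%s\n' "$include_line" >> "$nrpe_cfg"
}

# Print the allowed_hosts setting so it can be compared with the
# IP address of the site Nagios Server.
show_allowed_hosts() {
    grep -E '^[[:space:]]*allowed_hosts[[:space:]]*=' "$1"
}
```

For example: ensure_smscg_include /etc/nagios/nrpe.cfg followed by show_allowed_hosts /etc/nagios/nrpe.cfg. The NRPE daemon has to be restarted before it picks up changes to nrpe.cfg; the name of the service differs between distributions.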

Installation and Configuration of Nagios Server (at the site)

Nagios Server Installation

There are two possible situations:

  • a.) You do not have a Nagios server at your site yet, so you need to install one from scratch. For the installation and basic configuration we refer to the official Nagios documentation at http://nagios.sourceforge.net/docs/3_0/toc.htm.
  • b.) You already use Nagios for monitoring at your site. In that case you do not need to install a server. Please proceed with the NSCA Installation section.

NSCA Installation

To install the NSCA client:

  1. Install the send_nsca client on your local Nagios server using one of the following options:
    • for RPM-based systems, look for a package at http://rpmfind.net/linux/rpm2html/search.php?query=send_nsca
    • yum install nsca
    • on Debian: apt-get install nsca
    • download the source from http://sourceforge.net/project/showfiles.php?group_id=26589, then build and install it
      (./configure, make, cp src/send_nsca to your $NAGIOS_LOCATION/bin)
  2. Install the 'eventhandler' and the configuration templates:
    either:
    yum install server_nagios_nsca
    or:
    wget http://repo.smscg.ch/SMSCG/nagios/server_nagios_nsca.tar.gz
    tar xvfz server_nagios_nsca.tar.gz
    cd server_nagios_nsca-<version>
    sudo python setup.py install   # will install things in /opt/smscg/nagios
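Since the location of send_nsca depends on the installation option chosen, a quick way to find it is sketched below. The helper name find_send_nsca is hypothetical, and apart from /usr/local/nagios/bin (used in the Testing section) the directories in the usage example are assumptions.

```shell
# Print the first executable send_nsca found in the given directories.
# Returns non-zero if none is found.
# find_send_nsca is a hypothetical helper for illustration.
find_send_nsca() {
    for candidate_dir in "$@"; do
        if [ -x "$candidate_dir/send_nsca" ]; then
            printf '%s\n' "$candidate_dir/send_nsca"
            return 0
        fi
    done
    return 1
}
```

For example: find_send_nsca /usr/local/nagios/bin /usr/sbin /usr/bin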

Configuration of Nagios Server (site)

The plugins installed and configured at the CE need to be known to the site Nagios server so that it can invoke them. You can therefore download the Nagios service descriptions for the above plugins and store them where your other configuration files are kept (e.g. in /usr/local/nagios/etc/objects on SLC4 or /etc/nagios3/conf.d/ on Debian). We assume that the general host and service definitions for the CE have already been configured.

cd /opt/smscg/nagios/etc/
vi site_settings.cfg   # edit the values in '<...>' brackets

 

NSCA Configuration

1. In $NAGIOS_LOCATION/etc/send_nsca.cfg set:

...
encryption_method=1


2. In your $NAGIOS_LOCATION/etc/nagios.cfg file set or change:

  obsess_over_services=1
  obsess_over_hosts=0
  ocsp_command=smscg_passive_check

3. Add the following entries to $NAGIOS_LOCATION/etc/nagios.cfg:
  cfg_file=/opt/smscg/nagios/etc/commands.cfg
  cfg_file=/opt/smscg/nagios/etc/site_settings.cfg
  cfg_file=/opt/smscg/nagios/etc/arc_ce1.cfg

 

Feedback from Heinz: make sure that ${NAGIOS_INSTALL_PATH}/etc/command.cfg contains the following definition:

define command{
 command_name    check_nrpe
 command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
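To illustrate what this definition does, the sketch below prints the command line that results when Nagios substitutes the $HOSTADDRESS$ and $ARG1$ macros. The helper name expand_check_nrpe is hypothetical, and ce.example.org and check_example are placeholders, not real SMSCG host or plugin names.

```shell
# Print the command line Nagios would run after substituting
# $HOSTADDRESS$ (first argument) and $ARG1$ (second argument)
# in the check_nrpe command definition.
# expand_check_nrpe is a hypothetical helper for illustration.
expand_check_nrpe() {
    printf '/usr/lib/nagios/plugins/check_nrpe -H %s -c %s\n' "$1" "$2"
}

# Placeholder CE host name and NRPE command name.
expand_check_nrpe ce.example.org check_example
```

Running the printed command by hand on the site Nagios Server, with your CE address and one of the command names defined in smscg_nrpe.cfg, is a quick way to confirm that the CE accepts NRPE requests from the server.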

 For more information about NSCA:

Finishing Nagios Installation

Once you have installed and configured your Nagios server and CE, send an email to smscg-tech@swing-grid.ch requesting that your site be added to https://monitor.smscg.ch/nagios.

Testing

Once Nagios is installed and configured, your site Nagios Server should show the reports from the following plugins: grid-infosys, grid-manager, gridftp, gridsecurity, ARC Version, and trustanchors.

A quick test for the NSCA passive submission:

  1. Log in to your local Nagios Server.
  2. Issue the following command. You may need to adapt the paths to your send_nsca command and its configuration file.
    echo -e 'testhost.smscg.ch;testservice;0;SERVICE OK\n' | /usr/local/nagios/bin/send_nsca -H nagios.smscg.ch -c /usr/local/nagios/etc/send_nsca.cfg -d ';'

  3. The command should report: 1 data packet(s) sent to host successfully.
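The same test can be wrapped in a small function that also checks for the success message, as sketched below. The function name nsca_smoke_test is hypothetical; its arguments are the send_nsca binary, its configuration file, and the central Nagios host.

```shell
# Send one test result through send_nsca and check for the success message.
# Arguments: send_nsca binary, send_nsca.cfg path, central Nagios host.
# nsca_smoke_test is a hypothetical wrapper for illustration.
nsca_smoke_test() {
    send_nsca_bin="$1"; send_nsca_cfg="$2"; central_host="$3"
    reply=$(printf '%s;%s;%s;%s\n' testhost.smscg.ch testservice 0 'SERVICE OK' |
        "$send_nsca_bin" -H "$central_host" -c "$send_nsca_cfg" -d ';' 2>&1)
    printf '%s\n' "$reply"
    case "$reply" in
        *'sent to host successfully'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: nsca_smoke_test /usr/local/nagios/bin/send_nsca /usr/local/nagios/etc/send_nsca.cfg nagios.smscg.ch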


You may also run Nagios in verification mode to check the configuration file for errors:

 /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

 

Troubleshooting: in case of problems, read the log files (NRPE logs to syslog) and try to run active checks from your site Nagios Server.
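A minimal sketch for pulling the NRPE messages out of syslog on the CE follows. The helper name recent_nrpe_log is hypothetical, and the log file path is an assumption that differs between distributions (for example /var/log/messages or /var/log/syslog).

```shell
# Print the last NRPE-related lines (default: 20) from a syslog file.
# Arguments: log file path, optional number of lines.
# recent_nrpe_log is a hypothetical helper for illustration.
recent_nrpe_log() {
    grep -i 'nrpe' "$1" | tail -n "${2:-20}"
}
```

For example: recent_nrpe_log /var/log/messages 50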

Updating Nagios Plugins (CE)

New releases of the SMSCG-provided Nagios plugins can be installed by running:

 
yum check-update
yum --nogpgcheck update ngce_nagios_plugins   

 

There is no need to change any configuration after the update.