Friday, December 7, 2012

Nagios: Monitor processes on a remote host using regular expressions

I ran into a problem where my VMs kept shutting down. I have not been able to solve it yet, so I needed a workaround. The second-best option was to have Nagios detect when a VM is down and then start it again.

The check_procs plugin monitors the number of processes that match given criteria. The -C parameter selects processes by command name, and the -a parameter narrows the match further by the process arguments.

check_procs can be executed as follows:
$ /usr/lib64/nagios/plugins/check_procs -c 1:1 -C altibase -a 'boot from admin'
PROCS OK: 1 process with command name 'altibase', args 'boot from admin'
The -c parameter sets the critical range for the process count. Here, a critical error is raised if the count is not within 1 to 1, that is, not exactly 1. The -C parameter gives the command name and the -a parameter gives the arguments of the command, which is useful for telling processes apart when several of them share the same command name.
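To make the range idea concrete, here is a minimal sketch of how a "min:max" range such as 1:1 can be judged against a process count. The function name judge_count is made up for illustration, it assumes both ends of the range are given, and it is not the plugin's actual code.

```shell
#!/bin/sh
# Hypothetical helper: judges a process count against a "min:max" range.
# Illustration only; assumes both ends of the range are present.
judge_count() {
    count=$1
    range=$2
    min=${range%%:*}   # text before the colon
    max=${range##*:}   # text after the colon
    if [ "$count" -lt "$min" ] || [ "$count" -gt "$max" ]; then
        echo "CRITICAL"
    else
        echo "OK"
    fi
}

judge_count 1 1:1   # exactly one process: OK
judge_count 0 1:1   # process is gone: CRITICAL
judge_count 2 1:1   # duplicate process: CRITICAL
```

With 1:1, both a missing process and an unexpected second copy are reported as critical.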

Command usage: 
check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid] [-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array] [-C command] [-t timeout] [-v]

However, check_procs only monitors processes on the local server. To monitor processes on a remote host, the check_procs command has to be sent to that host through check_nrpe.

1) Define a new command named "check_process" on the monitored server.
$ vi /etc/nagios/nrpe.cfg
command[check_process]=/usr/lib64/nagios/plugins/check_procs $ARG1$
2) Define the new command on the Nagios monitoring server.
$ vi /etc/nagios/object/commands.cfg
# 'check_process' command definition
define command{
  command_name    check_process
  command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_process -a $ARG1$
  }
It is a simple definition: I wanted to execute check_process with an argument, which is passed through the $ARG1$ variable.
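One prerequisite that these definitions take for granted: NRPE only substitutes $ARG1$ when command arguments are enabled on the monitored server, which normally means the daemon was built with command-argument support and dont_blame_nrpe=1 is set in nrpe.cfg. The sketch below checks a config file for that setting. The function name args_enabled is hypothetical, and the temporary file only stands in for /etc/nagios/nrpe.cfg.

```shell
#!/bin/sh
# Hypothetical helper: reports whether an nrpe.cfg-style file enables
# command arguments (dont_blame_nrpe=1).
args_enabled() {
    if grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1' "$1"; then
        echo "arguments enabled"
    else
        echo "arguments disabled"
    fi
}

# Demo on a throwaway file standing in for /etc/nagios/nrpe.cfg.
cfg=$(mktemp)
printf 'dont_blame_nrpe=1\n' > "$cfg"
args_enabled "$cfg"
rm -f "$cfg"
```

On a real monitored server the call would be args_enabled /etc/nagios/nrpe.cfg, and the NRPE daemon has to be restarted after changing the setting.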

Then I executed this command on the monitoring server:
$ /usr/lib64/nagios/plugins/check_nrpe -H 192.168.20.174 -c check_process -a '-c 1:1 -C altibase -a 'boot from admin'' 
PROCS OK: 1 process with command name 'altibase', args 'boot' 
3) It worked, but the result was not what I expected.
The check returned OK, but it matched only the argument 'boot', while I needed it to match all of the arguments.
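A likely explanation is shell quoting rather than check_procs itself: single quotes do not nest, so the inner quote closes the outer one and the string is split at the spaces inside 'boot from admin'. The sketch below reproduces that splitting locally, without Nagios. The function name show_args is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints every argument it receives, one per line,
# wrapped in brackets so the word boundaries are visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Same quoting as in the check_nrpe call above.
show_args '-c 1:1 -C altibase -a 'boot from admin''
# Prints:
# [-c 1:1 -C altibase -a boot]
# [from]
# [admin]
```

Only the first word reaches $ARG1$. Since the command definition uses nothing but $ARG1$, 'from' and 'admin' are dropped, which matches the output showing args 'boot'.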

So I tried many times to pass all of the arguments, but none of the attempts succeeded, and I decided to look for another way. As it turns out, check_procs supports matching process arguments with a regular expression through --ereg-argument-array.

I changed the command and it succeeded:
$ /usr/lib64/nagios/plugins/check_nrpe -H 192.168.20.174 -c check_process -a '-c 1:1 -C altibase --ereg-argument-array='boot.from.admin''
PROCS OK: 1 process with command name 'altibase', regex args 'boot.from.admin'
In a regular expression, a dot (.) matches any single character; for example, 'l..e' matches 'love' or 'life'. Here the dots stand in for the spaces in 'boot from admin'.
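The dot behaviour can be tried locally with grep -E, which also uses extended regular expressions. This is only a sketch: the sample strings are made up, the helper name count_words is hypothetical, and grep stands in for the matching that check_procs does internally.

```shell
#!/bin/sh
# Sample argument string standing in for the real process arguments.
args='boot from admin'

# Each dot matches one character, here the two spaces.
if printf '%s\n' "$args" | grep -Eq 'boot.from.admin'; then
    echo "match"
fi

# Count how many of the two words match 'l..e'; prints 2.
printf '%s\n' love life | grep -Ec '^l..e$'

# Hypothetical helper: prints how many words the shell passed to it.
count_words() {
    echo "$#"
}

# The pattern has no spaces, so the quoted string stays one word; prints 1.
count_words '-c 1:1 -C altibase --ereg-argument-array='boot.from.admin''
```

Because the whole string arrives as a single word, it fits into $ARG1$ intact, which is probably why this form works through check_nrpe.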
