In a recent forum post I asked how to add a test for a process that runs too long. Under normal circumstances this process should start, run for five minutes, then end. If it runs for over an hour, there's a problem.
Here are the steps I used to set this up:
- Identify the process you want to measure.
- Run proctime.py process. This Python script, provided by Zen Master cluther (thank you very much!), returns the number of seconds the process has been running.
- Write a shell script that saves the output of proctime.py process to a file, and have a cron job call it every five minutes. Doing this via cron, rather than having snmpd call the shell script directly, has two advantages: a) the shell script can test for empty (NULL) output and write a 0 instead, and b) snmpd is not left waiting on the command to execute (running cat on a file is faster than executing the command).
- Configure snmpd to use cat to dump the contents of that file.
- In Zenoss, build a template that queries for a new Data Source: 1.3.6.1.4.1.2021.8.1.101.2 a.k.a. UCD-SNMP-MIB::extOutput.2
- In Zenoss, on that template, build a Threshold and a Graph from the Data Source as you please. I wanted both.
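The cron-driven step above can be sketched as follows. This is only an illustration, written in Python rather than as the shell script I actually used, and it assumes proctime.py prints a single number of seconds (or nothing at all when the process is not running). The function names, the temp-file-and-rename approach, and the location of proctime.py are my own assumptions, not part of the original setup.

```python
import os
import subprocess
import tempfile


def seconds_or_zero(raw):
    """Return the command output as a whole number of seconds, or "0" if it is empty or not numeric."""
    try:
        return str(max(0, int(float(raw.strip()))))
    except (ValueError, OverflowError):
        return "0"


def write_atomically(path, value):
    """Write value to path via a temp file and rename, so a concurrent cat never reads a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(value + "\n")
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600; snmpd may run as another user
    os.replace(tmp_path, path)


def record_proctime(command, out_path):
    """Run command, normalise its output to seconds (0 on any failure), and save it to out_path."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
        output = result.stdout
    except OSError:
        output = ""  # command missing or not executable: treat like empty output
    value = seconds_or_zero(output)
    write_atomically(out_path, value)
    return value


# Example call for the cron job (the proctime.py path is hypothetical):
# record_proctime(["/usr/local/bin/proctime.py", "PROCESS"],
#                 "/var/net-snmp/PROCESS.proctime.seconds")
```

Writing a 0 when there is no output keeps the Data Source numeric, so the graph and threshold do not have to deal with an empty string.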
The output below shows that when I query HOSTNAME, PROCESS has been running for 16 seconds (UCD-SNMP-MIB::extOutput.2 = STRING: 16).
[zenoss@bby1ems01 ~]$ snmpwalk -v2c -c itsasecret \
HOSTNAME 1.3.6.1.4.1.2021.8.1 | grep '.2'
UCD-SNMP-MIB::extIndex.2 = INTEGER: 2
UCD-SNMP-MIB::extNames.2 = STRING: getPROCESS.proctime
UCD-SNMP-MIB::extCommand.2 = STRING: /bin/cat /var/net-snmp/PROCESS.proctime.seconds
UCD-SNMP-MIB::extResult.2 = INTEGER: 0
UCD-SNMP-MIB::extOutput.2 = STRING: 16
UCD-SNMP-MIB::extErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.2 = STRING:
[zenoss@bby1ems01 ~]$
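Zenoss evaluates the threshold itself once the template is in place, but if you want to sanity-check a value by hand, the comparison amounts to something like the sketch below. The function names are hypothetical, and the one-hour limit is simply the figure from the top of this post.

```python
import re

ONE_HOUR = 3600  # seconds; running longer than this indicates a problem


def parse_ext_output(line):
    """Extract the seconds from a line such as 'UCD-SNMP-MIB::extOutput.2 = STRING: 16'."""
    match = re.search(r'extOutput\.\d+ = STRING: "?(\d+)"?\s*$', line)
    if match is None:
        raise ValueError(f"not an extOutput line: {line!r}")
    return int(match.group(1))


def runs_too_long(seconds, limit=ONE_HOUR):
    """Return True when the process has been running longer than limit seconds."""
    return seconds > limit


line = "UCD-SNMP-MIB::extOutput.2 = STRING: 16"
seconds = parse_ext_output(line)
print(seconds, runs_too_long(seconds))  # prints: 16 False
```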
I hope this helps.
David
comments:
Similar to How to Monitor a SW RAID device --david_sloboda, Mon, 27 Oct 2008 19:41:52 -0500
http://www.zenoss.com/community/docs/howtos/how-to-monitor-a-software-raid-device/