Monitoring Razuna (CF Server) with Nagios

Monitoring of ColdFusion with Nagios

For Windows:
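First, install the NRPE_NT daemon on each of the Windows servers running CF, following the instructions in the zip. Installation is straightforward. Note that the check_nt commands defined later connect to port 1248, which is the check_nt default and the port used by the NSClient agent, so confirm that the agent you install answers check_nt requests on that port.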
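Once the agent is installed, a quick probe from the Nagios server can confirm that it answers before any configuration is written. This is a minimal sketch: the plugin path is an assumption typical of a source install, probe_agent is a hypothetical helper name, and CLIENTVERSION is the check_nt variable that asks the agent to report its version.

```shell
# Assumed plugin location; override CHECK_NT if your plugins live elsewhere.
CHECK_NT="${CHECK_NT:-/usr/local/nagios/libexec/check_nt}"

# Hypothetical helper: ask the agent on host $1 (port $2, default 1248)
# for its version string.
probe_agent() {
    "$CHECK_NT" -H "$1" -p "${2:-1248}" -v CLIENTVERSION
}
```

For example, `probe_agent 172.16.3.129` probes the sample host address used in this article. A timeout or connection refusal usually points at a firewall or at the agent not listening on that port.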

Add ColdFusion hosts to hosts.cfg
If your CF servers aren't already among the hosts you're monitoring, add them to hosts.cfg. A typical host definition looks similar to the one below.

define host {
     use                   generic-host
     host_name             cfserver1
     alias                 main coldfusion server
     address               172.16.3.129
     parents               Internet_Zone
     check_command         check-host-alive
     max_check_attempts    3
     notification_interval 60
     notification_period   24x7
     notification_options  d,u,r
}

Add host groups for your ColdFusion servers
Assuming all your CF servers run the same CF version, you can make a single Nagios host group for them. This is much less work than adding individual servers to each service check (detailed below). If your servers run different CF versions, set up a host group for each version. Add your host group definition to hostgroups.cfg. A typical host group definition looks similar to the one below.

define hostgroup {
     hostgroup_name cfmxservers
     alias          MCT ColdFusion MX Servers
     contact_groups webdev,online
     members        cfserver1, cfserver2, cfserver3
}

Add ColdFusion commands to checkcommands.cfg
In a default Nagios installation, checkcommands.cfg contains the definitions of all the commands Nagios uses to inspect services running on the hosts it monitors. The following command definitions should cover ColdFusion 7 and will likely also work for other versions, though that has not been checked. Add the following lines to checkcommands.cfg.

define command {
   command_name    check_coldfusion5
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l "Cold Fusion Application Server"
}
 
define command {
   command_name    check_coldfusionjrun
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l "Macromedia JRun CFusion Server"
}
 
define command {
   command_name    check_coldfusionmx
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l "ColdFusion MX Application Server"
}
 
define command {
   command_name    check_coldfusionmx_process
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v COUNTER -l "\\Process(jrun)\\Private Bytes","JRun is using %.f bytes" -w 891289600 -c 1073741824
#   comment    Warning 850 MB, Critical 1024 MB (1 GB) YMMV
}
 
define command {
   command_name    check_coldfusionmx_threads
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v COUNTER -l "\\Process(jrun)\\Thread Count","JRun is using %.f Threads" -w 90 -c 110
#   comment    Warning 90 threads, Critical 110 threads YMMV
}

The last two commands in the listing above check the actual CF processes running on the server. Note the comments: the values we use may be very different from what suits your environment. The easiest approach is to look at what your servers normally do and make an educated guess at the levels you need. Using these values as-is does no harm, but you may end up with copious unneeded notifications from Nagios (or, conversely, none at all). Test and tweak.
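The byte values passed to -w and -c in check_coldfusionmx_process are just megabyte figures multiplied out. A minimal sketch for deriving your own thresholds, assuming a POSIX shell; mb_to_bytes is a hypothetical helper name, not part of Nagios:

```shell
# Hypothetical helper: convert megabytes (1 MB = 1024 * 1024 bytes)
# into the byte count that check_nt expects for -w and -c.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 850    # warning level used in the listing: 891289600
mb_to_bytes 1024   # critical level used in the listing: 1073741824
```

Pick the megabyte figures from what the jrun process actually uses on your servers, then paste the resulting byte counts into the command definition.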

Adding CF-related services to services.cfg
Nagios' monitoring model is centered on the concept of services, so this step is one of the most important. Add a service definition to services.cfg for each of the command definitions you built earlier. A couple of sample service definitions are shown below.

define service {
    use                   NM-HTTP
    hostgroup_name        cfmxservers
    service_description   Check ColdFusion MX
    contact_groups        online,webdev
    check_period          24x7
    notification_interval 60
    notification_options  w,u,c,r
    notification_period   24x7
    check_command         check_coldfusionmx
    max_check_attempts    3
    normal_check_interval 1
    retry_check_interval  1
#   comment    Check if ColdFusion MX Server is responsive
}
 
define service {
    use                   NM-HTTP
    hostgroup_name        cfmxservers
    service_description   Check ColdFusion MX Process Threads
    contact_groups        webdev online
    check_period          24x7
    notification_interval 60
    notification_options  w,u,c,r
    notification_period   24x7
    check_command         check_coldfusionmx_threads
    max_check_attempts    3
    normal_check_interval 1
    retry_check_interval  1
}
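
The two samples above cover check_coldfusionmx and check_coldfusionmx_threads. Following the same pattern, a definition for the memory check might look like the sketch below. It reuses the NM-HTTP template, host group, and contact groups from the samples; the service_description text is only a suggestion.

```
define service {
    use                   NM-HTTP
    hostgroup_name        cfmxservers
    service_description   Check ColdFusion MX Process Memory
    contact_groups        online,webdev
    check_period          24x7
    notification_interval 60
    notification_options  w,u,c,r
    notification_period   24x7
    check_command         check_coldfusionmx_process
    max_check_attempts    3
    normal_check_interval 1
    retry_check_interval  1
}
```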

Restart Nagios
At this point, Nagios should be ready to monitor CF on your Windows-based CF servers. Restart Nagios by whichever method you use. After the restart, log into the Nagios web interface and you should see the CF services you set up being monitored.
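
Before restarting, it can help to let Nagios validate the edited files. The sketch below wraps Nagios' -v (verify configuration) option. The binary and config paths are assumptions typical of a source install, and verify_nagios_config is a hypothetical helper name.

```shell
# Assumed locations; adjust to match your installation.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: run the built-in pre-flight check.
# Returns non-zero if Nagios reports configuration errors.
verify_nagios_config() {
    "$NAGIOS_BIN" -v "$NAGIOS_CFG"
}
```

Running `verify_nagios_config` and restarting only when it succeeds avoids taking monitoring down over a typo in one of the definitions.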

ColdFusion installed on Linux
The above works well for ColdFusion installed on Windows. The steps below should help get it going when ColdFusion is installed on Linux, but they have not been tested!

Use the "pstree" command in a bash script:

/usr/local/bin/coldfusion-threads-snmp.sh

The script sets a CFChildren variable from the pstree output, like:

CFChildren=$(/usr/bin/pstree | grep -e 'cfmx7.*cfmx7.*cfmx' | sed -e 's/\*\[cfmx7\]$//' -e 's/? *..cfmx7.*.cfmx7...//')

The results of the script can then be exported through SNMP. An entry in snmpd.conf like the following will do it:

exec .1.3.6.1.4.1.2021.500 coldfusion-threads /usr/local/bin/coldfusion-threads-snmp.sh
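
The script itself is not shown in this article. As an untested sketch of what it could contain, the version below swaps the pstree pipeline for a simpler ps-based count. It assumes the ColdFusion processes show up with "cfmx7" in their command name (as the pstree pattern suggests) and that a Linux procps-style ps is available; count_matching is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical stand-in for /usr/local/bin/coldfusion-threads-snmp.sh.

# Count the lines on stdin that contain the name given as $1.
# grep -c exits non-zero on zero matches, so "|| true" keeps the status clean.
count_matching() {
    grep -c -- "$1" || true
}

# Query the live process table only when invoked with "run", e.g. from snmpd.
# ps -eLo comm prints one line per thread with its command name.
if [ "${1:-}" = "run" ]; then
    ps -eLo comm | count_matching cfmx7
fi
```

With this layout the snmpd.conf entry would need the "run" argument appended to the script path, and the single number it prints is what SNMP exposes.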