Tuesday, August 11, 2015

Check for processes across multiple machines...

My group at work needed to know whether certain processes were running on a set of machines, and we didn't want to check each machine by hand. The processes run on Linux machines, and we use Splunk to capture environment data, so I wrote a simple script that does the check and writes the results to a text file that Splunk consumes.

The minor issues I had were:

1. I needed to run the script as a specific user, because the script uses ssh as that user to reach the other machines.

Solution: Create a crontab entry for that user, i.e., crontab -u myuser -e

Then add the crontab entry, which runs the script every 15 minutes:

*/15 * * * * /opt/myscripts/checkForProcesses.sh

2. I wanted to check for multiple processes, but not all in one command string, so I created a separate "check" string for each process. The issue was that each check string was split into multiple arguments when it was passed to the function: "ps x | grep stuff | grep -v grep" arrived as "ps", "x", "|", and so on.

Solution: I passed the check string in as the last argument to the function. Since the check string starts at the 3rd argument, I used the value like this: ${@:3}

The @ means "all arguments", and the 3 says to start at argument 3. This slicing syntax is a bash extension, not POSIX sh, so the script must be run by bash.
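To make the splitting and the slicing concrete, here is a small sketch that can be run locally. The helper names count_args and rest_from_third are made up for this illustration and are not part of the real script; nothing here actually runs ps.

```shell
#!/bin/bash
# Sketch only: count_args and rest_from_third are hypothetical helper names.

CHECK="ps x | grep stuff | grep -v grep"

# Print how many arguments the function received.
count_args() {
    echo "$#"
}

# Print every argument from the third one onward, joined by spaces.
rest_from_third() {
    echo "${@:3}"
}

count_args $CHECK      # unquoted: split on whitespace, prints 9
count_args "$CHECK"    # quoted: one argument, prints 1

# The first two arguments are skipped; the split check string comes back whole.
rest_from_third myhost MyProc $CHECK   # prints: ps x | grep stuff | grep -v grep
```

Quoting the check string at the call site ("$CHECK") would also keep it as a single argument; the ${@:3} approach simply accepts the split and gathers the pieces back up.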


Here is a short version of the script with all details stripped out:

#!/bin/bash

SCRIPT_DIR=$(dirname $0)
LOG_DIR=/opt/logs
STATUS_FILE=/opt/logs/status/process_status.txt

PROC1_CHECK="ps x | grep myproc1 | grep -v grep | grep -v less | grep java"
PROC2_CHECK="ps x | grep myproc2 | grep -v grep | grep -v less | grep java"
WORKER_LIST=$(cat $SCRIPT_DIR/worker.list)

# get the date/time for Splunk
DATE_VAL=`date`

rm -f $STATUS_FILE

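# $1 = machine name, $2 = display name of the process,
# $3 onward = the check command to run on the remote machine
checkStatus()
{
    echo "Checking $1 for $2."
    # ssh joins the remaining arguments with spaces and runs them in the remote shell
    STATUS=$(ssh "myuser@$1" "${@:3}")
    # Empty output means no matching process was found
    if [ -z "$STATUS" ];
    then
        echo "$DATE_VAL : ERROR : $2 not running on $1." >> $STATUS_FILE
    else
        echo "$DATE_VAL : STATUS : $2 running on $1." >> $STATUS_FILE
    fi
}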

# worker.list is a list of machine names to check for certain processes
# The check strings are left unquoted on purpose; checkStatus collects them with ${@:3}
for worker in $WORKER_LIST; do
    echo "Worker being checked is $worker"
    checkStatus $worker "MyProc1" $PROC1_CHECK
    checkStatus $worker "MyProc2" $PROC2_CHECK
done
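
If you want to try the status logic without any remote machines, the following sketch swaps the ssh call for a local shell. Everything in it is an assumption made for illustration: run_check and report are hypothetical names, the check strings use echo instead of ps, and the output goes to the terminal rather than to the status file.

```shell
#!/bin/bash
# Sketch only: a local stand-in for the ssh-based check.

# Hypothetical replacement for "ssh myuser@host ...": join the arguments
# with spaces and run the result in a local shell, much as ssh does remotely.
run_check() {
    sh -c "$*"
}

# Same decision as checkStatus: empty output means "not running".
report() {
    local host=$1 name=$2 status
    status=$(run_check "${@:3}")
    if [ -z "$status" ]; then
        echo "ERROR : $name not running on $host."
    else
        echo "STATUS : $name running on $host."
    fi
}

# Stand-in check strings: the first produces output, the second does not.
FOUND_CHECK="echo found | grep found"
MISSING_CHECK="echo found | grep absent"

report localhost Demo $FOUND_CHECK     # prints: STATUS : Demo running on localhost.
report localhost Demo $MISSING_CHECK   # prints: ERROR : Demo not running on localhost.
```

The pipes in the check strings are never interpreted by the calling script; they only take effect once the joined string reaches the shell that runs it, which is why the real script can pass them through ssh as plain words.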
