Watchdog


#1

The watchdog service is really useful. Over the years it has always been great to get a text when the server goes belly up. One thing would make it even more useful: having it trigger when the server runs out of HD space. For me, and for many others, that is the most common cause of site outages. Usually it happens because backups fill all the available space. Even if you prune them, re-upload, and adjust the policies, weeks down the road the system creates a new set and fills the HD again.


#2

I found the watchdog to be hit and miss. I am very happy with ‘monit’ these days. Fairly easy to set up too.


#3

Here’s what I do:

Create a file /usr/local/bin/check_disk_space with this in it, and make it executable (chmod +x):

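#!/bin/bash
# www.fduran.com
# Sends an email to EMAIL when disk use on partition PART is bigger than MAX percent.
# Adapt these 3 parameters to your case.

MAX=90
EMAIL=foo@example.com # change this, obviously
PART=rootfs

# Take the Use% column of the first df line matching PART and strip the % sign.
# -P keeps each filesystem on one line even when the device name is long.
USE=$(df -hP | grep "$PART" | head -n 1 | awk '{ print $5 }' | cut -d'%' -f1)
# Only compare when df actually returned a value for PART.
if [ -n "$USE" ] && [ "$USE" -gt "$MAX" ]; then
  echo "Percent used: $USE" | mail -s "Running out of disk space" "$EMAIL"
fi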

Add a job in /etc/cron.d to run it every hour (entries there need a user field, root here):

0 * * * * root /usr/local/bin/check_disk_space
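
If grepping for rootfs matches nothing on your box (many systems list the root partition under a /dev/... name instead), a variant that asks df about a mount point directly may be easier to adapt. This is only a rough sketch: the helper names usage_percent and over_threshold are made up for illustration, and it assumes the partition you care about is mounted at /. It just prints the result, so you can run it by hand before wiring it to mail:

```shell
#!/bin/bash
# Hypothetical helper: print the Use% (digits only) for one mount point.
# df -P prints one header line plus one line for the given path.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when usage is a non-empty number above the limit.
over_threshold() {
  local use="$1" max="$2"
  [ -n "$use" ] && [ "$use" -gt "$max" ]
}

MAX=90
MOUNT=/   # assumption: the partition to watch is mounted here

USE=$(usage_percent "$MOUNT")
if over_threshold "$USE" "$MAX"; then
  echo "Percent used on $MOUNT: $USE"
else
  echo "OK: $MOUNT at ${USE}% (limit ${MAX}%)"
fi
```

To send the alert, pipe the first echo into mail the same way the script above does.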