Monitor critical temperatures in Ubuntu Server – Trusty, Lucid & Karmic
As my server is going to be living in a cupboard I was concerned about what would happen if one of the cupboard vents got blocked somehow. The vents are installed in the top of the cupboard and could easily get covered over with a book or something. Whilst computers do have an inbuilt safety mechanism, and will automatically shut down if things get too hot, this mechanism doesn’t kick in until things get VERY hot, much hotter than I’d feel comfortable letting my server get to.
So, I decided to write some scripts which would monitor the server temperatures for me. The CPUs and hard drives are useful things to monitor so this is what these scripts will do. If either of the CPU cores or any one of the hard drives exceeds my pre-determined temperatures then the scripts will send me an email and will then shut the server down.
To keep things simple we’re going to be using two scripts: One script will monitor CPU temperatures and the other script will monitor hard drive temperatures.
NOTE: These instructions have been tested on Ubuntu Server Lucid 10.04, Karmic 9.10, Jaunty 9.04 and Intrepid 8.10. Furthermore the output from the utilities we’re going to install varies across different motherboards/processors and hard drives. You’ll need to tweak the instructions to work in your environment. The scripts do give you a few tips but obviously I’m unable to test the scripts on every motherboard/hard drive out there!
How to shut down the server if the CPU gets too hot
To make this script work we need to install lm-sensors.
So, from a Terminal/Putty Session issue the following two commands:
sudo apt-get update
sudo apt-get install lm-sensors
and answer Y when prompted.
We now need to configure the lm-sensors application. To do this you need to run as root. So from a Putty Session type the following:
sudo su
and type in your password if prompted. You’re now running as “root”. Next type:
sensors-detect
and hit the [Enter] key in response to each of the questions. At the end of the list of questions it will ask if you want to add the necessary modules to /etc/modules, answer YES. You now need to reboot the server for the changes to take effect.
Reminder: “sudo reboot”
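Once the server is back up, running sensors should list at least one temperature reading. Here is a minimal sketch of a check, assuming the readings look like +38.0°C. The helper name has_temp_readings is made up for this example, and the exact output format varies between motherboards, so treat it as a starting point only:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin contains at least one
# temperature reading such as "+38.0°C" (or "+38.0 C" on non-UTF-8 terminals).
has_temp_readings() {
    grep -Eq '[+-][0-9]+\.[0-9]+ ?(°)?C'
}

# Typical use after the reboot (needs lm-sensors installed):
#   sensors | has_temp_readings && echo "lm-sensors is reporting temperatures"
```

If the check fails, re-run sensors-detect and look at the raw sensors output to see what your board actually reports.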
Now we have the necessary tools installed we next need a script to monitor the CPU temperatures. This script can be found here.
Instead of downloading the script you can create it via Putty:
- Highlight the whole script, right click and select Copy.
- Using Putty navigate into the folder where you’re going to store the script. For example type cd /home/xxx/MyScripts where xxx is your username.
- Next type vim CPUTempShutdown.sh (or your preferred script name) and press Enter. This will open the file for editing.
- Then press the [Insert] key once and add a couple of blank lines by pressing the [Enter] key. Next right click and the whole script will be pasted into the screen.
- Then press the [Esc] key once and type :wq to save and quit out of the script. If you make a mistake then issue :q! instead of :wq to abort your changes.
- Don’t forget to make the script executable: chmod a+x CPUTempShutdown.sh
Example script usage:
./CPUTempShutdown.sh 35 50
This will produce a warning when either CPU core reaches 35°C and will shut the server down if either core reaches 50°C.
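To give a feel for the kind of check involved, here is a rough sketch of how the two thresholds could be applied to sensors output. This is not the script linked above: the function names max_core_temp and classify_temp are invented for illustration, and it assumes your board reports lines beginning "Core 0:", "Core 1:" and so on, which is not true of every motherboard.

```shell
#!/bin/sh
# Hypothetical sketch, not the script linked above.

# Print the hottest "Core N:" temperature (whole degrees C, truncated) found
# in sensors-style output on stdin; exit non-zero if no core line is found.
max_core_temp() {
    awk '
        /^Core [0-9]+:/ {
            t = $3                      # e.g. "+41.0°C"
            gsub(/[^-0-9.]/, "", t)     # keep sign, digits and decimal point
            if (!found || t + 0 > max) { max = t + 0; found = 1 }
        }
        END {
            if (!found) exit 1
            printf "%d\n", max
        }
    '
}

# Print "ok", "warning" or "critical" for a temperature, given the warning
# and critical thresholds (all whole numbers, in the same order as the
# script arguments: warning first, critical second).
classify_temp() {
    temp=$1; warn=$2; crit=$3
    if [ "$temp" -ge "$crit" ]; then
        echo critical
    elif [ "$temp" -ge "$warn" ]; then
        echo warning
    else
        echo ok
    fi
}

# Typical use (needs lm-sensors installed):
#   temp=$(sensors | max_core_temp) && classify_temp "$temp" 35 50
```

A real script would then send the email on "warning" and run a shutdown on "critical". Those steps are left out here because they depend on how mail is set up on your server.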
How to shut the server down if any disk drive gets too hot
As well as monitoring the CPU temperatures I thought it would be a good idea to monitor drive temperatures too. I figured that a fan failure or similar could cause the drives to heat up even though the CPU might stay relatively cool. Time for another script! This one can be downloaded here and the procedure for creating it is the same as for the script above.
To make this script work we need to install smartmontools.
So, from a Terminal/Putty Session issue the following two commands:
sudo apt-get update
sudo apt-get install smartmontools
and answer Y when prompted.
There is currently a bug whereby installing smartmontools requires postfix to also be installed. Simply select No configuration when prompted.
Please note that if you’ve configured any drives to spin down when idle then this script will prevent them from going idle since it accesses the drives to check the temperatures each time it runs, thus resetting the “idle counter”.
So, the script I’ve supplied takes an additional argument. As well as the “warning” temperature and “critical” temperature the third argument allows you to specify the particular drive you want to monitor. So, specify an a if you only want to check the drive called /dev/sda. Specify a b if you want to check the drive called /dev/sdb and so on.
For example:
sudo ./DriveTempShutdown.sh 30 45 b
This will produce a warning when drive /dev/sdb reaches 30°C and will shut the server down if this drive reaches 45°C.
If you issue:
sudo ./DriveTempShutdown.sh 30 45
This will produce a warning when ANY drive reaches 30°C and will shut the server down if ANY drive reaches 45°C.
The script assumes a fixed set of drives is installed in your server, so tweak that line in the script if you have more or fewer drives installed. The line in the script is: MyList='a b c d e' (as quoted here, that covers the five drives /dev/sda to /dev/sde).
Notice I’m running the script as root.
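As an illustration of the optional third argument and the drive list, here is a small sketch of how a drive letter could be turned into a device name and how a temperature could be pulled out of smartctl -A output. It is not the script linked above: the function names drives_to_check and drive_temp are invented, and it assumes the drive reports its temperature as SMART attribute 194 (Temperature_Celsius) with the reading in the tenth column, which varies between drives.

```shell
#!/bin/sh
# Hypothetical sketch, not the script linked above.
MyList='a b c d e'    # drive letters to check when no drive is given

# Print the device(s) to check, one per line: /dev/sdX for a given letter,
# otherwise every drive in MyList.
drives_to_check() {
    letters=${1:-$MyList}
    for letter in $letters; do
        echo "/dev/sd$letter"
    done
}

# Print the raw value of SMART attribute 194 (Temperature_Celsius) from
# "smartctl -A" output on stdin; exit non-zero if the attribute is absent.
drive_temp() {
    awk '$1 == 194 { print $10; found = 1; exit } END { if (!found) exit 1 }'
}

# Typical use, as root (needs smartmontools installed):
#   for dev in $(drives_to_check "$3"); do
#       smartctl -A "$dev" | drive_temp
#   done
```

Some drives report the temperature under attribute 190 instead, so check the raw smartctl -A output for your own drives first. smartctl also has a -n standby option that skips a drive which is currently spun down, which may be worth a look if you use idle spin-down.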
Run a script as a cron job using Webmin
I’ve set my scripts up as cron jobs using Webmin and the scripts run every minute of every day. So every minute of every day I’m checking the temperature of the CPU cores and disk drives. Well, because I’m spinning down drives when idle I’m only checking the primary (system) drive every minute. All other drives are checked every hour because I’ve set the idle time to be 30 minutes. So, the script will only check drives which are not in standby every hour and will not awaken any drives that are asleep. It would be most unusual for an idle drive to overheat. If you’ve not set any drives to spin down when idle then you can check the temperatures of all drives every minute.
To set up the above scripts as cron jobs within Webmin launch Webmin then click on System and then Scheduled Cron Jobs. Then click Create a new scheduled cron job at the top of the screen that opens.
Click the button next to the Execute cron job as and choose your username for the CPU monitoring script and choose root for the drive monitoring script since root access is required for this script to work.
Type the full path of the script into the Command box along with the additional arguments. So for the CPU monitoring job enter something along the lines of /home/htkh/MyScripts/CPUTempShutdown.sh 35 45 >/dev/null 2>&1, replacing htkh with your own username, MyScripts with the name of the scripts folder you created and CPUTempShutdown.sh with the script name.
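The >/dev/null 2>&1 at the end is a shell redirection rather than a script argument. It discards anything the script prints, both normal output and error messages, since all required output is written to a file by the script itself.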
The 35 is the warning temperature (35°C) and the 45 is the critical temperature (45°C). It’s this critical temperature which will cause the server to shut down. You will obviously need to experiment with these two numbers and adjust them to suit your environment.
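The effect of >/dev/null 2>&1 can be seen in isolation with a tiny test. In this sketch noisy is a made-up stand-in for one of the monitoring scripts and quiet_run is a made-up wrapper; neither is part of the scripts above.

```shell
#!/bin/sh
# Hypothetical stand-in for a monitoring script: writes one line to
# standard output and one line to standard error.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

# >/dev/null sends standard output to /dev/null; 2>&1 then points standard
# error at the same place, so the command prints nothing at all.
quiet_run() {
    "$@" >/dev/null 2>&1
}

# Try both and compare:
#   noisy              (prints two lines)
#   quiet_run noisy    (prints nothing)
```

The order matters: 2>&1 has to come after >/dev/null, otherwise error messages still reach the terminal (or, under cron, whatever cron does with job output on your system).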
For the drive monitoring job you would type something along the lines of:
/home/htkh/MyScripts/DriveTempShutdown.sh 35 45 a >/dev/null 2>&1 where a is the drive you want to monitor.
In the When to Execute section choose Times and dates selected below .. and select All for everything. Then click the Create button.
As mentioned above I am checking only the system drive every minute. All other drives are checked every hour. So, for this hourly job I’ve created an additional cron job but this time I am not specifying the 3rd argument. So, I’d put /home/htkh/MyScripts/DriveTempShutdown.sh 35 45 >/dev/null 2>&1 in the Command box and in the When to Execute section I chose Simple Schedule and selected Hourly from the drop down list.
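For anyone not using Webmin, roughly equivalent entries can be added by hand with crontab -e. This is only a sketch, assuming the same example paths and thresholds as above. The CPU line would go in your own user's crontab and the two drive lines in root's crontab (sudo crontab -e).

```text
# m h dom mon dow  command

# Your own crontab: check the CPU cores every minute
* * * * * /home/htkh/MyScripts/CPUTempShutdown.sh 35 45 >/dev/null 2>&1

# root's crontab: check the system drive every minute
* * * * * /home/htkh/MyScripts/DriveTempShutdown.sh 35 45 a >/dev/null 2>&1

# root's crontab: check all drives at the start of every hour
0 * * * * /home/htkh/MyScripts/DriveTempShutdown.sh 35 45 >/dev/null 2>&1
```

Five asterisks mean "every minute of every day", and 0 in the first field restricts the job to minute zero of each hour, which is one way of expressing an hourly schedule.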