how to use a shell script to supervise a program?
I've searched around but haven't quite found what I'm looking for. In a nutshell I have created a bash script to run in a infinite while loop, sleeping and checking if a process is running. The only problem is even if the process is running, it says it is not and opens another instance.
I know I should check by process name and not process id, since another process could j开发者_StackOverflow中文版ump in and take the id. However all perl programs are named Perl5.10.0 on my system, and I intend on having multiple instances of the same perl program open.
The following "if" always returns false, what am I doing wrong here???
while true; do
if [ ps -p $pid ]; then
echo "Program running fine"
sleep 10
else
echo "Program being restarted\n"
perl program_name.pl &
sleep 5
read -r pid < "${filename}_pid.txt"
fi
done
Get rid of the square brackets. It should be:
if ps -p $pid; then
The square brackets are syntactic sugar for the test
command. This is an entirely different beast and does not invoke ps
at all:
if test ps -p $pid; then
In fact that yields "-bash: [: -p: binary operator expected" when I run it.
Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.
First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.
Secondly, if it is so important that a program remain running, why do you expect your (at least once already) buggy shell script will do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and the nature of your server process. I can offer more concrete advice.
added in response to comment:
Sure, there are engineering exigencies, but as the OP noted in the OP, there is still a bug in this attempt at a solution:
I know I should check by process name and not process id, since another process could jump in and take the id.
So now you are left with a PID tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten second window in which
- the "monitored" process fails
- I start up my week long emacs process which grabs the same PID
- the nanny script continues on blissfully unaware that its dependent has failed
The script isn't merely buggy, it is invalid because it presumes that PIDs are stable identifiers of a process. There are ways that this could be better handled even at the shell script level. The simplest is to never detach the execution of perl
from the script since the script is doing nothing other than watching the subprocess. For example:
while true ; do
if perl program_name.pl ; then
echo "program_name terminated normally, restarting"
else
echo "oops program_name died again, restarting"
fi
done
Which is not only shorter and simpler, but it actually blocks for the condition that you are really interested in: the run-state of the perl program. The original script repeatedly checks a bad proxy indication of the run state condition (the PID) and so can get it wrong. And, since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty itself by design.
I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done
script is quite good, however for production systems there a couple of process supervisors which do exactly this and much more, e.g.
- enable you to send signals to the supervised process (without knowing it's PID)
- check how long a service has been up or down
- capturing its output and write it to a log file
Examples of such process supervisors are daemontools or runit. For a more elaborate discussion and examples see Init scripts considered harmful. Don't be disturbed by the title: Traditional init scripts suffer from exactly the same problem like you do (they start a daemon, keep it's PID in a file and then leave the daemon alone).
I agree that you should find out why your program is dying in the first place. However, an ever running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square braces around ps -p $pid
. You want the exit status of ps -p $pid
command. The square brackets are a replacement for the test
command.)
There are two possible solutions:
Use cron to run your "supervising" shell script to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can output it's PID into a file. Your supervising program can then cat this file and get the PID to check.
If the program you're supervising is providing a service upon a particular port, make it an inetd service. This way, it isn't running at all until there is a request upon that port. If you set it up correctly, it will terminate when not needed and restart when needed. Takes less resources and the OS will handle everything for you.
That's what kill -0 $pid is for. It returns success if a process with pid $pid exists.
精彩评论