Problem with suspending and resuming job
I have a driver script which manages a job string which can run jobs in parallel or sequentially based on a dependency graph. For example:
Job   Predecessors
A     none
B     A
C     A
D     B
E     D, C
F     E
The driver starts A in the background and waits for it to complete by suspending itself with the bash built-in suspend. On completion, job A sends a SIGCONT to the driver, which then starts B and C in the background and suspends itself again, and so on.
The driver runs set -m, so job control is enabled.
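The set -m matters because, in a script, job control is off by default and bash's suspend builtin then refuses to stop the shell, returning a failure status instead. A minimal sketch, assuming bash is on the PATH: it runs suspend in a non-interactive child shell started without -m. The helper name is hypothetical, and the error text differs between bash versions, so it is discarded here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run suspend in a child bash that has no job control
# (non-interactive, no -m). bash reports a "no job control" error and
# returns a non-zero status instead of stopping the shell.
suspend_status_without_job_control() {
    bash -c 'suspend' 2>/dev/null
    echo "$?"
}

echo "suspend exit status without set -m: $(suspend_status_without_job_control)"
```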
This works fine when the driver itself is started in the background. However, when the driver is invoked in the foreground, only the first call to suspend works as expected. The second call seems to turn into an 'exit', which reports "There are stopped jobs" and does not exit. The third call to suspend also turns into an 'exit' and kills the driver and all its children (as it should, since this is the second converted call to 'exit').
So my question is: is this expected behavior? If so, why, and how do I work around it?
Thanks.
Code fragments below:
Driver:
for step in $(hash_keys 'RUNNING_HASH')
do
    proc=$(hash_find 'RUNNING_HASH' "$step")
    if [ -n "$proc" ]
    then
        # added the grep to ensure the process is found
        ps -p "$proc" | grep "$proc" > /dev/null 2>&1
        if [ $? -eq 0 ]
        then
            log_msg_to_stderr $SEV_DEBUG "proc $proc running: suspending execution"
            suspend
            # execution resumes here on receipt of SIGCONT
            log_msg_to_stderr $SEV_DEBUG "signal received: continuing execution"
            break
        fi
    fi
done
Job:
## $$ is the driver's PID
kill -SIGCONT $$
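The comment "$$ is the driver's PID" holds only when the job runs as a subshell of the driver (for example ( ... ) &): bash does not reset $$ in subshells, while $BASHPID (bash 4 and later) gives the subshell's own PID. A job run as a separate script has its own $$, so it would need the driver's PID handed to it; DRIVER_PID below is a hypothetical variable name for that, and show_pids is a hypothetical helper. A minimal sketch of the difference:

```shell
#!/usr/bin/env bash
show_pids() {
    local driver_pid=$$

    # A job started as a background subshell of the driver.
    (
        [ "$$" = "$driver_pid" ] && echo "subshell: \$\$ equals the driver PID"
        [ "$BASHPID" != "$driver_pid" ] && echo "subshell: \$BASHPID is its own PID"
    ) &
    wait

    # A job started as a separate bash process: $$ is that process's own PID,
    # so the driver's PID is passed in DRIVER_PID (a hypothetical name).
    DRIVER_PID=$driver_pid bash -c '
        [ "$$" != "$DRIVER_PID" ] && echo "separate script: \$\$ is not the driver PID"
    '
}

show_pids
```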
I think you are overcomplicating things by playing with job control, suspend, etc. Here is an example program that keeps 5 children running at a time until 15 jobs in total have been started. A SIGCHLD trap cuts the polling sleep short whenever a child exits; the loop then looks for a child that went away (using kill -0, much more efficiently than ps | grep, BTW) and starts a new child in its place.
#!/usr/bin/env bash
set -o monitor
# When any child exits, kill the polling "sleep 10.9" below so the loop wakes up early
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist

dojob()
{
    slot=$1
    time=$(echo "$RANDOM * $worktime / 32768" | bc -l)
    echo Starting job $slot with args $time
    sleep $time &
    pidlist[$slot]=$!
    totaljobs=$(($totaljobs - 1))
}

# start one job per slot; only this loop adds slots, restarts reuse a slot
while [ $curjobs -lt $numjobs ] && [ $totaljobs -gt 0 ]
do
    dojob $curjobs
    curjobs=$(($curjobs + 1))
done

# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
    for ((i = 0; i < curjobs; i++))
    do
        if ! kill -0 ${pidlist[$i]} >&/dev/null
        then
            dojob $i
            break
        fi
    done
    sleep 10.9 >&/dev/null
done
wait
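The liveness test in that loop is kill -0: signal 0 is never delivered, kill only checks whether the PID exists and may be signalled, so no ps or grep process has to be forked. Two caveats: a child that has exited but has not yet been reaped (a zombie) still passes the check, and a process owned by another user normally fails it unless you are root. A minimal sketch; is_running is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# is_running PID: succeed if PID exists and this shell may signal it.
# Signal 0 is not actually sent; kill only performs the existence/permission check.
is_running() {
    kill -0 "$1" 2>/dev/null
}

sleep 5 &
pid=$!
is_running "$pid" && echo "$pid is running"

kill "$pid"
wait "$pid" 2>/dev/null    # reap the child so its PID no longer exists
is_running "$pid" || echo "$pid has exited"
```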
Do the worker jobs exit when they're finished? If so, rather than using suspend and SIGCONT, how about simply using wait $PIDS in the driver script?
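A minimal sketch of that idea for the dependency table in the question, assuming bash 4+ (associative arrays) and that each job simply exits when done. run_job is a hypothetical stand-in for the real work; the driver starts every job whose predecessors have all finished, then waits on exactly those PIDs before looking for the next batch, so no suspend or SIGCONT is involved.

```shell
#!/usr/bin/env bash
# Predecessors of each job, from the table in the question.
declare -A preds=(
    [A]=""
    [B]="A"
    [C]="A"
    [D]="B"
    [E]="D C"
    [F]="E"
)
declare -A finished=()

# Hypothetical stand-in for the real work of one job.
run_job() {
    echo "running $1"
    sleep 0.1
}

# Succeed if every predecessor of job $1 has finished.
is_ready() {
    local p
    for p in ${preds[$1]}; do          # unquoted on purpose: split the list
        [[ -n ${finished[$p]} ]] || return 1
    done
}

# Start every ready job in the background, wait for that whole batch, repeat.
run_graph() {
    local job pid
    local -a wave pids
    while (( ${#finished[@]} < ${#preds[@]} )); do
        wave=() pids=()
        for job in $(printf '%s\n' "${!preds[@]}" | sort); do
            if [[ -z ${finished[$job]} ]] && is_ready "$job"; then
                wave+=("$job")
            fi
        done
        if (( ${#wave[@]} == 0 )); then
            echo "no runnable job: cycle or unknown predecessor" >&2
            return 1
        fi
        for job in "${wave[@]}"; do
            run_job "$job" &
            pids+=("$!")
        done
        for pid in "${pids[@]}"; do
            wait "$pid" || { echo "a job in '${wave[*]}' failed" >&2; return 1; }
        done
        for job in "${wave[@]}"; do finished[$job]=1; done
        echo "finished: ${wave[*]}"
    done
}

run_graph
```

Because it waits for a whole batch, D here also waits for C even though it depends only on B. The wait -n builtin (bash 4.3+) would let the driver react to each job individually, at the cost of more bookkeeping.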