
Problem with suspending and resuming job

I have a driver script which manages a job string which can run jobs in parallel or sequentially based on a dependency graph. For example:

Job              Predecessors

A                null
B                A
C                A
D                B
E                D, C
F                E

The driver starts A in the background and waits for it to complete by suspending itself using the bash built-in suspend. On completion, job A sends a SIGCONT to the driver, which then starts B and C in the background and suspends itself again, and so on.

The driver calls set -m, so job control is enabled.

This works fine when the driver is itself started in the background. However, when the driver is invoked in the foreground, only the first call to suspend works. The second call seems to turn into an 'exit', which reports "There are stopped jobs" and does not exit. The third call to suspend also turns into an 'exit' and kills the driver and all children (as it should, considering this is the second converted call to 'exit').

And this is my question: Is this expected behavior? If so, why and how do I work around it?

Thanks.

Code fragments below:

Driver:

            for step in $(hash_keys 'RUNNING_HASH')
            do
                    proc=$(hash_find 'RUNNING_HASH' $step)
                    if [ $proc ]
                    then
                            # added the grep to ensure the process is found
                            ps -p $proc | grep $proc > /dev/null 2>&1
                            if [ $? -eq 0 ]
                            then
                                    log_msg_to_stderr $SEV_DEBUG "proc $proc running: suspending execution"
                                    suspend 
                                    # execution resumes here on receipt of SIGCONT
                                    log_msg_to_stderr $SEV_DEBUG "signal received: continuing execution"
                                    break
                            fi
                    fi
            done

Job:

## $$ is the driver's PID
kill -SIGCONT $$


I have to think you are over-complicating things by playing with job control, suspend, etc. Here is an example program which keeps 5 children running at all times. It periodically looks to see if anyone went away (much more efficiently than ps|grep, BTW) and starts up a new child if necessary. The polling sleep in the code below is 10.9 seconds, and the SIGCHLD trap is meant to cut it short as soon as a child exits.

#!/usr/bin/bash

set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD

totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist

dojob()
{
  slot=$1
  time=$(echo "$RANDOM * 10 / 32768" | bc -l)
  echo Starting job $slot with args $time
  sleep $time &
  pidlist[$slot]=`jobs -p %%`
  curjobs=$(($curjobs + 1))
  totaljobs=$(($totaljobs - 1))
}

# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
 do
  dojob $curjobs
 done

# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
 do
  # scan the fixed slots 0..numjobs-1 (curjobs only ever grows in dojob)
  for ((i=0; i < numjobs; i++))
   do
    if ! kill -0 ${pidlist[$i]} >&/dev/null
     then
      dojob $i
      break
     fi
   done
   sleep 10.9 >&/dev/null
 done
wait
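
To illustrate the kill -0 check used in the polling loop above: signal 0 delivers nothing, so kill only reports through its exit status whether the PID exists and may be signalled. A minimal sketch, where is_alive is a hypothetical helper name and not part of the script above:

```bash
#!/usr/bin/env bash
# Sketch only: is_alive is a hypothetical helper, not part of the script above.
is_alive() {
    # Signal 0 sends nothing; the exit status says whether the PID can be signalled.
    kill -0 "$1" 2>/dev/null
}

sleep 0.2 &              # stand-in for a real worker
pid=$!

if is_alive "$pid"; then status_before=running; else status_before=gone; fi

wait "$pid"              # block until the worker exits and is reaped

if is_alive "$pid"; then status_after=running; else status_after=gone; fi

echo "before: $status_before, after: $status_after"
```

A PID can in principle be reused by an unrelated process once the child has been reaped, so treat this as a cheap liveness test rather than proof of identity.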


Do the worker jobs exit when they're finished? If so, rather than using suspend and SIGCONT, how about simply using wait $PIDS in the driver script?
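
As a rough sketch of that idea, the driver for the example graph in the question could block on wait instead of suspending itself. run_job and the LOG file are hypothetical stand-ins for the real jobs, and the stages are hard-coded for this particular graph rather than derived from a dependency table:

```bash
#!/usr/bin/env bash
# Sketch only: run_job is a hypothetical placeholder for a real job.
LOG=$(mktemp)            # records the order in which jobs finish

run_job() {
    sleep 0.1            # pretend to do some work
    echo "$1" >> "$LOG"
}

run_job A &              # A has no predecessors
wait "$!"

run_job B & pid_b=$!     # B and C depend only on A, so they run in parallel
run_job C & pid_c=$!

wait "$pid_b"            # D needs B
run_job D & pid_d=$!

wait "$pid_d" "$pid_c"   # E needs both D and C
run_job E &
wait "$!"

run_job F &              # F needs E
wait "$!"

cat "$LOG"
```

wait blocks until the listed child PIDs have exited, so this version needs neither SIGCONT nor set -m, and it should behave the same whether the driver is started in the foreground or the background.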
