
How to limit number of threads/sub-processes used in a function in bash

My question is: how do I change this code so that it only uses 4 threads/sub-processes at a time?

TESTS="a b c d e"

for f in $TESTS; do
  t=$[ ( $RANDOM % 5 )  + 1 ]
  sleep $t && echo $f $t &
done
wait


Interesting question. I tried to use xargs for this and I found a way.

Try this:

seq 10 | xargs -i --max-procs=4 bash -c "echo start {}; sleep 3; echo done {}"

--max-procs=4 will ensure that no more than four subprocesses are running at a time.

The output will look like this:

start 2
start 3
start 1
start 4
done 2
done 3
done 1
done 4
start 6
start 5
start 7
start 8
done 6
done 5
start 9
done 8
done 7
start 10
done 9
done 10

Note that the order of execution might not follow the order in which you submit the commands; as you can see, 2 started before 1.
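
The same technique can be applied to the TESTS loop from the question. This is just a sketch of my own adaptation (not part of the answer), assuming GNU xargs; -I {} is the non-deprecated spelling of -i, and the single quotes keep $RANDOM for the inner bash:

TESTS="a b c d e"
printf '%s\n' $TESTS | xargs -I {} --max-procs=4 \
  bash -c 't=$(( (RANDOM % 5) + 1 )); sleep $t && echo {} $t'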


Quick and dirty solution: insert this line somewhere inside your for loop:

while [ $(jobs | wc -l) -ge 4 ] ; do sleep 1 ; done

(assumes you don't already have other background jobs running in the same shell)
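
Put together with the loop from the question, that looks like this (a sketch; only the while line is new):

TESTS="a b c d e"
for f in $TESTS; do
  while [ $(jobs | wc -l) -ge 4 ] ; do sleep 1 ; done
  t=$[ ( $RANDOM % 5 )  + 1 ]
  sleep $t && echo $f $t &
done
wait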


I have found another solution for this question using parallel (part of the moreutils package).

parallel -j 4 -i bash -c "echo start {}; sleep 2; echo done {};" -- $(seq 10)

-j 4 is the maxjobs option: run at most 4 jobs at a time

-i substitutes each input argument for {} in the command

-- separates the command from the list of arguments to run it on

The output of this command will be:

start 3
start 4
start 1
start 2
done 4
done 2
done 3
done 1
start 5
start 6
start 7
start 8
done 5
done 6
start 9
done 7
start 10
done 8
done 9
done 10
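
Adapted to the TESTS example from the question, the same moreutils parallel invocation would look roughly like this (a sketch of my own; single quotes are used so $RANDOM is expanded by the inner bash rather than by the calling shell):

TESTS="a b c d e"
parallel -j 4 -i bash -c 't=$(( (RANDOM % 5) + 1 )); sleep $t && echo {} $t' -- $TESTS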


You can do something like this by using the jobs builtin:

for f in $TESTS; do
  running=($(jobs -rp))
  while [ ${#running[@]} -ge 4 ] ; do
    sleep 1   # this is not optimal, but you can't use wait here
    running=($(jobs -rp))
  done
  t=$[ ( $RANDOM % 5 )  + 1 ]
  sleep $t && echo $f $t &
done
wait
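
Where a newer bash (4.3 or later) is available, the sleep 1 polling can be replaced with wait -n, which the last answer on this page also uses; a sketch:

for f in $TESTS; do
  while [ $(jobs -rp | wc -l) -ge 4 ] ; do
    wait -n    # block until any one background job finishes
  done
  t=$[ ( $RANDOM % 5 )  + 1 ]
  sleep $t && echo $f $t &
done
wait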


GNU Parallel is designed for this kind of task:

TESTS="a b c d e"
for f in $TESTS; do
  t=$[ ( $RANDOM % 5 )  + 1 ]
  sem -j4 "sleep $t && echo $f $t"
done
sem --wait

Watch the intro videos to learn more:

http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
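
For comparison, the whole thing can also be written as a single GNU Parallel invocation (a sketch, assuming the jobs run under bash so $RANDOM is available; ::: supplies the arguments and -j4 caps the number of simultaneous jobs):

parallel -j4 't=$(( (RANDOM % 5) + 1 )); sleep $t; echo {} $t' ::: a b c d e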


This tested script runs 5 jobs at a time and starts a new job as soon as one finishes (thanks to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).

#!/usr/bin/bash

set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD

totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist

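# dojob SLOT: start one background sleep of random length (0-10 s) in that slot and record its pid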
dojob()
{
  slot=$1
  time=$(echo "$RANDOM * 10 / 32768" | bc -l)
  echo Starting job $slot with args $time
  sleep $time &
  pidlist[$slot]=`jobs -p %%`
  curjobs=$(($curjobs + 1))
  totaljobs=$(($totaljobs - 1))
}

# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
 do
  dojob $curjobs
 done

# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
 do
  for ((i=0;$i < $curjobs;i++))
   do
    if ! kill -0 ${pidlist[$i]} >&/dev/null
     then
      curjobs=$(($curjobs - 1))   # this slot's job has exited; dojob re-increments the count
      dojob $i
      break
     fi
   done
   sleep 10.9 >&/dev/null
 done
wait
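
The simpler direct-polling variant mentioned in the intro amounts to deleting the set -o monitor and trap lines and changing the sleep at the bottom of the polling loop (a sketch of the one changed line):

   sleep 1    # was: sleep 10.9 >&/dev/null -- just poll about once a second instead of relying on SIGCHLD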


This is my "parallel" unzip loop using bash on AIX:

for z in *.zip ; do
  7za x $z >/dev/null &
  while [ $(jobs -p|wc -l) -ge 4 ] ; do
    wait -n
  done
done

Notes:

  • jobs -p (bash builtin) lists the PIDs of the current shell's background jobs
  • wait -n (bash builtin, available since 4.3) waits for any one background job to finish
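
If the installed bash is older than 4.3 and has no wait -n, the xargs approach from the first answer can drive the same unzip job (a sketch, assuming an xargs that supports -P, e.g. GNU findutils):

printf '%s\n' *.zip | xargs -P 4 -I {} 7za x {} >/dev/null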
