
Doing parallel processing in bash?

I have thousands of PNG files that I'd like to make smaller with pngcrush. I have a simple find .. -exec job, but it's sequential. My machine has plenty of resources, and I'd like to do this in parallel.

The operation to be performed on every png is:

pngcrush input output && mv output input
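When this step runs in parallel, each file needs its own temporary output name, and paths must be quoted so that names containing spaces survive. A minimal bash sketch of the per-file step, assuming pngcrush's `pngcrush input output` form shown above; the function name `crush_in_place` and the `.crush.tmp` suffix are hypothetical choices:

```shell
# Hypothetical helper: shrink one PNG in place with pngcrush.
# The temp file sits next to the original (same filesystem for mv) and is
# named after it, so parallel runs on different files never collide.
crush_in_place() {
    local src=$1
    local tmp=$src.crush.tmp
    if pngcrush "$src" "$tmp"; then
        mv -- "$tmp" "$src"
    else
        rm -f -- "$tmp"    # drop any partial output; the original stays untouched
        return 1
    fi
}
```

It is called once per file, e.g. `crush_in_place 'some dir/logo.png'`.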

Ideally I can specify the maximum number of parallel operations.

Is there a way to do this with bash and/or other shell helpers? I'm on Ubuntu or Debian.

You can use xargs to run multiple processes in parallel:

find /path -type f -name '*.png' -print0 | xargs -0 -n 1 -P <nr_procs> sh -c 'pngcrush "$1" "$1.tmp" && mv "$1.tmp" "$1"' sh

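xargs reads the list of PNG files produced by find (separated by NUL characters, hence -0) and runs the given command (sh -c '...' sh) with one file name at a time (-n 1). The trailing sh becomes $0 inside the small script, so the file name ends up in $1. With -P <nr_procs>, xargs keeps up to <nr_procs> of these commands running at the same time.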
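Since the question also asks about plain bash: the same cap on concurrent jobs can be sketched without xargs using bash's `wait -n` (bash 4.3 or later). This is a minimal sketch, assuming pngcrush is on the PATH; `crush_tree`, its arguments and the `.crush.tmp` suffix are hypothetical names:

```shell
# Hypothetical helper: crush every PNG under a directory, running at most
# max_jobs pngcrush processes at once. Requires bash >= 4.3 for `wait -n`.
crush_tree() {
    local dir=$1 max_jobs=${2:-4} running=0 f
    while IFS= read -r -d '' f; do
        # At the cap: block until any one background job finishes.
        if (( running >= max_jobs )); then
            wait -n
            running=$((running - 1))
        fi
        # Each file gets its own temp name next to it; quoting keeps spaces intact.
        { pngcrush "$f" "$f.crush.tmp" && mv -- "$f.crush.tmp" "$f"; } &
        running=$((running + 1))
    done < <(find "$dir" -type f -name '*.png' -print0)
    wait    # let the last jobs finish
}
```

For example, `crush_tree /path 8` would process every PNG under /path with at most 8 pngcrush processes at once. The xargs pipeline is shorter; the loop mainly helps when the per-file work needs more bash logic than fits comfortably in a `sh -c` string. Under `set -e`, note that `wait -n` returns the failed job's exit status.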

You can use custom find/xargs solutions (see Bart Sas' answer), but when things get more complex you have at least two powerful options:

  1. parallel (from package moreutils)
  2. GNU parallel
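Both packages provide a command named `parallel`, and the two take different arguments, so it helps to check which one is installed before copying a command line. A small bash sketch, assuming GNU parallel's `--version` output begins with a line starting `GNU parallel` and that the moreutils tool prints no such line; `is_gnu_parallel` is a hypothetical name:

```shell
# Hypothetical check: succeed only if the `parallel` found first is GNU parallel.
# Assumption: GNU parallel's --version output starts with "GNU parallel";
# the moreutils `parallel` rejects --version instead.
is_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 || return 1
    parallel --version 2>/dev/null | grep -q '^GNU parallel'
}
```

If it fails while a `parallel` command exists, that command is most likely the moreutils one, and `man parallel` shows which syntax it expects.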

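With GNU Parallel (http://www.gnu.org/software/parallel/) it can be done like this, with -j capping the number of simultaneous jobs:

find /path -type f -name '*.png' -print0 | parallel -0 -j <nr_procs> pngcrush {} {.}.temp '&&' mv {.}.temp {}

Here {} is replaced by the file name and {.} by the file name without its extension. The '&&' is quoted so that the calling shell passes it through to parallel, which runs the assembled command in a shell.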

Learn more:

  • Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
  • Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.


