
Using make to execute independent tasks in parallel

I have a bunch of commands I would like to execute in parallel. The commands are nearly identical. They can be expected to take about the same time, and can run completely independently. They may look like:

command -n 1 > log.1
command -n 2 > log.2
command -n 3 > log.3
...
command -n 4096 > log.4096

I could launch all of them in parallel in a shell script, but the system would try to load more than strictly necessary to keep the CPU(s) busy (each task takes 100% of one core until it has finished). This would cause the disk to thrash and make the whole thing slower than a less greedy approach to execution.

The best approach is probably to keep about n tasks executing, where n is the number of available cores.

I am keen not to reinvent the wheel. This problem has already been solved in the Unix make program (when used with the -j n option). I was wondering if perhaps it was possible to write generic Makefile rules for the above, so as to avoid the linear-size Makefile that would look like:

all: log.1 log.2 ...
log.1:
        command -n 1 > log.1
log.2:
        command -n 2 > log.2
...

If the best solution is not to use make but another program/utility, I am open to that as long as the dependencies are reasonable (make was very good in this regard).


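Here is more portable shell code that does not depend on brace expansion. seq prints bare numbers, so addprefix turns them into the log.N target names:

LOGS := $(addprefix log.,$(shell seq 1 1024))

Note the use of := to define a simply expanded variable (the more efficient "flavor"): the shell command runs once, when the variable is defined, instead of each time LOGS is referenced.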


See pattern rules

Another way, if this is the only reason you need make, is to use the -n and -P options of xargs.
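A minimal sketch of the xargs approach, assuming an xargs that supports -P (GNU and BSD versions do; the option is not in POSIX). The echo is only a placeholder for the real command, and demo_logs is a made-up directory name used for illustration.

```shell
#!/bin/sh
# Placeholder task: the echo stands in for the real "command -n N".
mkdir -p demo_logs

# seq emits 1..8, one number per line.
# -n 1 passes one number to each invocation; -P 4 keeps at most four
# invocations running at a time.
# The redirection lives inside sh -c because xargs itself does not
# interpret ">". The trailing "sh" fills $0, so the number arrives as $1.
seq 1 8 | xargs -n 1 -P 4 sh -c 'echo "task $1" > "demo_logs/log.$1"' sh
```

For the workload in the question, the range would become 1 4096 and the echo would be replaced by the real command.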


First the easy part. As Roman Cheplyaka points out, pattern rules are very useful:

LOGS = log.1 log.2 ... log.4096
all: $(LOGS)

# the recipe line must begin with a tab, not spaces
log.%:
	command -n $* > log.$*

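The tricky part is creating that list, LOGS. Make isn't very good at handling numbers. The best way is probably to call on the shell. (You may have to adjust this script for your shell; shell scripting isn't my strongest subject. The for ((...)) loop is bash syntax, and make runs $(shell ...) with /bin/sh by default, hence the SHELL line.)

# for ((...)) is bash syntax; make uses /bin/sh unless told otherwise
SHELL = /bin/bash
NUM_LOGS = 4096

LOGS = $(shell for ((i=1 ; i<=$(NUM_LOGS) ; ++i)) ;  do  echo log.$$i ; done)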
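Putting the pattern rule and the generated list together might look like the sketch below. It assumes GNU make and the seq utility, and it builds the list with seq and addprefix rather than the bash loop, so that it also works when make's shell is a plain /bin/sh. The make_demo directory and the echo placeholder task are made up for illustration.

```shell
#!/bin/sh
# Writes a small demo Makefile and runs it with up to four parallel jobs.
# "echo task N" is a placeholder for the real "command -n N".
mkdir -p make_demo

# Single quotes keep $(...) and $* literal so that make, not the shell,
# expands them.
printf '%s\n' \
  'NUM_LOGS = 8' \
  'LOGS := $(addprefix log.,$(shell seq 1 $(NUM_LOGS)))' \
  '' \
  'all: $(LOGS)' \
  '' \
  'log.%:' > make_demo/Makefile

# The recipe line needs a real tab, which printf produces from \t.
printf '\techo task $* > log.$*\n' >> make_demo/Makefile

# -C runs make inside make_demo; -j 4 allows up to four recipes at once.
make -C make_demo -j 4
```

Because each log.N is a real file target, rerunning make only redoes the tasks whose log files are missing, which is handy if a long batch gets interrupted.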


xargs -P is the "standard" way to do this. Note that, depending on disk I/O, you may want to limit the number of jobs to the number of spindles rather than cores. If you do want to limit it to cores, note the nproc command in recent versions of coreutils.
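As a rough sketch of limiting to cores, the job count can be taken from nproc, with a fallback for systems that lack it. The getconf fallback, the njobs variable, and the nproc_demo_logs directory are assumptions for illustration, and the echo again stands in for the real command.

```shell
#!/bin/sh
# Use nproc (GNU coreutils) if present, else getconf, else a single job.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

mkdir -p nproc_demo_logs

# Keep at most $njobs placeholder tasks running at a time.
seq 1 8 | xargs -n 1 -P "$njobs" \
  sh -c 'echo "task $1" > "nproc_demo_logs/log.$1"' sh

echo "ran with up to $njobs parallel jobs"
```

If the disk rather than the CPU is the bottleneck, njobs can simply be set by hand to a smaller number.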


With GNU Parallel, which by default runs one job per CPU core, you would write:

parallel command -n {} ">" log.{} ::: {1..4096}

10 second installation:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

Learn more:
http://www.gnu.org/software/parallel/parallel_tutorial.html
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
