How to use threads to replace looping over a subroutine in Perl/PDL

I have a perfectly good Perl subroutine written as part of a Perl module. Without going into too many details, it takes a string and a short list as arguments (often taken from the terminal) and spits out a value (right now, always a floating-point number, but this may not always be the case).

Right now, the list portion of my argument takes two values, say (val1,val2). I save the output of my subroutine for hundreds of different values for val1 and val2 using for loops. Each iteration takes almost a second to complete--so completing this entire process takes hours.

I recently read of a mystical (to me) computational tool called "threading" that apparently can replace for loops with blazing fast execution time. I have been having trouble understanding what these are and do, but I imagine they have something to do with parallel computing (and I would like to have my module as optimized as possible for parallel processors.)

If I save all the values I would like to pass to val1 as a list, say @val1 and the same for val2, how can I use these "threads" to execute my subroutine for every combination of the elements of val1 and val2? Also, it would be helpful to know how to generalize this procedure to a subroutine that also takes val3, val4, etc.


Update:

I do not use PDL so I did not know a thread in PDL does not correspond exactly to the notion of threading I have been talking about. See PDL threading and signatures:

First we have to explain what we mean by threading in the context of PDL, especially since the term threading already has a distinct meaning in computer science that only partly agrees with its usage within PDL.

However, I think the explanation below is still useful to you as one would need to know what threading in the regular sense is to understand how PDL threads are different.

Here is the Threads entry on Wikipedia for background.

Using threads cannot make your program magically faster. If you have multiple CPUs/cores and if the computations you are carrying out can be divided into independent chunks, using threads can allow your program to carry out more than one computation at a time and cut down on the total execution time.

The easiest case is when the subtasks are embarrassingly parallel, requiring no communication or coordination between threads.

Regarding possible performance gains, consider the following program:

#!/usr/bin/perl

use strict; use warnings;
use threads;

my ($n) = @ARGV;

my @threads = map { threads->create(\&act_busy) } 1 .. $n;

$_->join for @threads;

sub act_busy {
    for (1 .. 10_000_000) {
        my $x = 2 * 2;
    }
}

On my dual core laptop running Windows XP:

C:\> timethis t.pl 1
TimeThis :  Elapsed Time :  00:00:02.375
C:\> timethis t.pl 2
TimeThis :  Elapsed Time :  00:00:02.515
C:\> timethis t.pl 3
TimeThis :  Elapsed Time :  00:00:03.734
C:\> timethis t.pl 4
TimeThis :  Elapsed Time :  00:00:04.703
...
C:\> timethis t.pl 10
TimeThis :  Elapsed Time :  00:00:11.703

Now, compare that to:

#!/usr/bin/perl

use strict; use warnings;

my ($n) = @ARGV;

act_busy() for 1 .. $n;

sub act_busy {
    for (1 .. 10_000_000) {
        my $x = 2 * 2;
    }
}

C:\> timethis s.pl 10
TimeThis :  Elapsed Time :  00:00:22.312
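
Applied to the situation in the question, one possible approach is a small pool of worker threads that pull argument combinations from a shared queue. The following is only a sketch under stated assumptions: compute() is a hypothetical stand-in for the real subroutine, the 'label' string, the sample values and the worker count are made up for illustration, and it needs a perl built with thread support plus a Thread::Queue recent enough to provide end(). A fixed pool is used instead of one thread per combination because creating a Perl thread copies the interpreter and is relatively expensive.

```perl
use strict; use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real subroutine: takes a string and a
# list of values and returns a number.
sub compute {
    my ($label, @vals) = @_;
    my $product = 1;
    $product *= $_ for @vals;
    return $product;
}

# Build every combination of the given lists (works for 2, 3, 4, ... lists).
sub combinations {
    my @lists  = @_;
    my @combos = ([]);
    for my $list (@lists) {
        @combos = map { my $c = $_; map { [@$c, $_] } @$list } @combos;
    }
    return @combos;
}

my @val1 = (1 .. 4);
my @val2 = (10, 20, 30);
my $n_workers = 2;    # assumption: roughly the number of cores

# Fill the queue with all jobs, then mark it finished so that
# dequeue() returns undef once the queue is empty.
my $queue = Thread::Queue->new;
$queue->enqueue($_) for combinations(\@val1, \@val2);
$queue->end;

# Each worker processes jobs until the queue runs dry and returns
# its partial results as a hash reference.
sub worker {
    my %partial;
    while (defined(my $job = $queue->dequeue)) {
        $partial{ join ',', @$job } = compute('label', @$job);
    }
    return \%partial;
}

my @workers = map { threads->create(\&worker) } 1 .. $n_workers;

# Collect and merge the partial results from every worker.
my %result;
for my $thr (@workers) {
    my ($partial) = $thr->join;
    @result{ keys %$partial } = values %$partial;
}

print "$_ => $result{$_}\n" for sort keys %result;
```

Adding val3, val4 and so on would only mean passing more array references to combinations(); the workers do not change. As with the timings above, the gain is bounded by the number of cores available.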


As Sinan says, the "threading" you were probably thinking of is "PDL threading", now renamed (as of 2.075) to "broadcasting" to match the general terminology (see docs). It allows you to replace something like this:

$x = sequence(5);
$x->set($_, $x->at($_)+2) for 0..$x->dim(0)-1;

with just this, since "+=" fundamentally operates on one thing (a zero-dimensional scalar), so with more dimensions than a scalar (such as this 1-dimensional sequence) it can "broadcast":

$x += 2; # does whole ndarray at once

This is also faster because unlike the for loop, it doesn't have to keep leaving and re-entering the Perl environment (aka "Perl-land"), but can stay in extremely fast "C-land" to do the calculations with no overhead.

The motivation behind the original name was that these "broadcasted" calculations are all independent, and therefore "embarrassingly parallel", so they can be parallelised automatically. As of 2.059 (see the docs), PDL by default enables this automatic parallel processing across the number of CPU cores available.
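
For the "every combination of val1 and val2" part of the question, broadcasting can produce the whole grid in one expression, provided the calculation can be written in terms of PDL operations. This is a minimal sketch, assuming PDL is installed; the formula val1**2 + val2 is a hypothetical placeholder, not the subroutine from the question. A subroutine that does arbitrary Perl work on a string cannot be broadcast this way without being rewritten.

```perl
use strict; use warnings;
use PDL;

my $val1 = pdl(1, 2, 3, 4);    # 4 values, dims (4)
my $val2 = pdl(10, 20, 30);    # 3 values, dims (3)

# dummy(0) inserts a size-1 dimension in front of $val2, giving dims (1,3).
# Broadcasting (4) against (1,3) yields a (4,3) result, with one element
# for every (val1, val2) pair.
my $grid = $val1**2 + $val2->dummy(0);

print "dims: ", join(',', $grid->dims), "\n";
print $grid;
```

Here $grid->at($i, $j) holds the value for the $i-th element of $val1 and the $j-th element of $val2. A third list would get another dummy dimension in the same way, for example $val3->dummy(0)->dummy(0).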
