MATLAB parfor is slower than for -- what is wrong?

2023-01-05 10:49 问答作者：

the code I'm dealing with has loops like the following:

bistar = zeros(numdims,numcases); 
parfor hh=1:nt       
  bistar = bistar +  A(:,:,hh)*data(:,:,hh+1)' ;
end

for small nt (10).

After timing it, it is actually 100 times slower than using the regular loop!!! I know that parfor can do parallel sums, so I'm not sure why this isn't working.

I run

matlabpool

with the out-of-the-box configurations before running my code.

I'm relatively new to matlab, and just started to use the parallel 开发者_JAVA百科features, so please don't assume that I'm am not doing something stupid.

Thanks!

PS: I'm running the code on a quad core so I would expect to see some improvements.

Making the partitioning and grouping the results (overhead in dividing the work and gathering results from the several threads/cores) is high for small values of nt. This is normal, you would not partition data for easy tasks that can be performed quickly in a simple loop.

Always perform something challenging inside the loop that is worth the partitioning overhead. Here is a nice introduction to parallel programming.

The threads come from a thread pool so the overhead of creating the threads should not be there. But in order to create the partial results n matrices from the bistar size must be created, all the partial results computed and then all these partial results have to be added (recombining). In a straight loop, this is with a high probability done in-place, no allocations take place.

The complete statement in the help (thanks for your link hereunder) is:

If the time to compute f, g, and h is large, parfor will be significantly faster than the corresponding for statement, even if n is relatively small.

So you see they mean exactly the same as what I mean, the overhead for small n values is only worth the effort if what you do in the loop is complex/time consuming enough.

Parforcomes with a bit of overhead. Thus, if nt is really small, and if the computation in the loop is done very quickly (like an addition), the parfor solution is slower. Furthermore, if you run parforon a quad-core, speed gain will be close to linear for 1-3 cores, but less if you use 4 cores, since the last core also needs to run system processes.

For example, if parfor comes with 100ms of overhead, and the computation in the loop takes 5ms, and if we assume that speed gain is linear up to 4 cores with a coefficient of 1 (i.e. using 4 cores makes the computation 4 times faster), nt needs to be about 30 for you to achieve a speed gain with parfor (150ms with for, 132ms with parfor). If you were to run only 10 iterations, parfor would be slower (50ms with for, 112ms with parfor).

You can calculate the overhead on your machine by comparing execution time with 1 worker vs 0 workers, and you can estimate speed gain by making a liner fit through the execution times with 1 to 4 workers. Then you'll know when it's useful to use parfor.

Besides the bad performance because of the communication overhead (see other answers), there is another reason not to use parfor in this case. Everything which is done within the parfor in this case uses built-in multithreading. Assuming all workers are running on the same PC there is no advantage because a single call already uses all cores of your processor.

继续阅读：parallel-processing parfor performance

MATLAB parfor is slower than for -- what is wrong?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？