
Parallel processing and splitting a list into two - does it speed up the entire operation?

I have an operation that requires hundreds of thousands of matrices (2D int arrays) to be checked until every matrix has been processed, or a certain condition is met. Essentially

    foreach (var m in matricesList)
    {
        if (Function(m))
            return true;   // condition met, stop checking
    }
    return false;          // every matrix processed, no match found

I am not interested in optimizing Function(). What I am curious to know is whether I can split the list of matrices into two lists (one per processor) and have the operation performed on both halves at the same time. If either process gets true back from Function(), I would like both processes to terminate and return true.

Would splitting the list and processing each half at the same time actually speed things up? Would there be a big enough performance gain to justify doing this? What problems might I run into if I choose to implement it?

I am not sure whether Function() would cause problems for the parallelism, but all it does is iterate through the matrix m and compare it to another matrix. If the condition it is looking for is met, it returns true.
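(For reference, the manual version of the two-way split described above might look roughly like the sketch below. The Function body is only a placeholder for the real comparison, and all names are illustrative.)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    static class MatrixSearch
    {
        // Stand-in only: the real version compares m against another matrix.
        static bool Function(int[,] m) => false;

        // Scans both halves of the list concurrently and cancels the other
        // half as soon as either one finds a match.
        public static bool AnyMatchInTwoHalves(List<int[,]> matricesList)
        {
            int mid = matricesList.Count / 2;
            using var cts = new CancellationTokenSource();

            bool Scan(IEnumerable<int[,]> half)
            {
                foreach (var m in half)
                {
                    if (cts.Token.IsCancellationRequested)
                        return false;        // the other half already found a match
                    if (Function(m))
                    {
                        cts.Cancel();        // tell the other task it can stop
                        return true;
                    }
                }
                return false;
            }

            var firstHalf  = Task.Run(() => Scan(matricesList.Take(mid)));
            var secondHalf = Task.Run(() => Scan(matricesList.Skip(mid)));

            return Task.WhenAll(firstHalf, secondHalf).Result.Any(found => found);
        }
    }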


As long as the Function() method itself doesn't generate too much contention (resource locking), which from your description it doesn't sound like it does, and the function is expensive enough that the gain amortizes the overhead of partitioning the list, creating additional threads, and so on, then yes.

C# has some built-in methods to handle situations like this. You could use PLINQ:

var y = matricesList.AsParallel().Where(m => Function(m)).First();

This will evaluate every item in the list and return the first match, so if you want the evaluation side effects for every item, this is the form to use.

If you want the PLINQ query to abort as soon as it hits one match, just add a .Take(1):

var y = matricesList.AsParallel().Where(m => Function(m)).Take(1).First();
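Since the end goal is really just a true/false answer, it is worth noting (as a sketch, assuming the same Function as above) that PLINQ's Any() also short-circuits once a match is observed:

    // Returns true as soon as any matrix satisfies Function; Any() stops
    // scheduling further work once a match has been observed.
    bool found = matricesList.AsParallel().Any(m => Function(m));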

Or you can use the Task Parallel Library:

    Parallel.ForEach(matricesList, (m, parallelLoopState) =>
    {
        if (Function(m))
        {
            parallelLoopState.Stop();
        }
    });

This will evaluate items from the list until Function(m) returns true, then stop and skip evaluating further items.
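If you also need to know whether a match was actually found, the ParallelLoopResult returned by Parallel.ForEach carries that information; a minimal sketch:

    var result = Parallel.ForEach(matricesList, (m, parallelLoopState) =>
    {
        if (Function(m))
        {
            parallelLoopState.Stop();
        }
    });

    // IsCompleted is false when Stop() ended the loop early, i.e. a match was found.
    bool found = !result.IsCompleted;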

Both methods will handle the proper partitioning, using the correct number of threads for the number of processors on the system, etc.


If you are purely CPU-bound (i.e. there is no unrelated cost we don't know about, such as disk or network IO, which might not scale linearly), and if each test is independent (no lock conflicts), then yes, this will pretty much help. It is rarely quite linear (due to overheads), but close enough. If you are on a web server you might want to stay fairly reserved, as it is already highly threaded, but if you have the whole machine, use all the cores you have! Indeed, Parallel.ForEach etc. will do all this for you if you don't want the code pain.

We've used tricks like this successfully here on the stackoverflow site, although since it is a web server we limit the parallelism to 2 threads per request, which neatly halves the time. In our case, due to the need to combine the results afterwards, using more threads was actually counter-productive: 2 was our sweet spot. But if you don't need to combine the results in any clever way, you can up the parallelism a bit.
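If you do want to cap the parallelism (for example to 2 threads on a web server, as described above), both APIs let you say so explicitly; a sketch using a limit of 2:

    // PLINQ: cap the query at two worker threads.
    bool foundPlinq = matricesList.AsParallel()
                                  .WithDegreeOfParallelism(2)
                                  .Any(m => Function(m));

    // TPL: the same idea via ParallelOptions.
    var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
    Parallel.ForEach(matricesList, options, (m, state) =>
    {
        if (Function(m))
            state.Stop();
    });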
