When to use the Partitioner class?
Can anyone suggest typical scenarios where the Partitioner class introduced in .NET 4.0 can/should be used?
The Partitioner class is used to make parallel executions more chunky. If you have a lot of very small tasks to run in parallel, the overhead of invoking a delegate for each one may be prohibitive. By using Partitioner, you can rearrange the workload into chunks and have each parallel invocation work on a slightly larger set. The class abstracts this feature and is able to partition based on the actual conditions of the dataset and the available cores.
Example: Imagine you want to run a simple calculation like this in parallel.
Parallel.ForEach(Input, (value, loopState, index) => { Result[index] = value*Math.PI; });
That would invoke the delegate once for each entry in Input, which adds a bit of overhead to every call. By using Partitioner we can do something like this:
Parallel.ForEach(Partitioner.Create(0, Input.Length), range => {
    for (var index = range.Item1; index < range.Item2; index++) {
        Result[index] = Input[index] * Math.PI;
    }
});
This reduces the number of delegate invocations, as each invocation works on a larger set of elements. In my experience this can boost performance significantly when parallelizing very simple operations.
Range partitioning, as suggested by Brian Rasmussen, is the type of partitioning to use when the work is CPU intensive, tends to be small (relative to a virtual method call), must be applied to many elements, and takes a roughly constant amount of time per element.
The other type of partition that should be considered is chunk partitioning. This type of partitioning is also known as a load-balancing algorithm because a worker thread will rarely sit idle while there is more work to do - which is not the case for a range partition.
A chunk partition should be used when the work has some wait states, tends to require more processing per element, or each element can have significantly different work processing times.
One example of this might be reading into memory and processing 100 files with vastly different sizes. A 1 KB file will be processed in much less time than a 1 MB file. If a range partition is used for this, then some threads could sit idle for some time because they happened to process the smaller files.
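As a rough sketch of that scenario (not from the original answer): passing a plain IEnumerable<string> of file paths to Parallel.ForEach gets the default chunk partitioner, so a thread that finishes its small files early simply pulls more work. The folder path and the per-file processing here are made up for illustration.
using System;
using System.IO;
using System.Threading.Tasks;

class FileExample
{
    static void Main()
    {
        // Hypothetical folder; file sizes vary wildly (1 KB vs. 1 MB).
        var files = Directory.EnumerateFiles(@"C:\data", "*.txt");

        // A plain IEnumerable<string> source lets Parallel.ForEach use the
        // default chunk partitioner, so threads that finish their small
        // files keep pulling more work instead of sitting idle.
        Parallel.ForEach(files, path =>
        {
            var text = File.ReadAllText(path);
            Console.WriteLine("{0}: {1} characters", path, text.Length);
        });
    }
}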
Unlike with a range partition, there is no way to specify the number of elements to be processed by each task - unless you write your own custom partitioner. Another downside to using a chunk partition is that there may be some contention when a thread goes back to get another chunk, since an exclusive lock is used at that point. So, clearly, a chunk partition should not be used for short amounts of CPU-intensive work.
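For reference, a minimal sketch of asking for the built-in chunk-style, load-balancing behaviour explicitly via the Partitioner.Create(IList<T>, bool loadBalance) overload (not a custom partitioner); the element costs and simulated delays are invented:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class LoadBalancedExample
{
    static void Main()
    {
        // Invented per-element costs in milliseconds; some are much heavier than others.
        var costs = new List<int> { 5, 500, 5, 400, 5, 300, 5, 200 };

        // loadBalance: true requests a dynamic (chunk-style) partitioner instead of
        // static ranges; the price is the locking each time a thread fetches a chunk.
        var partitioner = Partitioner.Create(costs, loadBalance: true);

        Parallel.ForEach(partitioner, cost => Thread.Sleep(cost));
    }
}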
The default chunk partitioner starts off with a chunk size of 1 element per chunk. After each thread processes three 1-element chunks, the chunk size is incremented to 2 elements per chunk. After three 2-element chunks have been processed by each thread, the chunk size is incremented again to 3 elements per chunk, and so on. At least this is the way it works according to Dixin Yan (see the Chunk partitioning section of his blog), who works for Microsoft.
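If that growing chunk size works against you (for example, when each element mostly waits rather than computes), later framework versions (.NET 4.5 and up, so not the 4.0 release the question targets) add an option that hands out one element at a time. A small sketch with made-up delays:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class NoBufferingExample
{
    static void Main()
    {
        // Made-up wait times; each element spends its time waiting, not computing.
        var delays = new[] { 100, 10, 200, 10, 300, 10, 400, 10 };

        // .NET 4.5+: NoBuffering hands out a single element at a time instead of
        // letting the default chunk partitioner grow its chunks.
        var partitioner = Partitioner.Create(
            delays, EnumerablePartitionerOptions.NoBuffering);

        Parallel.ForEach(partitioner, delay => Thread.Sleep(delay));
    }
}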
By the way, the nice visualizer tool in his blog appears to be the Concurrency Visualizer profiling tool. Its docs claim that it can be used to locate performance bottlenecks, CPU under-utilization, thread contention, cross-core thread migration, synchronization delays, DirectX activity, areas of overlapped I/O, and other information. It provides graphical, tabular, and textual data views that show the relationships between the threads in an app and the system as a whole.
Other resources:
MSDN: Custom Partitioners for PLINQ and TPL
Part 5: Parallel Programming - Optimizing PLINQ by Joseph Albahari
To parallelize an operation on a data source, one of the essential steps is to partition the source into multiple sections that can be accessed concurrently by multiple threads. PLINQ and the Task Parallel Library (TPL) provide default partitioners that work transparently when you write a parallel query or ForEach loop. For more advanced scenarios, you can plug in your own partitioner.
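As a small sketch of that last point: a Partitioner can be handed straight to PLINQ through AsParallel(), here using the load-balancing overload in place of PLINQ's default static partitioning of arrays; the data and the projection are arbitrary examples.
using System;
using System.Collections.Concurrent;
using System.Linq;

class PlinqPartitionerExample
{
    static void Main()
    {
        int[] data = { 3, 1, 4, 1, 5, 9, 2, 6 };

        // A Partitioner plugs into a PLINQ query via AsParallel(); loadBalance: true
        // swaps the default static partitioning of arrays for a dynamic one.
        var squares = Partitioner.Create(data, loadBalance: true)
                                 .AsParallel()
                                 .Select(x => x * x)
                                 .ToArray();

        Console.WriteLine(string.Join(", ", squares));
    }
}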