Parallel.ForEach questions
I am using a Parallel.ForEach loop in C# / VS2010 to do processing and I have a couple of questions.
First of all I have a process that needs to extract information from a remote webservice and then needs to build images (GDI) on the fly.
I have a class that encapsulates all of the functionality into a single object with two main methods Load() and CreateImage() with all the GDI management / WebRequests "blackboxed" inside this object.
I then create a GenericList that contains all the objects that need to be processed and I iterate through the list using the following code:
try
{
    Parallel.ForEach(MyLGenericList, ParallelOptions, (MyObject, loopState) =>
    {
        MyObject.DoLoad();
        MyObject.CreateImage();
        MyObject.Dispose();

        if (loopState.ShouldExitCurrentIteration || loopState.IsExceptional)
            loopState.Stop();
    });
}
catch (OperationCanceledException ex)
{
    // Cancel here
}
catch (Exception ex)
{
    throw; // rethrow without resetting the stack trace
}
Now my questions are:
- Given that there could be ten thousand items in the list to parse, is the above code the best way to approach this? Any other ideas are more than welcome.
- I have an issue whereby when I start the process the objects are created/loaded and images created very fast, but after around six hundred objects the process starts to crawl. It does eventually finish; is this normal?
Thanks in advance :) Adam
I am not sure that downloading data in parallel is a good idea since it will block a lot of threads. Split your task into a producer and a consumer instead. Then you can parallelize each of them separately.
Here is an example of a single producer and multiple consumers.
(If the consumers are faster than the producer you can just use a normal foreach instead of Parallel.ForEach.)
var sources = new BlockingCollection<SourceData>();

var producer = Task.Factory.StartNew(() =>
{
    foreach (var item in MyGenericList)
    {
        var data = webservice.FetchData(item);
        sources.Add(data);
    }
    sources.CompleteAdding();
});

Parallel.ForEach(sources.GetConsumingPartitioner(),
    data => imageCreator.CreateImage(data));
(The GetConsumingPartitioner extension method is part of ParallelExtensionsExtras.)
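If you would rather not take a dependency on ParallelExtensionsExtras, the partitioner is small enough to sketch yourself. The following is roughly what the library's implementation does; treat it as a starting point rather than a drop-in replacement:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class BlockingCollectionPartitionerExtensions
{
    // Wraps a BlockingCollection<T> in a Partitioner<T> whose partitions
    // consume items directly, so Parallel.ForEach does not buffer chunks.
    public static Partitioner<T> GetConsumingPartitioner<T>(this BlockingCollection<T> collection)
    {
        return new BlockingCollectionPartitioner<T>(collection);
    }

    private sealed class BlockingCollectionPartitioner<T> : Partitioner<T>
    {
        private readonly BlockingCollection<T> _collection;

        internal BlockingCollectionPartitioner(BlockingCollection<T> collection)
        {
            _collection = collection;
        }

        public override bool SupportsDynamicPartitions
        {
            get { return true; }
        }

        public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
        {
            var dynamicPartitions = GetDynamicPartitions();
            return Enumerable.Range(0, partitionCount)
                             .Select(_ => dynamicPartitions.GetEnumerator())
                             .ToList();
        }

        public override IEnumerable<T> GetDynamicPartitions()
        {
            // Each partition pulls items straight off the collection and
            // ends once CompleteAdding has been called and it is drained.
            return _collection.GetConsumingEnumerable();
        }
    }
}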
Edit: A more complete example
var sources = new BlockingCollection<SourceData>();

var producerOptions = new ParallelOptions { MaxDegreeOfParallelism = 5 };
var consumerOptions = new ParallelOptions { MaxDegreeOfParallelism = -1 };

var producers = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(MyLGenericList, producerOptions, myObject =>
    {
        myObject.DoLoad();
        sources.Add(myObject);
    });
    sources.CompleteAdding();
});

Parallel.ForEach(sources.GetConsumingPartitioner(), consumerOptions, myObject =>
{
    myObject.CreateImage();
    myObject.Dispose();
});
With this code you can tune the number of parallel downloads while keeping the CPU busy with the image processing.
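One caveat, not part of the original answer: if DoLoad throws, the producer task faults before CompleteAdding is called and the consuming loop blocks forever. A defensive variant is to complete the collection in a finally block and wait on the producer task afterwards, something like:

var producers = Task.Factory.StartNew(() =>
{
    try
    {
        Parallel.ForEach(MyLGenericList, producerOptions, myObject =>
        {
            myObject.DoLoad();
            sources.Add(myObject);
        });
    }
    finally
    {
        // Always complete the collection so the consumer can finish,
        // even if one of the downloads threw.
        sources.CompleteAdding();
    }
});

Parallel.ForEach(sources.GetConsumingPartitioner(), consumerOptions, myObject =>
{
    myObject.CreateImage();
    myObject.Dispose();
});

producers.Wait(); // surfaces any download exception as an AggregateException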
The Parallel.ForEach method with the default settings works best when the work that the loop body does is CPU bound. If you block inside the body or hand the work off to another party synchronously, the scheduler thinks the CPU still isn't busy and keeps injecting more worker threads, trying hard to use all the CPUs in the system.

In your case you just need to pick a reasonable number of overlapping downloads to run in parallel and set that value in your ForEach options, because you aren't going to saturate the CPUs with this loop.
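For example, applied to the original loop (the value of 8 is just a placeholder; tune it to your bandwidth and what the web service will tolerate):

// Cap the number of simultaneous blocking downloads instead of letting
// the scheduler keep adding threads while the existing ones sit blocked.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

Parallel.ForEach(MyLGenericList, options, MyObject =>
{
    MyObject.DoLoad();      // blocking web request
    MyObject.CreateImage(); // CPU-bound GDI work
    MyObject.Dispose();
});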