开发者

Is this use of Parallel.ForEach() thread safe?

Essentially, I am working with this:

var data = input.AsParallel();
List<String> output = new List<String>();

Parallel.ForEach<String>(data, line => {
    String outputLine = ""; 
    // ** Do something with "line" and store result in "outputLine" **

    // Additionally, there are so开发者_开发技巧me this.Invoke statements for updating UI

    output.Add(outputLine);
});

Input is a List<String> object. The ForEach() statement does some processing on each value, updates the UI, and adds the result to the output List. Is there anything inherently wrong with this?

Notes:

  • Output order is unimportant

Update:

Based on feedback I've gotten, I've added a manual lock to the output.Add statement, as well as to the UI updating code.


Yes; List<T> is not thread safe, so adding to it ad-hoc from arbitrary threads (quite possibly at the same time) is doomed. You should use a thread-safe list instead, or add locking manually. Or maybe there is a Parallel.ToList.

Also, if it matters: insertion order will not be guaranteed.

This version is safe, though:

var output = new string[data.Count];

Parallel.ForEach<String>(data, (line,state,index) =>
{
    String outputLine = index.ToString();
    // ** Do something with "line" and store result in "outputLine" **

    // Additionally, there are some this.Invoke statements for updating UI
    output[index] = outputLine;
});

here we are using index to update a different array index per parallel call.


Is there anything inherently wrong with this?

Yes, everything. None of this is safe. Lists are not safe for updating on multiple threads concurrently, and you can't update the UI from any thread other than the UI thread.


The documentation says the following about the thread safety of List<T>:

Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

A List(Of T) can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.

Thus, output.Add(outputLine) is not thread-safe and you need to ensure thread safety yourself, for example, by wrapping the add operation in a lock statement.


When you want the results of a parallel operation, the PLINQ is more convenient than the Parallel class. You started well by converting your input to a ParallelQuery<T>:

ParallelQuery<string> data = input.AsParallel();

...but then you fed the data to the Parallel.ForEach, which treats it as a standard IEnumerable<T>. So the AsParallel() was wasted. It didn't provide any parallelization, only overhead. Here is the correct way to use PLINQ:

List<string> output = input
    .AsParallel()
    .Select(line =>
    {
        string outputLine = ""; 
        // ** Do something with "line" and store result in "outputLine" **
        return outputLine;
    })
    .ToList();

A few differences that you should have in mind:

  1. The Parallel runs the code on the ThreadPool by default, but it's configurable. The PLINQ uses exclusively the ThreadPool.
  2. The Parallel by default has unlimited parallelism (it uses all the available threads of the ThreadPool). The PLINQ uses by default at most Environment.ProcessorCount threads.

Regarding the order of the results, PLINQ doesn't preserve the order by default. In case you want to preserve the order, you can attach the AsOrdered operator.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜