Is this use of Parallel.ForEach() thread safe?
Essentially, I am working with this:
var data = input.AsParallel();
List<String> output = new List<String>();
Parallel.ForEach<String>(data, line => {
String outputLine = "";
// ** Do something with "line" and store result in "outputLine" **
// Additionally, there are so开发者_开发技巧me this.Invoke statements for updating UI
output.Add(outputLine);
});
Input is a List<String>
object. The ForEach()
statement does some processing on each value, updates the UI, and adds the result to the output
List
. Is there anything inherently wrong with this?
Notes:
- Output order is unimportant
Update:
Based on feedback I've gotten, I've added a manual lock
to the output.Add
statement, as well as to the UI updating code.
Yes; List<T>
is not thread safe, so adding to it ad-hoc from arbitrary threads (quite possibly at the same time) is doomed. You should use a thread-safe list instead, or add locking manually. Or maybe there is a Parallel.ToList
.
Also, if it matters: insertion order will not be guaranteed.
This version is safe, though:
var output = new string[data.Count];
Parallel.ForEach<String>(data, (line,state,index) =>
{
String outputLine = index.ToString();
// ** Do something with "line" and store result in "outputLine" **
// Additionally, there are some this.Invoke statements for updating UI
output[index] = outputLine;
});
here we are using index
to update a different array index per parallel call.
Is there anything inherently wrong with this?
Yes, everything. None of this is safe. Lists are not safe for updating on multiple threads concurrently, and you can't update the UI from any thread other than the UI thread.
The documentation says the following about the thread safety of List<T>
:
Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
A List(Of T) can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
Thus, output.Add(outputLine)
is not thread-safe and you need to ensure thread safety yourself, for example, by wrapping the add operation in a lock
statement.
When you want the results of a parallel operation, the PLINQ is more convenient than the Parallel
class. You started well by converting your input
to a ParallelQuery<T>
:
ParallelQuery<string> data = input.AsParallel();
...but then you fed the data
to the Parallel.ForEach
, which treats it as a standard IEnumerable<T>
. So the AsParallel()
was wasted. It didn't provide any parallelization, only overhead. Here is the correct way to use PLINQ:
List<string> output = input
.AsParallel()
.Select(line =>
{
string outputLine = "";
// ** Do something with "line" and store result in "outputLine" **
return outputLine;
})
.ToList();
A few differences that you should have in mind:
- The
Parallel
runs the code on theThreadPool
by default, but it's configurable. The PLINQ uses exclusively theThreadPool
. - The
Parallel
by default has unlimited parallelism (it uses all the available threads of theThreadPool
). The PLINQ uses by default at mostEnvironment.ProcessorCount
threads.
Regarding the order of the results, PLINQ doesn't preserve the order by default. In case you want to preserve the order, you can attach the AsOrdered
operator.
精彩评论