Parallelism: Subtly different floating point results?
I'm trying to debug my parallelism library for the D programming language. A bug report was recently filed that indicates that the low-order bits of some floating point operations that are performed using tasks are non-deterministic across runs. (If you read the report, note that parallel reduce works under the hood by creating tasks in a deterministic way.)
This doesn't appear to be a rounding mode issue, because I tried setting the rounding mode manually. I'm also pretty sure this is not a concurrency bug. The library is well-tested (including passing a Jinx stress test), the issue is always confined to the low-order bits, and it happens even on single-core machines, where low-level memory model issues are less of a problem. What are some other reasons why floating point results might differ depending on what thread the operations are scheduled on?
Edit: I'm doing some printf debugging here and it seems like the results for the individual tasks are sometimes different across runs.
Edit # 2: The following code reproduces this issue in a much simpler way. It sums the terms of an array in the main thread, then starts a new thread to execute the exact same function. The problem is definitely not a bug in my library, because this code doesn't even use my library.
import std.algorithm, core.thread, std.stdio, core.stdc.fenv;

// Sums the array and reports the rounding mode of the calling thread.
real sumRange(const(real)[] range) {
    writeln("Rounding mode: ", fegetround);  // 0 from both threads.
    return reduce!"a + b"(range);
}

void main() {
    immutable n = 1_000_000;
    immutable delta = 1.0 / n;

    auto terms = new real[n];
    foreach(i, ref term; terms) {
        immutable x = ( i - 0.5 ) * delta;
        term = delta / ( 1.0 + x * x );
    }

    // Sum once on the main thread...
    immutable res1 = sumRange(terms);
    writefln("%.19f", res1);

    // ...then run the exact same function on a freshly started thread.
    real res2;
    auto t = new Thread( { res2 = sumRange(terms); } );
    t.start();
    t.join();
    writefln("%.19f", res2);
}
Output:
Rounding mode: 0
0.7853986633972191094
Rounding mode: 0
0.7853986633972437348
Another Edit
Here's the output when I print in hex instead:
Rounding mode: 0
0x1.921fc60b39f1331cp-1
Rounding mode: 0
0x1.921fc60b39ff1p-1
Also, this only seems to happen on Windows. When I run this code on a Linux VM, I get the same answer for both threads.
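One thing the hex output shows: the main-thread result has 16 hex digits after the point, while the new-thread result has only 13, i.e. 52 fraction bits, which is what a value rounded to double precision looks like. As a rough sketch (not the original reproduction), the following sums the same terms once into a real accumulator and once into a double accumulator. The names sumAsReal and sumAsDouble are made up for illustration, and this only approximates what happens when the FPU itself runs at reduced precision. On platforms where real is the same as double, both lines will print the same value.

```d
import std.stdio;

// Sum of delta / (1 + x*x), accumulated in `real`
// (80-bit extended precision on x86 with DMD).
real sumAsReal(int n) {
    immutable real delta = 1.0L / n;
    real sum = 0;
    foreach (i; 0 .. n) {
        immutable real x = (i - 0.5L) * delta;
        sum += delta / (1.0L + x * x);
    }
    return sum;
}

// Same terms, but the running total is stored in a `double`
// (53-bit mantissa), so low-order bits are lost along the way.
double sumAsDouble(int n) {
    immutable real delta = 1.0L / n;
    double sum = 0;
    foreach (i; 0 .. n) {
        immutable real x = (i - 0.5L) * delta;
        sum += delta / (1.0L + x * x);
    }
    return sum;
}

void main() {
    writefln("%a", sumAsReal(1_000_000));
    writefln("%a", sumAsDouble(1_000_000));
}
```

If the two printed values differ only in the trailing hex digits, that is the same kind of discrepancy as in the output above.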
ANSWER: It turns out that the root cause is that floating point state is initialized differently on the main thread than on other threads on Windows in D. See the bug report I just filed.
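For anyone who wants to check this on their own machine, here is a minimal sketch that prints the x87 control word from the main thread and from a new thread. The rounding mode lives in bits 10-11 of that word, while the precision control field is in bits 8-9, so the rounding mode can match while the precision differs. That the precision field is the part that differs here is my assumption, based on the 13-digit hex result above. The helper names x87ControlWord, precisionName and report are made up for this example, and the inline asm is DMD-style, so it is x86/x86-64 only.

```d
import core.thread, std.stdio;

// Read the x87 FPU control word (x86/x86-64 only, DMD-style inline asm).
ushort x87ControlWord() {
    ushort cw;
    asm { fstcw cw; }
    return cw;
}

// Decode the precision-control field, bits 8-9 of the control word.
string precisionName(ushort cw) {
    switch ((cw >> 8) & 3) {
        case 0:  return "24-bit (single)";
        case 2:  return "53-bit (double)";
        case 3:  return "64-bit (extended)";
        default: return "reserved";
    }
}

// Print the control word and decoded precision for the calling thread.
void report(string who) {
    immutable cw = x87ControlWord();
    writefln("%s: control word 0x%04x, precision %s",
             who, cw, precisionName(cw));
}

void main() {
    report("main thread");
    auto t = new Thread({ report("new thread"); });
    t.start();
    t.join();
}
```

If the two lines report different precisions, arithmetic on real values will round differently on the two threads even though fegetround returns the same value on both.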
Here's a paper that explains the many reasons the same C code can lead to slightly different results. In your case, the most likely reason is CPU-internal instruction reordering.
It's simply wrong to expect floating-point calculations to be deterministic down to the low-order bits. That's not what floating-point numbers were designed for.