Strange float behaviour in OpenMP

I am running the following OpenMP code

    #include <math.h>
    #include <omp.h>
    #include <stdio.h>

    #define NREC 1024
    #define NLIG 1024

    int main(void)
    {
        double S2 = 0.0;
        int nthreads, tid, a, b;
        int chunk = 10;   /* chunk size not shown in the question; 10 assumed */
        #pragma omp parallel shared(S2,nthreads,chunk) private(a,b,tid)
        {
            tid = omp_get_thread_num();
            if (tid == 0)
            {
                nthreads = omp_get_num_threads();
                printf("\nNumber of threads = %d\n", nthreads);
            }
            #pragma omp for schedule(dynamic,chunk) reduction(+:S2)
            for (a = 0; a < NREC; a++) {
                for (b = 0; b < NLIG; b++) {
                    S2 = S2 + cos(1 + sin(atan(sin(sqrt(a*2 + b*5) + cos(a) + sqrt(b)))));
                }
            } /* end for a */
        } /* end of parallel section */
        printf("S2 = %f\n", S2);
        return 0;
    }

For NREC=NLIG=1024 and higher values, on an 8-core board, I get up to a 7x speedup. The problem is that the final result for the variable S2 differs by 1 to 5% from the result obtained by the serial version. What could be the reason? Should I use some specific compilation options to avoid this strange float behaviour?


The order of additions/subtractions of floating-point numbers can affect the accuracy.

To take a simple example, suppose your machine stores only two significant decimal digits, and you are computing the value of 1 + 0.04 + 0.04.

  • If you do the left addition first, you get 1.04, which is rounded to 1. The second addition will give 1 again, so the final result is 1.

  • If you do the right addition first, you get 0.08. Added to 1, this gives 1.08 which is rounded to 1.1.

For maximum accuracy, it's best to add values from small to large. This is exactly what changes under an OpenMP reduction: each thread accumulates its own partial sum, and the partial sums are then combined in an order different from the serial loop's, so the rounding differs.
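
To see the effect with real float arithmetic, here is a minimal sketch (values chosen so the effect is visible in single precision): summing one large value and many small ones gives different results depending on the order.

    #include <stdio.h>

    int main(void)
    {
        float forward = 0.0f, backward = 0.0f;
        float v[1001];
        int i;

        v[0] = 1.0e8f;                       /* one large value... */
        for (i = 1; i <= 1000; i++)
            v[i] = 1.0f;                     /* ...and many small ones */

        /* large-to-small: each 1.0f is absorbed by 1.0e8f and lost */
        for (i = 0; i <= 1000; i++)
            forward += v[i];

        /* small-to-large: the 1.0f values accumulate before meeting 1.0e8f */
        for (i = 1000; i >= 0; i--)
            backward += v[i];

        printf("large-to-small: %.1f\n", forward);   /* 100000000.0 */
        printf("small-to-large: %.1f\n", backward);  /* 100001000.0 */
        return 0;
    }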

Another cause could be that the CPU's floating-point registers hold more bits than a float or double in main memory. Hence, if an intermediate result stays in a register it is kept at higher precision, but once it is spilled to memory it is rounded to the narrower format.
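
On x86 this is the classic 80-bit x87 register effect. You can check how your compiler evaluates intermediates with the C99 FLT_EVAL_METHOD macro; a minimal sketch (with GCC, the -ffloat-store option additionally forces intermediates to be stored and rounded to their declared type, at some speed cost):

    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        /* 0: intermediates kept in the declared type
           1: float and double intermediates evaluated as double
           2: all intermediates evaluated as long double (typical of x87)
          -1: indeterminable */
        printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
        return 0;
    }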

See also the discussion of floating-point accuracy in the C++ FAQ.


It is known that machine floating-point arithmetic loses precision when two nearly equal large values are subtracted (or two large values of opposite sign are added), leaving only a small difference: the most significant digits cancel. Summing a sequence of oscillating sign can therefore introduce severe error at each iteration. Another problematic case is when the magnitudes of the two operands differ greatly: the smaller operand is effectively absorbed and lost.
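
Both effects are easy to reproduce; a minimal sketch in double precision, assuming the default round-to-nearest mode (on x87 hardware with extended-precision intermediates the output can differ, which is exactly the register effect mentioned in the answer above):

    #include <stdio.h>

    int main(void)
    {
        double big = 1.0e16;   /* the spacing (ULP) of doubles here is 2.0 */

        /* absorption: 0.5 is below half the spacing, so it is lost */
        printf("(big + 0.5) - big = %.1f\n", (big + 0.5) - big);   /* 0.0 */

        /* cancellation: big + 3.0 is rounded to big + 4.0 on storage;
           subtracting big exposes the whole rounding error */
        printf("(big + 3.0) - big = %.1f\n", (big + 3.0) - big);   /* 4.0 */
        return 0;
    }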
It might be useful to separate the positive and negative operands, perform the summation of each group separately, then add (or subtract) the two group results.
If accuracy is crucial, it is probably worth pre-sorting each group and performing two partial sums within it: the first running from the middle of the group towards the largest values (the head), the second from the smallest values (the tail) towards the middle. The group sum is then the sum of the two partial runs.
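
A minimal sketch of the grouping idea, simplified to a single smallest-to-largest pass per group rather than the two-run head/tail scheme described above (grouped_sum and by_magnitude are illustrative names):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* qsort comparator: ascending by magnitude */
    static int by_magnitude(const void *pa, const void *pb)
    {
        double a = fabs(*(const double *)pa);
        double b = fabs(*(const double *)pb);
        return (a > b) - (a < b);
    }

    /* Sum an array by separating positive and negative values,
       sorting each group by magnitude, and accumulating each group
       from smallest to largest before combining the two results. */
    double grouped_sum(const double *x, size_t n)
    {
        double *pos = malloc(n * sizeof *pos);   /* error handling omitted */
        double *neg = malloc(n * sizeof *neg);
        size_t np = 0, nn = 0, i;
        double sp = 0.0, sn = 0.0;

        for (i = 0; i < n; i++) {
            if (x[i] >= 0.0)
                pos[np++] = x[i];
            else
                neg[nn++] = x[i];
        }
        qsort(pos, np, sizeof *pos, by_magnitude);
        qsort(neg, nn, sizeof *neg, by_magnitude);

        for (i = 0; i < np; i++) sp += pos[i];   /* small-to-large */
        for (i = 0; i < nn; i++) sn += neg[i];

        free(pos);
        free(neg);
        return sp + sn;   /* add (subtract) the two group results */
    }

Note that the sorting makes this O(n log n); for a sum like the one in the question, a wider accumulator (double or long double) or a compensated summation is usually a cheaper way to tighten the error.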
