
Trouble with nested loops and OpenMP

I am having trouble applying OpenMP to a nested loop like this:

#pragma omp parallel shared(S2,nthreads,chunk) private(a,b,tid)
{
    tid = omp_get_thread_num();
    if (tid == 0)
    {
        nthreads = omp_get_num_threads();
        printf("\nNumber of threads = %d\n", nthreads);
    }
    #pragma omp for schedule(dynamic,chunk)
    for(a=0;a<NREC;a++){
        for(b=0;b<NLIG;b++){
            S2=S2+cos(1+sin(atan(sin(sqrt(a*2+b*5)+cos(a)+sqrt(b)))));
        }
    } // end for a
} /* end of parallel section */

When I compare the serial version with the OpenMP version, the latter gives weird results. Even when I remove #pragma omp for, the OpenMP results are not correct. Do you know why, or can you point me to a good tutorial that is explicit about double loops and OpenMP?


This is a classic example of a race condition. Each of your OpenMP threads is accessing and updating a shared value at the same time, and there's no guarantee that some of the updates won't get lost (at best) or that the resulting answer won't be gibberish (at worst).

The thing with race conditions is that they depend sensitively on timing; in a smaller case (e.g., with smaller NREC and NLIG) you might sometimes miss the problem, but in a larger case it will eventually always show up.
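
To see the effect in isolation, here is a minimal sketch (mine, not from your code) of the same kind of race: many threads doing an unsynchronized read-modify-write on one shared counter, so increments from different threads overwrite each other.

#include <stdio.h>

/* Minimal race-condition sketch (illustration only): each thread
 * reads counter, adds one, and writes it back with no
 * synchronization, so the final value usually comes out below
 * the expected 10000000. */
int main() {
    long counter = 0;
    #pragma omp parallel for shared(counter)
    for (int i = 0; i < 10000000; i++)
        counter++;               /* racy update: no atomic, no critical */
    printf("counter = %ld (expected 10000000)\n", counter);
    return 0;
}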

The reason you get wrong answers even without the #pragma omp for is that as soon as you enter the parallel region, all of your OpenMP threads start, and unless you use something like omp for (a so-called worksharing construct) to split up the work, each thread does everything in the parallel section. So all the threads compute the same entire sum, all updating S2 simultaneously.
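
You can see this duplicated work directly with a little sketch like the following (again illustrative only): even with atomic updates, a parallel region with no worksharing construct runs the whole loop once per thread, so the answer comes out multiplied by the number of threads.

#include <stdio.h>

/* Sketch: parallel region WITHOUT a worksharing construct.  The
 * loop is not divided among threads, so every thread runs all 128
 * iterations; the result is race-free (thanks to atomic) but is
 * the serial answer times the number of threads. */
int main() {
    double S2 = 0.;
    #pragma omp parallel shared(S2)
    {
        for (int a = 0; a < 128; a++) {
            #pragma omp atomic
            S2 += a;             /* safe update, but duplicated work */
        }
    }
    printf("S2 = %f (serial answer: %f)\n", S2, 127.0 * 128.0 / 2.0);
    return 0;
}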

You have to be careful with OpenMP threads updating shared variables. OpenMP has atomic operations that allow you to safely modify a shared variable. An example follows (unfortunately, your sum is so sensitive to summation order that it's hard to see what's going on, so I've changed it somewhat). In mysumallatomic, each thread updates S2 as before, but this time it's done safely:

#include <omp.h>
#include <math.h>
#include <stdio.h>

/* Serial reference version */
double mysumorig() {
    double S2 = 0;
    int a, b;
    for(a=0;a<128;a++){
        for(b=0;b<128;b++){
            S2=S2+a*b;
        }
    }
    return S2;
}


double mysumallatomic() {
    double S2 = 0.;
#pragma omp parallel for shared(S2)
    for(int a=0; a<128; a++){
        for(int b=0; b<128; b++){
            double myterm = (double)a*b;
            #pragma omp atomic   /* serialize every single update to S2 */
            S2 += myterm;
        }
    }
    return S2;
}


double mysumonceatomic() {
    double S2 = 0.;
#pragma omp parallel shared(S2)
    {
        double mysum = 0.;   /* per-thread partial sum */
        #pragma omp for
        for(int a=0; a<128; a++){
            for(int b=0; b<128; b++){
                mysum += (double)a*b;
            }
        }
        #pragma omp atomic   /* only one synchronized update per thread */
        S2 += mysum;
    }
    return S2;
}

int main() {
    printf("(Serial)      S2 = %f\n", mysumorig());
    printf("(All Atomic)  S2 = %f\n", mysumallatomic());
    printf("(Atomic Once) S2 = %f\n", mysumonceatomic());
    return 0;
}

However, that atomic operation really hurts parallel performance (after all, the whole point is to prevent parallel operation on the variable S2!), so a better approach is to let each thread accumulate a private partial sum and do the atomic operation only once, after its loop finishes, rather than doing it 128*128 times; that's the mysumonceatomic() routine, which incurs the synchronization overhead once per thread rather than 16k times per thread.

But this is such a common operation that there's no need to implement it yourself. You can use OpenMP's built-in support for reduction operations (a reduction is an operation like summing a list or finding its min or max, which can be done one element at a time by looking only at the result so far and the next element), as suggested by @ejd. This works and is faster: OpenMP's optimized implementation is much faster than what you can build yourself out of other OpenMP operations.
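
For reference, a reduction version of the same sum might look like this (the function name mysumreduction is my own, not part of the program above):

/* Sketch of the reduction approach: OpenMP gives each thread a
 * private copy of S2 initialized to 0 and combines the copies
 * with + at the end of the loop. */
double mysumreduction() {
    double S2 = 0.;
#pragma omp parallel for reduction(+:S2)
    for(int a=0; a<128; a++){
        for(int b=0; b<128; b++){
            S2 += (double)a*b;
        }
    }
    return S2;
}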

As you can see, either approach works:

$ ./foo
(Serial)      S2 = 66064384.000000
(All Atomic)  S2 = 66064384.000000
(Atomic Once) S2 = 66064384.000000


The problem isn't with the double loops but with the variable S2. Try putting a reduction clause on your for directive:

#pragma omp for schedule(dynamic,chunk) reduction(+:S2)
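
Applied to the code in your question, that could look something like this (a sketch; NREC, NLIG, nthreads, and chunk are assumed to be defined as in your snippet):

#pragma omp parallel shared(S2,nthreads,chunk) private(a,b,tid)
{
    tid = omp_get_thread_num();
    if (tid == 0)
    {
        nthreads = omp_get_num_threads();
        printf("\nNumber of threads = %d\n", nthreads);
    }
    /* reduction gives each thread a private copy of S2 and
     * combines the copies safely at the end of the loop */
    #pragma omp for schedule(dynamic,chunk) reduction(+:S2)
    for(a=0;a<NREC;a++){
        for(b=0;b<NLIG;b++){
            S2=S2+cos(1+sin(atan(sin(sqrt(a*2+b*5)+cos(a)+sqrt(b)))));
        }
    }
} /* end of parallel section */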
