Synchronisation construct inside pragma for

2022-12-31 05:32 Q&A author:

I have a program block like:

    for (iIndex1=0; iIndex1 < iSize; iIndex1++)
    {
        for (iIndex2=iIndex1+1; iIndex2 < iSize; iIndex2++)
        {
            iCount++;
            fDist = (*this)[iIndex1].distance( (*this)[iIndex2] );
            m_oPDF.addPairDistance( fDist );

            if ((bShowProgress) && (iCount % 1000000 == 0))
                xyz_exception::ui()->progress( iCount, (size()-1)*((size()-1))/2 );
        }
    }

I have tried parallelising both the inner and the outer loop, with iCount placed in a critical region. What would be the best approach to parallelise this? If I wrap iCount with omp single or omp atomic, the code gives an error, and I figured out that this would be invalid inside omp for. I guess I am adding a lot of extraneous stuff to parallelise this. Need some advice...

Thanks,

Sayan

If I interpret your intentions correctly, you want to use iCount to tell your program when (every 10^6 operations) to update a UI? And iCount is global, so all the threads share its value and you want to maintain its consistency?

I would search for a way to replace this global counter with counters private to each thread and have the threads send a message to update the UI independently of each other. If you insist on using a global counter, you are going to have to, somehow, synchronise across threads, which will be a performance hit. Yes, you could write your program that way but I don't recommend it.

If you don't like the idea of all the threads sending messages to the UI perhaps just one thread could do that; if one thread is 1/4 of the way through the program, so are the other threads (approximately).
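A minimal sketch of that idea, assuming 1-D points and a plain printf in place of the asker's xyz_exception::ui()->progress call. The names PairStats, pairDistance and countPairsWithProgress are made up for illustration. Each thread keeps its own counter, and only thread 0 reports progress, estimating the global count from its own share:

```cpp
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

struct PairStats { long long pairs; double sumDist; };

// Hypothetical stand-in for (*this)[i].distance((*this)[j]).
static double pairDistance(double a, double b) { return a > b ? a - b : b - a; }

PairStats countPairsWithProgress(const std::vector<double>& pts, bool showProgress)
{
    const long long n = static_cast<long long>(pts.size());
    const long long totalPairs = n * (n - 1) / 2;
    long long pairs = 0;
    double sumDist = 0.0;

    // The reduction combines the per-thread results once, at the end of the region.
    #pragma omp parallel reduction(+:pairs, sumDist)
    {
        int tid = 0, nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        long long localCount = 0;   // private to this thread, no synchronisation
        double localSum = 0.0;

        #pragma omp for schedule(dynamic, 16)
        for (long long i = 0; i < n; ++i) {
            for (long long j = i + 1; j < n; ++j) {
                localSum += pairDistance(pts[i], pts[j]);
                ++localCount;
                // Only thread 0 touches the "UI"; its count times the number
                // of threads is a rough estimate of the global progress.
                if (showProgress && tid == 0 && localCount % 1000000 == 0)
                    std::printf("progress: about %lld of %lld pairs\n",
                                localCount * nthreads, totalPairs);
            }
        }
        pairs += localCount;
        sumDist += localSum;
    }
    return PairStats{pairs, sumDist};
}
```

The estimate is only approximate, since the threads do not necessarily get equal shares of the work, but a progress bar rarely needs more than that. The code also builds without OpenMP, in which case the pragmas are ignored and it runs serially.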

Thanks again Mark. I tried the approaches that you suggested. I used reduction(+:iCount) and also tried wrapping iCount++ in a pragma critical, and yes, it is a performance hit (I also saw no speedup). I also let a single thread handle iCount, but none of the approaches I tried gave any speedup.

I expected that if I put a pragma for around the inner loop, and declare iCount as a reduction variable, I would notice at least some speedup. My aim is the parallel execution of these statements for an Index1, Index2 pair:

        fDist =(*this)[iIndex1].distance( (*this)[iIndex2] );
        m_oPDF.addPairDistance( fDist );

which could noticeably impact the program run time.
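One possible reason for the missing speedup, offered as a guess because the thread does not show how m_oPDF is implemented: if addPairDistance updates shared state (say, a histogram), concurrent calls must either be synchronised or avoided. The sketch below assumes a simple fixed-width histogram and 1-D points, neither of which comes from the original code. Each thread fills a private histogram and merges it into the shared one once at the end:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for the m_oPDF.addPairDistance(...) accumulation.
std::vector<long long> pairDistanceHistogram(const std::vector<double>& pts,
                                             double binWidth, std::size_t numBins)
{
    assert(binWidth > 0.0 && numBins > 0);
    std::vector<long long> hist(numBins, 0);
    const long long n = static_cast<long long>(pts.size());

    #pragma omp parallel
    {
        std::vector<long long> local(numBins, 0);   // private to this thread

        // nowait: a thread may start merging as soon as it runs out of work.
        #pragma omp for schedule(dynamic, 16) nowait
        for (long long i = 0; i < n; ++i) {
            for (long long j = i + 1; j < n; ++j) {
                double d = pts[i] > pts[j] ? pts[i] - pts[j] : pts[j] - pts[i];
                std::size_t bin = static_cast<std::size_t>(d / binWidth);
                if (bin >= numBins) bin = numBins - 1;  // clamp to the last bin
                ++local[bin];
            }
        }

        // One short critical section per thread instead of one per pair.
        #pragma omp critical
        for (std::size_t b = 0; b < numBins; ++b)
            hist[b] += local[b];
    }
    return hist;
}
```

With this layout the hot loop has no synchronisation at all. The only contended step is the merge, which runs once per thread.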

Many thanks Mark. I removed iCount and made the outer loop parallel, but I am still digging into the code, since I observe no speedup compared to the serial version.

I would like to take this opportunity to get a basic fact clarified. In a nested loop like the one above, which of these is generally better:

1. Making the inner loop parallel:

        #pragma omp parallel
        for(...i...)
        #pragma omp for
        for(...j...)

2. Making the outer loop parallel (just a "#pragma omp parallel for" before the outer loop)
3. Using collapse (OpenMP 3.0)

Thanks
Sayan
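For comparison, here is a rough sketch of the first two options on a triangular loop like the one in the question. It uses 1-D points and a sum of distances as a stand-in workload, and the function names are made up. The inner-loop version passes through the implicit barrier at the end of the omp for once per outer iteration. The outer-loop version synchronises only once, and uses a dynamic schedule because the rows of a triangular loop have unequal lengths:

```cpp
#include <vector>

// Hypothetical stand-in for the distance() call in the question.
static double dist1d(double a, double b) { return a > b ? a - b : b - a; }

// Option 1: every thread runs the outer loop; the inner loop is work-shared.
double sumInnerParallel(const std::vector<double>& pts)
{
    const long long n = static_cast<long long>(pts.size());
    double sum = 0.0;
    #pragma omp parallel
    {
        for (long long i = 0; i < n; ++i) {
            // Implicit barrier (and a reduction step) after each value of i.
            #pragma omp for reduction(+:sum)
            for (long long j = i + 1; j < n; ++j)
                sum += dist1d(pts[i], pts[j]);
        }
    }
    return sum;
}

// Option 2: the outer loop is work-shared; each thread runs whole rows.
double sumOuterParallel(const std::vector<double>& pts)
{
    const long long n = static_cast<long long>(pts.size());
    double sum = 0.0;
    // Rows get shorter as i grows, so a dynamic schedule helps balance the load.
    #pragma omp parallel for reduction(+:sum) schedule(dynamic, 16)
    for (long long i = 0; i < n; ++i)
        for (long long j = i + 1; j < n; ++j)
            sum += dist1d(pts[i], pts[j]);
    return sum;
}
```

As for the third option: as far as I know, collapse in OpenMP 3.0 requires a rectangular iteration space, so a loop whose inner bound starts at iIndex1+1 would not qualify as written. Support for non-rectangular loop nests came with OpenMP 5.0.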
