Synchronisation construct inside pragma for
I have a program block like:
for (iIndex1=0; iIndex1 < iSize; iIndex1++)
{
    for (iIndex2=iIndex1+1; iIndex2 < iSize; iIndex2++)
    {
        iCount++;
        fDist =(*this)[iIndex1].distance( (*this)[iIndex2] );
        m_oPDF.addPairDistance( fDist );
        if ((bShowProgress) && (iCount % 1000000 == 0))
            xyz_exception::ui()->progress( iCount, (size()-1)*((size()-1))/2 );
    }
}
}
}
I have tried parallelising both the inner and the outer loop, with iCount placed in a critical region. What would be the best approach to parallelise this? If I wrap iCount in omp single or omp atomic, the code gives an error, and I figured out that this would be invalid inside omp for. I suspect I am adding a lot of extraneous constructs to parallelise this. I need some advice.
Thanks,
Sayan
If I interpret your intentions correctly, you want to use iCount to tell your program when (every 10^6 operations) to update a UI? And iCount is global: all the threads share the value and you want to maintain its consistency?
I would search for a way to replace this global counter with counters private to each thread and have the threads send a message to update the UI independently of each other. If you insist on using a global counter, you are going to have to, somehow, synchronise across threads, which will be a performance hit. Yes, you could write your program that way but I don't recommend it.
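A minimal sketch of the private-counter idea, assuming a simplified stand-in for the loop in the question: `countPairs`, the 1-D points and the absolute difference used as a "distance" are all hypothetical, not the poster's classes. Each thread increments its own `local` counter without any synchronisation, and the counters are combined once per thread through the `reduction` clause.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the pair loop in the question.
// Returns the number of pairs visited and writes the sum of distances to sumDist.
long long countPairs(const std::vector<double>& pts, double& sumDist)
{
    // The loop index is an int, so the container must not be larger than that.
    assert(pts.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int n = static_cast<int>(pts.size());

    long long total = 0;
    double sum = 0.0;

    #pragma omp parallel reduction(+:total, sum)
    {
        long long local = 0;  // private to each thread, no locking needed

        #pragma omp for schedule(dynamic, 16)
        for (int i = 0; i < n; ++i) {
            for (int j = i + 1; j < n; ++j) {
                sum += std::fabs(pts[i] - pts[j]);
                ++local;
                // A thread could report its own progress here,
                // for example every 10^6 local iterations.
            }
        }

        total += local;  // combined across threads by the reduction
    }

    sumDist = sum;
    return total;
}
```

Compiled without OpenMP support the pragmas are ignored and the function runs serially with the same result, which makes it easy to compare timings against the serial version.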
If you don't like the idea of all the threads sending messages to the UI perhaps just one thread could do that; if one thread is 1/4 of the way through the program, so are the other threads (approximately).
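The single-reporter variant could look roughly like the sketch below. `runWithProgress` and the `report` callback are hypothetical names standing in for the UI call in the question, and the estimate `local * nThreads` rests on the assumption that all threads advance at about the same rate, so it is only approximate.

```cpp
#include <cassert>
#include <functional>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical sketch: only thread 0 reports progress, using its own private
// counter scaled by the number of threads as an estimate of the total work done.
long long runWithProgress(int n, long long reportEvery,
                          const std::function<void(long long)>& report)
{
    assert(reportEvery > 0);
    long long total = 0;

    #pragma omp parallel reduction(+:total)
    {
        int tid = 0;
        int nThreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nThreads = omp_get_num_threads();
#endif
        long long local = 0;  // private counter, never shared

        // Interleave rows so every thread gets a mix of long and short rows.
        #pragma omp for schedule(static, 1)
        for (int i = 0; i < n; ++i) {
            for (int j = i + 1; j < n; ++j) {
                // ... the per-pair work would go here ...
                ++local;
                if (tid == 0 && local % reportEvery == 0)
                    report(local * nThreads);  // approximate overall progress
            }
        }

        total += local;
    }
    return total;
}
```

Because only thread 0 ever calls `report`, the callback itself needs no locking, though whether the UI may be updated from inside a parallel region depends on the UI toolkit.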
Thanks again Mark. I tried the approaches that you suggested. I used reduction(+:iCount) and also tried wrapping iCount++ in a pragma critical, and yes, it is a performance hit (I also saw no speedup). I have also let one thread handle iCount, but none of the approaches I tried results in any speedup.
I expected that if I put a pragma for around the inner loop and declared iCount as a reduction variable, I would notice at least some speedup. My aim is the parallel execution of these statements for each (iIndex1, iIndex2) pair:
fDist =(*this)[iIndex1].distance( (*this)[iIndex2] );
m_oPDF.addPairDistance( fDist );
which could noticeably impact the program run time.
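One thing worth checking with these two statements: if `m_oPDF.addPairDistance` updates shared state (an assumption, since the thread does not show its implementation), calling it from several threads at once would be a data race, and protecting every call with a lock would serialise the loop. A common way around that is to let each thread accumulate into its own copy and merge the copies once at the end. The sketch below illustrates this with a plain histogram; `pairDistanceHistogram`, `binWidth` and `nBins` are hypothetical and only stand in for whatever `m_oPDF` really stores.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for m_oPDF: a histogram of pair distances.
// Each thread fills a private histogram; the merge happens once per thread.
std::vector<long long> pairDistanceHistogram(const std::vector<double>& pts,
                                             double binWidth, int nBins)
{
    assert(binWidth > 0.0 && nBins > 0);
    std::vector<long long> hist(nBins, 0);
    const int n = static_cast<int>(pts.size());

    #pragma omp parallel
    {
        std::vector<long long> local(nBins, 0);  // private to this thread

        #pragma omp for schedule(dynamic, 16) nowait
        for (int i = 0; i < n; ++i) {
            for (int j = i + 1; j < n; ++j) {
                const double d = std::fabs(pts[i] - pts[j]);
                int bin = static_cast<int>(d / binWidth);
                if (bin >= nBins) bin = nBins - 1;  // clamp overflow into the last bin
                ++local[bin];
            }
        }

        // One short critical section per thread instead of one per pair.
        #pragma omp critical
        {
            for (int b = 0; b < nBins; ++b)
                hist[b] += local[b];
        }
    }
    return hist;
}
```

If the real per-pair work is cheap, the loop may also simply be limited by memory bandwidth, in which case more threads would not help much either way.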
Many thanks Mark. I removed iCount and made the outer loop parallel, but I am still digging through the code, since I still observe no speedup compared to the serial version.
I would like to take this opportunity to clarify a basic point. In a nested loop like the one above, which of these is generally better?
Making the inner loop parallel:
#pragma omp parallel
for(...i...)
    #pragma omp for
    for(...j...)
Making the outer loop parallel (just a #pragma omp parallel for before the outer loop)
Using collapse (for OpenMP 3.0)
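For comparison, here is a rough sketch of the outer-loop and collapse variants, using hypothetical functions `pairsOuter` and `pairsCollapsed` that only count pairs. Two points worth noting: parallelising the inner loop puts a worksharing construct, with its implicit barrier, inside every outer iteration, so the outer loop usually carries less overhead; and `collapse(2)` in OpenMP 3.0 needs a rectangular nest, so a triangular loop such as `j = i + 1` has to be rewritten (here with an `if` filter, which wastes roughly half of the iterations).

```cpp
#include <cassert>

// Outer loop parallel. The rows of a triangular loop shrink as i grows,
// so a dynamic schedule is used to balance the load between threads.
long long pairsOuter(int n)
{
    assert(n >= 0);
    long long count = 0;
    #pragma omp parallel for schedule(dynamic, 8) reduction(+:count)
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            ++count;
    return count;
}

// collapse(2) over a rectangular n x n nest, filtering to j > i.
// The inner bounds must not depend on i for OpenMP 3.0 collapse.
long long pairsCollapsed(int n)
{
    assert(n >= 0);
    long long count = 0;
    #pragma omp parallel for collapse(2) reduction(+:count)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (j > i)
                ++count;
    return count;
}
```

Both functions return n(n-1)/2; which one runs faster depends on the cost of the per-pair work and on the machine, so it is worth timing both against the serial loop.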
Thanks
Sayan