OpenMP "parallel for" behaves strangely in one specific program
I started programming about a month ago, and recently I've been trying to learn multi-core development with OpenMP in C++ programs. I can't get OpenMP to work correctly for me in a large program I've written. I want to execute the following loop in parallel.
for(int iAngle=0; iAngle<int(nAngles); iAngle++){
    for(int k=0; k<int(nRadSamples); k++){
        int x=coordX[k][iAngle];
        int y=coordY[k][iAngle];
        for(int i=yMin; i<int(yMax); i++){
            int iy=i+y;
            for(int j=xMin; j<int(xMax); j++){
                response[iAngle][i][j] += inputSlice[iy][j+x];
            }
        }
    }
}
coordX, coordY, response, and inputSlice are
vector<vector<int> >
vector<vector<int> >
vector<vector<vector<float> > >
vector<vector<float> >
respectively. There is considerable slowdown (50 second runtime slows to 75 second runtime) upon adding
#pragma omp parallel for
as the line above this loop.
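For reference, here is a minimal self-contained sketch of the loop above with the pragma in place. The function name accumulate, the typedefs, the useThreads switch, and the toy data in serialMatchesParallel are all made up for illustration; only the loop body and the variable names come from my program. Each outer iteration writes only to response[iAngle], so the iterations should be independent of each other.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<int> >   IntGrid;
typedef std::vector<std::vector<float> > FloatGrid;

// Adds shifted copies of inputSlice into response, one plane per angle.
// Every outer iteration touches only response[iAngle], so the iterations
// do not write to the same memory and can be divided among threads.
void accumulate(const IntGrid& coordX, const IntGrid& coordY,
                const FloatGrid& inputSlice, std::vector<FloatGrid>& response,
                int yMin, int yMax, int xMin, int xMax, bool useThreads)
{
    assert(coordX.size() == coordY.size());
    const int nAngles     = static_cast<int>(response.size());
    const int nRadSamples = static_cast<int>(coordX.size());

    // The if clause keeps the loop serial when useThreads is false.
    #pragma omp parallel for if(useThreads)
    for (int iAngle = 0; iAngle < nAngles; iAngle++) {
        for (int k = 0; k < nRadSamples; k++) {
            const int x = coordX[k][iAngle];
            const int y = coordY[k][iAngle];
            for (int i = yMin; i < yMax; i++) {
                const int iy = i + y;
                for (int j = xMin; j < xMax; j++) {
                    response[iAngle][i][j] += inputSlice[iy][j + x];
                }
            }
        }
    }
}

// Hypothetical toy data: runs the loop serially and threaded, then compares.
bool serialMatchesParallel()
{
    const int nAngles = 4, nRadSamples = 3, size = 8;
    IntGrid coordX(nRadSamples, std::vector<int>(nAngles));
    IntGrid coordY(nRadSamples, std::vector<int>(nAngles));
    for (int k = 0; k < nRadSamples; k++) {
        for (int a = 0; a < nAngles; a++) {
            coordX[k][a] = (k + a) % 3;      // offsets stay in 0..2
            coordY[k][a] = (2 * k + a) % 3;
        }
    }
    FloatGrid inputSlice(size, std::vector<float>(size));
    for (int r = 0; r < size; r++) {
        for (int c = 0; c < size; c++) {
            inputSlice[r][c] = static_cast<float>(r * size + c);
        }
    }
    std::vector<FloatGrid> serial(
        nAngles, FloatGrid(size, std::vector<float>(size, 0.0f)));
    std::vector<FloatGrid> threaded = serial;

    accumulate(coordX, coordY, inputSlice, serial,   1, 5, 1, 5, false);
    accumulate(coordX, coordY, inputSlice, threaded, 1, 5, 1, 5, true);
    return serial == threaded;
}
```

Calling serialMatchesParallel() from main and printing the result is one way to confirm that the threaded version computes the same response as the serial one before worrying about speed.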
I don't think my problems come from accessing the response and inputSlice shared variables, because even basic OpenMP code executes strangely in this particular program. For example the basic program
#include <stdio.h>
#include <omp.h>
int main() {
    //////////////////////
    #pragma omp parallel for
    for(int i = 0; i<2; i++){
        int thread = omp_get_thread_num() + 1;
        int numThreads = omp_get_num_threads();
        printf("Thread %d of %d printing %d\n", thread, numThreads, i);
    }
    ////////////////////
    return 0;
}
outputs
Thread 2 of 8 printing 1
Thread 1 of 8 printing 0
but when I copy and paste the code inside the /////////////// borders into the
int main(int argc, char* argv[])
function of my large program, it outputs
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
Thread 1 of 1 printing 0
Thread 1 of 1 printing 1
It's as if each thread executes the entire for loop separately, without seeing each other, which would understandably slow down my program's runtime. 'i' has not been declared in the scope of the main function of my program.
I link against other libraries when building my program: I pass -pthreads, -lguide, and -ltiff in addition to -fopenmp, and I compile with gcc 4.4.
Any help with this particular problem, or with my coding style in general, would be greatly appreciated! I've been banging my head against my keyboard for a while now.
I figured out the problem. I was also linking to the Intel library libguide.so, which it seems is known to be incompatible with GNU's OpenMP implementation. I simply changed this linker option to link to libiomp5.so (after downloading the library), which is compatible with -fopenmp, and now I have a blazing fast program!!
Thanks for the support!
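For anyone who hits the same symptom, a small probe like the one below might help spot it early. It is only a sketch: TeamReport and probeOpenMP are names I made up, and I have not verified that a build mixing the two runtimes trips this exact check. The idea is that a healthy runtime runs each loop body once in total, while the output above suggests every thread was running the whole loop.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

struct TeamReport {
    int  teamSize;   // threads the runtime reports inside a parallel region
    long bodiesRun;  // loop bodies executed in total, across all threads
};

// Runs a trivial parallel loop of n iterations and reports what happened.
TeamReport probeOpenMP(int n)
{
    TeamReport report = {1, 0};

    #pragma omp parallel
    {
        // One thread records the team size the runtime claims to have.
        #pragma omp single
        {
#ifdef _OPENMP
            report.teamSize = omp_get_num_threads();
#endif
        }
    }

    long bodiesRun = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        // Atomic update so concurrent increments are not lost.
        #pragma omp atomic
        bodiesRun += 1;
    }
    report.bodiesRun = bodiesRun;
    return report;
}
```

If probeOpenMP(1000) reports bodiesRun larger than 1000, or a teamSize of 1 on a multi-core machine with no thread limit set, the link line is a reasonable first place to look.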
It's hard to see what's going wrong without analyzing the complete code. But a few hints:
The slowdown can come from
- synchronizing locks when accessing shared variables,
- doing the same work multiple times according to numThreads,
- copying lots of data to all threads.
The for loop counter i should be declared inside the loop, so OpenMP can split this loop according to the number of available threads. Another possibility is to declare it private explicitly:
#pragma omp parallel for private(i)
Another problem seems to come from a race condition, so all threads print thread number 1.
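A minimal sketch of the private(i) variant, for the case where the counter is declared outside the loop. The function squares is a made-up example and not taken from the question. As far as I know, OpenMP already treats the iteration variable of a parallel for loop as private, so the clause mainly makes that explicit.

```cpp
#include <vector>

// Fills out[i] with i*i. The counter is declared outside the loop and
// listed in private(i), so each thread works on its own copy of i.
std::vector<int> squares(int n)
{
    std::vector<int> out(n, 0);
    int i;  // declared before the loop, as in older C-style code
    #pragma omp parallel for private(i)
    for (i = 0; i < n; i++) {
        out[i] = i * i;  // each iteration writes a distinct element
    }
    return out;
}
```

With this in place, squares(5) should give 0, 1, 4, 9, 16 regardless of how many threads the runtime uses.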