Starting a thread for each inner loop in OpenMP
I'm fairly new to OpenMP and I'm trying to start an individual thread to process each item in a 2D array.
So essentially, this:
for (i = 0; i < dimension; i++) {
    for (int j = 0; j < dimension; j++) {
        a[i][j] = b[i][j] + c[i][j];
    }
}
What I'm doing is this:
#pragma omp parallel for shared(a,b,c) private(i,j) reduction(+:diff) schedule(dynamic)
for (i = 0; i < dimension; i++) {
    for (int j = 0; j < dimension; j++) {
        a[i][j] = b[i][j] + c[i][j];
    }
}
Does this in fact start a thread for each 2D item or no? How would I test that? If it is wrong, what is the correct way to do it? Thanks!
Note: The code has been greatly simplified
Only the outer loop is parallel in your code sample. You can test this by printing omp_get_thread_num() in the inner loop; you will see that, for a given i, the thread number is the same (of course, this test is demonstrative rather than definitive, since different runs will give different results). For example, with:
#include <stdio.h>
#include <omp.h>
#define dimension 4
int main() {
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            printf("i=%d, j=%d, thread = %d\n", i, j, omp_get_thread_num());
}
I get:
i=1, j=0, thread = 1
i=3, j=0, thread = 3
i=2, j=0, thread = 2
i=0, j=0, thread = 0
i=1, j=1, thread = 1
i=3, j=1, thread = 3
i=2, j=1, thread = 2
i=0, j=1, thread = 0
i=1, j=2, thread = 1
i=3, j=2, thread = 3
i=2, j=2, thread = 2
i=0, j=2, thread = 0
i=1, j=3, thread = 1
i=3, j=3, thread = 3
i=2, j=3, thread = 2
i=0, j=3, thread = 0
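Reading interleaved output by eye gets tedious for larger sizes. As a rough sketch of a more mechanical check, the thread number can be recorded per element and compared within each row afterwards. The names DIM, current_thread and each_row_has_one_thread are made up for this illustration, and the #ifdef guard is only there so the file still builds when OpenMP is switched off.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define DIM 4  /* hypothetical fixed size, for illustration only */

/* Thread number inside a parallel region, or 0 when built without OpenMP. */
static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Returns 1 if every element of each row was handled by a single thread. */
int each_row_has_one_thread(void)
{
    int owner[DIM][DIM];  /* shared: each element is written exactly once */

    /* Only the outer loop is divided among threads. */
    #pragma omp parallel for
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            owner[i][j] = current_thread();

    /* Sequential check after the parallel region has finished. */
    for (int i = 0; i < DIM; i++)
        for (int j = 1; j < DIM; j++)
            if (owner[i][j] != owner[i][0])
                return 0;
    return 1;
}
```

Because one iteration of the outer loop is always executed by a single thread, this check should pass no matter how the rows end up distributed.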
As for the rest of your code, you might want to put more details in a new question (it's difficult to tell from the small sample), but for example, you can't put private(j) when j is only declared later. It is automatically private in my example above. I guess diff is a variable that we can't see in the sample. Also, the loop variable i is automatically private (from the version 2.5 spec, and the same in the 3.0 spec):
The loop iteration variable in the for-loop of a for or parallel for construct is private in that construct.
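Putting those points together, a trimmed-down version of the loop from the question might look like the sketch below. The function name add_matrices and the fixed size DIM are assumptions for illustration, and the reduction on diff is left out because the sample never shows how diff is used.

```c
#define DIM 4  /* hypothetical fixed size, for illustration only */

/* Adds b and c element-wise into a. Only the outer loop is divided
 * among threads, so each thread processes whole rows. */
void add_matrices(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    /* i is the loop iteration variable, so it is private automatically.
     * j is declared inside the parallel region, so it is private as well.
     * a, b and c are shared by default; no data-sharing clauses are needed. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }
}
```

If the pragma is ignored (for instance when compiling without OpenMP support), the function still computes the same result sequentially.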
Edit: All of the above is correct for the code that you and I have shown, but you may be interested in the following. For OpenMP version 3.0 (available in e.g. gcc version 4.4, but not version 4.3) there is a collapse clause, with which you could write the code as you have, but with
#pragma omp parallel for collapse (2)
to parallelize both for loops (see the spec).
Edit: OK, I downloaded gcc 4.5.0 and ran the above code, but using collapse (2), to get the following output, showing the inner loop now parallelized:
i=0, j=0, thread = 0
i=0, j=2, thread = 1
i=1, j=0, thread = 2
i=2, j=0, thread = 4
i=0, j=1, thread = 0
i=1, j=2, thread = 3
i=3, j=0, thread = 6
i=2, j=2, thread = 5
i=3, j=2, thread = 7
i=0, j=3, thread = 1
i=1, j=1, thread = 2
i=2, j=1, thread = 4
i=1, j=3, thread = 3
i=3, j=1, thread = 6
i=2, j=3, thread = 5
i=3, j=3, thread = 7
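Applied to the array addition from the question, the collapse clause could be used roughly as follows. This is a sketch that assumes an OpenMP 3.0 or later compiler; add_matrices_collapsed and DIM are hypothetical names chosen for the example.

```c
#define DIM 4  /* hypothetical fixed size, for illustration only */

/* Adds b and c element-wise into a, distributing all DIM*DIM (i, j)
 * pairs among the threads instead of whole rows. */
void add_matrices_collapsed(double a[DIM][DIM], double b[DIM][DIM],
                            double c[DIM][DIM])
{
    /* collapse(2) merges the two perfectly nested loops into a single
     * iteration space; both i and j are private automatically. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }
}
```

Note that collapse requires the loops to be perfectly nested: no statements may sit between the two for headers.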
Comments here (search for "Workarounds") are also relevant for work-arounds in version 2.5 if you want to parallelize both loops, but the version 2.5 spec cited above is quite explicit (see the non-conforming examples in section A.35).
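One workaround commonly used with version 2.5, which may or may not be the one those comments describe, is to fuse the two loops by hand into a single loop and recover i and j from the fused index. A minimal sketch, with add_matrices_flat and DIM as hypothetical names:

```c
#define DIM 4  /* hypothetical fixed size, for illustration only */

/* Adds b and c element-wise into a using one fused loop over all
 * DIM*DIM elements, so every element is a separate iteration. */
void add_matrices_flat(double a[DIM][DIM], double b[DIM][DIM],
                       double c[DIM][DIM])
{
    #pragma omp parallel for
    for (int k = 0; k < DIM * DIM; k++) {
        int i = k / DIM;  /* row index recovered from the fused index */
        int j = k % DIM;  /* column index recovered from the fused index */
        a[i][j] = b[i][j] + c[i][j];
    }
}
```

This needs nothing beyond a plain parallel for, at the cost of one division and one modulo per element.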
You can try using nested omp parallel for constructs (after calling omp_set_nested(1)), but nested parallelism is not supported by all OpenMP implementations.
Instead, I would lay out a 2D grid of blocks and start all the threads over that grid from a single for loop (example for a fixed 4x4 grid of blocks; it assumes dimension is a multiple of 4):
#pragma omp parallel for
for (int k = 0; k < 16; k++)
{
    /* Block k covers rows [i_min, i_max) and columns [j_min, j_max). */
    int i, j, i_min, j_min, i_max, j_max;
    i_min = (k/4) * (dimension/4);
    i_max = (k/4 + 1) * (dimension/4);
    j_min = (k%4) * (dimension/4);
    j_max = (k%4 + 1) * (dimension/4);
    for (i = i_min; i < i_max; i++)
        for (j = j_min; j < j_max; j++)
            f(i, j);
}
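The block bounds above drop the remainder when dimension is not a multiple of 4. A possible variation, sketched here under that concern, computes each bound as block * n / nblocks so that the leftover rows and columns are spread over the blocks. GRID, block_range and visit_all_cells are hypothetical names, and incrementing a counter stands in for the call to f(i, j).

```c
#define GRID 4  /* hypothetical: a GRID x GRID grid of blocks */

/* Half-open range [*lo, *hi) of block number `block` out of `nblocks`
 * when n items are split as evenly as possible. */
static void block_range(int block, int nblocks, int n, int *lo, int *hi)
{
    *lo = (int)((long long)block * n / nblocks);
    *hi = (int)((long long)(block + 1) * n / nblocks);
}

/* Visits every cell (i, j) of an n x n array exactly once and increments
 * counts[i * n + j]; the increment stands in for f(i, j). */
void visit_all_cells(int n, int *counts)
{
    #pragma omp parallel for
    for (int k = 0; k < GRID * GRID; k++) {
        int i_min, i_max, j_min, j_max;
        block_range(k / GRID, GRID, n, &i_min, &i_max);  /* rows of block k */
        block_range(k % GRID, GRID, n, &j_min, &j_max);  /* columns of block k */
        /* Blocks do not overlap, so no two threads touch the same cell. */
        for (int i = i_min; i < i_max; i++)
            for (int j = j_min; j < j_max; j++)
                counts[i * n + j] += 1;
    }
}
```

When n is smaller than GRID, some blocks simply come out empty and their inner loops do nothing.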