Unroll nested for loops in C++

How would I unroll the following nested loops?

for(k = begin; k != end; ++k) {
 for(j = 0; j < Emax; ++j) {
  for(i = 0; i < N; ++i) { 
   if (j >= E[i]) continue; 
   array[k] += foo(i, tr[k][i], ex[j][i]);
  }
 }
}

I tried the following, but the output no longer matches the original loop's, and it should:

for(k = begin; k != end; ++k) {
 for(j = 0; j < Emax; ++j) {
  for(i = 0; i+4 < N; i+=4) { 
   if (j >= E[i]) continue; 
   array[k] += foo(i, tr[k][i], ex[j][i]);
   array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
   array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
   array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);
  }
  if (i < N) {
   for (; i < N; ++i) {
    if (j >= E[i]) continue; 
    array[k] += foo(i, tr[k][i], ex[j][i]);
   }
  }
 }
}

I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After it finishes, another function prints the contents of array[], and right now, with my unrolling, the output isn't identical. Any help is appreciated.

Update: I fixed it. I used the answer for this question to do the unrolling. The output wasn't matching because I wasn't doing array[k] = 0; after the first for loop.
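One way a missing reset can produce this symptom, sketched under the assumption that array[] still held values from an earlier pass when the kernel ran again. The function name accumulate_rows, the reset_first flag, and the data are made up for illustration and are not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Adds the total of rows[k] into out[k]. When reset_first is true, out[k] is
// cleared first, which plays the role of the array[k] = 0; from the update.
// (Hypothetical helper, for illustration only.)
void accumulate_rows(const std::vector<std::vector<double>>& rows,
                     std::vector<double>& out, bool reset_first) {
    for (std::size_t k = 0; k < rows.size(); ++k) {
        if (reset_first) out[k] = 0.0;
        for (double v : rows[k]) out[k] += v;
    }
}
```

Because the kernel uses +=, running it twice on the same buffer without the reset adds the second pass on top of the first, so two otherwise correct versions of the loop can appear to disagree.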

Thanks, Hristo


   if (j >= E[i]) continue; 
   array[k] += foo(i, tr[k][i], ex[j][i]);
   array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
   array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
   array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);

versus

if (j >= E[i]) continue; 
array[k] += foo(i, tr[k][i], ex[j][i]);

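The screening conditions are not identical: the unrolled body tests j >= E[i] once and then processes four elements, whereas the original tests it for every i.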

A better approach to screening, which eliminates the branch (foo is then called for every i, so this assumes foo is cheap and has no side effects):

array[k] += (j < E[i])*foo(i, tr[k][i], ex[j][i]);

Also, unless a cleanup loop handles the leftover elements, you need to guarantee that N is divisible by 4; otherwise the unrolled body may overshoot. Alternatively, truncate the unrolled range to a multiple of four (N - N%4).
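A minimal sketch of the masking idea combined with the N - N%4 truncation, for a single (k, j) pair. The foo below is a hypothetical stand-in, and the flat vectors tr_k and ex_j stand for the rows tr[k] and ex[j]; none of these names come from the asker's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's foo(); any pure function works here.
inline double foo(std::size_t i, double t, double e) {
    return static_cast<double>(i) + t * e;
}

// Reference: one screened element per iteration, as in the original inner loop.
double row_sum_reference(int j, const std::vector<int>& E,
                         const std::vector<double>& tr_k,
                         const std::vector<double>& ex_j) {
    double sum = 0.0;
    for (std::size_t i = 0; i < E.size(); ++i) {
        if (j >= E[i]) continue;
        sum += foo(i, tr_k[i], ex_j[i]);
    }
    return sum;
}

// Unrolled by four, with a 0/1 mask per element instead of a branch.
double row_sum_masked(int j, const std::vector<int>& E,
                      const std::vector<double>& tr_k,
                      const std::vector<double>& ex_j) {
    const std::size_t N = E.size();
    const std::size_t N4 = N - N % 4;  // largest multiple of 4 that is <= N
    double sum = 0.0;
    std::size_t i = 0;
    for (; i < N4; i += 4) {
        sum += (j < E[i])     * foo(i,     tr_k[i],     ex_j[i]);
        sum += (j < E[i + 1]) * foo(i + 1, tr_k[i + 1], ex_j[i + 1]);
        sum += (j < E[i + 2]) * foo(i + 2, tr_k[i + 2], ex_j[i + 2]);
        sum += (j < E[i + 3]) * foo(i + 3, tr_k[i + 3], ex_j[i + 3]);
    }
    // Leftover elements when N is not a multiple of 4.
    for (; i < N; ++i) {
        sum += (j < E[i]) * foo(i, tr_k[i], ex_j[i]);
    }
    return sum;
}
```

Whether the mask actually beats the branch depends on how expensive foo is and how often elements are screened out, so it is worth measuring both.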


I think that the if (j >= E[i]) continue; is your problem. In the original, this test is run for every index i. In your unrolled version, it is only tested for every fourth index. Try the following:

for (i = 0; i + 3 < N; /*advanced in loop*/) {
    // Test every index on its own. ++i always runs, so a screened-out
    // element cannot stall the loop, and i + 3 < N keeps E[i] in bounds.
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
}
while (i < N) {
    if (j >= E[i]) {
        ++i; // missing in original version
        continue;
    }
    array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
}

Edit: I forgot to increment an index in the original version that was causing an infinite loop when j >= E[i].
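To check an unrolled version against the straightforward one, a small self-contained harness along these lines may help. It is only a sketch: foo is a hypothetical stand-in, the Matrix alias and both function names are mine, and ex is assumed to have at least Emax rows of N entries each.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for foo(); replace with the real function.
inline double foo(std::size_t i, double t, double e) {
    return static_cast<double>(i) + t * e;
}

// The triple loop from the question, writing into a fresh zeroed result.
std::vector<double> kernel_reference(const std::vector<int>& E, int Emax,
                                     const Matrix& tr, const Matrix& ex) {
    const std::size_t N = E.size();
    std::vector<double> out(tr.size(), 0.0);
    for (std::size_t k = 0; k < tr.size(); ++k)
        for (int j = 0; j < Emax; ++j)
            for (std::size_t i = 0; i < N; ++i) {
                if (j >= E[i]) continue;
                out[k] += foo(i, tr[k][i], ex[j][i]);
            }
    return out;
}

// Inner loop unrolled by four, with the screen applied to each element.
std::vector<double> kernel_unrolled(const std::vector<int>& E, int Emax,
                                    const Matrix& tr, const Matrix& ex) {
    const std::size_t N = E.size();
    std::vector<double> out(tr.size(), 0.0);
    for (std::size_t k = 0; k < tr.size(); ++k) {
        for (int j = 0; j < Emax; ++j) {
            // One screened update; called four times per unrolled iteration.
            auto step = [&](std::size_t i) {
                if (j < E[i]) out[k] += foo(i, tr[k][i], ex[j][i]);
            };
            std::size_t i = 0;
            for (; i + 3 < N; i += 4) {
                step(i);
                step(i + 1);
                step(i + 2);
                step(i + 3);
            }
            for (; i < N; ++i) step(i);  // leftover elements
        }
    }
    return out;
}
```

Both functions add the terms in the same order, so their results should compare equal element by element; it is worth including an N that is not a multiple of 4 in the test data.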
