
CUDA fast math operations

This is my code,

    __device__ void calculateDT(float *devD, int *devImg, int cntVoxelLi, int *neighVoxels)
    {
        float minV = devD[cntVoxelLi];
        int cv = devImg[cntVoxelLi];
        float v = 0, cuVal = 0;
        int c1 = 0, d1 = 0, r1 = 0;
        GetInd2Sub(cntVoxelLi, r1, c1, d1);

        for (int ind = 0; ind < 9; ind++)
        {
            v = pow(float(cv - devImg[neighVoxels[ind]]), 2);
            cuVal = devD[neighVoxels[ind]] + (1 - exp(-v/100));
            minV = min(minV, cuVal);
        }
        devD[cntVoxelLi] = minV;
    }

When I run the entire program it takes about 15 seconds. But when I remove the

     exp(-v/100)

it takes only 7 seconds, so it seems this exp operation takes a lot of time. I tried the expf function as well. How can I improve the performance?


The performance difference you are seeing is mostly the result of compiler optimization. When you remove the exp expression, the variable v becomes unused, and the compiler will remove the calculation of v because it is effectively dead code. So the large drop in execution time is due to the elimination of all of the floating point computation from the kernel loop, not from the removal of the exp function alone.
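One way to verify this is to remove exp() but keep v live, for example by folding it into the result; if the runtime stays near 15 seconds, the cost was in the remaining arithmetic rather than in exp() alone. A hypothetical timing variant of the loop (the `#ifndef __CUDACC__` guard is only there so the sketch also compiles on the host):

```cpp
#include <cmath>

#ifndef __CUDACC__
#define __device__   // host-compilable stand-in for this sketch
#endif

// Hypothetical timing variant: exp() is gone, but v is still consumed,
// so the compiler cannot delete the pow() as dead code.
__device__ void calculateDT_noExp(float *devD, int *devImg, int cntVoxelLi,
                                  int *neighVoxels)
{
    float minV = devD[cntVoxelLi];
    int cv = devImg[cntVoxelLi];

    for (int ind = 0; ind < 9; ind++) {
        float v = powf(float(cv - devImg[neighVoxels[ind]]), 2.0f);
        // keep v live: fold it into the result instead of (1 - exp(-v/100))
        float cuVal = devD[neighVoxels[ind]] + v;
        minV = fminf(minV, cuVal);
    }
    devD[cntVoxelLi] = minV;
}
```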

As for performance optimizations, the obvious one is to eliminate the use of pow for computing a simple square (though the compiler might be doing this itself), and to tidy up the floating point expressions to eliminate a number of implicit integer-to-floating-point conversions (hint: 0 is an integer, 0. is double precision, and 0.f is single precision).
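Putting those suggestions together, a sketch of the tidied helper might look like the following: the square is a plain multiply, all literals are single precision, and the fast `__expf` intrinsic replaces `exp` where reduced precision is acceptable (compiling with `-use_fast_math` has a similar effect globally). The GetInd2Sub call is omitted here because its results were unused in the posted snippet, and the `#ifndef __CUDACC__` guard is only there so the sketch also compiles on the host:

```cpp
#include <cmath>

#ifndef __CUDACC__
#define __device__   // host-compilable stand-in for this sketch
#endif

// Tidied version: pow() replaced by a multiply, single-precision
// literals throughout, fminf instead of min on floats.
__device__ void calculateDT(float *devD, int *devImg, int cntVoxelLi,
                            int *neighVoxels)
{
    float minV = devD[cntVoxelLi];
    int cv = devImg[cntVoxelLi];

    for (int ind = 0; ind < 9; ind++) {
        float d = float(cv - devImg[neighVoxels[ind]]);
        float v = d * d;                       // replaces pow(..., 2)
#ifdef __CUDACC__
        float e = __expf(-v / 100.0f);         // fast, lower-precision intrinsic
#else
        float e = expf(-v / 100.0f);
#endif
        float cuVal = devD[neighVoxels[ind]] + (1.0f - e);
        minV = fminf(minV, cuVal);
    }
    devD[cntVoxelLi] = minV;
}
```

Note that __expf trades accuracy for speed; whether that trade-off is acceptable depends on how sensitive your distance transform is to small errors in the exponential.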

It is difficult to comment on memory transaction performance in the kernel from the code you have posted. The CUDA 4 visual profiler has some useful diagnostics which show whether a piece of code is memory- or arithmetic-limited. You might find it useful to profile the code and see what it reports.

