开发者

cuda - kernel optimization

I created a simple particle system. I have a device with compute capability 2.1. What could I change to optimize the kernel?

I assume that variables tPos and tVel are stored in the registers.

__global__ void particles_kernel(float4 *vbo, float4 *pos, float4 *vel)
{
     int tid = blockIdx.x * blockDim.x + threadIdx.x;

     float4 tPos = pos[tid];
     float4 tVel = vel[tid];

     tPos.x += tVel.x;
     tPos.y += tVel.y;
     tPos.z += tVel.z;

     if(tPos.x < -2.0f)
     {
         tVel.x = -tVel.x;
     }
     else if(tPos.x > 2.0f)
     {
         tVel.x = -tVel.x;
     }


     if(tPos.y < -2.0f)
     {
         tVel.y = -tVel.y;
     }
     else if(tPos.y > 2.0f)
     {
         tVel.y = -tVel.y;
     }


     if(tPos.z < -2.0f)
 开发者_运维百科    {
         tVel.z = -tVel.z;
     }
     else if(tPos.z > 2.0f)
     {
         tVel.z = -tVel.z;
     }


     pos[tid] = tPos;
     vel[tid] = tVel;


     vbo[tid] = make_float4(tPos.x, tPos.y, tPos.z, tPos.w);
}


Unless I am missing something, your clamping code can be simplified like this:

if (fabsf(tVel.x) > 2.0f) tVel.x = -tVel.x;
if (fabsf(tVel.y) > 2.0f) tVel.y = -tVel.y;
if (fabsf(tVel.z) > 2.0f) tVel.z = -tVel.z;

However given the relatively small amont of computation, this change will probably not improve performance as the code appears to be memory bound (you are streaming through the data). Maybe there is additional computation elsewhere in your app that you could combine with this computation to increase the computational density?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜