Nested loops to CUDA

I want to port my C code to CUDA. The main computational part consists of three nested for loops:

for (int i=0; i< Nx;i++){
  for (int j=0;j<Ncontains[i];j++){
    for (int k=0;k< totalVoxels;k++){
          .......
   }
  }
}

How can I translate that to my CUDA kernel? With two for loops I could do something like:

int n= blockIdx.y * blockDim.y + threadIdx.y;
int i= blockIdx.x * blockDim.x + threadIdx.x;

But how can I get this running with three loops?


There are many ways to do it. One of them is:

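// Inside the kernel: each loop starts at this block's or thread's index and
// strides by the matching launch dimension, so any launch size covers every iteration.
for (int i=blockIdx.x; i< Nx; i += gridDim.x){                // blocks share i
  for (int j=threadIdx.y; j<Ncontains[i]; j+= blockDim.y){    // threads in y share j
    for (int k=threadIdx.x; k< totalVoxels; k += blockDim.x){ // threads in x share k
          .......
    }
  }
}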

You would launch the kernel containing these loops like this:

// nBlocks: number of blocks; nx, ny: block dimensions (threads per block in x and y)
kernel <<< dim3(nBlocks), dim3(nx, ny) >>> (...);
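
To make this concrete, here is a minimal sketch of a complete program built around those loops. The loop body is elided above, so the kernel uses a placeholder that only counts iterations. The names nestedLoops, countMismatches and visits, the sample data, and the launch dimensions are all assumptions made for illustration, not part of the original code.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the real loop body is unknown, so this one only
// counts how many (j, k) iterations were executed for each i.
__global__ void nestedLoops(const int *Ncontains, int Nx, int totalVoxels,
                            int *visits)
{
    // Outer loop: blocks take turns on i, striding by the number of blocks.
    for (int i = blockIdx.x; i < Nx; i += gridDim.x) {
        int local = 0;  // iterations this thread handled for this i
        // Middle loop: spread over the y dimension of the block.
        for (int j = threadIdx.y; j < Ncontains[i]; j += blockDim.y) {
            // Inner loop: spread over the x dimension of the block.
            for (int k = threadIdx.x; k < totalVoxels; k += blockDim.x) {
                local += 1;  // stand-in for the real work
            }
        }
        atomicAdd(&visits[i], local);  // combine the per-thread counts
    }
}

// Runs the kernel on small made-up data. Returns the number of i values whose
// count differs from Ncontains[i] * totalVoxels, or -1 on a CUDA error.
int countMismatches(int nBlocks, int nx, int ny)
{
    assert(nBlocks > 0 && nx > 0 && ny > 0);
    const int Nx = 5;
    const int totalVoxels = 100;
    const int hNcontains[Nx] = {3, 0, 7, 1, 12};
    int hVisits[Nx] = {0};

    int *dNcontains = NULL;
    int *dVisits = NULL;
    if (cudaMalloc((void **)&dNcontains, sizeof(hNcontains)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&dVisits, sizeof(hVisits)) != cudaSuccess) {
        cudaFree(dNcontains);
        return -1;
    }
    cudaMemcpy(dNcontains, hNcontains, sizeof(hNcontains),
               cudaMemcpyHostToDevice);
    cudaMemset(dVisits, 0, sizeof(hVisits));

    // nBlocks may be smaller than Nx: the strided outer loop covers the rest.
    nestedLoops<<<dim3(nBlocks), dim3(nx, ny)>>>(dNcontains, Nx, totalVoxels,
                                                 dVisits);

    cudaError_t err = cudaGetLastError();  // catches an invalid launch
    if (err == cudaSuccess) {
        // The blocking copy back also waits for the kernel to finish.
        err = cudaMemcpy(hVisits, dVisits, sizeof(hVisits),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dNcontains);
    cudaFree(dVisits);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < Nx; ++i) {
        if (hVisits[i] != hNcontains[i] * totalVoxels)
            ++mismatches;
    }
    return mismatches;
}

int main(void)
{
    // 3 blocks of 32 x 4 threads; any positive sizes should work as long as
    // nx * ny stays within the device's threads-per-block limit.
    int mismatches = countMismatches(3, 32, 4);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Compile it with nvcc. A result of 0 mismatches means every (i, j, k) iteration ran exactly once. Because each loop strides by its launch dimension, nBlocks, nx and ny can be tuned for performance without changing which iterations are executed.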