3D image indices

2023-04-03 11:26 Question author:

I have an image of size 512 x 512 x 512. I need to process all the voxels individually. How can I get the thread ID to do this? If I use a 1D thread ID, the number of blocks will exceed 65536.

    int id = blockIdx.x*blockDim.x + threadIdx.x;

Note: My card doesn't support 3D grids.

You can use 3D indices with CUDA 4.0 and compute capability 2.0+. Example code:

int blocksInX = (nx+8-1)/8;
int blocksInY = (ny+8-1)/8;
int blocksInZ = (nz+8-1)/8;

dim3 Dg(blocksInX, blocksInY, blocksInZ);
dim3 Db(8, 8, 8);
foo_kernel<<<Dg, Db>>>(R, nx, ny, nz);

...

__global__ void foo_kernel( float* R, const int nx, const int ny, const int nz )
{
  unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
  unsigned int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
  unsigned int zIndex = blockDim.z * blockIdx.z + threadIdx.z;

  if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
  {
    unsigned int index_out = xIndex + nx*yIndex + nx*ny*zIndex;
    ...
    R[index_out] = ...;
  }
}

If your device doesn't support compute capability 2.0, there is a trick: fold the Y and Z block counts into the grid's second dimension and unpack them inside the kernel.

int threadsInX = 16;
int threadsInY = 4;
int threadsInZ = 4;

int blocksInX = (nx+threadsInX-1)/threadsInX;
int blocksInY = (ny+threadsInY-1)/threadsInY;
int blocksInZ = (nz+threadsInZ-1)/threadsInZ;

dim3 Dg = dim3(blocksInX, blocksInY*blocksInZ);
dim3 Db = dim3(threadsInX, threadsInY, threadsInZ);

foo_kernel<<<Dg, Db>>>(R, nx, ny, nz, blocksInY, 1.0f/(float)blocksInY);

__global__ void foo_kernel(float *R, const int nx, const int ny, const int nz,
                           unsigned int blocksInY, float invBlocksInY)
{

    unsigned int blockIdxz = __float2uint_rd(blockIdx.y * invBlocksInY);
    unsigned int blockIdxy = blockIdx.y - __umul24(blockIdxz, blocksInY);
    unsigned int xIndex = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
    unsigned int yIndex = __umul24(blockIdxy, blockDim.y) + threadIdx.y;
    unsigned int zIndex = __umul24(blockIdxz, blockDim.z) + threadIdx.z;

    if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
    {
        unsigned int index = xIndex + nx*yIndex + nx*ny*zIndex;
        ...
        R[index] = ...;
    }

}

You could use a multi-dimensional grid. It gives you many more block indices.

Note that memory is not laid out in 3D. The three dimensions are only a matter of interpretation, so you can treat your 3D image as a single flat array behind one pointer.

Array[i][j][z] is the same as Array2[i*cols + j + rows*cols*z];

Now pass Array2 to CUDA and work in a single dimension.
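As a minimal sketch of that flattening, the helpers below apply the formula above and invert it. The names flatIndex and unflatIndex are hypothetical (they are not from the answer), and size_t is used on the assumption that the volume may hold more elements than fit comfortably in an int.

```cuda
#include <cstddef>

// Hypothetical helpers, for illustration only.
// Linear offset of element (i, j, z) in a rows x cols x slices volume,
// following Array2[i*cols + j + rows*cols*z].
__host__ __device__ inline size_t flatIndex(size_t i, size_t j, size_t z,
                                            size_t rows, size_t cols)
{
    return i * cols + j + rows * cols * z;
}

// Inverse mapping: recover (i, j, z) from a linear offset.
__host__ __device__ inline void unflatIndex(size_t idx, size_t rows, size_t cols,
                                            size_t &i, size_t &j, size_t &z)
{
    z = idx / (rows * cols);            // which slice
    size_t rem = idx % (rows * cols);   // offset inside that slice
    i = rem / cols;                     // row
    j = rem % cols;                     // column
}
```

Inside a kernel that uses a 1D thread ID, unflatIndex could be used to recover the voxel coordinates when the per-voxel work depends on position.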

If you need a larger grid, CUDA supports 2D grids on all hardware, and the most recent versions of the CUDA toolkit also support 3D grids on current Fermi hardware.

However, it isn't strictly necessary to have such large grids. If each voxel operation is independent, then why not just use a 1D grid, but have each thread process more than one voxel? Not only would such a scheme not need larger 2D or 3D grids, it might well be more efficient because the fixed costs associated with scheduling and initialization of a block can be amortized over multiple voxel calculations.
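One common way to let each thread process more than one voxel is a grid-stride loop. The sketch below assumes the per-voxel work is independent. The kernel name scaleVoxels, the wrapper launchScaleVoxels, the placeholder operation (multiplying by a factor), and the launch sizes of 1024 blocks of 256 threads are all illustrative assumptions, not part of the answer.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread walks the flat voxel array with a stride
// equal to the total number of threads in the grid.
__global__ void scaleVoxels(float *R, size_t n, float factor)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += stride)
    {
        R[i] *= factor;   // placeholder for the real per-voxel operation
    }
}

// Hypothetical host wrapper: the 1D grid size is fixed and does not grow
// with the volume, so it stays well under the 65535 blocks-per-dimension limit.
cudaError_t launchScaleVoxels(float *d_R, int nx, int ny, int nz, float factor)
{
    size_t n = (size_t)nx * ny * nz;
    int threads = 256;    // assumed block size
    int blocks  = 1024;   // assumed grid size
    scaleVoxels<<<blocks, threads>>>(d_R, n, factor);
    return cudaGetLastError();
}
```

With these assumed sizes, a 512 x 512 x 512 volume gives each thread 512 voxels to process. The best block and grid sizes depend on the device and should be tuned.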

I used something like this:

In the host code, define your grid:

dim3 altgrid, altthreads;
altgrid.x = lx;
altgrid.y = ly;
altgrid.z = 1;
altthreads.x = lz;
altthreads.y = 1;
altthreads.z = 1;

and in the kernel:

int idx = blockIdx.x;   // 0 .. lx-1 (grid x dimension)
int idy = blockIdx.y;   // 0 .. ly-1 (grid y dimension)
int idz = threadIdx.x;  // 0 .. lz-1 (threads per block)

Since the array on the device is only 1D, you retrieve the [idx][idy][idz] element of a matrix A as A[ind], where ind = idz + lz*(idy + ly*idx);

I hope it helps
