
Keeping unused variables in CUDA

I wrote some kernels for testing memory bandwidth; they do no useful computation. A minimal example is:

__global__ void testKernel(float* a) 
{
    unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    float x;
    x = a[i];
}

When I compile, I get (not surprisingly)

warning: variable "x" was set but never used

and the kernel runs as quickly as an empty kernel:

__global__ void donothing() 
{
}

This indicates that the read of a[i] has been optimized out.

I have tried tricks such as

volatile float x;

if(x);

(void)x;

and they suppress the warning, but the kernel still finishes too quickly.

How can I make sure that the useless instructions actually get executed?

I found the option CU_JIT_OPTIMIZATION_LEVEL, but Google mostly turns up links to the documentation rather than examples of how to use it. Would this option help me, and how do I use it?
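CU_JIT_OPTIMIZATION_LEVEL is a CUDA driver API JIT option. It is passed in the options/values arrays of cuModuleLoadDataEx when a module is loaded from PTX, takes an unsigned int from 0 to 4 (4 is the default), and affects only the PTX-to-machine-code step. The sketch below shows the call shape; the helper names makeOptLevelOption and loadKernelWithOptLevel are hypothetical, and it assumes device 0 and PTX text already read into memory. If nvcc already dropped the load when it generated the PTX, lowering this level will probably not bring it back, so it is worth first checking the PTX for a global load (typically ld.global).

```cuda
#include <cuda.h>     // CUDA driver API (link with -lcuda)
#include <assert.h>
#include <stdint.h>   // uintptr_t

// Fill one (option, value) pair for CU_JIT_OPTIMIZATION_LEVEL.
// The option takes an unsigned int from 0 (fewest optimizations) to 4 (default);
// integer-valued JIT options are passed by casting the integer to void*.
static void makeOptLevelOption(unsigned int level, CUjit_option* option, void** value)
{
    assert(level <= 4);
    *option = CU_JIT_OPTIMIZATION_LEVEL;
    *value  = (void*)(uintptr_t)level;
}

// JIT-compile NUL-terminated PTX text at the given level and look up one kernel.
// Uses device 0 and its primary context (retained here and never released,
// which is acceptable for a short test program).
static CUresult loadKernelWithOptLevel(const char* ptx, const char* kernelName,
                                       unsigned int level,
                                       CUmodule* module, CUfunction* function)
{
    CUresult rc;
    CUdevice dev;
    CUcontext ctx;
    if ((rc = cuInit(0)) != CUDA_SUCCESS) return rc;
    if ((rc = cuDeviceGet(&dev, 0)) != CUDA_SUCCESS) return rc;
    if ((rc = cuDevicePrimaryCtxRetain(&ctx, dev)) != CUDA_SUCCESS) return rc;
    if ((rc = cuCtxSetCurrent(ctx)) != CUDA_SUCCESS) return rc;

    // One JIT option: the optimization level for the PTX -> SASS step.
    CUjit_option option;
    void* value;
    makeOptLevelOption(level, &option, &value);
    if ((rc = cuModuleLoadDataEx(module, ptx, 1, &option, &value)) != CUDA_SUCCESS)
        return rc;
    return cuModuleGetFunction(function, *module, kernelName);
}
```

For example, compile with `nvcc -ptx kernel.cu -o kernel.ptx` (file names hypothetical), read the file into a NUL-terminated string, and call `loadKernelWithOptLevel(ptx, "testKernel", 0, &mod, &fn)`. Declaring the kernel `extern "C"` keeps its PTX entry name as plain `testKernel` instead of a mangled C++ name; the returned function can then be launched with cuLaunchKernel.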


Try introducing a branch which stores the variable:

__global__ void testKernel(float* a, float *b) 
{
    unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    float x;
    x = a[i];

    if(b)
    {
      *b = x;
    }
}

The cost of the branch is negligible compared to the cost of the memory transfer.

At the kernel launch site, simply pass a null pointer:

testKernel<<<...>>>(a, static_cast<float*>(0));

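nvcc does not constant-fold kernel arguments from the launch site into the kernel, so inside testKernel the value of b is unknown at compile time. Because the compiler cannot prove that the store is never executed, it has to keep the load of a[i].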
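To confirm that the load now survives, one can time the answer's kernel against the empty kernel from the question. The following is a minimal sketch using CUDA events; the helper names (timeLaunchMs, bandwidthGBps, runBandwidthTest) and the problem size are assumptions for illustration, and error checking is limited to the allocation. Because testKernel has no bounds check, the buffer is sized to exactly blocks * threads floats.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

// Kernels from the question and the answer.
__global__ void donothing() {}

__global__ void testKernel(float* a, float* b)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = a[i];   // the global load whose cost we want to measure
    if (b)            // b is a runtime argument, so this branch is not folded away
    {
        *b = x;
    }
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for `bytes` moved in `ms` milliseconds.
static double bandwidthGBps(size_t bytes, float ms)
{
    assert(ms > 0.0f);
    return static_cast<double>(bytes) / (static_cast<double>(ms) * 1.0e6);
}

// Time a single launch with CUDA events; returns elapsed milliseconds.
// load == true launches testKernel with a null b, otherwise donothing.
static float timeLaunchMs(bool load, float* d_a, unsigned int blocks, unsigned int threads)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    if (load)
        testKernel<<<blocks, threads>>>(d_a, static_cast<float*>(0));  // null b: no store
    else
        donothing<<<blocks, threads>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Allocate blocks * threads floats, then compare the empty kernel with testKernel.
// Returns false if the allocation fails.
static bool runBandwidthTest(unsigned int blocks, unsigned int threads)
{
    const size_t bytes = static_cast<size_t>(blocks) * threads * sizeof(float);
    float* d_a = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    cudaMemset(d_a, 0, bytes);                     // avoid reading uninitialized memory

    timeLaunchMs(true, d_a, blocks, threads);      // warm-up launch, result discarded
    float emptyMs = timeLaunchMs(false, d_a, blocks, threads);
    float loadMs  = timeLaunchMs(true,  d_a, blocks, threads);

    std::printf("donothing:  %.3f ms\n", emptyMs);
    std::printf("testKernel: %.3f ms (%.1f GB/s read)\n", loadMs, bandwidthGBps(bytes, loadMs));
    cudaFree(d_a);
    return true;
}
```

For example, `int main() { return runBandwidthTest(1u << 16, 256) ? 0 : 1; }` reads 64 MiB (16,777,216 floats). If testKernel still takes about as long as donothing, the load was probably removed; a clearly longer time suggests the memory traffic is actually happening.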

