Shared memory issue while debugging

I am trying to use Nsight to debug the following code:

__device__ void change(int shared[])
{
    if(threadIdx.x<10)
        shared[threadIdx.x]=threadIdx.x;
}
__global__ void MyK()
{
    int shared[10]; 
    change(shared);
    __syncthreads();
}

I am calling my kernel in the main method like this:

cudaSetDevice(1);
MyK<<<1,20>>>();

When I put a breakpoint before change(shared), I can see that the shared array is created and its values are set to 0. However, if I put the breakpoint after __syncthreads(), the debugger shows the following error:

cannot resolve name shared

Can't I pass my shared array to another device function?


You see "cannot resolve name shared" in the memory watch window because the compiler has optimised the array out: nothing in your kernel uses it after change(shared), so by the time you reach the second breakpoint it no longer exists. As user586831 mentioned earlier, try outputting the value, for example by returning it from your device function.
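To make that concrete, here is a minimal sketch of one way to keep the array live. It assumes a hypothetical extra kernel parameter `out` (not in the original code) that the kernel writes to, and it uses the default device instead of cudaSetDevice(1). Even then an optimised build may hold the values in registers, so a debug build (nvcc -G) is probably still the easiest way to inspect them in the debugger.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same helper as in the question: the first 10 threads fill one slot each.
__device__ void change(int values[])
{
    if (threadIdx.x < 10)
        values[threadIdx.x] = threadIdx.x;
}

// Variant of MyK; `out` is an assumed parameter added for illustration.
__global__ void MyK(int *out)
{
    int local[10] = {0};   // ordinary per-thread array, as in the question
    change(local);
    // Writing to global memory makes the result observable from the host,
    // so the stores done in change() can no longer be discarded as dead code.
    if (threadIdx.x < 10)
        out[threadIdx.x] = local[threadIdx.x];
}

int main()
{
    int h_out[10] = {0};
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_out));

    MyK<<<1, 20>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int i = 0; i < 10; ++i)
        printf("%d ", h_out[i]);
    printf("\n");

    cudaFree(d_out);
    return 0;
}
```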

On another note, I am not sure whether you meant a __shared__ array or are just referring to the array by its name, 'shared'. Either way, the code above does not use shared memory: "int shared[10]" is an ordinary local integer array. You need the __shared__ qualifier to declare shared memory, e.g. __shared__ int shared[10]; (or extern __shared__ int shared[]; when the size is supplied at kernel launch).
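For illustration, this is roughly what the kernel could look like with a real shared memory array passed to the device function. It is only a sketch: the `out` parameter and the read pattern after the barrier are assumptions added so that the shared values are actually used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A shared memory array can be passed to a device function as a plain pointer.
__device__ void change(int *buf)
{
    if (threadIdx.x < 10)
        buf[threadIdx.x] = threadIdx.x;
}

// `out` is an assumed output buffer with one int per thread.
__global__ void MyK(int *out)
{
    __shared__ int shared[10];   // one copy per block, visible to all its threads
    change(shared);
    __syncthreads();             // every thread of the block reaches this barrier
    // Each thread reads a slot that was written by a different thread.
    out[threadIdx.x] = shared[(threadIdx.x + 1) % 10];
}

int main()
{
    const int threads = 20;
    int h_out[threads] = {0};
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_out));

    MyK<<<1, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int i = 0; i < threads; ++i)
        printf("%d ", h_out[i]);
    printf("\n");

    cudaFree(d_out);
    return 0;
}
```

Here the __syncthreads() call does real work: without it, a thread could read a slot before the thread that owns it has written it.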


Is that the actual code, or did you omit __shared__ from the buffer declaration?

Also keep in mind that __device__ functions are usually inlined by the compiler, and that the debugger can stop only at certain points in the resulting code. Try launching the kernel with a thread count that is a multiple of 16 or 32; otherwise you are not running a full warp, and that might confuse the debugger.


Calling __syncthreads() from some but not all threads of a block can cause a deadlock, so make sure it never ends up inside the threadIdx.x < 10 branch (in the code as posted it sits in the kernel body, so all 20 threads reach it). As previously mentioned, you are not using shared memory here. The compiler is clever: if you do not use the value afterwards, the memory location can become invalid. Try outputting the value, for example as the return value of your device function. It should work fine, especially if you move or remove __syncthreads().
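As a sketch of the safe pattern, guard the work with the condition but leave the barrier unconditional. The kernel name `reverseFirstN` and its parameters are made up for this example; it also uses the extern __shared__ form, with the size passed as the third launch argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses the first n elements of data within one block.
__global__ void reverseFirstN(int *data, int n)
{
    extern __shared__ int tile[];   // size comes from the launch configuration
    int i = threadIdx.x;

    if (i < n)
        tile[i] = data[i];          // only the first n threads do the work

    __syncthreads();                // outside the branch: all threads arrive

    if (i < n)
        data[i] = tile[n - 1 - i];
}

int main()
{
    const int n = 10;
    int h_data[n];
    for (int i = 0; i < n; ++i)
        h_data[i] = i;

    int *d_data = nullptr;
    cudaMalloc(&d_data, sizeof(h_data));
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    // 20 threads as in the question; only 10 of them touch the data.
    reverseFirstN<<<1, 20, n * sizeof(int)>>>(d_data, n);
    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("%d ", h_data[i]);
    printf("\n");

    cudaFree(d_data);
    return 0;
}
```

Putting the barrier inside the `if (i < n)` block, or returning early for the threads with i >= n, is the situation that can hang.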
