printf inside CUDA global function

2022-12-18 10:48 问答作者：

I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I can not use printf inside a device function, is there something else I can do to see what is going on inside that function. This my current function:

__global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd){

    int tx = threadIdx.x;
    int ty = threadIdx.y;

    int bx = blockIdx.x;
    int by = blockIdx.y;

    float sum = 0;

    for( int k = 0; k < Ad.width ; ++k){
        float Melement = Ad.elements[ty * Ad.width + k];
        float Nelement = Bd.elements[k * Bd.width + tx];
        sum += Melement * Nelement;
    }

    Xd.elements[ty * Xd.width +开发者_开发问答 tx] = sum;
}

I would love to know if Ad and Bd is what I think it is, and see if that function is actually being called.

CUDA now supports printfs directly in the kernel. For formal description see Appendix B.16 of the CUDA C Programming Guide.

EDIT

To avoid misleading people, as M. Tibbits points out printf is available in any GPU of compute capability 2.0 and higher.

END OF EDIT

You have choices:

Use a GPU debugger, i.e. cuda-gdb on Linux or Nexus on Windows
Use cuprintf, which is available for registered developers (sign up here)
Manually copy the data that you want to see, then dump that buffer on the host after your kernel has completed (remember to synchronise)

Regarding your code snippet:

Consider passing the Matrix structs in via pointer (i.e. cudaMemcpy them to the device, then pass in the device pointer), right now you will have no problem but if the function signature gets very large then you may hit the 256 byte limit
You have inefficient reads from Ad, you will have a 32-byte transaction to the memory for each read into Melement - consider using shared memory as a staging area (c.f. the transposeNew sample in the SDK)

cuprintf
try Nexus http://developer.nvidia.com/object/nexus.html

by the way..

use shared memory
multiply outside of the loop
Look at this: http://www.seas.upenn.edu/~cis665/LECTURES/Lecture11.ppt

See "Formatted output" (currently B.17) section of CUDA C Programming Guide.

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

继续阅读：c gpu

printf inside CUDA global function

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？