
HELP! CUDA kernel will no longer launch after using too much memory

I'm writing a program that requires the following kernel launch:

dim3 blocks(16,16,16); //grid dimensions
dim3 threads(32,32); //block dimensions
get_gaussian_responses<<<blocks,threads>>>(pDeviceIntegral,itgStepSize,pScaleSpace);

I forgot to free the pScaleSpace array at the end of the program, and then ran the program through the CUDA profiler, which runs it 15 times in succession, using up a lot of memory / causing a lot of fragmentation. Now whenever I run the program, the kernel doesn't even launch. If I look at the list of function calls recorded by the profiler, the kernel is not there. I realize this is a pretty stupid error, but I don't know what I can do at this point to get the program to run again. I have restarted my computer, but that did not help. If I reduce the dimensions of the kernel, it runs fine, but the current dimensions are well within the allowed maximum for my card.

Max threads per block: 1024
Max grid dimensions: 65535,65535,65535

Any suggestions appreciated, thanks in advance!


Try launching with a smaller number of threads. If that works, it means that each of your threads is doing a lot of work or using a lot of memory, and so the maximum possible number of threads cannot practically be launched by CUDA on your hardware.

You may have to make your CUDA code more efficient to be able to launch more threads. You could try splitting your kernel into smaller pieces if it has complex logic inside it, or move to more powerful hardware.
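
As a quick sanity check, it also helps to query the CUDA error state right after the launch: an invalid launch configuration fails silently unless you ask for it. The fragment below is a minimal sketch, reusing the variables from the question and an arbitrary smaller block size:

dim3 blocks(16,16,16);                        // same grid as before
dim3 threads(16,16);                          // 256 threads per block instead of 1024
get_gaussian_responses<<<blocks,threads>>>(pDeviceIntegral,itgStepSize,pScaleSpace);

cudaError_t err = cudaGetLastError();         // reports invalid launch configurations
if (err != cudaSuccess)
    printf("launch failed: %s\n", cudaGetErrorString(err));

err = cudaThreadSynchronize();                // reports errors raised while the kernel runs
if (err != cudaSuccess)
    printf("execution failed: %s\n", cudaGetErrorString(err));

If the error string says "too many resources requested for launch" or "invalid configuration argument", the block/grid dimensions or per-thread resource usage are the problem rather than leftover allocations.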


If you compile your code like this:

nvcc -Xptxas="-v" [other compiler options]

the assembler will report the amount of registers, shared memory, and local memory that the code requires. This can be a useful diagnostic for seeing what the memory footprint of the kernel is. There is also an API call, cudaThreadSetLimit, which can be used to control the per-thread stack size and the device heap size that a kernel will try to consume during execution.
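
A minimal sketch of the latter, assuming the CUDA 4.0 runtime and arbitrary example values, called on the host before the first kernel launch:

cudaThreadSetLimit(cudaLimitStackSize, 4*1024);              // bytes of stack per thread
cudaThreadSetLimit(cudaLimitMallocHeapSize, 64*1024*1024);   // bytes available to in-kernel malloc()

size_t stackSize = 0;
cudaThreadGetLimit(&stackSize, cudaLimitStackSize);          // read back what was actually set
printf("per-thread stack: %lu bytes\n", (unsigned long)stackSize);

On later toolkits the same calls are named cudaDeviceSetLimit / cudaDeviceGetLimit.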

Recent toolkits ship with a utility called cuda-memcheck, which provides valgrind-like analysis of kernel memory accesses, including buffer overflows and illegal memory usage. It might be that your code is overflowing a buffer somewhere and overwriting other parts of GPU memory, leaving the card in a parlous state.
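
A typical invocation just wraps the normal command line (my_program is a placeholder for your executable):

cuda-memcheck ./my_program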


I got it! nVidia NSight 2.0 - which supposedly supports CUDA 4 - changed my CUDA_INC_PATH to use CUDA 3.2. No wonder it wouldn't let me allocate 1024 threads per block. All relief and jubilation aside, that is a really stupid and annoying bug considering I already had CUDA 4.0 RC2 installed.

