Why doesn't my kernel fail when I use a little more than 64kb of constant cache? (OpenCL/CUDA)

2023-01-23 19:49 Q&A author:

I ran some tests on my kernel, which uses constant cache. If I use 16,000 floats (16,000 * 4 bytes = 64 KB) then everything runs smoothly. If I use 16,200 it still runs smoothly. I get errors in my results (not from OpenCL) if I use 16,400 floats. Could it just be that technically there is 64.x KB of constant cache available? Should I even trust my code if I am using exactly 16,000 floats? Usually I expect code to break when you use something right up to its stated limit.

You can and should query this using the OpenCL clGetDeviceInfo API, with the parameter CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. The OpenCL 1.1 spec says that a conforming implementation has to provide at least 64K bytes, which is probably what your device is implementing.
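The numbers in the question are consistent with a limit of exactly 64 KB if that is read as 65,536 bytes rather than 64,000. Below is a minimal sketch of the arithmetic, assuming a 4-byte `float` and using the spec minimum in place of a queried value. The names `MIN_CONSTANT_BUFFER_BYTES`, `floats_that_fit` and `fits_in_constant` are made up for illustration. In a real program the limit would come from `clGetDeviceInfo`.

```c
#include <stddef.h>

/* Assumed limit: the 64 KB minimum that OpenCL 1.1 requires, read as
   65,536 bytes. A real program would use the value the device reports for
   CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE instead of this constant. */
#define MIN_CONSTANT_BUFFER_BYTES ((size_t)64 * 1024)

/* Number of whole floats that fit in a buffer of limit_bytes bytes. */
size_t floats_that_fit(size_t limit_bytes)
{
    return limit_bytes / sizeof(float);
}

/* Returns 1 if an array of n_floats floats fits within limit_bytes, else 0. */
int fits_in_constant(size_t n_floats, size_t limit_bytes)
{
    return n_floats <= floats_that_fit(limit_bytes);
}
```

Under these assumptions the limit works out to 16,384 floats, so 16,000 and 16,200 fit while 16,400 does not. That matches the behaviour described in the question.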

If you exceed this limit, then OpenCL should either give you an error or transparently move your constant array into a global memory array for you.

If it's not returning an error, but giving you bad results, that's a bug in your OpenCL implementation. That's not too surprising, since none of them are very mature yet. You should definitely report the bug to the vendor (which I assume is NVIDIA because of your references to CUDA), after making sure you've got the latest version installed, of course.

I haven't even glanced at GPU specs to find out which machines do and don't have hard limits of 64KB of constant memory; I'll assume you've made sure that this is in fact the limit on your card.

I will add the observation though that generally GPUs and their CUDA/OpenCL/whatever runtimes aren't very aggressive about catching or flagging errors, and certainly don't make an effort to fail if invalid parameters are used. While I've never seen it explicitly stated, my understanding is that this is partly to avoid overhead, but mostly to be as forgiving as possible; in a game, it's better that the monster's arm look funny for a few frames than that the entire game die because someone made a single out-of-bounds access.

For those doing GPGPU programming, this is awkward -- it's up to you to make sure all of your parameters and memory uses are valid, and if they aren't, the results can be weird: sometimes it will work, and often it won't. But such is the way of things. I certainly wouldn't count on things failing reliably, or in some obvious and helpful way, if you went a bit over a given memory limit.
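Since the runtime may not flag an invalid access, one option is to check ranges on the host before launching a kernel. Here is a minimal sketch of such a check. The helper `range_is_valid` is hypothetical and not part of OpenCL or CUDA, and it only covers the simple case of a contiguous range of elements in one buffer.

```c
#include <stddef.h>

/* Returns 1 if elements [offset, offset + count) lie inside a buffer of
   buffer_len elements, else 0. The comparison is arranged so that
   offset + count is never computed and therefore cannot overflow. */
int range_is_valid(size_t offset, size_t count, size_t buffer_len)
{
    return offset <= buffer_len && count <= buffer_len - offset;
}
```

A host program could call this with the offset and element count it is about to pass to a kernel, and refuse to launch if it returns 0, rather than relying on the device to fail.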
