Why doesn't my kernel fail when I use a little more than 64 KB of constant cache? (OpenCL/CUDA)
I ran some tests on my kernel, which uses constant cache. If I use 16,000 floats (16,000 * 4 bytes = 64 KB), everything runs smoothly. If I use 16,200, it still runs smoothly. If I use 16,400 floats, I get errors in my results (not from OpenCL). Could it just be that technically there is 64.x KB of constant cache available? Should I even trust my code if I am using exactly 16,000 floats? I usually expect code to break when you use something right up to the stated limit.
You can and should query this using the OpenCL clGetDeviceInfo API, with the parameter CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. The OpenCL 1.1 spec says that a conforming implementation has to provide at least 64K bytes, which is probably what your device is implementing.
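To make the sizes in the question concrete, here is a minimal C sketch of the arithmetic. It assumes the device reports exactly the spec minimum of 64 KB (65,536 bytes) and that a float is 4 bytes. The macro and function names are made up for illustration; on a real device the limit should be the value clGetDeviceInfo returns.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the device reports the OpenCL 1.1 minimum, 64 KB = 65,536 bytes.
   On real hardware, use the value clGetDeviceInfo returns for
   CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE instead of this constant. */
#define ASSUMED_CONSTANT_LIMIT_BYTES ((size_t)64 * 1024)

/* The arithmetic below assumes 4-byte floats. */
static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

/* Bytes occupied by `count` floats. */
size_t float_bytes(size_t count) {
    return count * sizeof(float);
}

/* Largest number of floats that fits in `limit_bytes`. */
size_t max_floats(size_t limit_bytes) {
    return limit_bytes / sizeof(float);
}
```

Under that assumption, 16,000 floats occupy 64,000 bytes and 16,200 floats occupy 64,800 bytes, both below 65,536. 16,400 floats need 65,600 bytes, which is 64 bytes over. The cutoff would sit at 16,384 floats, which would be consistent with the behaviour described in the question.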
If you exceed this limit, then OpenCL should either give you an error or transparently move your constant array into global memory for you.
If it returns no error but gives you bad results, that's a bug in your OpenCL implementation. That's not too surprising, since none of them are very mature yet. After making sure you have the latest version installed, you should definitely report the bug to the vendor (which I assume is NVIDIA, given your references to CUDA).
I haven't even glanced at GPU specs to find out which machines do and don't have hard limits of 64KB of constant memory; I'll assume you've made sure that this is in fact the limit on your card.
I will add the observation, though, that GPUs and their CUDA/OpenCL/whatever runtimes generally aren't very aggressive about catching or flagging errors, and they certainly don't make an effort to fail when invalid parameters are used. While I've never seen it explicitly stated, my understanding is that this is partly to avoid overhead, but mostly to be as forgiving as possible: in a game, it's better that the monster's arm looks funny for a few frames than that the entire game dies because someone made a single out-of-bounds access.
For those doing GPGPU programming, this is awkward: it's up to you to make sure all of your parameters and memory uses are valid, and if they aren't, the results can be weird. Sometimes it will work, and often it won't. But such is the way of things. I certainly wouldn't count on things failing reliably, or in some obvious and helpful way, if you go a bit over a given memory limit.
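Since the runtime may not fail loudly, one option is to validate the size on the host before uploading anything. The following is only a sketch of such a guard in plain C: the status codes and the function name are hypothetical, not part of OpenCL or CUDA, and `limit_bytes` is assumed to be whatever limit the runtime reports for the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not part of OpenCL or CUDA. */
typedef enum {
    CONST_FIT_OK = 0,
    CONST_FIT_TOO_LARGE = 1,
    CONST_FIT_OVERFLOW = 2
} const_fit_status;

/* Check that `count` elements of `elem_size` bytes fit in `limit_bytes`.
   `limit_bytes` is assumed to be the limit reported for the device. */
const_fit_status check_constant_fit(size_t count, size_t elem_size,
                                    size_t limit_bytes) {
    /* Reject sizes whose product would wrap around size_t. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return CONST_FIT_OVERFLOW;
    }
    /* Reject buffers larger than the reported limit. */
    if (count * elem_size > limit_bytes) {
        return CONST_FIT_TOO_LARGE;
    }
    return CONST_FIT_OK;
}
```

Calling something like `check_constant_fit(n, sizeof(float), limit)` before creating the buffer, and refusing to launch unless it returns `CONST_FIT_OK`, turns a silent wrong result into an explicit host-side failure.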