How to treat 64-bit words on a CUDA device?
I'd like to handle 64-bit words directly on the CUDA platform (e.g. uint64_t
variables).
I understand, however, that addressing space, registers and the SP architecture are all 32-bit based.
I actually found this to work correctly (on my CUDA cc1.1 card):
__global__ void test64Kernel( uint64_t *word )
{
(*word) <<= 56;
}
but I don't know, for example, how this affects register usage and the operations-per-clock-cycle count.
The width of addresses (32-bit, 64-bit, or anything else) does not affect which data types you can use. In your example you have a pointer (32-bit, 64-bit, 3-bit (!) - doesn't matter) to a 64-bit unsigned integer.
64-bit integers are supported in CUDA, but every 64-bit value holds twice as much data as a 32-bit value, so it occupies two 32-bit registers and arithmetic on it takes more instructions: adding two 64-bit integers is expanded onto the smaller data types, with a carry pushed into the next sub-word (see the sketch below). The compiler is an optimising compiler, so it will try to minimise the impact of this.
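As an illustration (a sketch, not something you would write yourself - the compiler emits the equivalent of this for you), a 64-bit add on 32-bit hardware decomposes roughly like this:

// Sketch only: what a 64-bit add amounts to on a 32-bit ALU.
// The 64-bit operands are held as (lo, hi) pairs of 32-bit registers.
__device__ void add64_emulated( unsigned int a_lo, unsigned int a_hi,
                                unsigned int b_lo, unsigned int b_hi,
                                unsigned int *r_lo, unsigned int *r_hi )
{
    unsigned int lo = a_lo + b_lo;               // add the low words
    unsigned int carry = (lo < a_lo) ? 1u : 0u;  // unsigned wrap-around means a carry occurred
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;                 // add the high words plus the carry
}

This is why a 64-bit operation costs roughly two registers and two or more instructions where the 32-bit equivalent costs one of each.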
Note that double precision floating point, which is also 64-bit, is only supported on devices with compute capability 1.3 or higher (i.e. 1.3 or 2.0 at this time).
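If you want to check the register usage for yourself, a small harness along the lines of the sketch below (not from the original post; names and values are illustrative) can be compiled with nvcc --ptxas-options=-v, which makes ptxas report the number of registers each kernel uses:

#include <stdio.h>
#include <stdint.h>
#include <cuda_runtime.h>

__global__ void test64Kernel( uint64_t *word )
{
    (*word) <<= 56;
}

int main( void )
{
    uint64_t h_word = 0x00000000000000FFULL;   /* expect 0xFF00000000000000 after the shift */
    uint64_t *d_word = NULL;

    cudaMalloc( (void**)&d_word, sizeof(uint64_t) );
    cudaMemcpy( d_word, &h_word, sizeof(uint64_t), cudaMemcpyHostToDevice );

    test64Kernel<<<1, 1>>>( d_word );

    cudaMemcpy( &h_word, d_word, sizeof(uint64_t), cudaMemcpyDeviceToHost );
    cudaFree( d_word );

    printf( "result = 0x%016llx\n", (unsigned long long)h_word );
    return 0;
}

Comparing the ptxas output for this kernel against a version that uses uint32_t gives you a concrete measure of the extra register cost on your cc1.1 card.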