
How to treat 64-bit words on a CUDA device?

I'd like to work directly with 64-bit words on the CUDA platform (e.g. uint64_t variables).

I understand, however, that the address space, the registers and the SP architecture are all 32-bit based.

I found that the following actually works correctly (on my CUDA compute capability 1.1 card):

__global__ void test64Kernel( uint64_t *word )
{
    (*word) <<= 56;
}

but I don't know, for example, how this affects register usage and the number of operations per clock cycle.


Whether addresses are 32-bit or anything else does not affect which data types you can use. In your example you have a pointer (32-bit, 64-bit, even 3-bit (!), it doesn't matter) to a 64-bit unsigned integer.

64-bit integers are supported in CUDA. Of course, every 64-bit value stores twice as much data as a 32-bit value, so it will use more registers and arithmetic operations will take longer: adding two 64-bit integers is expanded into operations on the smaller 32-bit sub-words, with a carry pushed from the low sub-word into the next one. The compiler is an optimising compiler, so it will try to minimise the impact of this.
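
The expansion described above can be sketched in code. This is only an illustration of the idea, not the instruction sequence the compiler actually emits: the `U64Parts` struct, the `HD` macro and the helper names `split`, `join`, `add64` and `shl56` are all hypothetical, introduced here for the example. The block compiles as plain C++ or, under nvcc, as host/device code.

```cpp
#include <cstdint>

// Hypothetical helper macro: lets the same functions build as plain C++
// or as CUDA host/device functions.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical illustration type: one 64-bit value held as two 32-bit words,
// roughly how it would occupy a pair of 32-bit registers.
struct U64Parts {
    uint32_t lo;  // bits 0..31
    uint32_t hi;  // bits 32..63
};

HD inline U64Parts split(uint64_t v) {
    U64Parts p;
    p.lo = static_cast<uint32_t>(v);
    p.hi = static_cast<uint32_t>(v >> 32);
    return p;
}

HD inline uint64_t join(U64Parts p) {
    return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit addition built from 32-bit operations.
HD inline U64Parts add64(U64Parts a, U64Parts b) {
    U64Parts r;
    r.lo = a.lo + b.lo;                        // wraps modulo 2^32
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped => carry out of the low word
    r.hi = a.hi + b.hi + carry;                // carry pushed into the high word
    return r;
}

// Left shift by 56, as in the question's kernel: only the low 8 bits of the
// low word survive, and they land in the top 8 bits of the high word.
HD inline U64Parts shl56(U64Parts a) {
    U64Parts r;
    r.hi = a.lo << 24;  // 56 - 32 = 24
    r.lo = 0u;
    return r;
}
```

For example, `join(add64(split(0xFFFFFFFFULL), split(1ULL)))` yields `0x100000000ULL`, because the low word wraps to zero and the carry increments the high word. To see how many registers a real kernel ends up using, compiling with `nvcc --ptxas-options=-v` prints the per-kernel register count.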

Note that double-precision floating point, which is also 64-bit, is only supported on devices with compute capability 1.3 or higher (i.e. 1.3 or 2.0 at this time).
