
How do CUDA devices handle immediate operands?

When CUDA code is compiled with immediate (integer) operands, are they held in the instruction stream, or are they placed into memory? Specifically, I'm thinking about 24- or 32-bit unsigned integer operands.

I haven't been able to find information about this in any of the CUDA documentation I've examined so far, so references to any documents covering microarchitectural details like this would be perfect, as I don't currently have a good model of how CUDA works at this level.


NVIDIA doesn't release any information about how the devices work at this level. There is a tool called decuda that can decompile cubins, so you can see the machine code. If I recall correctly, immediates go into the instruction stream, at least as far as decuda is able to deduce. The problem with decuda is that it only works with CUDA 2.3 or lower: the executable format changed to ELF in CUDA 3.0, and decuda hasn't been maintained in a long time.
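One way to check this for yourself (a sketch; the kernel name and the constant are arbitrary choices, and decuda only applies to pre-3.0 toolkits) is to compile a trivial kernel containing a distinctive 32-bit immediate and look for that constant in the disassembly:

```cuda
// imm_test.cu -- minimal kernel using a 32-bit unsigned immediate.
// Compile to a cubin:   nvcc -cubin imm_test.cu
// Disassemble:          decuda imm_test.cubin   (CUDA 2.3 or earlier)
__global__ void addImmediate(unsigned int *out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    // 0xDEADBEEF is easy to spot in the output. If it shows up encoded
    // inside a decoded instruction, the immediate lives in the
    // instruction stream; if you instead see a load from constant
    // memory, the compiler placed it there.
    out[i] = out[i] + 0xDEADBEEFu;
}
```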

The best official documentation is the PTX documentation, but that documents a virtual machine ISA, not the real device.


If I recall correctly, integer division (for example) is very costly, while some floating point operations (like sinf(..)) are implemented in hardware and are therefore fast.
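The contrast can be seen in a short kernel (a sketch, not benchmarked here; note it uses the `__sinf` intrinsic, which maps to the hardware special function unit at reduced accuracy, whereas plain `sinf` is a software routine):

```cuda
__global__ void costContrast(float *f, unsigned int *u, unsigned int d)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    // __sinf() executes on the special function unit: very fast,
    // but less accurate than the full sinf() software implementation.
    f[i] = __sinf(f[i]);
    // There is no hardware integer-divide instruction; division by a
    // runtime value compiles to a long multi-instruction sequence,
    // which is why it is comparatively expensive.
    u[i] = u[i] / d;
}
```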

This talk gave me some insight: "CUDA Tricks for Computational Physics" http://physics.bu.edu/~kbarros/talks/

