Should I expect differences in the output of a CUDA application between GPU generations?
I have some code that is compiled and tested on both Tesla and Fermi generation chipsets.
Across all Tesla generation chips (260, 280, C1060) the output is consistent.
Across all Fermi generation chips (460-580, C2080) the output is consistent.
However, the output images differ subtly between the Tesla and Fermi generations.
Is this to be expected? There is floating point math in the code, and precision is my first suspicion, but I can't find any mention of it in Nvidia's docs.
You should also check out my whitepaper and webinar on floating point for NVIDIA GPUs (I'm an NVIDIA employee).
http://developer.nvidia.com/content/everything-you-ever-wanted-know-about-floating-point-were-afraid-ask
To answer the question, there are indeed numeric differences between the hardware generations. The "compute capability" tells you what features the chip has. Devices of compute capability 1.0-1.2 support only single precision. Single precision on these devices is flush-to-zero, meaning it doesn't support denormal numbers. Some operations, such as division and square root, are not correctly rounded (they use fast hardware approximations to the functions).
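To make the two 1.x effects concrete, here is a small host-side C++ sketch, assuming an ordinary x86-64 compiler with default (non-fast-math) flags so the CPU behaves like a fully IEEE 754 device. `flush_to_zero`, `mul_ftz` and `div_via_reciprocal` are hypothetical helpers that only approximate 1.x behavior: FTZ is emulated by zeroing subnormal inputs and results, and "not correctly rounded" division is stood in for by rounding `1/y` first and then multiplying (two roundings instead of one). The real 1.x hardware approximations may differ in detail.

```cpp
#include <cmath>

// Hypothetical helper: replace a subnormal value with a zero of the same sign,
// roughly what flush-to-zero single precision on compute capability 1.x does.
float flush_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// IEEE 754 single-precision multiply: subnormal results are preserved.
float mul_ieee(float a, float b) { return a * b; }

// Emulated FTZ multiply: subnormal inputs and results are treated as zero.
float mul_ftz(float a, float b) {
    return flush_to_zero(flush_to_zero(a) * flush_to_zero(b));
}

// Correctly rounded division: the exact quotient is rounded once.
float div_ieee(float x, float y) { return x / y; }

// Hypothetical stand-in for a fast, non-correctly-rounded division:
// round 1/y to float first, then multiply, so the result is rounded twice.
float div_via_reciprocal(float x, float y) {
    volatile float r = 1.0f / y;  // volatile keeps the two steps separate
    return x * r;
}
```

For example, `mul_ieee(1e-20f, 1e-20f)` gives a tiny subnormal (about 1e-40) while `mul_ftz` gives 0, and `div_via_reciprocal(5.0f, 3.0f)` comes out one ulp above `div_ieee(5.0f, 3.0f)`. Differences of exactly this size are what tend to show up as "subtly different" images.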
Devices of compute capability 1.3 added support for double precision. Double precision is correctly rounded and supports denormals. Double precision also has a fused multiply-add, which increases precision.
Devices of compute capability 2.0 and later upgraded single-precision floating point. Single precision is now correctly rounded and supports denormals. These devices also provide a fused multiply-add in single precision, not just in double precision.
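The fused multiply-add point is easy to underestimate, so here is a minimal C++ sketch of why it changes results. It contrasts computing `a*b + c` with two roundings against `std::fma`, which rounds only once. The function names are hypothetical; the `volatile` store is there only to stop the host compiler from contracting the unfused version into an FMA itself.

```cpp
#include <cmath>

// Multiply-add with two roundings: round(a*b), then round(that + c).
// Storing the product in a volatile prevents contraction into a single FMA.
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;
    return p + c;
}

// Fused multiply-add: a*b + c is computed exactly and rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With `a = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact value of `a*a + c` is `2^-24`. The unfused version loses that low bit when rounding `a*a` to float and returns 0, while the fused version returns `2^-24`. On the GPU side, CUDA's `__fmul_rn()` and `__fadd_rn()` intrinsics are documented as never being merged into a multiply-add, which can help when chasing this kind of difference.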
In the Fermi Tuning Guide there is a section about IEEE 754-2008 Compliance which states:
Devices of compute capability 2.x have far fewer deviations from the IEEE 754-2008 floating point standard than devices of compute capability 1.x, particularly in single precision (Section F.2). This can cause slight changes in numeric results between devices of compute capability 1.x and devices of compute capability 2.x.
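Given these slight changes, one practical approach is to compare Tesla and Fermi outputs with a small tolerance in units in the last place (ulps) rather than bit for bit. The sketch below is one way to do that, assuming the results are available as `float` buffers on the host. The helper names and whatever `max_ulps` threshold you choose are assumptions, not anything NVIDIA prescribes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Map a float's bit pattern onto an ordered integer line so that adjacent
// floats differ by exactly 1 (and +0.0f / -0.0f both map to 0).
std::int64_t ordered_bits(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    // Negative floats: larger bit patterns mean larger magnitudes, so negate.
    return i < 0 ? -static_cast<std::int64_t>(i & 0x7FFFFFFF) : i;
}

// Number of representable floats between a and b; NaN compares as "infinitely far".
std::int64_t ulp_distance(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return std::numeric_limits<std::int64_t>::max();
    std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

bool nearly_equal(float a, float b, std::int64_t max_ulps) {
    return ulp_distance(a, b) <= max_ulps;
}

// Worst-case ulp difference between two result buffers.
// Buffers of different sizes are reported as maximally different.
std::int64_t max_ulp_difference(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) return std::numeric_limits<std::int64_t>::max();
    std::int64_t worst = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, ulp_distance(a[i], b[i]));
    return worst;
}
```

If the final images are quantized to 8-bit pixels, the same idea applies with a tolerance of one or two intensity levels instead of ulps.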
The full document is available in the downloads section of the CUDA website.