
Do register and shared memory usage depend on the compute capability I compile for?

Hey there, when I compile with nvcc -arch=sm_13 I get:

ptxas info    : Used 29 registers, 28+16 bytes smem, 7200 bytes cmem[0], 8 bytes cmem[1] 

When I use nvcc -arch=sm_20 I get:

ptxas info    : Used 34 registers, 60 bytes cmem[0], 7200 bytes cmem[2], 4 bytes cmem[16] 

I thought all kernel parameters were passed in shared memory, but for sm_20 that doesn't seem to be the case. Perhaps they are also passed in registers? The signature of my function looks like this:

__global__ void func(double *, double , double, int)

Thanks so far!


As @talonmies states, the shared memory difference is due to SM 2.x devices passing kernel arguments via constant memory rather than shared memory.
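
As a rough sanity check on the numbers in the question, the size of the argument list can be worked out by hand. The sketch below is host-only code and assumes a 64-bit build (8-byte pointers) with each argument placed at its natural alignment; `Param`, `param_bytes` and `func_param_bytes` are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// One kernel argument: its size and required alignment in bytes.
struct Param {
    std::size_t size;
    std::size_t align;
};

// Lay the arguments out in declaration order, rounding each offset up to
// the argument's alignment, and return the total bytes occupied.
template <std::size_t N>
std::size_t param_bytes(const Param (&params)[N]) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t a = params[i].align;
        offset = (offset + a - 1) / a * a;  // round up to a multiple of a
        offset += params[i].size;
    }
    return offset;
}

// Arguments of func(double *, double, double, int) from the question.
// Assumption: 64-bit build, so the pointer takes 8 bytes.
std::size_t func_param_bytes() {
    const Param params[] = {
        {8, 8},  // double *
        {8, 8},  // double
        {8, 8},  // double
        {4, 4},  // int
    };
    return param_bytes(params);
}
```

Under these assumptions `func_param_bytes()` returns 28, which matches the 28 in "28+16 bytes smem" reported for sm_13. That is consistent with the arguments living in shared memory on that target and no longer showing up there for sm_20.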

However, one of the main differences in register usage on SM 2.x devices is that SM 1.x devices have dedicated address registers for load and store instructions, whereas SM 2.x uses general-purpose registers for addresses. This tends to increase register pressure on SM 2.x. Luckily, the register file is also 2x larger on GF100 (SM 2.0) than on GT200 (SM 1.3).


In compute capability 2.x devices, arguments to kernels are stored in constant memory. The register difference is probably down to differences in the code generated for math library functions between versions. Are there things like transcendental functions or sqrt in the kernel?
