
Is local memory access coalesced?

Suppose I declare a local variable in a CUDA kernel function, one per thread:

float f = ...; // some calculations here

Suppose also that the compiler placed the declared variable in local memory (which, as far as I know, is the same as global memory except that it is visible to only one thread). My question is: will accesses to f be coalesced when reading it?


I don't believe there is official documentation of how local memory (or the stack, on Fermi) is laid out, but I am fairly certain that per-multiprocessor allocations are interleaved ("striped") per thread, so that non-diverging threads in the same warp get coalesced access to their local memory. On Fermi, local memory is also cached through the same L1/L2 hierarchy as global memory.
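The striped layout above can be made concrete with a sketch. The kernel below uses a per-thread array with a dynamic index, which typically forces the compiler to place it in local memory rather than registers; the kernel name, array size, and parameters are illustrative assumptions, not from the question.

```cuda
// Sketch only: a kernel whose per-thread array is likely spilled to local
// memory, because dynamic indexing prevents keeping it in registers.
__global__ void spill_demo(const int *idx, float *out)
{
    float buf[64];  // per-thread array, likely placed in local memory
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = 0; i < 64; ++i)
        buf[i] = (float)(tid + i);  // each thread fills its own copy

    // Because local memory is interleaved per thread, buf[k] for all
    // threads of a warp lands in consecutive words; as long as the warp
    // does not diverge, this read is serviced as a coalesced access.
    out[tid] = buf[idx[tid] & 63];
}
```

You can check whether local memory is actually in play by compiling with `nvcc -Xptxas -v`, which reports spill stores/loads and local memory usage per kernel.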


CUDA keeps local variables in registers whenever it can; a scalar like f only ends up in local memory if the kernel runs out of registers (or the variable is an array indexed dynamically) and the compiler spills it. Complex kernels with many live variables consume more registers per thread, which reduces the number of threads that can run concurrently, a condition known as low occupancy.
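The occupancy effect mentioned above can be measured at runtime with the CUDA occupancy API rather than estimated by hand. This is a minimal sketch; the kernel and the block size of 256 are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; in practice, heavy register use here would
// lower the number of resident blocks reported below.
__global__ void heavy_kernel(float *out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = 0.0f;
}

int main()
{
    // Ask the runtime how many blocks of 256 threads fit on one SM,
    // given this kernel's actual register and shared-memory usage.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, heavy_kernel, /*blockSize=*/256, /*dynamicSmemSize=*/0);
    printf("active blocks per SM: %d\n", blocksPerSM);
    return 0;
}
```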
