Can I prefetch specific data to a specific cache level in a CUDA kernel?
I understand that Fermi GPUs support prefetching to the L1 or L2 cache. However, I cannot find anything about it in the CUDA reference manual.
Does CUDA allow my kernel code to prefetch specific data to a specific level of cache?
Not at the instruction level, but there is detailed information about prefetching on GPUs here:
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
(paper in the ACM Symposium on Microarchitecture, 2010)
You can find the instruction reference in NVIDIA's PTX ISA reference document; the relevant instructions are prefetch and prefetchu.
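Since CUDA C++ has no dedicated prefetch intrinsic for this, one way to reach those PTX instructions from a kernel is inline assembly. Below is a minimal sketch, assuming a 64-bit build (hence the "l" register constraint) and a GPU of compute capability 2.0 or newer. The names prefetch_l2, prefetchu_l1, scale_with_prefetch and run_scale_demo, and the prefetch distance of 2 grid strides, are all hypothetical choices made for illustration. A prefetch is only a hint, so whether it actually improves performance depends on the hardware and the access pattern and has to be measured.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: hint that the line containing p should be brought into L2.
// Replace .L2 with .L1 to target the L1 cache instead.
__device__ __forceinline__ void prefetch_l2(const void* p) {
    asm volatile("prefetch.global.L2 [%0];" :: "l"(p));
}

// Hypothetical helper: prefetchu takes a generic address and targets the uniform cache.
__device__ __forceinline__ void prefetchu_l1(const void* p) {
    asm volatile("prefetchu.L1 [%0];" :: "l"(p));
}

// Grid-stride loop that prefetches the element `dist` strides ahead of the one
// it is about to use. `dist` is an assumed tuning knob, not a recommended value.
__global__ void scale_with_prefetch(const float* in, float* out, int n, float a, int dist) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        long long ahead = (long long)i + (long long)dist * stride;
        if (ahead < n) prefetch_l2(in + ahead);  // stay inside the allocation
        out[i] = a * in[i];
    }
}

// Hypothetical host driver: runs the kernel and returns the largest absolute
// difference from a host-side reference, or -1 on any CUDA error.
float run_scale_demo(int n, float a) {
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1.0f; }

    cudaError_t err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale_with_prefetch<<<4, 128>>>(d_in, d_out, n, a, 2);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess)  // this copy also waits for the kernel to finish
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        max_err = std::max(max_err, std::fabs(h_out[i] - a * h_in[i]));
    return max_err;
}
```

For example, run_scale_demo(10000, 2.5f) should return 0 on a working device, because the prefetch changes only where the data is cached and never the values that are computed.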