Why does CUDA Profiler indicate replayed instructions: 82% != global replay + local replay + shared replay?
I got some numbers from the CUDA Profiler, and I am confused about why Replayed Instructions != Global memory replay + Local memory replay + Shared bank conflict replay.
Here is the information I got from the profiler:
Replayed Instructions(%): 81.60
Global memory replay(%): 21.80
Local memory replays(%): 0.00
Shared bank conflict replay(%): 0.00
Could you help me understand this? Are there other cases that cause instruction replay?
Because the SM can replay instructions for other reasons as well, such as divergent branching.
So my guess is that roughly 60% of your issued instructions are being reissued due to branching and about 20% due to global memory. Can you post a snippet?
From the F1 Help menu of the CUDA 4.0 profiler:
Replayed Instructions (%) This gives the percentage of instructions replayed during kernel execution. Replayed instructions are the difference between the numbers of instructions that are actually issued by the hardware to the number of instructions that are to be executed by the kernel. Ideally this should be zero. This is calculated as 100 * (instructions issued - instruction executed) / instruction issued
Global memory replay (%) Percentage of replayed instructions caused due to global memory accesses. This is calculated as 100 * (l1 global load miss) / instructions issued
Local memory replay (%) Percentage of replayed instructions caused due to local memory accesses. This is calculated as 100 * (l1 local load miss + l1 local store miss) / instructions issued
Shared bank conflict replay (%) Percentage of replayed instructions caused due to shared memory bank conflicts. This is calculated as 100 * (l1 shared conflict)/ instructions issued
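To make the arithmetic concrete, here is a minimal sketch that applies the four formulas quoted above. The counter values are hypothetical, chosen only so that the results match the percentages in the question; the function name and argument names are my own and not part of any profiler API. Note that the three sub-metrics only count specific L1 events, so nothing in the formulas forces them to add up to the total.

```python
def replay_breakdown(issued, executed, l1_global_load_miss,
                     l1_local_load_miss, l1_local_store_miss,
                     l1_shared_conflict):
    """Apply the profiler help formulas.

    Every result is a percentage of instructions issued.
    """
    if issued <= 0:
        raise ValueError("issued must be positive")

    def pct(count):
        return 100.0 * count / issued

    total = pct(issued - executed)
    global_replay = pct(l1_global_load_miss)
    local_replay = pct(l1_local_load_miss + l1_local_store_miss)
    shared_replay = pct(l1_shared_conflict)
    return {
        "replayed": total,
        "global": global_replay,
        "local": local_replay,
        "shared": shared_replay,
        # Replays that none of the three sub-metrics account for.
        "unattributed": total - (global_replay + local_replay + shared_replay),
    }


# Hypothetical counter values (not taken from a real profiler run),
# picked to reproduce the percentages posted in the question.
metrics = replay_breakdown(
    issued=100_000,
    executed=18_400,
    l1_global_load_miss=21_800,
    l1_local_load_miss=0,
    l1_local_store_miss=0,
    l1_shared_conflict=0,
)

for name, value in metrics.items():
    print(f"{name:>12}: {value:6.2f}%")
```

With these inputs, about 59.8% of issued instructions are replays that the global, local, and shared metrics do not explain, which is the gap the question asks about.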