Time to be used in calculating bandwidth
I am trying to find the effective bandwidth used by my code against the CUDA GEforce 8800 gtx maximum of 86GB/s .I am not sure what time to use though .Currently I am using the difference between calling the kernel with my instructions against calling the kernel with no instructions.Is this the correct approach?(formula i use is ->effective bw= (bytes read+written)/time)
Also I开发者_如何学编程 get a really bad kernel call overhead (close to 1 sec) .Is there a way to get rid of it?
You can time your kernel fairly precisely with cuda events.
//declare the events
cudaEvent_t start;
cudaEvent_t stop;
float kernel_time;
//create events before you use them
cudaEventCreate(&start);
cudaEventCreate(&stop);
//put events and kernel launches in the stream/queue
cudaEventRecord(start,0);
myKernel <<< config >>>( );
cudaEventRecord(stop,0);
//wait until the stop event is recorded
cudaEventSynchronize(stop);
//and get the elapsed time
cudaEventElapsedTime(&kernel_time,start,stop);
//cleanup
cudaEventDestroy(start);
cudaEVentDestroy(stop);
Effective Bandwidth in GBps= ( (Br + Bw)/10^9 ) / Time
Br = number of bytes read by kernel from DRAM
Bw = number of bytes written by kernel in DRAM
Time = time taken by kernel.
For example you test the effective bandwidth of copying a 2048x2048 matrix of floats (4 bytes each) from one locations to another in GPU's DRAM. The formula would be:
Bandwidth in GB/s = ( (2048x2048 x 4 x 2)/10^9 ) / time-taken-by-kernel
here:
2048x2048 (matrix elements)
4 (each element has 4 bytes)
2 (one for read and one for write)
/10^9 to covert B into GB.
精彩评论