Time to be used in calculating bandwidth

I am trying to find the effective bandwidth achieved by my code on a GeForce 8800 GTX, compared with the card's maximum of 86 GB/s. I am not sure which time to use, though. Currently I use the difference between the time of a kernel call with my instructions and the time of a kernel call with no instructions. Is this the correct approach? (The formula I use is: effective bandwidth = (bytes read + bytes written) / time.)

Also, I get a really bad kernel call overhead (close to 1 sec). Is there a way to get rid of it?


You can time your kernel fairly precisely with CUDA events.

//declare the events
cudaEvent_t start;
cudaEvent_t stop;
float kernel_time;

//create events before you use them
cudaEventCreate(&start);
cudaEventCreate(&stop);

//put events and kernel launches in the stream/queue
cudaEventRecord(start,0);
myKernel <<< config >>>( );
cudaEventRecord(stop,0);

//wait until the stop event is recorded
cudaEventSynchronize(stop);

//and get the elapsed time (reported in milliseconds)
cudaEventElapsedTime(&kernel_time,start,stop);

//cleanup
cudaEventDestroy(start);
cudaEventDestroy(stop);


Effective bandwidth in GB/s = ( (Br + Bw) / 10^9 ) / Time

Br = number of bytes read by the kernel from DRAM

Bw = number of bytes written by the kernel to DRAM

Time = time taken by the kernel, in seconds.

For example, suppose you test the effective bandwidth of copying a 2048x2048 matrix of floats (4 bytes each) from one location to another in the GPU's DRAM. The formula would be:

Bandwidth in GB/s = ( (2048x2048 x 4 x 2)/10^9 ) / time-taken-by-kernel

here:

2048x2048 (matrix elements)

4 (each element has 4 bytes)

2 (one for read and one for write)

/10^9 to convert bytes into GB.
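The formula above can be wrapped in a small host-side helper. This is only a sketch: the function name `effective_bandwidth_gbps` and its parameters are made up for illustration, and it assumes the elapsed time is passed in milliseconds (the unit that `cudaEventElapsedTime` reports), so it is converted to seconds before dividing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the CUDA API): effective bandwidth in GB/s,
// where 1 GB = 10^9 bytes.
// elapsed_ms is assumed to be in milliseconds, as returned by cudaEventElapsedTime.
double effective_bandwidth_gbps(std::size_t bytes_read,
                                std::size_t bytes_written,
                                double elapsed_ms)
{
    assert(elapsed_ms > 0.0);  // a zero or negative time would be meaningless

    const double seconds   = elapsed_ms / 1e3;  // ms -> s
    const double gigabytes =
        static_cast<double>(bytes_read + bytes_written) / 1e9;  // B -> GB

    return gigabytes / seconds;
}
```

For the 2048x2048 float copy, the kernel reads 2048 x 2048 x 4 = 16777216 bytes and writes the same amount. With a purely hypothetical kernel time of 1.0 ms (not a measurement), `effective_bandwidth_gbps(16777216, 16777216, 1.0)` comes out at roughly 33.55 GB/s.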
