CUDA: Host-to-device bandwidth greater than the peak bandwidth of PCIe?
I used the attached plot for another question as well. It shows that the peak bandwidth is more than 5.5 GB/s. I am using NVIDIA's bandwidth test program from the code samples to measure the bandwidth from host to device and vice versa. The system consists of a total of 12 Intel Westmere CPUs on two sockets and 4 Tesla C2050 GPUs in 4 PCIe Gen2 slots. Now the question is: since the peak bandwidth of PCIe x16 Gen2 is 4 GB/s in one direction, how come I am getting much more bandwidth than that on host-to-device transfers?
My understanding is that each PCIe slot is connected to the CPU via an I/O Controller Hub, which in turn is connected to the CPU through QPI (which has much higher bandwidth).
The peak bandwidth of PCIe x16 Gen2 is 8 GB/s in each direction, not 4 GB/s. You are not exceeding the peak.
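The 8 GB/s figure can be checked with a little arithmetic. The sketch below is only an illustration, and it assumes numbers that are not stated in the posts above: PCIe Gen2 signals at 5 GT/s per lane and uses 8b/10b encoding, so 8 of every 10 bits on the wire carry payload. The function name `pcie_peak_gb_per_s` is made up for this example.

```python
# Theoretical peak bandwidth of a PCIe link in one direction.
# Assumptions (not from the original posts): Gen2 runs at 5 GT/s per lane,
# Gen1 at 2.5 GT/s per lane, and both use 8b/10b line encoding.

def pcie_peak_gb_per_s(transfer_rate_gt_s, lanes, payload_bits=8, line_bits=10):
    """Return peak one-way bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # Usable gigabits per second on a single lane after encoding overhead.
    usable_gbit_per_lane = transfer_rate_gt_s * payload_bits / line_bits
    # Sum over all lanes, then convert bits to bytes.
    return usable_gbit_per_lane * lanes / 8

gen2_x16 = pcie_peak_gb_per_s(5.0, 16)
gen1_x16 = pcie_peak_gb_per_s(2.5, 16)

print(f"PCIe Gen2 x16: {gen2_x16} GB/s per direction")
print(f"PCIe Gen1 x16: {gen1_x16} GB/s per direction")
```

Under these assumptions, Gen2 x16 comes out at 8 GB/s per direction, and 4 GB/s is what the same formula gives for Gen1 x16, which may be where the figure in the question came from. Measured transfers stay below the theoretical peak because of protocol overhead, so a reading a little above 5.5 GB/s is consistent with a Gen2 x16 link.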