CUDA 4.0 Peer to Peer Access confusion

I have two questions related to CUDA 4.0 Peer access:

  1. Is there any way I can chain copies, e.g. GPU#0 ---> GPU#1 ---> GPU#2 ---> GPU#3? At present my code works fine when I use just two GPUs at a time, but it fails as soon as I check peer access on a third GPU with cudaDeviceCanAccessPeer. The following works on its own: cudaDeviceCanAccessPeer(&flag_01, dev0, dev1). But when I have two such calls, cudaDeviceCanAccessPeer(&flag_01, dev0, dev1) and cudaDeviceCanAccessPeer(&flag_12, dev1, dev2), the latter fails (0 is returned in flag_12). (See the sketch after this list for how the pairs can be queried.)

  2. Does peer copy work only for GPUs connected to a common PCIe root, or does it depend on the underlying PCIe interconnect in some other way? I do not understand PCIe well, but nvidia-smi shows that the PCIe buses of the GPUs are 2, 3, 83 and 84.
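For reference, here is a minimal sketch (not the exact code from the question; error checking omitted for brevity) that prints each device's PCI bus ID and queries cudaDeviceCanAccessPeer for every ordered pair, so the whole P2P topology is visible at once:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);

        // Print each device's PCI bus ID (should correspond to what nvidia-smi reports).
        for (int d = 0; d < n; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("GPU %d: %s, pciBusID %d\n", d, prop.name, prop.pciBusID);
        }

        // Query peer access for every ordered pair of devices.
        for (int src = 0; src < n; ++src) {
            for (int dst = 0; dst < n; ++dst) {
                if (src == dst) continue;
                int canAccess = 0;
                cudaDeviceCanAccessPeer(&canAccess, src, dst);
                printf("GPU %d -> GPU %d : %s\n",
                       src, dst, canAccess ? "peer access possible" : "no peer access");
            }
        }
        return 0;
    }

On a system like the one described, this kind of dump typically shows peer access only between GPUs that sit behind the same PCIe root.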

The testbed is a dual-socket, 6-core Intel Westmere system with 4 GPUs (NVIDIA Tesla C2050).

EDIT: bandwidthTest results for HtoD and DtoH transfers, and simpleP2P results between two GPUs (DtoD):

I suspect the dual-IOH topology of your system is the problem. From an upcoming NVIDIA document:

NVIDIA GPUs are designed to take full advantage of the PCI-e Gen2 standard, including the Peer-to-Peer communication, but the IOH chipset does not support the full PCI-e Gen2 specification for P2P communication with other IOH chipsets.

The cudaDeviceEnablePeerAccess() API call will return an error code if the application tries to establish a P2P relationship between two GPUs that would require P2P communication over QPI. The cudaMemcpy() function for P2P Direct Transfers automatically falls back to using a Device-to-Host-to-Device path, but there is no automatic fallback for P2P Direct Access (P2P load/store instructions in device code).
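A minimal sketch of how that plays out in practice (device indices and buffer size are arbitrary; error checking trimmed): the enable call is attempted, and the copy itself goes through cudaMemcpyPeer(), which works whether or not direct P2P was established, staging through host memory when it has to:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 64 << 20;   // 64 MB test buffer
        float *src = nullptr, *dst = nullptr;

        cudaSetDevice(0);
        cudaMalloc(&src, bytes);
        cudaSetDevice(1);
        cudaMalloc(&dst, bytes);

        // Try to let device 1 access device 0's memory directly.
        // On dual-IOH systems this can fail, as described above.
        cudaSetDevice(1);
        cudaError_t err = cudaDeviceEnablePeerAccess(0, 0);
        if (err != cudaSuccess) {
            printf("Direct P2P not available: %s\n", cudaGetErrorString(err));
            cudaGetLastError();  // clear the sticky error state
        }

        // cudaMemcpyPeer works either way: it uses the direct path when P2P
        // is enabled, and otherwise stages the transfer through host memory.
        cudaMemcpyPeer(dst, 1, src, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(dst);
        cudaSetDevice(0);
        cudaFree(src);
        return 0;
    }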

One known example system is the HP Z800 workstation with dual IOH chipsets, which can run the simpleP2P example, but bandwidth is very low (100s of MB/s instead of several GB/s) because of the fallback path.
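If you want to check which path you are actually getting, a rough bandwidth measurement of a peer copy (a simplified version of what the simpleP2P sample reports; buffer size and device indices are arbitrary) makes the fallback obvious:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Time one device-to-device peer copy and report effective bandwidth.
    int main() {
        const size_t bytes = 256 << 20;  // 256 MB
        float *a = nullptr, *b = nullptr;

        cudaSetDevice(0); cudaMalloc(&a, bytes);
        cudaSetDevice(1); cudaMalloc(&b, bytes);

        cudaSetDevice(0);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        cudaMemcpyPeer(b, 1, a, 0, bytes);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbps = (bytes / 1.0e9) / (ms / 1.0e3);
        printf("Peer copy: %.2f ms, %.2f GB/s\n", ms, gbps);
        // Several GB/s suggests a direct P2P path; a few hundred MB/s
        // suggests the Device-to-Host-to-Device fallback described above.

        cudaEventDestroy(start); cudaEventDestroy(stop);
        cudaSetDevice(1); cudaFree(b);
        cudaSetDevice(0); cudaFree(a);
        return 0;
    }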

NVIDIA is investigating whether GPU P2P across QPI can be supported by adding functionality to future GPU architectures.

Reference: Intel® 5520 Chipset and Intel® 5500 Chipset Datasheet, Table 7-4: Inbound Memory Address Decoding: “The IOH does not support non-contiguous byte enables from PCI Express for remote peer-to-peer MMIO transactions. This is an additional restriction over the PCI Express standard requirements to prevent incompatibility with Intel QuickPath Interconnect”. -- http://www.intel.com/Assets/PDF/datasheet/321328.pdf

In general we advise building multi-GPU workstations and clusters that have all PCI-express slots intended for GPUs connected to a single IOH.
