CUDA: sharing data between multiple devices?
The CUDA C Programming Guide says:
... by design, a host thread can execute device code on only one device at any given time. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread...
What I want to do is make two GPUs share data in host (mapped) memory, but the manual seems to say that this is not possible. Is there any solution for this?

When you are allocating the host memory, allocate it using cudaHostAlloc() and pass the cudaHostAllocPortable flag. This will allow the memory to be accessed by multiple CUDA contexts.
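As a minimal sketch (my own, not from the answer; buffer size and error handling are illustrative):

```c
/* Pinned host memory allocated as "portable" so that every CUDA
 * context in the process can use it, not just the allocating one. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    float *shared_buf;                       /* visible to all contexts */
    size_t bytes = 1024 * sizeof(float);

    cudaError_t err = cudaHostAlloc((void **)&shared_buf, bytes,
                                    cudaHostAllocPortable);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* ...each host thread can now cudaMemcpy to/from shared_buf
     * while bound to its own device... */

    cudaFreeHost(shared_buf);
    return 0;
}
```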
The solution is to manage this common data manually, even with SLI.
Cards do not actually share memory in SLI mode; shared data must be copied from one card to the other over the bus.
http://forums.nvidia.com/index.php?showtopic=30740
You may want to look at GMAC. It's a library built on top of CUDA that gives the illusion of shared memory. What it actually does is allocate memory at the same virtual address on the host and on the GPU devices, and use page protection to transfer data on demand. Be aware that it is somewhat experimental, probably at the beta-testing stage.
http://code.google.com/p/adsm/
Maybe think about using something like MPI along with CUDA?
http://forums.nvidia.com/index.php?showtopic=30741
http://www.ncsa.illinois.edu/UserInfo/Training/Workshops/CUDA/presentations/tutorial-CUDA.html
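If you go this route, the usual pattern is one MPI rank per GPU, staging device data through host buffers. Here is a rough sketch (my own, not from the linked tutorial; the rank-to-device mapping, pairing of ranks, and buffer size are assumptions, and error handling is omitted):

```c
/* One MPI rank drives one GPU; data is exchanged between GPUs by
 * staging it through host memory and sending it over MPI. */
#include <mpi.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(rank);                 /* bind this rank to GPU <rank> */

    const int n = 1024;
    float *d_buf, *h_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));
    h_buf = (float *)malloc(n * sizeof(float));

    /* ...launch kernels that fill d_buf on this rank's GPU... */

    /* Stage device data through the host and swap it with the peer. */
    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    int peer = rank ^ 1;                 /* pair ranks 0 <-> 1 */
    MPI_Sendrecv_replace(h_buf, n, MPI_FLOAT, peer, 0, peer, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    cudaMemcpy(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    free(h_buf);
    MPI_Finalize();
    return 0;
}
```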
You want to allocate your pinned memory as portable by passing cudaHostAllocPortable to cudaHostAlloc(). You can certainly exchange data between devices outside of a kernel from the same pinned memory, as I've done this before. As for mapped memory, I'm not quite as sure, but I don't see why it wouldn't work. Try using cudaHostGetDevicePointer() to get the device pointer to use for the current device (the one you've associated with the calling CPU thread).
There's more info in section 3.2.5.3 of the CUDA Programming Guide (v3.2):
A block of page-locked host memory can be allocated as both mapped and portable (see Section 3.2.5.1), in which case each host thread that needs to map the block to its device address space must call cudaHostGetDevicePointer() to retrieve a device pointer, as device pointers will generally differ from one host thread to the other.
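Putting the pieces of this answer together, here is a rough sketch (my own, not from the guide) of the one-thread-per-device model the manual describes, using a mapped and portable block; the pthread setup and buffer size are assumptions, and error checking is omitted:

```c
/* Two host threads, each bound to its own device, both mapping the
 * same pinned block into their device address spaces. */
#include <pthread.h>
#include <cuda_runtime.h>

#define N 1024
static float *h_buf;              /* one block, visible to all contexts */

static void *worker(void *arg)
{
    int dev = *(int *)arg;
    float *d_ptr;

    cudaSetDevice(dev);
    /* Must be set before this thread's context is created. */
    cudaSetDeviceFlags(cudaDeviceMapHost);
    /* Device pointers to the same block generally differ per context. */
    cudaHostGetDevicePointer((void **)&d_ptr, h_buf, 0);

    /* ...launch kernels on this device that read/write through d_ptr... */
    return NULL;
}

int main(void)
{
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&h_buf, N * sizeof(float),
                  cudaHostAllocMapped | cudaHostAllocPortable);

    pthread_t t[2];
    int ids[2] = { 0, 1 };
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);

    cudaFreeHost(h_buf);
    return 0;
}
```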
I have specifically asked a similar question on the NVIDIA forums about how to transfer data between two GPUs, and the responses said that if you want to use two GPUs simultaneously and transfer data between them, you must use two host threads (as the manual suggests). The manual says that "CUDA resources" cannot be shared, but the host memory they are copied to and from can be shared (e.g. using OpenMP or MPI). Thus, if you transfer your memory back to the host from each device, you can access that memory from either device's thread.
Keep in mind that this will be slow, since every exchange requires transferring memory to and from the devices.
So no, you can't access GPU 1's memory from GPU 2 (even with SLI, which, as I've been told emphatically, is not related to CUDA at all). However, you can have GPU 1 write to a region on the host, have GPU 2 write to another region, and then let the threads managing each device write the necessary data back to the correct GPU, as in the sketch below.
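A rough sketch of that staging pattern (my own illustration; the pthread/barrier setup, buffer sizes, and device pairing are assumptions, and error handling is omitted):

```c
/* Each thread drives one GPU: it copies its device's results out to
 * its own host staging region, waits until both regions are valid,
 * then pulls the peer GPU's region back onto its own device. */
#include <pthread.h>
#include <cuda_runtime.h>

#define N 1024
static float *stage[2];               /* one host staging region per GPU */
static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    int dev = *(int *)arg;
    float *d_buf;

    cudaSetDevice(dev);
    cudaMalloc((void **)&d_buf, N * sizeof(float));

    /* ...kernels fill d_buf on this device... */

    /* Step 1: stage this GPU's data out to its host region. */
    cudaMemcpy(stage[dev], d_buf, N * sizeof(float),
               cudaMemcpyDeviceToHost);
    pthread_barrier_wait(&barrier);   /* both regions now valid */

    /* Step 2: pull the peer GPU's data back onto this device. */
    cudaMemcpy(d_buf, stage[dev ^ 1], N * sizeof(float),
               cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 2; ++i)
        cudaHostAlloc((void **)&stage[i], N * sizeof(float),
                      cudaHostAllocPortable);
    pthread_barrier_init(&barrier, NULL, 2);

    pthread_t t[2];
    int ids[2] = { 0, 1 };
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);

    pthread_barrier_destroy(&barrier);
    for (int i = 0; i < 2; ++i)
        cudaFreeHost(stage[i]);
    return 0;
}
```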