concurrent kernel execution
Is it possible to launch kernels from different threads of a (host) application and have them run concurrently on the same GPGPU device? If not, do you know of any plans (of Nvidia) to provide this capability in the future?
The programming guide http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf says:
3.2.7.3 Concurrent Kernel Execution
Some devices of compute capability 2.0 can execute multiple kernels concurrently. Applications may query this capability by calling cudaGetDeviceProperties() and checking the concurrentKernels property. The maximum number of kernel launches that a device can execute concurrently is sixteen.
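A minimal sketch of that query, assuming device 0 is the device you intend to use (it requires only the CUDA runtime API and a CUDA-capable GPU to run):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of device 0 (an assumption; pick your device).
    cudaGetDeviceProperties(&prop, 0);
    // Non-zero means the device can run multiple kernels concurrently.
    printf("concurrentKernels: %d\n", prop.concurrentKernels);
    return 0;
}
```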
So the answer is: it depends, but only on the device; host threads make no difference either way. If the device does not support concurrent kernel execution, concurrent kernel launches are serialized. If it does, kernel launches issued to different streams may execute concurrently.
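To illustrate, here is a hedged sketch of launching the same kernel on two different streams. The `spin` kernel and the cycle count are made up for the example; on a compute capability 2.0+ device with `concurrentKernels` set, the two launches may overlap, while on older devices they are serialized:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that busy-waits for a given number of clock cycles,
// just to keep the GPU occupied long enough for overlap to be observable.
__global__ void spin(clock_t cycles) {
    clock_t start = clock();
    while (clock() - start < cycles) { }
}

int main() {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Launches in different non-default streams have no implicit ordering,
    // so the device is free to run them concurrently if it supports it.
    spin<<<1, 1, 0, s1>>>(1000000);
    spin<<<1, 1, 0, s2>>>(1000000);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```

Note that concurrency is only a possibility, not a guarantee: the scheduler decides at runtime whether the kernels actually overlap, depending on resource availability.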