Is there a maximum number of streams in CUDA?
Is there a maximum number of streams that can be created in CUDA?
To clarify, I mean CUDA streams, i.e. the streams into which you issue kernels and memory operations.
There is no realistic limit to the number of streams you can create (at least thousands). However, there is a limit to the number of streams you can use effectively to achieve concurrency.
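A minimal sketch of that first point: try to create a large number of streams, count how many `cudaStreamCreate` calls succeed, then release them. The helper name `count_creatable_streams` is hypothetical, and the figure of 1024 below is only an illustration, not a documented limit.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: tries to create `n` streams and returns how many
// cudaStreamCreate calls succeeded. Every stream it created is destroyed
// before returning, so it leaves no streams behind.
int count_creatable_streams(int n)
{
    if (n <= 0) return 0;
    std::vector<cudaStream_t> streams;
    streams.reserve(n);
    for (int i = 0; i < n; ++i) {
        cudaStream_t s;
        cudaError_t err = cudaStreamCreate(&s);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaStreamCreate failed at %d: %s\n",
                         i, cudaGetErrorString(err));
            break;
        }
        streams.push_back(s);
    }
    // Streams are multiplexed onto a small number of hardware queues, so
    // creating many of them does not add concurrency by itself.
    for (cudaStream_t s : streams) cudaStreamDestroy(s);
    return static_cast<int>(streams.size());
}
```

On a working GPU, `count_creatable_streams(1024)` would be expected to return 1024; a smaller value means creation failed, and the CUDA error is printed.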
On Fermi, the architecture supports 16-way concurrent kernel execution, but there is only a single connection from the host to the GPU. So even if you have 16 CUDA streams, they are eventually funneled into one hardware queue. This can create false dependencies between streams and limit how much concurrency you can easily get.
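A commonly suggested way to reduce those false dependencies on Fermi is to issue work breadth-first (one operation per stream in turn) rather than depth-first (all of one stream's work, then the next stream's), so that independent kernels sit next to each other in the single hardware queue. The sketch below assumes a hypothetical `stage_kernel` and a fixed number of stages per stream; only the loop order differs between the two variants, and the best order for a mix of copies and kernels is worth checking in a profiler.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for one stage of per-stream work.
__global__ void stage_kernel(float* data, int n, int stage)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += static_cast<float>(stage);
}

// Fills `order` with (stream, stage) pairs in the sequence they are issued.
// Breadth-first interleaves streams; depth-first finishes one stream's
// stages before moving on. `order` must hold 2 * num_streams * num_stages ints.
void issue_order(int num_streams, int num_stages, bool breadth_first, int* order)
{
    int k = 0;
    if (breadth_first) {
        for (int st = 0; st < num_stages; ++st)
            for (int s = 0; s < num_streams; ++s) { order[k++] = s; order[k++] = st; }
    } else {
        for (int s = 0; s < num_streams; ++s)
            for (int st = 0; st < num_stages; ++st) { order[k++] = s; order[k++] = st; }
    }
}

// Issues all kernels in the chosen order. buffers[s] is a device buffer of
// n floats used only by stream handles[s], so per-stream ordering is kept
// in both variants. Returns the first launch error, if any.
cudaError_t launch_all(cudaStream_t* handles, float** buffers, int n,
                       int num_streams, int num_stages, bool breadth_first)
{
    // Hypothetical fixed cap that keeps the sketch free of heap allocation.
    const int kMaxPairs = 1024;
    if (n <= 0 || num_streams <= 0 || num_stages <= 0 ||
        num_streams * num_stages > kMaxPairs)
        return cudaErrorInvalidValue;
    int order[2 * kMaxPairs];
    issue_order(num_streams, num_stages, breadth_first, order);
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int k = 0; k < num_streams * num_stages; ++k) {
        int s = order[2 * k], st = order[2 * k + 1];
        stage_kernel<<<blocks, threads, 0, handles[s]>>>(buffers[s], n, st);
    }
    return cudaGetLastError();
}
```

With three streams and two stages, breadth-first issues (0,0), (1,0), (2,0), (0,1), ... while depth-first issues (0,0), (0,1), (1,0), ...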
With Kepler (GK110, compute capability 3.5), the number of connections between the host and the GPU is 32 (instead of one on Fermi). With this Hyper-Q technology, it is much easier to keep the GPU busy with concurrent work.
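On Hyper-Q devices, the number of host-to-device connections the runtime actually uses is controlled by the `CUDA_DEVICE_MAX_CONNECTIONS` environment variable (documented range 1 to 32, default 8), and it only takes effect if set before the CUDA context is created. A minimal sketch, assuming a POSIX `setenv` and a hypothetical helper named `request_max_connections`:

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: requests `n` host-to-device work queues through the
// CUDA_DEVICE_MAX_CONNECTIONS environment variable, clamped to the
// documented range [1, 32]. Call it before the first CUDA runtime call that
// creates the context, otherwise it has no effect. Returns the value set,
// or -1 if setenv (POSIX) failed.
int request_max_connections(int n)
{
    if (n < 1) n = 1;
    if (n > 32) n = 32;
    char value[8];
    std::snprintf(value, sizeof value, "%d", n);
    if (setenv("CUDA_DEVICE_MAX_CONNECTIONS", value, 1) != 0) return -1;
    return n;
}
```

Exporting the variable in the shell before launching the program has the same effect without code changes.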
I haven't seen a limit in any documentation, but that doesn't mean all streams will execute concurrently: concurrency is bounded by hard hardware limits (multiprocessors, registers, etc.).
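Several of those hardware limits can be read at run time. A minimal sketch using `cudaGetDeviceProperties`; the helper name is hypothetical, and the `concurrentKernels` flag only says whether kernels from different streams can overlap, not how many will.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints device properties that bound how much stream
// work can overlap on device `dev`. Returns 1 if the device reports
// concurrent kernel support, 0 if not, and -1 if the query failed.
int report_concurrency_limits(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("  multiprocessors:      %d\n", prop.multiProcessorCount);
    std::printf("  registers per block:  %d\n", prop.regsPerBlock);
    std::printf("  max threads per SM:   %d\n", prop.maxThreadsPerMultiProcessor);
    std::printf("  concurrent kernels:   %s\n", prop.concurrentKernels ? "yes" : "no");
    std::printf("  async copy engines:   %d\n", prop.asyncEngineCount);
    return prop.concurrentKernels ? 1 : 0;
}
```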
According to this NVIDIA presentation, the maximum is 16 streams (on Fermi): http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
To clarify, I've successfully created more than 16 streams, but I think the hardware can only run 16 kernels concurrently, so the extra streams add nothing in terms of concurrency.
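A minimal sketch of that experiment: launch one small, long-running kernel into each of more than 16 streams and wait for them all. The kernel and helper names are hypothetical; the busy-wait only exists so that overlap (or its absence) shows up in a profiler timeline.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins for roughly `cycles` clock ticks.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: launches one single-thread kernel into each of
// `num_streams` streams, waits for all of them, and destroys the streams.
// Returns cudaSuccess if every step succeeded.
cudaError_t launch_one_per_stream(int num_streams, long long cycles)
{
    if (num_streams <= 0) return cudaErrorInvalidValue;
    std::vector<cudaStream_t> streams(num_streams);
    for (int i = 0; i < num_streams; ++i) {
        cudaError_t err = cudaStreamCreate(&streams[i]);
        if (err != cudaSuccess) {
            for (int j = 0; j < i; ++j) cudaStreamDestroy(streams[j]);
            return err;
        }
    }
    for (int i = 0; i < num_streams; ++i)
        spin_kernel<<<1, 1, 0, streams[i]>>>(cycles);
    cudaError_t launch = cudaGetLastError();
    // Kernels beyond the hardware's concurrent-kernel limit are still
    // accepted; they wait until an earlier kernel finishes.
    cudaError_t sync = cudaDeviceSynchronize();
    for (cudaStream_t s : streams) cudaStreamDestroy(s);
    return launch != cudaSuccess ? launch : sync;
}
```

With `num_streams` set to 32 on a Fermi card, a profiler timeline would be expected to show at most 16 of these kernels overlapping at any moment.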
Kepler is probably different.