CUDA streams and context
I am using an application presently that spawns a bunch of pthreads (linux), and each of those creates it's own CUDA context. (using cuda 3.2 right now).
The problem I am having is that it seems like each thread having its own context costs a lot of memory on the GPU. Something like 200MB per thread, so this is really limiting me.
Can I simply create streams in the host thread, pass the stream reference to the worker t开发者_开发技巧hreads, which would then be able to pass to my CUDA library their stream number, and all work out of the same context?
Does a worker thread automatically know the same CUDA context as it's parent thread?
Thanks
Each CUDA context does cost quite a bit of device memory, and their resources are strictly partitioned from one another. For example, device memory allocated in context A cannot be accessed by context B. Streams also are valid only in the context in which they were created.
The best practice would be to create one CUDA context per device. By default, that CUDA context can be accessed only from the CPU thread that created it. If you want to access the CUDA context from other threads, call cuCtxPopCurrent() to pop it from the thread that created it. The context then can be pushed onto any other CPU thread's current context stack, and subsequent CUDA calls would reference that context.
Context push/pop are lightweight operations and as of CUDA 3.2, they can be done in CUDA runtime apps. So my suggestion would be to initialize the CUDA context, then call cuCtxPopCurrent() to make the context "floating" unless some threads wants to operate it. Consider the "floating" state to be the natural one - whenever a thread wants to manipulate the context, bracket its usage with cuCtxPushCurrent()/cuCtxPopCurrent().
精彩评论