CUDA on Tesla adapter and Full screen DX window on another NVIDIA adapter
I have an application that does some processing with CUDA on a Tesla X2050 adapter. The system also has a Quadro 4000, which the application does not use for this purpose, and a GeForce 2xx graphics card that is used to display patterns. The pattern display is just code that creates a full-screen DX9 device on the GeForce 2xx adapter and shows a different pattern on every display frame. It has to be VSynced, and it must not skip or miss any pattern. The issue is that when I turn VSync on, all the CUDA processing becomes extremely slow. If I disable VSync, I get tearing in the patterns, which is not acceptable. How can I combine the CUDA processing with the pattern display? For context, this is for a structured light system in which one adapter is connected to a projector that projects the patterns.
Edit 10.4.2011: I have discovered why the sequence is projected perfectly on one computer while the images stall from time to time on the more powerful computer. The difference is that one has an onboard Intel GPU and the other has 3 NVIDIA GPUs. For this particular task, the onboard Intel GPU does the job a lot better than any of the NVIDIA GPUs. It might be because of the different drivers, and I am looking for an option or parameter combination to set in the NVIDIA driver that gives the same perfect performance as the Intel GPU.
Thank you.
Ofer.
I solved this problem a while ago.
The reason this issue happens is that VSync also stalls the CUDA calculations: it stalls the whole GPU. So there are two solutions:
1. If you have a Tesla, you can set it to TCC mode, which is an exclusive compute mode. The VSync on the display GPU (a GeForce or Quadro) will then not stall the Tesla or the CUDA calculations on it.
2. Call the VSync-stalling operation as late as possible, or replace the stalling VSync with a test command.
In DX9, the Present command has two modes: it either blocks (the GPU) and waits for the VSync, or it tests whether Present was successful and, if not, returns without stalling.
By combining sleeps, measuring the time since the last frame, and testing whether Present succeeded, it is possible to make the GPU stall as little as possible on the VSync. This way I was able to run CUDA structured light decoding + pattern projection in DX9 + 3D display all on the same GPU (GeForce 320M).
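The timing loop described above can be sketched roughly as follows. This is only an illustration, not the code I actually ran: present_late, PresentResult, and the two callbacks are hypothetical names. The assumption is that try_present wraps IDirect3DSwapChain9::Present called with the D3DPRESENT_DONOTWAIT flag, which returns D3DERR_WASSTILLDRAWING instead of blocking, and that do_work does one small slice of other work (for example, queuing CUDA kernels). The 200 microsecond poll interval is an arbitrary choice.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Outcome of one non-blocking present attempt. Assumption: in real DX9 code,
// D3D_OK maps to Presented and D3DERR_WASSTILLDRAWING maps to StillDrawing.
enum class PresentResult { Presented, StillDrawing };

using Clock = std::chrono::steady_clock;

// Hypothetical helper: shows one frame while keeping the stall short.
//   try_present : wraps the non-blocking Present call (assumption, see above)
//   do_work     : one small slice of other work; should take some time,
//                 otherwise the first loop simply spins
//   last_flip   : time of the previous successful present (updated here)
//   frame       : display refresh period
//   margin      : how long before the expected VSync to start polling
// Returns how many present attempts failed before one succeeded.
inline int present_late(const std::function<PresentResult()>& try_present,
                        const std::function<void()>& do_work,
                        Clock::time_point& last_flip,
                        std::chrono::microseconds frame,
                        std::chrono::microseconds margin) {
    // 1. Keep doing useful work until shortly before the next VSync is due.
    const Clock::time_point wake = last_flip + frame - margin;
    while (Clock::now() < wake) {
        do_work();
    }

    // 2. Poll the non-blocking present; sleep briefly between attempts
    //    instead of letting the driver block.
    int retries = 0;
    while (try_present() == PresentResult::StillDrawing) {
        ++retries;
        std::this_thread::sleep_for(std::chrono::microseconds(200));
    }

    // 3. Remember when the flip happened so the next frame can be timed.
    last_flip = Clock::now();
    return retries;
}
```

In a render loop this would be called once per pattern, with frame set to the refresh period of the projector (about 16667 microseconds at 60 Hz) and margin tuned by experiment.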
One solution would be to buffer the created images and display them in sequence.
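A minimal sketch of such a buffer, assuming the patterns are produced on one thread and displayed on another. FrameQueue and Frame are hypothetical names, and a frame is represented as a plain byte vector only for illustration; in a real application it would more likely be a texture or surface handle.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative frame type (assumption): raw pixel bytes.
using Frame = std::vector<std::uint8_t>;

// Bounded FIFO of frames that are ready to be displayed, in order.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) {}

    // Producer side: blocks while the buffer is full, so the generator
    // cannot run arbitrarily far ahead of the display.
    void push(Frame f) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push_back(std::move(f));
    }

    // Display side: never blocks. Returns false if no frame is ready,
    // so the render loop can decide what to do (e.g. repeat the last one).
    bool try_pop(Frame& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return q_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex m_;
    std::condition_variable not_full_;
    std::deque<Frame> q_;
};
```

The display loop would call try_pop once per refresh, so frames come out in exactly the order they were pushed.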
How are you using the DX9 context? Are you copying the result of the 2050 to the DX9 context? Are you using async calls on the computation side?