I created a simple particle system. I have a device with compute capability 2.1. What could I change to optimize the kernel?
I asked this same question a few months ago, but I've run into another roadblock and I'm hoping someone will have a flash of insight. This is the previous thread: Detecting if the monitor is powered o
I am having trouble finding a way to force any display resolution/timing I want in my C# program. I am running Windows 7 with a GeForce 210 graphics card. My current method to achieve these custom res
I have some questions about how GPUs perform synchronization. As I understand it, when a warp encounters a barrier (assuming OpenCL), and it knows that the other warps of the same group haven't been
In the code below, I first bind the texture called ref to an array called gpu in global memory. Then I call a function called getVal, in which I first set the value of gpu[1] to 5 and then read it
I have two programs. The only difference is that one uses constant memory to store the input while the other uses global memory. I want to know why the global memory one is faster than the constant memory
I'm trying to write a small utility that will enable/disable monitors under Windows 7 with my nVidia graphics card (i.e. "Extend the desktop onto this monitor", etc.).
I am trying to compile a CUDA project that someone sent me. Although the compile stage passes, the link stage fails. Below is an example of the error:
I am confused about why my texture version is slower than my global memory version, since the texture version should exploit spatial locality. I am trying to compute the dot product in the case below. Thus