
OpenCL behavior: need clarification

I am using the following parameters for my simulation on a GeForce GT 220 card:

number of compute units = 6

local size = 32

global size = 32*6*256 = 49152

(everything is one-dimensional)

But in the Visual Profiler, I see that Number of work groups per Compute Unit = 768. Since there are 49152/32 = 1536 work groups in total, this means only 2 compute units are being used. Why is that? How can I make sure all the compute units are busy? Ideally, I would expect 49152/(32*6) = 256 work groups per compute unit. I am confused by this behavior.
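The arithmetic in the question can be made explicit with a few helper functions. This is a minimal sketch: the function names are hypothetical, and spreading work groups evenly across compute units is an assumption about the scheduler, not something OpenCL guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work groups in a 1-D NDRange. OpenCL 1.x requires the global
 * size to be a multiple of the local size, so check that first. */
size_t total_work_groups(size_t global_size, size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Work groups per compute unit if the groups were spread evenly over
 * num_cus compute units (a simplifying assumption). */
size_t groups_per_cu(size_t global_size, size_t local_size, size_t num_cus) {
    return total_work_groups(global_size, local_size) / num_cus;
}

/* Number of compute units implied by a profiler's per-CU group count. */
size_t implied_compute_units(size_t global_size, size_t local_size,
                             size_t reported_groups_per_cu) {
    return total_work_groups(global_size, local_size) / reported_groups_per_cu;
}
```

With the question's numbers, `total_work_groups(49152, 32)` is 1536, `groups_per_cu(49152, 32, 6)` is 256, and `implied_compute_units(49152, 32, 768)` is 2, which is where the "only 2 compute units" reading comes from.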


You should not care about compute units; that is only hardware-specific. Just care about the local size and the global size, and try to use as large a local size as you can.

What is probably happening is that you specify a very small local size. Each work group (a group of local-size work-items) is loaded onto a single compute unit, and it is not efficient to run only 32 threads there. The overhead of constantly loading such small work groups (thrashing) slows performance and probably leaves the compute units idle much of the time.
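One way to see why 32 work-items per group is small is a simplified occupancy model: a compute unit can hold only a limited number of work groups and work-items at once, so tiny groups leave most of its capacity unused. The sketch below is hypothetical; the per-compute-unit limits (8 resident work groups, 1024 resident work-items) are assumed to match compute capability 1.2 hardware such as the GT 220 and should be checked for your device. Registers and local memory, which can lower occupancy further, are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Work-items that can be resident on one compute unit at once, limited
 * only by a cap on resident work groups and a cap on resident work-items.
 * Registers and local memory usage are ignored in this model. */
size_t resident_work_items(size_t local_size, size_t max_groups_per_cu,
                           size_t max_items_per_cu) {
    assert(local_size > 0);
    size_t groups = max_items_per_cu / local_size; /* limit from work-item cap */
    if (groups > max_groups_per_cu)
        groups = max_groups_per_cu;                /* limit from group cap */
    return groups * local_size;
}

/* Occupancy as an integer percentage of the per-CU work-item cap. */
size_t occupancy_percent(size_t local_size, size_t max_groups_per_cu,
                         size_t max_items_per_cu) {
    return 100 * resident_work_items(local_size, max_groups_per_cu,
                                     max_items_per_cu) / max_items_per_cu;
}
```

Under these assumed limits, a local size of 32 is capped by the 8-group limit at 256 resident work-items (25% occupancy), while local sizes of 128 or 256 reach 1024 resident work-items (100%).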

My recommendation: use a much larger local size, or do not specify a local size at all (OpenCL will then select one itself).
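Leaving the local size unspecified means passing NULL as the `local_work_size` argument of `clEnqueueNDRangeKernel`. If you would rather pick it yourself, a common approach is the largest value that divides the global size evenly (required in OpenCL 1.x) and does not exceed the limit reported by `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`. The helper below is a hypothetical sketch of that selection; it does not call OpenCL itself, so the limit is passed in as a plain number.

```c
#include <assert.h>
#include <stddef.h>

/* Largest local size that divides global_size evenly and does not exceed
 * max_local (e.g. the kernel's CL_KERNEL_WORK_GROUP_SIZE). Falls back to 1,
 * which always divides global_size. */
size_t largest_local_size(size_t global_size, size_t max_local) {
    assert(global_size > 0 && max_local > 0);
    for (size_t candidate = max_local; candidate > 1; --candidate) {
        if (global_size % candidate == 0)
            return candidate;
    }
    return 1;
}
```

With the question's global size of 49152 and an assumed limit of 512, this returns 512, giving 96 work groups instead of 1536.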
