OpenCL: basic questions about SIMT execution model

Some of the concepts and designs of the "SIMT" architecture are still unclear to me.

From what I've seen and read, diverging code paths and if() altogether are a rather bad idea, because many threads might execute in lockstep. Now what does that exactly mean? What about something like:

kernel void foo(..., int flag)
{
    if (flag)
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}

The parameter "flag" is the same for all work units and the same branch is taken for all work units. Now, is a GPU going to execute all of the code, serializing everything nonetheless and basically still taking the branch that is not taken? Or is it a bit more clever and will only execute the branch taken, as long as all threads agree on the branch taken? Which would always be the case here.

I.e. does serialization ALWAYS happen or only if needed? Sorry for the stupid question. ;)


No, it doesn't always happen. Both branches are executed only if the condition is not coherent across the work items that run in lockstep within a local work group (a warp on NVIDIA hardware, a wavefront on AMD). If the condition evaluates to different values among those work items, current-generation GPUs execute both branches, but in each branch only the work items that actually took it write values and produce side effects.

So, maintaining coherency is vital to performance in GPU branches.
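
To make the "both branches run, but only the right lanes write" behaviour concrete, here is a minimal host-side sketch in plain C. It is a toy model, not how any particular GPU is implemented: the function name run_group, the 32-lane limit, and the values 1 and 2 standing in for DO_STUFF and DO_SOMETHING_ELSE are all assumptions made for illustration.

```c
#include <assert.h>

/* Toy model of one lockstep group (warp/wavefront) executing an if/else.
   Returns how many branch bodies the group had to step through:
   1 if all lanes agree on the condition, 2 if they diverge.
   n is the hypothetical group width and must be between 1 and 32. */
static int run_group(const int *cond, int *out, int n)
{
    unsigned then_mask = 0, else_mask = 0;
    int passes = 0;

    assert(n > 0 && n <= 32);

    /* Every lane evaluates the condition; the results form two lane masks. */
    for (int lane = 0; lane < n; ++lane) {
        if (cond[lane]) then_mask |= 1u << lane;
        else            else_mask |= 1u << lane;
    }

    /* The "then" body is stepped through only if some lane needs it. */
    if (then_mask) {
        ++passes;
        for (int lane = 0; lane < n; ++lane)
            if (then_mask & (1u << lane))
                out[lane] = 1;   /* stands in for DO_STUFF */
    }

    /* Likewise for the "else" body; masked-off lanes write nothing. */
    if (else_mask) {
        ++passes;
        for (int lane = 0; lane < n; ++lane)
            if (else_mask & (1u << lane))
                out[lane] = 2;   /* stands in for DO_SOMETHING_ELSE */
    }

    return passes;
}
```

Calling run_group with a condition array that is all ones returns 1, while an alternating array such as {1,0,1,0,...} returns 2, which corresponds to the coherent and the divergent case respectively.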


Not sure about ATI, but on NVIDIA it is clever: there is no serialization if every thread in a warp takes the same path.


In your example, flag has the same value for all work-items, so a good compiler will generate code that sends all work-items down the same path.

But consider the following case:

kernel void foo(..., global int *buffer)
{
    if (buffer[get_global_id(0)])
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}

Here there is no guarantee that all work-items take the same path, so either the two branches are serialized or the control flow is eliminated (for example, through predication).
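
As a rough illustration of what eliminating the control flow can look like, the plain C sketch below computes both results for every element and then selects one with a bit mask, so no data-dependent branch remains. The helpers do_stuff and do_something_else are hypothetical stand-ins for DO_STUFF and DO_SOMETHING_ELSE, and the loops stand in for the work-items. This only works when both sides are free of side effects, and it only pays off when they are cheap.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two branch bodies; both are side-effect free. */
static int do_stuff(int x)          { return x * 2; }
static int do_something_else(int x) { return x + 100; }

/* Branching version: each element picks one path based on its own data. */
static void foo_branch(const int *buffer, int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (buffer[i])
            out[i] = do_stuff(buffer[i]);
        else
            out[i] = do_something_else(buffer[i]);
    }
}

/* Branch-free version: compute both results, then select one with a mask. */
static void foo_select(const int *buffer, int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int a = do_stuff(buffer[i]);
        int b = do_something_else(buffer[i]);
        int mask = -(buffer[i] != 0);   /* all bits set if true, 0 otherwise */
        out[i] = (a & mask) | (b & ~mask);
    }
}
```

Both functions produce the same output for the same input; the second simply trades extra arithmetic for uniform control flow.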
