Why does the following code drag down fragment shader performance (OpenGL ES 2.0)?

I have the following code in the Fragment Shader:

precision lowp float;

varying vec2 v_texCoord;
uniform sampler2D s_texture;

uniform bool color_tint;
uniform float color_tint_amount;
uniform vec4 color_tint_color;

void main(){
    float gradDistance;
    vec4 texColor, gradColor;
    texColor = texture2D(s_texture, v_texCoord);
    if (color_tint){
        gradColor = color_tint_color;
        gradColor.a = texColor.a;
        texColor = gradColor * color_tint_amount + texColor * (1.0 - color_tint_amount);
    }
    gl_FragColor = texColor;
}

The code works fine, but interestingly, even when every color_tint value I pass in is false, the code above still causes a serious drop in performance compared to:

void main(){
    float gradDistance;
    vec4 texColor, gradColor;
    texColor = texture2D(s_texture, v_texCoord);
    if (false){
        gradColor = color_tint_color;
        gradColor.a = texColor.a;
        texColor = gradColor * color_tint_amount + texColor * (1.0 - color_tint_amount);
    }
    gl_FragColor = texColor;
}

The latter achieves 40+ fps, while the first runs at about 18 fps. I double-checked, and every color_tint value passed to the first shader is false, so the block should never execute.

BTW, I am programming this on Android 2.2 using GLES20.

Does anyone know what's wrong with the shader?


I am not an expert in fragment shaders, but I assume the second one is faster because the compiler can remove the entire if statement at compile time, since the condition is a constant false. In the first one, the compiler cannot know that color_tint is always false until runtime, so the shader has to test the uniform and branch for every fragment. Branches can be expensive, especially on graphics hardware, which is usually designed for predictable, straight-line code.

I suggest you try rewriting it to be branchless - Darren's answer has some good suggestions in that direction.


Branches are very slow in fragment shaders; avoid them if possible. Use a color_tint_amount of 0 for no tint. Premultiply color_tint_color by color_tint_amount before passing it in, which saves a multiply per pixel. Pass 1.0 - color_tint_amount instead of color_tint_amount (so 1.0 now means no tint), which saves the per-pixel subtraction. These shaders are run millions upon millions of times a second, so you have to save every cycle you can.
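
The suggestions above could be sketched roughly as follows. This is only an illustration, not tested on the asker's device: the uniform names tint_rgb_scaled and tex_keep are made up for this example, and it assumes the application computes both values once per draw call before uploading them.

```glsl
precision lowp float;

varying vec2 v_texCoord;
uniform sampler2D s_texture;

// Hypothetical uniforms (names are assumptions, not from the question),
// computed by the application once per draw call:
//   tint_rgb_scaled = color_tint_color.rgb * color_tint_amount
//   tex_keep        = 1.0 - color_tint_amount
// For "no tint", upload tint_rgb_scaled = (0, 0, 0) and tex_keep = 1.0.
uniform vec3 tint_rgb_scaled;
uniform float tex_keep;

void main() {
    vec4 texColor = texture2D(s_texture, v_texCoord);
    // RGB: tint * amount + tex * (1 - amount), with both factors precomputed.
    // Alpha: the original blend used texColor.a on both sides, so
    // a * amount + a * (1 - amount) == a and alpha passes through unchanged.
    gl_FragColor = vec4(tint_rgb_scaled + texColor.rgb * tex_keep, texColor.a);
}
```

With no branch and no bool uniform, every fragment runs the same instructions whether or not a tint is applied. If precomputing the two values is inconvenient, the built-in mix(texColor.rgb, color_tint_color.rgb, color_tint_amount) computes the same blend, also without a branch, at the cost of doing the arithmetic per pixel.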
