Why does a later version of the Cg compiler produce a shader with more instructions?

2023-03-20 06:47

I have a shader that looks like this:

void main( in   float2              pos         : TEXCOORD0,
           in   uniform sampler2D   data        : TEXUNIT0,
           in   uniform sampler2D   palette     : TEXUNIT1,
           in   uniform float       c,
           in   uniform float       th0,
           in   uniform float       th1,
           in   uniform float       th2,
           in   uniform float4      BackGroundColor,
           out  float4              color       : COLOR
         )
{
    const float4 dataValue = tex2D( data, pos );
    const float vValue = dataValue.x;
    const float tValue = dataValue.y;

    color = BackGroundColor;
    if ( tValue <= th2 )
    {
        if ( tValue < th1 )
        {
            const float vRealValue = abs( vValue - 0.5 );
            if ( vRealValue > th0 )
            {
                // determine value and color
                const float power = ( c > 0.0 ) ? vValue : ( 1.0 - vValue );
                color = tex2D( palette, float2( power, 0.0 ) );
            }
        }
        else
        {
            color = float4( 0.0, tValue, 0.0, 1.0 );
        }
    }
}

and I am compiling it like this:

cgc -profile arbfp1 -strict -O3 -q sh.cg -o sh.asm

Different versions of the Cg compiler produce different output.

cgc version 2.2.0006 compiles the shader into assembly code with 18 instructions:

!!ARBfp1.0
PARAM c[6] = { program.local[0..4],{ 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R0.z, -R0.x, c[5].y;
CMP R0.z, -c[0].x, R0.x, R0;
MOV R0.w, c[5].x;
TEX R1, R0.zwzw, texture[1], 2D;
SLT R0.z, R0.y, c[2].x;
ADD R0.x, R0, -c[5].z;
ABS R0.w, R0.x;
SGE R0.x, c[3], R0.y;
MUL R2.x, R0, R0.z;
SLT R0.w, c[1].x, R0;
ABS R2.y, R0.z;
MUL R0.z, R2.x, R0.w;
CMP R0.w, -R2.y, c[5].x, c[5].y;
CMP R1, -R0.z, R1, c[4];
MUL R2.x, R0, R0.w;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 18 instructions, 3 R-regs

cgc version 3.0.0016 compiles the shader into assembly code with 23 instructions:

!!ARBfp1.0
PARAM c[6] = { program.local[0..4], { 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R1.y, R0.x, -c[5].z;
MOV R1.z, c[0].x;
ABS R1.y, R1;
SLT R1.z, c[5].x, R1;
SLT R1.x, R0.y, c[2];
SGE R0.z, c[3].x, R0.y;
MUL R0.w, R0.z, R1.x;
SLT R1.y, c[1].x, R1;
MUL R0.w, R0, R1.y;
ABS R1.z, R1;
CMP R1.y, -R1.z, c[5].x, c[5];
MUL R1.y, R0.w, R1;
ADD R1.z, -R0.x, c[5].y;
CMP R1.z, -R1.y, R1, R0.x;
ABS R0.x, R1;
CMP R0.x, -R0, c[5], c[5].y;
MOV R1.w, c[5].x;
TEX R1, R1.zwzw, texture[1], 2D;
CMP R1, -R0.w, R1, c[4];
MUL R2.x, R0.z, R0;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 23 instructions, 3 R-regs

The strange thing is that, with Cg 3.0, the optimization level does not seem to have any effect.

Can someone explain what is going on? Why is the optimization not working, and why is the shader longer when compiled with Cg 3.0?

Note that I removed the comments from the compiled shaders.
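One quick way to see where the five extra instructions come from is to tally the opcodes in both listings. The following is only a rough sketch in Python, not part of the Cg toolchain: the helper name opcode_histogram is made up for illustration, and it only skips the declaration keywords that actually occur in the two listings above (PARAM, TEMP, END).

```python
from collections import Counter

# The two listings from the question, copied verbatim.
ASM_V22 = """\
!!ARBfp1.0
PARAM c[6] = { program.local[0..4],{ 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R0.z, -R0.x, c[5].y;
CMP R0.z, -c[0].x, R0.x, R0;
MOV R0.w, c[5].x;
TEX R1, R0.zwzw, texture[1], 2D;
SLT R0.z, R0.y, c[2].x;
ADD R0.x, R0, -c[5].z;
ABS R0.w, R0.x;
SGE R0.x, c[3], R0.y;
MUL R2.x, R0, R0.z;
SLT R0.w, c[1].x, R0;
ABS R2.y, R0.z;
MUL R0.z, R2.x, R0.w;
CMP R0.w, -R2.y, c[5].x, c[5].y;
CMP R1, -R0.z, R1, c[4];
MUL R2.x, R0, R0.w;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 18 instructions, 3 R-regs
"""

ASM_V30 = """\
!!ARBfp1.0
PARAM c[6] = { program.local[0..4], { 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R1.y, R0.x, -c[5].z;
MOV R1.z, c[0].x;
ABS R1.y, R1;
SLT R1.z, c[5].x, R1;
SLT R1.x, R0.y, c[2];
SGE R0.z, c[3].x, R0.y;
MUL R0.w, R0.z, R1.x;
SLT R1.y, c[1].x, R1;
MUL R0.w, R0, R1.y;
ABS R1.z, R1;
CMP R1.y, -R1.z, c[5].x, c[5];
MUL R1.y, R0.w, R1;
ADD R1.z, -R0.x, c[5].y;
CMP R1.z, -R1.y, R1, R0.x;
ABS R0.x, R1;
CMP R0.x, -R0, c[5], c[5].y;
MOV R1.w, c[5].x;
TEX R1, R1.zwzw, texture[1], 2D;
CMP R1, -R0.w, R1, c[4];
MUL R2.x, R0.z, R0;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 23 instructions, 3 R-regs
"""

# Keywords in these listings that are declarations or terminators, not instructions.
NON_INSTRUCTIONS = {"PARAM", "TEMP", "END"}


def opcode_histogram(asm):
    """Count how often each opcode occurs in an ARB fragment program listing."""
    counts = Counter()
    for line in asm.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("!!"):  # skip blanks and the header
            continue
        opcode = line.split()[0]
        if opcode not in NON_INSTRUCTIONS:
            counts[opcode] += 1
    return counts


hist_v22 = opcode_histogram(ASM_V22)
hist_v30 = opcode_histogram(ASM_V30)
extra = hist_v30 - hist_v22  # opcodes that occur more often in the 3.0 output
```

For these two listings, extra contains exactly one additional MOV, ABS, SLT, MUL and CMP; the counts of TEX, ADD and SGE are unchanged.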

This might not be a real answer to the problem, but it may give some more insight. I inspected the generated assembly code and converted it back to high-level code. I tried to compress it as much as possible and removed all copies and temporaries that follow implicitly from the high-level operations. The b variables are temporary bools and the f variables are temporary floats. The first one (from version 2.2) is:

power = ( c > 0.0 ) ? vValue : ( 1.0 - vValue );
R1 = tex2D( palette, float2( power, 0.0 ) );

vRealValue = abs( vValue - 0.5 );

b1 = ( tValue < th1 );
b2 = ( tValue <= th2 );

b3 = b1;

b1 = b1 && b2 && ( vRealValue > th0 );
R1 = b1 ? R1 : BackGroundColor;

color = ( b2 && !b3 ) ? float4( 0.0, tValue, 0.0, 1.0 ) : R1;

and the second (with 3.0) is:

vRealValue = abs( vValue - 0.5 );

f0 = c;
b0 = ( 0 < f0 );

b1 = ( tValue < th1 );
b2 = ( tValue <= th2 );

b4 = b1 && b2 && ( vRealValue > th0 );

b0 = b0;
b3 = b1;

power = ( b4 && !b0 ) ? ( 1.0 - vValue ) : vValue;
R1 = tex2D( palette, float2( power, 0.0 ) );

R1 = b4 ? R1 : BackGroundColor;

color = ( b2 && !b3 ) ? float4( 0.0, tValue, 0.0, 1.0 ) : R1;

Most parts are essentially the same, but the second program does some unnecessary operations. It copies the c variable into a temporary instead of using it directly. It also swaps vValue and 1 - vValue in the power computation, so it has to negate b0 (costing one more CMP), whereas the first program needs no temporary at all (it uses a single CMP instead of an SLT followed by a CMP). It also uses b4 in this computation, which is completely unnecessary: when b4 is false, the result of the texture access is irrelevant anyway. This costs one more && (implemented with MUL). There is also the unnecessary copy from b1 to b3 (necessary in the first program, but not in the second), and the extremely useless copy of b0 into itself (disguised as an ABS, but since the value comes from an SLT it can only be 0.0 or 1.0, so the ABS degenerates to a MOV).
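To make the idioms mentioned above concrete, here is a small scalar model in Python of the instructions involved. It is only a sketch: it assumes the usual ARB_fragment_program semantics (SLT and SGE yield 0.0 or 1.0, CMP picks its second operand when the first is negative), ignores vector components and swizzles, and the helper names are invented for illustration.

```python
def slt(a, b):
    """SLT: 1.0 if a < b, else 0.0."""
    return 1.0 if a < b else 0.0


def sge(a, b):
    """SGE: 1.0 if a >= b, else 0.0."""
    return 1.0 if a >= b else 0.0


def cmp_(a, b, c):
    """CMP: b if a < 0, else c."""
    return b if a < 0.0 else c


def logical_and(a, b):
    """&& on 0.0/1.0 flags, implemented as MUL."""
    return a * b


def logical_not(b):
    """!b on a 0.0/1.0 flag, implemented as ABS followed by CMP."""
    return cmp_(-abs(b), 0.0, 1.0)


def power_v22(c, v):
    """2.2 output: ADD + CMP, testing c directly."""
    return cmp_(-c, v, 1.0 - v)


def power_v30(c, v, b4):
    """3.0 output: MOV + SLT, ABS + CMP (negation), MUL, ADD + CMP."""
    b0 = slt(0.0, c)
    selector = logical_and(b4, logical_not(b0))
    return cmp_(-selector, 1.0 - v, v)
```

On a flag that is already 0.0 or 1.0, abs() returns its argument unchanged, which is why the ABS acts like a plain MOV. Whenever b4 is 1.0, power_v30 and power_v22 return the same value; when b4 is 0.0, the palette lookup is discarded anyway.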

So the second program is quite similar to the first, with just some additional and, IMHO, completely useless instructions. The optimizer seems to have done a worse job than the previous(!) version. As the Cg compiler is an NVIDIA product (and not from some other, not-to-be-named graphics company), this behaviour is really strange.
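As a sanity check on the decompilation, the two reconstructed programs can be compared against the original shader logic. The sketch below does this in Python under some loud assumptions: the palette lookup is replaced by a made-up stand-in function, colors are plain tuples, and the thresholds and sample grid are arbitrary illustration values.

```python
def palette(power):
    """Hypothetical stand-in for tex2D(palette, float2(power, 0.0))."""
    return (power, power, power, 1.0)


def reference(v, t, c, th0, th1, th2, bg):
    """Direct transcription of the original shader's control flow."""
    color = bg
    if t <= th2:
        if t < th1:
            if abs(v - 0.5) > th0:
                power = v if c > 0.0 else 1.0 - v
                color = palette(power)
        else:
            color = (0.0, t, 0.0, 1.0)
    return color


def decompiled_v22(v, t, c, th0, th1, th2, bg):
    """Reconstruction of the 2.2 output."""
    power = v if c > 0.0 else 1.0 - v
    r1 = palette(power)
    b1 = t < th1
    b2 = t <= th2
    b3 = b1
    b1 = b1 and b2 and (abs(v - 0.5) > th0)
    r1 = r1 if b1 else bg
    return (0.0, t, 0.0, 1.0) if (b2 and not b3) else r1


def decompiled_v30(v, t, c, th0, th1, th2, bg):
    """Reconstruction of the 3.0 output."""
    b0 = 0.0 < c
    b1 = t < th1
    b2 = t <= th2
    b4 = b1 and b2 and (abs(v - 0.5) > th0)
    b3 = b1
    power = (1.0 - v) if (b4 and not b0) else v
    r1 = palette(power)
    r1 = r1 if b4 else bg
    return (0.0, t, 0.0, 1.0) if (b2 and not b3) else r1


def all_agree():
    """Compare all three versions on a small grid of inputs."""
    bg = (0.1, 0.2, 0.3, 1.0)         # arbitrary background color
    th0, th1, th2 = 0.125, 0.5, 0.75  # arbitrary thresholds
    grid = [i / 8.0 for i in range(9)]
    for v in grid:
        for t in grid:
            for c in (-1.0, 0.0, 1.0):
                args = (v, t, c, th0, th1, th2, bg)
                expected = reference(*args)
                if decompiled_v22(*args) != expected:
                    return False
                if decompiled_v30(*args) != expected:
                    return False
    return True
```

With these inputs all three functions agree, which is consistent with the extra instructions in the 3.0 output being redundant rather than changing the result.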
