Retrieving the ZF in GCC inline assembly

2023-03-24 00:44 问答作者：

I need to use some x86 instructions that have no GCC intrinsics, such as BSF and BSR. With GCC inline assembly, I can write something like the following

__INTRIN_INLINE unsigned char bsf64(unsigned long* const index, const uint64_t mask)
{
__asm__("bsf %[mask], %[index]" : [index] "=r" (*index) : [mask] "mr" (mask));
return mask ? 1 : 0;
}

Code like if (bsf64(x, y)) { /* use x */ } is translated by GCC to something like

0x000000010001bf04 <bsf64+0>:   bsf    %rax,%rdx
0x000000010001bf08 <bsf64+4>:   test   %rax,%rax
0x000000010001bf0b <bsf64+7>:   jne    0x10001b开发者_运维百科f44 <...>

However if mask is zero, BSF already sets the ZF flag, so the test after bsf is redundant.

Instead of returning mask ? 1 : 0, is it possible to retrieve the ZF flag and returning it, making GCC not generate the test?

EDIT: made the if example more clear

EDIT: In response to Damon, __builtin_ffsl generates even less optimal code. If I use the following code

    int b = __builtin_ffsl(mask);
    if (b) {
        *index = b - 1;
        return true;
    } else {
        return false;
    }

GCC generates this assembly

   0x000000000044736d <+1101>:  bsf    %r14,%r14
   0x0000000000447371 <+1105>:  cmove  %r12,%r14
   0x0000000000447375 <+1109>:  add    $0x1,%r14d
   0x0000000000447379 <+1113>:  je     0x4471c0 <...>
   0x000000000044737f <+1119>:  lea    -0x1(%r14),%ecx

So the test is gone, but redundant conditional move, increment and decrement are generated.

A couple of remarks:

This is an "anti-optimization". You're trying to do a micro-optimization on something that the compiler already supports.
Your code does not generate the bsf instruction at all with my version of gcc with all optimization switches turned on. Looking at the code, that is not surprising, because you return mask, which is the source operand, not the destination operand (gcc uses AT&T syntax!). The compiler is intelligent enough to figure this out and drops the assembler code (which doesn't do anything) alltogether.
There is an intrinsic function __builtin_ffsl which does exactly the same as your inline assembly (though, correctly). An intrinsic is no less portable than inline assembler, but easier for the compiler to optimize.
Using the intrinsic function results in a bsf cmov sequence on my compiler (assuming the calling code forces it to actually emit the instruction), which shows that the compiler uses the zero-flag just fine without an additional test instruction.
Returning a char when you want a bool is not the best possible hint for the compiler, though it will probably figure it out anyway most of the time. However, telling the compiler to use a bitscan instruction when you are really only interested in "zero or not zero" is certainly sub-optimal. if(x) and if(!x) work perfectly well for that matter. It would be different if you returned the result as reference, so you could reuse it in another place, but as it is, your code is only a very complicated way of writing if(x).

继续阅读：assembly gcc x86

Retrieving the ZF in GCC inline assembly

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？