Retrieving the ZF in GCC inline assembly

I need to use some x86 instructions, such as BSF and BSR, for which GCC has no intrinsics. With GCC inline assembly, I can write something like the following:

__INTRIN_INLINE unsigned char bsf64(unsigned long* const index, const uint64_t mask)
{
__asm__("bsf %[mask], %[index]" : [index] "=r" (*index) : [mask] "mr" (mask));
return mask ? 1 : 0;
}

Code like if (bsf64(x, y)) { /* use x */ } is translated by GCC to something like:

0x000000010001bf04 <bsf64+0>:   bsf    %rax,%rdx
0x000000010001bf08 <bsf64+4>:   test   %rax,%rax
0x000000010001bf0b <bsf64+7>:   jne    0x10001bf44 <...>

However, BSF already sets ZF when mask is zero, so the test after bsf is redundant.

Instead of returning mask ? 1 : 0, is it possible to retrieve ZF and return it, so that GCC does not generate the test?

EDIT: made the if example clearer

EDIT: In response to Damon, __builtin_ffsl generates even less optimal code. If I use the following code

    int b = __builtin_ffsl(mask);
    if (b) {
        *index = b - 1;
        return true;
    } else {
        return false;
    }

GCC generates this assembly:

   0x000000000044736d <+1101>:  bsf    %r14,%r14
   0x0000000000447371 <+1105>:  cmove  %r12,%r14
   0x0000000000447375 <+1109>:  add    $0x1,%r14d
   0x0000000000447379 <+1113>:  je     0x4471c0 <...>
   0x000000000044737f <+1119>:  lea    -0x1(%r14),%ecx

So the test is gone, but a redundant conditional move, increment, and decrement are generated instead.
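For comparison, here is a minimal sketch of a variant that checks the mask first and then uses __builtin_ctzll, which returns the zero-based bit index directly and so needs no +1/-1 adjustment. The name bsf64_ctz is made up for illustration. __builtin_ctzll is undefined for a zero argument, hence the explicit check; whether the compiler folds that check into the flags set by bsf depends on the GCC version and target, so the generated assembly still has to be inspected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the index of the lowest set bit of mask in
   *index and returns true, or returns false (leaving *index untouched)
   when mask is zero. */
static inline bool bsf64_ctz(unsigned long *index, uint64_t mask)
{
    if (mask == 0)
        return false;                 /* __builtin_ctzll(0) is undefined */
    *index = (unsigned long)__builtin_ctzll(mask);
    return true;
}
```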


A couple of remarks:

  • This is an "anti-optimization". You're trying to do a micro-optimization on something that the compiler already supports.
  • Your code does not generate the bsf instruction at all with my version of gcc with all optimization switches turned on. Looking at the code, that is not surprising, because you return mask, which is the source operand, not the destination operand (gcc uses AT&T syntax!). The compiler is intelligent enough to figure this out and drops the assembler code (which doesn't do anything) altogether.
  • There is an intrinsic function __builtin_ffsl which does exactly the same as your inline assembly (though, correctly). An intrinsic is no less portable than inline assembler, but easier for the compiler to optimize.
  • Using the intrinsic function results in a bsf cmov sequence on my compiler (assuming the calling code forces it to actually emit the instruction), which shows that the compiler uses the zero-flag just fine without an additional test instruction.
  • Returning a char when you want a bool is not the best possible hint for the compiler, though it will probably figure it out anyway most of the time. However, telling the compiler to use a bitscan instruction when you are really only interested in "zero or not zero" is certainly sub-optimal. if(x) and if(!x) work perfectly well for that. It would be different if you returned the result by reference, so you could reuse it in another place, but as it is, your code is only a very complicated way of writing if(x).
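On the literal question of reading ZF out of an asm statement: GCC 6 and later support flag output operands on x86, written as "=@cc<cond>", and define the macro __GCC_ASM_FLAG_OUTPUTS__ when they are available. The following is a sketch under that assumption, not something tested against the compilers discussed above. The function name bsf64_zf and the fallback branch are added for illustration. No "cc" clobber is listed, because flag outputs may not be combined with one.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 and stores the index of the lowest set
   bit in *index when mask is nonzero; returns 0 and leaves *index
   untouched when mask is zero. */
static inline unsigned char bsf64_zf(unsigned long *index, uint64_t mask)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(__x86_64__)
    uint64_t pos;   /* only meaningful when mask != 0 */
    int nonzero;    /* receives the "not zero" condition, i.e. ZF == 0 */
    __asm__("bsf %[mask], %[pos]"
            : [pos] "=r" (pos), "=@ccnz" (nonzero)
            : [mask] "rm" (mask));
    if (nonzero)
        *index = (unsigned long)pos;
    return (unsigned char)nonzero;
#else
    /* Fallback for compilers or targets without flag outputs. */
    if (mask == 0)
        return 0;
    *index = (unsigned long)__builtin_ctzll(mask);
    return 1;
#endif
}
```

Used as if (bsf64_zf(&x, y)) { /* use x */ }, this gives the compiler the chance to branch on the flags left by bsf instead of emitting a separate test, though the actual code generated depends on the compiler version and optimization level.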