Retrieving the ZF in GCC inline assembly
I need to use some x86 instructions that have no GCC intrinsics, such as BSF and BSR. With GCC inline assembly, I can write something like the following
__INTRIN_INLINE unsigned char bsf64(unsigned long* const index, const uint64_t mask)
{
    __asm__("bsf %[mask], %[index]" : [index] "=r" (*index) : [mask] "mr" (mask));
    return mask ? 1 : 0;
}
Code like if (bsf64(x, y)) { /* use x */ } is translated by GCC to something like:
0x000000010001bf04 <bsf64+0>: bsf %rax,%rdx
0x000000010001bf08 <bsf64+4>: test %rax,%rax
0x000000010001bf0b <bsf64+7>: jne 0x10001bf44 <...>
However, if mask is zero, BSF already sets the ZF flag, so the test after bsf is redundant.
Instead of returning mask ? 1 : 0, is it possible to retrieve the ZF flag and return it, so that GCC does not generate the test?
EDIT: made the if example clearer
EDIT: In response to Damon, __builtin_ffsl generates even less optimal code. If I use the following code
int b = __builtin_ffsl(mask);
if (b) {
    *index = b - 1;
    return true;
} else {
    return false;
}
GCC generates this assembly
0x000000000044736d <+1101>: bsf %r14,%r14
0x0000000000447371 <+1105>: cmove %r12,%r14
0x0000000000447375 <+1109>: add $0x1,%r14d
0x0000000000447379 <+1113>: je 0x4471c0 <...>
0x000000000044737f <+1119>: lea -0x1(%r14),%ecx
So the test is gone, but a redundant conditional move, increment and decrement are generated.
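For comparison, here is a minimal sketch of the same wrapper built on __builtin_ctzll instead of __builtin_ffsl. The ffs family returns a one-based index (and 0 for a zero input), which is presumably where the add/lea pair above comes from; the ctz family returns the zero-based index directly but is undefined for a zero argument, so the explicit guard is required. The name bsf64_ctz is made up for illustration, and what GCC actually emits for it depends on the compiler version and flags.

```c
#include <stdint.h>

/* Hypothetical wrapper: returns 1 and stores the index of the lowest
 * set bit in *index when mask is non-zero; returns 0 and leaves
 * *index untouched when mask is zero. */
static inline unsigned char bsf64_ctz(unsigned long *index, uint64_t mask)
{
    if (mask == 0)
        return 0;   /* __builtin_ctzll(0) is undefined, so bail out first */

    *index = (unsigned long)__builtin_ctzll(mask);
    return 1;
}
```

It is used the same way as the original: if (bsf64_ctz(&x, y)) { /* use x */ }.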
A couple of remarks:
- This is an "anti-optimization". You're trying to do a micro-optimization on something that the compiler already supports.
- Your code does not generate the bsf instruction at all with my version of gcc with all optimization switches turned on. Looking at the code, that is not surprising, because you return mask, which is the source operand, not the destination operand (gcc uses AT&T syntax!). The compiler is intelligent enough to figure this out and drops the assembler code (which doesn't do anything) altogether.
- There is an intrinsic function __builtin_ffsl which does exactly the same as your inline assembly (though, correctly). An intrinsic is no less portable than inline assembler, but easier for the compiler to optimize.
- Using the intrinsic function results in a bsf cmov sequence on my compiler (assuming the calling code forces it to actually emit the instruction), which shows that the compiler uses the zero flag just fine without an additional test instruction.
- Returning a char when you want a bool is not the best possible hint for the compiler, though it will probably figure it out anyway most of the time. However, telling the compiler to use a bitscan instruction when you are really only interested in "zero or not zero" is certainly sub-optimal. if(x) and if(!x) work perfectly well for that matter. It would be different if you returned the result as a reference, so you could reuse it in another place, but as it is, your code is only a very complicated way of writing if(x).
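On the literal question of reading ZF: GCC 6 and later support flag output operands on x86, where a constraint such as "=@ccz" binds a C variable to the zero flag left behind by the asm statement. The following is only a sketch under that assumption. The name bsf64_zf is hypothetical, the asm path is guarded by __GCC_ASM_FLAG_OUTPUTS__ (which GCC defines when the feature is available), and a plain builtin fallback covers other compilers and targets. Whether the compiler then branches directly on the flag is up to the optimizer.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 and stores the index of the lowest
 * set bit in *index when mask is non-zero, otherwise returns 0. */
static inline unsigned char bsf64_zf(unsigned long *index, uint64_t mask)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    uint64_t idx;        /* destination of bsf; meaningless if mask == 0 */
    unsigned char zero;  /* receives ZF as left by bsf */

    /* "=@ccz" asks GCC to hand back the zero flag as an output operand. */
    __asm__("bsf %[mask], %[idx]"
            : [idx] "=r" (idx), "=@ccz" (zero)
            : [mask] "rm" (mask));

    if (zero)
        return 0;        /* ZF set: source was zero, idx is not usable */
    *index = (unsigned long)idx;
    return 1;
#else
    /* Fallback when flag outputs are unavailable (assumes a compiler
     * that provides the GCC-style __builtin_ctzll). */
    if (mask == 0)
        return 0;
    *index = (unsigned long)__builtin_ctzll(mask);
    return 1;
#endif
}
```

The calling side stays the same as in the question: if (bsf64_zf(&x, y)) { /* use x */ }.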