x86 assembly idioms
I've been trying to get a good hold on the x86 assembly language, and was wondering if there was a quick-and-short equivalent of movl $1, %eax. That's when I thought that a list of idioms used frequently in the language would perhaps be a good idea.
This could include the preferred use of xorl %eax, %eax as opposed to movl $0, %eax, or testl %eax, %eax against cmpl $0, %eax.
Oh, and kindly post one example per post!
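Here's another interesting "idiom". Hopefully everyone knows that division is a big time sink, even compared to a multiplication. Using a little math, it's possible to multiply by the inverse of a constant instead of dividing by it. This goes beyond the shr tricks. For example, to divide by 5:
mov eax, some_number
mov ebx, 3435973837 ; 0CCCCCCCDh, the inverse of 5 modulo 2^32
mul ebx
Now eax holds some_number / 5 without using the slow div opcode, provided some_number is an exact multiple of 5. The constant is a modular inverse, so for any other input the low half of the product is not the quotient. For an arbitrary unsigned input, this particular constant gives the rounded-down quotient in the high half instead: take edx after the mul and shift it right by 2. Here is a list of useful constants for division, shamelessly stolen from http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx (divisor, then its inverse):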
3 2863311531
5 3435973837
7 3067833783
9 954437177
11 3123612579
13 3303820997
15 4008636143
17 4042322161
For numbers not on the list, you might need to do a shift beforehand: even divisors have no inverse modulo 2^32, so to divide by 6, shr 1, then multiply by the inverse of 3.
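To make the arithmetic behind the multiply concrete, here is a small C model of it. This is only a sketch, not code from the linked post, and the function names are made up for illustration. It assumes the constants in the list are inverses modulo 2^32, so the low half of the product is the quotient only for exact multiples. It also shows the high-half variant for 5, which works for any unsigned 32-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Inverses modulo 2^32, copied from the list above. */
#define INV3 2863311531u /* 0xAAAAAAAB */
#define INV5 3435973837u /* 0xCCCCCCCD */

/* Exact division: models "mul ebx" and reads the low half (eax).
   Only valid when n is a multiple of 5. */
uint32_t div5_exact(uint32_t n)
{
    assert(n % 5 == 0);
    return (uint32_t)(n * INV5); /* wraps modulo 2^32 */
}

/* Rounded-down division for any n: models "mul ebx" followed by
   "shr edx, 2", i.e. bits 34 and up of the 64-bit product. */
uint32_t div5_floor(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * INV5) >> 34);
}

/* Exact division by 6: shift out the factor 2, then multiply by
   the inverse of 3. Only valid when n is a multiple of 6. */
uint32_t div6_exact(uint32_t n)
{
    assert(n % 6 == 0);
    return (uint32_t)((n >> 1) * INV3);
}
```

Calling div5_exact(35) gives 7, while an input such as 37 has to go through div5_floor.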
On x64:
xor eax, eax
instead of
xor rax, rax
(the first one also implicitly clears the upper half of rax, but has a shorter encoding because it needs no REX prefix)
Using LEA for multiplication, e.g.:
lea eax, [ecx+ecx*4]
for EAX = 5 * ECX
Expanding on my comment:
To an undiscerning processor such as the Pentium Pro, xorl %eax, %eax appears to have a dependency on %eax and thus must wait for the value of that register to be available. Later processors actually have additional logic to recognize that instruction as not having any dependencies.
The instructions incl and decl set some of the flags but leave others unchanged (they do not write the carry flag). That's the worst situation if the flags are modeled as a single register for the purpose of instruction reordering: any instruction that reads a flag after an incl or decl must be considered as depending on the incl or decl (in case it's reading one of the flags that this instruction sets) and also on the previous instruction that set the flags (in case it's reading one of the flags that this instruction does not set). A solution would be to divide the flags register into two and to consider dependencies with this finer grain... AMD, for its part, took away the one-byte encodings of these instructions (40h to 4Fh) in the 64-bit extension they proposed a few years back, reusing them as REX prefixes; the instructions themselves remain available there in their two-byte form.
Regarding the links, I found this either in the Intel manuals for which it's useless to provide a link because they are on a corporate website that's reorganized every six months, or on Agner Fog's site: http://www.agner.org/optimize/#manuals
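In loops...
dec ecx
cmp ecx, -1
jnz Loop
can be replaced with
dec ecx
jns Loop
Faster and shorter. The two behave the same as long as ecx starts out non-negative when read as a signed value.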
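If you want to convince yourself that the two loop endings run the same number of times, here is a rough C model. It is a sketch only: it assumes the label Loop sits at the top of the loop body, and the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Counts iterations of:  Loop: body ; dec ecx ; cmp ecx, -1 ; jnz Loop */
uint64_t iterations_cmp(uint32_t ecx)
{
    uint64_t n = 0;
    do {
        n++;                          /* loop body */
        ecx--;                        /* dec ecx */
    } while (ecx != 0xFFFFFFFFu);     /* cmp ecx, -1 ; jnz Loop */
    return n;
}

/* Counts iterations of:  Loop: body ; dec ecx ; jns Loop */
uint64_t iterations_jns(uint32_t ecx)
{
    uint64_t n = 0;
    do {
        n++;                          /* loop body */
        ecx--;                        /* dec ecx */
    } while ((ecx & 0x80000000u) == 0); /* jns: sign bit clear after dec */
    return n;
}

/* Checks that both forms agree for one starting value of ecx. */
void check_same_trip_count(uint32_t start)
{
    assert(iterations_cmp(start) == iterations_jns(start));
}
```

Starting from 4, both versions run the body 5 times (for 4, 3, 2, 1 and 0). A start value with the sign bit set, such as 0x80000005, makes the jns version stop after a single pass, so the replacement is only safe for counters that are non-negative as signed values.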
You might as well ask how to optimize in assembly. Then you'd have to ask what you're optimizing for: size or speed? Anyway, here's my "idiom", a replacement for xchg:
xor eax, ebx
xor ebx, eax
xor eax, ebx
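For anyone who wants to see why the three xors end up swapping the values, here is the same sequence modeled in C. It is a sketch with pointers standing in for the two registers, and the function name is made up. The assert marks the one case where the trick breaks: if both operands are the same location, the first xor zeroes the value.

```c
#include <assert.h>
#include <stdint.h>

/* Models: xor eax, ebx ; xor ebx, eax ; xor eax, ebx */
void xor_swap(uint32_t *eax, uint32_t *ebx)
{
    /* Same location on both sides would be cleared by the first xor. */
    assert(eax != ebx);
    *eax ^= *ebx;   /* eax = a ^ b */
    *ebx ^= *eax;   /* ebx = b ^ (a ^ b) = a */
    *eax ^= *ebx;   /* eax = (a ^ b) ^ a = b */
}
```

Equal values in two different locations are fine; only using the very same location twice is a problem.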
Using SHL and SHR for multiplication/division by a power of 2 (SHR treats the value as unsigned)
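Here is a short C sketch of what the shifts compute, with invented function names. The signed variant goes a little beyond the post: it assumes the compiler implements >> on a negative int as an arithmetic shift (like sar), which GCC and Clang do on x86 but the C standard leaves implementation-defined. It adds a bias first, because a bare arithmetic shift rounds toward negative infinity while idiv truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* shl eax, k: multiply by 2^k, wrapping modulo 2^32. */
uint32_t mul_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x << k;
}

/* shr eax, k: unsigned divide by 2^k, rounding down. */
uint32_t udiv_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* Signed divide by 2^k that truncates toward zero like idiv.
   Negative inputs are biased by 2^k - 1 before the shift. */
int32_t sdiv_pow2(int32_t x, unsigned k)
{
    assert(k < 31);
    int32_t bias = (x < 0) ? (int32_t)((1u << k) - 1u) : 0;
    return (x + bias) >> k; /* assumption: arithmetic shift for negative x */
}
```

With this, sdiv_pow2(-7, 1) matches -7 / 2 in C, whereas shifting -7 right by one without the bias would round down to -4 under the same assumption.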
Another one (besides xor) for
mov eax, 0 ; B800000000h
is
sub eax, eax ; 29C0h
Rationale: shorter encoding (2 bytes instead of 5)
Don't know whether this counts as an idiom, but on most processors prior to i7
movq xmm0, [eax]
movhps xmm0, [eax+8]
or, if SSE3 is available,
lddqu xmm0, [eax]
are faster for reading from an unaligned memory location than
movdqu xmm0, [eax]
The earliest reference to division by invariant integers that is more than just an inverse multiply is here: Torbjörn Granlund of the Royal Institute of Technology in Stockholm. Check out his publications.