
x86 assembly idioms

I've been trying to get a good hold on the x86 assembly language, and was wondering whether there is a quick, short equivalent of movl $1, %eax. That made me think that a list of idioms frequently used in the language might be a good idea.

This could include the preferred use of xorl %eax, %eax rather than movl $0, %eax, or testl %eax, %eax rather than cmpl $0, %eax.

Oh, and kindly post one example per post!


Here's another interesting "idiom". Hopefully everyone knows that division is a big time sink, even compared to multiplication. Using a little math, it's possible to multiply by the inverse of a constant instead of dividing by it. This goes beyond the shr tricks. For example, to divide by 5:

mov eax, some_number
mov ebx, 3435973837    ; inverse of 5 modulo 2^32 (0CCCCCCCDh)
mul ebx                ; edx:eax = eax * ebx

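Now eax holds some_number / 5 without using the slow div instruction, provided some_number is an exact multiple of 5. The constant is the inverse of 5 modulo 2^32, so for other values the low half of the product is not the quotient. (For general unsigned division by 5, use the high half instead: after the mul, edx shifted right by 2 is the quotient.) Here is a list of useful constants for division, shamelessly stolen from http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx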

Divisor  Inverse (mod 2^32)
3        2863311531
5        3435973837
7        3067833783
9        954437177
11       3123612579
13       3303820997
15       4008636143
17       4042322161

Even divisors have no inverse modulo 2^32, so shift out the factors of two first (to divide by 6, shr 1, then multiply by the inverse of 3). Every other odd divisor has an inverse as well; it just isn't in this short list.
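A small C sketch of the arithmetic above, assuming 32-bit unsigned registers. The function names are illustrative, not from the linked post. It checks that each table entry really is the inverse of its divisor modulo 2^32. It also shows that the low half of the product (what mul leaves in eax) is the quotient only for exact multiples. Finally, it contrasts that with the high-half method compilers commonly emit for general division by 5.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor/inverse pairs copied from the table above. */
static const uint32_t divisors[] = {3, 5, 7, 9, 11, 13, 15, 17};
static const uint32_t inverses[] = {2863311531u, 3435973837u, 3067833783u,
                                    954437177u,  3123612579u, 3303820997u,
                                    4008636143u, 4042322161u};

/* Returns 1 if every pair satisfies d * inv == 1 (mod 2^32). */
static int check_inverses(void) {
    unsigned n = sizeof divisors / sizeof divisors[0];
    for (unsigned i = 0; i < n; i++)
        if ((uint32_t)((uint64_t)divisors[i] * inverses[i]) != 1u)
            return 0;
    return 1;
}

/* Low 32 bits of n * inv: what "mul ebx" leaves in eax.
   This is the quotient ONLY when n is an exact multiple of the divisor. */
static uint32_t exact_div(uint32_t n, uint32_t inv) {
    return (uint32_t)((uint64_t)n * inv);
}

/* General unsigned n / 5 for any n: keep the high half of the 64-bit
   product (edx after mul) and shift it right by 2 more bits. */
static uint32_t div5(uint32_t n) {
    uint64_t product = (uint64_t)n * 3435973837u;
    return (uint32_t)(product >> 34);  /* >> 32 selects edx, then shr 2 */
}

/* Exact division by 6 = 2 * 3: shr 1, then multiply by the inverse of 3.
   Valid only for multiples of 6; the assert merely checks the shift is lossless. */
static uint32_t exact_div6(uint32_t n) {
    assert((n & 1u) == 0);
    return (uint32_t)((uint64_t)(n >> 1) * 2863311531u);
}
```

For example, exact_div(35, 3435973837u) gives 7 and div5(34) gives 6, but exact_div(34, 3435973837u) does not give 6, because 34 is not a multiple of 5.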


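On x64, use

xor eax, eax

instead of

xor rax, rax

Writing a 32-bit register implicitly clears the upper half of rax, so both zero the full register. The first has a shorter encoding, though: 2 bytes versus 3, since it needs no REX prefix.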


Using LEA for multiplication by small constants, e.g.:

lea eax, [ecx+ecx*4]

to compute EAX = 5 * ECX.
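A minimal C model of why this works: LEA's address calculation is plain integer arithmetic, base + index*scale + displacement, wrapping modulo 2^32, with scale limited to 1, 2, 4 or 8. The helper names (lea, times5, times25) are hypothetical. Flags are not modeled, since LEA does not modify them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "lea dst, [base + index*scale + disp]". */
static uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;  /* unsigned arithmetic wraps mod 2^32 */
}

/* lea eax, [ecx+ecx*4]  ->  eax = 5 * ecx */
static uint32_t times5(uint32_t ecx) {
    return lea(ecx, ecx, 4, 0);
}

/* Chaining two LEAs reaches other constants, e.g. 5 * 5 = 25:
     lea eax, [ecx+ecx*4]
     lea eax, [eax+eax*4]                                        */
static uint32_t times25(uint32_t ecx) {
    uint32_t eax = lea(ecx, ecx, 4, 0);
    return lea(eax, eax, 4, 0);
}
```

The same scheme covers multipliers 2, 3, 4, 5, 8 and 9 in a single instruction. Products of those, like 25 or 45, take two.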


Expanding on my comment:

To an undiscerning processor such as the Pentium Pro, xorl %eax, %eax appears to have a dependency on %eax and thus must wait for the value of that register to be available. Later processors actually have additional logic to recognize that instruction as not having any dependencies.

The instructions incl and decl set some of the flags but leave the carry flag unchanged. That is the worst situation if the flags are modeled as a single register for the purpose of instruction reordering. Any instruction that reads a flag after an incl or decl must be treated as depending on the incl or decl, in case it reads one of the flags that instruction sets. It must also be treated as depending on the previous instruction that set the flags, in case it reads the flag that incl or decl leaves alone. One solution is to split the flags register in two and track dependencies at that finer grain. AMD did drop the one-byte encodings of inc and dec (opcodes 40h to 4Fh) from its 64-bit extension, but only to reuse them as REX prefixes. inc and dec remain available in 64-bit mode through their longer FE/FF encodings.

Regarding links: I found this either in the Intel manuals, which aren't worth linking because they live on a corporate website that is reorganized every six months, or on Agner Fog's site: http://www.agner.org/optimize/#manuals


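In loops,

  dec     ecx
  cmp     ecx, -1
  jnz     Loop

is equivalent to

  dec     ecx
  jns     Loop

Shorter, with one instruction fewer: dec already sets the sign flag, so the cmp is redundant. The two forms agree as long as ecx starts at a value that is non-negative as a signed integer (below 80000000h). For larger unsigned counts, the jns version exits early.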
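A small C model of the two loop tails, counting how often control reaches dec ecx for a given starting value. Registers are modeled as uint32_t, and the function names are illustrative. It shows that the variants match for ordinary counts. It also shows the jns form stopping after one pass when ecx starts at 80000001h, where the cmp form would run about 2^31 times.

```c
#include <assert.h>
#include <stdint.h>

/* dec ecx / cmp ecx, -1 / jnz Loop: repeat until ecx wraps to 0xFFFFFFFF */
static uint64_t iterations_cmp(uint32_t ecx) {
    uint64_t n = 0;
    do {
        n++;
        ecx--;                              /* dec ecx */
    } while (ecx != 0xFFFFFFFFu);           /* cmp ecx, -1 ; jnz Loop */
    return n;
}

/* dec ecx / jns Loop: repeat while the sign bit (SF) of ecx is clear */
static uint64_t iterations_jns(uint32_t ecx) {
    uint64_t n = 0;
    do {
        n++;
        ecx--;                              /* dec ecx */
    } while ((ecx & 0x80000000u) == 0);     /* jns Loop */
    return n;
}
```

Starting from ecx = 10, both return 11. Starting from 80000001h, iterations_jns returns 1.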


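You might as well ask how to optimize in assembly, and then you'd have to ask what you're optimizing for: size or speed? Anyway, here's my "idiom", a replacement for xchg:

xor eax, ebx
xor ebx, eax
xor eax, ebx

Note that this is not a size win when one operand is eax. xchg eax, ebx encodes in a single byte, versus two bytes for each xor.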
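A minimal C sketch of the same three-XOR swap, on two 32-bit values passed by pointer (xor_swap is an illustrative name). The a == b guard covers a pitfall that cannot arise with two distinct registers but can with memory operands or macros. If both operands name the same location, the first XOR zeroes it and the value is lost.

```c
#include <assert.h>
#include <stdint.h>

static void xor_swap(uint32_t *a, uint32_t *b) {
    if (a == b)
        return;    /* same location: x ^= x would clear it */
    *a ^= *b;      /* xor eax, ebx  -> a = A ^ B */
    *b ^= *a;      /* xor ebx, eax  -> b = B ^ (A ^ B) = A */
    *a ^= *b;      /* xor eax, ebx  -> a = (A ^ B) ^ A = B */
}
```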


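Using SHL and SHR for multiplication/division by a power of 2. SHR gives the quotient only for unsigned values. For signed values, SAR rounds toward negative infinity, whereas idiv truncates toward zero.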
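A short C sketch of these shifts. The helper names shl, shr, sar and sdiv_pow2 are hypothetical. C leaves >> on negative values implementation-defined, so sar models the instruction portably via ~x. sdiv_pow2 shows a common way compilers make a signed power-of-two division truncate toward zero: add 2^k - 1 to negative dividends before the arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: shl/shr by k multiply/divide by 2^k (shl wraps mod 2^32). k < 32. */
static uint32_t shl(uint32_t x, unsigned k) { return x << k; }
static uint32_t shr(uint32_t x, unsigned k) { return x >> k; }

/* Model of SAR: floor(x / 2^k). For negative x, ~x is non-negative,
   so shifting it is well-defined, and ~ again restores the sign. */
static int32_t sar(int32_t x, unsigned k) {
    return x >= 0 ? (int32_t)(x >> k) : (int32_t)~(~x >> k);
}

/* Signed x / 2^k truncating toward zero, like idiv. Assumes k < 31. */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
    int32_t bias = x < 0 ? (int32_t)((1u << k) - 1) : 0;
    return sar(x + bias, k);   /* x + bias cannot overflow when x < 0 */
}
```

For example, sar(-7, 1) is -4, while sdiv_pow2(-7, 1) is -3, matching C's -7 / 2.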


Another one (besides xor) for

mov eax, 0   ; B8 00 00 00 00

is

sub eax, eax ; 29 C0

Rationale: smaller encoding (2 bytes instead of 5). Like xor, it clobbers the flags, which mov does not.


Don't know whether this counts as an idiom, but on most processors prior to i7

movq xmm0, [eax]
movhps xmm0, [eax+8]

or, if SSE3 is available,

lddqu xmm0, [eax]

are faster for reading from an unaligned memory location than

movdqu xmm0, [eax]


The earliest reference to division by invariant integers that goes beyond a plain inverse multiply is the work of Torbjörn Granlund of the Royal Institute of Technology in Stockholm. Check out his publications.
