x86 assembly idioms

Posted 2022-12-26 19:43

I've been trying to get a good hold on the x86 assembly language, and was wondering if there was a quick-and-short equivalent of movl $1, %eax. That's when I thought that a list of idioms used frequently in the language would perhaps be a good idea.

This could include the preferred use of xorl %eax, %eax over movl $0, %eax, or of testl %eax, %eax over cmpl $0, %eax.

Oh, and kindly post one example per post!

Here's another interesting "idiom". Hopefully everyone knows that division is far slower than multiplication. With a little math, you can multiply by the inverse of a constant instead of dividing by it. This goes beyond the shr tricks. For example, to divide by 5:

mov eax, some_number
mov ebx, 3435973837    ; 0xCCCCCCCD, the inverse of 5 modulo 2^32
mul ebx                ; EDX:EAX = EAX * EBX (EDX is overwritten)

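Now EAX holds some_number / 5 without using the slow div instruction, but only if some_number is an exact multiple of 5. The constant is the multiplicative inverse of 5 modulo 2^32 (5 * 3435973837 = 4 * 2^32 + 1), so the low half of the product equals the quotient only when the division leaves no remainder. Here is a list of useful constants for division, shamelessly stolen from http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx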

Divisor  Inverse mod 2^32
3        2863311531
5        3435973837
7        3067833783
9        954437177
11       3123612579
13       3303820997
15       4008636143
17       4042322161

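For divisors not on the list, you may need to shift first: to divide by 6, shr 1, then multiply by the inverse of 3. Even numbers have no inverse modulo 2^32, so their factors of two must be shifted out first, and the dividend still has to be an exact multiple.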

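On x64, use

xor eax, eax

instead of

xor rax, rax

(the first one also clears the upper half of RAX, because writing a 32-bit register zero-extends the result into the full 64-bit register, and its encoding is one byte shorter since it needs no REX prefix)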

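Using LEA for multiplication, e.g.

lea eax, [ecx+ecx*4]

for EAX = 5 * ECX. LEA only does address arithmetic, so it leaves the flags untouched; scales of 1, 2, 4 and 8 give multipliers of 2, 3, 5 and 9.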

Expanding on my comment:

To an undiscerning processor such as the Pentium Pro, xorl %eax, %eax appears to have a dependency on %eax and thus must wait for the value of that register to be available. Later processors actually have additional logic to recognize that instruction as not having any dependencies.

The instructions incl and decl set some of the flags but leave others (notably CF) unchanged. That is the worst case if the flags are modeled as a single register for the purpose of instruction reordering. Any instruction that reads a flag after an incl or decl must be treated as depending on the incl or decl, in case it reads one of the flags that instruction sets. It must also be treated as depending on the previous flag-setting instruction, in case it reads one of the flags that incl or decl leaves alone. One solution is to split the flags register in two and track dependencies at that finer grain. AMD did remove the one-byte encodings of inc and dec (opcodes 0x40 to 0x4F) in its 64-bit extension, but that was to reuse them as REX prefixes. The instructions themselves remain available in 64-bit mode through their ModRM encodings (FF /0 and FF /1), so add or sub with 1 is still the usual way to avoid the partial flags update.

Regarding references: I found this either in the Intel manuals, which are not worth linking because they live on a corporate website that gets reorganized every six months, or on Agner Fog's site: http://www.agner.org/optimize/#manuals

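In loops, instead of

  dec     ecx
  cmp     ecx, -1
  jnz     Loop

use

  dec     ecx
  jns     Loop

Faster and shorter: dec already sets the sign flag, so the separate cmp is unnecessary. The two forms agree as long as ECX starts non-negative when read as a signed integer (0 to 2^31 - 1).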

You might as well ask how to optimize in assembly, but then you'd have to ask what you're optimizing for: size or speed? Anyway, here's my "idiom", a replacement for xchg that needs no temporary register:

xor eax, ebx
xor ebx, eax
xor eax, ebx

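Using SHL and SHR for multiplication/division of unsigned values by a power of 2 (for signed division, SAR rounds toward negative infinity rather than toward zero as idiv does, so negative dividends need a correction)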

Another one (besides xor): instead of

mov eax, 0   ; B8 00 00 00 00

use

sub eax, eax ; 29 C0

Rationale: shorter encoding (2 bytes instead of 5). Like xor, and unlike mov, sub overwrites the flags.

I don't know whether this counts as an idiom, but on most processors prior to the Core i7 (Nehalem),

movq xmm0, [eax]
movhps xmm0, [eax+8]

or, if SSE3 is available,

lddqu xmm0, [eax]

are faster for reading from an unaligned memory location than

movdqu xmm0, [eax]

The earliest reference to division by invariant integers that goes beyond a plain inverse multiply is the work of Torbjörn Granlund of the Royal Institute of Technology in Stockholm. Check out his publications.
