
Easiest/Best Way to Learn the x86 Instruction Set? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 10 years ago.

I would like to learn the x86 Instruction Set Architecture. I don't mean learning an assembly language for x86. I want to understand the machine code, baby.

The reason is that I would like to write an assembler for x86. Then I want to write a compiler that compiles to that assembly.

I know that there are the Intel manuals and AMD manuals that cover the x86 instruction set. But those are very large and dense.

I'm wondering if there is a more approachable (possibly tutorial) approach to learning the x86 instruction set architecture.


Well, I don't agree with you. The complexity of x86 is misunderstood and thus exaggerated. I'm not saying that it isn't complex. It surely is, but that's the case only if you want to write a full-fledged compiler or assembler. If you just want to learn assembly, it isn't that complex.

Let's break down the x86-64 architecture to prove my point.


Registers:

x86-64 specifies only a few registers. How many exactly? Let's enumerate them:

  • 16 general-purpose registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP + R8, R9, R10, R11, R12, R13, R14, R15)
  • 6 segment registers (CS, DS, SS, ES, FS, GS)
  • 64-bit RFLAGS & 64-bit RIP
  • 8 80-bit floating-point (x87) registers (FPR0-FPR7), whose low 64 bits are aliased by the 64-bit MMX registers (MM0-MM7)
  • 16 128-bit extended media registers (XMM0-XMM7 + XMM8-XMM15)
  • some special/miscellaneous registers such as control registers (CR0 through 4), debug registers (DR0 through 3, plus 6 and 7), test registers (TR4 through 7), descriptor registers (GDTR, LDTR, IDTR), and a task register (TR), which we hardly need to care about.
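Since the question is about machine code rather than assembly syntax, it may help to see how the 16 general-purpose registers are numbered in an encoding. The following is only a sketch: the names `GPR64` and `split_reg` are made up for illustration. The low three bits of the register number go into the opcode or ModR/M byte, and the fourth bit goes into a REX prefix bit (which REX bit depends on the operand's role).

```python
# Hardware numbering of the 64-bit general-purpose registers.
# Note the order is RAX, RCX, RDX, RBX (not alphabetical).
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def split_reg(name):
    """Return (rex_bit, low3) for a 64-bit register name.

    rex_bit is the extra bit carried by a REX prefix (needed for R8-R15),
    low3 is the 3-bit field stored in the opcode or ModR/M byte.
    """
    number = GPR64.index(name.lower())
    return number >> 3, number & 7


print(split_reg("rbx"))  # (0, 3)
print(split_reg("r9"))   # (1, 1)
```

This is why R8 through R15 cost an extra prefix byte: the original 3-bit register fields only have room for eight registers.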

[Image: http://www.viva64.com/content/articles/64-bit-development/amd64_em64t/01-big.png]


Addressing Modes:

How to reference any memory location?

Source: http://en.wikipedia.org/wiki/X86#Addressing_modes

Addressing modes for 32-bit address size on 32-bit or 64-bit x86 processors can be summarized by this formula:

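[segment:] [base] + [index * scale] + [displacement]

where segment is one of CS, DS, SS, ES, FS, GS; base is one of EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI; index is any of those registers except ESP; and scale is 1, 2, 4, or 8.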

Addressing modes for 64-bit code on 64-bit x86 processors can be summarized by these formulas:

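[FS: or GS:] [base] + [index * scale] + [displacement]

where base is any of the 16 general-purpose registers, index is any of them except RSP, and scale is 1, 2, 4, or 8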

and

RIP + [displacement]
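To make the base + index * scale + displacement form concrete, here is a small sketch of how an emulator might compute the effective address. The function name `effective_address` and the `regs` dictionary are assumptions made for this example, and segment bases are ignored.

```python
def effective_address(regs, base=None, index=None, scale=1, disp=0, bits=64):
    """Compute base + index*scale + disp, wrapped to the address size.

    regs maps register names to integer values (a hypothetical register file).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    if index in ("esp", "rsp"):
        raise ValueError("the stack pointer cannot be used as an index")
    address = disp
    if base is not None:
        address += regs[base]
    if index is not None:
        address += regs[index] * scale
    # Addresses wrap around at the address size (32 or 64 bits).
    return address & ((1 << bits) - 1)


regs = {"rbx": 0x1000, "rsi": 3, "rbp": 0x2000}
print(hex(effective_address(regs, base="rbx", index="rsi", scale=8, disp=16)))  # 0x1028
print(hex(effective_address(regs, base="rbp", disp=-8)))                        # 0x1ff8
```

The first call corresponds to something like `[rbx + rsi*8 + 16]` in assembly syntax, the second to a typical stack-frame access `[rbp - 8]`.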


Operation Modes:

These are the modes in which it can operate:

  1. Real mode
  2. Protected mode
    • Virtual 8086 mode
  3. Long mode

Instruction Set:

You hear people saying it's a large instruction set. Well, there are around 500-600 instructions. But some of them are the same instruction with very small variations, like CMPS/CMPSB/CMPSW/CMPSD/CMPSQ. If you group them like this, the number comes down to about 400 instructions.
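At the machine-code level the CMPS family really is one instruction with an operand-size knob. As a rough sketch (the helper name `encode_cmps` is made up, and this assumes 64-bit mode, where the default operand size is 32 bits):

```python
def encode_cmps(operand_bits):
    """Return the machine code for CMPSB/CMPSW/CMPSD/CMPSQ in 64-bit mode."""
    if operand_bits == 8:
        return bytes([0xA6])         # CMPSB: the byte form has its own opcode
    if operand_bits == 16:
        return bytes([0x66, 0xA7])   # CMPSW: operand-size prefix + A7
    if operand_bits == 32:
        return bytes([0xA7])         # CMPSD: A7 at the default operand size
    if operand_bits == 64:
        return bytes([0x48, 0xA7])   # CMPSQ: REX.W prefix + A7
    raise ValueError("operand size must be 8, 16, 32, or 64 bits")


for bits in (8, 16, 32, 64):
    print(bits, encode_cmps(bits).hex())
```

Four mnemonics, two opcodes, and two optional prefixes. Many other "separate" instructions collapse in the same way once you look at the bytes.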

Do you feel that is very large? Then I have a few questions. How many functions does the C standard library have? How many functions does the POSIX library have? What about .NET and Java? How many classes and methods do they have? Do we have to know all of the functions/methods/classes? What approach do we take for learning these libraries?

Just learn a few from each. Skim through all of them. Get a feel for what exists and use the reference when you need it.

We can logically divide these instructions into the following categories:

  1. General-Purpose Instructions
    • Basic Data Manipulation (moving & copying)
    • Control Transfer (Jumps, Calls, Interrupts)
    • Arithmetic & Logic Instructions (add,sub,and,xor etc..)
    • String & Bit Oriented Instructions
    • System Calls
  2. System Instructions
  3. x87 Floating-Point Instructions
  4. 64-Bit Media (MMX) Instructions
  5. 128-Bit Media (SSE) Instructions

That's it!! That's all you need to know. Now tell me frankly: is it that complex?

Just get any good book on assembly language covering the x86 architecture. I would personally suggest "Assembly Language Programming in GNU/Linux for IA32 Architectures" by Rajat Moona because it is short and to the point and doesn't waste your time. But it doesn't cover x86-64.

Once you are familiar with IA32, for x86-64 read http://csapp.cs.cmu.edu/public/1e/public/docs/asm64-handout.pdf


At some point you will have to cope with a bit of complexity. The x86 instruction set is large.

But you can make things substantially simpler by reading the documentation for an older CPU. Intel and AMD seem to add dozens of new instructions to each submodel. Try to read the Intel manual for the 80386, which is substantially smaller and yet covers much of what you will use.

I know a good (old) book, but it is in French. It is called "Programmation du 80386" by J.-M. and M. Trio. I am not sure it is still in print nowadays (I bought mine nearly 20 years ago).


I'd say jump into the deep water and start from there.

Start by writing a simple (C/C++) application. Then use the epic debugger called OllyDbg ( http://www.ollydbg.de/ ). Debug your application and see how the compiler implemented your code. Check loops. Check function calls. Check API calls. Check memory manipulation.

By doing that you'll get a real idea of how things are done.

I've been debugging applications this way, and that is how I learned assembly. You say you want to UNDERSTAND the machine code, and there's no better way in my opinion.

You may also look into something called a "crackme" (google it). This will challenge you and test your skills. Once you're in control, you'll see that everything you want to know is just a matter of digging through the instruction set manual. Get the point? Challenge yourself with specific targets.

Good luck. It's not easy yet very possible.


If you just want to understand the numbers and some of the complexities, such as ModR/M bytes and the other oddities behind them, you may want to try implementing a simple 8086 emulator (just the CPU). I found it to be a fun and interesting experience.
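To give a taste of what decoding a ModR/M byte involves in such an emulator, here is a minimal sketch for the 16-bit (8086) addressing forms. The names `split_modrm` and `describe_rm16` are made up for this example, and a real decoder would also read the displacement bytes that follow.

```python
# 16-bit r/m field meanings when the operand is in memory (mod != 3).
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields: 2 + 3 + 3 bits."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7


def describe_rm16(byte):
    """Describe the r/m operand of a ModR/M byte using 16-bit addressing."""
    mod, _reg, rm = split_modrm(byte)
    if mod == 3:
        return "register %d" % rm      # operand is a register, not memory
    if mod == 0 and rm == 6:
        return "[disp16]"              # the oddity: direct address, no BP
    if mod == 0:
        return "[%s]" % RM16[rm]
    return "[%s+disp%d]" % (RM16[rm], 8 if mod == 1 else 16)


print(split_modrm(0xC3))    # (3, 0, 3)
print(describe_rm16(0x46))  # [BP+disp8]
print(describe_rm16(0x06))  # [disp16]
```

The `mod == 0 and rm == 6` special case is exactly the kind of oddity you only really internalize by writing the decoder yourself.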

http://www.ousob.com/ng/iapx86/ is a really good reference I used when writing an emulator. It gives a very nice list of opcodes, along with the CPU version in which each one first appeared and the hex opcode for each variation.


Old versions of the NASM manual had a nice, concise reference, though being old, the CPUs they cover are only so recent. Here's a random copy I found. It lists opcodes (arranged so the patterns are easy to see) and describes the addressing mode encodings:

http://www.posix.nl/linuxassembly/nasmdochtml/nasmdoca.html

I wrote a runtime machine code generator (targeting 486 or better) using basically just this information, so there should be enough there to get you started...
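As an illustration of how little is needed to start emitting machine code from such a table, here is a sketch that encodes two 32-bit instructions. The helper names `mov_reg_imm32` and `ret` are made up for this example; it only builds the bytes and does not execute them.

```python
import struct

# Register numbers used in 32-bit opcode and ModR/M fields.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def mov_reg_imm32(reg, imm):
    """Encode MOV r32, imm32: opcode B8 + register number, then a
    little-endian 32-bit immediate."""
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", imm & 0xFFFFFFFF)


def ret():
    """Encode a near RET."""
    return b"\xC3"


code = mov_reg_imm32("eax", 42) + ret()
print(" ".join("%02x" % b for b in code))  # b8 2a 00 00 00 c3
```

Those six bytes are a complete function that returns 42 under the usual 32-bit calling conventions. Comparing the output against a real assembler or disassembler is a good way to check each new encoding you add.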


I think you are not being realistic. You said:

I know that there are the Intel manuals and AMD manuals that cover the x86 instruction set. But those are very large and dense.

...

I'd like to learn all of that. Perhaps I should start with what is simplest and easiest to learn.

Did you ask yourself why they are large and dense? The answer is simple! If we are just looking at Intel x86 products:

  • There are: 8086, 8088, 80186, 80188 and 80286 16-bit CPUs.
  • There are: 80386 and 80486 (the latter with a built-in floating-point coprocessor) 32-bit CPUs.
  • There are: Pentium and Pentium MMX
  • There are: Pentium Pro, Pentium II and Pentium III
  • There are: Pentium 4, Pentium M, Pentium 5, Pentium 6, Celeron, Prescott
  • There are: Intel Core 2, Intel Core i7
  • There is: Intel Atom
  • There is: Sandy Bridge

  • There are 16-, 32- and 64-bit architectures

  • There are several different floating-point math units.
  • There are several Streaming SIMD Extensions.
  • There are several CPU protection models.

There are...

There are 32 years of R&D on x86 architectures. And I didn't mention AMD, VIA and so on!

No, there is no faster way!
