Why are there only four registers?

Why are there only four registers in the most common CPU (x86)? Wouldn't there be a huge increase in speed if more registers were added? When will more registers be added?


The x86 has always had more than four registers. Originally, it had CS, DS, ES, SS, AX, BX, CX, DX, SI, DI, BP, SP, IP and Flags. Of those, seven (AX, BX, CX, DX, SI, DI, and BP) supported most general operations (addition, subtraction, etc.). BP and BX also supported use as "base" registers (i.e., to hold addresses for indirection). SI and DI could also be used as index registers, which are about the same as base registers, except that an instruction can generate an address from one base register and one index register, but NOT from two index registers or two base registers. At least in typical use, SP is devoted to acting as the stack pointer.
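To make the base/index rule concrete, here is a minimal sketch in Python of which register combinations the original 16-bit addressing modes accept. The function name and the set names are made up for illustration, and displacements are ignored entirely.

```python
# Toy model of 16-bit 8086 register-indirect addressing.
# Assumption: displacements and segment overrides are left out.
BASE_REGS = {"BX", "BP"}
INDEX_REGS = {"SI", "DI"}


def is_valid_8086_address(*regs):
    """Return True if the registers form a legal 16-bit effective address."""
    names = [r.upper() for r in regs]
    if len(names) == 1:
        # A single base or index register can hold the address.
        return names[0] in BASE_REGS | INDEX_REGS
    if len(names) == 2:
        # Exactly one base plus one index; never two of the same kind.
        bases = [r for r in names if r in BASE_REGS]
        indexes = [r for r in names if r in INDEX_REGS]
        return len(bases) == 1 and len(indexes) == 1
    return False


print(is_valid_8086_address("BX", "SI"))  # base + index
print(is_valid_8086_address("BX", "BP"))  # two bases
print(is_valid_8086_address("SI", "DI"))  # two indexes
```

Only the first of the three calls describes a combination the 8086 could encode.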

Since then, the registers have gotten larger, more have been added, and some of them have become more versatile, so (for example) you can now use any 2 general-purpose registers in 2-register addressing modes. Somewhat strangely, two segment registers (FS and GS) were added in the 386, which also allowed 32-bit segments, which mostly rendered all the segment registers nearly irrelevant. They are sometimes used for thread-local storage.

I should also add that when you do multi-tasking, multi-threading, etc., lots of registers can have a pretty serious penalty -- since you don't know which registers are in use, when you do a context switch you have to save all the registers in one task, and load all the saved registers for the next task. In a CPU like the Itanium or the SPARC with 200+ registers, this can be rather slow. Recent SPARCs devote a fair amount of chip area to optimizing this, but their task switches are still relatively slow. It's even worse on the Itanium -- one reason it's less than impressive on typical server tasks, even though it blazes on scientific computing with (very) few task switches.
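As a rough back-of-the-envelope sketch of that cost (the function names are hypothetical, and this deliberately ignores flags, floating-point/vector state, and any hardware tricks for saving registers lazily), the register traffic per context switch grows linearly with the register count:

```python
def saved_state_bytes(num_registers, register_bytes):
    """Bytes needed to hold one task's registers (integer registers only)."""
    return num_registers * register_bytes


def switch_traffic_bytes(num_registers, register_bytes):
    """Bytes moved per context switch: save the old task, load the new one."""
    return 2 * saved_state_bytes(num_registers, register_bytes)


# 16 general-purpose registers, 64 bits each (x86 in 64-bit mode).
print(switch_traffic_bytes(16, 8))
# An assumed machine with 200 registers of 64 bits each.
print(switch_traffic_bytes(200, 8))
```

With these assumed numbers, the 200-register machine moves 12.5 times as much register state per switch as the 16-register one.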

Finally, of course, all this is really quite different from how a reasonably modern implementation of x86 works. Starting with the Pentium Pro, Intel decoupled the architectural registers (i.e., the ones that can be addressed in an instruction) from the implementation. To support concurrent, out of order execution, the Pentium Pro had (if memory serves) a set of 40 internal registers, and used "register renaming" so two (or more) of those might correspond to a given architectural register at a given time. For example, if you manipulate a register, then store it, load a different value, and manipulate that, the processor can detect that the load breaks the dependency chain between those two sets of instructions, so it can execute both of those manipulations simultaneously.
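The renaming idea can be sketched with a toy model in Python. This is only an illustration of the principle, not how any real Intel or AMD core is built: the `rename` function, the `p0`, `p1`, ... naming, and the unlimited supply of physical registers are all assumptions.

```python
from itertools import count


def rename(program):
    """Map architectural registers to fresh physical registers.

    program is a list of (dest, sources) pairs using architectural names;
    dest is None for instructions that write no register (e.g. a store).
    Returns the same list expressed in physical register names.
    """
    fresh = count()   # assumption: we never run out of physical registers
    table = {}        # architectural name -> current physical register
    renamed = []
    for dest, sources in program:
        phys_sources = []
        for src in sources:
            if src not in table:
                # First read of a register: give its old value a home.
                table[src] = f"p{next(fresh)}"
            phys_sources.append(table[src])
        phys_dest = None
        if dest is not None:
            # Every write gets a brand-new physical register.
            phys_dest = f"p{next(fresh)}"
            table[dest] = phys_dest
        renamed.append((phys_dest, tuple(phys_sources)))
    return renamed


program = [
    ("eax", ("eax",)),  # manipulate eax
    (None, ("eax",)),   # store eax to memory
    ("eax", ()),        # load a different value into eax
    ("eax", ("eax",)),  # manipulate the new value
]
for line in rename(program):
    print(line)
```

After renaming, the load writes a physical register that nothing earlier reads or writes, and the final instruction depends only on that load. The two manipulations therefore sit on separate dependency chains even though both name eax.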

The Pentium Pro is now quite old, of course, and AMD has also been around for a while (though their designs are reasonably similar in this respect). While the details change with new processors, having renaming capability that decouples architectural registers from physical registers is now more or less a fact of life.


There are more than 4 nowadays. If you look at the history of the x86 architecture, you see that it has evolved from the 8086 instruction set. Intel has always wanted to keep some degree of backwards compatibility in its processor line, so all subsequent processors simply extended the original A, B, C, D registers to wider numbers of bits. The original segment registers are mostly irrelevant today, since there aren't really segments anymore (this is an oversimplification, but roughly true). The newer x64 architecture provides some extra registers as well.


X86 is really an 8-register machine (eax/ebx/ecx/edx/esi/edi/ebp/esp). You lose 1 of those to the stack pointer/base pointer, so in practical usage you get 7, which is a bit on the low side, but even some RISC machines have 8 (SuperH, and ARM in Thumb mode, because they have a 16-bit instruction size and more registers would take too many bits to encode!). For 64-bit code, you upgrade from 8 to 16 (they used some leftover bits in instruction encoding AFAIK).

Still, 8 registers is just about enough to pipeline the CPU, which is perfect for 486s and Pentiums. Some other architectures, like 6502/65816, died off in the early 32-bit era because you just can't make a fast in-order pipelined version (you only have 3 registers, and only 1 for general math, so everything causes a stall!). Once you get to the generation where all your registers are renamed and everything is out of order (Pentium II etc.), it doesn't really matter anymore: you won't get stalls if you reuse the same register over and over, and then 8 registers is quite all right.

The other use for more registers is to keep loop constants in registers, and you don't need to on x86 because every instruction can do a memory load, so you can keep all your constants in memory. This is the one feature missing from RISCs (by definition), and while they make up for it by being easier to pipeline (your longest latency is 2 cycles instead of 3) and being slightly more superscalar, your code size still increases a bit...

There are some non obvious costs to adding more registers. Your instructions get longer because you need more bits, which increases program size, which slows down your program if your code speed is limited by the memory bandwidth of reading instructions!
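A quick sketch of the arithmetic in Python (function names are made up, and the operand counts are assumptions for illustration, not a description of any particular encoding):

```python
def register_field_bits(num_registers):
    """Bits needed to name one register out of num_registers."""
    return (num_registers - 1).bit_length()


def operand_bits(num_registers, operands):
    """Total bits an instruction spends naming its register operands."""
    return operands * register_field_bits(num_registers)


# Two-operand instructions with 8 registers.
print(operand_bits(8, 2))
# Three-operand instructions with 32 registers.
print(operand_bits(32, 3))
# Three-operand instructions with 128 registers.
print(operand_bits(128, 3))
```

Each doubling of the register file costs one extra bit per operand, so a three-operand instruction with 128 registers already spends 21 bits on register names, more than fits in a 16-bit instruction.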

There's also the fact that the larger your register file is, the more multiplexer levels/general circuitry you have to go through to read a value, which increases latency, which can potentially reduce the clock speed!

This is why the conventional wisdom at the moment is that more than 32 registers is not really a good idea (not useful, especially on an out-of-order CPU), and 8 is just about too low (memory reads are still expensive!). It is also why the ideal architecture is considered to be something like 75% RISC, 25% CISC; why ARM is popular (balanced just about right!); why almost all RISC architectures still have some CISC parts (address calculation in every memory op, 32-bit opcodes but not more!); and why Itanium failed (128-bit instruction bundles? 128 registers? no address calculation in memory ops???).

For all of these reasons, x86 hasn't been surpassed. Sure, the instruction encoding is totally insane, but aside from that, all the crazy reordering, renaming and speculative load-store insanity it does to stay efficient are actually really useful features, and are exactly what give it its edge over various simpler in-order designs such as the POWER6. Once you reorder and rename everything, all instruction sets are more or less the same anyway, so it's very hard to make a design that's actually faster in any way, except in specific cases (GPUs essentially). Once ARM CPUs get as fast as x86s, they will be just as crazy and complicated as the ones Intel puts out.


  1. Registers used to be expensive to implement.
  2. Not necessarily. The number of registers on a modern x86 CPU is well beyond what the CPU reveals - the CPU maintains shadow registers which are renamed as needed based upon the instruction flow.
  3. They already have been, in AMD64/x86_64. When running in 64-bit mode, the number of general-purpose registers is doubled (in addition to their size being doubled).

There are many architectures with more registers (ARM, PowerPC, etc). At times, they can achieve higher instruction throughput as less work is done in manipulating the stack, and instructions may be shorter (no need to reference stack variables). The counter-point is function calls become more expensive due to more register saving.


More registers don't necessarily make things faster; they make the CPU architecture more complicated, as the registers have to be close to other components and many instructions work only on specific registers.

But modern CPUs have more than four registers. Off the top of my head there are AX, BX, CX, DX, SI, DI, BP, ... and a CPU also has internal registers, for instance for PIC (processor instruction counters).


Well, there are more; the four are just special (they are 'general purpose', I think). The reasons for all this, and why the rest aren't used that much, are:

  • x86 wasn't exactly the best instruction set to be the de facto standard; Intel just saw the potential of backwards compatibility, and once AMD joined in it was only a matter of time.
  • It's the de facto standard now, so we have to live with it.
  • Adding more registers would no longer be x86, so you mean 'creating a new instruction set based on x86 with more registers'.
  • Most compilers would not use the new registers, since code compiled for plain x86 also runs on any superset of x86.
  • More registers means more expensive hardware.


The memory that registers use is really expensive to engineer in the CPU. Aside from the design difficulties in doing so, increasing the number of available registers makes CPU chips more expensive.

In addition:

  • There are other methods to increase CPU performance that are more cost-efficient.
  • Even if more were introduced, you would still need to update the instruction set and have compilers modified to use them.
  • There are already more than 4 registers. From Wikipedia (the world's, eh, most reliable source):
    • AX/EAX/RAX: accumulator
    • BX/EBX/RBX: base index (ex: arrays)
    • CX/ECX/RCX: counter
    • DX/EDX/RDX: data/general
    • SI/ESI/RSI: "source index" for string operations.
    • DI/EDI/RDI: "destination index" for string operations.
    • SP/ESP/RSP: stack pointer for top address of the stack.
    • BP/EBP/RBP: stack base pointer for holding the address of the current stack frame.
    • IP/EIP/RIP: instruction pointer. Holds the program counter, the current instruction address.


Um..... (E/R)AX, (E/R)BX, (E/R)CX, (E/R)DX, (E/R)SI, (E/R)DI, (E/R)SP, (E/R)BP, (E/R)IP. I count that as more than 4. :)


It simply depends on architectural decisions. Intel Itanium has 128 general-purpose and 128 floating-point registers, while Intel x86 only has 8 general-purpose registers and a stack of 8 floating-point registers.
