Moving data from memory to memory in microcontrollers
Why can't we move data directly from one memory location into another?
Pardon me if I am asking a dumb question, but I think this is the actual situation, at least for the processors I've encountered (8085, 8086 and 80386).
I am not really looking for a way to move the data (for example, using MOVS and the like), but for the reason behind this anomaly.
What about MOVS? It moves an 8/16/32-bit value from the location addressed by ESI to the location addressed by EDI.
The basic reason is that most instruction sets allow one register operand and one memory operand, and sticking to this format makes designing the instruction decoder easier. It also makes the execution engine inside the CPU easier, because an instruction typically issues a memory operation to just one memory location, and at most one register block read or write.
To do a memory-to-memory instruction directly requires two memory locations to be designated. This is awkward given a register/memory instruction format. Given the performance of the machines, there is little justification for modifying the instruction format just for this.
A hack used by more modern CPUs is to provide some type of block-move instruction, in which the source and destination locations are held in registers (for the x86 these are ESI and EDI respectively). Then an instruction can just designate two registers (or, in the case of the x86, the instructions simply know implicitly which registers to use). That solves the instruction decoding problem.
The instruction execution problem is a little harder, but people have lots of transistors. Organizing an indirect read through one register, an indirect write through another, and an increment of both is awkward in silicon, but that just chews up some transistors. Now you can have an instruction that moves from memory to memory, just as you asked. One of the other posters noted that the x86 has instructions (MOVS and its sized forms MOVSB, MOVSW, MOVSD) that do exactly this, one memory byte/word/doubleword at a time.
Moving a block of memory would be ideal because the CPU can generate high-bandwidth reads and writes. The x86 does this with a REP (repeat) prefix on MOVS to move a larger block.
But if a single instruction can do this, you have the problem that it might take a long time to execute (how long to move 1 GB? --> many millions of clock cycles!), and that ruins the interrupt response time of the CPU.
The x86 solves this by allowing REP MOVS to be interrupted, with the PC being set back to the beginning of the instruction. Because the registers are updated appropriately during the move, REP MOVS can be interrupted and restarted, so you get both a fast block move and high interrupt response rates. More transistors down the tube.
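The restart trick can be sketched in C. This is only a toy model, not how any real CPU implements it: the MoveRegs struct, the rep_movsd_step function and the "budget" parameter (which stands in for an interrupt arriving) are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file holding the state a REP MOVSD works on. */
typedef struct {
    const uint32_t *esi; /* source pointer */
    uint32_t *edi;       /* destination pointer */
    size_t ecx;          /* doublewords still to copy */
} MoveRegs;

/* Copy until the count reaches zero or until `budget` iterations have run.
   Running out of budget models an interrupt arriving mid-instruction.
   Returns 1 when the whole move is finished, 0 if it must be resumed. */
int rep_movsd_step(MoveRegs *r, size_t budget)
{
    assert(r != NULL);
    while (r->ecx != 0 && budget != 0) {
        *r->edi++ = *r->esi++; /* one MOVSD: copy, then bump both pointers */
        r->ecx--;
        budget--;
    }
    return r->ecx == 0;
}
```

Because all progress lives in esi, edi and ecx, calling rep_movsd_step again with the same MoveRegs simply continues where the previous call stopped, which is the property that lets the instruction be re-executed from its start after an interrupt.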
The RISC guys figured out that all this complexity for a block move instruction was mostly not worth it. You can code a dumb loop (even on the x86):
copy:   MOV EAX,[ESI]   ; load one doubleword from the source
        ADD ESI,4       ; advance the source pointer
        MOV [EDI],EAX   ; store it to the destination
        ADD EDI,4       ; advance the destination pointer
        DEC ECX         ; one fewer doubleword left to copy
        JNE copy        ; repeat until ECX reaches zero
which does the same basic thing as REP MOVS. Modern CPUs (x86 and others) execute this loop so fast (superscalar execution, etc.) that the bus is about as well utilized as with the custom move instruction, but now you don't need all those wasted transistors (or the corresponding heat).
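For comparison, here is roughly what that loop looks like in C. This is a sketch with a made-up function name (copy_words); it assumes the two regions do not overlap. Unlike the assembly loop, which decrements before testing, it copies nothing when the count is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 32-bit words from src to dst, one load and one store per
   word. The regions are assumed not to overlap. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));
    while (count != 0) {
        uint32_t tmp = *src++; /* load into a temporary (the EAX role) */
        *dst++ = tmp;          /* store from the temporary */
        count--;               /* one fewer word left to copy */
    }
}
```

A compiler is free to turn such a loop into wider loads and stores or into a block-move instruction, so the source-level loop says little about what actually reaches the bus.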
Most CPU varieties don't allow memory-to-memory moves. Normally the CPU can access only one memory location at a time, which means you need a temporary spot to hold the value while moving it (usually a general-purpose register). If you think about it, moving directly from one memory location to another would require the CPU to access two different spots in RAM simultaneously. That means at least two full memory controllers, and even then, the chances they'd "play nice" enough to access the same RAM would be pretty bad. The chip designers might have been able to pull some tricks to allow direct copies from one RAM chip to another, but that would be a pretty special-application kind of feature that would just add cost and complexity to solve a very uncommon problem.
You might be able to use some special DMA hardware to make it look to your program like memory is being moved without that temporary storage, at least from the perspective of your CPU.
You have one set of address lines, one set of data lines, and a few control lines between the CPU and RAM. You can't physically move directly from memory to memory without a second set of address lines and a whole bunch of complicated logic inside the RAM. Therefore, we have to store it temporarily in a register.
You could make an instruction that does the load and store together and looks like one instruction to the programmer, but there are other considerations, such as instruction size, avoiding duplication of the effective-address calculation logic, and pipelining, that make it desirable to keep things simpler.
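The single-bus argument can be illustrated with a toy model in C. Everything here (the Ram struct, bus_read, bus_write, mem_to_mem_move and the 16-cell size) is hypothetical and chosen only for illustration. Each bus call carries exactly one address, so a memory-to-memory move still costs two transactions with the value parked in a CPU-side temporary in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE 16u

/* Hypothetical single-ported RAM: every access carries one address. */
typedef struct {
    uint8_t cells[RAM_SIZE];
    unsigned transactions; /* bus transactions performed so far */
} Ram;

uint8_t bus_read(Ram *ram, size_t addr)
{
    assert(addr < RAM_SIZE);
    ram->transactions++;
    return ram->cells[addr];
}

void bus_write(Ram *ram, size_t addr, uint8_t value)
{
    assert(addr < RAM_SIZE);
    ram->transactions++;
    ram->cells[addr] = value;
}

/* Looks like one operation to the caller, but it is still a read followed
   by a write, with a temporary holding the value between the two. */
void mem_to_mem_move(Ram *ram, size_t dst, size_t src)
{
    uint8_t reg = bus_read(ram, src);
    bus_write(ram, dst, reg);
}
```

Wrapping the two steps in one function is the software analogue of a combined load-and-store instruction: the interface gets simpler, but the number of bus transactions does not change.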
Memory-memory machines turn out to be slower in general than load-store machines. This was figured out by the RISC researchers around 1980. So the older architectures (VAX, IBM System/360) tend to be memory-memory architectures; newer machines are load-store.
Another interesting variant is stack machines; they seem to always be around as a minority.