Transferring < 4 byte data on a 32-bit system

2023-03-15 14:39

I was interested in learning how single bytes and words are read by the CPU from physical memory on a machine that has a bus width of 32 bits. After searching for a while, I came upon various sources stating that byte-enable lines on the control bus are used to select which of the four memory banks, one per byte of the data bus, are activated.

I understand that only addresses evenly divisible by four are placed onto the address bus. However, are all four bytes read onto the data bus, with the unused bytes then discarded? For example, suppose a CPU with a 32-bit data bus decided to read a word beginning at location 1 (I know that, because of data alignment, this is unlikely). Would the approach to read this word be:

Place address 0 onto the address bus.
Place bytes 0, 1, 2 and 3 onto the data bus.
Discard bytes 0 and 3 (clear their respective bits)
Re-arrange bytes 1 and 2 on the data bus so that byte 1 is in the 3rd bank and byte 2 is in the 4th bank
Dispatch the altered value on the data bus to the Memory Data Register within the control unit.

This is very much a "depends on the architecture" kind of thing. Beyond that you have the processor's bus(es) and perhaps, on the far side of other support chips, DRAM or PCIe or other interfaces, which have their own rules and nuances. If the chip has multiple interfaces, each interface can vary.

These days you will see the kinds of things you describe.

Reads have more flexibility than writes. For example, your memory or peripheral system can be as simple as "everything is 32 bits", and the memory interface closer to the processor can select and manipulate the byte lanes without the peripheral or memory knowing. Sometimes you do have byte-select lines that tell the peripheral "all I care about are these byte lanes". The peripheral can then choose to be dumb and simple and put all four bytes on the bus, or fill in only the bytes requested and leave the other lines floating, driven with stale data, or driven high or low. Some systems want you to right- or left-justify the data, so if you want the byte at address 0x01 the memory interface may expect the result byte on the leftmost or rightmost byte lane, depending on endianness (I am not limiting this answer to x86, by the way).
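To make the byte-lane idea concrete, here is a minimal sketch of the read side. It assumes a little-endian layout where lane 0 carries data bits 7..0, and a right-justified result; the function names `byte_enables` and `extract_read_data` are made up for illustration and do not correspond to any particular bus standard.

```python
def byte_enables(addr, size):
    """Return (aligned_addr, enable_mask) for an access inside one 32-bit word.

    Bit i of enable_mask selects byte lane i (lane 0 = data bits 7..0).
    Assumption: little-endian lane numbering.
    """
    offset = addr & 0x3
    if offset + size > 4:
        raise ValueError("access crosses a 32-bit word boundary")
    mask = ((1 << size) - 1) << offset
    return addr & ~0x3, mask


def extract_read_data(bus_word, addr, size):
    """Right-justify the requested bytes out of a full 32-bit bus word."""
    offset = addr & 0x3
    return (bus_word >> (8 * offset)) & ((1 << (8 * size)) - 1)


# The question's example: a 16-bit read starting at address 1.
aligned, lanes = byte_enables(1, 2)        # address 0 goes on the bus, lanes 1 and 2 enabled
value = extract_read_data(0x12345678, 1, 2)
```

In this model the memory is free to drive all four lanes; the interface near the processor simply shifts and masks, which is why the peripheral does not need to know which bytes were wanted.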

Writes require more detail. If you have a system that is defined to allow writes of individual bytes but the bus is 32 bits wide, then you can do the right- or left-justified thing and specify an address and a size. Or you can do the byte-lane thing, where both sides have to select data from the correct byte lanes. Where the underlying memory can only be written 32 bits at a time, the peripheral being written to has to manage a read-modify-write: writing the byte 0x55 to address 0x02 of a 32-bit memory word that contained 0x12345678 should result in 0x12555678 when the write completes (with little-endian byte numbering), so the peripheral has to read the 0x12345678, modify the one byte, then write the 32 bits back to the memory.
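The read-modify-write for that example can be sketched as follows. This is only a model under stated assumptions: `WordMemory` and `write_byte` are hypothetical names, the memory accepts nothing but whole 32-bit words, and byte numbering is little-endian.

```python
class WordMemory:
    """A memory that can only be read or written 32 bits at a time."""

    def __init__(self, words):
        self.words = list(words)

    def read_word(self, word_index):
        return self.words[word_index]

    def write_word(self, word_index, value):
        self.words[word_index] = value & 0xFFFFFFFF


def write_byte(mem, addr, value):
    """Write one byte by reading the word, patching one lane, writing it back."""
    word_index = addr >> 2
    shift = 8 * (addr & 0x3)                    # which byte lane is affected
    old = mem.read_word(word_index)             # read
    new = (old & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # modify
    mem.write_word(word_index, new)             # write
    return new


mem = WordMemory([0x12345678])
result = write_byte(mem, 0x02, 0x55)            # 0x12555678
```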

x86 is a very painful platform: it allows unaligned accesses. Many other platforms force memory alignment, which makes the execution smoother and the memory interfaces and peripherals simpler (less logic, less power, less of everything except performance). On systems like that you still have byte lanes for individual byte accesses, but you do not have to be concerned about a 32-bit write to address 0x01. On an x86 that write is normally handled with two accesses, 24 bits at address 1 and 8 bits at address 4, which is why an unaligned access, although legal, is bad on an x86 platform: it can cut the performance of that access in half. Aligned means 32-bit accesses can only be at addresses with the lower two bits zero (2 to the power 2 bytes, or 4 bytes): 0x00, 0x04, 0x08, 0x0c, 0x10, etc. 16-bit accesses are on 2 to the power 1 boundaries, addresses with the lowest bit zero: 0x00, 0x02, 0x04, 0x06, 0x08. Bytes are on byte boundaries, 2 to the power 0: 0x00, 0x01, 0x02, 0x03, 0x04, etc.
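As a rough illustration of the alignment rule and of how an unaligned access breaks into pieces, here is a small sketch. The helper names `is_aligned` and `split_access` are invented for this example, and the splitting policy (one piece per 32-bit word touched) is an assumption about a generic 32-bit bus, not a description of what any specific x86 part does.

```python
def is_aligned(addr, size):
    """True if addr is a multiple of size (size is 1, 2 or 4 bytes)."""
    return addr % size == 0


def split_access(addr, size, bus_bytes=4):
    """Split an access into one piece per bus word it touches.

    Returns a list of (aligned_addr, byte_offset, byte_count) tuples.
    """
    pieces = []
    end = addr + size
    while addr < end:
        base = addr - (addr % bus_bytes)        # aligned address for this piece
        count = min(end, base + bus_bytes) - addr
        pieces.append((base, addr - base, count))
        addr += count
    return pieces


# A 32-bit access at address 1 needs two bus cycles:
# three bytes in the word at 0 and one byte in the word at 4.
unaligned = split_access(0x01, 4)
aligned = split_access(0x04, 4)                 # a single full-word cycle
```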

On some systems, and generally on today's x86 systems, when you have DRAM (DDR, etc.), and in particular RAM interfaces with ECC, the memory layer itself can only be written in complete N-bit quantities, for example 64 bits with 8 bits of ECC. The 8 ECC bits have to be computed from the 64 data bits and all 72 bits written at once; on a read, all 72 bits are read back and checked, and if they are good the 64 data bits go back to the user/program.

This normally works through layers of caching. One of the beauties of a cache is that it sits between the processor and the larger memory system. For a read, you may want only a byte, but the cache may read 128 bits, or whatever the cache line size is, which is normally some whole number of memory units. That cache line stays in the cache. When you write to a byte next to the one you just read, or to that same byte, the write modifies the byte in the cache; eventually, when the line has to be evicted from the cache, the whole cache line is written back in nice efficient whole units for the memory. Byte writes that miss the cache, and things like that, still involve a read-modify-write.

With or without caches, sometimes the memory interface on or closer to the processor manages the read-modify-writes and ensures that the peripherals or main memory are only ever accessed in a word-aligned manner (where a word is 16 or 32 or 64 bits, whatever that memory system has defined as the size of that bus).

I was looking at the pinout of an old x86 yesterday and had forgotten that the data bus and part of the address bus are on the same pins. There were address periods and data periods in the memory cycle, and the processor and peripherals had to change direction or at least use some method to manage the bus. The point here is that within the history of the x86 architecture things have changed and keep changing. So this answer, being architecture specific, cannot be limited to just x86; you have to be very specific about the pin/pad layout, plus the chipset that goes with it. ARM has also changed over its life and sometimes offers the same cores with different interface options (AXI 32-bit or AXI 64-bit, for example).

DRAM also adds the complication of refreshing, so someone somewhere has to be constantly running through the memory keeping it alive, in between the accesses driven by the processor or caches. Likewise, ECC adds the complication that if there is a single-bit error, someone ideally needs to perform a write-back with the corrected bit. So a byte read may cause a 32-bit write independent of what the software is doing. If there is no write-back mechanism then it is pretty silly to have ECC protection; just use parity, or do without.
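The "read causes a write" effect can be sketched with a much smaller code than the 64+8 bit one mentioned earlier. The following assumes a plain Hamming single-error-correcting code over 8 data bits with 4 check bits, with no double-error detection; the names `encode`, `decode`, and `EccMemory` are hypothetical, and real DRAM controllers use wider codes and their own scrubbing logic.

```python
CODE_BITS = 12   # 8 data bits + 4 check bits at positions 1, 2, 4, 8
# Data lives at the positions that are not powers of two: 3, 5, 6, 7, 9, 10, 11, 12.
DATA_POSITIONS = [p for p in range(1, CODE_BITS + 1) if p & (p - 1)]


def _syndrome(code):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, CODE_BITS + 1):
        if (code >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code |= 1 << pos
    s = _syndrome(code)
    for j in range(4):                  # set check bits so the syndrome becomes 0
        if (s >> j) & 1:
            code |= 1 << (1 << j)
    return code


def decode(code):
    """Return (data, corrected_code, had_error). Assumes at most one flipped bit."""
    s = _syndrome(code)
    if s:
        code ^= 1 << s                  # the syndrome is the position of the bad bit
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            data |= 1 << i
    return data, code, s != 0


class EccMemory:
    def __init__(self, size):
        self.cells = [encode(0)] * size
        self.writes = 0

    def write(self, addr, data):
        self.cells[addr] = encode(data)
        self.writes += 1

    def read(self, addr):
        data, fixed, had_error = decode(self.cells[addr])
        if had_error:                   # write-back that software never asked for
            self.cells[addr] = fixed
            self.writes += 1
        return data
```

Without the write-back in `read`, the flipped bit would stay in the cell and a second flip in the same cell would no longer be correctable, which is the point made above.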

The Wishbone interface, which I am not actually familiar with but see used all over OpenCores, is probably something you can look at to get a feel for this. Likewise, any of the x86 family, or at least the older members, will have hardware reference manuals that describe the timing of the memory/I/O interface at the edge of the chip. For ARM you can search for AMBA and AXI, and if you get the TRMs (technical reference manuals, available for free from ARM's website) for their caches you may get some insight into how their memory interfaces work. The DDR pinout and details should be available from the vendors or perhaps even on Wikipedia.

DDR and QDR get even more fun in that things happen on the half clock cycle. Traditionally you only changed buses once per clock cycle or every several clock cycles; now things change in the middle of the clock cycle, and worse, there can be a clock out to the memory device and a separate clock back from the memory device to allow for trace-length delays and to help with this half-clock-cycle thing. It is very messy, which makes it easier to go back to the old chips (808x, 6502, 6800, Z80, etc.), look at their memory buses, and work your way forward. Likewise, look at ISA before you try to dive into PCI or PCIe.

Wow, long answer, sorry about that, hope it is helpful.
