Understanding CPU cache and cache line
I am trying to understand how CPU cache is operating. Lets say we have this configuration (as an example).
- Cache size 1024 bytes
- Cache line 32 bytes
- 1024/32 = 32 cache lines all together.
- Singel cache line can store 32/4 = 8 ints.
1) According to these configuration length of tag should be 32-5=27 bits, and size of index 5 bits (2^5 = 32 addresses for each byte in cache line).
If total cache size is 1024 and t开发者_运维技巧here are 32 cache lines, where is tags+indexes are stored? (There is another 4*32 = 128 bytes.) Does it means that actual size of the cache is 1024+128 = 1152?
2) If cache line is 32 bytes in this example, this means that 32 bytes getting copied in cache whenerever CPU need to get new byte from RAM. Am I right to assume that cache line position of the requested byte will be determined by its adress?
This is what I mean: if CPU requested byte at [FF FF 00 08]
, then available cache line will be filled with bytes from [FF FF 00 00]
to [FF FF 00 1F]
. And our requseted single byte will be at position [08]
.
3) If previous statement is correct, does it mean that 5 bits that used for index, are technically not needed since all 32 bytes are in the cache line anyway?
Please let me know if I got something wrong. Thanks
A cache consists of data and tag RAM, arranged as a compromise of access time vs efficiency and physical layout. You're missing an important stat: number of ways (sets). You rarely have 1-way caches, because they perform pathologically badly with simple patterns. Anyway:
1) Yes, tags take extra space. This is part of the design compromise - you don't want it to be a large fraction of the total area, and why line size isn't just 1 byte or 1 word. Also, all tags for an index are simultaneously accessed, and that can affect efficiency and layout if there's a large number of ways. The size is slightly bigger than your estimate. There's usually also a few bits extra bits to mark validity and sometimes hints. More ways and smaller lines needs a larger fraction taken up by tags, so generally lines are large (32+ bytes) and ways are small (4-16).
2) Yes. Some caches also do a "critical word first" fetch, where they start with the word that caused the line fill, then fetch the rest. This reduces the number of cycles the CPU is waiting for the data it actually asked for. Some caches will "write thru" and not allocate a line if you miss on a write, which avoids having to read the entire cache line first, before writing to it (this isn't always a win).
3) The tags won't store the lower 5 bits as they're not needed to match a cache line. They just index into individual lines.
Wikipedia has a pretty good, if a bit intense, write-up on caches: http://en.wikipedia.org/wiki/CPU_cache - see "Implementation". There's a diagram of how data and tags are split. Me, I think everyone should learn this stuff because you really can improve performance of code when you know what the underlying machine is actually capable of.
- The cache metadata is typically not counted as a part of the cache itself. It might not even be stored in the same part of the CPU (it could be in another cache, implemented using special CPU registers, etc).
- This depends on whether your CPU will fetch unaligned addresses. If it will only fetch aligned addresses, then the example you gave would be correct. If the CPU fetches unaligned addresses, then it might fetch the range 0xFFFF0008 to 0xFFFF0027.
- The index bytes are still useful, even when cache access is aligned. This gives the CPU a shorthand method for referencing a byte within a cache line that it can use in its internal bookkeeping. You could get the same information by knowing the address associated with the cache line and the address associated with the byte, but that's a whole lot more information to carry around.
Different CPUs implement caching very differently. For the best answer to your question, please give some additional details about the particular CPU (type, model, etc) that you are talking about.
This is based on my vague memory, you should read books like "Computer Architecture: A Quantitative Approach" by Hennessey and Patterson. Great book.
Assuming a 32-bit CPU... (otherwise your figures would need to use >4 bytes (maybe <8 bytes since some/most 64-bit CPU don't have all 64 bits of address line used)) for the address.
1) I believe it's at least 4*32 bytes. Depending on the CPU, the chip architects may have decided to keep track of other info besides the full address. But it's usually not considered part of the cache.
2) Yes, but how that mapping is done is different. See Wikipedia - CPU cache - associativity There's the simple direct mapped cache and the more complex associative mapped cache. You want to avoid the case where some code needs two piece of information but the two addresses map to the exact same cache line.
精彩评论