Prefetching aligned memory
I have some threaded C code that requires 64-byte alignment of the processed data structure. How will this alignment interact with prefetch instructions like the gcc __builtin_prefetch? Will the effects of prefetching be the same as using a non-aligned array or not?
Note that I am using memalign to obtain the aligned array.
Thanks.
The answer to this one is highly implementation-dependent.
However, on x86 and x86_64, GCC implements __builtin_prefetch as a single PREFETCH assembly instruction.
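For context, here is a minimal sketch of what a call site might look like. The record type, the helper names, and the look-ahead distance of 8 are all made up for illustration, and C11 aligned_alloc stands in for memalign. The second and third arguments to __builtin_prefetch (0 for a read, 3 for high temporal locality) must be compile-time constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 64-byte record; the name and layout are illustrative only. */
typedef struct {
    long value;
    char pad[64 - sizeof(long)];
} record_t;

/* Hypothetical look-ahead distance in records; tune it by measurement. */
#define PREFETCH_AHEAD 8

/* Allocate n records on a 64-byte boundary.
 * aligned_alloc needs a size that is a multiple of the alignment,
 * which holds here because sizeof(record_t) is 64. */
static record_t *alloc_records(size_t n)
{
    record_t *r = aligned_alloc(64, n * sizeof(record_t));
    assert((uintptr_t)r % 64 == 0);
    return r;
}

/* Sum all values, hinting the record PREFETCH_AHEAD steps ahead into cache. */
static long sum_records(const record_t *r, size_t n)
{
    long total = 0;
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&r[i + PREFETCH_AHEAD], 0, 3); /* read, keep in cache */
        total += r[i].value;
    }
    return total;
}
```

Whether the hint helps at all depends on the access pattern and the CPU; for a plain sequential scan like this one the hardware prefetcher may already do the job.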
According to Intel's documentation (search for "PREFETCH"):
Fetches the line of data from memory that contains the byte specified with the source operand to a location in the cache hierarchy specified by a locality hint:
I am 99% sure the AMD version behaves the same way, but I am too busy to check...
So if the memory operand is unaligned, it is effectively rounded down to a multiple of 64 bytes, and that cache line is prefetched. (Well, 64 bytes on all the current CPUs I know of. The instruction set reference only guarantees the line size to be "a minimum of 32 bytes". Not sure why they bothered saying that; in any situation where it makes sense to use this gadget, you are already assuming a lot about the particular CPU.)
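The rounding described above can be sketched in a few lines of C. This is only an illustration: the 64-byte line size is an assumption (it is not guaranteed by the instruction set reference), and line_base, lines_touched and prefetch_range are hypothetical helper names, not part of GCC or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current x86 CPUs. */
#define CACHE_LINE 64u

/* Address of the first byte of the cache line that contains p. */
static uintptr_t line_base(const void *p)
{
    return (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Number of cache lines overlapped by the byte range [p, p + len). */
static size_t lines_touched(const void *p, size_t len)
{
    assert(len > 0);
    uintptr_t first = line_base(p);
    uintptr_t last = ((uintptr_t)p + len - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    return (size_t)((last - first) / CACHE_LINE) + 1;
}

/* Hint every cache line overlapped by [p, p + len).
 * No manual rounding is needed: the hardware fetches whichever line
 * contains the given byte. */
static void prefetch_range(const void *p, size_t len)
{
    const char *c = p;
    for (size_t off = 0; off < len; off += CACHE_LINE)
        __builtin_prefetch(c + off, 0, 3);
    if (len > 0)
        __builtin_prefetch(c + len - 1, 0, 3); /* last line of an unaligned range */
}
```

Under that assumption, a 64-byte object that starts on a 64-byte boundary sits in exactly one line, so one prefetch covers it. The same object at an unaligned address straddles two lines, and a single prefetch of its first byte brings in only the first of them.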