开发者

How can I prefetch a memory region most easily?

Background: I've implemented a stochastic algorithm that requires random ordering for best convergence. Doing so obviously destroys memory locality, however. I've found that by prefetching the next iteration's data, the performance drop is minimized.

I can prefetch n cache lines using _mm_prefetch in a simple, mostly OS+compil开发者_运维知识库er-portable fashion - but what's the length of a cache line? Right now, I'm using a hardcoded value of 64, which seems to be the norm nowadays on x64 processors - but I don't know how to detect this at runtime, and a question about this last year found no simple solution.

I've seen GetLogicalProcessorInformation on windows but I'm leery of using such a complex API for something so simple, and that won't work on macs or linux anyhow.

Perhaps there's some entirely other API/intrinsic that could prefetch a memory region identified in terms of bytes (or words, or whatever) and allows me to prefetch without knowing the cache line length?

Basically, is there a reasonable alternative to _mm_prefetch with #define CACHE_LINE_LEN 64?


There's a question asking just about the same thing here. You can read it from the CPUID if you feel like delving into some assembly. You'll have to write platform specific code for this of course.

You're probably already familiar with Agner Fog's manuals for optimization which gives the cache information for many popular processors. If you are able to determine the expected CPU's you'll encounter you can just hard-code the cache line sizes and look up the CPU vendor information to set the line size.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜