cache-related performance optimization techniques?
there're a lot of buzz about cache-related performance issues. I have several questions about them:
- Probably most popular issues are cache locality, and false cache sharing. Any others?
- any good overview?
- are there any proven techniques to fight them开发者_开发百科?
- what are common characteristics of applications where they are real problems? computational-intensive fields (math/image processing etc.)? highly-parallel applications?
One of the more interesting ones is avoiding cache collisions. If you know a memory access pattern, you can lay out the accessed items in a way that minimizes the cache line collisions between the accessed data. You can do this for data and code.
Figuring out data access patterns is relatively hard, but you can relatively easily figure out code access patterns. Given a call graph, the sets of blocks that make up the bodies of functions, and some estimates of the transition frequencies between the blocks, you can assign code blocks to the cache in a way that maximizes the chance that the next block you need will be in some other cache line that doesn't conflict with the current one. One interesting idea was that you only had to assign code blocks that were "hot" (high probability of execution); it didn't matter much where you put the cold ones. IIRC, that means you can sort the blocks by frequency of probable execution, and then assign them in that order.
You merely need a global analysis :-} The first place I read about this, the optimizatoin was actually implemented as part of a linker, which is one way to get your hands on the entire program.
I don't remember any nice overview or set of collected techniques. The PLDI conferences tend to have research papers on this topic, though.
精彩评论