Performance gain issue in multicore application

2023-01-10 14:06 问答作者：

I have a serial(nonparallel) application written in C. I have modified and re-written it using Intel Threading Building Blocks. When I run this parallel version on an AMD Phenom II machine which is a quad-core machine, I get a performance gain of more开发者_JAVA技巧 than 4X which conflicts with the Amdahl's law. Can anyone give me a reason why this is happening?

Thanks, Rakesh.

If you rewrite the program, you could make it more efficient. Amdahl's law only limits the amount of speedup due to parallelism, not how much faster you can make your code by improving it.

You could be realizing the effects of having 4x the cache, since now you can use all four procs. Or maybe having less contention with other processes running on your machine. Or you accidentally fixed a mispredicted branch.

TL/DR: it happens.

It's known as "super linear speedup", and can occur for a variety of reasons, though the most common root cause is probably cache behaviour. Usually when superlinear speedup occurs, it's a clue that you could make the sequential version more efficient.

For example, suppose you have a processor where some of the cores share an L2 cache (a common architecture these days), and suppose your algorithm makes multiple traversals of a large data structure. If you perform the traversals in sequence, then each traversal will have to pull the data into the L2 cache afresh, whereas if you perform the traversals in parallel then you may well avoid a large number of those misses, as long as the traversals run in step (getting out of step is a good source of unpredictable performance here). To make the sequential verison more efficient you could interleave the traversals, thereby improving locality.

Can anyone give me a reason why this is happening?

In a word, caches.

Each core has its own L1 cache and, therefore, simply by using more cores you have increased the amount of cache in-play which has, in turn, brought more of your data closer to where it will be processed. That alone can improve performance significantly (as if you had a bigger cache on a single core). When combined with near linear speedup from effective parallelization, you can see superlinear performance improvements overall.

继续阅读：multicore tbb

Performance gain issue in multicore application

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？