Multithreaded app in multicore environment - weird load per core

2023-04-03 15:46 Q&A author:

Given environment: Xeon processor with 16 cores, OS: Windows Server 2008 R2.

Given application (.NET/C#): before parallelizing, it loads 1 core at almost 100%. The obvious way to gain some speed was to use the .NET 4 Task Parallel Library to make the application X times faster. Suppose the part of the application that is parallelized is really suitable for it: no locking occurs between threads (no shared resources, each parallel task is completely independent). But to my regret the gain is really low: the 16-threaded app runs only approx. 2 times faster than the sequential one.

Here is the first illustration - 16 threads on 16 cores

It seems really weird: each task is equal, but the first 8 cores are loaded at almost the same level (~30%) and the other 8 have progressively descending load.

So, I've tried different configurations, for example 8 threads on 16 cores

It looks like the 8 threads are all running on 8 cores and threads are not transferred from one core to another. Moreover, with 8 threads the average load per core is greater than with 16.

I did some research with a profiler: each thread behaves the same as in the single-threaded case in terms of the percentage of time spent in different methods. The only (and main) difference is the absolute time: it gets greater and greater as the number of threads grows (as if the performance of each core were degrading).

So the main tendencies that I can't explain: more threads mean lower average load per core, and total CPU usage is about 20-25% at maximum. And each operation in a thread runs slower as the number of threads grows.

Any ideas to explain these weird things?

UPD

After applying Server GC the picture has changed significantly
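Server GC is a runtime setting, so it is worth confirming that the process really runs with it. The following is a minimal sketch, not the original application's code: the class name GcModeCheck is made up for illustration, and the app.config fragment in the comment is the usual way to request server GC on the .NET Framework.

```csharp
using System;
using System.Runtime;

public static class GcModeCheck
{
    // Describes the GC mode the current process is actually running with.
    public static string Describe()
    {
        return (GCSettings.IsServerGC ? "server" : "workstation")
            + " GC, latency mode " + GCSettings.LatencyMode;
    }

    public static void Main()
    {
        // Server GC is normally requested in app.config:
        //   <configuration>
        //     <runtime>
        //       <gcServer enabled="true"/>
        //     </runtime>
        //   </configuration>
        Console.WriteLine(Describe());
    }
}
```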

8 threads on 16 cores illustration:

12 threads on 16 cores illustration:

15 threads on 16 cores illustration:

So it looks like CPU usage increases with the number of threads. The first thing that bothers me is that all of the cores appear to be used and threads are jumping from core to core, so overall performance is not as good as it could be.

The second thing is that the app reaches its maximum speed at 12 threads; 15 threads give the same results, and 16 threads are even slower.

What is the possible reason?

The pattern that you are seeing is often an indication of an I/O bottleneck. If your disks or network are running full-out to provide data to these calculations (or handle the results), then you could run it on a million cores with no additional benefit. I'd suggest using Sysinternals Process Explorer to examine network and disk I/O and see if there is an issue there before trying to get further into why this isn't parallelizing well.
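Alongside Process Explorer, a rough in-process check can show whether the worker threads are computing or mostly waiting. The sketch below is only an illustration under assumptions: DoWork is a hypothetical stand-in for one independent work item, and the class name CpuBoundCheck is made up. It compares the CPU time the process consumed with the wall-clock time of the parallel loop. The ratio is approximate, since process CPU time also includes GC and other runtime threads.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CpuBoundCheck
{
    // Hypothetical stand-in for one independent work item (pure CPU work).
    public static double DoWork(int seed)
    {
        double acc = seed;
        for (int i = 1; i <= 20000000; i++) acc += Math.Sqrt(i);
        return acc;
    }

    // Runs `items` work items with at most `threads` running concurrently and
    // returns CPU time / (wall time * usable threads). A value near 1.0 means
    // the threads were busy computing; a much lower value means they spent
    // most of their time waiting (I/O, locks, GC pauses and so on).
    public static double Utilization(int threads, int items)
    {
        var results = new double[items];
        var options = new ParallelOptions { MaxDegreeOfParallelism = threads };
        Process process = Process.GetCurrentProcess();

        process.Refresh();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        Parallel.For(0, items, options, i => { results[i] = DoWork(i); });

        wall.Stop();
        process.Refresh();
        TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;

        // No more threads than cores can actually run at the same time.
        int usable = Math.Min(threads, Environment.ProcessorCount);
        return cpuUsed.TotalMilliseconds / (wall.Elapsed.TotalMilliseconds * usable);
    }

    public static void Main()
    {
        foreach (int threads in new[] { 1, 2, 4, 8, 16 })
        {
            double u = Utilization(threads, threads * 2);
            Console.WriteLine("{0,2} threads: utilization ~{1:F2}", threads, u);
        }
    }
}
```

If the ratio stays close to 1.0 for a synthetic CPU-only body like this but drops sharply for the real workload, that points toward waiting (I/O, memory allocation, or contention) rather than toward the scheduler.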

Since it sounds like you have no synchronization internal to your method, the problem is likely in the partitioning.

Given that you're using the TPL, work gets distributed to cores by a partitioner. However, the source IEnumerable<T> is not thread safe, so it has to be accessed by one thread at a time. This will often lead to performance characteristics like the ones you are showing above if the work per item is small relative to the number of items.

The way around this is to use the Partitioner class to pre-partition your work items into blocks, and then iterate through the "blocks" of items in parallel. For details, see How to: Speed Up Small Loop Bodies.
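A minimal sketch of that pre-partitioning approach, assuming the work items live in an array and can be addressed by index. Transform and the class name ChunkedLoop are hypothetical stand-ins for the real loop body, which is not shown in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedLoop
{
    // Hypothetical small loop body; stands in for the real per-item work.
    public static double Transform(double x)
    {
        return Math.Sqrt(x) * 2.0;
    }

    public static double[] ProcessAll(double[] source)
    {
        var result = new double[source.Length];
        if (source.Length == 0) return result; // Partitioner.Create rejects an empty range

        // Split the index range into contiguous blocks. Each block is a
        // Tuple<int, int>: Item1 is the inclusive start, Item2 the exclusive end.
        var ranges = Partitioner.Create(0, source.Length);

        // One delegate call per block instead of one per item; the inner loop
        // is plain sequential code over indices that no other block touches.
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                result[i] = Transform(source[i]);
            }
        });

        return result;
    }

    public static void Main()
    {
        var input = new double[1000000];
        for (int i = 0; i < input.Length; i++) input[i] = i;

        double[] output = ProcessAll(input);
        Console.WriteLine(output[4]); // sqrt(4) * 2
    }
}
```

Because each block covers its own index range, the threads do not contend on a shared enumerator, and the per-item delegate overhead is paid once per block rather than once per item.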

Tags: .net .net-4.0 multicore multithreading task-parallel-library
