
How will applications be scheduled on hyper-threading enabled multi-core machines?

I'm trying to gain a better understanding of how Hyper-Threading enabled multi-core processors work. Say I have an app which can be compiled with MPI, OpenMP, or MPI+OpenMP. I wonder how it will be scheduled on a CentOS 5.3 box with four Xeon X7560 @ 2.27GHz processors, with Hyper-Threading enabled on every core.

The processors are numbered 0 to 63 in /proc/cpuinfo. As I understand it, there are FOUR 8-core physical processors, so 32 PHYSICAL cores in total; with Hyper-Threading enabled on every core, that makes 64 LOGICAL processors.
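For reference, the logical count can be confirmed programmatically; this is only a minimal sanity check, since sysconf reports logical processors and the mapping to physical cores still has to be read from the physical id / core id fields in /proc/cpuinfo:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Logical processors the kernel exposes: 64 on this box
           (32 physical cores x 2 Hyper-Threads each). */
        long logical = sysconf(_SC_NPROCESSORS_ONLN);
        printf("online logical processors: %ld\n", logical);
        return 0;
    }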

  1. Compiled with MPICH2: how many physical cores will be used if I run with mpirun -np 16? Will the 16 processes be spread across 16 PHYSICAL cores, or across 16 LOGICAL processors (8 PHYSICAL cores using Hyper-Threading)?

  2. Compiled with OpenMP: how many physical cores will be used if I set OMP_NUM_THREADS=16? Will it use 16 LOGICAL processors? (See the placement-check sketch after this list.)

  3. Compiled with MPICH2+OpenMP: how many physical cores will be used if I set OMP_NUM_THREADS=16 and run with mpirun -np 16?

  4. Compiled with OpenMPI

OpenMPI has two relevant runtime options:

-cpu-set, which specifies the logical CPUs allocated to the job, and -cpu-per-proc, which specifies the number of CPUs to use for each process.

If run with mpirun -np 16 -cpu-set 0-15, will it use only 8 PHYSICAL cores?

If run with mpirun -np 16 -cpu-set 0-31 -cpu-per-proc 2, how will it be scheduled?
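Rather than guessing, the placement can also be observed directly. Below is a minimal sketch for the OpenMP case (it assumes a glibc new enough to provide sched_getcpu; compile with gcc -fopenmp and run with OMP_NUM_THREADS=16). Each thread prints the logical CPU it is running on, which can then be matched against the physical id / core id fields in /proc/cpuinfo to see whether two threads share a physical core:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Each thread reports the logical CPU it is currently on.
           Matching these numbers against /proc/cpuinfo shows whether
           two threads are Hyper-Thread siblings on one physical core. */
        #pragma omp parallel
        {
            #pragma omp critical
            printf("thread %2d of %d on logical CPU %2d\n",
                   omp_get_thread_num(), omp_get_num_threads(),
                   sched_getcpu());
        }
        return 0;
    }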

Thanks

Jerry


I'd expect any sensible scheduler to prefer running threads on different physical processors if possible. Then I'd expect it to prefer different physical cores. Finally, if it must, it would start using the hyperthreaded second thread on each physical core.

Basically, when threads have to share processor resources they slow down, so the optimal strategy is usually to minimise the amount of resource sharing. This is the right strategy for CPU-bound processes, and that's normally what an OS assumes it is dealing with.


I would hazard a guess that the scheduler will try to keep the threads of one process on the same physical cores, so if you had sixteen threads they would be packed onto the smallest number of physical cores. The reason for this would be cache locality: threads from the same process are assumed to be more likely to touch the same memory than threads from different processes. (For example, the cost of cache-line invalidation across cores is high, but that cost does not arise between logical processors in the same core.)


As you can see from the other two answers, the ideal scheduling policy varies depending on what the threads are doing.

Threads working on completely different data benefit from more separation; such threads would ideally be scheduled in separate NUMA domains and on separate physical cores.

Threads working on the same data benefit from cache locality, so the ideal policy is to schedule them close together so that they share cache.

Threads that work on the same data and experience a large number of pipeline stalls benefit from sharing a Hyper-Threaded core: each thread can run until it stalls, at which point the other can run. Threads that run without stalling are only hurt by Hyper-Threading and should be run on separate physical cores.

Making the ideal scheduling decision relies on a lot of data collection and a lot of decision making. A large danger in OS design is to make the thread scheduling too smart. If the OS spends a lot of processor time trying to find the ideal place to run a thread, it's wasting time it could be using to run the thread.

So it is often more efficient to use a simplified thread scheduler and, if needed, let the program specify its own policy. This is the thread-affinity setting.
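As an illustration of that affinity mechanism, here is a sketch using Linux's pthread_setaffinity_np. The CPU numbers are examples only: whether even-numbered logical CPUs land on distinct physical cores depends on how the machine enumerates them, so the mapping should be checked in /proc/cpuinfo first.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to a single logical CPU. */
    static int pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *worker(void *arg) {
        int cpu = *(int *)arg;
        if (pin_to_cpu(cpu) != 0)
            fprintf(stderr, "failed to pin thread to CPU %d\n", cpu);
        /* ... do the real work; the kernel now keeps this thread on `cpu` ... */
        return NULL;
    }

    int main(void) {
        /* Example CPU numbers only -- verify against /proc/cpuinfo whether
           these are distinct physical cores on your machine. */
        pthread_t t[4];
        int cpus[4] = {0, 2, 4, 6};
        int i;
        for (i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, &cpus[i]);
        for (i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }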
