The CPU core ordering/numbering in 2-chipset Intel Westmere

2023-03-14 01:00

I am using an Intel Westmere system. It has 12 CPU cores arranged on 2 chips, which means that each chip contains 6 cores.

I don't know how the CPU cores are ordered or numbered. My guess is that it is one of the following:

Cores 0, 1, 2, 3, 4, and 5 are on one chip, and cores 6, 7, 8, 9, 10, and 11 are on the second chip.

Cores 0, 2, 4, 6, 8, and 10 are on one chip, and cores 1, 3, 5, 7, 9, and 11 are on the second chip.

Does anyone know the ordering/numbering of the CPU cores?

For more information, you can try this tool: http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration

It is Intel's official tool for determining this.

Here is an example run from a machine with two physical Intel X5560 processors (4 cores with Hyper-Threading each, giving 16 logical CPUs in total) running CentOS 5.3 (so it may be a bit dated).

Package 0 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
L1D is Level 1 Data cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 4
L2 is Level 2 Unified cache, size(KBytes)= 256,  Cores/cache= 2, Caches/package= 4
L3 is Level 3 Unified cache, size(KBytes)= 8192,  Cores/cache= 8, Caches/package= 1
      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    0     8|    1     9|    2    10|    3    11|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    1   100|    2   200|    4   400|    8   800|
CmbMsk|  101      |  202      |  404      |  808      |
      +-----------+-----------+-----------+-----------+

Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+

Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+

Cache |   L3                                          |
Size  |   8M                                          |
CmbMsk|  f0f                                          |
      +-----------------------------------------------+

Combined socket AffinityMask= 0xf0f

Package 1 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    4    12|    5    13|    6    14|    7    15|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|   10   1z3|   20   2z3|   40   4z3|   80   8z3|
CmbMsk| 1010      | 2020      | 4040      | 8080      |
      +-----------+-----------+-----------+-----------+

Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+

Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+

Cache |   L3                                          |
Size  |   8M                                          |
CmbMsk| f0f0                                          |
      +-----------------------------------------------+
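The masks in this output can be decoded back into OS CPU numbers to see which logical CPUs belong to each package. The following is only a minimal sketch: the function names `parse_extended_hex` and `cpus_in_mask` are hypothetical helpers and not part of Intel's tool, and the notation is handled only as the box description above explains it (a trailing 'z#' stands for # zero digits).

```python
import re


def parse_extended_hex(text):
    """Convert the tool's 'extended hex' notation to an integer mask.

    A trailing 'z#' stands for # zero digits, so '8z5' means 0x800000.
    """
    match = re.fullmatch(r"([0-9a-fA-F]+)(?:z(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"not an extended-hex mask: {text!r}")
    digits, zero_count = match.groups()
    return int(digits + "0" * int(zero_count or 0), 16)


def cpus_in_mask(mask):
    """Return the OS CPU numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Masks copied from the output above.
examples = [
    ("package 0 (CmbMsk f0f)", "f0f"),
    ("package 1 (CmbMsk f0f0)", "f0f0"),
    ("package 1, c0_t1 (AffMsk 1z3)", "1z3"),
]
for label, text in examples:
    print(label, cpus_in_mask(parse_extended_hex(text)))
```

On this machine the decoded masks give OS CPUs 0-3 and 8-11 for package 0, and 4-7 and 12-15 for package 1. So each package gets a contiguous block of physical cores, and the Hyper-Threading siblings are numbered after all physical cores.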

They are supposed to be interleaved so that taking successive cores spreads the load as much as possible. If 0 and 1 were on the same chip, then naive code that only used two cores would be wasting half the cache.

So numbered cores should first alternate physical CPUs. They should next alternate dies, if possible. They should then go through the cores on a single die. They should then include virtual cores, if possible.

So if you had two physical CPUs (P1, P2), each dual-core (C1, C2) and each hyper-threaded (V1, V2), the cores should be numbered in this order: P1C1V1, P2C1V1, P1C2V1, P2C2V1, P1C1V2, P2C1V2, P1C2V2, P2C2V2.
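This ordering rule can be sketched in a few lines. The function `interleaved_order` is a hypothetical illustration of the scheme described here, not something the BIOS or OS exposes: the package index varies fastest, then the core, then the hardware thread.

```python
from itertools import product


def interleaved_order(packages, cores, threads):
    """List logical CPUs so that the package varies fastest, then core, then thread."""
    # product() varies its last argument fastest, so the package range goes last.
    return [
        f"P{p}C{c}V{v}"
        for v, c, p in product(
            range(1, threads + 1), range(1, cores + 1), range(1, packages + 1)
        )
    ]


order = interleaved_order(packages=2, cores=2, threads=2)
for number, name in enumerate(order):
    print(number, name)
```

With two packages, two cores per package, and two threads per core, this reproduces the sequence listed above, and the first two entries land on different physical CPUs.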

The rationale is to allow code that doesn't understand the CPU topology to just grab as many cores as it knows how to use and get optimal performance. If you could only support two cores, you want P1C1V1 and P2C1V1, not P1C1V1 and P1C1V2, or you'd be massively wasting cache and execution units.
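Whether a particular machine actually follows this scheme depends on the BIOS and the OS, so it may be worth checking rather than assuming. A minimal sketch, assuming a Linux system that exposes the `topology/physical_package_id` and `topology/core_id` files under `/sys/devices/system/cpu`; the function name `read_topology` and its parameter are hypothetical:

```python
from pathlib import Path


def read_topology(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Map each OS CPU number to (physical_package_id, core_id) from Linux sysfs."""
    topology = {}
    for cpu_dir in Path(sysfs_cpu_dir).glob("cpu[0-9]*"):
        topo = cpu_dir / "topology"
        try:
            cpu = int(cpu_dir.name[3:])  # "cpu12" -> 12
            package = int((topo / "physical_package_id").read_text())
            core = int((topo / "core_id").read_text())
        except (OSError, ValueError):
            continue  # offline CPU, or no topology information exposed
        topology[cpu] = (package, core)
    return dict(sorted(topology.items()))


if __name__ == "__main__":
    for cpu, (package, core) in read_topology().items():
        print(f"cpu {cpu}: package {package}, core {core}")
```

If consecutive CPU numbers report different package IDs, the numbering is interleaved across chips. If the first half of the CPU numbers shares one package ID, the cores are numbered chip by chip.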
