Why does multithreaded access to data in the same cache line have a low cache miss rate?
It's been noted that access to data elements that fall in the same cache line performs badly due to the ping-pong effect. However, the code I wrote, tested with valgrind --tool=cachegrind, doesn't show this behaviour. Would appreciate any insights regarding this.
Attached below is the function that each pthread executes:
#include <stdio.h>
#include <stdint.h>
#include <atomic_ops.h>   /* AO_fetch_and_add from libatomic_ops */

extern volatile AO_t shared[];   /* one element per thread; adjacent elements share a cache line */

void* test_cache(void* arg)      /* pthread start routines must return void* */
{
    long id = (long) arg;
    uint32_t idx = (uint32_t) id;
    uint32_t ctr = 0;
    uint32_t total_sum = 0;
    for (; ctr < 500000; ++ctr)
    {
        total_sum += shared[idx];            /* plain read of the shared line */
        AO_fetch_and_add(&shared[idx], idx); /* atomic read-modify-write */
    }
    printf("%ld %u,\n", id, total_sum);      /* %ld/%u match long and uint32_t */
    return NULL;
}
Reads are OK (once the cache is filled); writes are not, since a write will, depending on the architecture, cause all other processors to invalidate that cache line and fetch it again from memory. (Systems that do cache-line snooping can avoid that penalty.)
The initial cache-line load also carries a penalty, since one load per cache is required (shared caches fare better here), with the worst case on NUMA systems (a fetch from a distant processor's memory).
If you are running on a "dual core" part, you are hitting a shared cache, so both threads see the same line without invalidation traffic. You need separate physical CPUs to see the ping-pong effect. Include your hardware spec in the question.