What is the cost of memory access?

2023-01-03 05:25 问答作者：

We like to think that a memory access is fast and constant, but on modern architectures/OSes, that's not necessarily true.

Consider the follow开发者_运维技巧ing C code:

int i = 34;
int *p = &i;

// do something that may or may not involve i and p

{...}

// 3 days later:

*p = 643;

What is the estimated cost of this last assignment in CPU instructions, if

i is in L1 cache,
i is in L2 cache,
i is in L3 cache,
i is in RAM proper,
i is paged out to an SSD disk,
i is paged out to a traditional disk?

Where else can i be?

Of course the numbers are not absolute, but I'm only interested in orders of magnitude. I tried searching the webs, but Google did not bless me this time.

Here's some hard numbers, demonstrating that exact timings vary from CPU family and version to version: http://www.agner.org/optimize/

These numbers are a good guide:

L1           1 ns
L2           5 ns
RAM         83 ns
Disk  13700000 ns

And as an infograph to give you the orders of magnitude:

What is the cost of memory access?

(src http://news.ycombinator.com/item?id=702713)

Norvig has some values from 2001. Things have changed some since then but I think the relative speeds are still roughly correct.

It could also be in a CPU-register. The C/C++-keyword "register" tells the CPU to keep the variable in a register, but you can't guarantee it will stay or even ever get in there.

As long as the Cache/RAM/Harddisk/SSD is not busy serving other access (e.g. DMA requests) and that the hardware is reasonably reliable, then the cost is still constant (though they may be a large constant).

When you get a cache miss, and you have to page to harddisk to read the variable, then it's just a simple harddisk read request, this cost is huge, as the CPU has to: send interrupt to the kernel for harddisk read request, send a request to harddisk, wait for the harddisk to write the data to RAM, then read the data from RAM to cache and to a register. However, this cost is still constant cost.

The actual numbers and proportions will vary depending on your hardware, and on the compatibility of your hardware (e.g. if your CPU is running on 2000 Mhz and your RAM sends data at 333 Mhz then they doesn't sync very well). The only way you can figure this out is to test it in your program.

And this is not premature optimization, this is micro-optimization. Let the compiler worry about these kind of details.

These numbers change all the time. But for rough estimates for 2010, Kathryn McKinley has nice slides on the web, which I don't feel compelled to copy here.

The search term you want is "memory hierarchy" or "memory hierarchy cost".

Where else can i be?

i and *i are different things, both of them can be located in any of the locations in your list. The pointer address might additionally still be stored in a CPU register when the assignment is made, so it doesn't need to be fetched from RAM/Cache/…

Regarding performance: this is highly CPU-dependent. Thinking in orders of magnitude, accessing RAM is worse than accessing cache entries and accessing swapped-out pages is the worst. All are a bit unpredictable because they depend on other factors as well (i.e. other processors, depending on the system architecture).

继续阅读：hardware language-agnostic memory

What is the cost of memory access?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？