What is the cost of memory access?
We like to think that a memory access is fast and constant, but on modern architectures/OSes, that's not necessarily true.
Consider the follow开发者_运维技巧ing C code:
int i = 34;
int *p = &i;
// do something that may or may not involve i and p
{...}
// 3 days later:
*p = 643;
What is the estimated cost of this last assignment in CPU instructions, if
i
is in L1 cache,i
is in L2 cache,i
is in L3 cache,i
is in RAM proper,i
is paged out to an SSD disk,i
is paged out to a traditional disk?
Where else can i
be?
Of course the numbers are not absolute, but I'm only interested in orders of magnitude. I tried searching the webs, but Google did not bless me this time.
Here's some hard numbers, demonstrating that exact timings vary from CPU family and version to version: http://www.agner.org/optimize/
These numbers are a good guide:
L1 1 ns
L2 5 ns
RAM 83 ns
Disk 13700000 ns
And as an infograph to give you the orders of magnitude:
(src http://news.ycombinator.com/item?id=702713)Norvig has some values from 2001. Things have changed some since then but I think the relative speeds are still roughly correct.
It could also be in a CPU-register. The C/C++-keyword "register" tells the CPU to keep the variable in a register, but you can't guarantee it will stay or even ever get in there.
As long as the Cache/RAM/Harddisk/SSD is not busy serving other access (e.g. DMA requests) and that the hardware is reasonably reliable, then the cost is still constant (though they may be a large constant).
When you get a cache miss, and you have to page to harddisk to read the variable, then it's just a simple harddisk read request, this cost is huge, as the CPU has to: send interrupt to the kernel for harddisk read request, send a request to harddisk, wait for the harddisk to write the data to RAM, then read the data from RAM to cache and to a register. However, this cost is still constant cost.
The actual numbers and proportions will vary depending on your hardware, and on the compatibility of your hardware (e.g. if your CPU is running on 2000 Mhz and your RAM sends data at 333 Mhz then they doesn't sync very well). The only way you can figure this out is to test it in your program.
And this is not premature optimization, this is micro-optimization. Let the compiler worry about these kind of details.
These numbers change all the time. But for rough estimates for 2010, Kathryn McKinley has nice slides on the web, which I don't feel compelled to copy here.
The search term you want is "memory hierarchy" or "memory hierarchy cost".
Where else can i be?
i
and *i
are different things, both of them can be located in any of the locations in your list. The pointer address might additionally still be stored in a CPU register when the assignment is made, so it doesn't need to be fetched from RAM/Cache/…
Regarding performance: this is highly CPU-dependent. Thinking in orders of magnitude, accessing RAM is worse than accessing cache entries and accessing swapped-out pages is the worst. All are a bit unpredictable because they depend on other factors as well (i.e. other processors, depending on the system architecture).
精彩评论