How fast is access to atomic variables in C++
My question is how fast is access to atomic variables in C++ by using the C++0x actomic<> class? What goes down at the cache level. Say if one thread is just reading it, would it need to go down to the RAM or it can just read from the cache of the core in which it is executing? Assume the architecture is x86.
I am especially interested in knowing if a thread is just reading from it, while no other thread is writing at that time, would the penalty would be the same as for reading a normal variable. How atomic variables are accessed. Does each read implicity involves a write as well, as in compare-and-swap? Are atomic variables implement开发者_如何学Pythoned by using compare-and-swap?
If you want raw numbers, Anger Fog's data listings from his optimization manuals should be of use, also, intels manuals have a few section detailing the latencies for memory read/writes on multicore systems, which should include details on the slow-downs caused by bus locking needed for atomic writes.
The answer is not as simple as you perhaps expect. It depends on exact CPU model, and it depends on circumstances as well. The worst case is when you need to perform read-modify-write operation on a variable and there is a conflict (what exactly is a conflict is again CPU model dependent, but most often it is when another CPU is accessing the same cache line).
See also .NET or Windows Synchronization Primitives Performance Specifications
Atomics use special architecture support to get atomicity without forcing all reads/writes to go all the way to main memory. Basically, each core is allowed to probe the caches of other cores, so they find out about the result of other thread's operations that way.
The exact performance depends on the architecture. On x86, MANY operations were already atomic to start with, so they are free. I've seen numbers from anywhere to 10 to 100 cycles, depending on the architecture and operation. For perspective, any read from main memory is 3000-4000 cycles, so the atomics are all MUCH faster than going straight to memory on nearly all platforms.
精彩评论