Atomic operations: under the hood
How do atomic operations work, under the hood?
Are atomic operations so-called "wait-free"?
I'm seeking for a description of the "least c开发者_Python百科ommon divisor" of atomic operations. What do all atomic operations share?
If we're talking about atomic operations that are used by synchronization mechanism (mutexes, semaphores etc) they have to be supported by the OS on single CPU machines and by the hardware on multi CPU.
On a single CPU machine an instruction sequence can be made "atomic" in the sense that it cannot be interrupted in the middle (for e.g. the timer interrupt which gives a switch to another thread) if interrupts are shut off. This means that synchronization primitives can be written quite simply once the CPU enters kernel mode and can access the interrupt control registers.
In a multi core machine it is more complex. Then the instructions have to be truly atomic, across all CPUs. This requires all CPUs, not only the one executing the atomic instructions, to flush relevant parts of their cache to RAM. This flushing is what makes synchronization so expensive on these architectures.
The instructions themselves take the form of "Bit test and set" in one operation. This is enough to implement a simple mutex. Even if two threads on different CPU/cores are executing the test and set operation on the same time on the same address, only one will get the result that the bit was unset and is now set. That thread is the one that owns the mutex.
Atomicity as a concept occurs in several places, I suspect you're thinking about atomic operations in code, but there are other meanings.
One fundamental property of a Database transaction is Atomicity, see a description of ACID properties of transactions.
In this case, you have lots of database cleverness, locks, and so on, which almost certainly imply waiting when two threads of control (or two processes) want to get at the same data.
When you come to lines of code I guess you're thinking about a declaration (in some fictitious language)
global int x = 7;
in one thread
x = 25000;
print x;
and in another
print x;
Can we say anything about what the second thread will print? We might accept either 7 or 25000, we'd be less happy to get a number that was the high order byte of 25,000 and a low order byte of 7 - which conceptually would be the result of a non-atomic integer assignment.
Different programming languages are free to define whatever semantics they wish, it's conceivable that some would just accept whatever natural behaviors that the CPU they work on (say 32-bit int was atomic, 64 long was not) or they might do something much cleverer, and if the CPU itself doesn't provide atomic operations then I don't see much alternative to some kind of waiting if they want to fake atomicity - eg. Java synchronized keyword.
Depends on the atomic operation you're talking about. If you're talking about ISA-level stuff, "test and set" instructions are included in some popular ISAs, I think.
Can num++ be atomic for 'int num'? explains how modern x86 CPUs implement atomic RMWs by holding a cache lock (delaying cache-coherency responses for that cache line, so this core maintains exclusive ownership of it from the load to the store of the microcoded instruction).
Wait-free vs. lock-free is debatable on x86 or ARMv8.1 depending whether you're counting in retries or in clock cycles. But on LL/SC machines, RMW operations like .fetch_add
need a retry loop because the fail on contention instead of having hardware wait. See Anything in std::atomic is wait-free?
Pure-load or pure-store operations are easier for CPUs to handle atomically; it just happens naturally in assembly language for loads and stores that are aligned and no wider than the load/store execution units, and/or the chunk size for transfers between caches. (In higher-level languages, you need stuff like C++ std::atomic<int64_t>
to make sure the compiler actually uses a single instruction and doesn't optimize away the memory access. Also for ordering wrt. other operations.) See Atomicity on x86 for more details about pure-loads and pure-stores. Also pretty much applies to other architectures, except the choice of examples.
Atomicity for larger operations like database transactions isn't provided directly by hardware, rather by software. (Unless you have transactional memory like Intel's TSX (https://www.realworldtech.com/haswell-tm/), or PowerPC HTM in POWER8.)
This answer is intentionally just summarizing and linking to other answers for deeper details because much has already been written, no need to repeat it all here. Seriously go read some of the answers I linked on other questions if you're interested in a more low-level understanding of CPU hardware atomic operations.
精彩评论