Atomic operations: under the hood

2023-03-24 03:44 问答作者：

How do atomic operations work, under the hood?

Are atomic operations so-called "wait-free"?

I'm seeking for a description of the "least c开发者_Python百科ommon divisor" of atomic operations. What do all atomic operations share?

If we're talking about atomic operations that are used by synchronization mechanism (mutexes, semaphores etc) they have to be supported by the OS on single CPU machines and by the hardware on multi CPU.

On a single CPU machine an instruction sequence can be made "atomic" in the sense that it cannot be interrupted in the middle (for e.g. the timer interrupt which gives a switch to another thread) if interrupts are shut off. This means that synchronization primitives can be written quite simply once the CPU enters kernel mode and can access the interrupt control registers.

In a multi core machine it is more complex. Then the instructions have to be truly atomic, across all CPUs. This requires all CPUs, not only the one executing the atomic instructions, to flush relevant parts of their cache to RAM. This flushing is what makes synchronization so expensive on these architectures.

The instructions themselves take the form of "Bit test and set" in one operation. This is enough to implement a simple mutex. Even if two threads on different CPU/cores are executing the test and set operation on the same time on the same address, only one will get the result that the bit was unset and is now set. That thread is the one that owns the mutex.

Atomicity as a concept occurs in several places, I suspect you're thinking about atomic operations in code, but there are other meanings.

One fundamental property of a Database transaction is Atomicity, see a description of ACID properties of transactions.

In this case, you have lots of database cleverness, locks, and so on, which almost certainly imply waiting when two threads of control (or two processes) want to get at the same data.

When you come to lines of code I guess you're thinking about a declaration (in some fictitious language)

global int x = 7;

in one thread

x = 25000;

print x;

and in another

print x;

Can we say anything about what the second thread will print? We might accept either 7 or 25000, we'd be less happy to get a number that was the high order byte of 25,000 and a low order byte of 7 - which conceptually would be the result of a non-atomic integer assignment.

Different programming languages are free to define whatever semantics they wish, it's conceivable that some would just accept whatever natural behaviors that the CPU they work on (say 32-bit int was atomic, 64 long was not) or they might do something much cleverer, and if the CPU itself doesn't provide atomic operations then I don't see much alternative to some kind of waiting if they want to fake atomicity - eg. Java synchronized keyword.

Depends on the atomic operation you're talking about. If you're talking about ISA-level stuff, "test and set" instructions are included in some popular ISAs, I think.

Can num++ be atomic for 'int num'? explains how modern x86 CPUs implement atomic RMWs by holding a cache lock (delaying cache-coherency responses for that cache line, so this core maintains exclusive ownership of it from the load to the store of the microcoded instruction).

Wait-free vs. lock-free is debatable on x86 or ARMv8.1 depending whether you're counting in retries or in clock cycles. But on LL/SC machines, RMW operations like .fetch_add need a retry loop because the fail on contention instead of having hardware wait. See Anything in std::atomic is wait-free?

Pure-load or pure-store operations are easier for CPUs to handle atomically; it just happens naturally in assembly language for loads and stores that are aligned and no wider than the load/store execution units, and/or the chunk size for transfers between caches. (In higher-level languages, you need stuff like C++ std::atomic<int64_t> to make sure the compiler actually uses a single instruction and doesn't optimize away the memory access. Also for ordering wrt. other operations.) See Atomicity on x86 for more details about pure-loads and pure-stores. Also pretty much applies to other architectures, except the choice of examples.

Atomicity for larger operations like database transactions isn't provided directly by hardware, rather by software. (Unless you have transactional memory like Intel's TSX (https://www.realworldtech.com/haswell-tm/), or PowerPC HTM in POWER8.)

This answer is intentionally just summarizing and linking to other answers for deeper details because much has already been written, no need to repeat it all here. Seriously go read some of the answers I linked on other questions if you're interested in a more low-level understanding of CPU hardware atomic operations.

继续阅读：atomic computer-science

Atomic operations: under the hood

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？