Word Tearing on x86

2022-12-08 23:15

Under what circumstances is it unsafe to have two different threads simultaneously writing to adjacent elements of the same array on x86? I understand that on some DS9K-like architectures with insane memory models this can cause word tearing, but on x86 single bytes are addressable. For example, in the D programming language real is an 80-bit floating point type on x86. Would it be safe to do something like:

real[] nums = new real[4];  // Assume new returns a 16-byte aligned block.
foreach(i; 0..4) {
    // Create a new thread and have it do stuff and 
    // write results to index i of nums.
}

Note: I know that, even if this is safe, it can sometimes cause false sharing problems with the cache, leading to slow performance. However, for the use cases I have in mind writes will be infrequent enough for this not to matter in practice.

Edit: Don't worry about reading back the values that are written. The assumption is that there will be synchronization before any values are read. I only care about the safety of writing in this way.

The x86 has coherent caches. The last processor to write to a cache line acquires ownership of the whole line and performs the write in its own cache. This ensures that single-byte and 4-byte values written on correspondingly aligned boundaries are updated atomically.

That's different from saying "it's safe". If each processor writes only to the bytes/DWORDs "owned" by that processor by design, then the updates will be correct. In practice, you want one processor to read values written by others, and that requires synchronization.
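As a rough illustration of that ownership pattern, here is a minimal sketch in D. The names fillInParallel and makeWorker are made up for this example and do not come from the question. Each thread writes only to its own element, and the results are read only after every thread has been joined. It shows the pattern described above; it is not a statement about what the D specification guarantees.

```d
import core.thread : Thread;

// Hypothetical helper: builds a thread that writes to exactly one element.
// Passing i as a parameter gives each delegate its own copy of the index.
Thread makeWorker(real[] nums, size_t i)
{
    return new Thread({ nums[i] = i * 1.5L; });
}

// Hypothetical helper: every thread "owns" one slot of nums.
real[] fillInParallel(size_t n)
{
    real[] nums = new real[n];
    Thread[] workers;

    foreach (i; 0 .. n)
        workers ~= makeWorker(nums, i);

    foreach (t; workers)
        t.start();

    // Synchronization point: nothing is read until all writers are done.
    foreach (t; workers)
        t.join();

    return nums;
}

void main()
{
    import std.stdio : writeln;

    writeln(fillInParallel(4));
}
```

The join calls are what make reading the results afterwards well defined. The writes themselves never touch another thread's element.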

It is also different from being "efficient". If several processors each write to different places in the same cache line, the line can ping-pong between CPUs, and that is a lot more expensive than if the cache line goes to a single CPU and stays there. The usual rule is to put processor-specific data in its own cache line. Of course, if you are only going to write to that one word, just once, and the amount of work is significant compared to a cache-line move, then your performance will be acceptable.
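If the writes were frequent, one way to follow the "own cache line" rule is to pad each per-thread slot out to a full line. The sketch below assumes 64-byte cache lines (typical for current x86 CPUs, but not guaranteed) and an allocation that is at least 16-byte aligned, as the question assumes. PaddedSlot, makeSlotWorker and accumulatePadded are hypothetical names used only for illustration.

```d
import core.thread : Thread;

// Assumption: 64-byte cache lines, typical for current x86 CPUs.
enum size_t cacheLineSize = 64;

// Hypothetical per-thread slot, padded so consecutive values sit
// one full cache line apart.
struct PaddedSlot
{
    real value = 0;
    ubyte[cacheLineSize - real.sizeof] pad;
}

static assert(PaddedSlot.sizeof == cacheLineSize);

// Each worker repeatedly updates only its own slot.
Thread makeSlotWorker(PaddedSlot[] slots, size_t i)
{
    return new Thread({
        foreach (step; 0 .. 1000)
            slots[i].value += 1;
    });
}

real[] accumulatePadded(size_t n)
{
    auto slots = new PaddedSlot[n];
    Thread[] workers;

    foreach (i; 0 .. n)
        workers ~= makeSlotWorker(slots, i);
    foreach (t; workers)
        t.start();
    foreach (t; workers)
        t.join();   // synchronize before reading any slot

    real[] results;
    foreach (ref s; slots)
        results ~= s.value;
    return results;
}

void main()
{
    import std.stdio : writeln;

    writeln(accumulatePadded(4));
}
```

The padding trades memory for less cache-line traffic. For infrequent writes like the ones described in the question, a plain real[] is likely good enough.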

I might be missing something, but I don't foresee any issues. The x86 architecture writes only what it needs; it doesn't write anything outside the specified bytes. Cache snooping handles the cache issues.

You are asking about x86 specifics, yet your example is in a high-level language. Your specific question about D can only be answered by the people who wrote the compiler you are using, or perhaps by the D language specification. Java, for example, requires that array element access must not cause word tearing.

Regarding x86, atomicity of operations is specified in Section 8.1 of Intel's Software Developer's Manual, Volume 3A. According to it, atomic store operations include storing a byte, storing a word-aligned word, and storing a dword-aligned dword, on all x86 CPUs. It also specifies that on P6 and later CPUs, unaligned 16-, 32- and 64-bit accesses to cached memory that fit within a cache line are atomic.
