InterlockedExchange and memory visibility
I have read the article Synchronization and Multiprocessor Issues and I have a question about InterlockedCompareExchange and InterlockedExchange. The question is about the last example in the article. It has two variables, iValue and fValueHasBeenComputed, and in CacheComputedValue() it modifies each of them using InterlockedExchange:
InterlockedExchange ((LONG*)&iValue, (LONG)ComputeValue()); // don't understand
InterlockedExchange ((LONG*)&fValueHasBeenComputed, TRUE); // understand
I understand that I can use InterlockedExchange to modify iValue, but is it enough to just do
iValue = ComputeValue();
So is it actually necessary to use InterlockedExchange to set iValue? Or will other threads see iValue correctly even with the plain assignment, because there is an InterlockedExchange after it?
There is also the paper A Principle-Based Sequential Memory Model for Microsoft Native Code Platforms. Its example 3.1.1 has more or less the same code. One of the recommendations is "Make y interlocked". Notice: not both y and x.
Update
Just to clarify the question: the issue is that I see a contradiction. The example from "Synchronization and Multiprocessor Issues" uses two InterlockedExchange calls. By contrast, in example 3.1.1 "Basic Reordering" (which I think is quite similar to the first example), Herb Sutter gives this recommendation:
"Make y interlocked: If y is interlocked, then there is no race on y because it is atomically updatable, and there is no race on x because a -> b -> d."
In this draft Herb does not use two interlocked variables (if I am right, he means to use InterlockedExchange only for y).
They did that to prevent partial reads/writes if the address of iValue is not aligned to an address that guarantees atomic access. This problem would arise when two or more physical threads try to write the value concurrently, or when one reads while another writes.
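To make the alignment point concrete, here is a minimal sketch of checking whether an address is naturally aligned for a value of a given size. The helper name is_naturally_aligned and the buffer g_buffer are hypothetical, invented for this illustration; they are not part of the Win32 API or of the article's example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a Win32 API): true when `p` lies on a multiple of
// `size`, i.e. the natural alignment for an object of that size.
bool is_naturally_aligned(const void* p, std::size_t size) {
    assert(size != 0);
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

// A byte buffer aligned for a 4-byte value: offset 0 is aligned for a
// 4-byte access, offset 1 is not.
alignas(4) static unsigned char g_buffer[8] = {};
```

A 4-byte value placed at &g_buffer[1] would straddle the natural boundary, which is the situation in which a plain read or write may no longer be a single atomic access.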
As a secondary point, it should be noted that stores are not always globally visible; they are only going to be visible when serialized, either by a fence or by a bus lock.
You simply get an atomic operation with InterlockedExchange. Why do you need it?
Because InterlockedExchange does 2 things:
- Replaces the value of a variable
- Returns the old value
If you do the same things in 2 operations (first check the value, then replace it), you can get into trouble if other instructions (on another thread) run between these 2.
You also prevent data races on this value. Here you get a good explanation of why a read/write on a LONG is not atomic.
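The "replace and return the old value in one step" behaviour can be sketched as follows. This is only an illustration: it uses std::atomic<long>::exchange as a portable stand-in for InterlockedExchange on a LONG, and the function count_winners is a hypothetical name made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo: several threads race to claim a flag. Returns how many
// of them saw the old value 0, i.e. how many "won" the claim.
int count_winners(int n_threads) {
    assert(n_threads > 0);
    std::atomic<long> claimed{0};  // stand-in for a LONG used with InterlockedExchange
    std::atomic<int> winners{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&] {
            // exchange() stores 1 and returns the previous value as one
            // atomic step, so no other thread can slip in between.
            if (claimed.exchange(1) == 0) {
                winners.fetch_add(1);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return winners.load();
}
```

Because the read of the old value and the write of the new one cannot be separated, exactly one thread observes 0. With a separate "check, then write" pair, two threads could both pass the check before either one writes.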
There are two plausible resolutions to the contradiction you've observed.
One is that the second document is simply wrong in that particular respect. It is, after all, a draft. I note that the example you refer to specifically states that the programmer cannot rely on the writes to be atomic, which means that both writes must indeed be interlocked.
The other is that the additional interlock might not actually be required in that particular example, because it is a very special case: only a single bit of the variable is being changed. However, the specification being developed doesn't appear to mention this as a premise, so I doubt that this is intentional.
I think this discussion has the answer to the question: Implicit Memory Barriers.
Question: does calling InterlockedExchange (implicit full fence) on T1 and T2 guarantee that T2 will "see" the writes done by T1 before the fence (variables A, B and C), even though those variables are not placed on the same cache line as Foo and Bar?
Answer: Yes -- the full fence generated by the InterlockedExchange will guarantee that the writes to A, B, and C are not reordered past the fence implicit in the InterlockedExchange call. This is the point of memory barrier semantics. They do not need to be on the same cache line.
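A minimal sketch of that guarantee, applied to the pattern from the question: a plain write to iValue followed by an atomic exchange on the flag. It assumes that std::atomic with its default sequentially consistent ordering is an acceptable stand-in for the full fence of InterlockedExchange. The body of ComputeValue() and the function publish_and_read() are hypothetical, and iValue is assumed to be properly aligned.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the article's ComputeValue().
int ComputeValue() { return 42; }

// Writer publishes iValue with a plain store, then sets the flag atomically.
// Reader waits for the flag and only then reads iValue.
int publish_and_read() {
    int iValue = 0;                                  // plain, non-atomic variable
    std::atomic<bool> fValueHasBeenComputed{false};  // the only interlocked variable
    int seen = -1;

    std::thread writer([&] {
        iValue = ComputeValue();                     // plain write
        fValueHasBeenComputed.exchange(true);        // seq_cst: earlier writes cannot move past it
    });
    std::thread reader([&] {
        while (!fValueHasBeenComputed.load()) {      // seq_cst load pairs with the exchange
            std::this_thread::yield();
        }
        seen = iValue;                               // ordered after the writer's plain store
    });

    writer.join();
    reader.join();
    assert(fValueHasBeenComputed.load());
    return seen;
}
```

In this sketch only the flag is atomic, which matches the "make y interlocked" recommendation: the reader never touches iValue until it has observed the flag, so the plain write is ordered before the plain read.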
Memory Barriers: a Hardware View for Software Hackers and Lockless Programming Considerations for Xbox 360 and Microsoft Windows are also interesting.