Combining stores/loads of consecutive atomic variables
Referring to a (slightly dated) paper by Hans Boehm, under "Atomic Operations". It mentions that the memory model (proposed at the time) would not prevent an optimizing compiler from combining a sequence of loads, or stores, on the same variable from being combined into a single load. His example is as follows (updated to hopefully correct current syntax):
Given
atomic<int> v;
The code
while( v.load( memory_order_acquire ) ) { ... }
Could be optimized to:
int a = v.load(memory_order_acquire);
while(a) { ... }
Obviously this would be bad, as he 开发者_JS百科states. Now my question is, as the paper is a bit old, does the current C++0x memory model prevent this type of optimization, or is it still technically allowed?
My reading of the standard would seem to lean towards it being disallowed, but the use "acquire" semantics makes it less clear. For example if it were "seq_cst" the model seems to guarantee that the load must partake in a total ordering on the access and loading the value only once would thus seem to violate ordering (as it breaks the sequence happens before relationship).
For acquire I interpret 29.3.2 to mean that this optimization can not occur, since any "release" operation must be observed by the "acquire" operation. Doing only one acquire would seem not valid.
So my question is whether the current model (in the pending standard) would disallow this type of optimization? And if yes, then which part specifically forbids it? If no, does using a volatile
atomic solve the problem?
And for bonus, if the load operation has a "relaxed" ordering is the optimization then allowed?
The C++0x standard attempts to outlaw this optimization.
The relevant words are from 29.3p13:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
If the thread that is doing the load only ever issues one load instruction then this is violated, as if it misses the write the first time, it will never see it. It doesn't matter which memory ordering is used for the load, it is the same for both memory_order_seq_cst
and memory_order_relaxed
.
However, the following optimization is allowed, unless there is something in the loop that forces an ordering:
while( v.load( memory_order_acquire ) ) {
for(unsigned __temp=0;__temp<100;++__temp) {
// original loop body goes here
}
}
i.e. the compiler can generate code that executes the actual loads arbitrarily infrequently, provided it still executes them. This is even permitted for memory_order_seq_cst
unless there are other memory_order_seq_cst
operations in the loop, since this is equivalent to running 100 iterations between any memory accesses by other threads.
As an aside, the use of memory_order_acquire
doesn't have the effect you describe --- it is not required to see release operations (other than by 29.3p13 quoted above), just that if it does see the release operation then it imposes visibility constraints on other accesses.
Right from the very paper you're linking:
Volatiles guarantee that the right number of memory operations are performed.
The standard says essentially the same:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
This has always been the case, since the very first C compiler by Dennis Ritchie I think. It has to be this way because memory mapped I/O registers won't work otherwise. To read two characters from your keyboard, you need to read the corresponding memory mapped register twice. If the compiler had a different idea about the number of reads it has to perform, that would be too bad!
精彩评论