What does `std::kill_dependency` do, and why would I want to use it?
I've been reading about the new C++11 memory model and I've come upon the std::kill_dependency
function (§29.3/14-15). I'm struggling to understand why I would ever want to use it.
I found an example in the N2664 proposal but it didn't help much.
It starts by showing code without std::kill_dependency. Here, the first line carries a dependency into the second, which carries a dependency into the indexing operation, which in turn carries a dependency into the do_something_with function.
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[r2]);
There is a further example that uses std::kill_dependency to break the dependency between the second line and the indexing.
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);
As far as I can tell, this means that the indexing and the call to do_something_with are not dependency-ordered before the second line. According to N2664:
This allows the compiler to reorder the call to do_something_with, for example, by performing speculative optimizations that predict the value of a[r2].
In order to make the call to do_something_with, the value a[r2] is needed. If, hypothetically, the compiler "knows" that the array is filled with zeros, it can optimize that call to do_something_with(0); and reorder this call relative to the other two instructions as it pleases. It could produce any of:
// 1
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(0);
// 2
r1 = x.load(memory_order_consume);
do_something_with(0);
r2 = r1->index;
// 3
do_something_with(0);
r1 = x.load(memory_order_consume);
r2 = r1->index;
Is my understanding correct?
If do_something_with synchronizes with another thread by some other means, what does this mean with respect to the ordering of the x.load call and this other thread?
Assuming my understanding is correct, there's still one thing that bugs me: when I'm writing code, what reasons would lead me to choose to kill a dependency?
The purpose of memory_order_consume is to ensure the compiler does not do certain unfortunate optimizations that may break lockless algorithms. For example, consider this code:
int t;
volatile int a, b;
t = *x;
a = t;
b = t;
A conforming compiler may transform this into:
a = *x;
b = *x;
Thus, a may not equal b. It may also do:
t2 = *x;
// use t2 somewhere
// later
t = *x;
a = t2;
b = t;
By using load(memory_order_consume), we require that uses of the loaded value not be moved before the load itself. In other words,
t = x.load(memory_order_consume);
a = t;
b = t;
assert(a == b); // always true
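The typical setting for a consume load is pointer publication: one thread fills in an object and publishes its address, and another thread reads through the loaded pointer. Below is a minimal sketch of that pairing. The Config type, its timeout_ms field, and the publish, read_timeout, and demo functions are hypothetical names chosen for illustration, and the driver is single-threaded only to keep the sketch self-contained.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload type; the name and field are illustrative only.
struct Config {
    int timeout_ms;
};

std::atomic<Config*> x{nullptr};

// Writer side: fill in the object with plain stores, then publish the
// pointer with a release store so the contents are visible to consumers.
void publish(Config* c, int timeout_ms) {
    c->timeout_ms = timeout_ms;
    x.store(c, std::memory_order_release);
}

// Reader side: the read of t->timeout_ms depends on the loaded pointer,
// so it is ordered after the writer's initialization of the object.
int read_timeout() {
    Config* t = x.load(std::memory_order_consume);
    if (t == nullptr) {
        return -1;  // nothing has been published yet
    }
    return t->timeout_ms;
}

// Single-threaded driver, only to show the call sequence; in real code
// publish() and read_timeout() would run on different threads.
int demo() {
    static Config c{0};
    x.store(nullptr, std::memory_order_relaxed);
    assert(read_timeout() == -1);
    publish(&c, 250);
    return read_timeout();
}
```

Only reads that are computed from the loaded pointer get this ordering; unrelated reads in the reader are not ordered by the consume load.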
The standard document considers a case where you may only be interested in ordering certain fields of a structure. The example is:
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);
This instructs the compiler that it is allowed to, effectively, do this:
predicted_r2 = x->index; // unordered load
r1 = x; // ordered load
r2 = r1->index;
do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available
Or even this:
predicted_r2 = x->index; // unordered load
predicted_a = a[predicted_r2]; // get the CPU loading it early on
r1 = x; // ordered load
r2 = r1->index; // ordered load
do_something_with(predicted_a);
If the compiler knows that do_something_with won't change the result of the loads for r1 or r2, then it can even hoist it all the way up:
do_something_with(a[x->index]); // completely unordered
r1 = x; // ordered
r2 = r1->index; // ordered
This allows the compiler a little more freedom in its optimization.
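For reference, here is a compilable sketch of the kill_dependency example. The Node type, the contents of the array a, the body of do_something_with, and the demo driver are all assumptions made for illustration; the original example does not define them.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type holding an index into the table below.
struct Node {
    int index;
};

std::atomic<Node*> x{nullptr};

// Hypothetical lookup table, assumed to be filled in before any thread
// starts and never written afterwards.
const int a[4] = {10, 20, 30, 40};

// Placeholder for the do_something_with call in the example.
int do_something_with(int value) {
    return value + 1;
}

int reader() {
    Node* r1 = x.load(std::memory_order_consume);
    if (r1 == nullptr) {
        return -1;  // nothing published yet
    }
    int r2 = r1->index;  // still carries a dependency from the load
    // kill_dependency returns r2's value unchanged but ends the chain:
    // the read of a[...] is no longer ordered by the consume load.
    return do_something_with(a[std::kill_dependency(r2)]);
}

// Single-threaded driver, only to show the call sequence.
int demo() {
    static Node n{2};
    x.store(nullptr, std::memory_order_relaxed);
    assert(reader() == -1);
    x.store(&n, std::memory_order_release);
    return reader();  // a[2] + 1
}
```

Killing the dependency is only reasonable here because the table is assumed to be immutable, so nothing in it needs to be ordered after the load of x. Note also that current mainstream compilers implement memory_order_consume as memory_order_acquire, in which case kill_dependency changes nothing in the generated code.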
In addition to the other answer, I will point out that Scott Meyers, one of the definitive leaders in the C++ community, bashed memory_order_consume pretty strongly. He basically said that he believed it had no place in the standard. He said there are two cases where memory_order_consume has any effect:
- Exotic architectures designed to support 1024+ core shared memory machines.
- The DEC Alpha
Yes, once again, the DEC Alpha finds its way into infamy by using an optimization not seen in any other chip until many years later on absurdly specialized machines.
The particular optimization is that those processors can dereference a field before actually getting the address of that field (i.e. they can look up x->y BEFORE they even look up x, using a predicted value of x). The processor then goes back and checks whether x had the value it expected. On success, it has saved time. On failure, it has to go back and fetch x->y again.
memory_order_consume tells the compiler/architecture that these operations have to happen in order. However, in the most useful case, one will end up wanting to do (x->y.z), where z doesn't change. memory_order_consume would force the compiler to keep x, y, and z in order. kill_dependency(x->y).z tells the compiler/architecture that it may resume doing such nefarious reorderings.
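As a rough sketch of the kill_dependency(x->y).z expression, assuming hypothetical Outer and Inner types where y is an embedded member and z is never modified after construction: std::kill_dependency takes and returns its argument by value, so y is copied and z is read from that copy.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical types; z is assumed never to change after construction.
struct Inner {
    int z;
};

struct Outer {
    Inner y;
};

std::atomic<Outer*> x{nullptr};

int read_z() {
    Outer* p = x.load(std::memory_order_consume);
    if (p == nullptr) {
        return -1;  // nothing published yet
    }
    // Reading p->y is ordered after the load of x. The value returned by
    // kill_dependency no longer carries that dependency, so what is done
    // with .z afterwards is not constrained by the consume load.
    return std::kill_dependency(p->y).z;
}

// Single-threaded driver, only to show the call sequence.
int demo() {
    static Outer o{Inner{7}};
    x.store(nullptr, std::memory_order_relaxed);
    assert(read_z() == -1);
    x.store(&o, std::memory_order_release);
    return read_z();
}
```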
99.999% of developers will probably never work on a platform where this feature is required (or has any effect at all).
The usual use case of kill_dependency arises from the following. Suppose you want to do atomic updates to a nontrivial shared data structure. A typical way to do this is to nonatomically create some new data and to atomically swing a pointer from the data structure to the new data. Once you do this, you are not going to change the new data until you have swung the pointer away from it to something else (and waited for all readers to vacate). This paradigm is widely used, e.g. read-copy-update in the Linux kernel.
Now, suppose the reader reads the pointer, reads the new data, and comes back later and reads the pointer again, finding that the pointer hasn't changed. The hardware can't tell that the pointer hasn't been updated again, so under consume semantics the reader can't use a cached copy of the data but has to read it again from memory. (Or, to think of it another way, the hardware and compiler can't speculatively move the read of the data up before the read of the pointer.)
This is where kill_dependency comes to the rescue. By wrapping the pointer in a kill_dependency, you create a value that will no longer propagate dependency, allowing accesses through the pointer to use the cached copy of the new data.
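The re-read scenario could look roughly like the sketch below. The Data type, the shared pointer, and the read_twice and demo functions are hypothetical, and the sketch assumes (as in the scheme described above) that a published object is never modified or reclaimed while a reader may still be using it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload; assumed never modified while it is published.
struct Data {
    int payload;
};

std::atomic<Data*> shared{nullptr};

// Reads the payload twice, as a reader that "comes back later" would.
int read_twice() {
    Data* p1 = shared.load(std::memory_order_consume);
    if (p1 == nullptr) {
        return -1;  // nothing published yet
    }
    int first = p1->payload;  // ordered after the first load

    Data* p2 = shared.load(std::memory_order_consume);
    if (p2 == nullptr) {
        return first;  // pointer was swung away in the meantime
    }
    if (p2 != p1) {
        // A new object was published: read it with full consume ordering.
        return first + p2->payload;
    }
    // Same object as before. Its contents were already observed via p1,
    // so the dependency from the second load is not needed; killing it
    // lets the compiler reuse the value it already holds in `first`.
    Data* q = std::kill_dependency(p2);
    return first + q->payload;
}

// Single-threaded driver, only to show the call sequence.
int demo() {
    static Data d{5};
    shared.store(nullptr, std::memory_order_relaxed);
    assert(read_twice() == -1);
    shared.store(&d, std::memory_order_release);
    return read_twice();  // 5 + 5
}
```

Whether the second read is actually served from a register or cache is up to the compiler and hardware; kill_dependency only removes the ordering requirement that would otherwise forbid it.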