What does `std::kill_dependency` do, and why would I want to use it?

2023-03-30 09:52 问答作者：

I've been reading about the 开发者_如何学Pythonnew C++11 memory model and I've come upon the std::kill_dependency function (§29.3/14-15). I'm struggling to understand why I would ever want to use it.

I found an example in the N2664 proposal but it didn't help much.

It starts by showing code without std::kill_dependency. Here, the first line carries a dependency into the second, which carries a dependency into the indexing operation, and then carries a dependency into the do_something_with function.

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[r2]);

There is further example that uses std::kill_dependency to break the dependency between the second line and the indexing.

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);

As far as I can tell, this means that the indexing and the call to do_something_with are not dependency ordered before the second line. According to N2664:

This allows the compiler to reorder the call to do_something_with, for example, by performing speculative optimizations that predict the value of a[r2].

In order to make the call to do_something_with the value a[r2] is needed. If, hypothetically, the compiler "knows" that the array is filled with zeros, it can optimize that call to do_something_with(0); and reorder this call relative to the other two instructions as it pleases. It could produce any of:

// 1
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(0);
// 2
r1 = x.load(memory_order_consume);
do_something_with(0);
r2 = r1->index;
// 3
do_something_with(0);
r1 = x.load(memory_order_consume);
r2 = r1->index;

Is my understanding correct?

If do_something_with synchronizes with another thread by some other means, what does this mean with respect to the ordering of the x.load call and this other thread?

Assuming my understading is correct, there's still one thing that bugs me: when I'm writing code, what reasons would lead me to choose to kill a dependency?

The purpose of memory_order_consume is to ensure the compiler does not do certain unfortunate optimizations that may break lockless algorithms. For example, consider this code:

int t;
volatile int a, b;

t = *x;
a = t;
b = t;

A conforming compiler may transform this into:

a = *x;
b = *x;

Thus, a may not equal b. It may also do:

t2 = *x;
// use t2 somewhere
// later
t = *x;
a = t2;
b = t;

By using load(memory_order_consume), we require that uses of the value being loaded not be moved prior to the point of use. In other words,

t = x.load(memory_order_consume);
a = t;
b = t;
assert(a == b); // always true

The standard document considers a case where you may only be interested in ordering certain fields of a structure. The example is:

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);

This instructs the compiler that it is allowed to, effectively, do this:

predicted_r2 = x->index; // unordered load
r1 = x; // ordered load
r2 = r1->index;
do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available

Or even this:

predicted_r2 = x->index; // unordered load
predicted_a  = a[predicted_r2]; // get the CPU loading it early on
r1 = x; // ordered load
r2 = r1->index; // ordered load
do_something_with(predicted_a);

If the compiler knows that do_something_with won't change the result of the loads for r1 or r2, then it can even hoist it all the way up:

do_something_with(a[x->index]); // completely unordered
r1 = x; // ordered
r2 = r1->index; // ordered

This allows the compiler a little more freedom in its optimization.

In addition to the other answer, I will point out that Scott Meyers, one of the definitive leaders in the C++ community, bashed memory_order_consume pretty strongly. He basically said that he believed it had no place in the standard. He said there are two cases where memory_order_consume has any effect:

Exotic architectures designed to support 1024+ core shared memory machines.
The DEC Alpha

Yes, once again, the DEC Alpha finds its way into infamy by using an optimization not seen in any other chip until many years later on absurdly specialized machines.

The particular optimization is that those processors allow one to dereference a field before actually getting the address of that field (i.e. it can look up x->y BEFORE it even looks up x, using a predicted value of x). It then goes back and determines whether x was the value it expected it to be. On success, it saved time. On failure, it has to go back and get x->y again.

Memory_order_consume tells the compiler/architecture that these operations have to happen in order. However, in the most useful case, one will end up wanting to do (x->y.z), where z doesn't change. memory_order_consume would force the compiler to keep x y and z in order. kill_dependency(x->y).z tells the compiler/architecture that it may resume doing such nefarious reorderings.

99.999% of developers will probably never work on a platform where this feature is required (or has any effect at all).

The usual use case of kill_dependency arises from the following. Suppose you want to do atomic updates to a nontrivial shared data structure. A typical way to do this is to nonatomically create some new data and to atomically swing a pointer from the data structure to the new data. Once you do this, you are not going to change the new data until you have swung the pointer away from it to something else (and waited for all readers to vacate). This paradigm is widely used, e.g. read-copy-update in the Linux kernel.

Now, suppose the reader reads the pointer, reads the new data, and comes back later and reads the pointer again, finding that the pointer hasn't changed. The hardware can't tell that the pointer hasn't been updated again, so by consume semantics he can't use a cached copy of the data but has to read it again from memory. (Or to think of it another way, the hardware and compiler can't speculatively move the read of the data up before the read of the pointer.)

This is where kill_dependency comes to the rescue. By wrapping the pointer in a kill_dependency, you create a value that will no longer propagate dependency, allowing accesses through the pointer to use the cached copy of the new data.

继续阅读：c++11 memory-model multithreading

What does `std::kill_dependency` do, and why would I want to use it?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？