Can creation of composite objects from temporaries be optimised away?

Posted: 2023-03-02 23:56

I've asked a few questions which have touched around this issue, but I've been getting differing responses, so I thought best to ask it directly.

Let's say we have the following code:

#include <utility>  // std::move

// Silly examples of A and B, don't take them too seriously,
// just keep in mind they're big and not dynamically allocated.
struct A { int x[1000]; A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } } };
struct B { int y[1000]; B() { for (int i = 0; i != 1000; ++i) { y[i] = i * 3; } } };

struct C
{
  A a;
  B b;
};

A create_a() { return A(); }
B create_b() { return B(); }

C create_c(A&& a, B&& b)
{
  C c;
  c.a = std::move(a);
  c.b = std::move(b);
  return c;
}

int main()
{
  C x = create_c(create_a(), create_b());
}

Now ideally create_c(A&&, B&&) should be a no-op. Instead of the calling convention being that A and B are created and references to them are passed on the stack, A and B should be created directly in place inside the return value, c. With NRVO, this means constructing them directly in x, with no further work for create_c to do.

This would avoid the need to create copies of A and B.

Is there any way to allow, encourage, or force this behavior from a compiler, or do optimizing compilers generally do this anyway? And will this only work when the compiler inlines the functions, or will it also work across compilation units?

(How I think this could work across compilation units...)

If create_a() and create_b() took a hidden parameter of where to place the return value, they could place the results into x directly, which is then passed by reference to create_c() which needs to do nothing and immediately returns.
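
The hidden-parameter idea can be written out by hand to see what it amounts to. This is only a sketch of the mechanism, not something you would normally write yourself: create_a_into and constructed_in_callers_slot are hypothetical names that do not appear in the question, and the buffer merely stands in for the storage of x.a.

```cpp
#include <new>  // placement new

// A as in the question: large, with its data stored inline.
struct A { int x[1000]; A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } } };

// Hypothetical hand-written equivalent of "A create_a()" once the hidden
// return-slot parameter is made explicit: the caller says where the result lives.
A* create_a_into(void* return_slot) {
    return new (return_slot) A();  // construct directly in the caller's storage
}

// Returns true if the A ended up exactly in the storage the caller supplied.
bool constructed_in_callers_slot() {
    alignas(A) unsigned char slot[sizeof(A)];  // stands in for the storage of x.a
    A* a = create_a_into(slot);
    bool same_place = static_cast<void*>(a) == static_cast<void*>(slot);
    bool filled = a->x[10] == 20;
    a->~A();  // storage was obtained manually, so destroy manually
    return same_place && filled;
}
```

Nothing is copied here: the constructor runs once, at the address chosen by the caller, which is the effect the question hopes the compiler will arrange on its own.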

There are different ways of optimizing the code that you have, but rvalue references are not one. The problem is that neither A nor B can be moved at no cost, since you cannot steal the contents of the object. Consider the following example:

#include <algorithm>  // std::copy

template <typename T>
class simple_vector {
   typedef T element_type;
   typedef element_type* pointer_type;
   pointer_type first, last, end_storage;
public:
   simple_vector() : first(), last(), end_storage() {}
   simple_vector( simple_vector const & rhs )              // not production ready, memory can leak from here!
      : first( new element_type[ rhs.last - rhs.first ] ),
        last( first + ( rhs.last - rhs.first ) ),
        end_storage( last )
   {
       std::copy( rhs.first, rhs.last, first );
   }
   simple_vector( simple_vector && rhs ) // we can move!
      : first( rhs.first ), last( rhs.last ), end_storage( rhs.end_storage )
   {
      rhs.first = rhs.last = rhs.end_storage = 0;
   }
   ~simple_vector() {
      delete [] first; // deleting a null pointer is a no-op, so moved-from objects are safe
   }
   // rest of operations
};

In this example, since the resources are held through pointers, there is a simple way of moving the object: steal the contents of the old object into the new one and leave the old object in a destroyable but useless state. Simply copy the pointers and reset them to null in the old object, so that the original object's destructor will not free the memory.

The problem with both A and B is that the actual memory is held in the object through an array, and that array cannot be moved to a different memory location for the new C object.
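
A small sketch makes this concrete, using the A from the question. The helper name move_is_a_copy is made up for illustration. "Moving" an A goes through the implicitly generated move constructor, which works member by member, and for an array of int that means copying every element.

```cpp
#include <type_traits>
#include <utility>

// A as in the question: the array lives inside the object itself.
struct A { int x[1000]; A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } } };

// There is no pointer to steal: A is just 1000 ints laid out inline.
static_assert(std::is_trivially_copyable<A>::value, "A is a plain block of ints");

// Returns true if moving an A left the source intact and produced a second,
// separate array, i.e. the move behaved exactly like a copy.
bool move_is_a_copy() {
    A src;
    A dst = std::move(src);  // implicit move constructor: element-wise copy of x
    return src.x[999] == 1998
        && dst.x[999] == 1998
        && static_cast<const void*>(src.x) != static_cast<const void*>(dst.x);
}
```

Compare this with simple_vector above, where the move constructor transfers three pointers regardless of how many elements the vector holds.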

Of course, since you are using stack-allocated objects in the code, plain old (N)RVO can be used by the compiler. When you write C c = { create_a(), create_b() }; the compiler can perform that optimization: it places c.a at the address of the object returned from create_a and, when compiling create_a, constructs the returned temporary directly at that same address. Effectively c.a, the object returned from create_a, and the temporary constructed inside create_a (the implicit this of the constructor) are the same object, which avoids two copies. The same can be done with c.b, avoiding the copying cost. If the compiler does inline your code, it can remove create_c and replace it with a construct similar to C c = { create_a(), create_b() }; so it can potentially optimize all copies away.
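
One way to check this on your own compiler is to count copy-constructor calls. The sketch below adds a counting copy constructor to A and B, which is instrumentation that is not in the question, and the global counter copies and the helper copies_for_aggregate_init are hypothetical names. Under C++17 the elision in this form is guaranteed; with older standards it is permitted and, as far as I know, commonly performed.

```cpp
#include <algorithm>

int copies = 0;  // incremented by every copy-constructor call below

struct A {
    int x[1000];
    A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } }
    A(const A& o) { std::copy(o.x, o.x + 1000, x); ++copies; }
};

struct B {
    int y[1000];
    B() { for (int i = 0; i != 1000; ++i) { y[i] = i * 3; } }
    B(const B& o) { std::copy(o.y, o.y + 1000, y); ++copies; }
};

struct C { A a; B b; };  // still an aggregate: no constructors of its own

A create_a() { return A(); }
B create_b() { return B(); }

// Builds a C straight from the two temporaries and reports how many
// copy constructors ran along the way.
int copies_for_aggregate_init() {
    copies = 0;
    C c = { create_a(), create_b() };
    (void)c;
    return copies;
}
```

If the returned count is 0, both members were constructed in place inside c, which is the outcome described above.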

Note, on the other hand, that this optimization cannot be fully applied when the C object is allocated dynamically and then assigned to, as in C* p = new C; p->a = create_a();. Here p->a already exists by the time create_a returns, so the compiler can only merge the temporary inside create_a with its return value; it cannot make that coincide with p->a, and a copy still has to be made. This is the advantage of rvalue references over (N)RVO, but as mentioned before, you cannot use rvalue references effectively in your code example directly.

There are two kinds of optimization which can apply in your case:

Function Inlining (In the case of A, B, and C (and the A and B it contains))
Copy elision (C (and the A and B it contains) only, because you returned C by value)

For a function this small, it's probably going to be inlined. Almost any compiler will do it if the definition is in the same translation unit, and good compilers like MSVC++ and G++ (and I think LLVM, but I'm not sure on that one) have whole-program-optimization settings which will do it even across translation units. If the function is inlined, then yes, the function call (and the copy that comes with it) isn't going to occur at all.

If for some reason the function doesn't get inlined (e.g. you used __declspec(noinline) on MSVC++), then you're still going to be eligible for the Named Return Value Optimization (NRVO), which good C++ compilers (again, MSVC++, G++, and I think LLVM) all implement. Basically, the standard says that compilers are allowed to omit the copy on return if they can avoid doing so, and they will usually emit code that avoids it. There are some things you can do that defeat NRVO, but for the most part it's a pretty safe optimization to rely on.
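
A rough way to see whether NRVO kicks in is to return a named local and count copies. In this sketch the counting copy constructors, the counter copies, the parameterless create_c, and the helper copies_when_returning_named_local are all illustrative assumptions rather than code from the question. Because NRVO is optional, the result depends on the compiler and its settings.

```cpp
#include <algorithm>

int copies = 0;  // incremented by every copy-constructor call below

struct A {
    int x[1000];
    A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } }
    A(const A& o) { std::copy(o.x, o.x + 1000, x); ++copies; }
};

struct B {
    int y[1000];
    B() { for (int i = 0; i != 1000; ++i) { y[i] = i * 3; } }
    B(const B& o) { std::copy(o.y, o.y + 1000, y); ++copies; }
};

struct C { A a; B b; };

A create_a() { return A(); }
B create_b() { return B(); }

// Returns a *named* local, so eliding the copy here is NRVO, which the
// standard permits but does not require.
C create_c() {
    C c = { create_a(), create_b() };
    return c;
}

// Reports how many member copies were made while initialising x.
int copies_when_returning_named_local() {
    copies = 0;
    C x = create_c();
    (void)x;
    return copies;
}
```

A result of 0 means the local c and the caller's x were the same object. A nonzero result means c was copied out member by member, since A and B here have no cheaper move to fall back on.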

Finally, profile. If you see a performance problem, then figure out something else. Otherwise I'd write things in the idiomatic way and replace them with more performant constructs if and only if you need to.

Isn't the obvious thing to do to give C a constructor and then say:

C create_c(const A & a, const B & b)
{
  return C( a, b );
}

which has lots of possibilities for optimisation. Or indeed get rid of the create function. I don't think this is a very good motivating example.
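
The constructor itself is not spelled out above, so here is one possible shape for it, instrumented to count copies. The parameter names a_ and b_, the counter copies, and the helper copies_with_constructor are assumptions made for this sketch.

```cpp
#include <algorithm>

int copies = 0;  // incremented by every copy-constructor call below

struct A {
    int x[1000];
    A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } }
    A(const A& o) { std::copy(o.x, o.x + 1000, x); ++copies; }
};

struct B {
    int y[1000];
    B() { for (int i = 0; i != 1000; ++i) { y[i] = i * 3; } }
    B(const B& o) { std::copy(o.y, o.y + 1000, y); ++copies; }
};

struct C {
    A a;
    B b;
    // Copy-initialise each member exactly once from the arguments.
    C(const A& a_, const B& b_) : a(a_), b(b_) {}
};

A create_a() { return A(); }
B create_b() { return B(); }

C create_c(const A& a, const B& b)
{
    return C(a, b);  // returning a temporary: elision is guaranteed since C++17
}

// Reports how many member copies it takes to build x this way.
int copies_with_constructor() {
    copies = 0;
    C x = create_c(create_a(), create_b());
    (void)x;
    return copies;
}
```

When the return is elided this reports 2: one copy of A and one of B into the members, and nothing further for returning the C. That is fewer operations than default-constructing and then assigning, although it still does not reach the zero copies the question asks about.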
