开发者

c++: Excessive copying of large objects

While there is quite a few questions about copy constructors/assignment operators on SO already, I did not find an answer that fit my problem.

I have a class like

class Foo
{
   // ...
private:
   std::vector<int> vec1;
   std::vector<int> vec2;
   boost::bimap<unsigned int, unsigned int> bimap;
   // And a couple more
};

Now it seems that there is some quite excessive copying going on (based on profile data).. So my question is how to best tackle this?

Should I implement custom copy constructor/assignment operator and use swap? Or should I define my own swap method and use that (where appropriate) instead of assignment?

As I am not a c++ expert, examples that show how to properly handle this situation are greatly appr开发者_C百科eciated.

UPDATE: It appears I was not terribly clear.. Let me try to explain. The program is basically an on-the-fly breadth-first search program, and for each step taken I need to store metadata about the step (which is the Foo class).. Now the problem is that there is (usually) exponentially steps, so you can imagine a large number of these objects needs to be stored.. I do pass by (const) reference always as far as I know.. Each time I calculate a successor from a node in the graph I need to create and store ONE Foo object (however, some of the data members will be added to this one foo further on in the processing of this successor)..

My profile data shows roughly something like this (I don't have the actual numbers on this machine):

SearchStrategy::Search    13s
FooStore::Save            10s

So you can see I spend nearly as much time saving this meta data as I do searching through the graph.. Oh, and FooStore saves Foo in a google::sparse_hash_map<long long, Foo, boost::hash<long long> >.

Compiler is g++4.4 or g++4.5 (I'm not at my dev. machine, so I cannot check at the moment)..

UPDATE 2 I assign some of the members after construction to a Foo instance like

void SetVec1(const std::vector<int>& vec1) { this->vec1 = vec1; };

I guess tomorrow, I should change this to use the swap method, which should definitely improve this a bit..

I'm sorry if I'm not entirely clear about what semantics I'm trying to achieve, but the reason is that I am not quite sure.

Regards,

Morten


Everything depends on what copying this object means in your case :

  1. it means copying it's whole value
  2. it means the copied object will refer to the same content

If it's 1, then this class seem correct. You're not very clear about the operations that you say does make lot of copies so I'm assuming you try to copy the whole object.

If it's 2, then you need to use something like shared_ptr to share the containers between the objects. Just using shared_ptr instead of real objects as member will implicitely allow the buffers to be refered by both objects (the copy and the copied). That's the easier way (using boost::shared_ptr or std::shared_ptr if you have a C++0x enabled compiler providing it).

There are harder ways but they will certainly become a problem later.


  1. Of course, and everyone says this, don't optimize prematurely. Don't bother with this until and unless you prove a) that your program goes too slowly, and b) it would go faster if you didn't copy so much data.

  2. If your program design requires you to hold multiple simultaneous copies of the data, there is nothing you can do. You just have to bite the bullet and copy the data. No, implementing a custom copy constructor and custom assignment operator won't make it go faster.

  3. If your program doesn't require multiple simultaneous copies of this data, then you do have a couple of tricks to reduce the number of copies you perform.

Instrument your copy methods If it were me, the first thing I would do, even before trying to improve anything, is to count the number of times my copy methods were invoked.

class Foo {
private:
  static int numberOfConstructors;
  static int numberofCopyConstructors;
  static int numberofAssignments;
  Foo() { ++numberOfConstructors; ...; }
  Foo(const Foo& f) : vec1(f.vec1), vec2(f.vec2), bimap(f.bimap) {
    ++numberOfCopyConstructors;
    ...;
  }
  Foo& operator=(const Foo& f) {
    ++numberOfAssignments;
    ...;
  }
};

Run your program with and without your improvements. Print out the value of those static members to see if your changes had any effect.

Avoid assignments in function calls by using references If you pass objects of type Foo to functions, consider if you can do it by reference. If you don't change the passed copy, passing it by const reference is a no-brainer.

// WAS:
extern SomeFuncton(Foo f);
// EASY change -- if this compiles, you know that it is correct
extern SomeFunction(const Foo& f);
// HARD change -- you have to examine your code to see if this is safe
extern SomeFunction(Foo& f);

Avoid copies by using Foo::swap If you use the copy methods (either explicitly or implicitly) a lot, consider whether the assigned-from item could give up its data, rather than copying it.

// Was:
vectorOfFoo.push_back(myFoo);
// maybe faster:
vectorOfFoo.push_back(Foo());
vectorOfFoo.back().swap(myFoo);

// Was:
newFoo = oldFoo;
// maybe faster
newfoo.swap(oldFoo);

Of course, this only works if myFoo and oldFoo no longer need access to their data. And, you have to implement Foo::swap

void Foo::swap(Foo& old) {
    std::swap(this->vec1, old.vec1);
    std::swap(this->vec2, old.vec2);
    ...
}

Whatever you do, measure your program before and after your change. Measure the number of times your copy methods are invoked, and the total time improvement in your program.


Your class doesn't seem that bad, but you do not show how you use it.

If there is lots of copying, then you need to pass objects of those class by reference (or if possible const reference). If that class has to be copied, then you can not do anything.


If it's really a problem, you might consider implementing the pimpl idiom. But I doubt it's a problem, though I'd have to see your use of the class to be sure.


Copying of huge vectors unlikely can be cheap. The most promising way is to copy rarer. While it's quite easy (may be too easy) in C++ to invoke copy without intention, there are ways to avoid needless copying:

  • passing by const and non-const reference
  • move-constructors
  • smart pointers with ownership transfer

These techniques may leave only copies which are required by algorithm.

Sometimes it's possible to avoid even some of those copying. For example, if you need two objects where the second one is reversed copy of the first one, a wrapper object may be created which acts like reversed, but instead of storing entire copy has only a reference.


The obvious way to reduce copying is to use something like a shared_ptr. With multithreading, however, this cure can be worse than the disease -- incrementing and decrementing reference counts needs to be done atomically, which can be quite expensive. If, however, you typically end up modifying the copies and need each copy to act unique (i.e., modifying a copy doesn't affect the original) you can end up with worse performance still, paying for the atomic increment/decrement for reference counting, and still doing lots of copies anyway.

There are a couple of obvious ways to avoid that. One is to move unique objects instead of copying at all -- this is great if you can make it work. Another is to use non-atomic reference counting most of the time, and do deep copies only when moving data between threads.

There is no one answer that'a universal and really clean though.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜