Does a copy constructor/operator/function need to make clear which copy variant it implements?
Yesterday I asked a question about copying objects in C#, and most answers focused on the difference between deep copy and shallow copy, and on the fact that it should be made clear which of the two copy variants a given copy constructor (or operator, or function) implements. I find this odd.
I wrote a lot of software in C++, a language that heavily relies on copying, and I never ever needed multiple copy variants. The only kind of copy operation I ever used is the one I call "deep enough copy". It does the following:
- In case the object has ownership over the member variable (cf. composition), it is copied recursively.
- In case the object has no ownership over the member variable (cf. aggregation), only the link is copied.
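In C++ the compiler-generated copy constructor already produces this "deep enough copy" when composition is expressed with value members and aggregation with non-owning pointers. A minimal sketch, assuming C++11; `Document`, `Paragraph` and `Printer` are hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types, for illustration only.
struct Printer {    // exists independently of any Document
    std::string name;
};

struct Paragraph {
    std::string text;
};

class Document {
public:
    explicit Document(Printer* printer) : printer_(printer) {}

    void add(const std::string& text) { paragraphs_.push_back(Paragraph{text}); }
    std::vector<Paragraph>& paragraphs() { return paragraphs_; }
    Printer* printer() const { return printer_; }

private:
    std::vector<Paragraph> paragraphs_;  // composition: owned, copied element by element
    Printer* printer_;                   // aggregation: not owned, only the pointer is copied
};

// Usage sketch: no user-written copy constructor is needed.
inline void demoDocument() {
    Printer office{"office"};
    Document a(&office);
    a.add("hello");
    Document b = a;                              // compiler-generated copy
    b.paragraphs()[0].text = "changed";
    assert(a.paragraphs()[0].text == "hello");   // owned data was duplicated
    assert(b.printer() == a.printer());          // non-owned link was shared
}
```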
Now, my question is threefold:
- 1) Does an object ever need more than one copy variant?
- 2) Does a copy function need to make clear which copy variant it implements?
- 3) As an aside, is there a better term for what I call "deep enough copy"? I asked a related question about the definition of the term "deep copy".
The distinction between "deep copy" and "shallow copy" makes sense as an implementation detail, but allowing it to leak beyond that generally indicates a flawed abstraction, which will likely manifest itself in other ways as well.
If an object Foo holds an object reference purely for the purpose of encapsulating immutable aspects, other than identity, of the object contained therein, then a correct copy of Foo may either contain a duplicate of the reference or a reference to a duplicate of the encapsulated object.
If an object Foo holds an object reference purely for the purpose of encapsulating mutable and immutable aspects of an object other than identity, but no reference to that object will ever be exposed to anything that would mutate it, the same situation applies.
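To make the two immutable cases concrete, here is a minimal C++11 sketch, assuming a hypothetical `Label` class that refers to text nothing can mutate. Sharing the reference and duplicating the text are both correct copies; `sharesStorageWith` is a hypothetical helper that exists only so the sketch can show the difference, which ordinary callers could not observe:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type: refers to immutable text and never exposes it for mutation.
class Label {
public:
    explicit Label(std::string text)
        : text_(std::make_shared<std::string>(std::move(text))) {}

    // The implicitly generated copy constructor shares the string
    // (a duplicate of the reference). duplicate() instead returns a Label
    // holding a reference to a duplicated string. Both are correct copies.
    Label duplicate() const { return Label(*text_); }

    const std::string& text() const { return *text_; }

    // Only here to make the otherwise invisible difference observable.
    bool sharesStorageWith(const Label& other) const { return text_ == other.text_; }

private:
    std::shared_ptr<const std::string> text_;
};

inline void demoLabel() {
    Label a("hello");
    Label shared = a;               // duplicate of the reference
    Label copied = a.duplicate();   // reference to a duplicate
    assert(shared.text() == copied.text());
    assert(shared.sharesStorageWith(a));
    assert(!copied.sharesStorageWith(a));
}
```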
If an object Foo holds an object reference purely for the purpose of encapsulating mutable and immutable aspects of an object other than identity, and the object in question is going to be mutated, then a correct copy of Foo must contain a reference to a duplicate of the encapsulated object.
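The mutable case can be sketched in C++11 as follows, assuming two hypothetical types that both hold a reference to a vector which gets mutated. The first keeps the default copy and therefore aliases; the second duplicates the encapsulated vector on copy. (In real C++ code a plain `std::vector<int>` member would do this automatically; the pointer is kept only to mirror the "object reference" wording.)

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type. Incorrect for this purpose: the default copy duplicates
// only the reference, so the "copy" and the original mutate the same vector.
struct AliasingHistory {
    std::shared_ptr<std::vector<int>> values = std::make_shared<std::vector<int>>();
    void record(int v) { values->push_back(v); }
};

// Hypothetical type. Correct: copying creates a reference to a duplicate.
class History {
public:
    History() : values_(std::make_shared<std::vector<int>>()) {}
    History(const History& other)
        : values_(std::make_shared<std::vector<int>>(*other.values_)) {}
    History& operator=(const History& other) {
        if (this != &other) {
            values_ = std::make_shared<std::vector<int>>(*other.values_);
        }
        return *this;
    }

    void record(int v) { values_->push_back(v); }
    std::size_t size() const { return values_->size(); }

private:
    std::shared_ptr<std::vector<int>> values_;
};

inline void demoHistory() {
    AliasingHistory a1;
    AliasingHistory a2 = a1;
    a2.record(1);
    assert(a1.values->size() == 1);   // the mutation leaked into the original

    History h1;
    History h2 = h1;
    h2.record(1);
    assert(h1.size() == 0);           // the original is unaffected
    assert(h2.size() == 1);
}
```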
If an object Foo holds an object reference purely for the purpose of encapsulating immutable aspects of the object including identity, then a correct copy of Foo must contain a duplicate of the reference; it must NOT contain a reference to a duplicated object.
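A small illustration of the identity case, assuming hypothetical `Order` and `Customer` types where an order refers to one particular customer. Copying the pointer keeps the identity; pointing at a duplicated customer yields an object with equal contents but a different identity, which is the wrong answer here:

```cpp
#include <cassert>
#include <string>

// Hypothetical types: an Order refers to one particular Customer.
struct Customer {
    std::string name;
};

struct Order {
    const Customer* customer;  // identity matters: this customer, not a lookalike
    int quantity;
};

// Hypothetical identity check: compares addresses, not names.
inline bool belongsTo(const Order& order, const Customer& who) {
    return order.customer == &who;
}

inline void demoOrder() {
    Customer alice{"Alice"};
    Order original{&alice, 3};

    Order copy = original;              // duplicate of the reference: correct
    assert(belongsTo(copy, alice));

    Customer lookalike = alice;         // a duplicate of the referenced object
    Order wrong{&lookalike, 3};         // reference to a duplicate: wrong here
    assert(lookalike.name == alice.name);
    assert(!belongsTo(wrong, alice));   // equal contents, different identity
}
```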
If an object Foo holds an object reference for the purpose of encapsulating both mutable state and object identity, then it is not possible to produce a correct copy of Foo in isolation. A correct copy of Foo may only be produced by duplicating the entire set of objects to which it is attached.
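One common way to duplicate such a set of objects is to copy the whole structure and remap every internal link through a table from old objects to new ones. A minimal sketch, assuming C++14 (for `std::make_unique`) and hypothetical `Graph` and `Node` types where nodes carry mutable state and point at each other by identity:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical type: mutable state plus links that depend on identity.
struct Node {
    int value = 0;
    std::vector<Node*> neighbours;  // non-owning links into the same graph
};

// Hypothetical type: the graph owns all of its nodes.
class Graph {
public:
    Graph() = default;

    // A single Node cannot be copied correctly on its own, so the whole
    // graph is duplicated and every link is remapped to the new node.
    Graph(const Graph& other) {
        std::unordered_map<const Node*, Node*> remap;
        for (const auto& n : other.nodes_) {
            nodes_.push_back(std::make_unique<Node>());
            nodes_.back()->value = n->value;
            remap[n.get()] = nodes_.back().get();
        }
        for (std::size_t i = 0; i < other.nodes_.size(); ++i) {
            for (const Node* target : other.nodes_[i]->neighbours) {
                nodes_[i]->neighbours.push_back(remap.at(target));
            }
        }
    }

    Graph& operator=(Graph other) {  // copy-and-swap
        nodes_.swap(other.nodes_);
        return *this;
    }

    Node* add(int value) {
        nodes_.push_back(std::make_unique<Node>());
        nodes_.back()->value = value;
        return nodes_.back().get();
    }
    Node* node(std::size_t i) const { return nodes_[i].get(); }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

inline void demoGraph() {
    Graph g;
    Node* a = g.add(1);
    Node* b = g.add(2);
    a->neighbours.push_back(b);
    b->neighbours.push_back(a);

    Graph copy = g;
    Node* ca = copy.node(0);
    Node* cb = copy.node(1);
    assert(ca != a && cb != b);       // new objects
    assert(ca->neighbours[0] == cb);  // links point into the copy
    assert(cb->neighbours[0] == ca);
    cb->value = 42;
    assert(b->value == 2);            // original state untouched
}
```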
The only time it makes sense to talk about a "shallow copy" is when an incomplete operation is used as one of the steps in making a correct copy. Otherwise, there is only one correct copy "depth", controlled by the type of state encapsulated in object references.
An object only needs to copy what it needs to copy. Though this question is tagged language-agnostic, and you mentioned C++, I prefer to explain in C# terms (since that's what I'm most familiar with). However, the concepts are similar.
Value types (structs, in C#) live directly inside the containing object instance. Therefore, when you copy the object, you have no choice but to copy its value-type members as well, so you generally don't have to worry about those.
Reference types are like pointers, and this is where it gets tricky. Depending on what the reference type is, you may or may not want a deep copy. A general rule of thumb is that if a reference-type member depends on the state of the outer object, it should be cloned. If it does not, and never will, it doesn't need to be.
Another way of thinking about it: an object passed into your object from the outside probably should NOT be cloned; an object created BY your class should be.
Okay, I lied, I will use some C++ since it will best explain what I mean.
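class MyClass {
    int foo;      // a plain number: copied as-is
    char * bar;   // buffer allocated and owned by this class
    char * baz;   // string supplied by the caller: not owned
public:
    MyClass(int f, char * str) {
        this->foo = f;
        bar = new char[f];   // this object owns bar
        this->baz = str;     // this object only refers to baz
    }
};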
With this object, there are two string buffers that need to be dealt with. The first one, bar, is created and managed by the class itself. When you clone the object, you should allocate a new buffer.
baz, on the other hand, should not be cloned. In fact, you can't clone it, since you don't have enough information to do so. The pointer should just be copied.
And, of course, foo is just a number. Just copy it; there's nothing else to worry about :)
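Applied to MyClass, that policy could look like the sketch below. It assumes C++11 and adds the rule-of-three members (copy constructor, copy assignment, destructor) that the snippet omits; the accessors `ownedBuffer` and `borrowedString` are hypothetical, added only so the behavior can be checked. The buffer is value-initialized here so that copying it never reads indeterminate bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

class MyClass {
    int foo;     // plain value
    char* bar;   // owned buffer of foo chars
    char* baz;   // not owned: supplied by the caller

public:
    MyClass(int f, char* str) : foo(f), bar(new char[f]()), baz(str) {}

    // Copy: allocate a new buffer for bar, copy the baz pointer as-is.
    MyClass(const MyClass& other)
        : foo(other.foo), bar(new char[other.foo]), baz(other.baz) {
        std::copy(other.bar, other.bar + other.foo, bar);
    }

    MyClass& operator=(MyClass other) {  // copy-and-swap
        std::swap(foo, other.foo);
        std::swap(bar, other.bar);
        std::swap(baz, other.baz);
        return *this;
    }

    ~MyClass() { delete[] bar; }  // release only what this object owns

    char* ownedBuffer() const { return bar; }
    char* borrowedString() const { return baz; }
};

inline void demoMyClass() {
    char name[] = "shared";
    MyClass a(4, name);
    a.ownedBuffer()[0] = 'x';
    MyClass b = a;
    assert(b.ownedBuffer() != a.ownedBuffer());        // new buffer...
    assert(b.ownedBuffer()[0] == 'x');                 // ...with the same contents
    assert(b.borrowedString() == a.borrowedString());  // same caller-owned string
}
```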
In summary, to answer your questions directly:
- 99% of the time, no. There's only one way to copy that makes sense. What that way is, however, varies.
- Not directly. Documenting it is a good idea, but anything internal should stay internal.
- Just "Deep copy". You should (Edit: ALMOST) never try to clone an object or pointer you don't control, so that's exempt from the rules :)
Most C++ programmers do not use the terms "shallow copy" and "deep copy", for the very good reason that there is normally only one way to copy an object. This is particularly true in C++ because the compiler uses the copy constructor in many situations where the programmer has no way to tell it which copy constructor to use, for example:
void f( std::string s );
Here, when f is called with a string, there is no way of telling the compiler how to copy it.
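The implicit copy can be made visible with a type that counts its copy-constructor calls. A minimal C++11 sketch; `Tracked` is a hypothetical type standing in for std::string, and f mirrors the declaration above:

```cpp
#include <cassert>

// Hypothetical type that counts how often its copy constructor runs.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

// The parameter is taken by value: the compiler initializes it with the
// copy constructor, and the caller has no syntax to request another kind of copy.
inline void f(Tracked) {}

inline void demoByValue() {
    Tracked t;
    Tracked::copies = 0;
    f(t);
    assert(Tracked::copies == 1);
}
```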
A bit of a late answer, but C++11 has you more or less covered:
The solution, as detailed in this answer to "Which kind of pointer do I use when?", is to use different pointer types to express the kind of (shared) ownership you have.
Since std::unique_ptr is non-copyable, a class holding one gets no implicit copy constructor, so you are forced to write the copy of the data owned by the unique pointer yourself. Stating everything in terms of member ownership will probably always make clear what kind of copy to use on which member.
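A minimal sketch of that idea, assuming C++14 (for `std::make_unique`) and hypothetical `Car`, `Engine` and `Road` types: the `std::unique_ptr` member marks exclusive ownership and is duplicated in the user-written copy constructor, while the `std::shared_ptr` member marks shared ownership and is simply copied.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical types for illustration.
struct Engine { int horsepower = 0; };
struct Road   { std::string name; };

class Car {
public:
    Car(int hp, std::shared_ptr<const Road> road)
        : engine_(std::make_unique<Engine>()), road_(std::move(road)) {
        engine_->horsepower = hp;
    }

    // The unique_ptr member deletes the implicit copy constructor, so the
    // class must state how to copy: here, by duplicating the Engine.
    Car(const Car& other)
        : engine_(std::make_unique<Engine>(*other.engine_)), road_(other.road_) {}

    Car& operator=(const Car& other) {
        engine_ = std::make_unique<Engine>(*other.engine_);
        road_ = other.road_;
        return *this;
    }

    Engine& engine() { return *engine_; }
    const Road* road() const { return road_.get(); }

private:
    std::unique_ptr<Engine> engine_;     // exclusive ownership: duplicated on copy
    std::shared_ptr<const Road> road_;   // shared ownership: the pointer is copied
};

inline void demoCar() {
    std::shared_ptr<const Road> road = std::make_shared<Road>(Road{"Main St"});
    Car a(100, road);
    Car b = a;                              // user-written copy constructor
    b.engine().horsepower = 200;
    assert(a.engine().horsepower == 100);   // the owned Engine was duplicated
    assert(a.road() == b.road());           // the shared Road was not
}
```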