C++ smart pointers: sharing pointers vs. sharing data
In this insightful article, one of the Qt programmers tries to explain the different kinds of smart pointers Qt implements. In the beginning, he makes a distinction between sharing data and sharing the pointers themselves:
First, let’s get one thing straight: there’s a difference between sharing pointers and sharing data. When you share pointers, the value of the pointer and its lifetime is protected by the smart pointer class. In other words, the pointer is the invariant. However, the object that the pointer is pointing to is completely outside its control. We don’t know if the object is copiable or not, if it’s assignable or not.
Now, sharing of data involves the smart pointer class knowing something about the data being shared. In fact, the whole point is that the data is being shared and we don’t care how. The fact that pointers are being used to share the data is irrelevant at this point. For example, you don’t really care how Qt tool classes are implicitly shared, do you? What matters to you is that they are shared (thus reducing memory consumption) and that they work as if they weren’t.
Frankly, I just don't understand this explanation. There was a plea for clarification in the article comments, but I didn't find the author's explanation sufficient.
If you do understand this, please explain. What is this distinction, and how do other shared pointer classes (e.g. from Boost or the new C++ standard) fit into this taxonomy?
Thanks in advance
In a later comment, he clears up the matter a bit:
This is the important point I tried to get through in the first section. When you use QSharedPointer, you’re sharing the ownership of the pointer. The class controls and handles the pointer only — anything else (like access to the data) is outside its scope. When you use QSharedDataPointer, you’re sharing the data. And that class is intended for implicit sharing: so it may split up.
Trying to interpret that:
What's important to see is that "pointer" here does not mean the object storing the address; it means the storage location where the object lives (the address itself). So strictly, I think, you have to say you are sharing the address. boost::shared_ptr is thus a smart pointer sharing the "pointer". boost::intrusive_ptr, or another intrusive smart pointer, seems to share the pointer too, albeit knowing something about the object pointed to (that it has a reference-count member, or functions incrementing/decrementing it).
Example: If someone shares a black box with you and he doesn't know what is in the black box, it is similar to sharing the pointer (which represents the box), but not the data (what is inside the box). In fact, you can't even know that what is inside the box is sharable (what if the box contains nothing at all?). The smart pointers are represented by you and the other guy (and you aren't shared, of course), but the address is the box, and it is shared.
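To make the black-box picture concrete, here is a minimal sketch using std::shared_ptr (standing in for boost::shared_ptr). The Box type and the function name are made up for illustration. It only shows that copying the smart pointer copies the address and nothing else:

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload: the smart pointer needs to know nothing about it.
struct Box {
    int content;
};

// Returns true if a copied shared_ptr holds the same address as the original
// and a write through one copy is visible through the other.
bool copy_shares_address() {
    std::shared_ptr<Box> mine = std::make_shared<Box>(Box{1});
    std::shared_ptr<Box> yours = mine;          // copies the pointer, not the Box

    bool same_address = (mine.get() == yours.get());
    yours->content = 2;                         // no copy is made on write
    bool write_visible = (mine->content == 2);
    bool two_owners = (mine.use_count() == 2);  // single-threaded, so exact here
    return same_address && write_visible && two_owners;
}
```

Both owners keep seeing the same box for as long as either of them lives. What is inside the box never enters into it.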
Sharing the data means the smart pointer knows enough about the data pointed to that it may change the address pointed to (which requires copying the data over, etc.). So the pointers may now point to different addresses. Since the addresses differ, the address isn't shared anymore. This is what std::string does on some implementations, too:
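std::string a("foo"), b(a);
// a and b may point to the same storage by now
// (only on copy-on-write implementations; C++11 no longer permits them).
std::cout << (const void*)a.c_str() << ' ' << (const void*)b.c_str() << '\n';
// but now, since you could modify the data through operator[],
// they will be different
std::cout << (void*)&a[0] << ' ' << (void*)&b[0] << '\n';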
Sharing data does not necessarily mean a pointer is presented to you. You may use a std::string purely by means of a[0] and cout << a; and never touch any of the c_str() functions. Sharing may still go on behind the scenes. The same thing happens with many Qt classes and classes of other widget toolkits too, where it is called implicit sharing (or copy on write). So I think one may sum it up like this:
- Sharing the pointer: We always point to the same address when we copy a smart pointer, implying that we share the pointer value.
- Sharing the data: We may point to different addresses at different times, which implies that we know how to copy data from one address to the other.
So, trying to categorize:
- boost::shared_ptr, boost::intrusive_ptr: Share the pointer, not the data.
- QString, QPen, QSharedDataPointer: Share the data they contain.
- std::unique_ptr, std::auto_ptr (and also QScopedPointer): Share neither the pointer nor the data.
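For the third category, a small sketch with std::unique_ptr may help. The function name is invented for illustration. Ownership can be handed over, but there is never more than one owner, so nothing is shared:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns true if moving a unique_ptr transfers the address to the new owner
// and leaves the old owner empty.
bool unique_ptr_transfers_ownership() {
    std::unique_ptr<int> first(new int(42));
    int* address = first.get();

    // std::unique_ptr<int> second = first;  // would not compile: copying is deleted
    std::unique_ptr<int> second = std::move(first);

    return first == nullptr          // the old owner no longer points anywhere
        && second.get() == address   // same address, but only one owner
        && *second == 42;
}
```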
Say we had this class:
struct BigArray{
    int operator[](size_t i)const{return m_data[i];}
    int& operator[](size_t i){return m_data[i];}
private:
    int m_data[10000000];
};
And now say we had two instances:
BigArray a;
a[0]=1;//initialization etc
BigArray b=a;
At this point we want this invariant
assert(a[0]==b[0]);
The default copy ctor ensures this invariant, but at the expense of deep copying the entire object. We may attempt a speedup like this:
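struct BigArray{
    // shared_ptr<int> would call delete, not delete[], so pass an array deleter
    BigArray():m_data(new int[10000000], std::default_delete<int[]>()){}
    int operator[](size_t i)const{return m_data.get()[i];}
    int& operator[](size_t i){return m_data.get()[i];}
private:
    std::shared_ptr<int> m_data;
};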
This also meets the invariant without making the deep copy, so all is good so far. Now suppose that, with this new implementation, we do:
b[0]=2;
We want this to behave the same as in the deep copy case, i.e. assert(a[0]!=b[0]); should hold. But it fails, because a and b still share the same storage. To solve this we need a slight change:
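struct BigArray{
    BigArray():m_data(new int[10000000], std::default_delete<int[]>()){}
    int operator[](size_t i)const{return m_data.get()[i];}
    int& operator[](size_t i){
        if(m_data.use_count()>1){//"detach": storage is shared, so copy it first
            std::shared_ptr<int> tmp(new int[10000000], std::default_delete<int[]>());
            // memcpy counts bytes, not elements
            memcpy(tmp.get(),m_data.get(),10000000*sizeof(int));
            m_data=tmp;
        }
        return m_data.get()[i];
    }
private:
    std::shared_ptr<int> m_data;
};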
Now we have a class that is shallow copied when only const access is needed, and deep copied when non-const access is needed. This is the idea behind the "shared_data" pointer concept: const calls never deep copy, while non-const calls deep copy (Qt calls this "detach") if the data is shared. It also adds some semantics on top of operator==, so that it compares not just the pointer but the data as well, so that this would work:
BigArray b=a;//shallow copy
assert(a==b);//true
b[0]=a[0]+1;//deep copy
b[0]=a[0];//put it back
assert(a==b);//true
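For anyone who wants to run that sequence, here is a self-contained sketch. It is not how Qt implements QSharedDataPointer. CowArray, get, set and shares_storage_with are names I made up, the storage is a std::vector to keep it short, and the operator== that compares contents is my own assumption about what such a class would provide:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write array; not thread-safe.
class CowArray {
public:
    explicit CowArray(std::size_t n)
        : m_data(std::make_shared<std::vector<int>>(n)) {}

    // Read access never detaches.
    int get(std::size_t i) const { return (*m_data)[i]; }

    // Write access detaches first if the storage is shared.
    void set(std::size_t i, int value) {
        detach();
        (*m_data)[i] = value;
    }

    bool shares_storage_with(const CowArray& other) const {
        return m_data == other.m_data;
    }

    // Equal if the storage is the same, or if the contents compare equal.
    friend bool operator==(const CowArray& a, const CowArray& b) {
        return a.m_data == b.m_data || *a.m_data == *b.m_data;
    }

private:
    void detach() {
        if (m_data.use_count() > 1)  // someone else also refers to this storage
            m_data = std::make_shared<std::vector<int>>(*m_data);
    }
    std::shared_ptr<std::vector<int>> m_data;
};

// Replays the sequence from the text; returns true if every step holds.
bool cow_demo() {
    CowArray a(1000);
    a.set(0, 1);
    CowArray b = a;                                // shallow copy
    bool ok = a.shares_storage_with(b) && a == b;
    b.set(0, a.get(0) + 1);                        // deep copy happens here
    ok = ok && !a.shares_storage_with(b) && !(a == b);
    b.set(0, a.get(0));                            // put it back
    ok = ok && a == b && !a.shares_storage_with(b);
    return ok;
}
```

I used explicit get/set instead of two operator[] overloads on purpose: on a non-const object, a[0] picks the non-const overload even for a plain read, and that would detach unnecessarily.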
This technique is called COW (Copy on Write) and has been around since the dawn of C++. It is also extremely brittle: the above example seems to work because it is small and has few use cases. In practice it is rarely worth the trouble, and in fact C++0x (C++11) effectively rules out COW implementations of std::string. So use with caution.
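One commonly cited way this breaks, sketched with a made-up NaiveCow class that follows the same operator[] design as above (a small std::vector replaces the big array): a reference handed out before the copy keeps pointing into storage that later becomes shared.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical COW class with the same operator[] design as BigArray.
struct NaiveCow {
    NaiveCow() : m_data(std::make_shared<std::vector<int>>(4)) {}
    int operator[](std::size_t i) const { return (*m_data)[i]; }
    int& operator[](std::size_t i) {
        if (m_data.use_count() > 1)  // detach if shared
            m_data = std::make_shared<std::vector<int>>(*m_data);
        return (*m_data)[i];
    }
private:
    std::shared_ptr<std::vector<int>> m_data;
};

// Returns true if a write through an old reference to a also changes b.
bool escaped_reference_leaks_into_copy() {
    NaiveCow a;
    int& ref = a[0];         // a is unshared here, so no detach; ref points into a's storage
    NaiveCow b = a;          // shallow copy: b now shares that storage
    ref = 7;                 // goes through the old reference, bypassing the detach check
    const NaiveCow& cb = b;  // const view, so reading does not detach
    return cb[0] == 7;       // b changed although only a was written to
}
```

The class has no way of knowing that ref is still alive when the copy is made, so value semantics quietly break.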
In the first case, you add a level of indirection to the pointer, such that the object that is represented by the smart pointer wraps the original pointer. There is only one pointer to the object and it's the wrapper's job to keep track of the references to the original pointer. A very simplistic bit of code might look like this:
template<typename T>
struct smart_ptr {
    T *ptr_to_object;
    int *ptr_to_ref_count;
};
When you copy the struct, your copy/assign code will have to make sure that the reference count is incremented (or decremented if the object gets destroyed) but the pointer to the actual wrapped object will never change and can just be shallow copied. As the struct is pretty small, it's easy and cheap to copy and "all" you have to do is to manipulate the reference count.
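Filling in the copy/assign/destroy code, a rough sketch could look like the following. counted_ptr, get and use_count are names invented for this example. It is not thread-safe and not exception-safe, and real implementations such as boost::shared_ptr do considerably more:

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal reference-counted pointer, for illustration only.
template <typename T>
class counted_ptr {
public:
    explicit counted_ptr(T* p)
        : ptr_to_object(p), ptr_to_ref_count(new int(1)) {}

    // Copying shallow-copies both pointers and bumps the shared count.
    counted_ptr(const counted_ptr& other)
        : ptr_to_object(other.ptr_to_object),
          ptr_to_ref_count(other.ptr_to_ref_count) {
        ++*ptr_to_ref_count;
    }

    // Copy-and-swap: the by-value parameter releases our old reference
    // when it is destroyed at the end of the call.
    counted_ptr& operator=(counted_ptr other) {
        std::swap(ptr_to_object, other.ptr_to_object);
        std::swap(ptr_to_ref_count, other.ptr_to_ref_count);
        return *this;
    }

    // The last owner deletes both the object and the counter.
    ~counted_ptr() {
        if (--*ptr_to_ref_count == 0) {
            delete ptr_to_object;
            delete ptr_to_ref_count;
        }
    }

    T* get() const { return ptr_to_object; }
    int use_count() const { return *ptr_to_ref_count; }

private:
    T* ptr_to_object;
    int* ptr_to_ref_count;
};

// Returns true if the count follows the copies and the address never changes.
bool counted_ptr_demo() {
    counted_ptr<int> a(new int(5));
    bool ok = (a.use_count() == 1);
    {
        counted_ptr<int> b = a;  // shallow copy, count goes to 2
        ok = ok && a.get() == b.get() && a.use_count() == 2;
    }                            // b destroyed, count back to 1
    return ok && a.use_count() == 1 && *a.get() == 5;
}
```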
In the second case, it reads to me more like an object repository. The 'implicitly shared' part suggests that you might ask the framework for a FooWidget by doing something like BarFoo.getFooWidget() and, even though it looks like the pointer (smart or not) that you get back is a pointer to a new object, you're actually being handed a pointer to an existing object that's held in some sort of object cache. In that sense it might be more akin to a Singleton-like object that you get by invoking a factory method.
At least that's what the distinction sounds like to me, but I might be so far off the mark that I'd need Google Maps to find my way back.