C++ pimpl idiom wastes an instruction vs. C style?
(Yes, I know that one machine instruction usually doesn't matter. I'm asking this question because I want to understand the pimpl idiom, and use it in the best possible way; and because sometimes I do care about one machine instruction.)
In the sample code below, there are two classes, Thing and OtherThing. Users would include "thing.hh". Thing uses the pimpl idiom to hide its implementation. OtherThing uses a C style: non-member functions that return and take pointers. This style produces slightly better machine code. I'm wondering: is there a way to use C++ style (i.e., make the functions into member functions) and yet still save the machine instruction? I like this style because it doesn't pollute the namespace outside the class.
Note: I'm only looking at calling member functions (in this case, calc). I'm not looking at object allocation.
Below are the files, commands, and the machine code, on my Mac.
thing.hh:
class ThingImpl;
class Thing
{
ThingImpl *impl;
public:
Thing();
int calc();
};
class OtherThing;
OtherThing *make_other();
int calc(OtherThing *);
thing.cc:
#include "thing.hh"
struct ThingImpl
{
int x;
};
Thing::Thing()
{
impl = new ThingImpl;
impl->x = 5;
}
int Thing::calc()
{
return impl->x + 1;
}
struct OtherThing
{
int x;
};
OtherThing *make_other()
{
OtherThing *t = new OtherThing;
t->x = 5;
return t;
}
int calc(OtherThing *t)
{
return t->x + 1;
}
main.cc (just to test that the code actually works):
#include "thing.hh"
#include <cstdio>
int main()
{
Thing *t = new Thing;
printf("calc: %d\n", t->calc());
OtherThing *t2 = make_other();
printf("calc: %d\n", calc(t2));
}
Makefile:
all: main
thing.o : thing.cc thing.hh
	g++ -fomit-frame-pointer -O2 -c thing.cc
main.o : main.cc thing.hh
	g++ -fomit-frame-pointer -O2 -c main.cc
main: main.o thing.o
	g++ -O2 -o $@ $^
clean:
	rm *.o
	rm main
Run make and then look at the machine code. On the Mac I use otool -tv thing.o | c++filt. On Linux the equivalent is objdump -d thing.o (add -C, or pipe through c++filt, to demangle the names). Here is the relevant output:
Thing::calc():
0000000000000000 movq (%rdi),%rax
0000000000000003 movl (%rax),%eax
0000000000000005 incl %eax
0000000000000007 ret
calc(OtherThing*):
0000000000000010 movl (%rdi),%eax
0000000000000012 incl %eax
0000000000000014 ret
Notice the extra instruction because of the pointer indirection. The first function looks up two fields (impl, then x), while the second only needs to get x. What can be done?
One instruction is rarely worth spending much time worrying over. Firstly, in a more complex use case the compiler may keep the pImpl pointer in a register across several accesses, amortising the cost in a real-world scenario. Secondly, pipelined architectures make it almost impossible to predict the real cost in clock cycles. You'll get a much more realistic idea of the cost if you run these operations in a loop and time the difference.
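To follow that advice, here is a minimal timing sketch using std::chrono::steady_clock. The helper names run_loop and compare are hypothetical, and both call styles are copied into one file so the block is self-contained. That also means the compiler can see every definition and may inline or even fold the loops, so for a real measurement keep the definitions in the separately compiled thing.cc from the question (and build without LTO).

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Copies of the question's two styles (assumption: same file, for brevity).
struct ThingImpl { int x; };

class Thing {
    ThingImpl* impl;
public:
    Thing() : impl(new ThingImpl{5}) {}
    ~Thing() { delete impl; }
    Thing(const Thing&) = delete;
    Thing& operator=(const Thing&) = delete;
    int calc() { return impl->x + 1; }
};

struct OtherThing { int x; };
int calc(OtherThing* t) { return t->x + 1; }

// Calls f() n times, stores the elapsed wall time in elapsed_ns and returns
// the accumulated results so they are observable. In this single-file form
// the optimizer may still collapse the loop; see the note above.
template <class F>
long long run_loop(F f, long long n, double& elapsed_ns) {
    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (long long i = 0; i < n; ++i) sum += f();
    auto stop = std::chrono::steady_clock::now();
    elapsed_ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return sum;
}

// Times both call styles over n > 0 iterations and prints ns per call.
void compare(long long n) {
    Thing t;
    OtherThing o{5};
    double pimpl_ns = 0, c_ns = 0;
    long long a = run_loop([&] { return t.calc(); }, n, pimpl_ns);
    long long b = run_loop([&] { return calc(&o); }, n, c_ns);
    assert(a == b);  // both styles must compute the same values
    std::printf("pimpl: %.3f ns/call, C style: %.3f ns/call\n",
                pimpl_ns / n, c_ns / n);
}
```

Run compare with a large n (say 100000000) several times and compare the averages; a single run is noisy at this scale.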
Not too hard: just use the same technique inside your class. Any halfway decent optimizer will inline the trivial wrapper.
class ThingImpl;
class Thing
{
ThingImpl *impl;
static int calc(ThingImpl*);
public:
Thing();
int calc() { return calc(impl); }
};
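A complete, compilable version of that wrapper technique, with the header and thing.cc merged into one file for brevity (the destructor and deleted copy operations are additions, not part of the answer). The out-of-line Thing::calc(ThingImpl*) has the same body as the C-style calc(OtherThing*); the load of impl does not disappear but moves into the inline wrapper at the call site, where the optimizer can keep it in a register across repeated calls.

```cpp
// ---- thing.hh ----
struct ThingImpl;                 // opaque: defined only in thing.cc

class Thing {
    ThingImpl* impl;
    static int calc(ThingImpl*);  // out-of-line worker that takes the impl directly
public:
    Thing();
    ~Thing();
    Thing(const Thing&) = delete;
    Thing& operator=(const Thing&) = delete;
    // Inline forwarder: overload resolution picks the static calc(ThingImpl*).
    int calc() { return calc(impl); }
};

// ---- thing.cc ----
struct ThingImpl {
    int x;
};

Thing::Thing() : impl(new ThingImpl{5}) {}
Thing::~Thing() { delete impl; }  // ThingImpl is complete here, so delete is safe

// Same shape as the C-style calc(OtherThing*): one load, one add.
int Thing::calc(ThingImpl* p) { return p->x + 1; }
```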
There's the nasty way, which is to replace the pointer to ThingImpl with a big-enough array of unsigned chars, then placement-new the ThingImpl into it, reinterpret_cast to reach it, and explicitly call its destructor.
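A sketch of that approach, often called the "fast pimpl" (the name Herb Sutter uses in GotW #28). Assumptions: the 16-byte buffer size is a guess that thing.cc checks with static_assert, copying is simply disabled, and std::launder requires C++17. Because ThingImpl now lives inside Thing itself, calc() should need only one load from this, like the C version; the cost is that the header hard-codes a size that must be bumped whenever ThingImpl outgrows it.

```cpp
#include <cstddef>  // std::max_align_t
#include <new>      // placement new, std::launder (C++17)

// ---- thing.hh ----
struct ThingImpl;  // still opaque to users

class Thing {
    // Raw bytes that hold the ThingImpl. The size 16 is a guess that
    // thing.cc verifies at compile time.
    alignas(std::max_align_t) unsigned char storage[16];
    ThingImpl* impl();
public:
    Thing();
    ~Thing();
    Thing(const Thing&) = delete;             // copying would need its own
    Thing& operator=(const Thing&) = delete;  // placement new; left out here
    int calc();
};

// ---- thing.cc ----
struct ThingImpl {
    int x;
};

Thing::Thing() {
    static_assert(sizeof(ThingImpl) <= sizeof(storage), "enlarge Thing::storage");
    static_assert(alignof(ThingImpl) <= alignof(std::max_align_t),
                  "raise Thing::storage alignment");
    ::new (static_cast<void*>(storage)) ThingImpl{5};  // construct in place
}

Thing::~Thing() { impl()->~ThingImpl(); }  // explicit destructor call

ThingImpl* Thing::impl() {
    return std::launder(reinterpret_cast<ThingImpl*>(storage));
}

// x sits inside *this, so no separate pointer has to be loaded first.
int Thing::calc() { return impl()->x + 1; }
```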
Or you could just pass the Thing around by value, since it should be no larger than the pointer to the ThingImpl, though it may require a little more than that (reference counting of the ThingImpl would defeat the optimisation, so you need some way of flagging the 'owning' Thing, which might require extra space on some architectures).
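A sketch of the pass-by-value variant in C++11 terms, where std::unique_ptr's move semantics play the role of the owning flag: a moved-from Thing holds a null pointer and must not be used. calc_by_value is a hypothetical helper. The static_assert holds with libstdc++ and libc++ but is not guaranteed by the standard. Note also that on the Itanium C++ ABI (GCC and Clang on macOS and Linux), a class with a non-trivial destructor or move constructor is passed by invisible reference rather than in a register, so whether passing Thing by value saves anything depends on the ABI.

```cpp
#include <memory>

// ---- thing.hh ----
struct ThingImpl;

class Thing {
    // Sole owner of the ThingImpl: moving a Thing transfers ownership and
    // leaves the source null (a moved-from Thing must not call calc()).
    std::unique_ptr<ThingImpl> impl;
public:
    Thing();
    ~Thing();  // defined in thing.cc, where ThingImpl is complete
    Thing(Thing&&) noexcept;
    Thing& operator=(Thing&&) noexcept;
    int calc() const;
};

// Takes Thing by value: the argument is moved (or constructed directly) into t.
int calc_by_value(Thing t);

// ---- thing.cc ----
struct ThingImpl {
    int x;
};

Thing::Thing() : impl(new ThingImpl{5}) {}
Thing::~Thing() = default;
Thing::Thing(Thing&&) noexcept = default;
Thing& Thing::operator=(Thing&&) noexcept = default;
int Thing::calc() const { return impl->x + 1; }

int calc_by_value(Thing t) { return t.calc(); }

// Holds with libstdc++ and libc++ (empty default deleter); not guaranteed.
static_assert(sizeof(Thing) == sizeof(ThingImpl*), "Thing should be one pointer wide");
```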
I disagree with your comparison: you are not comparing the same two things.
#include "thing.hh"
#include <cstdio>
int main()
{
Thing *t = new Thing; // 1
printf("calc: %d\n", t->calc());
OtherThing *t2 = make_other(); // 2
printf("calc: %d\n", calc(t2));
}
- At 1 there are in fact two calls to new: one explicit, and one implicit (done by the constructor of Thing).
- At 2 there is only one new, implicit (inside make_other).
You should allocate Thing on the stack. That probably would not change the double-dereferencing instruction, but it could change its cost (by removing a cache miss).
However, the main point is that Thing manages its memory on its own (once it has a destructor that deletes impl, which the code in the question omits), so you can't forget to delete the actual memory, while you definitely can with the C-style method.
I would argue that automatic memory handling is worth an extra memory instruction, especially because, as has been said, the dereferenced value will probably be cached if you access it more than once, so the cost amounts to almost nothing.
Correctness is more important than performance.
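A self-contained sketch of both points: Thing lives on the stack and, given the destructor the question's version lacks, frees its ThingImpl automatically, while the C-style object must be released by hand. destroy_other and demo are hypothetical names; thing.hh in the question declares no destroy function at all.

```cpp
#include <cstdio>

// Pimpl style, with the destructor the question's Thing is missing.
struct ThingImpl { int x; };

class Thing {
    ThingImpl* impl;
public:
    Thing() : impl(new ThingImpl{5}) {}
    ~Thing() { delete impl; }  // cleanup happens automatically
    Thing(const Thing&) = delete;
    Thing& operator=(const Thing&) = delete;
    int calc() { return impl->x + 1; }
};

// C style: the caller must remember to call the (hypothetical) destroy_other.
struct OtherThing { int x; };
OtherThing* make_other() { return new OtherThing{5}; }
int calc(OtherThing* t) { return t->x + 1; }
void destroy_other(OtherThing* t) { delete t; }

int demo() {
    Thing t;  // on the stack: the only new left is inside the constructor
    std::printf("calc: %d\n", t.calc());

    OtherThing* t2 = make_other();
    std::printf("calc: %d\n", calc(t2));
    int total = t.calc() + calc(t2);
    destroy_other(t2);  // forget this line and t2 leaks
    return total;
}  // t's destructor frees its ThingImpl here
```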
Let the compiler worry about it. It knows far more than we do about what is actually faster or slower, especially on such a minute scale.
Having items in classes has far, far more benefits than just encapsulation. PIMPL's a great idea, if you've forgotten how to use the private keyword.
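For comparison, here is the plain class this answer alludes to, where private provides the encapsulation and there is no indirection at all: calc() reads x straight from *this (and, defined in the header, can inline completely). The trade-off is that x is now visible in the header, so changing Thing's private members forces every user to recompile, which is what pimpl avoids.

```cpp
// thing.hh without pimpl: the data is an ordinary private member.
class Thing {
    int x = 5;  // hidden from users by access control, not by indirection
public:
    int calc() const { return x + 1; }
};
```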