Why is my C++ output executable so big?
I have a rather simple C++ project which uses the boost::regex library. The output I'm getting is 3.5 MB in size. As I understand it, I'm statically linking all of the Boost .cpp files, including all functions/methods. Is it possible to instruct my linker to use only the necessary elements from Boost, not all of them? Thanks.
$ c++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)
This is what size says:
$ size a.out
__TEXT __DATA __OBJC others dec hex
1556480 69632 0 4296504912 4298131024 100304650
I tried strip:
$ ls -al
... 3946688 May 21 13:20 a.out
$ strip a.out
$ ls -al
... 3847248 May 21 13:20 a.out
P.S. This is how my code is organized (maybe this is the main cause of the problem):
// file MyClass.h
class MyClass {
public:
    void f();  // must be public so main() can call it
};
#include "MyClassImpl.h"
// file MyClassImpl.h
void MyClass::f() {
    // implementation...
}
// file main.cpp
#include "MyClass.h"
int main(int ac, char** av) {
    MyClass c;
    c.f();
}
What do you think?
Did you compile with debugging symbols enabled? That could account for a large portion of the size. Also, how are you determining the size of the binary? Assuming you're on a UNIX-like platform, are you using a plain "ls -l" or the "size" command? The two can give very different results if the binary contains debugging symbols. For example, here are the results I get when building the Boost.Regex "credit_card_example.cpp" example.
$ g++ -g -O3 foo.cpp -lboost_regex-mt
$ ls -l a.out
-rwxr-xr-x 1 void void 483801 2010-05-20 10:36 a.out
$ size a.out
text data bss dec hex filename
73330 492 336 74158 121ae a.out
Similar results occur when just generating the object file:
$ g++ -c -g -O3 foo.cpp
$ ls -l foo.o
-rw-r--r-- 1 void void 622476 2010-05-20 10:40 foo.o
$ size foo.o
text data bss dec hex filename
49119 4 40 49163 c00b foo.o
EDIT: Added some static linking results ...
Here's the binary size when statically linking. It's closer to what you're getting:
$ g++ -static -g -O3 foo.cpp -lboost_regex-mt -lpthread
$ ls -l a.out
-rwxr-xr-x 1 void void 2019905 2010-05-20 11:16 a.out
$ size a.out
text data bss dec hex filename
1204517 5184 41976 1251677 13195d a.out
It's also possible that much of the large size is coming from other libraries the Boost.Regex library depends on. On my Ubuntu box, the dependencies for the Boost.Regex shared library are the following:
$ ldd /usr/lib/libboost_regex-mt.so.1.38.0
linux-gate.so.1 => (0x0053f000)
libicudata.so.40 => /usr/lib/libicudata.so.40 (0xb6a38000)
libicui18n.so.40 => /usr/lib/libicui18n.so.40 (0x009e0000)
libicuuc.so.40 => /usr/lib/libicuuc.so.40 (0x00672000)
librt.so.1 => /lib/tls/i686/cmov/librt.so.1 (0x001e2000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x001eb000)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0x00110000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x009be000)
libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0x00153000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x002dd000)
/lib/ld-linux.so.2 (0x00e56000)
The ICU libraries can get quite large. Besides debugging symbols, perhaps they are the primary contributors to the size of your binary. Furthermore, in the statically linked case, it looks like the Boost.Regex library itself is made up of large object files:
$ size --totals /usr/lib/libboost_regex-mt.a | sort -n
0 0 0 0 0 regex_debug.o (ex /usr/lib/libboost_regex-mt.a)
0 0 0 0 0 usinstances.o (ex /usr/lib/libboost_regex-mt.a)
0 0 0 0 0 w32_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
text data bss dec hex filename
435 0 0 435 1b3 regex_raw_buffer.o (ex /usr/lib/libboost_regex-mt.a)
480 0 0 480 1e0 static_mutex.o (ex /usr/lib/libboost_regex-mt.a)
1543 0 36 1579 62b cpp_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
3171 632 0 3803 edb regex_traits_defaults.o (ex /usr/lib/libboost_regex-mt.a)
5339 8 13 5360 14f0 c_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
5650 8 16 5674 162a wc_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
9075 4 32 9111 2397 regex.o (ex /usr/lib/libboost_regex-mt.a)
17052 8 4 17064 42a8 fileiter.o (ex /usr/lib/libboost_regex-mt.a)
61265 0 0 61265 ef51 wide_posix_api.o (ex /usr/lib/libboost_regex-mt.a)
61787 0 0 61787 f15b posix_api.o (ex /usr/lib/libboost_regex-mt.a)
80811 8 0 80819 13bb3 icu.o (ex /usr/lib/libboost_regex-mt.a)
116489 8 112 116609 1c781 instances.o (ex /usr/lib/libboost_regex-mt.a)
117874 8 112 117994 1ccea winstances.o (ex /usr/lib/libboost_regex-mt.a)
131104 0 0 131104 20020 cregex.o (ex /usr/lib/libboost_regex-mt.a)
612075 684 325 613084 95adc (TOTALS)
You could get up to ~600 KB from Boost.Regex alone if some or all of those object files get linked into your binary.
The -O3 flag does not optimize your code for size, but for execution speed, so optimizations such as loop unrolling may make the file bigger. Try compiling with a different optimization flag: -Os optimizes for a small executable.
If you are statically linking, most linkers will only include the object files that are actually needed.
3.5 MB is not that big on a PC, and the size can also depend on the OS, etc.
If your link order is set correctly (most dependent followed by least dependent), the linker should only pull in what your program actually uses. Additionally, a lot of Boost functionality (but not all, and I can't speak for regex) is header-only due to template use.
More likely is that debugging information/symbol table/etc is taking up space in your binary. Template names (for example iostream and standard containers) are very long and create large entries in the symbol table.
You don't say what OS you're using, but if it's a Unix variant, as a test you can strip a copy of your binary to remove all the extra info and see what's left:
cp a.out a.out.test
strip a.out.test
ls -l a.out*
On one binary I tested, it removed about 90% of the file size. Note that if you do this, core dumps will be pretty useless without a copy of the unstripped binary to debug against: you won't have any symbol names or anything, just assembly and addresses. 3.5 MB is really a tiny file these days. Most likely there is simply that much debugging/symbol information, even from only 10K SLOC of source.
You say you have 3 files. To me, MyClassImpl.h should probably be a .cpp file, since it contains the implementation.
Anyway, if you actually compile two files that include boost::regex, you can end up with twice the size of boost::regex (more precisely, if you use the same functionality in both files, you pay the space cost twice).
This is because most Boost functionality consists of inlined templates.
best,
If you have ldd available, you can use it to check whether you are really linking with all the Boost libraries.
Another possibility is that the size is a side effect of using header-only libraries. Many Boost libraries are of that kind, and including them can inline more code than you might believe. You can also get a kind of combinatorial explosion from instantiating templates with several different template parameters.
To get a better diagnosis, you should try creating a really short program that uses regex and see what size you get. If your program is really short, 3.5 MB is quite large. My current project's executable also uses Boost (but not regex) and is about the same size, but that's around 20,000 lines of C++. Hence there must be a catch somewhere.