Are there any tools for tracking down bloat in C++?
A carelessly written template here, some excessive inlining there - it's all too easy to write bloated code in C++. In principle, refactoring to reduce that bloat isn't too hard. The problem is tracing the worst offending templates and inlines - tracing those items that are causing real bloat in real programs.
With that in mind, and because I'm certain that my libraries are a bit more bloat-prone than they should be, I was wondering if there are any tools that can track down those worst offenders automatically - i.e. identify those items that contribute most (including all their repeated instantiations and calls) to the size of a particular target.
I'm not much interested in performance at this point - it's all about the executable file size.
Are there any tools for this job, usable on Windows, and fitting with either MinGW GCC or Visual Studio?
EDIT - some context
I have a set of multiway-tree templates that act as replacements for the red-black tree standard containers. They are written as wrappers around non-typesafe non-template code, but they were also written a long time ago as a "will better cache friendliness boost real performance?" experiment. The point being, they weren't really written for long-term use.
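For readers unfamiliar with it, the "thin template wrapper around non-template code" idiom described above can be sketched roughly as follows. The class names `UntypedPtrArray` and `PtrArray` are hypothetical and not from the asker's library; the point is only that the type-specific layer is a cast, so each instantiation should add very little code even when inlined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-template core: compiled once, no matter how many element types use it.
// (In a real project these members would be defined out of line in a .cpp.)
class UntypedPtrArray {
public:
    void push_back(void* p) { items_.push_back(p); }
    void* at(std::size_t i) const { return items_.at(i); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<void*> items_;
};

// Thin type-safe wrapper: every member is a one-line forward plus a cast.
// Note: T must be non-const, since T* has to convert implicitly to void*.
template <typename T>
class PtrArray {
public:
    void push_back(T* p) { impl_.push_back(p); }
    T* at(std::size_t i) const { return static_cast<T*>(impl_.at(i)); }
    std::size_t size() const { return impl_.size(); }

private:
    UntypedPtrArray impl_;
};
```

The bloat risk the question worries about appears when members of the wrapper grow beyond this kind of one-liner and start duplicating logic per instantiation.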
Because they support some handy tricks, though (search based on custom comparisons/partial keys, efficient subscripted access, search for smallest unused key), they ended up being in use just about everywhere in my code. These days, I hardly ever use std::map.
Layered on top of those, I have some more complex containers, such as two-way maps. On top of those, I have tree and digraph classes. On top of those...
Using map files, I could track down whether non-inline template methods are causing bloat. That's just a matter of finding all the instantiations of a particular method and adding the sizes. But what about unwisely inlined methods? The templates were, after all, meant to be thin wrappers around non-template code, but historically my ability to judge whether something should be inlined or not hasn't been very reliable. The bloat impact of those template inlines isn't so easy to measure.
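One way to approximate the "find all the instantiations and add the sizes" step is to collapse template argument lists in demangled names so that every instantiation of a method maps to the same key. This is a minimal sketch, assuming you already have (demangled name, size) pairs from a map file; the function names are hypothetical, operators such as `operator<` would need special handling, and inlined code does not show up as separate symbols, so this only covers the non-inline case.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Collapse every template argument list to "<>" so that all instantiations of
// the same template share one key, e.g. "Tree<int>::insert" -> "Tree<>::insert".
// Assumes demangled names; does not special-case operator< or operator<<.
std::string collapse_template_args(const std::string& name) {
    std::string out;
    int depth = 0;
    for (char c : name) {
        if (c == '<') {
            if (depth == 0) out += '<';
            ++depth;
        } else if (c == '>') {
            if (depth > 0) {
                --depth;
                if (depth == 0) out += '>';
            } else {
                out += c;  // unmatched '>' is kept as-is
            }
        } else if (depth == 0) {
            out += c;  // characters inside <...> are dropped
        }
    }
    return out;
}

// Sum code sizes over all instantiations of each template method.
std::map<std::string, std::size_t> total_by_template(
    const std::vector<std::pair<std::string, std::size_t>>& symbols) {
    std::map<std::string, std::size_t> totals;
    for (const auto& [name, size] : symbols)
        totals[collapse_template_args(name)] += size;
    return totals;
}
```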
I have some idea which methods are heavily used, but that's the well-known optimization-without-profiling mistake.
Check out Symbol Sort. I used it a while back to figure out why our installer had grown by a factor of 4 in six months (it turns out the answer was static linking of the C runtime and libxml2).
Map file analysis
I faced a problem like this some time ago, and I ended up writing a custom tool which analysed the map file (the Visual Studio linker can be instructed to produce one with /MAP). The tool output was:
- a list of functions sorted by code size in descending order, showing only the first N
- a list of source files sorted by code size in descending order, showing only the first N
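A sketch of the two reports described above, assuming the map file has already been reduced to (function, source file, size) records. `FunctionSize`, `top_functions` and `top_sources` are illustrative names, not the original tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One function from the map file, with its estimated code size in bytes.
struct FunctionSize {
    std::string name;
    std::string source;  // object/source file the function came from
    std::size_t bytes;
};

using Ranking = std::vector<std::pair<std::string, std::size_t>>;

// Sort (key, bytes) pairs by size, largest first, and keep only the first n.
Ranking top_n(Ranking items, std::size_t n) {
    std::sort(items.begin(), items.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    if (items.size() > n) items.resize(n);
    return items;
}

// Report 1: the n largest individual functions.
Ranking top_functions(const std::vector<FunctionSize>& fs, std::size_t n) {
    Ranking r;
    for (const auto& f : fs) r.emplace_back(f.name, f.bytes);
    return top_n(std::move(r), n);
}

// Report 2: the n source files whose functions add up to the most code.
Ranking top_sources(const std::vector<FunctionSize>& fs, std::size_t n) {
    std::map<std::string, std::size_t> per_file;
    for (const auto& f : fs) per_file[f.source] += f.bytes;
    return top_n(Ranking(per_file.begin(), per_file.end()), n);
}
```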
Parsing the map file is relatively easy (a function's code size can be computed as the difference between its address and the address on the following line); the hardest part is probably handling mangled names in a reasonable way. You might find ready-to-use libraries for both of these; I did it a few years ago and do not know the current situation.
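For the mangled-name part with MinGW GCC, which uses the Itanium C++ ABI, the compiler runtime exposes `abi::__cxa_demangle` in `<cxxabi.h>`. Visual Studio decorated names (like those in the excerpt below) need `UnDecorateSymbolName` from DbgHelp instead, which this small sketch does not cover; the wrapper name `demangle` is just illustrative.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang (including MinGW); not available with MSVC
#include <memory>
#include <string>

// Demangle an Itanium-ABI symbol (GCC/MinGW style). Returns the input
// unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer, so release it with free().
    std::unique_ptr<char, void (*)(void*)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status), std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```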
Here is a short excerpt from a map file, so that you know what to expect:
  Address         Publics by Value                              Rva+Base       Lib:Object

 0001:0023cbb4       ?ApplyScheme@Input@@QAEXPBVParamEntry@@@Z     0063dbb4 f   mainInput.obj
 0001:0023cea1       ?InitKeys@Input@@QAEXXZ                       0063dea1 f   mainInput.obj
 0001:0023cf47       ?LoadKeys@Input@@QAEXABVParamEntry@@@Z        0063df47 f   mainInput.obj
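The parsing step could be sketched like this, assuming the column layout shown in the excerpt (address as hexadecimal `section:offset`, decorated name, Rva+Base, optional flag, object file) and that only the "Publics by Value" lines are fed in. Sizes are estimated as the gap to the next symbol in the same section, so alignment padding is counted with the preceding function and the last symbol of each section gets 0. `MapSymbol`, `parse_map_line` and `symbols_by_size` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct MapSymbol {
    unsigned section = 0;
    std::uint64_t offset = 0;
    std::string name;        // decorated (mangled) name, as in the map file
    std::string object;      // last column, e.g. "mainInput.obj"
    std::uint64_t size = 0;  // estimated from the next symbol's offset
};

// Parse one "Publics by Value" line; returns false for header/other lines.
// Feed only that section: other map sections can look similar.
bool parse_map_line(const std::string& line, MapSymbol& sym) {
    std::istringstream in(line);
    std::string addr, rva, tok;
    if (!(in >> addr >> sym.name >> rva)) return false;
    auto colon = addr.find(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == addr.size()) return false;
    if (addr.find_first_not_of("0123456789abcdefABCDEF:") != std::string::npos) return false;
    sym.section = static_cast<unsigned>(std::stoul(addr.substr(0, colon), nullptr, 16));
    sym.offset = std::stoull(addr.substr(colon + 1), nullptr, 16);
    sym.object.clear();
    while (in >> tok) sym.object = tok;  // keep the last remaining column
    return true;
}

// Estimate each symbol's size, then sort largest first.
std::vector<MapSymbol> symbols_by_size(const std::vector<std::string>& lines) {
    std::vector<MapSymbol> syms;
    for (const auto& line : lines) {
        MapSymbol s;
        if (parse_map_line(line, s)) syms.push_back(s);
    }
    std::sort(syms.begin(), syms.end(), [](const MapSymbol& a, const MapSymbol& b) {
        return a.section != b.section ? a.section < b.section : a.offset < b.offset;
    });
    for (std::size_t i = 0; i + 1 < syms.size(); ++i)
        if (syms[i + 1].section == syms[i].section)
            syms[i].size = syms[i + 1].offset - syms[i].offset;
    std::stable_sort(syms.begin(), syms.end(),
                     [](const MapSymbol& a, const MapSymbol& b) { return a.size > b.size; });
    return syms;
}
```

With the three excerpt lines, this sketch would estimate `?ApplyScheme@Input@@...` at 0x2ed (749) bytes and `?InitKeys@Input@@QAEXXZ` at 0xa6 (166) bytes.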
Symbol Sort
As posted in Ben Staub's answer, Symbol Sort is a ready-to-use command line utility (it comes with complete C# source) which does all of this; the only difference is that it analyses pdb/exe files rather than map files.
So what I'm reading based on your question and your comments is that the library is not actually too large.
The only tool you need to determine this is a command shell or Windows Explorer. Look at the file size. Is it so big that it causes real problems (unacceptable download times, not fitting in memory on the target platform, that kind of thing)?
If not, then you should worry about code readability and maintainability and nothing else. And the tool for that is your eyes. Read the code, and take the actions needed to make it more readable if necessary.
If you can point to an actual reason why the executable size is a problem, please edit that into your question, as it is important context.
However, assuming the file size is actually a problem:
Inlined functions are generally not a problem, because the compiler, not you, chooses which functions to inline (unless you force it with __forceinline or __attribute__((always_inline))). Simply marking something inline does not inline the actual generated code. The compiler inlines if it determines the trade-off between larger code and less indirection to be worth it. A large function that is called from many places will usually not be inlined, because that would dramatically increase code size, which would hurt performance.
If you're worried that inlined functions cause code bloat, simply compile with the "optimize for size" flag (-Os for GCC, /O1 for Visual Studio). Then the compiler will restrict inlining to the cases where it doesn't noticeably affect executable size.
For finding out which symbols are biggest, parse the map file as @Suma suggested.
But really, you said it yourself when you mentioned "the well-known optimization-without-profiling mistake."
The very first act of profiling you need to do is to ask: is the executable size actually a problem? In the comments you said that you "have a feeling", which, in a profiling context, is useless and can be translated into "no, the executable size is not a problem".
Profile. Gather data and identify trouble spots. Before worrying about how to bring down the executable size, find out what the executable size is, and identify whether or not that is actually a problem. You haven't done that yet. You read in a book that "code bloat is a problem in C++", and so you assume that code bloat is a problem in your program. But is it? Why? How do you determine that it is?
http://www.sikorskiy.net/prj/amap/index.html
This is a wonderful GUI tool for analysing how much space each object file in a lib/library takes, based on the map file generated by the Visual Studio linker. It analyses the map file and generates a report from it; you can also filter, and it displays the sizes dynamically. Just give it the map file, and it will list which functions occupy how much space in the DLL/EXE the map file was generated for. Check the screenshots on the page above; you can also sort by size.
Basically, you are looking for costly things that you don't need. Suppose some category of functions that you don't need takes a large percentage of the space, say 20%. Then if you picked 20 random bytes out of the image, on average 4 of them (20 * 20%) would be in that category, and you would be able to see them. So you take those samples, look at them, and if you see an obvious pattern of functions that you don't really need, remove them. Then do it again, because other categories of routines that used less space now take a higher percentage.
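The random-sampling idea can be sketched as below, assuming you have the function start offsets from the map file sorted in ascending order, with the first one at offset 0, and that each function extends to the next one's start. `Span`, `function_at` and `sample_functions` are hypothetical names; the fixed seed only makes runs repeatable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <random>
#include <string>
#include <vector>

struct Span {
    std::uint64_t start;  // offset of the function's first byte in the image
    std::string name;
};

// Return the function whose span contains `offset`. Spans must be sorted by
// start; each span is assumed to run up to the next span's start.
const std::string& function_at(const std::vector<Span>& spans, std::uint64_t offset) {
    auto it = std::upper_bound(spans.begin(), spans.end(), offset,
                               [](std::uint64_t off, const Span& s) { return off < s.start; });
    assert(it != spans.begin() && "offset precedes the first function");
    return std::prev(it)->name;
}

// Pick `count` uniformly random byte offsets in [0, image_size) and report the
// function each one lands in. A category taking 20% of the bytes should show
// up in roughly 20% of the samples.
std::vector<std::string> sample_functions(const std::vector<Span>& spans,
                                          std::uint64_t image_size, int count,
                                          unsigned seed = 1) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint64_t> dist(0, image_size - 1);
    std::vector<std::string> hits;
    for (int i = 0; i < count; ++i) hits.push_back(function_at(spans, dist(rng)));
    return hits;
}
```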
So I agree with Suma that parsing the map file is a good start. Then I would write a routine to walk through it and, at every 5% point (space-wise), print the routine I am in. That way I get 20 samples. Often I find that a large chunk of object space results from a very small number (like 1) of lines of source code that I could easily have written another way.
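The deterministic variant, recording the routine at every 5% point, might look like this. It assumes the symbols are already in address order with sizes estimated from the map file; `Sym` and `every_nth_percent` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Sym {
    std::string name;
    std::uint64_t size;  // estimated code size in bytes
};

// Walk the symbols in address order and record the one we are "in" at each
// 1/steps of the total size (every 5% with the default of 20 steps).
std::vector<std::string> every_nth_percent(const std::vector<Sym>& syms, int steps = 20) {
    std::uint64_t total = 0;
    for (const auto& s : syms) total += s.size;
    std::vector<std::string> out;
    if (total == 0) return out;
    std::uint64_t running = 0;  // bytes covered before the current symbol
    int k = 0;                  // sample point k sits at byte total * k / steps
    for (const auto& s : syms) {
        // Emit every sample point that falls inside [running, running + size).
        while (k < steps && total * k / steps < running + s.size) {
            out.push_back(s.name);
            ++k;
        }
        running += s.size;
    }
    return out;
}
```

A routine that owns half of the image appears in about half of the 20 lines, however it is scattered, which is what makes the repeated patterns easy to spot.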
You are also worried about too much inlining making functions larger than they need to be. To figure that out, I would take each of those samples, and since each one represents a specific address in a specific function, I would trace it back to the line of code it is in. That way, I can tell whether it is in an inline-expanded function. This is a bit of work, but doable.
A similar problem is how to find tumors when disks get full. The idea there is the same: walk the directory tree, adding up the file sizes. Then walk it again, and as you pass each 5% point, print out the path of the file you are in. This tells you not only whether you have large files, but also whether you have large numbers of small files, and it doesn't matter how deeply they are buried or how widely they are scattered. When you clean out one category of files that you don't need, you can do it again to get the next category, and so on.
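The same idea applied to a directory tree, as a C++17 `std::filesystem` sketch (the function name `sample_disk_usage` is hypothetical). Rather than walking the tree twice, it records each file's size during the first walk, so the second pass is guaranteed to see the same totals; the order of files is whatever the directory iterator yields, and entries that cannot be read are skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Return the file we are "in" at each 1/steps of the total size under root.
std::vector<fs::path> sample_disk_usage(const fs::path& root, int steps = 20) {
    // Pass 1: collect every regular file and its size.
    std::vector<std::pair<fs::path, std::uintmax_t>> files;
    std::uintmax_t total = 0;
    std::error_code ec;
    fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec);
    for (; !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
        std::error_code fec;  // per-entry errors just skip that entry
        if (!it->is_regular_file(fec) || fec) continue;
        std::uintmax_t size = it->file_size(fec);
        if (fec) continue;
        files.emplace_back(it->path(), size);
        total += size;
    }
    // Pass 2: walk the recorded list and sample at each 1/steps point.
    std::vector<fs::path> out;
    if (total == 0) return out;
    std::uintmax_t running = 0;  // bytes in files before the current one
    int k = 0;                   // next sample point is at total * k / steps
    for (const auto& [path, size] : files) {
        while (k < steps && total * k / steps < running + size) {
            out.push_back(path);
            ++k;
        }
        running += size;
    }
    return out;
}
```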
Good luck.
Your question seems to lean towards run-time rather than compile-time bloat. However, if compile-time bloat (plus binary bloat resulting from inefficient compilation) is relevant, then I have to mention the Clang-based tool IWYU (include-what-you-use). Since IWYU will likely manage to remove quite a number of #includes in your code, this may also reduce binary bloat. In my own environment, at least, I can confirm a useful reduction in build time.