How much disk space do shared libraries really save in modern Linux distros?
In the static vs shared libraries debates, I've often heard that shared libraries eliminate duplication and reduce overall disk space. But how much disk space do shared libraries really save in modern Linux distros? How much more space would be needed if all programs were compiled using static libraries? Has anyone crunched the numbers for a typical desktop Linux distro such as Ubuntu? Are there any statistics available?
ADDENDUM:
All answers were informative and are appreciated, but they seemed to shoot down my question rather than attempt to answer it. Kaleb was on the right track, but he chose to crunch the numbers for memory space instead of disk space (my question was for disk space).
Because programs only "pay" for the portions of static libraries that they use, it seems practically impossible to quantitatively know what the disk space difference would be for all static vs all shared.
I feel like trashing my question now that I realize it's practically impossible to answer. But I'll leave it here to preserve the informative answers.
So that SO stops nagging me to choose an answer, I'm going to pick the most popular one (even if it sidesteps the question).
I'm not sure where you heard this, but reduced disk space is mostly a red herring as drive space approaches pennies per gigabyte. The real gain with shared libraries comes with security and bugfix updates for those libraries; applications using static libraries have to be individually rebuilt with the new libraries, whereas all apps using shared libraries can be updated at once by replacing only a few files.
Not only do shared libraries save disk space, they also save memory, and that's a lot more important. The prelinking step is important here: pages that the dynamic linker would otherwise have to patch with relocations can't be shared between two instances of the same library unless it is loaded at the same address in both, and prelinking allows that to happen.
Shared libraries do not necessarily save disk space or memory.
When an application links to a static library, only those parts of the library that the application uses will be pulled into the application binary. The library archive (.a) contains object files (.o), and if they are well factored, the application will use less memory by only linking with the object files it uses. Shared libraries will contain the whole library on disk and in memory whether parts of it are used by applications or not.
For desktop and server systems, this is less likely to result in a win overall, but if you are developing embedded applications, it's worth trying static linking all the applications to see if that gives you an overall saving.
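To make the "only the object files it uses" point concrete, here is a toy C++ model of how a static archive is resolved. It is a simplified sketch of the behavior, not any real linker's code: the `ObjectFile` type and `pulledMembers` function are hypothetical, and real linkers also handle weak symbols, link order, and multiple archives.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one member (.o) of a static archive (.a).
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;  // symbols this member provides
    std::set<std::string> needs;    // symbols this member references
};

// Returns the archive members that would be copied into the executable,
// given the symbols the application's own code leaves undefined.
// A member is pulled in only when it defines a still-undefined symbol;
// the archive is rescanned until a full pass pulls in nothing new.
std::vector<std::string> pulledMembers(std::set<std::string> undefined,
                                       const std::vector<ObjectFile>& archive) {
    std::set<std::string> defined;
    std::vector<bool> used(archive.size(), false);
    std::vector<std::string> pulled;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            bool resolves = false;
            for (const auto& sym : archive[i].defines)
                if (undefined.count(sym)) resolves = true;
            if (!resolves) continue;
            used[i] = true;
            changed = true;
            pulled.push_back(archive[i].name);
            for (const auto& sym : archive[i].defines) {
                defined.insert(sym);
                undefined.erase(sym);
            }
            // The new member's own references become undefined, unless
            // something already pulled in provides them.
            for (const auto& sym : archive[i].needs)
                if (!defined.count(sym)) undefined.insert(sym);
        }
    }
    return pulled;
}
```

With a three-member archive where the application only calls `printf`, and `printf.o` itself needs `write`, the model pulls in `printf.o` and `write.o` and leaves `qsort.o` out of the binary.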
I was able to figure out a partial quantitative answer without having to do an obscene amount of work. Here is my (hare-brained) methodology:
1) Use the following command to generate a list of packages with their installed size and list of dependencies:
dpkg-query -Wf '${Package}\t${Installed-Size}\t${Depends}\n'
2) Parse the results and build a map of statistics for each package:
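#include <map>
#include <string>

struct PkgStats
{
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;          // Installed-Size, in KB
    int dependentCount;  // number of packages that directly depend on this one
};
typedef std::map<std::string, PkgStats> PkgMap;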
Where dependentCount is the number of other packages that directly depend on that package.
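Step 2 might look roughly like the C++ sketch below. It is not the author's code: `parseDpkgQuery`, `depName`, and `trim` are hypothetical names, and the sketch assumes one tab-separated line per package (the `\n` at the end of the format string), counts only the first alternative of an `a | b` dependency, strips version constraints and any `:arch` qualifier, and creates zero-size entries for depended-on packages that are not installed.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct PkgStats {
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;          // Installed-Size, in KB
    int dependentCount;  // packages that directly depend on this one
};
typedef std::map<std::string, PkgStats> PkgMap;

// Strips leading and trailing spaces/tabs.
static std::string trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// "libc6 (>= 2.7) | libc6-udeb" -> "libc6": keeps the first alternative
// and drops the version constraint and any ":arch" qualifier.
static std::string depName(const std::string& clause) {
    const std::string first = trim(clause.substr(0, clause.find('|')));
    return first.substr(0, first.find_first_of(" (:"));
}

// Reads lines of the form "name<TAB>installed-kb<TAB>depends".
PkgMap parseDpkgQuery(std::istream& in) {
    PkgMap pkgs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, size, depends;
        std::getline(fields, name, '\t');
        std::getline(fields, size, '\t');
        std::getline(fields, depends);
        if (name.empty()) continue;
        pkgs[name].kbSize = size.empty() ? 0 : std::stoi(size);
        // Depends is a comma-separated list; each entry adds one direct dependent.
        std::istringstream clauses(depends);
        std::string clause;
        while (std::getline(clauses, clause, ',')) {
            const std::string dep = depName(clause);
            if (!dep.empty() && dep != name) ++pkgs[dep].dependentCount;
        }
    }
    return pkgs;
}
```

Feeding it the output of the `dpkg-query` command (e.g. through `std::cin`) gives the map the statistics below are computed from.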
Results
Here is the Top 20 list of packages with the most dependents on my system:
Package               Installed KB  Dependents  Dup'd MB
libc6                        10096         750      7385
python                         624         112        68
libatk1.0-0                    200          92        18
perl                         18852          48       865
gconf2                         248          34         8
debconf                        988          23        21
libasound2                    1428          19        25
defoma                         564          18         9
libart-2.0-2                   164          14         2
libavahi-client3               160          14         2
libbz2-1.0                     128          12         1
openoffice.org-core         124908          11      1220
gcc-4.4-base                   168          10         1
libbonobo2-0                   916          10         8
cli-common                     336           8         2
coreutils                    12928           8        88
erlang-base                   6708           8        46
libbluetooth3                  200           8         1
dictionaries-common           1016           7         6
where Dup'd MB is the number of megabytes that would be duplicated if there were no sharing (= installed_size * (dependents_count - 1) / 1024, for dependents_count > 1).
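The Dup'd MB column can be reproduced with a one-line function. This `dupMB` helper is hypothetical and assumes 1 MB = 1024 KB, which matches the table (for libc6, 10096 KB × 749 / 1024 ≈ 7385).

```cpp
#include <cassert>

// Megabytes duplicated if each of `dependents` packages carried its own copy
// of a package whose Installed-Size is `installedKb` KB. One copy would exist
// anyway, so only (dependents - 1) copies count; 1 MB is taken as 1024 KB.
double dupMB(int installedKb, int dependents) {
    assert(installedKb >= 0 && dependents >= 0);
    if (dependents <= 1) return 0.0;
    return static_cast<double>(installedKb) * (dependents - 1) / 1024.0;
}
```

Rounding `dupMB(10096, 750)` and `dupMB(18852, 48)` gives 7385 and 865, the libc6 and perl rows of the table.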
It's not surprising to see libc6 on top. :) BTW, I have a typical Ubuntu 9.10 setup with a few programming-related packages installed, as well as some GIS tools.
Some statistics:
- Total installed packages: 1717
- Average # of direct dependents: 0.92
- Total duplicated size with no sharing (ignoring indirect dependencies): 10.25GB
- [Figure: histogram of # of direct dependents, logarithmic Y scale]
Note that the above totally ignores indirect dependencies (i.e. everything should be at least indirectly dependent on libc6). What I really should have done is build a graph of all dependencies and use that as the basis for my statistics. Maybe I'll get around to it sometime and post a lengthy blog article with more details and rigor.
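Building that graph is not much more work. A minimal sketch of counting indirect dependents, assuming the dependencies have already been parsed into a package → direct-dependencies map (the `DepGraph` type and `transitiveDependents` function are hypothetical): reverse the edges, then run a breadth-first search from each package. The visited set also keeps dependency cycles, which do occur in Debian packaging, from looping forever.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// package -> packages it directly depends on
typedef std::map<std::string, std::vector<std::string>> DepGraph;

// For each package that something depends on, counts how many other
// packages depend on it directly or indirectly.
std::map<std::string, int> transitiveDependents(const DepGraph& deps) {
    // Reverse the edges: package -> packages that directly depend on it.
    DepGraph users;
    for (const auto& kv : deps)
        for (const auto& dep : kv.second)
            users[dep].push_back(kv.first);

    std::map<std::string, int> result;
    for (const auto& kv : users) {
        // Breadth-first search over reversed edges; `seen` also stops cycles.
        std::set<std::string> seen{kv.first};
        std::queue<std::string> todo;
        todo.push(kv.first);
        while (!todo.empty()) {
            const std::string cur = todo.front();
            todo.pop();
            const auto it = users.find(cur);
            if (it == users.end()) continue;
            for (const auto& user : it->second)
                if (seen.insert(user).second) todo.push(user);
        }
        result[kv.first] = static_cast<int>(seen.size()) - 1;  // minus itself
    }
    return result;
}
```

Packages that nothing depends on simply do not appear in the result; their count is zero.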
Ok, perhaps not an answer, but the memory savings are what I'd consider. The savings depend on the number of times a library is loaded after the first application, so let's find out how much each library saves on the system using a quick script:
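#!/bin/sh
# For each shared library mapped by N processes, sharing avoids (N - 1)
# extra copies of the file, so the saving is (N - 1) * file size in bytes.
# Assumes NAME is the 9th lsof column. The pattern only matches names ending
# in ".so", so files with a version suffix (libname.so.N.N.N) are skipped.
lsof 2>/dev/null | grep 'lib.*\.so$' | awk '{print $9}' | sort | uniq -c |
while read -r cnt lib ; do
    size=$(stat -c %s "$lib" 2>/dev/null) || continue
    echo "$lib: $(( (cnt - 1) * size ))"
done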
That gives the savings per library, like this:
...
/usr/lib64/qt4/plugins/crypto/libqca-ossl.so: 0
/usr/lib64/qt4/plugins/imageformats/libqgif.so: 540640
/usr/lib64/qt4/plugins/imageformats/libqico.so: 791200
...
Then, the total savings:
$ ./checker.sh | awk '{total = total + $2}END{print total}'
263160760
So, roughly speaking, I'm saving about 250 MB of memory on my system. Your mileage will vary.
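The arithmetic behind the script and the final figure, as a short worked equation: if library $l$ is mapped by $n_l$ processes and its file is $s_l$ bytes, the estimated saving is the sum below. Like the script, this treats every mapping as a full copy of the file, which is a simplification (only the pages actually touched are resident).

```latex
\text{savings} = \sum_{l} (n_l - 1)\, s_l ,
\qquad
\frac{263\,160\,760\ \text{bytes}}{2^{20}\ \text{bytes/MiB}} \approx 250.97\ \text{MiB}
```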