How much disk space do shared libraries really save in modern Linux distros?

In the static vs shared libraries debates, I've often heard that shared libraries eliminate duplication and reduce overall disk space. But how much disk space do shared libraries really save in modern Linux distros? How much more space would be needed if all programs were compiled using static libraries? Has anyone crunched the numbers for a typical desktop Linux distro such as Ubuntu? Are there any statistics available?

ADDENDUM:

All answers were informative and are appreciated, but they seemed to shoot down my question rather than attempt to answer it. Kaleb was on the right track, but he chose to crunch the numbers for memory space instead of disk space (my question was for disk space).

Because programs only "pay" for the portions of static libraries that they use, it seems practically impossible to quantitatively know what the disk space difference would be for all static vs all shared.

I feel like trashing my question now that I realize it's practically impossible to answer. But I'll leave it here to preserve the informative answers.

So that SO stops nagging me to choose an answer, I'm going to pick the most popular one (even if it sidesteps the question).


I'm not sure where you heard this, but reduced disk space is mostly a red herring as drive space approaches pennies per gigabyte. The real gain with shared libraries comes with security and bugfix updates for those libraries; applications using static libraries have to be individually rebuilt with the new libraries, whereas all apps using shared libraries can be updated at once by replacing only a few files.


Not only do shared libraries save disk space, they also save memory, and that's a lot more important. The prelinking step is important here... you can't share the memory pages between two instances of the same library unless they are loaded at the same address, and prelinking allows that to happen.


Shared libraries do not necessarily save disk space or memory.

When an application links to a static library, only those parts of the library that the application uses will be pulled into the application binary. The library archive (.a) contains object files (.o), and if they are well factored, the application will use less memory by only linking with the object files it uses. Shared libraries will contain the whole library on disk and in memory whether parts of it are used by applications or not.

For desktop and server systems, this is less likely to result in a win overall, but if you are developing embedded applications, it's worth trying static linking all the applications to see if that gives you an overall saving.


I was able to figure out a partial quantitative answer without having to do an obscene amount of work. Here is my (hare-brained) methodology:

1) Use the following command to generate a list of packages with their installed size and list of dependencies:

dpkg-query -Wf '${Package}\t${Installed-Size}\t${Depends}\n'

2) Parse the results and build a map of statistics for each package:

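#include <map>
#include <string>

// Per-package statistics gathered from the dpkg-query output.
struct PkgStats
{
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;          // installed size in KB
    int dependentCount;  // number of direct dependents
};

typedef std::map<std::string, PkgStats> PkgMap;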

Where dependentCount is the number of other packages that directly depend on that package.
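For illustration, here is a minimal sketch of how step 2 could be done. It is not the code that produced the numbers below. It assumes the tab-separated output of the dpkg-query command from step 1, and the helper names dependencyName and buildPkgMap are made up for this example. It also makes two simplifying assumptions: only the first alternative of an "a | b" group is counted, and dependencies on packages that are not installed (e.g. virtual packages) are ignored.

```cpp
#include <cstdlib>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct PkgStats
{
    int kbSize = 0;          // Installed-Size, in KB
    int dependentCount = 0;  // packages that directly depend on this one
};

typedef std::map<std::string, PkgStats> PkgMap;

// Extract the package name from one Depends entry such as
// " libc6 (>= 2.4)" or " debconf (>= 0.5) | debconf-2.0".
// Assumption: only the first alternative of an "a | b" group is used.
std::string dependencyName(const std::string& entry)
{
    const std::string::size_type begin = entry.find_first_not_of(" \t");
    if (begin == std::string::npos) return "";
    const std::string::size_type end = entry.find_first_of(" \t(|:", begin);
    return entry.substr(begin, end == std::string::npos ? end : end - begin);
}

// Read "name<TAB>size<TAB>depends" lines and build the statistics map.
PkgMap buildPkgMap(std::istream& in)
{
    PkgMap pkgs;
    std::vector<std::set<std::string> > allDeps;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, size, depends;
        std::getline(fields, name, '\t');
        std::getline(fields, size, '\t');
        std::getline(fields, depends);
        if (name.empty()) continue;
        pkgs[name].kbSize = size.empty() ? 0 : std::atoi(size.c_str());

        // A set ensures each dependency is counted once per package.
        std::set<std::string> deps;
        std::istringstream entries(depends);
        std::string entry;
        while (std::getline(entries, entry, ',')) {
            const std::string dep = dependencyName(entry);
            if (!dep.empty()) deps.insert(dep);
        }
        allDeps.push_back(deps);
    }

    // Second pass: count only dependencies that are installed packages.
    for (const auto& deps : allDeps) {
        for (const auto& dep : deps) {
            const PkgMap::iterator it = pkgs.find(dep);
            if (it != pkgs.end()) ++it->second.dependentCount;
        }
    }
    return pkgs;
}
```

Passing std::cin to buildPkgMap with the dpkg-query output piped in would populate the map. The two-pass design keeps packages that are not installed out of the totals.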

Results

Here is the Top 20 list of packages with the most dependants on my system:

Package             Installed KB    # Deps  Dup'd MB
libc6               10096           750     7385
python              624             112     68
libatk1.0-0         200             92      18
perl                18852           48      865
gconf2              248             34      8
debconf             988             23      21
libasound2          1428            19      25
defoma              564             18      9
libart-2.0-2        164             14      2
libavahi-client3    160             14      2
libbz2-1.0          128             12      1
openoffice.org-core 124908          11      1220
gcc-4.4-base        168             10      1
libbonobo2-0        916             10      8
cli-common          336             8       2
coreutils           12928           8       88
erlang-base         6708            8       46
libbluetooth3       200             8       1
dictionaries-common 1016            7       6

where # Deps is the number of direct dependents and Dup'd MB is the number of megabytes that would be duplicated if there were no sharing (= installed_size * (dependants_count - 1), converted from KB to MB, for dependants_count > 1).
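As a worked check of that formula, here is a small sketch. The function name duplicatedMb is made up for this example. The 1024 KB per MB conversion and rounding to the nearest MB are assumptions, chosen because they reproduce the figures in the table (for example 10096 KB with 750 dependents gives 7385 MB for libc6).

```cpp
#include <cmath>

// Megabytes that would be duplicated if every direct dependent carried its
// own copy of the package: installed_size * (dependents - 1).
// Assumption: 1 MB = 1024 KB, rounded to the nearest MB.
long duplicatedMb(long installedKb, long dependentCount)
{
    if (dependentCount <= 1) return 0;  // a single user duplicates nothing
    const double kb = static_cast<double>(installedKb) * (dependentCount - 1);
    return std::lround(kb / 1024.0);
}
```

With the table's numbers, duplicatedMb(18852, 48) gives 865 for perl and duplicatedMb(124908, 11) gives 1220 for openoffice.org-core.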

It's not surprising to see libc6 on top. :) BTW, I have a typical Ubuntu 9.10 setup with a few programming-related packages installed, as well as some GIS tools.

Some statistics:

  • Total installed packages: 1717
  • Average # of direct dependents: 0.92
  • Total duplicated size with no sharing (ignoring indirect dependencies): 10.25GB
  • Histogram of # of direct dependents (logarithmic Y scale): [figure not reproduced]

Note that the above totally ignores indirect dependencies (i.e. everything should be at least indirectly dependent on libc6). What I really should have done is build a graph of all dependencies and use that as the basis for my statistics. Maybe I'll get around to it sometime and post a lengthy blog article with more details and rigor.
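The graph-based approach was not carried out here, but a rough sketch of the idea might look like the following. It assumes the dependencies have already been parsed into a map from each package to the set of packages it directly depends on. The names DepGraph and transitiveDependentCount are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>

// package -> packages it directly depends on
typedef std::map<std::string, std::set<std::string> > DepGraph;

// Count the packages that depend on `target` directly or indirectly.
std::size_t transitiveDependentCount(const DepGraph& deps,
                                     const std::string& target)
{
    // Invert the edges: package -> packages that directly depend on it.
    DepGraph reverse;
    for (const auto& pkg : deps)
        for (const auto& dep : pkg.second)
            reverse[dep].insert(pkg.first);

    // Breadth-first search over the reversed edges. The `seen` set makes
    // sure each dependent is counted once, even with dependency cycles.
    std::set<std::string> seen;
    std::queue<std::string> pending;
    pending.push(target);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const DepGraph::const_iterator it = reverse.find(current);
        if (it == reverse.end()) continue;
        for (const auto& dependent : it->second)
            if (dependent != target && seen.insert(dependent).second)
                pending.push(dependent);
    }
    return seen.size();
}
```

Using this count in place of the direct dependent count would raise the libc6 figure in particular, since nearly every package reaches it through some chain of dependencies.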


OK, perhaps not an answer, but the memory savings are what I'd consider. The savings depend on the number of times a library is loaded after the first application, so let's find out how much is saved per library on the system using a quick script:

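#!/bin/bash
# For each shared library that is currently open, estimate the memory saved
# by sharing: (number of times it is loaded - 1) * file size in bytes.

lastlib=""
cnt=0
size=0

report() {
    # Print the savings for the library that was just counted.
    [ -n "$lastlib" ] && echo "$lastlib: $(( (cnt - 1) * size ))"
}

# Column 9 of lsof is the file name; keep only paths matching lib*.so.
# Process substitution keeps the loop in the current shell, so the last
# library in the list is reported as well.
while read -r lib ; do
    if [ "$lastlib" = "$lib" ] ; then
        cnt=$((cnt + 1))
    else
        report
        size=$(stat -c %s "$lib" 2>/dev/null || echo 0)
        cnt=1
    fi
    lastlib="$lib"
done < <(lsof | grep 'lib.*\.so$' | awk '{print $9}' | sort)
report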

That will give us savings per lib, as such:

...
/usr/lib64/qt4/plugins/crypto/libqca-ossl.so: 0
/usr/lib64/qt4/plugins/imageformats/libqgif.so: 540640
/usr/lib64/qt4/plugins/imageformats/libqico.so: 791200
...

Then, the total savings:

$ ./checker.sh | awk '{total = total + $2}END{print total}'
263160760

So, roughly speaking on my system I'm saving about 250 Megs of memory. Your mileage will vary.
