
gcc/g++: error when compiling large file

I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back() calls on a few vectors, together with the string constants that are to be pushed.

When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches

--param ggc-min-expand=0 --param ggc-min-heapsize=4096

may solve the problem. They, however, only seem to work when optimization is turned on.

1) Is this really the solution that I am looking for?

2) Or is there a faster, better way to do this (compiling takes ages with these options activated)?

Best wishes,

Alexander

Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but since the file I was trying to compile was so big, it still crashed, only later. In a way, this behaviour is really interesting, as there is not much to optimize in such a setting -- what does GCC do behind the scenes that costs so much memory? (I also compiled with all optimizations disabled and got the same results.)

The solution that I switched to now is reading in the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than having to do this in C++.

However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format, and some of the problems I had disappeared once I manually set the output format to pe-i386. The symbols in the object file are named after the file name by default, e.g. converting the file inbuilt_training_data.bin results in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web which claim that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.
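
A minimal sketch of the consuming side (the symbol names match the ones above, but the loader function and the assumption that the .bin file holds NUL-terminated strings back to back are only for illustration):

#include <string>
#include <vector>

// Object file created roughly like this (exact flags depend on the binutils build):
//   objcopy -I binary -O pe-i386 -B i386 inbuilt_training_data.bin inbuilt_training_data.o
//
// Symbols provided by objcopy (no leading underscore, as noted above):
extern char binary_inbuilt_training_data_bin_start;
extern char binary_inbuilt_training_data_bin_end;

// Hypothetical loader: assumes the .bin file contains NUL-terminated strings
// stored back to back. Adapt the parsing to the actual file layout.
void load_training_data(std::vector<std::string> &a) {
    const char *p   = &binary_inbuilt_training_data_bin_start;
    const char *end = &binary_inbuilt_training_data_bin_end;
    while (p < end) {
        a.push_back(std::string(p));
        p += a.back().size() + 1;   // skip the string and its terminating NUL
    }
}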


You may be better off using a constant data table instead. For example, instead of doing this:

void f() {
    a.push_back("one");
    a.push_back("two");
    a.push_back("three");
    // ...
}

try doing this:

const char *data[] = {
    "one",
    "two",
    "three",
    // ...
};

void f() {
    for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++) {
        a.push_back(data[i]);
    }
}

The compiler will likely handle a large constant data table much more efficiently than a huge function containing many push_back() calls.
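
As a small follow-up, if a is a std::vector<std::string> (an assumption; the question doesn't say), the whole table can also be copied into the vector in one call instead of a loop:

#include <string>
#include <vector>

static const char *data[] = { "one", "two", "three" /* ... */ };

std::vector<std::string> a;

void f() {
    // Insert the whole constant table in one call.
    a.assign(data, data + sizeof(data) / sizeof(data[0]));
}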


Can you solve the same problem without generating 40 MB worth of C++? That's more than some operating systems I've used. A loop and some data files, perhaps?


It sounds like your autogenerated app looks like this:

push_back(data00001);
...
push_back(data99999);

Why don't you put the data into an external file and let the program read this data in a loop?
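
A minimal sketch of that approach (the file name and the one-string-per-line format are just assumptions for illustration):

#include <fstream>
#include <string>
#include <vector>

void load_data(std::vector<std::string> &a) {
    std::ifstream in("training_data.txt");   // hypothetical data file
    std::string line;
    while (std::getline(in, line)) {         // one entry per line
        a.push_back(line);
    }
}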


If you're just generating a bunch of calls to push_back() in a row, you can refactor it into something like this:

// Old code:
v.push_back("foo");
v.push_back("bar");
v.push_back("baz");

// Change that to this:
{
    static const char *stuff[] = {"foo", "bar", "baz"};
    v.insert(v.end(), stuff, stuff + ARRAYCOUNT(stuff));
}

Where ARRAYCOUNT is a macro defined as follows:

#define ARRAYCOUNT(a) (sizeof(a) / sizeof(a[0]))

The extra level of braces is just to avoid name conflicts if you have many such blocks; alternatively, you can just generate a new unique name for the stuff placeholder.

If that still doesn't work, I suggest breaking your source file up into many smaller source files. That's easy if you have many separate functions; if you have one enormous function, you'll have to work a little harder, but it's still very doable.
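
For example, the generated code could be split into numbered helper files like this (the file and function names are made up; each part stays small enough to compile comfortably):

// init_part1.cpp (init_part2.cpp, init_part3.cpp, ... look the same)
#include <string>
#include <vector>

void init_part1(std::vector<std::string> &a) {
    static const char *stuff[] = {"foo", "bar", "baz" /* ... */};
    a.insert(a.end(), stuff, stuff + sizeof(stuff) / sizeof(stuff[0]));
}

// A small driver then simply calls init_part1(a); init_part2(a); ... in order.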


To complement some of the answers here, you may be better off generating a binary object file and linking it directly -- as opposed to compiling source files consisting of const char[] arrays.

I had a similar problem working with gcc lately. (Around 60 MB of PNG data split into some 100 header files.) Including them all is the worst option: The amount of memory needed seems to grow exponentially with the size of the compilation unit.


If you cannot refactor your code, you could try to increase the amount of swap space you have, provided your operating system supports a large address space. This should work on 64-bit machines, but 3 gigabytes might be too much for a 32-bit system.
