Overload new operator to store objects in mmap'd file
I have a Linux C++ program with fairly large memory requirements. Most of the memory is consumed by just a few classes, and is accessed reasonably infrequently. I want to move these classes from main memory to disk-based storage, while changing as little existing code as possible.
The idea was to override the new operator for these objects and have them allocated into an mmap()'d memory region. This way my code modifications stay very limited, the rest of the program can happily access these objects without knowing that anything changed, and the kernel will make sure the objects I need are in memory while the others are on disk. I know this is very similar to how swap works, but the swap partition is usually too small for what my program needs.
Some questions I have:
- Is this a very bad idea? Do you know something better to achieve the same?
- Would I need to allocate the maximum file size beforehand, and will I require all of this space to be allocated on disk? If so, would mapping to a sparse file help?
- I don't want to write my own heap allocator. Can I use an existing one?
- When my program finishes, the mmap'd file will be deleted. This means I don't want any pages to be written to disk unless the kernel will actually remove them from memory. Is there something like a lazy flag to mmap to achieve this, or is this automatic?
Looking at each question in turn
- Is this a very bad idea? Do you know something better to achieve the same?
It's not really clear what you hope to achieve by this. Linux already backs process memory with swap space (so if your data exceeds physical memory, some of it will be swapped to disk). Are you having problems with running out of address space, or running slowly due to excessive paging? Using an mmap-backed store won't really affect either.
- Would I need to allocate the maximum file size beforehand, and will I require all of this space to be allocated on disk? If so, would mapping to a sparse file help?
Yes, you need the file to be as big as the space you are mapping. You can, however, start with a small file/mmap and grow the file (and mmap additional pages) later as needed. You can also use a sparse file, so that disk space isn't used until the pages are written to.
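To make the sparse-file option concrete, here is a minimal sketch, assuming Linux and a filesystem that supports holes. The helper name `map_sparse_file` and the `/tmp/objstore-XXXXXX` template are made up for illustration. `ftruncate` sets the file length without writing any data, so blocks should only be allocated once pages are actually dirtied and written back.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: create a sparse temporary file of `size` bytes and
// map it read/write. Returns nullptr on failure.
void* map_sparse_file(std::size_t size) {
    char path[] = "/tmp/objstore-XXXXXX";  // illustrative location only
    int fd = mkstemp(path);
    if (fd == -1) return nullptr;
    unlink(path);  // the name goes away now; the data lives until unmapped

    // Extend the file to `size` bytes without writing anything (a hole).
    if (ftruncate(fd, static_cast<off_t>(size)) == -1) {
        close(fd);
        return nullptr;
    }

    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping keeps its own reference to the file
    return p == MAP_FAILED ? nullptr : p;
}
```

Unread pages of the hole read back as zeros. Release the region with `munmap(p, size)` when done.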
- I don't want to write my own heap allocator. Can I use an existing one?
There are heap managers that use mmap-backed storage. I've seen versions of the Doug Lea malloc, and various other bibop allocators that do so.
- When my program finishes, the mmap'd file will be deleted. This means I don't want any pages to be written to disk unless the kernel will actually remove them from memory. Is there something like a lazy flag to mmap to achieve this, or is this automatic?
In this case, you could just use MAP_ANON and not have a file at all. However, this gets back to the first question, as this is essentially duplicating what the system malloc (and new) does. In fact on some OSes (Solaris?) that's exactly what the system malloc does. The main reason I've seen custom mmap-based mallocs in the past is for persistent storage (so the file would remain after the process exits and would be remapped on restart).
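For what it's worth, here is a rough sketch of how the class-specific `operator new` from the question could sit on top of an anonymous mapping. `MmapArena`, `big_object_arena`, and `BigObject` are made-up names, the 64 MiB capacity is arbitrary, and the bump allocator never reuses freed memory, so it is nowhere near a real heap manager like the ones mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <sys/mman.h>

// Hypothetical bump arena over a single anonymous mapping.
// It only hands out memory; freed blocks are never reused.
class MmapArena {
public:
    explicit MmapArena(std::size_t capacity) : capacity_(capacity) {
        void* p = mmap(nullptr, capacity, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        base_ = static_cast<char*>(p);
    }
    ~MmapArena() { munmap(base_, capacity_); }
    MmapArena(const MmapArena&) = delete;
    MmapArena& operator=(const MmapArena&) = delete;

    void* allocate(std::size_t size) {
        constexpr std::size_t align = alignof(std::max_align_t);
        // Round the current offset up to the strictest fundamental alignment.
        std::size_t start = (used_ + align - 1) / align * align;
        if (start > capacity_ || size > capacity_ - start) throw std::bad_alloc();
        used_ = start + size;
        return base_ + start;
    }

    // True if `p` points into this arena's mapping.
    bool owns(const void* p) const {
        const char* c = static_cast<const char*>(p);
        return c >= base_ && c < base_ + capacity_;
    }

private:
    char* base_ = nullptr;
    std::size_t capacity_;
    std::size_t used_ = 0;
};

// One process-wide arena for the large classes (size chosen arbitrarily).
MmapArena& big_object_arena() {
    static MmapArena arena(std::size_t{1} << 26);  // 64 MiB
    return arena;
}

// Stand-in for one of the memory-hungry classes.
struct BigObject {
    double samples[1024];

    static void* operator new(std::size_t size) {
        return big_object_arena().allocate(size);
    }
    static void operator delete(void*) noexcept {}  // bump arena: nothing to do
};
```

Call sites stay unchanged: `new BigObject` and `delete obj` pick up the class-specific operators automatically. For a file-backed variant you would pass a file descriptor and `MAP_SHARED` to `mmap` instead of `MAP_ANONYMOUS`.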
I can think of a few problems with the approach you would like to take, so this isn't an answer yet.
- The problem you face is that keeping these objects around consumes too much memory, so when do you "swap" them out (effectively unmap them)? You would be making the same decision that the memory manager of your OS already makes.
- Though you may be able to store the binary representation of the class in an mmap'd block, if the class is not a POD, then the process of "swapping" will not do what you expect (for example, if there are members which are heap allocated, what happens to them?)
- mmap'd memory will still count against your process, so your problems will not go away...
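The non-POD concern in the second point can be partly caught at compile time. This is only a sketch: `Sample`, `Record`, and `self_contained_candidate` are hypothetical names, and the trait is a rough proxy, since a class holding a raw pointer to heap memory would still pass it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical class holding plain data only: every byte of its state
// sits inside the object, so placing it in a mapped region moves all of it.
struct Sample {
    double value;
    int id;
};

// Hypothetical class whose members own separate heap allocations: only the
// small handles would land in the mapped region, the bulk stays on the heap.
struct Record {
    std::string name;
    std::vector<double> values;
};

// Rough proxy for "no hidden heap state" (C++14 variable template).
template <typename T>
constexpr bool self_contained_candidate =
    std::is_trivially_copyable<T>::value && std::is_standard_layout<T>::value;

static_assert(self_contained_candidate<Sample>,
              "Sample should be placeable in a mapped region as-is");
static_assert(!self_contained_candidate<Record>,
              "Record owns heap memory through its members");
```

For a class like `Record`, the members would also need an allocator that draws from the mapped region, otherwise most of the memory stays where it was.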
I think your best bet here is to look at your design and consider when these classes are needed and for how long, then construct, use, and discard them when they are no longer needed. Are they expensive to construct? Maybe it would be cheaper to serialize them into some local file and reconstruct them later (when I say serialize, I mean not simply a memcpy!)
The best option is likely to be to specify that your program requires a minimum amount of swap to be configured, rather than trying to simulate more swap using mmap(). In particular, your last point can't really be overcome - dirty pages in file-backed mappings are generally preferentially written out.