Does any operating system implement buffering for malloc()?
A lot of c/malloc()'s in a for/while/do can consume a lot of time so I am curious if any operating system buffers memory for fast mallocs.
I have been pondering if I could speed up malloc's by writing a "greedy" wrapper for malloc. E.g. when I ask for 1MB of memory the initial allocator would allocate 10MB and on the 2nd, 3rd, 4th etc... call the malloc fun开发者_如何学JAVAction would simply return memory from the chunk first allocated the "normal" way. Of course if there is not enough memory available you would need to allocate a new greedy chunk of memory.
Somehow I think someone must have done this or something similar before. So my question is simply: Is this something that will speed up the memory allocation process significantly. (yes I could have tried it before asking the question but I am just to lazy to write such a thing if there is no need to do it)
All versions of malloc()
do buffering of the type you describe to some extent - they will grab a bigger chunk than the current request and use the big chunk to satisfy multiple requests, but only up to some request size. This means that multiple requests for 16 bytes at a time will only require more memory from the o/s once every 50-100 calls, or something along those general lines.
What is less clear is what the boundary size is. It might well be that they allocate some relatively small multiple of 4 KiB at a time. Bigger requests - MiB size requests - will go back to the system for more memory every time that the request cannot be satisfied from what is in the free list. That threshold is typically considerably smaller than 1 MiB, though.
Some versions of malloc()
allow you to tune their allocation characteristics, to greater or lesser extents. It has been a fertile area of research - lots of different systems. See Knuth 'The Art of Computer Programming' Vol 1 (Fundamental Algorithms) for one set of discussions.
As I was browsing the Google Chrome code some time ago, I found http://www.canonware.com/jemalloc/. It is a free, general-purpose and scalable malloc implementation.
Apparrently, it is being used in numerous projects, as it usually outperforms standard malloc's implementations in many real-world scenarios (many small allocations instead of few big ones).
Definitely worth a look!
That technique is called Slab Allocator, and most operating systems support it, but i can't find information that is it available to userspace malloc, just for kernel allocations.
You can find the paper by Jeff Bonwick here, which describes the original technique on Solaris.
Google has a greedy implementation of malloc() that does roughly what you're thinking of. It has some drawbacks, but it's very fast in many use cases.
What you're saying is probably done, I don't really know. However, I don't know that the latency in buffering your malloc() at the system level would decrease latency much. You've still got to take the time to enter into priv. mode for a system call, potentially lock kernel-level structures (which means more system calls and WAITING for the locks), and things of that nature.
If you can write your own memory manager in user-space for your program and only call malloc() when you need more memory for your pool, you may likely see a decrease in latency.
精彩评论