Custom malloc for lots of small, fixed size blocks?
I need to allocate and free lots of fixed-size, small (16-byte) blocks of memory, in no fixed order. I could just call malloc and free for each, but that will probably be very inefficient. A better solution would probably be to call malloc and free for larger blocks, and handle the allocation within those blocks myself.
The question is, how best to do this?
It seems that this should not be a very unusual or rare problem, and that it should have been "solved", but I can't seem to find anything. Any pointers?
To clarify, I am aware that memory pool libraries and the like exist, but these also take a size parameter. If the size is constant, different options for more efficient algorithms become available; are there any implementations of these?
You're right, it's a common problem [Edit: how to do fixed-size allocation, I mean. "malloc is slowing down my application" is less common than you might think].
If your code is too slow and malloc is a plausible culprit, then a simple cell allocator (or "memory pool") might improve things. You can almost certainly find one somewhere, or it's easy to write:
Allocate a large block, and put a singly-linked list node at the start of each 16-byte cell. Link them all together. To allocate, take the head off the list and return it. To free, add the cell to the head of the list. Of course if you try to allocate and the list is empty, then you have to allocate a new large block, divide it into cells, and add them all to the free list.
You can avoid that big up-front work if you want. When you allocate a big block, just store a pointer to the end of it. To allocate, move your pointer back 16 bytes through the block and return the new value. Unless it was already at the start of the block[*], of course. If that happens, and the free list is also empty, you need a new large block. Free doesn't change - just add the node to the free list.
You have an option whether to deal out of the block first, and check the free list if that's exhausted, or to check the free list first, and deal off the block if that's empty. I don't know which tends to be faster - the good thing about a last-in-first-out free list is that it's cache-friendly, since you're using memory that was used recently, so I'd probably try that first.
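The scheme described above can be sketched in a few dozen lines of C. This is a minimal illustration under assumed names (pool, pool_alloc, pool_free, the block size), not a drop-in implementation, and it omits the per-block list you would need in order to destroy the allocator:

```c
#include <stdlib.h>

#define CELL_SIZE        16
#define CELLS_PER_BLOCK  1024

/* A free cell doubles as a list node; allocated cells carry no overhead. */
typedef struct cell { struct cell *next; } cell;

typedef struct pool {
    cell *free_list;  /* LIFO list of freed cells */
    char *block;      /* current large block */
    char *cursor;     /* deal-out pointer, moves backwards through the block */
} pool;

static void *pool_alloc(pool *p) {
    /* check the free list first: recently freed memory is cache-warm */
    if (p->free_list) {
        cell *c = p->free_list;
        p->free_list = c->next;
        return c;
    }
    /* otherwise deal a cell off the current block */
    if (p->block == NULL || p->cursor == p->block) {
        /* note: exhausted blocks are leaked in this sketch; a real
           allocator would keep them on a list so it can free them */
        p->block = malloc(CELL_SIZE * CELLS_PER_BLOCK);
        if (p->block == NULL) return NULL;
        p->cursor = p->block + CELL_SIZE * CELLS_PER_BLOCK;
    }
    p->cursor -= CELL_SIZE;
    return p->cursor;
}

static void pool_free(pool *p, void *ptr) {
    cell *c = ptr;
    c->next = p->free_list;  /* push onto the free list */
    p->free_list = c;
}
```

Initializing with `pool p = {0};` is enough; the first call to pool_alloc grabs the first large block lazily.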
Note that the list node is not needed while the cell is allocated, so there's essentially zero overhead per cell. Quite aside from speed, this is likely to be an advantage over malloc or other general-purpose allocators.
Do be aware that dropping the whole allocator is pretty much the only way to release memory back to the system, so users who are planning to allocate a lot of cells, use them, and free them all should create their own allocator, use it, and then destroy it. Both for performance (you don't have to free all the cells) and to prevent the fragmentation-style effect where a whole block must be kept if any of its cells are in use. If you can't do this, your memory use will be the high-water mark of the time your program has been running. For some programs that's a problem (for instance a long-running program with occasional big spikes in memory use, on a system where memory is constrained). For others it's absolutely fine (for instance if the number of cells in use increases until very near the end of the program, or fluctuates within a range where you really don't care that you're using more memory than you strictly could). For some it's actively desirable (if you know how much memory you're going to use, you can allocate it all up-front and not have to worry about failures). For that matter, some implementations of malloc have difficulty releasing memory from the process back to the OS.
[*] Where "start of the the block" probably means "the start of the block, plus the size of some node used to maintain a list of all blocks, so they can all be freed when the cell allocator is destroyed".
Before embarking on the onerous task of re-writing malloc, the standard advice applies: profile your code, and make sure that this is actually a problem!
The best way to do this is to not assume it will be inefficient. Instead, try out the solution with malloc, measure the performance, and prove that it is either efficient or not. Only once it's proven to be inefficient (which it likely won't be) should you move to a custom allocator. Without the proof you'll never know if your solution is actually faster or not.
For your requirement your custom allocator would be really simple. Just calloc a large array of memory:
calloc(N, 16)
and then you can hand out array entries. To track which array locations are in use you could use a simple bitmap, and then with a few clever bit operations and some pointer subtraction your custom malloc/free operations should be pretty easy. If you run out of space you can just realloc some more, but having a suitable fixed default would be a little easier.
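As a sketch of that bitmap idea (all names here are invented for illustration): one bit per 16-byte slot, a first-fit scan in alloc, and pointer subtraction in free.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLOT_SIZE 16
#define N_SLOTS   1024                 /* a "suitable fixed default" */

typedef struct {
    char    *mem;                      /* N_SLOTS * SLOT_SIZE bytes */
    uint32_t used[N_SLOTS / 32];       /* one bit per slot */
} bmpool;

static bmpool *bm_create(void) {
    bmpool *p = calloc(1, sizeof *p);
    if (p && (p->mem = calloc(N_SLOTS, SLOT_SIZE)) == NULL) {
        free(p);
        p = NULL;
    }
    return p;
}

static void *bm_alloc(bmpool *p) {
    for (size_t w = 0; w < N_SLOTS / 32; w++) {
        if (p->used[w] != 0xFFFFFFFFu) {            /* word has a free bit */
            int b = 0;
            while (p->used[w] & (1u << b)) b++;     /* first clear bit */
            p->used[w] |= 1u << b;
            return p->mem + (w * 32 + b) * SLOT_SIZE;
        }
    }
    return NULL;  /* full; a real version might realloc a bigger pool */
}

static void bm_free(bmpool *p, void *ptr) {
    size_t slot = (size_t)((char *)ptr - p->mem) / SLOT_SIZE;
    p->used[slot / 32] &= ~(1u << (slot % 32));
}
```

The linear scan is fine for a pool this small; a faster version would remember the last word with a free bit, or use a find-first-set intrinsic.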
Though really, you should just use malloc first. malloc creates pools of free memory blocks of different sizes; I would bet that there is a pool for 16-byte memory blocks (different implementations may or may not do this, but it's a pretty common optimization), and since all your allocations are the same size, fragmentation should not be an issue. (Plus, debugging your own allocator might be a bit of a nightmare.)
What you're looking for is called a memory pool. There are existing implementations, although it's not hard (and good practice) to make your own.
The simplest implementation for a pool of same-sized data is just a wrapper containing a buffer of n*size, and a stack of n pointers. "malloc" from the pool pops the pointer off the top. "free" to the pool puts the pointer back into the stack.
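A hypothetical rendering of that wrapper (the sp_* names are invented here): the stack starts holding a pointer to every slot, "malloc" pops and "free" pushes.

```c
#include <stdlib.h>

typedef struct {
    char  *buffer;  /* n * size bytes of raw storage */
    void **stack;   /* pointers to the currently free slots */
    size_t top;     /* number of free slots */
} stack_pool;

static stack_pool *sp_create(size_t n, size_t size) {
    stack_pool *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->buffer = malloc(n * size);
    p->stack  = malloc(n * sizeof *p->stack);
    if (!p->buffer || !p->stack) {
        free(p->buffer); free(p->stack); free(p);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)   /* initially every slot is free */
        p->stack[i] = p->buffer + i * size;
    p->top = n;
    return p;
}

/* "malloc" pops the top pointer; "free" pushes it back */
static void *sp_alloc(stack_pool *p) {
    return p->top ? p->stack[--p->top] : NULL;
}

static void sp_free(stack_pool *p, void *ptr) {
    p->stack[p->top++] = ptr;
}
```

Both operations are O(1), and like the free-list approach, the most recently freed slot is handed out first, which is friendly to the cache.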
You could try overriding malloc/free with an alternative implementation that's suited to lots of small allocations.
Due to academic interest I was working on a solution to that problem a few days ago. The implementation is very simple but complete and you mentioned that you are looking for a drop-in replacement, so I think my implementation could work for you.
Basically it works like patros described, except that it automatically requests more memory if there are no free blocks any more. The code was tested with a large linked list (about 6 million nodes, each 16 bytes in size) against a naive malloc()/free() scheme and performed roughly 15% faster. So presumably it is usable for your purposes. It is easy to adjust it to different block sizes, since the block size is specified when creating such a large chunk of memory.
The code is available on github: challoc
Example usage:
int main(int argc, char** argv) {
    struct node {
        int data;
        struct node *next, *prev;
    };
    // reserve memory for a large number of nodes
    // at the moment that's three calls to malloc()
    ChunkAllocator* nodes = chcreate(1024 * 1024, sizeof(struct node));
    // get a first node from the buffer
    struct node* head = challoc(nodes);
    head->data = 0;
    struct node* cur = head;
    int i;
    // this loop will be fast, since no additional
    // calls to malloc are necessary
    for (i = 1; i < 1024 * 1024; i++) {
        cur->next = challoc(nodes);
        cur->next->prev = cur;
        cur = cur->next;
        cur->data = i;
    }
    // the next call to challoc(nodes) would
    // create a new buffer to hold double
    // the amount `nodes' currently holds

    // do something with a few nodes here

    // put a single node back into the buffer
    chfree(nodes, head);
    // mark the complete buffer as `empty'
    // this also affects any additional
    // buffers that have been created implicitly
    chclear(nodes);
    // give all memory back to the OS
    chdestroy(nodes);
    return 0;
}
Wilson, Johnstone, Neely, and Boles wrote a nice paper surveying all sorts of different allocators.
In my experience, the performance and overhead difference between a good fixed pool allocator and just relying on dlmalloc can be massive in cases where you're making lots and lots of short-lived small allocations in a limited address space (such as a system with no page file). In the app I'm working on at this moment, our main loop jumps from 30ms to >100ms if I replace our block allocator with simple calls to malloc() (and it eventually crashes due to fragmentation).
The following code is pretty ugly, but its purpose is not beauty but to find out how big the block allocated by malloc really is.
I asked for 4 bytes, and malloc requested and received 135160 bytes from the OS.
#include <stdio.h>
#include <malloc.h>

int main()
{
    int* mem = (int*) malloc( sizeof(int) ) ;
    if (mem == 0) return 1;

    long i = 1L;
    // deliberately write past the 4-byte allocation,
    // one int at a time, until the OS finally faults
    while (i)
    {
        mem[i-1] = i;
        printf("block is %ld bytes\n", sizeof(int) * i++);
    }

    free(mem);  // never reached
    return 0;
}
$ g++ -o file file.cpp
$ ./file
...
block is 135144 bytes
block is 135148 bytes
block is 135152 bytes
block is 135156 bytes
block is 135160 bytes
Segmentation fault
This malloc is serious business.
realloc doesn't do any system call if the requested size is less than what it has available due to internal pooling.
After realloc has copied the memory to a larger zone, it neither destroys the previous block nor returns it to the system immediately. The old block can still be accessed (totally unsafe, of course).
With all this, it doesn't make sense to me that someone would need an additional memory pool.