Custom malloc for lots of small, fixed size blocks?
I need to allocate and free lots of fixed-size, small (16-byte) blocks of memory, in no fixed order. I could just call malloc and free for each, but that will probably be very inefficient. A better solution would probably be to call malloc and free for larger blocks, and handle the allocation within those blocks myself.
The question is, how best to do this?
It seems that this should not be a very unusual or rare problem, and that it should have been "solved", but I can't seem to find anything. Any pointers?
To clarify, I am aware that memory pool libraries and the like exist, but these also take a size parameter. If the size is constant, different options for more efficient algorithms become available; are there any implementations of these?
You're right, it's a common problem [Edit: how to do fixed-size allocation, I mean. "malloc is slowing down my application" is less common than you might think].
If your code is too slow and malloc is a plausible culprit, then a simple cell allocator (or "memory pool") might improve things. You can almost certainly find one somewhere, or it's easy to write:
Allocate a large block, and put a singly-linked list node at the start of each 16-byte cell. Link them all together. To allocate, take the head off the list and return it. To free, add the cell to the head of the list. Of course if you try to allocate and the list is empty, then you have to allocate a new large block, divide it into cells, and add them all to the free list.
You can avoid that big up-front work if you want. When you allocate a big block, just store a pointer to the end of it. To allocate, move your pointer back 16 bytes through the block and return the new value. Unless it was already at the start of the block[*], of course. If that happens, and the free list is also empty, you need a new large block. Free doesn't change - just add the node to the free list.
You have an option whether to deal out of the block first, and check the free list if that's exhausted, or to check the free list first, and deal off the block if that's empty. I don't know which tends to be faster - the good thing about a last-in-first-out free list is that it's cache-friendly, since you're using memory that was used recently, so I'd probably try that first.
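The scheme described above can be sketched in a few dozen lines of C. This is a minimal illustration under assumed names (pool, pool_alloc, pool_free, the block size), not a drop-in implementation, and it omits the per-block list you would need in order to destroy the allocator:

```c
#include <stdlib.h>

#define CELL_SIZE        16
#define CELLS_PER_BLOCK  1024

/* A free cell doubles as a list node; allocated cells carry no overhead. */
typedef struct cell { struct cell *next; } cell;

typedef struct pool {
    cell *free_list;  /* LIFO list of freed cells */
    char *block;      /* current large block */
    char *cursor;     /* deal-out pointer, moves backwards through the block */
} pool;

static void *pool_alloc(pool *p) {
    /* check the free list first: recently freed memory is cache-warm */
    if (p->free_list) {
        cell *c = p->free_list;
        p->free_list = c->next;
        return c;
    }
    /* otherwise deal a cell off the current block */
    if (p->block == NULL || p->cursor == p->block) {
        /* note: exhausted blocks are leaked in this sketch; a real
           allocator would keep them on a list so it can free them */
        p->block = malloc(CELL_SIZE * CELLS_PER_BLOCK);
        if (p->block == NULL) return NULL;
        p->cursor = p->block + CELL_SIZE * CELLS_PER_BLOCK;
    }
    p->cursor -= CELL_SIZE;
    return p->cursor;
}

static void pool_free(pool *p, void *ptr) {
    cell *c = ptr;
    c->next = p->free_list;  /* push onto the free list */
    p->free_list = c;
}
```

Initializing with `pool p = {0};` is enough; the first call to pool_alloc grabs the first large block lazily.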
Note that the list node is not needed while the cell is allocated, so there's essentially zero overhead per cell. Quite aside from speed, this is likely to be an advantage over malloc or other general-purpose allocators.
Do be aware that dropping the whole allocator is pretty much the only way to release memory back to the system, so users who are planning to allocate a lot of cells, use them, and free them all should create their own allocator, use it, and then destroy it. Both for performance (you don't have to free all the cells) and to prevent the fragmentation-style effect where a whole block must be kept if any of its cells are in use. If you can't do this, your memory use will be the high-water mark of the time your program has been running. For some programs that's a problem (for instance a long-running program with occasional big spikes in memory use, on a system where memory is constrained). For others it's absolutely fine (for instance if the number of cells in use increases until very near the end of the program, or fluctuates within a range where you really don't care that you're using more memory than you strictly could). For some it's actively desirable (if you know how much memory you're going to use, you can allocate it all up-front and not have to worry about failures). For that matter, some implementations of malloc have difficulty releasing memory from the process back to the OS.
[*] Where "start of the the block" probably means "the start of the block, plus the size of some node used to maintain a list of all blocks, so they can all be freed when the cell allocator is destroyed".
Before embarking on the onerous task of re-writing malloc, the standard advice applies: profile your code, and make sure that this is actually a problem!
The best way to do this is to not assume it will be inefficient. Instead, try out the solution with malloc, measure the performance, and prove that it is either efficient or not. Only once it's proven to be inefficient (which it likely won't be) should you move to a custom allocator. Without the proof you'll never know if your solution is actually faster or not.
For your requirement your custom allocator would be really simple. Just calloc a large array of memory:
calloc(N, 16)
and then you can hand out array entries. To track which array locations are in use you could use a simple bitmap, and then with a few clever bit operations and some pointer subtraction your custom malloc/free operations should be pretty easy. If you run out of space you can just realloc some more, but having a suitable fixed default would be a little easier.
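As a sketch of that bitmap idea (all names here are invented for illustration): one bit per 16-byte slot, a first-fit scan in alloc, and pointer subtraction in free.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLOT_SIZE 16
#define N_SLOTS   1024                 /* a "suitable fixed default" */

typedef struct {
    char    *mem;                      /* N_SLOTS * SLOT_SIZE bytes */
    uint32_t used[N_SLOTS / 32];       /* one bit per slot */
} bmpool;

static bmpool *bm_create(void) {
    bmpool *p = calloc(1, sizeof *p);
    if (p && (p->mem = calloc(N_SLOTS, SLOT_SIZE)) == NULL) {
        free(p);
        p = NULL;
    }
    return p;
}

static void *bm_alloc(bmpool *p) {
    for (size_t w = 0; w < N_SLOTS / 32; w++) {
        if (p->used[w] != 0xFFFFFFFFu) {            /* word has a free bit */
            int b = 0;
            while (p->used[w] & (1u << b)) b++;     /* first clear bit */
            p->used[w] |= 1u << b;
            return p->mem + (w * 32 + b) * SLOT_SIZE;
        }
    }
    return NULL;  /* full; a real version might realloc a bigger pool */
}

static void bm_free(bmpool *p, void *ptr) {
    size_t slot = (size_t)((char *)ptr - p->mem) / SLOT_SIZE;
    p->used[slot / 32] &= ~(1u << (slot % 32));
}
```

The linear scan is fine for a pool this small; a faster version would remember the last word with a free bit, or use a find-first-set intrinsic.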
Though really, you should just use malloc first. malloc creates pools of free memory blocks of different sizes; I would bet that there is a pool for 16-byte memory blocks (different implementations may or may not do this, but it's a pretty common optimization), and since all your allocations are the same size, fragmentation should not be an issue. (Plus, debugging your own allocator might be a bit of a nightmare.)
What you're looking for is called a memory pool. There are existing implementations, although it's not hard (and good practice) to make your own.
The simplest implementation for a pool of same-sized data is just a wrapper containing a buffer of n*size, and a stack of n pointers. "malloc" from the pool pops the pointer off the top. "free" to the pool puts the pointer back into the stack.
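A hypothetical rendering of that wrapper (the sp_* names are invented here): the stack starts holding a pointer to every slot, "malloc" pops and "free" pushes.

```c
#include <stdlib.h>

typedef struct {
    char  *buffer;  /* n * size bytes of raw storage */
    void **stack;   /* pointers to the currently free slots */
    size_t top;     /* number of free slots */
} stack_pool;

static stack_pool *sp_create(size_t n, size_t size) {
    stack_pool *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->buffer = malloc(n * size);
    p->stack  = malloc(n * sizeof *p->stack);
    if (!p->buffer || !p->stack) {
        free(p->buffer); free(p->stack); free(p);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)   /* initially every slot is free */
        p->stack[i] = p->buffer + i * size;
    p->top = n;
    return p;
}

/* "malloc" pops the top pointer; "free" pushes it back */
static void *sp_alloc(stack_pool *p) {
    return p->top ? p->stack[--p->top] : NULL;
}

static void sp_free(stack_pool *p, void *ptr) {
    p->stack[p->top++] = ptr;
}
```

Both operations are O(1), and like the free-list approach, the most recently freed slot is handed out first, which is friendly to the cache.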
You could try overriding malloc/free with an alternative implementation that's suited to lots of small allocations.
Due to academic interest I was working on a solution to that problem a few days ago. The implementation is very simple but complete and you mentioned that you are looking for a drop-in replacement, so I think my implementation could work for you.
Basically it works like patros described, except that it automatically requests more memory if there are no free blocks any more. The code was tested with a large linked list (about 6 million nodes, each 16 bytes in size) against a naive malloc()/free() scheme and performed roughly 15% faster. So presumably it is usable for your purposes. It is easy to adjust it to different block sizes, since the block size is specified when creating such a large chunk of memory.
The code is available on github: challoc
Example usage:
int main(int argc, char** argv) {
    struct node {
        int data;
        struct node *next, *prev;
    };
    // reserve memory for a large number of nodes
    // at the moment that's three calls to malloc()
    ChunkAllocator* nodes = chcreate(1024 * 1024, sizeof(struct node));
    // get a first node from the buffer
    struct node* head = challoc(nodes);
    head->data = 0;
    struct node* cur = head;
    int i;
    // this loop will be fast, since no additional
    // calls to malloc are necessary
    for (i = 1; i < 1024 * 1024; i++) {
        cur->next = challoc(nodes);
        cur->next->prev = cur;
        cur = cur->next;
        cur->data = i;
    }
    // the next call to challoc(nodes) would
    // create a new buffer to hold double
    // the amount `nodes' currently holds

    // do something with a few nodes here

    // put a single node back into the buffer
    chfree(nodes, head);
    // mark the complete buffer as `empty'
    // this also affects any additional
    // buffers that have been created implicitly
    chclear(nodes);
    // give all memory back to the OS
    chdestroy(nodes);
    return 0;
}
Wilson, Johnstone, Neely, and Boles wrote a nice paper surveying all sorts of different allocators.
In my experience, the performance and overhead difference between a good fixed pool allocator and just relying on dlmalloc can be massive in cases where you're making lots and lots of short-lived small allocations in a limited address space (such as a system with no page file). In the app I'm working on at this moment, our main loop jumps from 30ms to >100ms if I replace our block allocator with simple calls to malloc() (and it eventually crashes due to fragmentation).
The following code is pretty ugly, but its purpose is not beauty but to find out how big the block allocated by malloc really is.
I asked for 4 bytes, and malloc requested and received 135160 bytes from the OS.
#include <stdio.h>
#include <malloc.h>

int main()
{
    int* mem = (int*) malloc( sizeof(int) ) ;
    if (mem == 0) return 1;

    long i = 1L;
    // deliberately write past the 4-byte allocation,
    // one int at a time, until the OS finally faults
    while (i)
    {
        mem[i-1] = i;
        printf("block is %ld bytes\n", sizeof(int) * i++);
    }

    free(mem);  // never reached
    return 0;
}
$ g++ -o file file.cpp
$ ./file
...
block is 135144 bytes
block is 135148 bytes
block is 135152 bytes
block is 135156 bytes
block is 135160 bytes
Segmentation fault
This malloc is serious business.
realloc doesn't do any system call if the requested size is less than what it has available due to internal pooling.
After realloc has copied the memory to a larger zone, it neither destroys the previous block nor returns it to the system immediately. The old block can still be accessed (totally unsafe, of course).
With all this, it doesn't make sense to me that someone would need an additional memory pool.