C++ custom allocator that utilizes a underlying memory pool
I'm using a memory pool class which reuses allocated memory addresses and a custom allocator which wraps that class. The following code snippet gives you a basic idea of the interface.
template<class alloc>
class memory_pool
: boost::noncopyable,
public allocator_traits<void>
{
public:
memory_pool(typename alloc::size_type alloc_size);
memory_pool(typename alloc::size_type alloc_size, alloc const&);
template<typename U> memory_pool(typename alloc::size_type alloc_size,
typename alloc::rebind<U>::other const&);
virtual ~memory_pool();
pointer allocate (); /*throw(std::bad_alloc)*/
void collect ();
void deallocate(pointer) throw(); /*noexcept*/
};
pointer allocate()
{/*
Checks if a suitable chunk of memory is available in a internal linked list.
If true, then the chunk is returned and the next chunk moves up.
Otherwise, new memory is allocated by the underlying allocator.
*/}
void deallocate(pointer)
{/*
Interprets the passed pointer as a chunk of memory and stores it in a linked list.
Please note that memory isn't actually deallocated.
*/}
void collect()
{/*
Effectively deallocates开发者_开发问答 the cunks in the linked list.
This will be called at least once during destruction.
*/}
Sure, the need for something like this is limitated. However, it's very useful in situations where you need to: - Specifiy a allocator type for a class which uses that allocator in a very naive way (e.g. Avoids allocation of larger pieces even if it would be advisable). - Allocate and deallocate repeatedly the same size of memory. - The type you wish to use the allocator for is of very small size (e.g. built-in types like char, short, int etc.).
Theoretically, an implementation could take advantage of a memory_pool which allocates a multiple of the actual allocation size, each time it needs to do it (from the underlying memory manager). Objects that are close together are more suitable for any cache and / or prefetching algorithm. I've implemented such a memory pool with some overhead to handle the correct allocation, splitting and deallocation (We cannot deallocate each address the user will pass to deallocate. We need to deallocate only that addresses which are the beginning of each memory block we have been previously allocated).
I've tested both cases with the following really simple code:
std::list<int, allocator<int>> list;
std::clock_t t = std::clock();
for (int i = 0; i < 1 << 16; ++i)
{
for (int j = 0; j < 1 << 16; ++j)
list.push_back(j);
list.unique();
for (int j = 0; j < 1 << 16; ++j)
list.pop_back();
}
std::cout << (std::clock() - t) / CLOCKS_PER_SEC << std::endl;
std::list
is calling allocactor::allocate(1, 0)
each time push_back
is called. unique()
makes sure that each element will be touched and compared to the next element.
However, the result was disappointing. The minimal overhead which is needed to manage the blockwise allocating memory pool is greater than any possible advantage the system gets.
Can you think of a scenario where it will improve performance?
EDIT:
Of course, it's much faster than std::allocator
.
C++0x has better support for scoped allocators such as memory pools.
Profile your code., it's very hard to predict what advantages this will confer unless your algorithm performs very regular allocation/deallocation patterns such as LIFO.
It's dead easy to write a very fast allocator when all the allocated objects are the same size. Once I wrote something along the lines of
template <size_t ObjectSize> class allocator {
// ...
};
template <typename T> class allocator : public allocator <sizeof (T)> {
// ...
};
Before you can design an allocator you need to be sure what will be allocated and how. The answers for operator new
are "anything" and "anyhow", which is why it's sometimes sub-optimal. If you can't answer these questions properly, your allocator probably won't be a big improvement.
Can you think of a scenario where it will improve performance?
Something that does a lot (10k+ per sec) of allocations and deallocations would benefit from not having to run into complex queries every time to allocs/frees, however, its only if the combined saving on delaying the allocs/frees into groups is more than it takes to handle a group (basically you need to amortize the group over head with the per-unit saving).
the use of contiguous memory would help any node/pointer based structures like trees (but only to a point). however, the real world benefits can be drastically different (or non-existent!) from what they where planned to be, which is why when walking down the road of creating custom systems like this, you should be profiling your code and already have a picture in mind of how it will be used (ie: there is no point in making a custom pool allocator for something that does so few allocations that the speed gain doesn't matter at all).
Something like this can be handy for debugging however, as you have a nice interface for tagging memory for monitoring leaks and overwrites/invalid writes, so even if it has the same performance as the standard system, you can gain in other ways.
精彩评论