Queue Algorithms
I am trying to make a linked list based queue for fast operations, now that's what i have:
template< typename T,
template< typename > class Allocator = default_allocator
>
class fast_queue
{
public:
typedef T element_type;
typedef T* element_ptr;
private:
typedef struct e_node
{
element_type val;
e_node * next;
}node_type;
typedef node_type * node_ptr;
no开发者_运维知识库de_ptr parent;
node_ptr child;
int m_size;
typedef Allocator< node_type > allocator_type;
public:
fast_queue()
: parent( 0 )
, child( 0 )
, m_size( 0 )
{
}
~fast_queue() {
this->clear();
}
void push(element_type& i)
{
node_ptr n = allocator_type::alloc();
n->val = i;
n->next = (node_ptr)0;
if(child) {
child->next = n;
} else {
parent = n;
}
child = n;
m_size++;
}
element_type pop()
{
element_type ret = parent->val;
node_ptr p = parent;
parent = parent->next;
m_size--;
allocator_type::dealloc(p);
return ret;
}
inline int size() const
{
return m_size;
}
inline bool empty() const
{
return (m_size == 0);
}
void clear()
{
while(!empty()) {
pop();
}
child = 0;
}
};
Pretty straightforward, now what i'm having a problem with is the clear() function. It seems to be taking way too much time deallocating all the nodes in the queue(7 seconds). So the question is, what might be a better algorithm? I tried to understand MSVC's implementation of std::deque but the code is so bulky for me to understand.
EDIT: The queue should be generic, allowing arbitrary types of data to be queued. Here is my testing code (Windows)
DWORD time1 = timeGetTime();
fast_queue<int> queue;
for(int i = 0; i < 100000; i++) {
queue.push(i);
}
queue.clear();
cout << "Test time: " << (int)(timeGetTime() - time1) << " milliseconds" << endl;
You're constructing a linked list. The deque implementation stores many, many elements in each allocation. You, however, allocate and deallocate individually for each element. This is why your queue is so slow.
In addition to this, the Standard's queue interface says that you should take a complete Allocator type, not a template, although the reality of fulfilling the Standard's allocator requirements are that it must be a template anyway.
There's not much you can do by changing the push/pop/clear algorithms, because 95% of the time goes to allocation and deallocation of the nodes. But there are some things you could do:
1) Use some kind of memory pool for the nodes. You could either use a pool allocator (boost::pool_alloc is a good one if you don't want to implement your own), or you could use an internal node cache in the queue class. So instead of deleting nodes you just push it to the node cache, and when creating nodes you pop them from the cache.
2) Store multiple items in one node. For example if you have 8 items in one node you only have to alloc/dealloc once every 8 pushes/pops. Of course this requires slightly more complicated code; in addition to having pointers to the head and tail nodes, you would also need index variables for both of them to keep track of how many items are actually in use.
I had to take out all the Allocator stuff to get it to compile (under g++ 4.40), but it runs in no time at all. Even if I push 100 times more elements, it only takes about half a second to populate the queue and half a second to clear it. Have you tried using new and delete?
As suggested by other people, memory allocation/deallocation is the largest performance problem here.
I suggest that you try boost::circular_buffer. If the default size is set high enough, it will only cause one memory allocation over its lifetime.
精彩评论