Critique my concurrent queue [closed]
This is a concurrent queue I wrote that I plan on using in a thread pool I'm also writing. I'm wondering if there are any performance improvements I could make. atomic_counter is pasted below if you're curious!
#ifndef NS_CONCURRENT_QUEUE_HPP_INCLUDED
#define NS_CONCURRENT_QUEUE_HPP_INCLUDED

#include <ns/atomic_counter.hpp>
#include <boost/noncopyable.hpp>
#include <boost/smart_ptr/detail/spinlock.hpp>
#include <cassert>
#include <cstddef>

namespace ns {

template<typename T,
         typename mutex_type = boost::detail::spinlock,
         typename scoped_lock_type = typename mutex_type::scoped_lock>
class concurrent_queue : boost::noncopyable {
    struct node {
        node * link;
        T const value;
        explicit node(T const & source) : link(0), value(source) { }
    };
    node * m_front;
    node * m_back;
    atomic_counter m_counter;
    mutex_type m_mutex;
public:
    // types
    typedef T value_type;
    // construction
    concurrent_queue() : m_front(0), m_back(0), m_mutex() { }
    ~concurrent_queue() { clear(); }
    // capacity
    std::size_t size() const { return m_counter; }
    bool empty() const { return (m_counter == 0); }
    // modifiers
    void push(T const & source);
    bool try_pop(T & destination);
    void clear();
};
template<typename T, typename mutex_type, typename scoped_lock_type>
void concurrent_queue<T, mutex_type, scoped_lock_type>::push(T const & source) {
    node * hold = new node(source);  // allocate outside the lock
    scoped_lock_type lock(m_mutex);
    if (empty())
        m_front = hold;
    else
        m_back->link = hold;
    m_back = hold;
    ++m_counter;
}
template<typename T, typename mutex_type, typename scoped_lock_type>
bool concurrent_queue<T, mutex_type, scoped_lock_type>::try_pop(T & destination) {
    node const * hold;
    {
        scoped_lock_type lock(m_mutex);
        if (empty())
            return false;
        hold = m_front;
        if (m_front == m_back)
            m_front = m_back = 0;
        else
            m_front = m_front->link;
        --m_counter;
    }
    // copy and delete outside the lock
    destination = hold->value;
    delete hold;
    return true;
}
template<typename T, typename mutex_type, typename scoped_lock_type>
void concurrent_queue<T, mutex_type, scoped_lock_type>::clear() {
    node * hold;
    {
        scoped_lock_type lock(m_mutex);
        hold = m_front;
        m_front = 0;
        m_back = 0;
        m_counter = 0;
    }
    // walk the detached list outside the lock
    while (hold != 0) {
        node * it = hold;
        hold = hold->link;
        delete it;
    }
}
} // namespace ns
#endif
atomic_counter.hpp
#ifndef NS_ATOMIC_COUNTER_HPP_INCLUDED
#define NS_ATOMIC_COUNTER_HPP_INCLUDED

#include <boost/cstdint.hpp>
#include <boost/interprocess/detail/atomic.hpp>
#include <boost/noncopyable.hpp>

namespace ns {

class atomic_counter : boost::noncopyable {
    volatile boost::uint32_t m_count;
public:
    explicit atomic_counter(boost::uint32_t value = 0) : m_count(value) { }
    operator boost::uint32_t() const {
        // const_cast because the atomic functions take a non-const pointer
        return boost::interprocess::detail::atomic_read32(const_cast<volatile boost::uint32_t *>(&m_count));
    }
    void operator=(boost::uint32_t value) {
        boost::interprocess::detail::atomic_write32(&m_count, value);
    }
    void operator++() {
        boost::interprocess::detail::atomic_inc32(&m_count);
    }
    void operator--() {
        boost::interprocess::detail::atomic_dec32(&m_count);
    }
};

} // namespace ns
#endif
I think you will run into performance problems with a linked list in this case because you call new for each new node. And this isn't just because the dynamic memory allocator is slow: calling it frequently also introduces a lot of concurrency overhead, because the free store has to be kept consistent in a multi-threaded environment.
I would use a vector that you resize to be larger when it's too small to hold the queue, and I would never resize it smaller.
I would arrange the front and back indices so the vector works as a ring buffer. This will require you to move elements when you resize, but that should be a fairly rare event and can be mitigated to some extent by giving a suggested vector size at construction.
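For concreteness, here is a minimal sketch of that idea (all names are hypothetical, and it is not thread-safe on its own; you would wrap calls in the same scoped_lock your current queue uses):

#include <cstddef>
#include <vector>

template<typename T>
class ring_queue {
    std::vector<T> m_buffer;
    std::size_t m_head;  // index of the oldest element
    std::size_t m_size;  // number of elements currently stored
public:
    explicit ring_queue(std::size_t hint = 64)
        : m_buffer(hint ? hint : 1), m_head(0), m_size(0) { }

    void push(T const & value) {
        if (m_size == m_buffer.size())
            grow();
        m_buffer[(m_head + m_size) % m_buffer.size()] = value;
        ++m_size;
    }

    bool pop(T & out) {
        if (m_size == 0)
            return false;
        out = m_buffer[m_head];
        m_head = (m_head + 1) % m_buffer.size();
        --m_size;
        return true;
    }

private:
    void grow() {
        // Double the capacity and unwrap the ring so the elements
        // sit contiguously at the front of the new buffer.
        std::vector<T> larger(m_buffer.size() * 2);
        for (std::size_t i = 0; i < m_size; ++i)
            larger[i] = m_buffer[(m_head + i) % m_buffer.size()];
        m_buffer.swap(larger);
        m_head = 0;
    }
};

Growth copies every element once, but amortized across pushes that cost is small, and no per-element allocation ever happens.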
Alternatively, you could keep the linked-list structure but never destroy a node; just add it to a list of free nodes instead. Unfortunately, the free list is going to require locking to manage properly, and I'm not sure you'd really be in a better place than if you called delete and new all the time.
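A rough sketch of what that might look like, assuming the queue gains two hypothetical members (node * m_free and a second mutex m_free_mutex) and node::value loses its const so nodes can be reused:

// Hypothetical helpers added to concurrent_queue; push would call
// acquire instead of new, and try_pop would call release instead of delete.
node * acquire(T const & source) {
    node * hold = 0;
    {
        scoped_lock_type lock(m_free_mutex);
        if (m_free) {                 // reuse a retired node if we have one
            hold = m_free;
            m_free = m_free->link;
        }
    }
    if (hold) {
        hold->link = 0;
        hold->value = source;         // requires node::value to be non-const
        return hold;
    }
    return new node(source);          // free list empty: fall back to new
}

void release(node * hold) {
    scoped_lock_type lock(m_free_mutex);
    hold->link = m_free;
    m_free = hold;
}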
You will also get better locality of reference with a vector, though I'm not positive how that will interact with cache lines having to shuttle back and forth between CPUs.
Some others suggest a ::std::deque, and I don't think that's a bad idea, but I suspect the ring-buffer vector is a better one.
Herb Sutter proposed an implementation of a lock-free queue that would surely outperform yours :)
The main idea is to use a ring buffer, forgoing dynamic memory allocation altogether while the queue is running. This means the queue can become full (so you may have to wait to put an element in), which may not be acceptable in your case.
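That article targets the single-producer/single-consumer case. Here is a sketch of that variant, written with C++11 std::atomic rather than the Boost-era primitives above (the class name and the fixed capacity N are my assumptions, not Sutter's code):

#include <atomic>
#include <cstddef>

template<typename T, std::size_t N>
class spsc_ring {
    T m_slots[N];
    std::atomic<std::size_t> m_head; // next slot to read  (written only by consumer)
    std::atomic<std::size_t> m_tail; // next slot to write (written only by producer)
public:
    spsc_ring() : m_head(0), m_tail(0) { }

    bool try_push(T const & value) {   // producer thread only
        std::size_t tail = m_tail.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % N;
        if (next == m_head.load(std::memory_order_acquire))
            return false;              // full: one slot is kept empty
        m_slots[tail] = value;
        m_tail.store(next, std::memory_order_release);
        return true;
    }

    bool try_pop(T & out) {            // consumer thread only
        std::size_t head = m_head.load(std::memory_order_relaxed);
        if (head == m_tail.load(std::memory_order_acquire))
            return false;              // empty
        out = m_slots[head];
        m_head.store((head + 1) % N, std::memory_order_release);
        return true;
    }
};

Because each index is written by only one thread, no lock is needed; one slot is always left empty so that head == tail unambiguously means "empty".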
As Omnifarious noted, it would be better not to use a linked list (for cache locality) unless you allocate the nodes from a pool. I would try using a std::deque as a backend: it's much more memory-friendly, and it guarantees there won't be any reallocation as long as you only push and pop at the front and back, which is normally the case for a queue.