Access violation with lockless queue in multithreaded app
I wrote a simple lockless queue based on the principles outlined in the MSDN article below and on the DXUT lock-free pipe code, also linked below:
http://msdn.microsoft.com/en-us/library/ee418650(v=vs.85).aspx
http://code.google.com/p/bullet/source/browse/trunk/Demos/DX11ClothDemo/DXUT/Optional/DXUTLockFreePipe.h?r=2127
So, I have a producer/consumer model set up where my main thread feeds rendering instructions and a rendering thread consumes available messages and issues the corresponding OpenGL calls. Things work fine if I sleep my main thread for a sufficient amount of time on each loop iteration, but if I don't sleep it long enough (or at all), I get an access violation exception:
First-chance exception at 0x00b28d9c in Engine.exe: 0xC0000005: Access violation reading location 0x00004104.
Unhandled exception at 0x777715ee in Engine.exe: 0xC0000005: Access violation reading location 0x00004104.
My call stack is:
ntdll.dll!777715ee()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
ntdll.dll!777715ee()
ntdll.dll!7776015e()
Engine.exe!RingBuffer<2048>::BeginRead(void * & ppMem=, unsigned long & BytesAvailable=) Line 52 + 0x10 bytes C++
Engine.exe!Thread::ThreadMain(void * lpParam=0x00107d94) Line 41 + 0xf bytes C++
I can't quite figure out what the problem might be. The code for my lockless queue is below:
template <uint32 BufferSize>
class RingBuffer
{
public:
    RingBuffer()
        : m_ReadOffset(0)
        , m_WriteOffset(0)
    {}
    ~RingBuffer()
    {}
    bool Empty() const
    {
        return (m_WriteOffset == m_ReadOffset);
    }
    void BeginRead(void*& ppMem, uint32& BytesAvailable)
    {
        const uint32 ReadOffset = m_ReadOffset;
        const uint32 WriteOffset = m_WriteOffset;
        AppReadWriteBarrier();
        const uint32 Slack = (WriteOffset > ReadOffset) ?
            (WriteOffset - ReadOffset) :
            (ReadOffset > WriteOffset) ?
                (c_BufferSize - ReadOffset) :
                (0);
        ppMem = (m_Buffer + ReadOffset);
        BytesAvailable = Slack;
    }
    void EndRead(const uint32 BytesRead)
    {
        uint32 ReadOffset = m_ReadOffset;
        AppReadWriteBarrier();
        ReadOffset += BytesRead;
        ReadOffset %= c_BufferSize;
        m_ReadOffset = ReadOffset;
    }
    void BeginWrite(void*& ppMem, uint32& BytesAvailable)
    {
        const uint32 ReadOffset = m_ReadOffset;
        const uint32 WriteOffset = m_WriteOffset;
        AppReadWriteBarrier();
        const uint32 Slack = (WriteOffset > ReadOffset || WriteOffset == ReadOffset) ?
            (c_BufferSize - WriteOffset) :
            (ReadOffset - WriteOffset);
        ppMem = (m_Buffer + WriteOffset);
        BytesAvailable = Slack;
    }
    void EndWrite(const uint32 BytesWritten)
    {
        uint32 WriteOffset = m_WriteOffset;
        AppReadWriteBarrier();
        WriteOffset += BytesWritten;
        WriteOffset %= c_BufferSize;
        m_WriteOffset = WriteOffset;
    }
private:
    const static uint32 c_BufferSize = NEXT_POWER_OF_2(BufferSize);
    const static uint32 c_SizeMask = c_BufferSize - 1;
private:
    byte8 m_Buffer[ c_BufferSize ];
    volatile ALIGNMENT(4) uint32 m_ReadOffset;
    volatile ALIGNMENT(4) uint32 m_WriteOffset;
};
I'm having difficulty debugging it, as the read/write offsets and buffer pointer look fine in the watch window. Unfortunately, when the app breaks, I can't watch autos/local variables from the BeginRead function. If anyone has experience with lockless programming, any help with this problem or advice in general would be greatly appreciated.
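A side note on the offset arithmetic in the class above: it declares c_SizeMask but wraps offsets with %=. Assuming NEXT_POWER_OF_2 (a macro the question does not show) really rounds up to a power of two, masking with c_BufferSize - 1 gives the same result as the modulo. The helpers below are hypothetical standalone functions, not part of the original class; they only check that equivalence.

```cpp
#include <cassert>
#include <cstdint>

// Advance a ring-buffer offset the way EndRead/EndWrite do (modulo form).
inline std::uint32_t AdvanceMod(std::uint32_t offset, std::uint32_t bytes, std::uint32_t size) {
    return (offset + bytes) % size;
}

// Same operation with a bit mask (the role c_SizeMask could play).
// Only valid when size is a power of two.
inline std::uint32_t AdvanceMask(std::uint32_t offset, std::uint32_t bytes, std::uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0); // power-of-two check
    return (offset + bytes) & (size - 1);
}
```

For example, with a 2048-byte buffer, advancing offset 2040 by 16 bytes wraps to 8 with either helper.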
You might find these articles of some interest...
Lock-Free Code: A False Sense of Security
Writing Lock-Free Code: A Corrected Queue
In the first article Herb Sutter discusses another author's implementation of a lock-free queue and points out some of the things that can go wrong. In the second article Herb shows some corrections to the original implementation.
As a learning exercise, trying to build your own lock-free queue is a pretty good idea. But for production work you'd probably be safer finding a pre-existing implementation from a reliable source and using that. For example, the Concurrency Runtime offers the concurrent_queue class.
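For comparison with the byte ring buffer in the question, here is a minimal sketch of a node-based single-producer/single-consumer queue using C++11 std::atomic. It is an independent illustration of the general linked-list-with-dummy-node approach, not a copy of the code in the articles above, and the class name NodeQueue is hypothetical. It assumes exactly one thread calls Produce() and exactly one other thread calls Consume().

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <typename T>
class NodeQueue {
    struct Node {
        T value;
        Node* next;
        explicit Node(T v) : value(std::move(v)), next(nullptr) {}
    };
    Node* first;                 // producer-only: oldest node not yet freed
    std::atomic<Node*> divider;  // shared: node just before the next item to consume
    std::atomic<Node*> last;     // shared: most recently produced node

public:
    NodeQueue() {
        Node* dummy = new Node(T());  // dummy separator node
        first = dummy;
        divider.store(dummy);
        last.store(dummy);
    }
    ~NodeQueue() {  // call only when neither thread is still using the queue
        while (first != nullptr) {
            Node* tmp = first;
            first = tmp->next;
            delete tmp;
        }
    }
    NodeQueue(const NodeQueue&) = delete;
    NodeQueue& operator=(const NodeQueue&) = delete;

    void Produce(const T& t) {
        Node* tail = last.load(std::memory_order_relaxed);  // only the producer writes last
        tail->next = new Node(t);
        last.store(tail->next, std::memory_order_release);  // publish the fully built node
        // Free nodes the consumer has already moved past.
        Node* d = divider.load(std::memory_order_acquire);
        while (first != d) {
            Node* tmp = first;
            first = first->next;
            delete tmp;
        }
    }

    bool Consume(T& result) {
        Node* d = divider.load(std::memory_order_relaxed);  // only the consumer writes divider
        if (d == last.load(std::memory_order_acquire))
            return false;                                   // queue is empty
        Node* next = d->next;
        result = next->value;                               // copy the item out
        divider.store(next, std::memory_order_release);     // hand d back to the producer
        return true;
    }
};
```

The release store of last pairs with the consumer's acquire load, so a node's contents are visible before the consumer can reach it; the release store of divider pairs with the producer's acquire load, so a node is only freed after the consumer is done with it.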
You don't have any memory fences. Accesses to volatile variables are only ordered with respect to each other, not with respect to other operations.
In C++0x, you'll be able to use std::atomic<T> to get the appropriate fences. Until then you'll need OS-specific threading APIs, such as Win32 InterlockedExchange, or a wrapper library such as boost::thread.
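As a sketch of what the std::atomic approach could look like for this ring buffer (C++11 or later), the offsets become std::atomic<uint32_t> with acquire loads and release stores, replacing volatile plus AppReadWriteBarrier(). The class name AtomicRingBuffer is hypothetical, and it is not a drop-in replacement for the question's class (it does not use byte8, ALIGNMENT or NEXT_POWER_OF_2). One deliberate design choice differs from the original: one byte is always left unused, so equal offsets can only mean "empty", never "full".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

template <std::uint32_t Size>
class AtomicRingBuffer {
    static_assert(Size >= 2 && (Size & (Size - 1)) == 0, "Size must be a power of two");
    static constexpr std::uint32_t kMask = Size - 1;

    unsigned char m_Buffer[Size];
    std::atomic<std::uint32_t> m_ReadOffset{0};   // written only by the consumer
    std::atomic<std::uint32_t> m_WriteOffset{0};  // written only by the producer

public:
    // Consumer: contiguous bytes readable starting at ppMem.
    void BeginRead(void*& ppMem, std::uint32_t& bytesAvailable) {
        const std::uint32_t r = m_ReadOffset.load(std::memory_order_relaxed);
        const std::uint32_t w = m_WriteOffset.load(std::memory_order_acquire);  // see producer's bytes
        bytesAvailable = (w >= r) ? (w - r) : (Size - r);
        ppMem = m_Buffer + r;
    }
    void EndRead(std::uint32_t bytesRead) {
        assert(bytesRead < Size);
        const std::uint32_t r = m_ReadOffset.load(std::memory_order_relaxed);
        // Release: our reads of the data finish before the producer may reuse that space.
        m_ReadOffset.store((r + bytesRead) & kMask, std::memory_order_release);
    }
    // Producer: contiguous bytes writable starting at ppMem.
    void BeginWrite(void*& ppMem, std::uint32_t& bytesAvailable) {
        const std::uint32_t w = m_WriteOffset.load(std::memory_order_relaxed);
        const std::uint32_t r = m_ReadOffset.load(std::memory_order_acquire);  // see freed space
        if (r > w)
            bytesAvailable = r - w - 1;                   // stop one byte short of the reader
        else
            bytesAvailable = Size - w - (r == 0 ? 1 : 0); // up to the end; keep a byte free if reader is at 0
        ppMem = m_Buffer + w;
    }
    void EndWrite(std::uint32_t bytesWritten) {
        assert(bytesWritten < Size);
        const std::uint32_t w = m_WriteOffset.load(std::memory_order_relaxed);
        // Release: the bytes written into m_Buffer become visible before the new offset.
        m_WriteOffset.store((w + bytesWritten) & kMask, std::memory_order_release);
    }
};
```

The producer thread calls only BeginWrite/EndWrite and the consumer thread only BeginRead/EndRead; each offset has a single writer, so plain load/store pairs are enough and no read-modify-write operations are needed.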
OK, I see that AppReadWriteBarrier is supposed to provide a memory fence. How is it implemented?
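The distinction behind that question can be sketched as follows. The definitions here are hypothetical, since the question does not show how AppReadWriteBarrier() is defined. A compiler-only barrier (MSVC's _ReadWriteBarrier intrinsic is one) stops the compiler from reordering memory accesses but emits no CPU fence. A thread fence in C++11 establishes cross-thread ordering when paired with atomic loads and stores, as in the publish/consume pair below.

```cpp
#include <atomic>
#include <cassert>

// Compiler-only barrier: no memory accesses are moved across this point by
// the compiler, but no CPU fence instruction is emitted.
inline void CompilerOnlyBarrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Publishing data with fences: a release fence before a relaxed store, paired
// with an acquire fence after a relaxed load that observes that store, makes
// the plain write to g_payload visible to the consuming thread.
int g_payload = 0;
std::atomic<bool> g_ready(false);

void Publish(int value) {
    assert(!g_ready.load(std::memory_order_relaxed));    // one-shot flag: publish only once
    g_payload = value;                                    // plain (non-atomic) write
    std::atomic_thread_fence(std::memory_order_release);  // payload write ordered before the flag
    g_ready.store(true, std::memory_order_relaxed);
}

bool TryConsume(int& out) {
    if (!g_ready.load(std::memory_order_relaxed))
        return false;                                     // nothing published yet
    std::atomic_thread_fence(std::memory_order_acquire);  // flag read ordered before the payload read
    out = g_payload;
    return true;
}
```

Replacing the two atomic_thread_fence calls with CompilerOnlyBarrier() would keep the compiler from reordering the accesses, but under the C++ memory model it would no longer guarantee that the consumer sees the payload.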