boost::async_write fails after writing for some time
I am having a very peculiar problem. I have written a server that forwards data it receives from a third party to connected clients. The server writes to the client(s) fine for a while, but after a while either an async_write fails or a write never completes (its completion handler is never invoked). Once a write never completes, no subsequent writes take place, and my server queues up the data it receives from the third party until everything blows up.
I have included my code below:
void ClientPartitionServer::HandleSignal(const CommonSessionMessage& message, int transferSize) {
  boost::lock_guard<boost::mutex> lock(m_mutex);
  if(m_clientSockets.size() != 0) {
    TransferToQueueBuffer(message.GetData(), transferSize);
  }
  if(m_writeCompleteFlag) {
    // TransferToWriteBuffer();
    for(vector<boost::asio::ip::tcp::socket*>::const_iterator i = m_clientSockets.begin(); i != m_clientSockets.end(); ++i) {
      WriteToClient(*i);
    }
  }
}
void ClientPartitionServer::WriteToClient(boost::asio::ip::tcp::socket* clientSocket) {
  m_writeCompleteFlag = false;
  cout << "Initiating write: " << m_identifier << endl;
  boost::asio::async_write(
    *clientSocket,
    boost::asio::buffer(m_queueBuffer.get(), m_queueBufferSize),
    boost::bind(
      &ClientPartitionServer::HandleWrite, this,
      boost::asio::placeholders::error,
      boost::asio::placeholders::bytes_transferred
    ));
}
void ClientPartitionServer::HandleWrite(const boost::system::error_code& ec, size_t bytes_transferred) {
  boost::lock_guard<boost::mutex> lock(m_mutex);
  if(ec != 0) {
    cerr << "Error writing to client: " << ec.message() << " " << m_identifier << endl;
    // return;
    cout << "HandleWrite Error" << endl;
    exit(0);
  }
  cout << "Write complete: " << m_identifier << endl;
  m_writeCompleteFlag = true;
  m_queueBuffer.reset();
  m_queueBufferSize = 0;
}
Any help would be appreciated.
Thank you.
Without seeing all the code it's hard to say, but it's a red flag to me that you hold the mutex across multiple (or even one) WriteToClient calls. Typically, holding a lock of any kind across I/O (even async, as you have here) is at best bad for performance and at worst a recipe for weird deadlocks under load. What happens if the async write completes inline and HandleWrite is called back on the same thread/callstack, for instance?
I would try to refactor this so that the lock is released during the write calls (see the sketch after the list below).
Whatever the solution turns out to be, some more general advice:
- Don't lock across I/Os.
- Add some diagnostic output: which thread calls each handler, and in what order?
- Try debugging once you hit the quiescent state. You should be able to diagnose a deadlock from the process state.
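A minimal sketch of that refactoring, reusing the member names from the question: the shared state is touched only while the mutex is held, and the writes are initiated after it has been released. (Whether m_queueBuffer can safely be shared by all of those in-flight writes is a separate question.)

void ClientPartitionServer::HandleSignal(const CommonSessionMessage& message, int transferSize) {
  std::vector<boost::asio::ip::tcp::socket*> socketsToWrite;
  {
    boost::lock_guard<boost::mutex> lock(m_mutex);
    if(!m_clientSockets.empty()) {
      TransferToQueueBuffer(message.GetData(), transferSize);
    }
    if(m_writeCompleteFlag) {
      socketsToWrite = m_clientSockets;  // snapshot the socket pointers under the lock
    }
  }  // the mutex is released here, before any I/O is started
  for(std::vector<boost::asio::ip::tcp::socket*>::const_iterator i = socketsToWrite.begin();
      i != socketsToWrite.end(); ++i) {
    WriteToClient(*i);
  }
}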
Use strands to serialize access to particular connection objects. In particular, check out strand::wrap(). To see other examples of using strands, check out a few different timer examples (though the code works for any async_*() call).
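For illustration, a sketch of what that could look like in the code from the question, assuming a new per-object m_strand member constructed from the same io_service the sockets use (everything else is the original code):

class ClientPartitionServer {
  // ...
  boost::asio::io_service::strand m_strand;  // initialize from the io_service in the constructor
};

void ClientPartitionServer::WriteToClient(boost::asio::ip::tcp::socket* clientSocket) {
  m_writeCompleteFlag = false;
  boost::asio::async_write(
    *clientSocket,
    boost::asio::buffer(m_queueBuffer.get(), m_queueBufferSize),
    m_strand.wrap(boost::bind(  // handlers wrapped by the same strand never run concurrently
      &ClientPartitionServer::HandleWrite, this,
      boost::asio::placeholders::error,
      boost::asio::placeholders::bytes_transferred)));
}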
First of all, I don't agree with the comments indicating that holding locks across an async operation is a problem. Holding locks across:
- any function that invokes callbacks is bad;
- any blocking operation is bad.
async_write explicitly guarantees to neither block nor call the handler, so it looks fine to me to hold the lock.
However, I can see a bug in your code that violates another requirement that async_write has: you are not allowed to call async_write again on a socket until the previous write's completion handler has been invoked. That's what you violate.
The m_writeCompleteFlag is set to true whenever any one of the handlers has been invoked. This means that under high load you are likely to violate the async_write rule for some of the other N-1 sockets.
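The usual fix is to give each socket its own outgoing queue and keep at most one async_write in flight per socket, starting the next write from the completion handler. A minimal, self-contained sketch; the Connection type and every name in it are assumptions for illustration, not the code from the question:

#include <deque>
#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/ref.hpp>

struct Connection {
  boost::asio::ip::tcp::socket socket;
  std::deque<std::vector<char> > pending;  // messages waiting to be written
  explicit Connection(boost::asio::io_service& io) : socket(io) {}
};

void StartWrite(Connection& c);

void HandleWrite(Connection& c, const boost::system::error_code& ec, std::size_t /*bytes*/) {
  if(ec) { return; }      // report the error and drop the connection here
  c.pending.pop_front();  // the front message is now fully written
  if(!c.pending.empty()) {
    StartWrite(c);        // chain the next write; never two in flight per socket
  }
}

void StartWrite(Connection& c) {
  boost::asio::async_write(
    c.socket,
    boost::asio::buffer(c.pending.front()),  // front element stays alive until the handler runs
    boost::bind(&HandleWrite, boost::ref(c),
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
}

void QueueWrite(Connection& c, const std::vector<char>& data) {
  bool writeInProgress = !c.pending.empty();
  c.pending.push_back(data);
  if(!writeInProgress) {  // only start a write if none is outstanding on this socket
    StartWrite(c);
  }
}

In a multithreaded program, QueueWrite and HandleWrite would still need to be serialized per connection, for example with the strand approach from the other answer or with the existing mutex.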