
boost::async_write fails after writing for some time

I am having a very peculiar problem. I have written a server that forwards data it receives from a third party to connected clients. The server writes to the client(s) fine for a while, but eventually async_write either fails or a write never completes. In my program, if an async_write never completes, no subsequent writes take place, and the server queues up the data it receives from the third party until everything blows up.

I have included my code below:

void ClientPartitionServer::HandleSignal(const CommonSessionMessage& message, int transferSize) {
  boost::lock_guard<boost::mutex> lock(m_mutex);
  if(m_clientSockets.size() != 0) {
    TransferToQueueBuffer(message.GetData(), transferSize);
  }
  if(m_writeCompleteFlag) {
    // TransferToWriteBuffer();
    for(vector<boost::asio::ip::tcp::socket*>::const_iterator i = m_clientSockets.begin(); i != m_clientSockets.end(); ++i) {
      WriteToClient(*i);
    }
  }
}

void ClientPartitionServer::WriteToClient(boost::asio::ip::tcp::socket* clientSocket) {
  m_writeCompleteFlag = false;
  cout << "Initiating write: " << m_identifier << endl;
  boost::asio::async_write(
    *clientSocket,
    boost::asio::buffer(m_queueBuffer.get(), m_queueBufferSize),
    boost::bind(
      &ClientPartitionServer::HandleWrite, this,
      boost::asio::placeholders::error,
      boost::asio::placeholders::bytes_transferred
  ));
}

void ClientPartitionServer::HandleWrite(const boost::system::error_code& ec, size_t bytes_transferred) {
  boost::lock_guard<boost::mutex> lock(m_mutex);
  if(ec != 0) {
    cerr << "Error writing to client: " << ec.message() << " " << m_identifier << endl;
    // return;
    cout << "HandleWrite Error" << endl;
    exit(0);
  }
  cout << "Write complete: " << m_identifier << endl;
  m_writeCompleteFlag = true;
  m_queueBuffer.reset();
  m_queueBufferSize = 0;
}

Any help would be appreciated.

Thank you.


Without seeing all the code it's hard to say, but it's a red flag to me that you hold the mutex across multiple (or even one) WriteToClient calls. Typically holding a lock of any kind across I/O (even async as you have here) is at best bad for performance and at worst a recipe for weird deadlocks under load. What happens if the async write completes inline and you get called back on HandleWrite in the same thread/callstack, for instance?

I would try to refactor this so that the lock is released during the write calls.
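One way to picture that refactoring is a minimal sketch like the one below. It uses only the standard library, and Broadcaster, Sink, AddSink, SinkCount and Publish are hypothetical names that do not appear in the question's code. The idea is to copy the list of targets while holding the mutex, release it, and only then start the work on each target, so a callback that re-enters the object cannot deadlock on the same mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the server: each Sink plays the role of
// "start a write on one client".
class Broadcaster {
public:
    using Sink = std::function<void(const std::vector<char>&)>;

    void AddSink(Sink sink) {
        assert(sink && "sink must be callable");
        std::lock_guard<std::mutex> lock(m_mutex);
        m_sinks.push_back(std::move(sink));
    }

    std::size_t SinkCount() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_sinks.size();
    }

    void Publish(const std::vector<char>& data) {
        std::vector<Sink> snapshot;
        {
            // Hold the lock only long enough to copy the shared state.
            std::lock_guard<std::mutex> lock(m_mutex);
            snapshot = m_sinks;
        }
        // The lock is released here, so a sink may safely call back
        // into Broadcaster (for example SinkCount()).
        for (const Sink& sink : snapshot) {
            sink(data);
        }
    }

private:
    mutable std::mutex m_mutex;
    std::vector<Sink> m_sinks;
};
```

In the question's code the equivalent change would be to take what WriteToClient needs while holding m_mutex and call it after the lock_guard has gone out of scope. Whether that is sufficient depends on code that is not shown.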

Whatever the solution turns out to be, more general advice:

  • don't lock across I/Os
  • add some diagnostic output - what thread calls each handler, and in what order?
  • try attaching a debugger once you hit the quiescent state. You should be able to diagnose a
    deadlock from the process state.


Use strands to serialize access to particular connection objects. In particular, check out strand::wrap(). To see other examples of using strands, check out a few of the timer examples (though the code works for any async_*() call).
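To make the guarantee concrete, here is a toy model of what a strand provides: tasks posted through it never run concurrently with each other and never run re-entrantly. This is not Boost's implementation, and SerialQueue and Post are hypothetical names. With Boost.Asio you would wrap your handlers with strand::wrap() and let the library enforce the ordering.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Toy serializer: whichever thread posts into an idle queue drains it;
// every other poster just enqueues and returns.
class SerialQueue {
public:
    void Post(std::function<void()> task) {
        assert(task && "task must be callable");
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push_back(std::move(task));
            if (m_running) {
                return;  // someone is already draining; the task will run later
            }
            m_running = true;
        }
        Drain();
    }

private:
    void Drain() {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                if (m_tasks.empty()) {
                    m_running = false;
                    return;
                }
                task = std::move(m_tasks.front());
                m_tasks.pop_front();
            }
            task();  // runs outside the lock, one task at a time
        }
    }

    std::mutex m_mutex;
    std::deque<std::function<void()>> m_tasks;
    bool m_running = false;
};
```

A task that posts another task from inside itself sees the new task deferred until it returns, which is the property that keeps per-connection state consistent without an explicit mutex around each handler.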


First of all, I don't agree with the comments indicating that holding locks across an async operation is a problem.

Holding locks across:

  1. Any function that invokes callbacks is bad.

  2. Any blocking operation is bad.

async_write explicitly guarantees to neither block, nor call the handler, so it looks good to me to hold the lock.

However, I can see a bug in your code that violates another requirement of async_write. You are not allowed to call async_write on a socket again until the completion handler of the previous async_write on that socket has been invoked. That is the rule you violate.

The m_writeCompleteFlag is shared by all sockets and is set to true as soon as any one of the handlers has been invoked. This means that under high load you are likely to start a new async_write on some of the other N-1 sockets while their previous write is still in progress.
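A common way to respect that rule is to track the in-flight state per connection rather than in one shared flag. The sketch below is an assumption about how that could look, not the asker's code: ClientWriter, Send, OnWriteComplete and StartWrite are hypothetical names, and StartWrite stands in for whatever calls boost::asio::async_write on that client's socket. The class is not synchronized itself, so calls would need to be made under the existing mutex or through a strand.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// One instance per client connection.
class ClientWriter {
public:
    // Hypothetical hook: starts an asynchronous write of the given message.
    using StartWrite = std::function<void(const std::string&)>;

    explicit ClientWriter(StartWrite startWrite)
        : m_startWrite(std::move(startWrite)) {}

    // Queue a message; start a write only if none is in flight here.
    void Send(std::string message) {
        m_pending.push_back(std::move(message));
        if (!m_writeInFlight) {
            StartNext();
        }
    }

    // Call from this connection's own completion handler.
    void OnWriteComplete() {
        assert(m_writeInFlight && !m_pending.empty());
        m_pending.pop_front();  // the buffer is no longer needed
        m_writeInFlight = false;
        if (!m_pending.empty()) {
            StartNext();
        }
    }

    bool WriteInFlight() const { return m_writeInFlight; }
    std::size_t PendingCount() const { return m_pending.size(); }

private:
    void StartNext() {
        m_writeInFlight = true;
        // front() stays alive, at the same address, until OnWriteComplete().
        m_startWrite(m_pending.front());
    }

    StartWrite m_startWrite;
    std::deque<std::string> m_pending;
    bool m_writeInFlight = false;
};
```

With one such object per socket, a slow client only delays its own queue, and no socket ever has two writes outstanding at once.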
