Socket server stops accepting connections after a period of time
We have an async socket server written in C#, running on Windows Web Server 2008.
It works flawlessly up until it stops accepting new connections for an unknown reason.
We have about 200 concurrent connections on average, and we keep a count of both connections created and connections dropped. These counts can reach as high as 10,000 or as low as 1,000 before it just stops. It sometimes runs for around 8 hours before stopping, and sometimes for only about half an hour. At the moment it runs for about an hour before another application of ours, when it can't connect, brings it back up automatically (not exactly ideal).
It doesn't appear that we're running out of sockets, as we're closing them properly. We're also logging all errors, and nothing shows up in the log immediately before it stops.
We can't figure this out. Does anyone have any ideas about what might be going on?
I can paste code, but it's generally just the same old async BeginAccept/send code you see everywhere.
Who initiates the active close, the client or the server? If it's the server, then you may be accumulating sockets in the TIME_WAIT state on the server, and this may prevent you from accepting new connections. This is more likely if the client connections can be short-lived and you go through periods when lots of short-lived client connections occur.
Oh, and if you ARE accumulating sockets in TIME_WAIT, then please don't just assume that changing the machine-wide TIME_WAIT period is the best or only solution.
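One way to check the TIME_WAIT theory from inside the process is to count connections on the listening port that are currently in that state. The following is only a sketch: the class name `TimeWaitProbe` and the port number used in the usage note are made up for illustration, and it assumes the server listens on a single known local port.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

// Hypothetical helper, not part of the original server code.
static class TimeWaitProbe
{
    // Counts TCP connections whose local port matches localPort and
    // whose state is currently TIME_WAIT.
    public static int CountTimeWait(int localPort)
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.LocalEndPoint.Port == localPort
                     && c.State == TcpState.TimeWait);
    }
}
```

Logging something like `TimeWaitProbe.CountTimeWait(9000)` (9000 being a placeholder port) once a minute would show whether the count climbs before the server stops accepting.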
I'm pretty sure OP was running into this fatal combination of issues we ran into:
- A call to SslStream.AuthenticateAsServer after accepting a connection was blocking forever, most likely due to the client dropping out after connecting, e.g., the half-open connection issue. This call issues a synchronous read under the covers, hence the possibility of blockage.
- .NET was calling the callback passed to Socket.BeginAccept synchronously on the same thread that initiated the accept, i.e., your server's listening thread. This is completely unexpected, but they do document it; see the remarks on BeginAccept.
Combining these issues, you get this series of events:
- Your main listening thread calls Socket.BeginAccept.
- .NET decides to call your accept callback synchronously on the listening thread.
- Your accept code calls SslStream.AuthenticateAsServer (or any other blocking call), and waits for a response that never comes in... bingo, your listening thread is blocked forever!
We fixed this by doing the following:
- Set a ReceiveTimeout on the socket you get after accepting a connection. This prevents SslStream.AuthenticateAsServer, or any other sync read, from blocking forever.
- Check whether the accept callback completed synchronously, and if so, turn around and manually spawn another thread to run the rest of your accept logic, so the listening thread is never tied up doing any processing. That is, pass a callback to BeginAccept that does something like this:

    private void AcceptCallbackWithSyncCheck(IAsyncResult asyncResult)
    {
        if (asyncResult.CompletedSynchronously)
        {
            // Force the accept logic to run async, to keep our listening
            // thread free.
            Action accept = () => this.ActualAcceptCallback(asyncResult);
            accept.BeginInvoke(accept.EndInvoke, null);
        }
        else
        {
            this.ActualAcceptCallback(asyncResult);
        }
    }
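The effect of the ReceiveTimeout fix can be seen in isolation with a small loopback experiment. This is a sketch rather than our production code: the class and method names are invented, and a plain Socket.Receive stands in for the synchronous read that SslStream.AuthenticateAsServer performs. With a real SslStream, I'd expect the timeout to surface as an IOException wrapping the SocketException, rather than as a bare SocketException.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical demo class, for illustration only.
static class ReceiveTimeoutDemo
{
    // Returns true if a blocking read on a connection whose client
    // never sends anything gives up with a timeout error.
    public static bool SilentClientTimesOut(int timeoutMs)
    {
        // Port 0 asks the OS to pick any free port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient())
            {
                // The client connects and then stays silent.
                client.Connect(IPAddress.Loopback, port);
                using (Socket accepted = listener.AcceptSocket())
                {
                    // Without this line, the Receive below would block forever.
                    accepted.ReceiveTimeout = timeoutMs;
                    try
                    {
                        accepted.Receive(new byte[1]);
                        return false; // unexpected: the read completed
                    }
                    catch (SocketException ex)
                    {
                        return ex.SocketErrorCode == SocketError.TimedOut;
                    }
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

In the real server, the same property would be set on the accepted socket before wrapping it in a NetworkStream and SslStream, and the resulting exception would be caught so the dead connection is closed instead of hanging the thread.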
For the curious, we figured this out by hitting the service with tons of simultaneous calls (using a client simulator), and when the problem happened, we attached to the service process with Visual Studio's remote debugging tool. This allowed us to see right away that the listening thread was blocking, and where. However, this was only after spending a couple weeks banging our heads against the wall, so, I do hope this helps the poor souls that have to deal with this in the future...
Without seeing code, it is almost impossible to hazard a guess. But I'll try anyway: one thing that comes to mind is that you might not be maintaining a reference to the listening socket, so at some point the GC collects the socket and your listening stops.
Now, of course, the fact that this sometimes runs for hours makes this an unlikely reason, but it is one that came to mind and I thought it worth mentioning.
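If you want to rule this out, a minimal sketch is to hold the listening socket in a field of an object that itself lives as long as the service does. The `ListenerHost` class below is hypothetical, and whether the GC could really collect a socket with a pending accept is only my guess; the point is just that a field (rather than a local variable in a start-up method) removes the doubt.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical wrapper: its only job is to keep the listening socket referenced.
sealed class ListenerHost : IDisposable
{
    // Held in a field so the socket stays reachable while this object is alive.
    private readonly Socket listenSocket;

    public ListenerHost(int port)
    {
        listenSocket = new Socket(AddressFamily.InterNetwork,
                                  SocketType.Stream, ProtocolType.Tcp);
        listenSocket.Bind(new IPEndPoint(IPAddress.Loopback, port));
        listenSocket.Listen(100);
    }

    // The port actually bound (useful when 0 was passed to the constructor).
    public int Port
    {
        get { return ((IPEndPoint)listenSocket.LocalEndPoint).Port; }
    }

    public void Dispose()
    {
        listenSocket.Close();
    }
}
```

The host object must in turn be referenced from something long-lived, such as a static field or the service class itself, for this to help.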