.NET client-server application freezes; need ideas to fix it!
Hello, good people of Stack Overflow,
I have a .NET client-server application with a few hundred clients. It is a platform for card and board games, and it was migrated from VB6 to .NET about a year ago. I'll give as much detail as I can below, but in short: a channel freezes when there are 40-70 players inside.
Architecture
1. Server (.NET 4.0)
- Divided into three projects: ServerNET, Listener, Channel
- The Listener acts as a login server that clients connect to first. It checks things like the client version and account info, and it lets the client choose which channel to connect to. It's basically a TcpListener in a do-while loop that listens for incoming connections forever. It is not the reason why both sides freeze.
- Channel represents a single port. Clients connect to a Channel after they are done with the Listener. This is the main part of the system. Similar to an mIRC channel, it binds together all the users inside it: most data, such as chat and the games created by other players (hosted by the server) that you can join, is sent to the people in the same channel. It is a console application and serves as a hub for players. Player info is held in a "Client" class, which includes a TcpClient and some other properties. Each client runs on its own thread and makes async calls that the server handles. These "Client" objects are held in a collection class named "ClientCollection". The Channel freezes when there are roughly 40-70 players inside; the maximum is 100 players per channel.
- ServerNET is the core and handles everything else that concerns the whole system rather than a specific channel. It is a Windows Forms application and manages things such as server options.
2. Client (.NET 2.0)
- Uses TcpClient and is mostly single-threaded, whereas the server is multi-threaded.
- Must use .NET 2.0.
- Consists mostly of visuals and other unimportant stuff.
When there are 40+ clients connected to a single channel, it starts to freeze, seemingly at random (that's all we know right now; we have no evidence or data that points to what exactly is wrong). We don't think network traffic is the issue (though we're not quite sure yet), since we have tried it on different server machines with various setups. All the server machines we have used can handle that load hardware-wise, so we believe the problem lies in our approach and in the code.
We are struggling to address the issue because we are not sure what could be causing it. Consider the following example:
System A has 55 people online in its Channel #1, and it never freezes. System A uses IP A1, and the channel is on port 16xxx. System B has 25 people online in its Channel #4, and it freezes for about one or two minutes at random times. System B uses IP B1 and port 18xxx for the channel. It runs on the same machine as System A, which doesn't freeze.
In conclusion, the freeze does not seem to depend on the number of players online, but it occurs more often as that number rises.
We tried running Application.DoEvents() in an endless Do loop in the Channel project, thinking that some unknown process was putting the channel into a frozen state for a few minutes. While frozen, the channel pauses completely; afterwards, it performs every action that was queued during the freeze within a few seconds. CPU usage averages 7-20% per channel. The DoEvents loop seemed to help, but it was not a permanent or effective solution.
Things we suspect:
- ClientCollection, which holds the players and their TcpClients, inherits from CollectionBase. Maybe this causes problems during synchronization. It used to be an array, and we had fewer of these problems back then. Should it inherit from something other than CollectionBase?
- We use SyncLock (lock in C#) to synchronize access to ClientCollection, although we had this problem before we started using locks.
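One way to act on both suspicions is sketched below, under an assumption that is not established above: that the freeze comes from holding the collection's lock while doing network I/O, so that a single slow client blocks every other thread waiting on that lock. The sketch replaces CollectionBase with a private List<Client> guarded by a private lock object, and broadcasts over a copy so the lock is never held during a send. The Client class here is a hypothetical stand-in for the real one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real Client class (which also wraps a TcpClient).
public class Client
{
    public string Name { get; private set; }

    public Client(string name)
    {
        Name = name;
    }
}

// Sketch of a ClientCollection that wraps a List<Client> instead of deriving from CollectionBase.
public class ClientCollection
{
    private readonly List<Client> _clients = new List<Client>();
    private readonly object _sync = new object(); // private lock object, never exposed to callers

    public void Add(Client client)
    {
        lock (_sync) { _clients.Add(client); }
    }

    public bool Remove(Client client)
    {
        lock (_sync) { return _clients.Remove(client); }
    }

    public int Count
    {
        get { lock (_sync) { return _clients.Count; } }
    }

    // Copies the list under the lock and returns the copy, so callers can iterate without holding it.
    public Client[] Snapshot()
    {
        lock (_sync) { return _clients.ToArray(); }
    }

    // Runs an action (e.g. sending a chat line) for each client.
    // The lock is held only while copying, never while the action runs,
    // so a slow send to one client cannot block Add/Remove for everyone else.
    public void ForEach(Action<Client> action)
    {
        foreach (Client client in Snapshot())
        {
            action(client);
        }
    }
}
```

With this shape, a dead or slow client can still stall the thread that is sending to it, but it no longer keeps every other thread waiting on the collection's lock.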
Server info
- Intel Xeon X3460 2.80 GHz
- 16 GB RAM
- 64-bit Windows Server 2008 Enterprise
I know it is impossible to pin down the issue without seeing the whole code, but unfortunately I am unable to post it. Instead, I'm looking for ideas that point me in the right direction. We are happy to share any other information that helps resolve this problem.
Thanks to everyone helping!
We had a very similar problem on a very similar app (ours pushed out stats to ~1300 users).
My best guess is that your TcpClient has an infinite timeout set, which is unfortunately the default (ReceiveTimeout and SendTimeout default to 0, meaning no timeout). So when a TcpClient blocks on a read, it can sometimes stay frozen indefinitely.
Set the timeouts to 30 seconds (or something more suitable for your situation):
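TcpClient newClient = incoming.AcceptTcpClient(); // incoming: the TcpListener accepting connections
newClient.NoDelay = true;          // Disable Nagle's algorithm so small packets are sent immediately
newClient.ReceiveTimeout = 30000;  // Milliseconds; a blocking Read throws IOException after 30 s
newClient.SendTimeout = 30000;     // Milliseconds; a blocking Write throws IOException after 30 s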
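The sketch below, which uses a loopback connection purely for illustration, shows what the timeout buys you: a peer that connects but never sends makes Read fail with an IOException (wrapping a SocketException whose error code is SocketError.TimedOut) instead of blocking forever. ReceiveTimeoutDemo and ReadTimesOut are hypothetical names. Note that these timeouts apply only to synchronous Read/Write calls, not to BeginRead/BeginWrite.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

public static class ReceiveTimeoutDemo
{
    // Returns true if a blocking Read gave up after ReceiveTimeout instead of hanging forever.
    public static bool ReadTimesOut(int timeoutMs)
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0); // port 0 = any free port
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (TcpClient peer = new TcpClient())
            {
                peer.Connect(IPAddress.Loopback, port);
                using (TcpClient server = listener.AcceptTcpClient())
                {
                    server.NoDelay = true;
                    server.ReceiveTimeout = timeoutMs;
                    server.SendTimeout = timeoutMs;
                    byte[] buffer = new byte[256];
                    try
                    {
                        // The peer never sends anything, so without a timeout this would block indefinitely.
                        server.GetStream().Read(buffer, 0, buffer.Length);
                        return false;
                    }
                    catch (IOException ex)
                    {
                        // NetworkStream wraps the underlying SocketException in an IOException.
                        SocketException se = ex.InnerException as SocketException;
                        return se != null && se.SocketErrorCode == SocketError.TimedOut;
                    }
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

In the real server, the catch block around Read is probably where you would disconnect the client and remove it from ClientCollection, rather than keep reading from a connection that has gone silent.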
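Is it possible to take a full process hang dump while the system is frozen? Then you can see what each thread was doing and better understand why it froze.
- To take a hang dump, you need Debugging Tools for Windows, which ships with the Windows SDK (for example, the Windows SDK for Windows 7 and .NET Framework 4).
- Run AdPlus with the -hang flag and the process ID, for example: adplus -hang -p <process id> -o <output folder>
- Then open the dump in WinDbg, load the SOS extension for .NET 4.0, and run the ~*e !clrstack command to get the managed call stack of every thread:
.loadby sos clr
~*e !clrstack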
For me, using synchronous sockets is a big no-no in server applications. Do not use one thread per connected client, and do not use the blocking NetworkStream.Read/NetworkStream.Write calls.
Read about the BeginRead/EndRead and BeginSend/EndSend methods instead. They scale a lot better than threads and synchronous methods.
Update
Reading asynchronously doesn't mean that you cannot handle the read command synchronously. The reason to read asynchronously is to get a complete command without dedicating a thread to each client.
Do something like this for reading:
- BeginRead
- In OnRead (the BeginRead callback), call EndRead.
- If EndRead returns 0 bytes, the client has disconnected.
- Append the received data to a buffer (if your commands are strings, use a StringBuilder rather than a string as the buffer).
- Check whether the buffer contains a complete packet.
- Invoke the method/event/delegate that processes complete packets.
- Invoke BeginRead again.
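The steps above can be sketched in C# as follows. This is a minimal illustration, assuming each packet is a UTF-8 text command terminated by '\n'; the real protocol's framing may differ. AsyncClientReader and the onPacket/onDisconnect callbacks are hypothetical names. A Decoder is used instead of Encoding.GetString so that a multi-byte character split across two reads is still decoded correctly.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

// Hypothetical per-client reader implementing the BeginRead/EndRead loop.
// Assumption: each packet is a UTF-8 text command terminated by '\n'.
public class AsyncClientReader
{
    private readonly NetworkStream _stream;
    private readonly byte[] _readBuffer = new byte[4096];
    private readonly char[] _charBuffer;
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder(); // keeps partial characters between reads
    private readonly StringBuilder _pending = new StringBuilder();  // text received but not yet a full packet
    private readonly Action<string> _onPacket;
    private readonly Action _onDisconnect;

    public AsyncClientReader(TcpClient client, Action<string> onPacket, Action onDisconnect)
    {
        _stream = client.GetStream();
        _charBuffer = new char[Encoding.UTF8.GetMaxCharCount(_readBuffer.Length)];
        _onPacket = onPacket;
        _onDisconnect = onDisconnect;
    }

    // BeginRead: post the first read; no thread is blocked while waiting for data.
    public void Start()
    {
        BeginReceive();
    }

    private void BeginReceive()
    {
        try
        {
            _stream.BeginRead(_readBuffer, 0, _readBuffer.Length, OnRead, null);
        }
        catch (Exception) // e.g. IOException or ObjectDisposedException: treat as a disconnect
        {
            _onDisconnect();
        }
    }

    // The BeginRead callback: call EndRead first.
    private void OnRead(IAsyncResult ar)
    {
        int bytesRead;
        try
        {
            bytesRead = _stream.EndRead(ar);
        }
        catch (Exception)
        {
            _onDisconnect();
            return;
        }

        // 0 bytes means the remote side closed the connection.
        if (bytesRead == 0)
        {
            _onDisconnect();
            return;
        }

        // Append the received data to the buffer.
        int charCount = _decoder.GetChars(_readBuffer, 0, bytesRead, _charBuffer, 0);
        _pending.Append(_charBuffer, 0, charCount);

        // Hand every complete packet to the handler, synchronously on this callback.
        int end;
        while ((end = IndexOf(_pending, '\n')) >= 0)
        {
            string packet = _pending.ToString(0, end);
            _pending.Remove(0, end + 1);
            _onPacket(packet);
        }

        // Post the next read.
        BeginReceive();
    }

    private static int IndexOf(StringBuilder sb, char value)
    {
        for (int i = 0; i < sb.Length; i++)
        {
            if (sb[i] == value) return i;
        }
        return -1;
    }
}
```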
As you can see, handling can still be synchronous, and you do not have to create a new thread per client. AFAIK, .NET uses I/O completion ports for its socket I/O operations, which scale really well.
Using BeginSend/EndSend isn't really necessary when starting with sockets, since you usually just fire and forget when sending. It's the read thread per client that really hurts performance.
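If you do use BeginSend, "fire and forget" still means calling EndSend in the callback; otherwise errors on a dead connection go unnoticed. Below is a minimal sketch in which AsyncSender and the onError callback are hypothetical names. It also sends the remainder if EndSend reports a partial send. It does not serialize concurrent sends to the same client, so if several threads can send to one client at once, a per-client send queue would be needed to keep packets from interleaving.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper: fire-and-forget sending that still completes every BeginSend with EndSend.
public static class AsyncSender
{
    // Carries the socket, the data and the progress from one BeginSend to its callback.
    private sealed class SendState
    {
        public Socket Socket;
        public byte[] Data;
        public int Offset;
        public Action<Exception> OnError;
    }

    public static void Send(TcpClient client, byte[] data, Action<Exception> onError)
    {
        SendState state = new SendState { Socket = client.Client, Data = data, Offset = 0, OnError = onError };
        BeginSendRemaining(state);
    }

    private static void BeginSendRemaining(SendState s)
    {
        try
        {
            // Returns immediately; the caller never waits for the network.
            s.Socket.BeginSend(s.Data, s.Offset, s.Data.Length - s.Offset, SocketFlags.None, OnSent, s);
        }
        catch (Exception ex) // e.g. SocketException or ObjectDisposedException
        {
            s.OnError(ex);
        }
    }

    private static void OnSent(IAsyncResult ar)
    {
        SendState s = (SendState)ar.AsyncState;
        try
        {
            // EndSend completes the operation and surfaces any error, e.g. a reset connection.
            s.Offset += s.Socket.EndSend(ar);
        }
        catch (Exception ex)
        {
            s.OnError(ex);
            return;
        }

        // If only part of the buffer went out, send the rest.
        if (s.Offset < s.Data.Length)
        {
            BeginSendRemaining(s);
        }
    }
}
```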