TCP Socket Sending Delays and Retransmission
I have a .NET 3.5 C# application that sends 2000-6000 byte packets to a linux machine running sles 10. The machines are on the same subnet.
About 90% of the time, everything works fine. The linux machine processes my request and responds in 5-15ms. But about 10% of the time, there is an approx 200ms-800ms delay.
Looking at the logs on the linux machine, it seems the delay is on my end. That is, if my call to socket.Send(...) returns at 1:15:00.000 and I get a response at 1:15:00.210, the log on the linux machine says that it received the request at 1:15:00.200 and then processed it in 10ms. (I'm using System.Diagnostics.Stopwatch for timing on my machine.)
To debug, I captured the traffic using wireshark. Here is the traffic. Between No. 8 and 9 is where a 600 ms delay occurs. (137.34.210.108 is my machine and 137.34.210.95 is the linux machine).
"1","11:56:27.380318","137.34.210.95","137.34.210.108","TCP","20700 > 17479 [PSH, ACK] Seq=1 Ack=1 Win=32767 Len=76"
"2","11:56:27.380393","HewlettP_29:37:0f","Broadcast","ARP","Who has 137.34.210.95? Tell 137.34.210.108"
"3","11:56:27.380558","HewlettP_29:39:93","HewlettP_29:37:0f","ARP","137.34.210.95 is at 00:1b:78:29:39:93"
"4","11:56:27.380564","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=0"
"5","12:04:48.096892","HewlettP_29:37:0f","Broadcast","ARP","Who has 137.34.210.95? Tell 137.34.210.108"
"6","12:04:48.097216","HewlettP_29:39:93","HewlettP_29:37:0f","ARP","137.34.210.95 is at 00:1b:78:29:39开发者_如何学运维:93"
"7","12:04:48.097229","137.34.210.108","137.34.210.95","TCP","17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"8","12:04:48.097457","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294964377 Win=32767 Len=0 SLE=1 SRE=459"
"9","12:04:49.700966","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=1460"
"10","12:04:49.701190","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294964377 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
"11","12:04:49.703970","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294965837 Win=32767 Len=0 SLE=1 SRE=459"
"12","12:04:49.703993","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294965837 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
"13","12:04:49.704002","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"14","12:04:49.704211","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=459 Win=32767 Len=0"
"15","12:04:49.704215","137.34.210.95","137.34.210.108","TCP","[TCP Dup ACK 14#1] 20600 > 17480 [ACK] Seq=1 Ack=459 Win=32767 Len=0 SLE=1 SRE=459"
"16","12:04:49.705425","137.34.210.95","137.34.210.108","TCP","20700 > 17479 [PSH, ACK] Seq=77 Ack=1461 Win=32767 Len=44"
Can someone help me to interpret this? I see that a re-transmit is occurring. But I'm not sure why. The switch shows no dropped packets. And even if the packets are being lost, why would it take 600ms to re-transmit?
I thought that this (http://support.microsoft.com/kb/328890) might have something to do with the 200ms delays but I've tried changing the TcpAckFrequency and it didn't help.
Thanks, Mike
Let's start by pruning some of that Wireshark output. We can toss the ARPs in packets 2, 3, 5 and 6. Looking at the rest, you have two sets of traffic in there. Packets 8 and 9 are two different connections, so you can't compare them. 7, 8 and 10, however, are part of one connection so let's examine those.
Packet 7 is 458 bytes of data being sent to the Linux box with a TCP sequence number of 1. However, the ACK that the Linux box returns in packet 8 is 4294964377. Wireshark is showing relative sequence numbers, so that value is 1 minus 2920, wrapped around 2^32: the Linux box is not acknowledging packet 7 but asking for data that should have arrived 2920 bytes earlier. The "SLE=1 SRE=459" on the same packet is a selective ACK saying that it did receive packet 7's 458 bytes, just out of order. Your PC then waits for a follow-up ACK and, when it doesn't get one, retransmits the required data: two 1460-byte segments (packets 10 and 12) followed by the 458 bytes from packet 7 (packet 13). That's why the sequence number from packet 10 matches the ACK from packet 8.
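The wraparound arithmetic behind those numbers can be checked with a short C sketch. It assumes the values in the capture are Wireshark's relative sequence numbers (32-bit, wrapping at 2^32); the function names are made up for illustration.

```c
#include <stdint.h>

/* Interpret a relative sequence number from the capture as a signed byte
 * offset, undoing the 32-bit wraparound (hypothetical helper). */
int64_t signed_relative(uint32_t rel)
{
    if (rel >= UINT32_C(0x80000000))
        return (int64_t)rel - INT64_C(4294967296); /* subtract 2^32 */
    return (int64_t)rel;
}

/* Sequence number that follows a segment: start + length, modulo 2^32
 * (hypothetical helper). */
uint32_t next_seq(uint32_t seq, uint32_t len)
{
    return (uint32_t)(seq + len); /* unsigned arithmetic wraps modulo 2^32 */
}
```

Feeding in the trace values: the ACK of 4294964377 in packet 8 is offset -2919, packet 10 (Len=1460) ends where packet 12 starts at 4294965837, packet 12 (Len=1460) ends at 1 where packet 13 starts, and packet 13 (Len=458) ends at 459, which is the ACK in packet 14. So packets 10, 12 and 13 form one contiguous run of data.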
Unfortunately this doesn't tell you why data is being dropped. Packet 8 shows the Linux box indicating it still has a full 32k of input buffer available for this connection ("Win=32767").
This only shows the TCP packets on the Linux machine, but I'd recommend looking at the IP stats with the 'netstat -s' command. One reason for the retransmissions might be socket buffer overflows, which show up in that command's output.
I don't recall if Windows has it, but on UNIX you'd enable TCP_NODELAY. This disables TCP's Nagle algorithm, which makes the system wait a short time in case more data is about to be added to the transmit buffer.
int nodelay = 1;
setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));
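A slightly fuller sketch of the same idea, assuming a POSIX system and an already-created TCP socket descriptor. The helper names enable_nodelay and nodelay_enabled are made up for illustration; the second one reads the option back so you can confirm it took effect.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn off Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Read the option back.
 * Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int nodelay_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

On the .NET side of the question, the Socket class exposes the same switch as the NoDelay property.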