SO_KEEPALIVE does not work during a call to write()?
I'm developing a socket application that must be robust to network failures.
The application has two running threads: one waits for messages from the socket (a read() loop) and the other sends messages to the socket (a write() loop).
I'm currently trying to use SO_KEEPALIVE to handle the network failures. It works ok if I'm only blocked on read(). A few seconds after the connection is lost (network cable removed), read() will fail with the message 'Connection timed out'.
But if I try to write() after the network is disconnected (and before the timeout expires), both write() and read() block forever, without any error.
This is a stripped-down sample that connects stdin/stdout to the socket. It listens on port 5656:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int socket_fd;

void error(const char *msg) {
    perror(msg);
    exit(1);
}

//Read from stdin and write to socket
void* write_daemon (void* _arg) {
    while (1) {
        char c;
        int ret = scanf("%c", &c);
        if (ret <= 0) error("read from stdin");
        int ret2 = write(socket_fd, &c, sizeof(c));
        if (ret2 <= 0) error("write to socket");
    }
    return NULL;
}

//Read from socket and write to stdout
void* read_daemon (void* _arg) {
    while (1) {
        char c;
        int ret = read(socket_fd, &c, sizeof(c));
        if (ret <= 0) error("read from socket");
        int ret2 = printf("%c", c);
        if (ret2 <= 0) error("write to stdout");
    }
    return NULL;
}

//Enable and configure KEEPALIVE - To detect network problems quickly
void config_socket() {
    int enable_no_delay = 1;
    int enable_keep_alive = 1;
    int keepalive_idle = 1; //Very short interval. Just for testing
    int keepalive_count = 1;
    int keepalive_interval = 1;
    int result;

    //=> http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#setsockopt
    result = setsockopt(socket_fd, SOL_SOCKET, SO_KEEPALIVE, &enable_keep_alive, sizeof(int));
    if (result < 0)
        error("SO_KEEPALIVE");
    result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPIDLE, &keepalive_idle, sizeof(int));
    if (result < 0)
        error("TCP_KEEPIDLE");
    result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPINTVL, &keepalive_interval, sizeof(int));
    if (result < 0)
        error("TCP_KEEPINTVL");
    result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPCNT, &keepalive_count, sizeof(int));
    if (result < 0)
        error("TCP_KEEPCNT");
}

int main(int argc, char *argv[]) {
    //Create Server socket, bound to port 5656
    int listen_socket_fd;
    int tr = 1;
    struct sockaddr_in serv_addr, cli_addr;
    socklen_t clilen = sizeof(cli_addr);
    pthread_t write_thread, read_thread;

    listen_socket_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_socket_fd < 0)
        error("socket()");
    if (setsockopt(listen_socket_fd, SOL_SOCKET, SO_REUSEADDR, &tr, sizeof(int)) < 0)
        error("SO_REUSEADDR");
    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(5656);
    if (bind(listen_socket_fd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("bind()");

    //Wait for client socket
    listen(listen_socket_fd, 5);
    socket_fd = accept(listen_socket_fd, (struct sockaddr *) &cli_addr, &clilen);
    config_socket();
    pthread_create(&write_thread, NULL, write_daemon, NULL);
    pthread_create(&read_thread , NULL, read_daemon , NULL);
    close(listen_socket_fd);
    pthread_exit(NULL);
}
To reproduce the problem, connect with telnet to port 5656. The program exits a couple of seconds after the connection is lost, unless I try to write something in the terminal. In that case, it blocks forever.
So, the questions are: What's wrong? How can I fix it? Are there other alternatives?
Thanks!
I've tried using Wireshark to inspect the network connection. If I don't call write(), I can see TCP keep-alive packets being sent, and the connection is closed after a few seconds.
If, instead, I try to write(), it stops sending the keep-alive packets and starts sending TCP retransmissions instead (that seems OK to me). The problem is that the time between retransmissions grows longer after each failure, and it never seems to give up and close the socket.
Is there a way to set the maximum number of retransmissions, or anything similar? Thanks
I have found the TCP_USER_TIMEOUT socket option (rfc5482), which closes the connection if the sent data is not ACK'ed after the specified interval.
It works fine for me =)
//defined in include/uapi/linux/tcp.h (since Linux 2.6.37)
#define TCP_USER_TIMEOUT 18
int tcp_timeout = 10000; //10 seconds before aborting a write()
result = setsockopt(socket_fd, SOL_TCP, TCP_USER_TIMEOUT, &tcp_timeout, sizeof(int));
if (result < 0)
    error("TCP_USER_TIMEOUT");
Yet I feel I shouldn't have to use both SO_KEEPALIVE and TCP_USER_TIMEOUT. Maybe it's a bug somewhere?
Not sure if someone else will give you a better alternative, but in several projects I've been involved with, we've run into very similar situations.
For us the solution was to simply take control into your own hands and not rely on underlying OS/drivers to tell you when connection dies. If you control both client and server sides, you could introduce your own ping messages which bounce between the client and the server. This way you can a) control your own connection timeouts and b) easily keep a record indicating the health of the connection.
In the most recent application, we hid these pings as in-band control messages within the communication library itself, so as far as the actual client/server application code was concerned, connection timeouts just worked.
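The ping bookkeeping described above might be sketched as follows. This is only an illustration, not the code from those projects: the struct heartbeat type, the hb_* function names, and the interval values are all hypothetical, and actually sending and parsing the ping messages is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state for an application-level heartbeat. */
struct heartbeat {
    time_t last_rx;     /* when anything was last received from the peer */
    time_t last_ping;   /* when we last sent a ping */
    int ping_interval;  /* seconds of silence before sending a ping */
    int dead_after;     /* seconds of silence before giving up on the peer */
};

void hb_init(struct heartbeat *hb, time_t now, int ping_interval, int dead_after) {
    assert(ping_interval > 0 && dead_after > ping_interval);
    hb->last_rx = now;
    hb->last_ping = now;
    hb->ping_interval = ping_interval;
    hb->dead_after = dead_after;
}

/* Call whenever data (including a pong) arrives from the peer. */
void hb_on_receive(struct heartbeat *hb, time_t now) { hb->last_rx = now; }

/* Call right after a ping has been written to the socket. */
void hb_on_ping_sent(struct heartbeat *hb, time_t now) { hb->last_ping = now; }

/* True when the peer has been silent long enough that a ping is due. */
bool hb_should_ping(const struct heartbeat *hb, time_t now) {
    return difftime(now, hb->last_rx) >= hb->ping_interval &&
           difftime(now, hb->last_ping) >= hb->ping_interval;
}

/* True when nothing has been received for dead_after seconds. */
bool hb_is_dead(const struct heartbeat *hb, time_t now) {
    return difftime(now, hb->last_rx) >= hb->dead_after;
}
```

The I/O loop would call hb_should_ping() and hb_is_dead() on every wakeup and close the socket itself once hb_is_dead() returns true, so the timeout is under the application's control rather than the kernel's.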
TCP Keep Alive is specified in RFC 1122, which was written in 1989. The Keep Alive feature of TCP is not meant to detect short-term network outages, but to clean up TCP control blocks and buffers that might be using up precious resources. The RFC requires that the idle interval before keepalives are sent default to no less than two hours, and keepalives are only sent if there was no other traffic.
If a higher-level protocol needs to detect a loss of connection, it is the higher-level protocol's job to do so itself. The BGP routing protocol, which operates above TCP, sends its own form of Keep Alive message once every 60 seconds by default. The BGP spec says a connection is to be considered dead if no new traffic has been seen in the last 3*keep_alive_interval seconds. OpenSSH implements its own keep alive in the form of a ping and pong: it sends up to X pings, each of which must be answered (pong) within Y time, or it kills the connection.
TCP itself tries really hard to deliver data in the face of temporary network outages, so it is not useful by itself for detecting a network outage.
Normally, if you want to implement a keep alive and avoid blocking, you would switch to non-blocking I/O and maintain a timer that is used as the timeout for select()/poll() calls. Another option is a separate timer thread, or the cruder approach of using SIGALRM. I recommend using O_NONBLOCK with fcntl() to set the socket to non-blocking I/O. You can then use gettimeofday() to record when incoming I/O is received and sleep in select() until either the next Keep Alive is due or I/O happens.
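As a rough sketch of the "wait with a timeout" idea, the helper below waits for readable data for at most timeout_ms milliseconds before reading. The name read_with_timeout and the READ_TIMED_OUT value are made up for this example, and for brevity it uses poll() on an ordinary blocking descriptor rather than O_NONBLOCK with select().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)  /* hypothetical return code for "nothing arrived" */

/*
 * Returns the number of bytes read, 0 on end-of-file, -1 on error
 * (errno is set), or READ_TIMED_OUT if no data arrived in time.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    assert(buf != NULL && len > 0);

    /* Retry if a signal interrupts the wait (this restarts the full timeout). */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return READ_TIMED_OUT;

    /* Readable, or in an error/hang-up state that read() will report. */
    return read(fd, buf, len);
}
```

A read loop built on this can treat READ_TIMED_OUT as the moment to send its own keep-alive message or, after enough silent intervals, to close the connection.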
Did you successfully receive a byte or an ACK from the other side before disconnecting the cable? Maybe this is related to the behaviour described in http://lkml.indiana.edu/hypermail/linux/kernel/0508.2/0757.html :
Your test case is questionable, because you do not receive even one ACK in established state, thus the tp->rcv_tstamp variable has no way to get initialized. The only ACK you receive is the one in response to the connection setup SYN, and we don't initialize tp->rcv_stamp for that ACK.
The keepalive time checks absolutely require that tp->rcv_tstamp has a valid value, and until you process an ACK in ESTABLISHED state it does not.
If you send successfully or receive successfully at least one byte over the connection, and thusly process at least one ACK in ESTABLISHED state, I think you'll find that the keepalives behave properly.
It's an obscure SO_KEEPALIVE behaviour.
In write_daemon(), you are storing the return value of write() into the ret2 variable, but then checking for a socket error using the ret variable instead, so you will never actually catch any write() errors.
That's because of TCP retransmission, which the kernel's TCP stack performs without your application being aware of it. Here are some solutions.
Even though you have set the keepalive option on your application's socket, you can't detect a dead connection in time if your app keeps writing to the socket. That's because of TCP retransmission by the kernel TCP stack. tcp_retries1 and tcp_retries2 are the kernel parameters that configure the TCP retransmission timeout. It's hard to predict the precise retransmission timeout because it is calculated from the measured RTT. You can see this computation in RFC 793 (section 3.7, Data Communication).
https://www.rfc-editor.org/rfc/rfc793.txt
Each platform has its own kernel settings for TCP retransmission:
Linux : tcp_retries1, tcp_retries2 : (exist in /proc/sys/net/ipv4)
http://linux.die.net/man/7/tcp
HPUX : tcp_ip_notify_interval, tcp_ip_abort_interval
http://www.hpuxtips.es/?q=node/53
AIX : rto_low, rto_high, rto_length, rto_limit
http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf
You should set a lower value for tcp_retries2 (default 15) if you want to detect a dead connection early, but, as I said, the resulting time is not precise. In addition, you currently can't set those values for a single socket; they are global kernel parameters. There was an attempt to add a per-socket TCP retransmission option (http://patchwork.ozlabs.org/patch/55236/), but I don't think it was merged into the mainline kernel. I can't find definitions for those options in the system header files.
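To get a feel for what a given tcp_retries2 value means in seconds, the exponential backoff can be approximated as below. This is a rough estimate only: it assumes the Linux bounds of 200 ms for the minimum RTO and 120 s for the maximum, and assumes the RTO starts at the minimum, whereas the real value is derived from the measured RTT. The function name is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed Linux bounds for the retransmission timeout (RTO). */
#define RTO_MIN_MS 200L
#define RTO_MAX_MS 120000L

/*
 * Approximate time until the kernel gives up on unacknowledged data:
 * the first timeout plus `retries` retransmission timeouts, with the
 * RTO doubling each time until it is capped at RTO_MAX_MS.
 */
long estimate_retrans_timeout_ms(int retries) {
    long total = 0;
    long rto = RTO_MIN_MS;

    assert(retries >= 0);

    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto = (rto * 2 > RTO_MAX_MS) ? RTO_MAX_MS : rto * 2;
    }
    return total;
}

/* Print the estimate for a given tcp_retries2 value, in seconds. */
void print_estimate(int retries) {
    printf("tcp_retries2=%d -> about %.1f s\n",
           retries, estimate_retrans_timeout_ms(retries) / 1000.0);
}
```

Under these assumptions the default of 15 comes out at roughly 924.6 seconds, which is why a write() on a dead link can appear to hang for a quarter of an hour or more.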
For reference, you can monitor the keepalive timer of your socket with 'netstat --timers', as shown below. https://stackoverflow.com/questions/34914278
netstat -c --timers | grep "192.0.0.1:43245 192.0.68.1:49742"
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (1.92/0/0)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (0.71/0/0)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (9.46/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (8.30/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (7.14/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (5.98/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (4.82/0/1)
In addition, when a keepalive timeout occurs, the event you get back differs between platforms, so you must not decide that a connection is dead based on the returned events alone. For example, HP-UX returns a POLLERR event and AIX returns just a POLLIN event when a keepalive timeout occurs. At that point, a recv() call fails with ETIMEDOUT.
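One portable way to act on this is to ignore which poll event fired, call recv() anyway, and classify its result. The sketch below shows only that classification step. The enum and the function name classify_recv are invented for this example, and treating every other errno value as fatal is a simplifying assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical connection states for this example. */
enum conn_state { CONN_ALIVE, CONN_CLOSED, CONN_DEAD };

/*
 * Classify the result of a recv() call.
 * n   - the value recv() returned
 * err - the value of errno right after the call (only used when n < 0)
 */
enum conn_state classify_recv(ssize_t n, int err) {
    if (n > 0)
        return CONN_ALIVE;   /* data arrived */
    if (n == 0)
        return CONN_CLOSED;  /* orderly shutdown by the peer */

    /* Transient conditions: nothing to read yet, or interrupted by a signal. */
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return CONN_ALIVE;

    /* ETIMEDOUT (keepalive or retransmission timeout), ECONNRESET, etc. */
    return CONN_DEAD;
}
```

Typical use would be `classify_recv(recv(fd, buf, sizeof buf, 0), errno)` after poll() reports any of POLLIN, POLLERR or POLLHUP.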
In recent kernel versions (since 2.6.37), you can use the TCP_USER_TIMEOUT option, which works well. This option can be set on a single socket.