Server daemon's sockets stop working
I have an application that binds two ports, 6961 and 6963. It is a client-server-client application in which one client controls the other.
The application works fine, but after a seemingly random number of accepted and closed connections, the server stops receiving or sending data through the sockets. I can still connect with telnet, but when I type something, I get no response.
Once the server accepted as many as 370 connections before it stopped working; the last time it accepted only 70.
I don't think it has to do with how I close the sockets; I believe I close them properly. Below is my netstat and lsof output right after starting the application, but I don't really know how to interpret it. I only found these commands by googling.
$ sudo /etc/init.d/icontrold restart
Stopping daemon
Starting daemon
$ sudo netstat | grep -E '696[13]'
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50005 TIME_WAIT
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50759 ESTABLISHED
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50758 TIME_WAIT
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50764 FIN_WAIT2
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50761 TIME_WAIT
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50763 TIME_WAIT
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50762 TIME_WAIT
$ sudo lsof | grep icontrol
icontrold 5765 root cwd DIR 8,1 4096 884738 /home/ief2
icontrold 5765 root rtd DIR 8,1 4096 2 /
icontrold 5765 root txt REG 8,1 212372 5431298 /usr/sbin/icontrold
icontrold 5765 root mem REG 0,0 0 [heap] (stat: No such file or directory)
icontrold 5765 root mem REG 8,1 77808 5425003 /usr/lib/libz.so.1.2.3
icontrold 5765 root mem REG 8,1 9640 671771 /lib/tls/i686/cmov/libdl-2.4.so
icontrold 5765 root mem REG 8,1 1248904 671768 /lib/tls/i686/cmov/libc-2.4.so
icontrold 5765 root mem REG 8,1 40208 671760 /lib/libgcc_s.so.1
icontrold 5765 root mem REG 8,1 149284 671772 /lib/tls/i686/cmov/libm-2.4.so
icontrold 5765 root mem REG 8,1 888612 5425516 /usr/lib/libstdc++.so.6.0.8
icontrold 5765 root mem REG 8,1 95056 671782 /lib/tls/i686/cmov/libpthread-2.4.so
icontrold 5765 root mem REG 8,1 1268568 5458256 /usr/lib/i686/cmov/libcrypto.so.0.9.8
icontrold 5765 root mem REG 8,1 255648 5458257 /usr/lib/i686/cmov/libssl.so.0.9.8
icontrold 5765 root mem REG 8,1 105112 673124 /lib/ld-2.4.so
icontrold 5765 root 0u IPv6 16962 TCP *:6963 (LISTEN)
icontrold 5765 root 1u IPv6 16965 TCP *:6961 (LISTEN)
icontrold 5765 root 4u IPv6 16968 TCP 192.168.1.10:6963->192.168.1.4:50759 (ESTABLISHED)
$
This is the output of both commands after the server has stopped responding:
$ sudo lsof | grep icontrol
icontrold 4645 root cwd DIR 8,1 4096 7913473 /root
icontrold 4645 root rtd DIR 8,1 4096 2 /
icontrold 4645 root txt REG 8,1 212372 5431298 /usr/sbin/icontrold
icontrold 4645 root mem REG 0,0 0 [heap] (stat: No such file or directory)
icontrold 4645 root mem REG 8,1 77808 5425003 /usr/lib/libz.so.1.2.3
icontrold 4645 root mem REG 8,1 9640 671771 /lib/tls/i686/cmov/libdl-2.4.so
icontrold 4645 root mem REG 8,1 1248904 671768 /lib/tls/i686/cmov/libc-2.4.so
icontrold 4645 root mem REG 8,1 40208 671760 /lib/libgcc_s.so.1
icontrold 4645 root mem REG 8,1 149284 671772 /lib/tls/i686/cmov/libm-2.4.so
icontrold 4645 root mem REG 8,1 888612 5425516 /usr/lib/libstdc++.so.6.0.8
icontrold 4645 root mem REG 8,1 95056 671782 /lib/tls/i686/cmov/libpthread-2.4.so
icontrold 4645 root mem REG 8,1 1268568 5458256 /usr/lib/i686/cmov/libcrypto.so.0.9.8
icontrold 4645 root mem REG 8,1 255648 5458257 /usr/lib/i686/cmov/libssl.so.0.9.8
icontrold 4645 root mem REG 8,1 105112 673124 /lib/ld-2.4.so
icontrold 4645 root 0u IPv6 13679 TCP *:6963 (LISTEN)
icontrold 4645 root 2u IPv6 13683 TCP *:6961 (LISTEN)
icontrold 4645 root 3u IPv6 15276 TCP 192.168.1.10:6963->192.168.1.4:50730 (ESTABLISHED)
icontrold 4645 root 4u IPv6 13685 TCP 192.168.1.10:6963->192.168.1.4:50005 (ESTABLISHED)
$ sudo netstat | grep 6963
tcp6 0 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50005 ESTABLISHED
tcp6 9 0 ::ffff:192.168.1.1:6963 ::ffff:192.168.1.:50730 ESTABLISHED
I have no idea where to start looking for the bug.
Your code clearly has a bug, and you haven't given enough information to say where. So start breaking your code down and figuring out what is broken. Check what you are passing to your blocking call (select/poll/kqueue/whatever) and make sure it makes sense. If it doesn't, figure out why.
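I expect you will find that you stop waiting on a handle you should still be waiting on, but of course you could have a more interesting bug. Your second netstat output is consistent with this: the connection from port 50730 shows 9 in the Recv-Q column, meaning 9 bytes have arrived from the client and are sitting in the kernel buffer, but the daemon has not read them.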
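Since the original code isn't shown, here is a minimal sketch, assuming a select()-based loop written in C (the daemon links against libstdc++, so the real code may be C++, but the socket calls are the same). It illustrates the checks suggested above: the read set is rebuilt from scratch before every select() call because select() overwrites it, nfds is recomputed from the fds currently being watched, and a client is closed and removed from the table in one step when read() returns 0, so a dead fd is never handed to select() again. The names client_table, table_add, table_close, build_read_set, wait_for_input and handle_client are hypothetical, not taken from the asker's program.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size table of connected client fds; -1 marks a free slot. */
#define MAX_CLIENTS 64

struct client_table {
    int fds[MAX_CLIENTS];
};

static void table_init(struct client_table *t) {
    for (int i = 0; i < MAX_CLIENTS; i++)
        t->fds[i] = -1;
}

/* Start watching fd. Fails if the table is full or the fd is too large
   for an fd_set (select() cannot watch fds >= FD_SETSIZE). */
static int table_add(struct client_table *t, int fd) {
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (t->fds[i] == -1) {
            t->fds[i] = fd;
            return 0;
        }
    }
    return -1;
}

/* Close fd and stop watching it in the same step, so a stale fd number
   is never passed to select() again. */
static void table_close(struct client_table *t, int fd) {
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (t->fds[i] == fd)
            t->fds[i] = -1;
    }
    close(fd);
}

/* select() overwrites its fd_set arguments, so the read set must be rebuilt
   from the current table before every call. Returns nfds (highest fd + 1). */
static int build_read_set(const struct client_table *t, const int *listen_fds,
                          int n_listen, fd_set *set) {
    int max_fd = -1;
    FD_ZERO(set);
    for (int i = 0; i < n_listen; i++) {
        assert(listen_fds[i] >= 0 && listen_fds[i] < FD_SETSIZE);
        FD_SET(listen_fds[i], set);
        if (listen_fds[i] > max_fd)
            max_fd = listen_fds[i];
    }
    for (int i = 0; i < MAX_CLIENTS; i++) {
        int fd = t->fds[i];
        if (fd == -1)
            continue;
        FD_SET(fd, set);
        if (fd > max_fd)
            max_fd = fd;
    }
    return max_fd + 1;
}

/* Block until at least one watched fd is readable; returns select()'s result. */
static int wait_for_input(const struct client_table *t, const int *listen_fds,
                          int n_listen, fd_set *ready) {
    int nfds = build_read_set(t, listen_fds, n_listen, ready);
    return select(nfds, ready, NULL, NULL, NULL);
}

/* Service one readable client. read() returning 0 (peer closed) or a
   non-transient error means the connection is finished, so close and
   forget it. Returns the number of bytes read (> 0), or 0 if none were. */
static ssize_t handle_client(struct client_table *t, int fd, char *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return n;
    if (n < 0 && (errno == EINTR || errno == EAGAIN))
        return 0; /* transient: keep watching this fd */
    table_close(t, fd);
    return 0;
}
```

In a real loop you would call wait_for_input() with both listening sockets at the top of every iteration, accept() on a ready listener and table_add() the new fd, and call handle_client() for every ready client. If your own loop differs, for example by reusing one fd_set across iterations or by never dropping a client whose read() returned 0, that is a plausible place for connections to go silent.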