Tracking down MySQL connection leaks
I have an application server (Jetty 6 on a Linux box) hosting 15 individual applications (individual WARs). Every 3 or 4 days I get an alert from Nagios about the number of open TCP connections. Upon inspection, I see that the vast majority of these connections are to the MySQL server.
netstat -ntu | grep TIME_WAIT
Shows 10,000+ connections on the MySQL server from the application server (note the TIME_WAIT state). If I restart Jetty, the connections drop to almost zero.
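To get a rough per-state count of the connections to MySQL (assuming it is on the default port 3306), something like this works:
netstat -nt | grep ':3306 ' | awk '{print $6}' | sort | uniq -c | sort -rn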
Some interesting values from a show status:
mysql> show status;
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| Aborted_clients          | 244       |
| Aborted_connects         | 695853860 |
| Connections              | 697203154 |
| Max_used_connections     | 77        |
+--------------------------+-----------+
A "show processlist" doesn't show anything out of the ordinary (which is what I would expect since most of the connections are idle - remember the TIME_WAIT state from above).
I have a TEST environment for this server, but it never has any issues. It obviously doesn't get much traffic, and the application server is constantly being restarted, so debugging there isn't much help. I could dig into each individual app and write a load test that hits the database code, but that would take a lot of time and hassle.
Any ideas how I could track down the application that is grabbing all these connections and never letting go?
The answer seems to be adding the following entries to my.cnf under [mysqld]:
wait_timeout=60
interactive_timeout=60
I found it here (all the way at the bottom): http://community.livejournal.com/mysql/82879.html
The default wait_timeout (how long MySQL waits before killing an idle connection) is 28800 seconds (8 hours). To verify:
mysql> show variables like 'wait_%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 60    |
+---------------+-------+
EDIT: I forgot to mention, I also added the following to my /etc/sysctl.conf:
net.ipv4.tcp_fin_timeout = 15
This is supposed to lower how long the OS waits before reclaiming the resources of closed connections.
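To apply the sysctl change without a reboot and confirm the new value (as root):
/sbin/sysctl -p
cat /proc/sys/net/ipv4/tcp_fin_timeout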
EDIT 2: /etc/init.d/mysql reload won't really reload your my.cnf (see the link below)
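As far as I know, these timeouts can also be changed at runtime without restarting mysqld (requires the SUPER privilege, and they only apply to connections opened after the change):
mysql> SET GLOBAL wait_timeout = 60;
mysql> SET GLOBAL interactive_timeout = 60;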
Possibly the connection pool(s) are misconfigured to hold on to too many connections, or to keep too many idle connections open.
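For example, if the apps happen to use Commons DBCP for pooling (just a guess; the property names below are from DBCP's BasicDataSource), these are the kinds of settings to check:
# cap the pool and evict idle connections aggressively
maxActive=20
maxIdle=5
# run the evictor every 30s and drop connections idle for more than 60s
timeBetweenEvictionRunsMillis=30000
minEvictableIdleTimeMillis=60000
# validate connections before handing them out
validationQuery=SELECT 1
testOnBorrow=true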
Aside from that, all I can think of is that some piece of code is holding onto a result set, but that seems less likely. To catch a slow query that's timing out, you can also enable MySQL's slow query log in the conf file; it will then log every query that takes longer than X seconds (the default is 10 seconds).
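A minimal sketch of the my.cnf entries for the slow query log, assuming MySQL 5.1 or later (MySQL 5.0 uses log_slow_queries=/path/to/file instead; the log file path below is just an example):
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 10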
Well, one thing that comes to mind (although I'm not an expert on this) is to increase the logging on MySQL and hunt down all the connect/close messages. If that doesn't work, you could write a tiny proxy that sits between the actual MySQL server and your suite of applications and does the extra logging, so you'd know who is connecting and never leaving.
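For the logging part, MySQL's general query log records every Connect and Quit event along with the user and host, which is usually enough to spot who never disconnects. On 5.1+ it can be enabled in my.cnf (MySQL 5.0 uses log=/path/to/file; the path below is just an example), though note it logs every statement and grows quickly:
[mysqld]
general_log = 1
general_log_file = /var/log/mysql/mysql-general.log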
SHOW PROCESSLIST shows the user, host and database for each thread. Unless all 15 of your apps are using the same combination, you should be able to differentiate them using this information.
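For example, on 5.1+ you can group the process list by connection identity (this assumes information_schema.PROCESSLIST is available; the host column includes the client port, hence the SUBSTRING_INDEX):
mysql> SELECT user, SUBSTRING_INDEX(host, ':', 1) AS client, db, COUNT(*) AS threads
    -> FROM information_schema.PROCESSLIST
    -> GROUP BY user, client, db;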
I had the same problem, with 30,000+ connections in TIME_WAIT on my client's server. I fixed it by adding the following to /etc/sysctl.conf:
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
Then run:
/sbin/sysctl -p
After 2 or 3 minutes, TIME_WAIT connections went from 30,000 to 7,000.
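To watch the count come down, a generic one-liner such as this does the job:
watch -n 10 'netstat -nt | grep -c TIME_WAIT'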
On RHEL 7, /proc/sys/net/ipv4/tcp_fin_timeout defaulted to 60. Changing tcp_tw_reuse and tcp_tw_recycle to 1 improved performance for us.