排查Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl
目录
- 问题描述及排查
- 彩蛋:Hikari中max-lifetime参数作用及实现
- 总结
整个问题排查过程中基本把Hikari连接池源码看完了,也发现了Hikari内部的一些实现细节,更多细节参见:Hikari 数据库连接池内部源码实现细节
问题描述及排查
最近发布项目测试环境后,发现后台经常隔28分钟时间就打印mysql 连接已经被关闭警告。(应用程序每28分钟访问一次数据库)
- 数据库: mysql 5.7
- 数据库驱动:mysql-connector-Java-8.0.23
- 连接池:HikariCP-3.4.5
- SpringBoot 2.4.3
连接池配置如下:
spring.datasource.username=${name} spring.datasource.password=${pwd} spring.datasource.url=jdbc:mysql://${ip}:3306/${dbname}?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai spring.datasource.type=com.zaxxer.hikari.HikariDataSource spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver spring.datasource.hikari.minimum-idle=10 spring.datasource.hikari.maximum-pool-size=20 spring.datasource.hikari.connection-test-query=SELECT 1 FROM DUAL spring.datasource.hikari.validation-timeout=10000 spring.datasource.hikari.auto-commit=true
警告信息如下:
2021-03-19 13:25:58.486 WARN 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase :
slave - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@2e2ce118 (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
虽然这只是一个警告,并不影响程序执行,但还是很好奇想要排查一下具体原因。
warn信息的描述,程序通过已经关闭的连接来访问数据数据库,这是不允许的操作,并且警告中提醒减小maxLifetime
的值来解决这个问题。
看到这个警告之后,笔者首先想到的就是mysql 8小时主动断开问题:如果hikari的maxLifetime
的值超过了mysql默认的wait_timeout
设置的时间28800s (8小时),实际上mysql已经主动关闭了连接。
但示例中并没有配置hikari的maxLifetime
选项,采用的是默认。
查看hik编程客栈ari的源码发现了如下几个默认配置
com.zaxxer.hikari.HikariConfig
源码中的类变量
private static final long CONNECTION_TIMEOUT = SECONDS.toMillis(30); 30s private static final long VALIDATION_TIMEOUT = SECONDS.toMillis(5); 5s private static final long IDLE_TIMEOUT = MINUTES.toMillis(10); 20分钟 private static final long MAX_LIFETIME = MINUTES.toMillis(30); 30分钟 private static final int DEFAULT_POOL_SIZE = 10;
hikari中默认连接存活的最大时间是30分钟,小于mysql默认的wait_timeout
28800s (8小时),按道理不会是服务端主动主动关闭连接导致。
那会不会是mysql的wait_timeout
被修改过??? 通过如下语句查询发现并没有被修改。
SHOW VARIABLES LIKE '%timeout%'
当排除这个常规问题后,笔者尝试搜索了有没有其它可能导致这个警告,最终还是有一个小发现:
搜索到的案例是:虽然mysql服务端通过SHOW VARIABLES LIKE '%timeout%'
查出来的时间28800s,但是实际上,/etc/my.cnf
中配置的超时时间更小,导致mysql服务端主动关闭连接。
但是经过和dba确认,也排除了这个情况。
既然以上两种情况都排除了,那只能深入源码,在测试环境远程断点调试。
先搜索下警告打印的位置,只有一处在isConnectionAlive
方法中.
源码如下: com.zaxxer.hikari.pool.PoolBase#isConnectionAlive
boolean isConnectionAlive(final Connection connection) { try { try { setNetworkTimeout(connection, validationTimeout); final int validationSeconds = (int) Math.max(1000L, validationTimeout) / 1000; if (isUseJdbc4Validation) { return connection.isValid(validationSeconds); } try (Statement statement = connection.createStatement()) { if (isNetworkTimeoutSupported != TRUE) { setQueryTimeout(statement, validationSeconds); } statement.execute(config.getConnectionTestQuery()); } } finally { setNetworkTimeout(connection, networkTimeout); if (isIsolateInternalQueries && !isAutoCommit) { connection.rollback(); } } return true; } catch (Exception e) { lastConnectionFailure.set(e); logger.warn("{} - Failed to validate connection {} ({}). Possibly consider using a shorter maxLifetime value.", poolName, connection, e.getMessage()); return false; } }
该方法用于检测connection的有效性,catch到异常后,只是简单打印文章开头所提的异常警告信息,并返回false,表示连接失效。
然后会继续从连接池中获取有效的连接。
获取连接的代码如下: com.zaxxer.hikari.pool.HikariPool#getConnection(long hardTimeout)
public Connection getConnection(final long hardTimeout) throws SQLException { suspendResumeLock.acquire(); final long startTime = currentTime(); try { long timeout = hardTimeout; do { PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS); if (poolEntry == null) { break; // We timed out... break and throw exception } final long now = currentTime(); if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > aliveBypassWindowMs && !isConnectionAlive(poolEntry.connection))) { closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE); timeout = hardTimeout - elapsedMillis(startTime); } else { metricsTracker.recordBorrowStats(poolEntry, startTime); return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now); } } while (timeout > 0L); metricsTracker.recordBorrowTimeoutStats(startTime); throw createTimeoutException(startTime); } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new SQLException(poolName + " - Interrupted during connection acquisition", e); } finally { suspendResumeLock.release(); } }
为了打印更多的异常栈信息,笔者修改了isConnectionAlive
方法源码,在catch到异常后,打印整个异常栈信息,最终得到如下日志:
2021-03-19 13:25:58.486 WARN 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase : slave - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@2e2ce118 (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
2021-03-19 13:25:58.489 ERROR 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase : poolBase:java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:73) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2495) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:563) ~[classes!/:na] at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173) ~[classes!/:na] at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186) [HikariCP-3.4.5.jar!/:na] at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) [HikariCP-3.4.5.jar!/:na] at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) [HikariCP-3.4.5.jar!/:na] at org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource.getConnection(AbstractRoutingDataSource.java:194) [spring-jdbc-5.3.4.jar!/:5.3.4] at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) [spring-jdbc-5.3.4.jar!/:5.3.4] at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) [spring-jdbc-5.3.4.jar!/:5.3.4] at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) [spring-jdbc-5.3.4.jar!/:5.3.4] .........................Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed. at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source) ~[na:na] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_91] at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_91] at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:1171) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:573) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2491) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] ... 44 common frames omittedCaused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failureThe last packet successfully received from the server was 1,758,048 milliseconds ago. The last packet sent successfully to the server was 1,758,049 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor31.newInstance(Unknown Source) ~[na:na] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_91] at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_91] at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.protocol.a.NativeProtocol.send(NativeProtocol.java:577) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:632) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:948) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.protocol.a.NativeProtocol.sendQueryString(NativeProtocol.java:894) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.NativeSession.execSQL(NativeSession.java:1073) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:722) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:646) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:169) ~[classes!/:na] ... 42 common frames omittedCaused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_91] at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_91] at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_91] at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_91] at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_91] at com.mysql.cj.protocol.a.SimplePacketSender.send(SimplePacketSender.java:55) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.protocol.a.TimeTrackingPacketSender.send(TimeTrackingPacketSender.java:50) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] at com.mysql.cj.protocol.a.NativeProtocol.send(NativeProtocol.java:568) ~[mysql-connector-java-8.0.23.jar!/:8.0.23] ... 49 common frames omitted
从最后的Caused by: java.net.SocketException: Broken pipe 似乎只能看出在发送数据时socket已经被断开。
于是想到是不是socket的connectionTime
和 socketTimeout
太短,网络较差导致发送sql请求时超时???
又是一通debug之后,深入到mysql-connector-java客户端源码中,最终发现了如下两段重要的代码,用于初始化mysql的连接。
其中成员变量propertySet
则保存了全部的连接参数以及数据连接配置参数。
com.mysql.cj.NativeSession#connect
源码
com.mysql.cj.protocol.StandardSocketFactory#configureSocket
源码
com.mysql.cj.CoreSession#propertySet
中保存的连接配置属性如下:
resulphpt = {Properties@10050} size = 167 "useCompression" -> "false" "maxRows" -> "-1" "tcpTrafficClass" -> "0" "allowNanAndInf" -> "false" "databaseTerm" -> "CATALOG" "sendFractionalSeconds" -> "true" "loadBalanceblocklistTimeout" -> "0" "tcpKeepAlive" -> "true" "allowReplicaDownConnections" -> "false" "maxQuerySizeToLog" -> "2048" "verifyServerCertificate" -> "false" "cacheServerConfiguration" -> "false" "dontCheckOnDuplicateKeyUpdateInSQL" -> "false" "paranoid" -> "false" "allowSourceDownConnections" -> "false" 编程客栈"xdevapi.ssl-truststore-type" -> "JKS" "fallbackToSystemTrustStore" -> "true" "enableQueryTimeouts" -> "true" "blobSendChunkSize" -> "1048576" "useNanosForElapsedTime" -> "false" "readOnlyPropagatesToServer" -> "true" "profilerEventHandler" -> "com.mysql.cj.log.LoggingProfilerEventHandler" "secondsBeforeRetrySource" -> "30" "parseInfoCacheFactory" -> "com.mysql.cj.PerConnectionLRUFactory" "includeThreadNamesAsStatementComment" -> "false" "xdevapi.compression" -> "PREFERRED" "socketTimeout" -> "5000" "socksProxyPort" -> "1080" "requireSSL" -> "false" "largeRowSizeThreshold" -> "2048" "xdevapi.fallback-to-system-keystore" -> "true" "allowMultiQueries" -> "false" "rewriteBATchedStatements" -> "false" "enablePacketDebug" -> "false" "allowpublicKeyRetrieval" -> "false" "noDatetimeStringSync" -> "false" "includeThreadDumpInDeadlockExceptions" -> "false" "enableEscapeProcessing" -> "true" "failOverReadOnly" -> "true" "readFromSourceWhenNoReplicas" -> "false" "clientCertificateKeyStoreType" -> "JKS" "overrideSupportsIntegrityEnhancementFacility" -> "false" "disconnectOnExpiredPasswords" -> "true" "clobberStreamingResults" -> "false" "trustCertificateKeyStoreType" -> "JKS" "jdbcCompliantTruncation" -> "false" "useLocalTransactionState" -> "false" "cacheCallableStmts" -> "false" "rollbackOnPooledClose" -> "true" "cacheResultSetMetadata" -> "false" "loadBalanceValidateConnectionOnSwapServer" -> "false" "selfDestructOnPingMaxOperations" -> "0" "elideSetAutoCommits" -> "false" "loadBalanceHostRemovalGracePeriod" -> "15000" "serverConfigCacheFactory" -> "com.mysql.cj.util.PerVmServerConfigCacheFactory" "cacheDefaultTimeZone" -> "true" "allowLoadLocalInfile" -> "false" "loadBalanceAutoCommitStatementThreshold" -> "0" "getProceduresReturnsFunctions" -> "true" "forceConnectionTimeZoneToSession" -> "false" "useInformationSchema" -> "false" "reportMetricsIntervalMillis" -> "30000" "alwaysSendSetIsolation" -> "true" "tinyInt1isBit" -> "true" "xdevapi.ssl-keystore-type" -> "JKS" "xdevapi.auth" -> "PLAIN" "processEscapeCodesForPrepStmts" -> "true" "emptyStringsConvertToZero" -> "true" "sslMode" -> "DISABLED" "xdevapi.connect-timeout" -> "10000" "zeroDateTimeBehavior" -> "EXCEPTION" "sendFractionalSecondsForTime" -> "true" "maxReconnects" -> "3" "autoDeserialize" -> "false" "callableStmtCacheSize" -> "100" "prepStmtCacheSqlLimit" -> "256" "resultSetSizeThreshold" -> "100" "useSSL" -> "false" "metadataCacheSize" -> "50" "useHostsInPrivileges" -> "true" "tcpRcvbuf" -> "0" "dontTrackOpenResources" -> "false" "autoGenerateTestcaseScript" -> "false" "characterEncoding" -> "utf-8" "xdevapi.fallback-to-system-truststore" -> "true" "nullDatabaseMeansCurrent" -> "false" "useLocalSessionState" -> "false" "detectCustomCollations" -> "false" "useoldAliasMetadataBehavior" -> "false" "autoReconnect" -> "false" "autoReconnectForPools" -> "false" "traceProtocol" -> "false" "maxAllowedPacket" -> "16777216" "populateInsertRowWithDefaultValues" -> "false" "useUnbufferedInput" -> "true" "ignoreNonTxTables" -> "false" "useAffectedRows" -> "false" "ha.loadBalanceStratjsegy" -> "random" "loadBalancePingTimeout" -> "0" "compensateOnDuplicateKeyUpdateCounts" -> "false"
最后发现在验证connection有效性时,connectionTime
和 socketTimeout
设置的就是数据源中配置的connection-timeout
和validation-timeout
,并通过ping命令检测发现,网络并不是很慢,所以网络请求超时这个可能也排除。
调试过程中发现mysql客户端和服务端之间Socket通信,心跳是通过TCP协议中默认的tcpKeepAlive
参数来维护心跳的。
通过搜索得知,linux中默认是2小时。
会不会服务器tcp_keepalive_time时间被修改过,但因为网络缓慢,心跳未接收到,导致服务器关闭了连接???
可以通过linux的如下命令查看
cat /proc/sys/net/ipv4/tcp_keepalive_time
在笔者的应用服务器上发现这个时间确实被改成了较短的180s,那么mysql所在的机器,很可能也是被改过。
于是找到dba查看该项的配置。同样是180s,但这个时间也不至于让心跳超时啊!
最终原因: 好在dba询问了一下为什么要查这个参数,然后笔者将警告信息以及排查的过程描述了一下,结果dba那边竟然告诉我有定时任务, 删除4分钟内没有请求的连接,而我的程序访问数据库时间间隔是28分钟一次,所以早就在mysql服务端关闭了(捂脸哭的表情)
那如何让这个告警不打印呢?
既然数据库服务端会在4分钟后删除没有请求的连接,我们可以减小Hikari中maxLifetime
的值小于该值,这样达到maxLifetime
时Hikari连接池会通过软驱逐的方式来主动关闭该连接,下次就不会获取到因超时而关闭的连接,也就不会出现警告了。
彩蛋:Hikari中max-lifetime参数作用及实现
该参数就是用于控制连接池中一个connection的最大存活时间。
Hikari内部有一个ScheduledExecutorService
对象houseKeepingExecutorService
,每创建一个连接,就会被封装成一个PoolEntry
对象,然后放在定时任务中,定时时间就是设置的max-lifetime
。只要到达这个时间就会采取软驱逐的方式从池中移除。
除此之外,houseKeepingExecutorService
还用于每隔30s来检查一次池中的空闲连接、最大连接情况,并维护池中连接数的平衡。
何谓软驱逐
- 如果这个连接正在被使用,则不立即关闭连接,但是通过PoolEntry对象中的
private volatile boolean evict;
字段来标记为需要关闭。下次有线程来获取到这个连接时,发现evict=true
则调用异步关闭方法。重新获取池中其它的连接。 - 如果这个连接未被使用,则立即调用异步关闭连接的方法。
源码见: com.zaxxer.hikari.pool.HikariPool#createPoolEntry
private PoolEntry createPoolEntry() { try { final PoolEntry poolEntry = newPoolEntry(); final long maxLifetime = config.getMaxLifetime(); if (maxLifetime > 0) { // variance up to 2.5% of the maxlifetime final long variance = maxLifetime > 10_000 ? ThreadLocalRandom.current().nextLong( maxLifetime / 40 ) : 0; final long lifetime = maxLifetime - variance; poolEntry.setFutureEol(houseKeepingExecutorService.schedule( () -> { if (softEvictConnection(poolEntry, "(connection has passed maxLifetime)", false /* not owner */)) { addBagIte编程m(connectionBag.getWaitingThreadCount()); } }, lifetime, M开发者_Go培训ILLISECONDS)); } return poolEntry; } catch (ConnectionSetupException e) { if (poolState == POOL_NORMAL) { // we check POOL_NORMAL to avoid a flood of messages if shutdown() is running concurrently logger.error("{} - Error thrown while acquiring connection from data source", poolName, e.getCause()); lastConnectionFailure.set(e); } } catch (Exception e) { if (poolState == POOL_NORMAL) { // we check POOL_NORMAL to avoid a flood of messages if shutdown() is running concurrently logger.debug("{} - Cannot acquire connection from data source", poolName, e); } } return null; }
总结
整个问题排查过程中基本把Hikari连接池源码看完了,也发现了Hikari内部的一些实现细节,更多细节参见:Hikari 数据库连接池内部源码实现细节
如果Hikari连接池遇到Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl问题注意从下面三个方向排查。
- 1.Hikari连接池的
max-lifetime
(默认30分钟)(对应springboot中的spring.datasource.hikari.max-lifetime
)是否大于了MySQL中wait_timeout
和interactive_timeout
(默认8小时)。 - 2.检查MySQL服务的
/etc/my.cnf
配置文件中wait_timeout
和interactive_timeout
两个超时时间是否和通过sqlSHOW VARIABLES LIKE '%timeout%'
查出的结果一致。 - 3.检查是否和笔者的情况一下,dba为了节约连接资源,通过脚本定时清理了长时间没有请求的连接。
以上为个人经验,希望能给大家一个参考,也希望大家多多支持我们。
精彩评论