Sql Server JDBC Connection Reset Error : Only on Amazon EC2

2023-03-11 10:38 问答作者：

Context: The Cloud

We have a java-based web application that we normally host on our own servers. Recently we used Amazon Web Services (AWS EC2) cloud to host an instance.

This "cloud setup" matches our typical "on site" setup: one server for the app server, another server for the database server. (Several app servers point to the same database server)

The problem In this cloud setup, we receive intermittent "connection reset by peer errors" between the database and the jdbc driver, where at (seemingly) random intervals and at random points in the codebase, the database connection fails.

Here are a few error excerpts for the log

Stack Trace Example 1:

at com.participate.pe.genericdisplay.client.taglib.GenDisplayViewTag.doStartTag(GenDisplayViewTag.java:77)
    ... 75 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:304)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.getMetaData(SQLServerConnection.java:1734)
    at org.jboss.resource.adapter.jdbc.WrappedConnection.getMetaData(WrappedConnection.java:354)

Stack Trace Example 2

    at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
    at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
    at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
    at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4437)
    at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4389)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1457)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:1462)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.setAutoCommit(SQLServerConnection.java:1610)
    at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.checkTransaction(BaseWrapperManagedConnection.java:429)

Technical Environment

Jboss 4.2.2.GA (Jboss-Web 2.0/ Tomcat 6)
MSSQL 2005 2.0 jdbc driver

Some points

We have never seen this problem in our own environment (i.e. own data centers) running the application for several years
This led me to conclude "something funny is going on with Amazon network environment". I may be wrong/missing something/etc.
This problem only occurs with our application. We have other jav开发者_开发百科a and php applications which have not had this problem. The other java application uses a different jdbc driver (jtds, afaik)
It doesn't seem like a simple connection timeout

Questions

-Has anyone seen this before? -If it's an EC2 "known issue", can we configure our way around the problem (i.e. make sure everything is on its own subnet or virtual private cloud (vpc) ? -Any jdbc driver settings to get past this problem?

** Update ** I've extended and increased the bounty on this question.

On extra bit of information: the two virtual servers (database and application server) were on different subnets--i.e. one hop between the two servers.

In a non-cloud environment we have "zero hops" bewtewn the two servers.

Our hosting admins said we had no control over the subnets of our EC2 instances. This made me wonder if virtual private cloud would help.

thanks in advance

will

Not sure if this is related or not. We experienced something similar with an app that we were running in the EC2 environment. Same symptom, that the database connection would intermittently close. We were using MSSQL 1.2 driver. Also, we would see the errors usually after a delay or idle time with the connection. Our assumption (never proven) was that something in the network layer was closing the connection and the client wasn't detecting it, so it became stale.

We were able to work around it because we were using commons connection pools, and had the pool recreate the connection on failure. We eventually moved the application out of EC2 and didn't see the issue again.

Just a word of caution on usind DBCP/connection pool features to mitigate the issue - the more you enable 'testOnBorrow' and other features, the more you can introduce latency or other performance changing affects on the system. I don't know if DBCP still does this or not, but a few years ago it would generate actual test queries to test the connection - full stack, database responses - not just at the network layer. The above link from Brian brings back horrific memories from the early 2000s on surrounding re-try logic for JDBC connection management.

Anyway, it's tough to really root cause this, other than gather evidence and eliminate the 'seemingly random' to a specific set of conditions:

You could try to throw up a Wireshark/PCAP trace, find when it happens, and send the results to both Amazon and Microsoft to see if they can root cause it
You could try the above with certain test harnesses to isolate the problem (JMeter tests to get concurrency up), bounce the network connection, watch for recovery, etc
You could try alternative versions of SQL Server to discount a SQL Server/JDBC driver bug that has since been fixed.
If DNS is used in connection strings, could use IP addresses to validate nslookup issues

I'm not a SQL Server expert, but another route for research could be within the related products domain - e.g. see if anyone experienced similar issues with TFS/Sharepoint (e.g. such as http://nickhoggard.wordpress.com/2009/12/07/further-experiences-with-tfs-2010-beta-2-on-amazon-ec2/ )

I have seen this issue in both the EC2 environment and the Windows Azure environment. I think connection retry logic needs to be a standard part of your design when working in a distributed computing environment.

This article is for SQL Azure - but I think it equally applies to EC2 and all drivers.

I can also confirm that this happens and will spin up a lower priority investigation since it's not production critical.
Our production servers are in our data center. We use developer laptops to run our applications. Neither of these get this issue once we configured c3p0 connection pool timeouts and test period (see article: http://www.codefin.net/2007/05/hibernate-and-mysql-connection-timeouts.html).

However, we do have a development staging server that is in EC2 and it does indeed happen there. If I find something that seems to work, I'll ping back. Also, I'm using mysql. I see that you are using MS SQL Server so it is across database vendors.

继续阅读：amazon-ec2 jakarta-ee jboss

Sql Server JDBC Connection Reset Error : Only on Amazon EC2

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？