What could cause socket ConnectException: Connection timed out?

We have a Webstart client that communicates with the server by sending serialized objects over HTTPS using javax.net.ssl.HttpsURLConnection.

Everything works perfectly fine on my local machine and on test servers located in our office, but I'm experiencing a very, very strange issue which is only occurring on our production and staging servers (and sporadically at that). The main difference I know of between those servers and the ones in our office is that they are located elsewhere and client-server communication with them is considerably slower, but it worked fine for a long time in production prior to this as well.

Anyway, here's what's happening:

  • The client, after setting options such as read timeout and properties such as Content-Type on the HttpURLConnection, calls getOutputStream() on it to get the stream to write to.
  • At this point, from what I can tell, the client hangs for some period of time.
  • The client then throws the following exception:
java.net.ConnectException: Connection timed out: connect
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(Unknown Source)
    at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.connect(Unknown Source)
    at com.sun.net.ssl.internal.ssl.BaseSSLSocketImpl.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)

Note that this is not a SocketTimeoutException, which the connect() method on HttpURLConnection says it throws if the timeout expires before a connection can be established. Also, when this happens I am able to call conn.getResponseCode() and I get a response code of 200.

  • On the server side, an EOFException is thrown in ObjectInputStream's constructor, which tries to read the serialization header but fails because the client never gets the OutputStream to write to.

In case it helps, here are the calls being made on the HttpsURLConnection prior to the call to getOutputStream() (edited to show only the calls being made rather than the whole structure of the code doing this):

HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setUseCaches(false);
conn.setReadTimeout(30000);
conn.setRequestProperty("Cookie", cookie);
conn.setDoOutput(true);
conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
conn.getOutputStream();

The thing is, I have no idea how any of this could be happening, especially given that it only happens occasionally (no clear pattern of activity that I can tell) and even then only when there's (relatively) high latency between the client and the server.

Given what I've been able to find so far about java.net.ConnectException: Connection timed out, I wondered whether it might be a network or firewall issue on the network our servers are running on, but that doesn't make much sense to me given that the request is clearly getting through to the servlet. Also, other apps running on the same network have not reported similar issues.

Does anyone have any idea what the cause of this could be, or even what I should investigate?


We have come across these in a case similar to yours, usually at high load and not easy to reproduce in test. We have not fixed it yet, but these are the steps we went through.

If it's a firewall issue, we would get a Connection Refused or the SocketTimeout exception.

1) Are you able to track these requests in the access log on the server - do they show an HTTP status 200 or 404 or something else? In our case, the server (IIS in this case) logs showed the client closed the connection and not the server. So that was a mystery.

Update: If the client always gets a 200, then the server has actually sent back some response but I suspect the response byte-size (if this is recorded in the access logs) will show a different value from that of the normal response size for that request.

If it shows the same response size, then you have the (perhaps implausible) situation where the server responded correctly but the client never received the response because the connection was terminated somewhere in between.

2) The network admin teams looked at the TCP/IP traffic to determine which end (or intermediate router) is terminating the HTTP / TCP-IP conversation. Once you know which end is terminating the connection, the next step is to find out why. Someone knowledgeable enough could run snoop

3) Is there a max number of requests configured/restricted on the server - and is that throttling your connections?

4) Are there any intermediate load balancers at which requests could be dropped?

Update: One more thing we wanted to do, but did not complete, is to create a static route between client and server to reduce the number of hops in between and rule out network-related connection drops. See http://en.wikipedia.org/wiki/Static_routing

5) Another suggestion is to set the connect timeout as well (setConnectTimeout) and see whether the requests succeed with a higher value. Update: You might also want to try conn.getErrorStream(), which is documented as follows:

Returns the error stream if the connection failed but the server sent useful data nonetheless. If the connection was not connected, or if the server did not have an error while connecting or if the server had an error but no error data was sent, this method will return null.

6) Could also try taking a set of thread dumps on the server 5 seconds apart, to see if any thread shows these incoming requests on the server.

Update: As of today we have learnt to live with this problem, because the failure rate works out to 200-300 out of 400,000 requests per day, which is roughly 0.05% to 0.075%.
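
Suggestion 5 above can be sketched roughly as follows. The question's code sets only a read timeout, so the connect attempt is bounded by the operating system's TCP timeout, which would be consistent with seeing a ConnectException rather than a SocketTimeoutException. The class name, method names, and returned labels here are hypothetical and not from the original code; this is a sketch under those assumptions, not a diagnosis.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical helper for illustration only.
public class ConnectTimeoutProbe {

    // Opens (but does not yet connect) a connection with an explicit connect
    // timeout in addition to the read timeout used in the question.
    public static HttpURLConnection open(URL url, int connectMs, int readMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connectMs); // 0 would mean "no Java-level limit"
        conn.setReadTimeout(readMs);
        conn.setUseCaches(false);
        conn.setDoOutput(true);
        return conn;
    }

    // Labels a failure so logs can separate Java-level timeouts from
    // connect failures reported by the operating system.
    public static String classify(IOException e) {
        if (e instanceof SocketTimeoutException) {
            return "JAVA_TIMEOUT";       // connect or read timeout set on the connection expired
        }
        if (e instanceof ConnectException) {
            return "OS_CONNECT_FAILURE"; // e.g. "Connection timed out: connect" or "Connection refused"
        }
        return "OTHER_IO_ERROR";
    }
}
```

Logging classify(e) next to the elapsed time of each failed getOutputStream() call should show whether the failures line up with the configured connect timeout or with the OS default.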


We also experienced sporadic timeouts when using HttpsURLConnection on our servers. We were able to fix them with two things:

  1. Use specific ContentLength via setFixedLengthStreamingMode (brought down the error rate from ~150 to 10)
  2. Retry if a timeout occurs (Error rate from 10 to 0. After max. one retry everything went through)

pseudo code:

// set connect and read timeouts to 6 s (short, since a retry is in place)
try {
    // open the connection, write the request, read the response
} catch (java.io.InterruptedIOException e) {
    // read or connect timeout (e.g. SocketTimeoutException): try again
}
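
The two fixes above could look roughly like this in Java. This is a minimal sketch, not our production code: the class name, the Attempt interface, and the method names are made up for illustration, the Content-Type is taken from the question, and the 6 s values follow the pseudo code.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration only.
public class RetryingPost {

    // One unit of work that may time out.
    public interface Attempt<T> {
        T run() throws IOException;
    }

    // Runs the attempt up to maxAttempts times, retrying only on timeouts.
    public static <T> T withRetry(Attempt<T> attempt, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        InterruptedIOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (InterruptedIOException e) { // includes SocketTimeoutException
                last = e;                        // remember it and try again
            }
        }
        throw last; // every attempt timed out
    }

    // Sends the body with a known Content-Length and returns the HTTP status.
    public static int post(URL url, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(6000);
        conn.setReadTimeout(6000);
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
        conn.setFixedLengthStreamingMode(body.length); // stream the body with a fixed length
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

A call would look like withRetry(() -> post(url, bytes), 2). Two caveats: java.net.ConnectException is a SocketException, not an InterruptedIOException, so the OS-level "Connection timed out" from the question would not be retried by this catch clause, and resending a request is only safe if the server can tolerate receiving it twice.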

Another theory why this is happening could be the following:

In the documentation of the HttpURLConnection/HttpsURLConnection one can read the following:

Each HttpURLConnection instance is used to make a single request but the underlying network connection to the HTTP server may be transparently shared by other instances.

So calling close() on the streams alone would be fine, but also calling disconnect() would terminate the socket for the other users of the transparently shared connection, which would then run into a SocketTimeoutException once the timeout period is reached.
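
If this theory applies, the cleanup would be limited to draining and closing the response stream, without calling disconnect(). A minimal sketch of that pattern follows; the class and method names are hypothetical, and whether disconnect() really affects other in-flight requests is an assumption of the theory above rather than something we verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper for illustration only.
public class StreamCloseSketch {

    // Reads the stream to the end so the underlying connection can be reused.
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Drains and closes the response stream; deliberately no conn.disconnect().
    public static byte[] readResponse(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            return readFully(in);
        }
    }
}
```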
