What is causing spikes for JDBC calls to Oracle from within Websphere?
I was wondering whether someone can shed some light on the following issue:
We've been seeing spikes for JDBC calls made from a Spring 2.5.6-based web service running on WebSphere 6.1 on AIX against Oracle 64-bit 10.2.0.5.0. The JDBC driver version is 10.2.0.3.0.
We're hitting the database with a single thread. The average response time for the web service is 16 ms, but we're seeing 11 spikes of about 1 second or higher (among roughly 11,000 calls in 5 minutes). Introscope is telling us that about half of these spikes are caused by "select 1 from dual" (the query the WebSphere connection pool uses to validate connections).
On the database side, we've traced the sessions created by the WebSphere connection pool, and the traces do not indicate any spikes inside the database.
Any ideas/suggestions on what could be causing these spikes?
EDIT:
Our connection pool is set up with 20 connections, and monitoring is showing that only one connection is used.
EDIT2:
We've upgraded our Oracle JDBC driver to 10.2.0.5 with no difference.
Perhaps it's a pool that's not sized properly.
11,000 calls in 5 minutes, or 300 seconds, works out to about 37 calls per second. At an average of 0.016 seconds per call, a single connection can serve roughly 62 calls per second (about 18,750 calls in those 300 seconds), so a pool size of 4-5 should handle the traffic comfortably. What I don't know is whether, when one of those queries runs a little long, a request ends up waiting for a connection to become available.
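To make that estimate concrete, here is a quick sketch of the arithmetic, using only the figures given in the question:

// Back-of-the-envelope pool sizing using the figures from the question.
public class PoolSizing {
    public static void main(String[] args) {
        double totalCalls = 11000;        // calls observed
        double windowSeconds = 5 * 60;    // 5-minute window
        double avgCallSeconds = 0.016;    // 16 ms average response time

        double callsPerSecond = totalCalls / windowSeconds;               // ~36.7
        double callsPerConnPerSecond = 1.0 / avgCallSeconds;              // ~62.5
        double connectionsNeeded = callsPerSecond / callsPerConnPerSecond; // ~0.59

        System.out.printf("Arrival rate:       %.1f calls/s%n", callsPerSecond);
        System.out.printf("Per connection:     %.1f calls/s%n", callsPerConnPerSecond);
        System.out.printf("Connections needed: %.2f (so even one suffices on average)%n",
                connectionsNeeded);
    }
}

The result is consistent with the edit above reporting that monitoring only ever shows one connection in use.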
The 'SELECT 1 FROM DUAL' query is what the pool executes to check whether a connection is still live and usable.
You could try increasing the size of the pool, or look at the other parameters that govern how the pool verifies that a connection is live.
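For illustration only (this is not WebSphere's internal code), the pretest the pool performs is essentially along these lines; putting a query timeout on the validation statement is one way to keep a stalled round trip from turning into a multi-second pause. The DataSource is assumed to be whatever the application already looks up:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Rough sketch of the kind of pretest a pool runs before handing out a
// connection; NOT WebSphere's actual implementation.
public class ConnectionPretest {

    // True if the connection answers the validation query within the timeout.
    static boolean isUsable(Connection conn) {
        Statement stmt = null;
        try {
            stmt = conn.createStatement();
            stmt.setQueryTimeout(2);                 // fail fast instead of stalling
            stmt.executeQuery("SELECT 1 FROM DUAL"); // the query Introscope reported
            return true;
        } catch (SQLException e) {
            return false;
        } finally {
            if (stmt != null) {
                try { stmt.close(); } catch (SQLException ignored) { }
            }
        }
    }

    // Borrow a connection, replacing it once if the pretest fails.
    static Connection borrow(DataSource ds) throws SQLException {
        Connection conn = ds.getConnection();
        if (!isUsable(conn)) {
            conn.close();                // discard the stale connection
            conn = ds.getConnection();   // and try a fresh one
        }
        return conn;
    }
}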
The answer to this problem ended up being related to neither WebSphere nor Oracle: it was a good old-fashioned network configuration problem that resulted in TCP retransmission timeouts between the WebSphere server and the Oracle RAC cluster.
To arrive at that diagnosis, I looked at the output of netstat -p tcp before and after a test run and found that the "retransmit timeouts" statistic was increasing. The retransmission timeout algorithm configuration can be viewed with:
$ no -a
...
rto_high = 64
rto_length = 13
rto_limit = 7
rto_low = 1
This indicates that the retransmission timeouts range from 1 to 64 seconds and back off increasingly, which explains why we've been seeing spikes of 1 second, 2 seconds, 4 seconds, 10 seconds and 22 seconds, but nothing in between those peaks (i.e. no 6-second spike).
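A simple way to corroborate that the stalls happen on the client side rather than inside the database is to time each JDBC round trip in the application and log the slow ones, then compare those timestamps with the netstat retransmit counters taken before and after the run. A minimal sketch (the query, DataSource and 500 ms threshold are placeholders, not part of our actual setup):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

// Times a single query round trip and logs it when it exceeds a threshold,
// so application-side spikes can be lined up with the netstat deltas.
public class SlowCallLogger {

    private static final long THRESHOLD_MS = 500; // placeholder spike threshold

    static void timedQuery(DataSource ds, String sql) throws Exception {
        long start = System.nanoTime();
        Connection conn = ds.getConnection();
        try {
            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(sql);
                while (rs.next()) { /* drain the result set */ }
                rs.close();
            } finally {
                stmt.close();
            }
        } finally {
            conn.close();
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            if (elapsedMs >= THRESHOLD_MS) {
                System.err.println(elapsedMs + " ms for: " + sql);
            }
        }
    }
}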
Once the network config was fixed, the problem went away.
Does switching off "Pretest new connections" help?