GC took three hours to bring down 1.2 GB of heap: what could be the reason?
On one of our servers, garbage collection took nearly three hours to bring heap usage down (successfully) from 1.4 GB to about 200 MB, roughly 1.2 GB in total.
During this time CPU usage was high, almost 80-100%. We have four such servers with the same configuration (JVM settings, server configuration, hardware, network), and assuming nobody has made any changes to them, what could cause this particular server to run a 3-hour GC?
All the other servers take only 5 to 10 minutes for each GC cycle.
I have attached a graph from HP BAC for reference; it shows the time when I believe GC kicked in and when it stopped.
(As Stephen points out, for more conclusive findings.) I will provide this information when the server administrator gets back to me:
- The exact version of the JVM you are using. (Standard Java SE 1.4.2)
- The JVM options. (Coming)
- Details of the web container / server base. (Coming)
- Information about what the service does. (Coming)
- Any relevant clues from the server / service log files. (Coming)
- Any relevant patterns in the request logs (Coming)
- The GC logs for the time of the event. (If you don't currently have GC logging enabled, you may need to enable it and wait until the problem recurs.) (Coming)
There's not much data to work from here, but my hunch: you're swapping. The only time we ever see GC times go that high is when the box is overcommitted and paging to disk. That can cause an order of magnitude (or more) performance degradation.
You need to gather OS (and potentially hypervisor if it applies) swapping statistics to prove or disprove this theory.
(I know CPU time is higher than I'd expect for swapping, but you never know.)
It would also help if you posted the hardware configuration, the "java -version" output, and the JVM command-line arguments (e.g. -Xmx and -Xms) to narrow down what you're really running.
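If it helps, here is a minimal sketch of the numbers I would collect, assuming a Linux host (the exact commands and column names differ on other platforms):

```
# Watch swap-in/swap-out ("si"/"so") and CPU while the long GC is running;
# sustained non-zero paging during the collection supports the swapping theory.
vmstat 5

# Compare total vs. used swap, and the JVM's virtual vs. resident size;
# a resident set much smaller than the configured heap on a busy server
# suggests the box is overcommitted.
free -m
ps -o pid,vsz,rss,args -C java

# And the exact JVM build for the question above:
java -version
```

If the server is virtualised, the equivalent counters from the hypervisor are worth collecting too, for the reason given above.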
You don't provide much information, but possible reasons might be:
Bugs in your application; e.g. a memory leak with some rather peculiar characteristics, or a task that kept on running out of memory and then restarting.
An accidental or deliberate denial of service attack; e.g. some client that keeps retrying an over-sized request with parameters that reduce the "problem size" each time.
A single extremely long-running request with certain characteristics.
Thrashing - see @Trent Gray-Donald's answer. (If you have overallocated memory, then the GC algorithms, which involve looking at lots of objects scattered randomly over lots of pages, are highly likely to provoke thrashing. I'm just not sure that this would result in the gradually falling heap usage you are seeing.)
A pathological combination of JVM settings; see the sketch after this list for the kind of overcommitted configuration that could also trigger the thrashing above.
A bug in the garbage collector in the particular JVM you are using.
Some combination of the above.
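As an illustration of the thrashing and pathological-settings points above (the numbers are purely hypothetical, not taken from your servers): a heap sized close to, or beyond, the physical memory left over for the JVM process forces the collector to page while it walks the heap.

```
# Hypothetical: a box with 2 GB of physical RAM, where the OS, the web
# container and other processes already occupy around 1 GB, started with
# (service.jar is a placeholder):
java -Xms1536m -Xmx1536m -jar service.jar

# Once the live heap grows toward 1.5 GB, a full collection has to touch
# pages that no longer fit in RAM, so every GC pass becomes disk-bound
# and the collection time can blow up by orders of magnitude.
```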
This is the kind of problem that would warrant getting an Oracle / Java support contract.
The following information might help diagnose this:
- The exact version of the JVM you are using.
- The JVM options.
- Details of the web container / server base.
- Information about what the service does.
- Any relevant clues from the server / service log files
- Any relevant patterns in the request logs
- The GC logs for the time of the event. (If you don't currently have GC logging enabled, you may need to enable it and wait until the problem recurs.)
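For the GC logs, a sketch of the options I would start with on a Sun 1.4.2-era JVM (the log path is just a placeholder, and the exact -XX flags can vary by vendor and build, so check your JVM's documentation):

```
# Hypothetical GC-logging options for a Sun 1.4.2-era JVM; verify them
# against your vendor's documentation before rolling them out.
java -verbose:gc \
     -Xloggc:/var/log/myservice/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps \
     ...   # existing options and main class go here
```

The timestamps let you line the log up against the HP BAC graph and see whether the three hours were one enormous collection or many back-to-back ones.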