
How to debug a JBoss or PostgreSQL out-of-memory problem?

I am trying to debug a JBoss out-of-memory problem. When JBoss starts up and runs for a while, it uses memory as intended by the startup configuration. However, when some unknown user action is taken in the sole web application JBoss is serving (or when the log file grows to a certain size), memory use increases dramatically and JBoss freezes. Once JBoss freezes, it is difficult to kill the process or do anything else because so little memory is left.

When the process is finally killed with kill -9 and the server is restarted, the log file is very small and contains only output from the startup of the new process, with no information about why memory increased so much. This is what makes it so hard to debug: server.log has no information from the killed process. The log is set to grow to 2 GB, yet the log file for the new process is only about 300 KB, although it grows properly under normal memory conditions.

This is information on the JBoss configuration:

JBoss (MX MicroKernel) 4.0.3

JDK 1.6.0 update 22

PermSize=512m

MaxPermSize=512m

Xms=1024m

Xmx=6144m
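
For reference, these settings correspond to JVM startup flags roughly like the following; this is only a sketch, assuming the stock JBoss 4.x launch scripts (e.g. bin/run.conf or a JAVA_OPTS environment variable):

    # Sketch of the memory settings as JVM flags in JAVA_OPTS
    # (e.g. bin/run.conf for JBoss 4.x; adjust to your actual launch script)
    JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx6144m \
        -XX:PermSize=512m -XX:MaxPermSize=512m"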

This is basic info on the system:

Operating system: CentOS Linux 5.5

Kernel and CPU: Linux 2.6.18-194.26.1.el5 on x86_64

Processor information: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz, 8 cores

Here is representative system information under normal, pre-freeze conditions, a few minutes after the JBoss service starts:

Running processes: 183

CPU load averages: 0.16 (1 min) 0.06 (5 mins) 0.09 (15 mins)

CPU usage: 0% user, 0% kernel, 1% IO, 99% idle

Real memory: 17.38 GB total, 2.46 GB used

Virtual memory: 19.59 GB total, 0 bytes used

Local disk space: 113.37 GB total, 11.89 GB used

When JBoss freezes, system information looks like this:

Running processes: 225

CPU load averages: 4.66 (1 min) 1.84 (5 mins) 0.93 (15 mins)

CPU usage: 0% user, 12% kernel, 73% IO, 15% idle

Real memory: 17.38 GB total, 17.18 GB used

Virtual memory: 19.59 GB total, 706.29 MB used

Local disk space: 113.37 GB total, 11.89 GB used

===========================================================

UPDATE TO THIS QUESTION IS ADDED BELOW

Thank you very much for your comments. We are posting an update to this question that will likely be helpful.

On 3 more occurrences of the memory issue, the unix top utility seems to indicate that the JBoss process is the one consuming all the memory. When the problem occurs, it seems to happen very quickly. For example, after JBoss has been running fine for a while (e.g. several days), at some point users take certain mysterious actions, after which it seems to take 1-3 minutes for memory consumption to ramp up to a level that causes major performance degradation, and another 5-10 minutes for that degradation to become severe (e.g. it becomes difficult to run simple bash commands over ssh). Of course, this pattern varies a bit depending on what users are doing in the web application.

For example, when sorting by memory, on one occurrence the JBoss process is reported to have the following statistics (note that the real memory is 17.38 GB total and JBoss is only given a 6 GB heap):

VIRT (total virtual memory): 23.1g

RES (resident set size): 15g

%CPU: 111.3%

%MEM: 97.6%

In that same example, 9 minutes later the JBoss process is reported to have the following statistics:

VIRT (total virtual memory): 39.1g

RES (resident set size): 17g

%CPU: 415.6%

%MEM: 98.4%

After killing the JBoss process with a SIGKILL signal (-9), the new JBoss process is reported to have statistics similar to the following:

VIRT (total virtual memory): 7147m

RES (resident set size): 1.3g

%CPU: 11.6%

%MEM: 7.3%

Now that we know it is the JBoss process that is consuming all the memory, we want to figure out where it is going. We have tried jmap with a command such as jmap -dump:file=/home/dump.txt 16054, but this seems to make the server much less responsive, and after some time nothing seems to happen (e.g. the prompt does not return). Our guess is that, with so little memory available and such a large heap, something hangs.
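
If the full binary dump keeps hanging, the lighter-weight jmap modes may still respond; a sketch, reusing PID 16054 from above (note that any jmap invocation briefly pauses the target JVM):

    # Heap summary (generation sizes and usage) -- much cheaper than a full dump
    jmap -heap 16054

    # Class histogram: per-class instance counts and bytes, largest first
    jmap -histo 16054 | head -n 30

    # Full binary dump for Eclipse MAT, if the server is responsive enough
    jmap -dump:format=b,file=/path/to/dumps/jboss.hprof 16054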

Also, we set the JVM options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps when starting the JVM but nothing seems to be written to the path when the memory problem occurs.
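
A quick sanity check is to confirm on the running process that the flags were actually picked up and that the dump directory is writable by the JBoss user. Also keep in mind that -XX:+HeapDumpOnOutOfMemoryError only fires when the JVM itself throws an OutOfMemoryError, so it writes nothing if the growth is in native (non-heap) memory. A sketch, again assuming PID 16054:

    # Confirm the flags are active on the running JVM (JDK 6 jinfo)
    jinfo -flag HeapDumpOnOutOfMemoryError 16054
    jinfo -flag HeapDumpPath 16054

    # Confirm the JBoss user can write to the dump directory
    touch /path/to/dumps/.write-test && rm /path/to/dumps/.write-test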

These other options have been suggested:

[1] use pmap to produce a listing of the process address space and look for large chunks (particularly large chunks that have the name [anon])

[2] send SIGQUIT (kill -QUIT) to the process several times in succession and look for common stack traces

[3] use jstack to get a thread dump with a command such as jstack <pid> > tdump.out (a command sketch for options 2 and 3 follows this list)

[4] mess around with the JBoss Management Tools / Console that's included with JBoss and see what kind of objects are left hanging around as the thing starts to eat up memory

[5] explore Nagios as another monitoring solution
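
As a rough illustration of options 2 and 3, the thread-dump commands could look like the following (a sketch only; 16054 stands in for the JBoss PID, and the SIGQUIT output goes to the JVM's stdout/stderr, typically the JBoss console log):

    # Option 2: several SIGQUIT thread dumps a few seconds apart
    for i in 1 2 3; do kill -QUIT 16054; sleep 5; done

    # Option 3: jstack writes the thread dump wherever you redirect it
    jstack 16054 > /tmp/tdump.$(date +%H%M%S).out
    # If jstack itself hangs on the wedged process, jstack -F 16054 forces a dump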

Here are some follow-up questions:

* From the above top report information, are there any new insights or thoughts on the problem?

* For the above options 1-5, which are the most likely to work under the extremely low memory circumstances that the problem creates?

* For the above options 1-5, which are the most likely to work under the very short time frame that the problem allows for diagnosis (e.g. 1-3 minutes)?

* Is there a way to automatically write to a text file a time stamp when the memory use of a specific process reaches several specific percentage thresholds, so this time stamp can be used when looking through the JBoss log files?

* Is there a way to automatically send an email with a time stamp when the memory use of a specific process reaches several specific percentage thresholds so this can be used for us to begin more focused monitoring?
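
Regarding the last two questions, one possible approach is a small shell watcher that polls the process's %MEM. This is only a minimal sketch, assuming pgrep and a working local mail command are available, and that the placeholder log path, address, and 60-second poll interval are adjusted to the environment:

    #!/bin/bash
    # Minimal sketch: log a timestamp (and send a mail) whenever the JBoss
    # process's %MEM crosses the given thresholds.
    THRESHOLDS="50 75 90"
    LOG=/var/log/jboss-mem-watch.log
    EMAIL=admin@example.com                              # placeholder address

    while sleep 60; do
        PID=$(pgrep -f 'org.jboss.Main' | head -n 1)     # JBoss 4.x main class
        [ -z "$PID" ] && continue
        MEM=$(ps -o %mem= -p "$PID" | tr -d ' ' | cut -d. -f1)
        [ -z "$MEM" ] && continue
        for T in $THRESHOLDS; do
            if [ "$MEM" -ge "$T" ]; then
                echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$PID %mem=$MEM crossed ${T}%" >> "$LOG"
                echo "JBoss %MEM=$MEM at $(date)" | mail -s "JBoss memory >= ${T}%" "$EMAIL"
            fi
        done
        # Note: no de-duplication -- alerts repeat on every poll above a threshold.
    done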


I've worked through these types of problems before with this basic process:

  1. set the JVM options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps when starting the JVM.
  2. run the application, wait for failure (or cause it if you can), collect the dump (.hprof file)
  3. View the dump in Eclipse Memory Analyzer (MAT), which has a nice "Leak Suspects Report"
  4. The report will hopefully say something like "82,302 instances of class XYZ are occupying 74% of heap space." You can then inspect some of those objects if you need more info.

Hopefully that would be enough to at least point you in the right direction to find your leak.

Happy debugging!


This is not enough information for a diagnosis.

But let's start with what we have. I don't know what you're using to show memory statistics, but it shows that your overall system memory consumption has jumped by roughly 15 GB, which is strange considering you've only given JBoss a 6 GB heap.

So the first thing to do is verify that JBoss is the actual problem. Easiest way to do this is with top, sorting either by total virtual memory (VIRT) or resident set size (RES). To change the sort field, type a capital "F" and then select the field in the screen that follows.
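
If the box is already too loaded for an interactive top session, ps can produce the same ranking non-interactively; a quick sketch:

    # Top memory consumers by resident set size (procps ps)
    ps aux --sort=-rss | head -n 15

    # Within an interactive top session, Shift+M sorts by memory directly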

If it is the JBoss process that's consuming all that memory, then you need to figure out where it's going. Possibilities include large memory-mapped JAR files, off-heap buffers allocated from Java (e.g. direct ByteBuffers), and memory allocated by a native module. Since you'll have the process ID from top, use pmap to produce a listing of the process address space and look for large chunks (particularly large chunks that have the name [anon]).
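
A rough way to surface the largest mappings (the exact column layout varies between pmap versions, so treat this as a sketch; 16054 is again the JBoss PID):

    # Mappings sorted by size; large [anon] blocks near the end are candidates
    # for the Java heap, direct buffers, thread stacks, or native allocations
    pmap 16054 | sort -k 2 -n | tail -n 25

    # Where supported, pmap -x also shows resident size per mapping
    pmap -x 16054 | sort -k 2 -n | tail -n 25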

If it's not clear where the memory is being allocated, you can always send SIGQUIT (kill -QUIT) to the process, which will write a thread dump to stderr (which will either go to the console or -- hopefully -- to a logfile). Do this several times in succession, and look for common stack traces.


Based on your updates, which show the virtual size growing for the JBoss process, I think that examining the Java heap is a waste of time. While I suppose it's possible that the JVM is ignoring the -Xmx option, it's extremely unlikely.

So that means the growth is happening in non-heap memory. Some possibilities:

  • Use of direct ByteBuffers. If you're using buffers to cache results from the database, then it's very possible that you're allocating too many buffers. This would be diagnosed via pmap, looking for large [anon] blocks.
  • Uncontrolled thread creation. Each thread requires some amount of space for its thread stack. I wouldn't expect this to be the problem, because the amount of per-thread space is tiny (iirc, under 1 MB); you'd have to be creating tens of thousands of them. You can diagnose this with pmap, looking for small [anon] blocks, or by sending SIGQUIT to the JVM.
  • Native code that's allocating lots of memory on the C heap. You can probably diagnose this with pmap, but a first step is to check your dependencies to see if there's a native library. And if there is, debug with gdb or equivalent. (A quick shell check for the last two possibilities is sketched after this list.)
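
To narrow down the last two possibilities quickly, a couple of read-only checks (a sketch; 16054 stands in for the JBoss PID):

    # Thread count of the JVM process -- tens of thousands would point at
    # uncontrolled thread creation
    ps -o nlwp= -p 16054
    grep Threads /proc/16054/status

    # Native libraries mapped into the process -- anything beyond the usual
    # JDK and system .so files suggests a native module worth a closer look
    awk '/\.so/ {print $NF}' /proc/16054/maps | sort -u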

As a final comment: rather than ask what is likely to work under low-memory conditions, I recommend just trying the options and seeing what does and doesn't work.


One option is to connect to the JBoss server over remote JMX with VisualVM (bundled with recent JDKs).
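
Note that remote JMX usually has to be enabled on the JBoss JVM first; a minimal sketch of the standard JMX system properties (the port is an arbitrary free one, and disabling authentication and SSL like this is only reasonable on a trusted network):

    # Added to JAVA_OPTS (e.g. in bin/run.conf); 9099 is a placeholder port
    JAVA_OPTS="$JAVA_OPTS \
        -Dcom.sun.management.jmxremote.port=9099 \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false"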
