Identifying crash with hs_err_pid*.log and gdb
Update Sept. 12, 2011
I was able to get the core file and immediately dissabled the instruction that crashed. As per advice I tracked the value of r28 (by the way, no registry entry was log to hs_err_pid*.log) and check where did the value come from (see below w/ <---). However, I was not able to determine the value of r32.
Could the reason for the miss-alignment is that r28 is a 8-byte integer loaded to a 4-byte integer r31?
;;; 1053 if( Transfer( len ) == FALSE ) {
0xc00000000c0c55c0:2 <TFM::PrintTrace(..)+0x32>: adds r44=0x480,r32;; <---
0xc00000000c0c55d0:0 <TFM::PrintTrace(..)+0x40>: ld8 r43=[ret2]
0xc00000000c0c55d0:1 <TFM::PrintTrace(..)+0x41>: (p6) st4 [r35]=ret3
0xc00000000c0c55d0:2 <TFM::PrintTrace(..)+0x42>: adds r48=28,r33
0xc00000000c0c55e0:0 <TFM::PrintTrace(..)+0x50>: mov ret0=0;;
0xc00000000c0c55e0:1 <TFM::PrintTrace(..)+0x51>: ld8.c.clr r62=[r45]
0xc00000000c0c55e0:2 <TFM::PrintTrace(..)+0x52>: cmp.eq.unc p6,p1=r0,r62
;;; 1056 throw MutexLock ;
0xc00000000c0c55f0:0 <TFM::PrintTrace(..)+0x60>: nop.m 0x0
0xc00000000c0c55f0:1 <TFM::PrintTrace(..)+0x61>: nop.m 0x0
0xc00000000c0c55f0:2 <TFM::PrintTrace(..)+0x62>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0x800;;
;;; 1057 }
0xc00000000c0c5600:0 <TFM::PrintTrace(..)+0x70>: adds r41=0x488,r32
0xc00000000c0c5600:1 <TFM::PrintTrace(..)+0x71>: adds r40=0x490,r32
0xc00000000c0c5600:2 <TFM::PrintTrace(..)+0x72>: br.call.dptk.many rp=0xc00000000c080620;;
;;; 1060 dwDataLen = len ;
0xc00000000c0c5610:0 <TFM::PrintTrace(..)+0x80>: ld8 r16=[r44] <---
0xc00000000c0c5610:1 <TFM::PrintTrace(..)+0x81>: mov gp=r36
0xc00000000c0c5610:2 <TFM::PrintTrace(..)+0x82>: (p1) mov r62=8;;
0xc00000000c0c5620:0 <TFM::PrintTrace(..)+0x90>: cmp.eq.unc p6=r0,r16
0xc00000000c0c5620:1 <TFM::PrintTrace(..)+0x91>: nop.m 0x0
0xc00000000c0c5620:2 <TFM::PrintTrace(..)+0x92>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0xda0;;
0xc00000000c0c5630:0 <TFM::PrintTrace(..)+0xa0>: adds r21=16,r16 <---
0xc00000000c0c5630:1 <TFM::PrintTrace(..)+0xa1>: (p1) mov r62=8;;
0xc00000000c0c5630:2 <TFM::PrintTrace(..)+0xa2>: nop.i 0x0
0xc00000000c0c5640:0 <TFM::PrintTrace(..)+0xb0>: ld8 r42=[r21];; <---
0xc00000000c0c5640:1 <TFM::PrintTrace(..)+0xb1>: cmp.eq.unc p6=r0,r42
0xc00000000c0c5640:2 <TFM::PrintTrace(..)+0xb2>: nop.i 0x0
0xc00000000c0c5650:0 <TFM::PrintTrace(..)+0xc0>: nop.m 0x0
0xc00000000c0c5650:1 <TFM::PrintTrace(..)+0xc1>: mov r47=5
0xc00000000c0c5650:2 <TFM::PrintTrace(..)+0xc2>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0xdf0;;
0xc00000000c0c5660:0 <TFM::PrintTrace(..)+0xd0>: ld4.a r27=[r48]
;;; 1064 if( dwDataLen <= dwViewLen ) {
0xc00000000c0c5660:1 <TFM::PrintTrace(..)+0xd1>: adds r28=28,r42 <--
0xc00000000c0c5660:2 <TFM::PrintTrace(..)+0xd2>: cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFM::PrintTrace(..)+0xe0>: ld4.sa r26=[r28],
0xc00000000c0c5670:1 <TFM::PrintTrace(..)+0xe1>: (p6) ld4 r31=[r28] <-- instruction that crashed
Let me know if register values are needed. I think I can acquire the register value using info reg command of gdb. This is the result of info registers (I excluded values of prXXX and brXXX), I don't have any idea how to map these to the disassembled instruction above.
gr1: 0x9fffffffbf716588
gr2: 0x9fffffff5f667c00
gr3: 0x9fffffff5f667c00
gr4: 0x6000000000e0b000开发者_开发知识库
gr5: 0x9fffffff8adfe2e0
gr6: 0x9fffffff8ada9000
gr7: 0x9fffffff8ad7a000
gr8: 0x1
gr9: 0x9fffffff8adfd0f0
gr10: 0
gr11: 0xc000000000000690
gr12: 0x9fffffff8adfd140
gr13: 0x6000000001681510
gr14: 0x9fffffffbf7d8e98
gr15: 0x1a
gr16: 0x60000000044dac60
gr17: 0x1f
gr18: 0
gr19: 0x9fffffff8ad023f0
gr20: 0x9fffffff8adfd0e0
gr21: 0x60000000044dac70
gr22: 0x9fffffff5f668000
gr23: 0xd
gr24: 0x1
gr25: 0xc0000000004341f0
gr26: NaT
gr27: 0x63
gr28: 0xc00000000c5f801c
gr29: 0xc00000000029db20
gr30: 0xc00000000029db20
gr31: 0x288
gr32: 0x60000000044796d0
gr33: 0x6000000001a78910
gr34: 0x7e
gr35: 0x6000000001d03a90
gr36: 0x9fffffffbf716588
gr37: 0xc000000000000c9d
gr38: 0xc00000000c0c4f70
gr39: 0x9
gr40: 0x6000000004479b60
gr41: 0x6000000004479b58
gr42: 0xc00000000c5f8000
gr43: 0x9fffffffbf7144e0
gr44: 0x6000000004479b50
gr45: 0x6000000004479b68
gr46: 0x6000000001d03a90
gr47: 0x5
gr48: 0x6000000001a7892c
gr49: 0x9fffffff8adfe110
gr50: 0xc000000000000491
gr51: 0xc00000000c0c5520
gr52: 0xc00000000c07dd10
gr53: 0x9fffffff8adfe120
gr54: 0x9fffffff8adfe0a0
gr55: 0xc00000000000058e
gr56: 0xc00000000042be40
gr57: 0x39
gr58: 0x3
gr59: 0x33
gr60: 0
gr61: 0x9fffffffbf7d2598
gr62: 0x8
gr63: 0x9fffffffbf716588
gr64: 0xc000000000000f22
gr65: 0xc00000000c0c5610
This is an update to my previous post. Since I was furnished a copy of the core file, I used gdb to examine the core file and executed the following command:
1) bt
2) frame n <- the frame where the abort occurred
3) disas
And here are the results.
(gdb) bt
#0 0xc0000000001e5350:0 in _lwp_kill+0x30 ()
from /usr/lib/hpux64/libpthread.so.1
#1 0xc00000000014c7b0:0 in pthread_kill+0x9d0 ()
from /usr/lib/hpux64/libpthread.so.1
#2 0xc0000000002e4080:0 in raise+0xe0 () from /usr/lib/hpux64/libc.so.1
#3 0xc0000000003f47f0:0 in abort+0x170 () from /usr/lib/hpux64/libc.so.1
#4 0xc00000000e65e0d0:0 in os::abort ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/os/hp-ux/vm/os_hp-ux.cpp:2033
#5 0xc00000000eb473e0:0 in VMError::report_and_die ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/share/vm/utilities/vmError.cpp:1008
#6 0xc00000000e66fc90:0 in os::Hpux::JVM_handle_hpux_signal ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/os_cpu/hp-ux_ia64/vm/os_hp-ux_ia64.cpp:1051
#7 <signal handler called>
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1064
#9 0xc00000000c0c4f70:0 in FMLogger::WriteLog () at fmlogger.cpp:90
...
(gdb) frame 8
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1064
1064 if( dwDataLen <= dwViewLen ) {
Current language: auto; currently c++
(gdb) disas $pc-16*4 $pc+16*4
...
0xc00000000c0c5660:0 <TFMTrace::PrintTrace(...)+0xd0> : ld4.a r27=[r48] MII,
;;; 1064 if( dwDataLen <= dwViewLen ) {
0xc00000000c0c5660:1 <TFMTrace::PrintTrace(...)+0xd1> : adds r28=28,r42
0xc00000000c0c5660:2 <TFMTrace::PrintTrace(...)+0xd2> : cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFMTrace::PrintTrace(...)+0xe0> : ld4.sa r26=[r28] MMI,
0xc00000000c0c5670:1 <TFMTrace::PrintTrace(...)+0xe1> : (p6) ld4 r31=[r28]
0xc00000000c0c5670:2 <TFMTrace::PrintTrace(...)+0xe2> : adds r46=24,r42;;
0xc00000000c0c5680:0 <TFMTrace::PrintTrace(...)+0xf0> : (p6) st4 [r35]=r31 MI,I
0xc00000000c0c5680:1 <TFMTrace::PrintTrace(...)+0xf1> : adds r59=36,r42;;
0xc00000000c0c5680:2 <TFMTrace::PrintTrace(...)+0xf2> : nop.i 0x0
0xc00000000c0c5690:0 <TFMTrace::PrintTrace(...)+0x100>: ld4.c.clr r27=[r48] MIB,
;;; 1066 dwLen = dwTrcLen ;
0xc00000000c0c5690:1 <TFMTrace::PrintTrace(...)+0x101>: cmp4.eq.unc p6,p8=99,r27
0xc00000000c0c5690:2 <TFMTrace::PrintTrace(...)+0x102>: nop.b 0x0;;
0xc00000000c0c56a0:0 <TFMTrace::PrintTrace(...)+0x110>: (p8) ld4.c.clr r26=[r28] MMI
;;; 1067 }
0xc00000000c0c56a0:1 <TFMTrace::PrintTrace(...)+0x111>: (p6) st4 [r48]=r47
0xc00000000c0c56a0:2 <TFMTrace::PrintTrace(...)+0x112>: cmp4.geu.unc p7=r26,r27
End of assemb
A "normal" crash in native code causes a report like this:
C [libc.so.6+0x88368] strstr+0x64a
Note small offset from the function (strstr
in this case) to the crash point.
In your case, JVM decided that the address oxc00000000f675671
is inside libtracejni.so
, but the closest function it could find is very far from the crash point (0x5065eff9 == 1.2 GB away).
Is your library really that big?
If it really is that big, chances are you have stripped it, and so the symbol _NZ10TFM07PrintTraceEPi
doesn't actually have anything to do with the problem (which is in the code that is 1.2GB away).
You need to find out what code was really at address oxc00000000f675671
at the time of the crash. Usually hs_err_pid*.log
contains a list of load addresses for all the shared libraries. Find the load address of libtracejni.so
, subtract it from pc
. That should give you an address similar to 0x400...675671
which you should be able to lookup in your unstripped version of libtracejni.so
.
Also note that crash address ends with ASCII "C8G"
, which may or may not be a coincidence.
Update 2011/08/05.
Now you know which instruction crashed:
0x4000000000099670:1 <TFMTrace::PrintTrace(...)+0xe1>: (p6) ld4 r31=[r28]
This is a load of 4-byte integer from memory pointed by r28
.
The next questions are: what is the value of r28
at crash point (should be logged in hs_err*.log), and also where did it come from (complete disassembly of TFM::PrintTrace
will tell you that).
精彩评论