How do you read a segfault kernel log message?
This may be a very simple question. I am attempting to debug an application which generates the following segfault error in kern.log:
kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]
Here are my questions:
Is there any documentation on what the different error numbers on a segfault mean? In this instance it is error 6, but I've also seen error 4 and 5.
What is the meaning of the information at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]?
So far I was able to compile with symbols, and when I do x 0x8048000+24000 it returns a symbol. Is that the correct way of doing it? My assumptions thus far are the following:
- sp = stack pointer?
- ip = instruction pointer
- at = ????
- myapp[8048000+24000] = address of symbol?
When the report points to a program, not a shared library
Run addr2line -e myapp 080513b
(and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.
If it's a shared library
In the libfoo.so[NNNNNN+YYYY] part, NNNNNN is the address where the library was loaded. Subtract this from the instruction pointer (ip) and you'll get the offset into the .so of the offending instruction. Then you can use objdump -DCgl libfoo.so and search for the instruction at that offset. You should easily be able to figure out which function it is from the asm labels. If the .so doesn't have optimizations you can also try using addr2line -e libfoo.so <offset>.
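The subtraction described above can be sketched in shell. The function name lib_offset and the sample addresses are made up for illustration; substitute the ip and the load address from your own log line.

```sh
#!/bin/sh
# lib_offset IP BASE: print the offset of IP inside an object mapped at BASE.
# Both arguments are hex strings without a 0x prefix, as they appear in the log.
lib_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

# Hypothetical log fragment: ip b7f5130b ... in libfoo.so[b7f00000+a0000]
lib_offset b7f5130b b7f00000    # prints 0x5130b
```

The printed offset is what you would then hand to addr2line -e libfoo.so or look up in the objdump listing.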
What the error means
Here's the breakdown of the fields:
- address - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
- ip - instruction pointer, i.e. where the code which is trying to do this lives
- sp - stack pointer
- error - architecture-specific flags; see arch/*/mm/fault.c for your platform.
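To pull these fields out of a log line mechanically, something like the following sed sketch might do. It assumes the line has exactly the layout shown in the question; other kernel versions may format the message differently. The function name parse_segfault and the output labels are my own.

```sh
#!/bin/sh
# parse_segfault LINE: split one "segfault at ..." kernel log line into fields.
# Prints nothing if the line does not match the assumed layout.
parse_segfault() {
    printf '%s\n' "$1" | sed -n \
        's/.*segfault at \([0-9a-f]*\) ip \([0-9a-f]*\) sp \([0-9a-f]*\) error \([0-9a-f]*\) in \([^[]*\)\[\([0-9a-f]*\)+\([0-9a-f]*\)].*/at=\1 ip=\2 sp=\3 error=\4 object=\5 base=\6 size=\7/p'
}

# The log line quoted in the question
parse_segfault 'kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]'
```

Here base and size are the two numbers inside the square brackets: where the object is mapped and how long the mapping is.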
Based on my limited knowledge, your assumptions are correct.
- sp = stack pointer
- ip = instruction pointer
- myapp[8048000+24000] = address
If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under (or attach) GDB.
The error code is just the architectural error code for page faults and seems to be architecture specific. They are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:
- bit 0 == 0 means no page found, 1 means protection fault
- bit 1 == 0 means read, 1 means write
- bit 2 == 0 means kernel, 1 means user-mode
My copy of Linux/arch/x86_64/mm/fault.c adds the following:
- bit 3 == 1 means fault was an instruction fetch
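As a rough illustration, the three low bits listed for i386 can be decoded with a few lines of shell. The function name is made up, and the sketch assumes the error field in the log is hexadecimal (which I believe is how x86 kernels print it; for the values 4, 5 and 6 it makes no difference). Higher bits are deliberately ignored, since their meaning depends on the kernel version.

```sh
#!/bin/sh
# decode_segfault_error CODE: explain bits 0-2 of a page fault error code.
# CODE is taken as hex without a 0x prefix (an assumption about the log format).
decode_segfault_error() {
    code=$(( 0x$1 ))
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection fault"; else cause="no page found"; fi
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( code & 4 )) -ne 0 ]; then mode="user-mode"; else mode="kernel"; fi
    echo "$cause, $access, $mode"
}

decode_segfault_error 6    # no page found, write, user-mode
decode_segfault_error 4    # no page found, read, user-mode
decode_segfault_error 5    # protection fault, read, user-mode
```

So the error 6 from the question would be a user-mode write to an address with no page mapped.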
If it's a shared library
You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact.
Well, there is still a possibility to retrieve the information, not from the binary, but from the shared object. You need the base address of the object, though, and this information is still within the core dump, in the link_map structure.
So first you want to import struct link_map into GDB. Let's compile a small program that uses it with debug symbols and add it to GDB.
link.c
#include <link.h>
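/* Exists only so that the debug info for struct link_map ends up in link.so */
void toto(void){struct link_map * s = (struct link_map *) 0x400; (void) s;}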
get_baseaddr_from_coredump.sh
#!/bin/bash
BINARY=$(which myapplication)
IsBinPIE ()
{
readelf -h $1|grep 'Type' |grep "EXEC">/dev/null || return 0
return 1
}
Hex2Decimal ()
{
export number="`echo "$1" | sed -e 's:^0[xX]::' | tr '[a-f]' '[A-F]'`"
export number=`echo "ibase=16; $number" | bc`
}
GetBinaryLength ()
{
if [ $# != 1 ]; then
echo "Error, no argument provided"
return 1
fi
# Braces instead of a subshell, so that the return actually leaves the function
IsBinPIE "$1" || { echo "ET_EXEC file, need a base_address"; return 1; }
export totalsize=0
# Get PT_LOAD's size segment out of Program Header Table (ELF format)
export sizes="$(readelf -l $1 |grep LOAD |awk '{print $6}'|tr '\n' ' ')"
for size in $sizes
do Hex2Decimal "$size"; export totalsize=$(expr $number + $totalsize)
done
# Print the total; a shell return status can only hold 0-255
echo $totalsize
}
if [ $# = 1 ]; then
echo "Using binary $1"
# Braces instead of a subshell, so that exit really stops the script
IsBinPIE "$1" && { echo "NOT ET_EXEC, need a base_address..."; exit 1; }
BINARY=$1
fi
gcc -g3 -fPIC -shared link.c -o link.so
GOTADDR=$(readelf -S $BINARY|grep -E '\.got.plt[ \t]'|awk '{print $4}')
echo "First do the following command :"
echo file $BINARY
echo add-symbol-file ./link.so 0x0
read
echo "Now copy/paste the following into your gdb session with attached coredump"
cat <<EOF
set \$linkmapaddr = *(0x$GOTADDR + 4)
set \$mylinkmap = (struct link_map *) \$linkmapaddr
while (\$mylinkmap != 0)
if (\$mylinkmap->l_addr)
printf "add-symbol-file .%s %#.08x\n", \$mylinkmap->l_name, \$mylinkmap->l_addr
end
set \$mylinkmap = \$mylinkmap->l_next
end
EOF
It will print the whole link_map content as a set of GDB commands.
By itself this might seem unnecessary, but with the base address of the shared object you care about, you may get more information out of an address by debugging that shared object directly in another GDB instance. Keep the first gdb around to have an idea of the symbols.
NOTE: the script is rather incomplete. I suspect you need to add the following value to the second parameter of the printed add-symbol-file command:
readelf -S $SO_PATH|grep -E '\.text[ \t]'|awk '{print $5}'
where $SO_PATH is the first argument of the add-symbol-file
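If that suspicion is right, the adjusted address is simply the sum of the two hex values. A small sketch, with a made-up function name and made-up sample numbers:

```sh
#!/bin/sh
# text_address BASE TEXT: add the .text value from readelf to the l_addr base.
# Both arguments are hex; a leading 0x is accepted and stripped.
text_address() {
    printf '0x%x\n' $(( 0x${1#0x} + 0x${2#0x} ))
}

# Hypothetical values: l_addr 0xb7f00000, .text value 000004a0
text_address 0xb7f00000 000004a0    # prints 0xb7f004a0
```

The result would replace the second argument of the corresponding add-symbol-file line.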
Hope it helps