Determining the exact source line from a kernel crash dump
Hi
I am running a bidirectional (bi-di) 'iperf' test on an interface using my driver.
Steps to reproduce: run bi-di I/O on one interface (the other interface is not active):
- Run iperf -c -P 8 -t 100000 -I 10 on the DUT.
- Run iperf -c with the same parameters from the peer almost immediately (after the first 10 s of the 'iperf send' above are over), with 'iperf -s -w 256K' running on both sides.
The crash does not happen in the driver as such but in the 'iperf' context. Here is the stack trace:
PID: 8855 TASK: f7036550 CPU: 0 COMMAND: "iperf"
#0 [c074bed0] crash_kexec at c0443233
#1 [c074bf14] die at c04064d3
#2 [c074bf44] do_page_fault at c062134b
#3 [c074bf94] error_code (via page_fault) at c0405abb
EAX: f5888100 EBX: 00000000 ECX: 00100100 EDX: 00200200 EBP: 00000001
DS: 007b ESI: f5888000 ES: 007b EDI: cb614000
CS: 0060 EIP: c05c4e94 ERR: ffffffff EFLAGS: 00010046
#4 [c074bfc8] net_rx_action at c05c4e94
#5 [c074bfe4] __do_softirq at c042aa65
--- <soft IRQ> ---
#0 [f281ac4c] do_softirq at c04073e5
#1 [f281ac58] do_IRQ at c04074d9
#2 [f281ac70] common_interrupt at c0405975
EAX: 39383736 EBX: f281af4c ECX: 00000428 EDX: 31303938 EBP: f378b042
DS: 007b ESI: f378b1c2 ES: 007b EDI: 09fdb448
CS: 0060 EIP: c04f1c07 ERR: ffffffba EFLAGS: 00000202
#3 [f281aca4] __copy_to_user_ll at c04f1c07
#4 [f281acb0] memcpy_toiovec at c05bfecc
#5 [f281acc4] skb_copy_datagram_iovec at c05c059b
#6 [f281acf4] tcp_rcv_established at c05ef40a
#7 [f281ad20] tcp_v4_do_rcv at c05f48c5
#8 [f281ad54] tcp_prequeue_process at c05e6bdd
#9 [f281ad5c] tcp_recvmsg at c05e90e2
#10 [f281ad9c] sock_common_recvmsg at c05bb1c4
#11 [f281adc0] sock_recvmsg at c05b8dc6
#12 [f281aea0] sys_recvfrom at c05ba6ab
#13 [f281af64] sys_recv at c05ba727
#14 [f281af80] sys_socketcall at c05bab52
#15 [f281afb8] system_call at c0404f44
EAX: ffffffda EBX: 0000000a ECX: b6ba2340 EDX: 00014268
DS: 007b ESI: 00000000 ES: 007b EDI: 09fbe630
SS: 007b ESP: b6ba2328 EBP: b6ba2378
CS: 0073 EIP: 004ad410 ERR: 00000066 EFLAGS: 00000293
crash>
The EIP at the time of the crash is net_rx_action:0xdd/19ca. I have compiled the kernel-2.6.18-238 sources (the source version of the OS running on the DUT) and ran 'objdump -S ./net/core/dev.o > dev_o_dmp' on the object built from ./net/core/dev.c, which contains the definition of net_rx_action(). In the 'dev_o_dmp' file, net_rx_action() contains many inlined definitions and so does not exactly mirror the flow of the source file. In such a scenario, is it safe to add 0xdd to the base address of net_rx_action (say 32FF) => 340C, i.e. would 340C be the address of the offending instruction, and hence the source line, that gives rise to the 'kernel paging request' error?
Any tips or recommendations on how to go about debugging this problem would be of great help.
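For reference, the address arithmetic in the question can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the post; the helper names are made up for illustration, and it assumes the rebuilt dev.o matches the running kernel (same config and compiler), which the post does not confirm. Note that 32FF + 0xdd comes out as 33DC, while 340C would correspond to a base of 332F, so the base address is worth rechecking in the listing.

```python
# All numeric values below are copied from the post; nothing here is measured.
FAULT_EIP = 0xC05C4E94     # EIP in frame #4 (net_rx_action) of the trace
FAULT_OFFSET = 0xDD        # offset into net_rx_action reported for the fault
EXAMPLE_OBJ_BASE = 0x32FF  # the "say 32FF" base of net_rx_action in dev_o_dmp


def function_start(eip: int, offset: int) -> int:
    """Runtime address where the function begins: faulting EIP minus the offset."""
    return eip - offset


def address_in_listing(obj_base: int, offset: int) -> int:
    """Address to search for in the objdump listing of the object file."""
    return obj_base + offset


if __name__ == "__main__":
    start = function_start(FAULT_EIP, FAULT_OFFSET)
    target = address_in_listing(EXAMPLE_OBJ_BASE, FAULT_OFFSET)
    print(f"net_rx_action starts at {start:#x} in the running kernel")
    print(f"search the listing for the instruction at {target:x}:")
```

The first value can be compared against the symbol address that crash or System.map reports for net_rx_action, as a sanity check that the offset is being applied to the right function.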
Unfortunately, or fortunately depending on your perspective, at high optimization levels the compiler can produce assembly code for which the debug information cannot give a reasonable mapping from C source lines to assembly instructions. Which cases run into this problem depends on the compiler, the optimization level, the debug symbol format, the debug symbol level, and the code itself.
You have to assume that line numbers obtained via this technique could be wrong. That being said, I use this technique frequently in my own kernel work and I have not had any problems yet (knocks on wood). Just remember that if you are faced with something that makes no sense, you could have a bad line number.
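As a rough illustration of the technique, the sketch below scans text in the shape of 'objdump -S' output and reports the source text printed most recently before the instruction at a given address. The listing, the function name example_func and the source lines in it are invented for the example and are not the real dev.o disassembly; with inlining and optimization, the source text it reports can be misleading for exactly the reasons described above.

```python
import re

# Instruction lines look like "     1dd:<TAB>90 <TAB>nop"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t")
# Symbol headers look like "00000100 <example_func>:"
HEADER_RE = re.compile(r"^[0-9a-f]+ <.*>:$")

# Invented sample listing (hypothetical function and source lines).
SAMPLE_LISTING = (
    "00000100 <example_func>:\n"
    "example_source_line_a();\n"
    "     100:\t55                   \tpush   %ebp\n"
    "     101:\t89 e5                \tmov    %esp,%ebp\n"
    "example_source_line_b();\n"
    "     1da:\t8b 50 04             \tmov    0x4(%eax),%edx\n"
    "     1dd:\t90                   \tnop\n"
)


def source_for_address(listing: str, target: int):
    """Return (instruction_line, source_lines) for the instruction at target.

    source_lines is the block of non-instruction text that precedes the
    group of instructions containing target; (None, []) if not found.
    """
    source_block = []
    seen_insn = False
    for line in listing.splitlines():
        if HEADER_RE.match(line):
            continue
        match = INSN_RE.match(line)
        if match:
            if int(match.group(1), 16) == target:
                return line.strip(), source_block
            seen_insn = True
        elif line.strip():
            if seen_insn:
                # A new source block starts after a run of instructions.
                source_block = []
                seen_insn = False
            source_block.append(line.strip())
    return None, []


if __name__ == "__main__":
    insn, src = source_for_address(SAMPLE_LISTING, 0x100 + 0xDD)
    print(insn)
    print(src)
```

Against a real listing, the target would be the function's address in the object file plus the fault offset, and the reported source text should be treated as a hint to be checked against the surrounding instructions rather than as a definitive line.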