Debugging driver code using the crash information in dmesg
2022-06-03 23:00:35
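I was recently adding a feature to a driver: looking up a process's PID from a given process name. The change caused a kernel crash, so let's track down where the problem is.
First, here is the crash information from dmesg: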
[ 4534.975026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000430
[ 4534.976059] IP: [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4534.977065] PGD 2195a2067 PUD 219c6f067 PMD 0
[ 4534.978066] Oops: 0000 [#3] SMP
[ 4534.979027] Modules linked in: bts(OE) chr(OE) hid_generic usbhid hid rfcomm bnep bluetooth intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm arc4 ath9k amdkfd ath9k_common ath9k_hw amd_iommu_v2 ath radeon snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec mac80211 snd_hda_core aesni_intel aes_x86_64 joydev snd_hwdep hp_wmi snd_pcm sparse_keymap input_leds lrw serio_raw gf128mul glue_helper ppdev ablk_helper lp parport_pc snd_seq_midi cfg80211 snd_seq_midi_event snd_rawmidi snd_seq ttm cryptd snd_seq_device snd_timer mei_me drm_kms_helper mei drm snd i2c_algo_bit soundcore hp_accel lpc_ich lis3lv02d tpm_infineon input_polldev parport video 8250_fintek hp_wireless mac_hid wmi psmouse ahci libahci firewire_ohci sdhci_pci firewire_core e1000e sdhci crc_itu_t ptp pps_core [last unloaded: bts]
[ 4534.985521] CPU: 0 PID: 3462 Comm: ops_main Tainted: G D W OE 4.2.0-42-generic #49~14.04.1-Ubuntu
[ 4534.986561] Hardware name: Hewlett-Packard HP ProBook 6470b/179C, BIOS 68ICE Ver. F.45 10/07/2013
[ 4534.987607] task: ffff8802203a5280 ti: ffff880220298000 task.ti: ffff880220298000
[ 4534.988636] RIP: 0010:[<ffffffffc0747e78>] [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4534.989674] RSP: 0018:ffff88022029bd38 EFLAGS: 00010246
[ 4534.990663] RAX: ffffffff81c15840 RBX: 0000000000000006 RCX: 0000000000000002
[ 4534.991635] RDX: 0000000000000002 RSI: ffff88022029bd51 RDI: ffff8802203a5859
[ 4534.992587] RBP: ffff88022029be98 R08: ffffffffc074b060 R09: 315f6e65706f5f34
[ 4534.993573] R10: 00007fd6ff1ba6a0 R11: 0000000000000246 R12: 0000000000000000
[ 4534.994497] R13: ffffffff81c15840 R14: ffff8802203a5858 R15: ffff8800b8e7b000
[ 4534.995411] FS: 00007fd6ff3cb740(0000) GS:ffff88023ec00000(0000) knlGS:0000000000000000
[ 4534.996324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4534.997232] CR2: 0000000000000430 CR3: 000000022b479000 CR4: 00000000001406f0
[ 4534.998334] Stack:
[ 4534.999528] ffff88022029bd68 ffffffff811f833e ffff88022029bd68 ffffffffc074a201
[ 4535.000466] ffffffff81c15840 7700007472617473 6174732065746972 6563617274207472
[ 4535.001395] 646e616d6d6f6320 253a726f72726520 7320737462000a73 6f72726520706f74
[ 4535.002401] Call Trace:
[ 4535.003364] [<ffffffff811f833e>] ? terminate_walk+0x6e/0xe0
[ 4535.004328] [<ffffffff811ede38>] __vfs_write+0x18/0x40
[ 4535.005283] [<ffffffff811ee479>] vfs_write+0xa9/0x190
[ 4535.006244] [<ffffffff810dbefd>] ? call_rcu_sched+0x1d/0x20
[ 4535.007182] [<ffffffff811ef1e6>] SyS_write+0x46/0xa0
[ 4535.008111] [<ffffffff817c36f2>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 4535.009038] Code: 00 00 49 8b 84 24 40 03 00 00 48 89 85 c0 fe ff ff 4c 8b ad c0 fe ff ff 4d 8d a5 c0 fc ff ff 49 81 fc 00 55 c1 81 75 bc 45 31 e4 <45> 8b a4 24 30 04 00 00 48 c7 c7 1d a2 74 c0 31 c0 44 89 e6 e8
[ 4535.011028] RIP [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4535.011968] RSP <ffff88022029bd38>
[ 4535.012902] CR2: 0000000000000430
[ 4535.013850] ---[ end trace bd7d268405d6447e ]---
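The dmesg log begins with BUG: unable to handle kernel NULL pointer dereference at 0000000000000430. Taken literally, the kernel dereferenced a NULL pointer. The faulting address is 0x430 rather than 0, which is the usual pattern when a structure member is read through a NULL structure pointer. The second key piece of information is bts_write+0x1b8/0x830 [bts], which gives the faulting function and the offset within it: the function is bts_write in the bts module, the offset of the faulting instruction is 0x1b8, and 0x830 is the total size of the function;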
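To make the link between a NULL pointer and the address 0x430 concrete, here is a minimal userspace sketch. The struct fake_task layout is purely hypothetical (it is not the kernel's task_struct); the padding is chosen only so that the member lands at offset 0x430, matching CR2 in the log. The marked bytes in the Code: line (45 8b a4 24 30 04 00 00) appear to decode to a 32-bit load from 0x430(%r12), and R12 is 0 in the register dump, which is consistent with this picture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a large kernel structure; NOT the real task_struct. */
struct fake_task {
	char other_fields[0x430];	/* padding so that pid lands at offset 0x430 */
	int pid;
};

/*
 * Address the CPU would try to read for base->pid, computed without
 * dereferencing anything: base address plus the member offset.
 */
static uintptr_t pid_field_address(uintptr_t base)
{
	return base + offsetof(struct fake_task, pid);
}
```

With a base of 0 (a NULL structure pointer), the computed address equals the member offset, so the value reported in CR2 hints at which member was being read.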
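With this information, the first step is to disassemble the object file (xxx.o) produced when the driver was built. objdump, which ships with most Linux distributions, is sufficient;
// If you are unsure of the options, run objdump with no arguments (or objdump --help) to print the usage; note that -h lists section headers and is not the help flag
/mnt/hgfs/share/write_code/set_task_cpu$ objdump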
Usage: objdump <option(s)> <file(s)>
Display information from object <file(s)>.
At least one of the following switches must be given:
-a, --archive-headers Display archive header information
-f, --file-headers Display the contents of the overall file header
-p, --private-headers Display object format specific file header contents
-P, --private=OPT,OPT... Display object format specific contents
-h, --[section-]headers Display the contents of the section headers
-x, --all-headers Display the contents of all headers
-d, --disassemble Display assembler contents of executable sections
-D, --disassemble-all Display assembler contents of all sections
-S, --source Intermix source code with disassembly
-s, --full-contents Display the full contents of all sections requested
-g, --debugging Display debug information in object file
-e, --debugging-tags Display debug information using ctags style
-G, --stabs Display (in raw form) any STABS info in the file
-W[lLiaprmfFsoRt] or
--dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
=frames-interp,=str,=loc,=Ranges,=pubtypes,
=gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
=addr,=cu_index]
Display DWARF info in the file
-t, --syms Display the contents of the symbol table(s)
-T, --dynamic-syms Display the contents of the dynamic symbol table
-r, --reloc Display the relocation entries in the file
-R, --dynamic-reloc Display the dynamic relocation entries in the file
@<file> Read options from <file>
-v, --version Display this program's version number
-i, --info List object formats and architectures supported
-H, --help Display this information
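// Here -D disassembles all sections; the output is redirected to a file so it is easier to browse
~/Desktop/per_bts/drv$ objdump bts.o -D > err.txt
The next step is to find the base address of the faulting function. Open the file in vim and search for bts_write: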
0000000000000cc0 <bts_write>:
cc0: e8 00 00 00 00 callq cc5 <bts_write+0x5>
cc5: 55 push %rbp
cc6: b9 20 00 00 00 mov $0x20,%ecx
ccb: 48 89 e5 mov %rsp,%rbp
cce: 41 57 push %r15
cd0: 41 56 push %r14
cd2: 45 31 f6 xor %r14d,%r14d
cd5: 41 55 push %r13
cd7: 4c 8d ad c8 fe ff ff lea -0x138(%rbp),%r13
cde: 41 54 push %r12
ce0: 53 push %rbx
ce1: 48 89 d3 mov %rdx,%rbx
ce4: ba fe 00 00 00 mov $0xfe,%edx
ce9: 48 81 ec 38 01 00 00 sub $0x138,%rsp
cf0: 4c 8b bf d0 00 00 00 mov 0xd0(%rdi),%r15
cf7: 48 8d bd c8 fe ff ff lea -0x138(%rbp),%rdi
cfe: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
d05: 00 00
d07: 48 89 45 c8 mov %rax,-0x38(%rbp)
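From the listing above, the base address of the function within the object file is 0xcc0. To locate the faulting instruction, add the offset 0x1b8: 0xcc0 + 0x1b8 = 0xe78;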
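The arithmetic above can also be scripted. The following is a small userspace sketch, not part of the original driver; the names oops_location, parse_oops_location and fault_address are made up for illustration. It splits a location string of the form symbol+offset/size, as printed in the Oops, and adds the offset to the base address read from the objdump listing.

```c
#include <stdio.h>

/* One "symbol+offset/size" location as printed in an Oops (hypothetical helper type). */
struct oops_location {
	char symbol[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total length of the function in bytes */
};

/*
 * Parse text such as "bts_write+0x1b8/0x830".
 * Returns 0 on success, -1 if the text does not have that shape.
 */
static int parse_oops_location(const char *text, struct oops_location *loc)
{
	int n = sscanf(text, "%63[^+]+%lx/%lx",
		       loc->symbol, &loc->offset, &loc->size);

	return n == 3 ? 0 : -1;
}

/* Address to hand to addr2line: function base from objdump plus the Oops offset. */
static unsigned long fault_address(unsigned long function_base,
				   const struct oops_location *loc)
{
	return function_base + loc->offset;
}
```

For the values in this article, parsing bts_write+0x1b8/0x830 and passing the base 0xcc0 yields 0xe78, the address used with addr2line.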
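The next step is to map that address to a line of source code, which calls for another tool, addr2line (the object file must have been built with debug information, i.e. with -g);
~/Desktop/per_bts/drv$ addr2line -h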
Usage: addr2line [option(s)] [addr(s)]
Convert addresses into line number/file name pairs.
If no addresses are specified on the command line, they will be read from stdin
The options are:
@<file> Read options from <file>
-a --addresses Show addresses
-b --target=<bfdname> Set the binary file format
-e --exe=<executable> Set the input file name (default is a.out)
-i --inlines Unwind inlined functions
-j --section=<name> Read section-relative offsets instead of addresses
-p --pretty-print Make the output easier to read for humans
-s --basenames Strip directory names
-f --functions Show function names
-C --demangle[=style] Demangle function names
-h --help Display this information
-v --version Display the program's version
~/Desktop/per_bts/drv$ addr2line -C -f -e bts.o e78
find_pid
/home/curtis/Desktop/per_bts/drv/bts_driver.c:108
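This gives both the function and the line number of the fault: the function is find_pid and the line is 108. The Oops names bts_write while addr2line reports find_pid, most likely because the compiler inlined the static function find_pid into bts_write. Here is the corresponding function in the source;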
static int find_pid(char *string_name)
{
	unsigned int pid;
	/* Bug was: char *find_name = &string_name; (address of the parameter, not the string) */
	char *find_name = string_name;
	struct task_struct *task;

	task = find_task(find_name);
	pid = task->pid;	/* line 108: faults here when find_task() returns NULL */
	printk("Have find pid is %d\n", pid);
	return pid;
}
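A closer look shows that find_task did not return the process's task_struct, so task was NULL and reading task->pid faulted. The root cause is that, after a fairly large set of code changes, find_name was initialized incorrectly: the parameter string_name is already a pointer to the string, so &string_name passed the address of the pointer itself instead of the name. With find_name initialized from string_name directly, the crash no longer occurs;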
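Fixing the initialization removes this particular crash, but a lookup by name can still fail for ordinary reasons (no process has that name), so checking the result before dereferencing it is a reasonable extra safeguard. The following is a userspace sketch under stated assumptions, not the driver's actual code: struct task_struct here is a tiny mock, demo_tasks is an invented table, find_task is a hypothetical stand-in for the author's helper (whose source is not shown), and find_pid_checked is a name made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace mock so the sketch compiles outside the kernel (hypothetical layout). */
struct task_struct {
	int pid;
	char comm[16];
};

/* Invented demo table; a real driver would walk the kernel's task list instead. */
static struct task_struct demo_tasks[] = {
	{ 1, "init" },
	{ 3462, "ops_main" },
};

/* Hypothetical lookup by name; returns NULL when nothing matches. */
static struct task_struct *find_task(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_tasks) / sizeof(demo_tasks[0]); i++)
		if (strcmp(demo_tasks[i].comm, name) == 0)
			return &demo_tasks[i];
	return NULL;
}

/* Like find_pid(), but reports a failed lookup instead of dereferencing NULL. */
static int find_pid_checked(const char *string_name)
{
	struct task_struct *task;

	if (!string_name)
		return -EINVAL;

	task = find_task(string_name);
	if (!task)
		return -ESRCH;	/* no such process */

	return task->pid;
}
```

Returning a negative errno value lets the caller (for example the write handler) pass the failure back to userspace rather than taking the whole system down with an Oops.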