Debugging driver code from dmesg crash output

程序员文章站 2022-06-03 23:00:35

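I was recently adding a feature to a driver: given a process name, look up the PID of the matching process. The driver crashed, so let's track down where the problem is.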

First, here is the crash information from dmesg:

[ 4534.975026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000430
[ 4534.976059] IP: [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4534.977065] PGD 2195a2067 PUD 219c6f067 PMD 0 
[ 4534.978066] Oops: 0000 [#3] SMP 
[ 4534.979027] Modules linked in: bts(OE) chr(OE) hid_generic usbhid hid rfcomm bnep bluetooth intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm arc4 ath9k amdkfd ath9k_common ath9k_hw amd_iommu_v2 ath radeon snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec mac80211 snd_hda_core aesni_intel aes_x86_64 joydev snd_hwdep hp_wmi snd_pcm sparse_keymap input_leds lrw serio_raw gf128mul glue_helper ppdev ablk_helper lp parport_pc snd_seq_midi cfg80211 snd_seq_midi_event snd_rawmidi snd_seq ttm cryptd snd_seq_device snd_timer mei_me drm_kms_helper mei drm snd i2c_algo_bit soundcore hp_accel lpc_ich lis3lv02d tpm_infineon input_polldev parport video 8250_fintek hp_wireless mac_hid wmi psmouse ahci libahci firewire_ohci sdhci_pci firewire_core e1000e sdhci crc_itu_t ptp pps_core [last unloaded: bts]
[ 4534.985521] CPU: 0 PID: 3462 Comm: ops_main Tainted: G      D W  OE   4.2.0-42-generic #49~14.04.1-Ubuntu
[ 4534.986561] Hardware name: Hewlett-Packard HP ProBook 6470b/179C, BIOS 68ICE Ver. F.45 10/07/2013
[ 4534.987607] task: ffff8802203a5280 ti: ffff880220298000 task.ti: ffff880220298000
[ 4534.988636] RIP: 0010:[<ffffffffc0747e78>]  [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4534.989674] RSP: 0018:ffff88022029bd38  EFLAGS: 00010246
[ 4534.990663] RAX: ffffffff81c15840 RBX: 0000000000000006 RCX: 0000000000000002
[ 4534.991635] RDX: 0000000000000002 RSI: ffff88022029bd51 RDI: ffff8802203a5859
[ 4534.992587] RBP: ffff88022029be98 R08: ffffffffc074b060 R09: 315f6e65706f5f34
[ 4534.993573] R10: 00007fd6ff1ba6a0 R11: 0000000000000246 R12: 0000000000000000
[ 4534.994497] R13: ffffffff81c15840 R14: ffff8802203a5858 R15: ffff8800b8e7b000
[ 4534.995411] FS:  00007fd6ff3cb740(0000) GS:ffff88023ec00000(0000) knlGS:0000000000000000
[ 4534.996324] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4534.997232] CR2: 0000000000000430 CR3: 000000022b479000 CR4: 00000000001406f0
[ 4534.998334] Stack:
[ 4534.999528]  ffff88022029bd68 ffffffff811f833e ffff88022029bd68 ffffffffc074a201
[ 4535.000466]  ffffffff81c15840 7700007472617473 6174732065746972 6563617274207472
[ 4535.001395]  646e616d6d6f6320 253a726f72726520 7320737462000a73 6f72726520706f74
[ 4535.002401] Call Trace:
[ 4535.003364]  [<ffffffff811f833e>] ? terminate_walk+0x6e/0xe0
[ 4535.004328]  [<ffffffff811ede38>] __vfs_write+0x18/0x40
[ 4535.005283]  [<ffffffff811ee479>] vfs_write+0xa9/0x190
[ 4535.006244]  [<ffffffff810dbefd>] ? call_rcu_sched+0x1d/0x20
[ 4535.007182]  [<ffffffff811ef1e6>] SyS_write+0x46/0xa0
[ 4535.008111]  [<ffffffff817c36f2>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 4535.009038] Code: 00 00 49 8b 84 24 40 03 00 00 48 89 85 c0 fe ff ff 4c 8b ad c0 fe ff ff 4d 8d a5 c0 fc ff ff 49 81 fc 00 55 c1 81 75 bc 45 31 e4 <45> 8b a4 24 30 04 00 00 48 c7 c7 1d a2 74 c0 31 c0 44 89 e6 e8 
[ 4535.011028] RIP  [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4535.011968]  RSP <ffff88022029bd38>
[ 4535.012902] CR2: 0000000000000430
[ 4535.013850] ---[ end trace bd7d268405d6447e ]---

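The first line of the log, BUG: unable to handle kernel NULL pointer dereference at 0000000000000430, says that the kernel dereferenced a NULL pointer. The faulting address 0x430 is a small offset from 0, which typically means a struct member was read through a NULL pointer. The second key piece of information is bts_write+0x1b8/0x830 [bts]: the fault occurred in the function bts_write of the bts module, at offset 0x1b8 from the start of the function (0x830 is the total size of the function).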
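
To see why the faulting address is 0x430 and not 0: reading a member through a struct pointer accesses the pointer value plus the member's offset. The following is a minimal userspace sketch of that arithmetic. struct fake_task and member_fault_address are hypothetical names, and the padding is chosen only so that pid lands at 0x430 as in this oops; the real task_struct layout depends on the kernel version and configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct: the padding exists only to place
 * pid at offset 0x430, the faulting address reported in the oops. */
struct fake_task {
    char pad[0x430];
    int pid;
};

/* Address the CPU would touch when reading ->pid through a pointer
 * whose value is `base`. */
static uintptr_t member_fault_address(uintptr_t base)
{
    return base + offsetof(struct fake_task, pid);
}
```

With base equal to 0 (a NULL pointer) the result is 0x430, which is consistent with the CR2 value in the log.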

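Given this, the first step is to disassemble the object file produced while building the driver (xxx.o). The objdump tool that comes with Linux is enough for this: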

//If you don't remember the options, run objdump with no arguments (or objdump --help) to list them
/mnt/hgfs/share/write_code/set_task_cpu$ objdump
Usage: objdump <option(s)> <file(s)>
 Display information from object <file(s)>.
 At least one of the following switches must be given:
  -a, --archive-headers    Display archive header information
  -f, --file-headers       Display the contents of the overall file header
  -p, --private-headers    Display object format specific file header contents
  -P, --private=OPT,OPT... Display object format specific contents
  -h, --[section-]headers  Display the contents of the section headers
  -x, --all-headers        Display the contents of all headers
  -d, --disassemble        Display assembler contents of executable sections
  -D, --disassemble-all    Display assembler contents of all sections
  -S, --source             Intermix source code with disassembly
  -s, --full-contents      Display the full contents of all sections requested
  -g, --debugging          Display debug information in object file
  -e, --debugging-tags     Display debug information using ctags style
  -G, --stabs              Display (in raw form) any STABS info in the file
  -W[lLiaprmfFsoRt] or
  --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
          =frames-interp,=str,=loc,=Ranges,=pubtypes,
          =gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
          =addr,=cu_index]
                           Display DWARF info in the file
  -t, --syms               Display the contents of the symbol table(s)
  -T, --dynamic-syms       Display the contents of the dynamic symbol table
  -r, --reloc              Display the relocation entries in the file
  -R, --dynamic-reloc      Display the dynamic relocation entries in the file
  @<file>                  Read options from <file>
  -v, --version            Display this program's version number
  -i, --info               List object formats and architectures supported
  -H, --help               Display this information
//Here -D disassembles all sections; the output is redirected to a file so it is easier to inspect
~/Desktop/per_bts/drv$ objdump bts.o -D > err.txt

Next, find the base address of the faulting function. Open the file in vim and search for bts_write:

0000000000000cc0 <bts_write>:
     cc0:       e8 00 00 00 00          callq  cc5 <bts_write+0x5>
     cc5:       55                      push   %rbp
     cc6:       b9 20 00 00 00          mov    $0x20,%ecx
     ccb:       48 89 e5                mov    %rsp,%rbp
     cce:       41 57                   push   %r15
     cd0:       41 56                   push   %r14
     cd2:       45 31 f6                xor    %r14d,%r14d
     cd5:       41 55                   push   %r13
     cd7:       4c 8d ad c8 fe ff ff    lea    -0x138(%rbp),%r13
     cde:       41 54                   push   %r12
     ce0:       53                      push   %rbx
     ce1:       48 89 d3                mov    %rdx,%rbx
     ce4:       ba fe 00 00 00          mov    $0xfe,%edx
     ce9:       48 81 ec 38 01 00 00    sub    $0x138,%rsp
     cf0:       4c 8b bf d0 00 00 00    mov    0xd0(%rdi),%r15
     cf7:       48 8d bd c8 fe ff ff    lea    -0x138(%rbp),%rdi
     cfe:       65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
     d05:       00 00
     d07:       48 89 45 c8             mov    %rax,-0x38(%rbp)

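From the listing, bts_write starts at 0xcc0 in the object file. Adding the offset 0x1b8 from the oops gives the address of the faulting instruction: 0xcc0 + 0x1b8 = 0xe78.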
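
The same arithmetic can be sketched as a small C helper that takes the symbol+offset/size token from the oops and the function's base address from the objdump listing. The function name oops_to_object_addr is made up for this illustration; it is not part of any kernel or binutils tool.

```c
#include <stdio.h>

/* Parse a "symbol+0xOFF/0xSIZE" token from an oops line and add the
 * symbol's base address taken from the objdump listing.
 * Returns 0 on success, -1 if the token does not have that shape. */
static int oops_to_object_addr(const char *token, unsigned long base,
                               unsigned long *addr)
{
    char sym[64];
    unsigned long off, size;

    if (sscanf(token, "%63[^+]+0x%lx/0x%lx", sym, &off, &size) != 3)
        return -1;
    if (off >= size)        /* the offset must fall inside the function */
        return -1;
    *addr = base + off;
    return 0;
}
```

Called with the token bts_write+0x1b8/0x830 and base 0xcc0, it stores 0xe78 in *addr, the value used in the next step.
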
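The next step is to map that address to a source line. This is what addr2line does (the object file must have been built with debug information):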

~/Desktop/per_bts/drv$ addr2line -h
Usage: addr2line [option(s)] [addr(s)]
 Convert addresses into line number/file name pairs.
 If no addresses are specified on the command line, they will be read from stdin
 The options are:
  @<file>                Read options from <file>
  -a --addresses         Show addresses
  -b --target=<bfdname>  Set the binary file format
  -e --exe=<executable>  Set the input file name (default is a.out)
  -i --inlines           Unwind inlined functions
  -j --section=<name>    Read section-relative offsets instead of addresses
  -p --pretty-print      Make the output easier to read for humans
  -s --basenames         Strip directory names
  -f --functions         Show function names
  -C --demangle[=style]  Demangle function names
  -h --help              Display this information
  -v --version           Display the program's version

~/Desktop/per_bts/drv$ addr2line -C -f -e bts.o e78
find_pid
/home/curtis/Desktop/per_bts/drv/bts_driver.c:108

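addr2line reports both the function and the line: the fault is in find_pid, at line 108 of bts_driver.c. find_pid does not appear in the oops itself, presumably because the compiler inlined it into bts_write. Here is the function: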

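static int find_pid(char *string_name)
{
        unsigned int pid;
        /* Bug: this line used to read
         *     char *find_name = &string_name;
         * which takes the address of the parameter instead of the string. */
        char *find_name = string_name;
        struct task_struct *task;

        task = find_task(find_name);
        pid = task->pid;        /* line 108: faults when find_task() returns NULL */
        printk("Have find pid is %d\n", pid);
        return pid;
}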

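On closer inspection, find_task did not return a task_struct for the process, so task was NULL and task->pid dereferenced a NULL pointer. The root cause was that, after heavy changes to the surrounding code, find_name was initialized incorrectly: the parameter is already a pointer to the string, so it must be assigned directly (find_name = string_name) rather than by taking its address (&string_name). With that change the problem was solved.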
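
Independently of the initialization fix, the lookup can still fail whenever no process has the given name, so it may be worth checking the result before dereferencing it. Below is a userspace sketch of that check. The struct task_struct, the fake_tasks table and this find_task are hypothetical stand-ins so the code compiles outside the kernel (the driver's real find_task is not shown in this article), and returning -ESRCH is one possible convention, not necessarily what the driver does.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct task_struct {
    int pid;
    char comm[16];
};

static struct task_struct fake_tasks[] = {
    { 1, "init" },
    { 42, "demo" },
};

/* Stand-in for the driver's find_task(): NULL when no name matches. */
static struct task_struct *find_task(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(fake_tasks) / sizeof(fake_tasks[0]); i++)
        if (strcmp(fake_tasks[i].comm, name) == 0)
            return &fake_tasks[i];
    return NULL;
}

/* find_pid() with the lookup result checked before it is dereferenced. */
static int find_pid(const char *string_name)
{
    struct task_struct *task = find_task(string_name);

    if (!task)
        return -ESRCH;  /* no such process */
    return task->pid;
}
```

With this shape, a name that matches nothing yields a negative error code that the caller can report, instead of an oops at the task->pid line.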