
JVM crash logs: a detailed analysis and points to note

程序员文章站 2024-03-03 20:00:16

Generating the error file

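1. Location of the error file: set it with -XX:ErrorFile=/path/hs_error%p.log, where %p is replaced with the process id. By default the file is written to the JVM's current working directory [default: ./hs_err_pid%p.log].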

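2. -XX:OnError runs commands when the JVM crashes and exits. The format is -XX:OnError="<string>", where <string> is a list of commands separated by semicolons; %p is replaced with the id of the current process.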

For example:

// -XX:OnError="pmap %p"  // show memory map
// -XX:OnError="gcore %p; dbx - %p" // dump core and launch debugger

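On Linux the JVM forks a child process to run the shell commands. Because fork() is used, the fork itself can fail when memory is short, especially under strict overcommit accounting, where the child's copy of the parent's address space is charged against the commit limit. Check your /proc/sys/vm/overcommit_memory setting. (It is not clear why vfork is not used here.)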

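3. -XX:+ShowMessageBoxOnError: when the JVM crashes on Linux, it pauses and offers to start gdb so you can attach to the process and debug it. Because the JVM waits for an answer on the console, this is suitable for test environments only.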
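Items 1 and 2 both rely on %p being replaced with the process id, and item 2 additionally splits its argument on semicolons. The sketch below illustrates both steps in C; it is not HotSpot's implementation, the helper names expand_pid and split_commands are hypothetical, and treating "%%" as a literal '%' is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copy `pattern` into `out`, replacing "%p" with `pid`
 * and (an assumption) "%%" with a literal '%'.
 * Returns 0 on success, -1 if `out` is too small. */
static int expand_pid(const char *pattern, pid_t pid, char *out, size_t outlen) {
    size_t used = 0;
    if (outlen == 0) return -1;
    for (const char *s = pattern; *s != '\0'; s++) {
        char piece[32];
        if (s[0] == '%' && s[1] == 'p') {
            snprintf(piece, sizeof piece, "%ld", (long)pid);
            s++;                               /* also consume the 'p' */
        } else if (s[0] == '%' && s[1] == '%') {
            strcpy(piece, "%");
            s++;                               /* also consume the second '%' */
        } else {
            piece[0] = s[0];
            piece[1] = '\0';
        }
        size_t n = strlen(piece);
        if (used + n >= outlen) return -1;     /* keep room for the final '\0' */
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}

/* Hypothetical helper: split an -XX:OnError value on ';' into at most `max`
 * commands, trimming surrounding spaces. `list` is modified in place.
 * Returns the number of non-empty commands stored in `cmds`. */
static int split_commands(char *list, char **cmds, int max) {
    int count = 0;
    char *save = NULL;
    for (char *tok = strtok_r(list, ";", &save); tok != NULL && count < max;
         tok = strtok_r(NULL, ";", &save)) {
        while (*tok == ' ') tok++;                          /* leading spaces */
        char *end = tok + strlen(tok);
        while (end > tok && end[-1] == ' ') *--end = '\0';  /* trailing spaces */
        if (*tok != '\0') cmds[count++] = tok;
    }
    return count;
}
```

For example, splitting "gcore %p; dbx - %p" yields the two commands "gcore %p" and "dbx - %p", and expand_pid turns "/path/hs_error%p.log" into "/path/hs_error32046.log" for pid 32046. Each expanded command would then be run by the shell in the forked child described in item 2.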
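To check the overcommit policy mentioned in item 2, read /proc/sys/vm/overcommit_memory. The helpers below are hypothetical; the mode meanings (0 heuristic, 1 always overcommit, 2 strict accounting) follow the Linux kernel's overcommit documentation.

```c
#include <stdio.h>

/* Hypothetical helper: return vm.overcommit_memory (0, 1 or 2),
 * or -1 if it cannot be read (non-Linux system, no /proc, ...). */
static int read_overcommit_mode(void) {
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL) return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1) mode = -1;
    fclose(f);
    return mode;
}

/* Meaning of each mode value. */
static const char *describe_overcommit_mode(int mode) {
    switch (mode) {
    case 0:  return "heuristic overcommit (default)";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting (fork of a large JVM may fail)";
    default: return "unknown";
    }
}
```

Calling printf("%s\n", describe_overcommit_mode(read_overcommit_mode())) prints the policy currently in effect.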

When is no error file generated?

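When memory runs out, the Linux kernel's OOM killer forcibly kills processes. It uses SIGKILL, which cannot be caught, so the JVM never gets a chance to write an error file. Look for the kill record in /var/log/messages (or in the dmesg output).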
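Searching the log can be scripted. The sketch below is hypothetical: the marker strings "Out of memory" and "Killed process" match commonly seen kernel wording, but the exact messages vary between kernel versions, so treat them as assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical check: does this kernel log line look like an OOM-killer
 * record that mentions `name`? The marker strings are assumptions. */
static int is_oom_kill_line(const char *line, const char *name) {
    int oom = strstr(line, "Out of memory") != NULL ||
              strstr(line, "Killed process") != NULL;
    return oom && strstr(line, name) != NULL;
}

/* Count such lines in a log file, e.g. "/var/log/messages".
 * Returns -1 if the file cannot be opened (reading it often needs root). */
static long count_oom_kills(const char *path, const char *name) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char *line = NULL;
    size_t cap = 0;
    long hits = 0;
    while (getline(&line, &cap, f) != -1) {
        if (is_oom_kill_line(line, name)) hits++;
    }
    free(line);
    fclose(f);
    return hits;
}
```

A non-zero count_oom_kills("/var/log/messages", "java") around the time the JVM disappeared suggests the OOM killer, rather than a JVM crash, ended the process.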

Important sections of the crash file

a. Error summary

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000043566, pid=32046, tid=1121192256
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode linux-amd64 )
# Problematic frame:
# C 0x0000000000043566
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

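SIGSEGV: the type of signal that caused the crash (a segmentation fault).

pc: the value of the IP/PC register, i.e. the address of the instruction being executed.

pid: the process id.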

# Problematic frame:
# V  [libjvm.so+0x593045]

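This is the address of the faulting function inside a shared library (here libjvm.so). The leading letter gives the frame type: j for interpreted Java, J for compiled Java, V for VM code, C for native code.

pc and libjvm.so+0x593045 are two ways of expressing a code address: pc is the absolute virtual address in the running process, while +0x593045 is the offset from the address at which the library was loaded.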
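The conversion is a plain subtraction: offset = runtime address - library load address. The sketch below checks it against the R9 and R11 values from section d, where libjvm.so is loaded at 0x00007f48fc206000. The function names are hypothetical; 64-bit integers are used so that a 64-bit log can be analyzed on any machine.

```c
#include <stdint.h>

/* Offset of a runtime address inside a shared library loaded at load_base,
 * i.e. the "+0x..." part of a frame such as [libjvm.so+0x593045]. */
static uint64_t lib_offset(uint64_t runtime_addr, uint64_t load_base) {
    return runtime_addr - load_base;
}

/* Reverse direction: the runtime address of library+offset. */
static uint64_t runtime_address(uint64_t load_base, uint64_t offset) {
    return load_base + offset;
}
```

lib_offset(0x00007f48fcc8b550, 0x00007f48fc206000) gives 0xa85550, the <offset> printed for R9, and the same computation for R11 gives 0x789270.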

b. Signal information

The signal handler that the JVM registers on Linux receives two extra arguments, info and ucVoid:

static void crash_handler(int sig, siginfo_t* info, void* ucVoid) {
 // unmask current signal
 sigset_t newset;
 sigemptyset(&newset);
 sigaddset(&newset, sig);
 sigprocmask(SIG_UNBLOCK, &newset, NULL);

 VMError err(NULL, sig, NULL, info, ucVoid);
 err.report_and_die();
}

The signal information as it appears in the crash report:

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000043566

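The signal details, including si_addr (the address whose access faulted), are kept in a siginfo_t structure, which is the info parameter of crash_handler. The kernel stores the faulting address in this user-space siginfo_t, so the registered handler can find out which address caused the error. si_code=1 (SEGV_MAPERR) means the address was not mapped at all, as opposed to SEGV_ACCERR (mapped, but accessed without permission). Here si_addr equals pc, so the fault happened while fetching the instruction itself: execution had jumped to an unmapped address.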
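The crash_handler signature above is the three-argument SA_SIGINFO form. The self-contained sketch below (not HotSpot code) installs such a handler with sigaction, provokes a SIGSEGV by writing to the deliberately invalid address 0x10, records si_code and si_addr, and recovers with siglongjmp so the caller can inspect the result. Recovering from SIGSEGV like this is only reasonable in a demonstration, and the names probe_fault and record_fault are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;
static void *volatile fault_addr;

/* Same shape as crash_handler: (int, siginfo_t *, void *). */
static void record_fault(int sig, siginfo_t *info, void *ucontext) {
    (void)sig;
    (void)ucontext;
    fault_code = info->si_code;   /* e.g. SEGV_MAPERR or SEGV_ACCERR */
    fault_addr = info->si_addr;   /* address whose access faulted */
    siglongjmp(fault_env, 1);     /* jump back instead of dying */
}

/* Write to bad_addr, catch the SIGSEGV and return 0 on success;
 * *code and *addr receive si_code and si_addr as filled in by the kernel. */
static int probe_fault(uintptr_t bad_addr, int *code, void **addr) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_fault;
    sa.sa_flags = SA_SIGINFO;     /* request the siginfo_t/ucontext arguments */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    int caught = 0;
    if (sigsetjmp(fault_env, 1) == 0) {            /* 1: restore signal mask too */
        volatile int *volatile p = (volatile int *)bad_addr;
        *p = 1;                                    /* faults */
    } else {
        caught = 1;
    }
    sigaction(SIGSEGV, &old, NULL);                /* restore previous handler */
    if (!caught) return -1;
    *code = fault_code;
    *addr = fault_addr;
    return 0;
}
```

On Linux, probe_fault(0x10, &code, &addr) reports SEGV_MAPERR with addr equal to 0x10, the same pair of fields the crash report prints as si_code and si_addr.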

c. Register information

Registers:
RAX=0x00002aacb5ae5de2, RBX=0x00002aaaaf46aa48, RCX=0x0000000000000219, RDX=0x00002aaaaf46b920
RSP=0x0000000042d3f968, RBP=0x0000000042d3f9c8, RSI=0x0000000042d3f9e8, RDI=0x0000000045aef9b8
R8 =0x0000000000000f80, R9 =0x00002aaab3d30ce8, R10=0x00002aaaab138ea1, R11=0x00002b017ae65110
R12=0x0000000042d3f6f0, R13=0x00002aaaaf46aa48, R14=0x0000000042d3f9e8, R15=0x0000000045aef800
RIP=0x0000000000043566, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000014
 TRAPNO=0x000000000000000e

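The register values come from the third argument of the signal handler in section b, (ucontext_t*)ucVoid.

On x86 (x86-64 when AMD64 is defined, 32-bit x86 otherwise):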

void os::print_context(outputStream *st, void *context) {
 if (context == NULL) return;

 ucontext_t *uc = (ucontext_t*)context;
 st->print_cr("Registers:");
#ifdef AMD64
 st->print( "RAX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RAX]);
 st->print(", RBX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RBX]);
 st->print(", RCX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RCX]);
 st->print(", RDX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RDX]);
 st->cr();
 st->print( "RSP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RSP]);
 st->print(", RBP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RBP]);
 st->print(", RSI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RSI]);
 st->print(", RDI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RDI]);
 st->cr();
 st->print( "R8 =" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R8]);
 st->print(", R9 =" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R9]);
 st->print(", R10=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R10]);
 st->print(", R11=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R11]);
 st->cr();
 st->print( "R12=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R12]);
 st->print(", R13=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R13]);
 st->print(", R14=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R14]);
 st->print(", R15=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_R15]);
 st->cr();
 st->print( "RIP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_RIP]);
 st->print(", EFL=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EFL]);
 st->print(", CSGSFS=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_CSGSFS]);
 st->print(", ERR=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_ERR]);
 st->cr();
 st->print(" TRAPNO=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_TRAPNO]);
#else
 st->print( "EAX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EAX]);
 st->print(", EBX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EBX]);
 st->print(", ECX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_ECX]);
 st->print(", EDX=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EDX]);
 st->cr();
 st->print( "ESP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_UESP]);
 st->print(", EBP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EBP]);
 st->print(", ESI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_ESI]);
 st->print(", EDI=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EDI]);
 st->cr();
 st->print( "EIP=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EIP]);
 st->print(", CR2=" INTPTR_FORMAT, uc->uc_mcontext.cr2);
 st->print(", EFLAGS=" INTPTR_FORMAT, uc->uc_mcontext.gregs[REG_EFL]);
#endif // AMD64
 st->cr();
 st->cr();

 intptr_t *sp = (intptr_t *)os::Linux::ucontext_get_sp(uc);
 st->print_cr("Top of Stack: (sp=" PTR_FORMAT ")", sp);
 print_hex_dump(st, (address)sp, (address)(sp + 8*sizeof(intptr_t)), sizeof(intptr_t));
 st->cr();

 // Note: it may be unsafe to inspect memory near pc. For example, pc may
 // point to garbage if entry point in an nmethod is corrupted. Leave
 // this at the end, and hope for the best.
 address pc = os::Linux::ucontext_get_pc(uc);
 st->print_cr("Instructions: (pc=" PTR_FORMAT ")", pc);
 print_hex_dump(st, pc - 16, pc + 16, sizeof(char));
}

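Register values are essential when analyzing a crash. In the dump above, TRAPNO=0xe (14) is the x86 page-fault exception, and ERR=0x14 is its error code: bit 2 (0x4, user mode) and bit 4 (0x10, instruction fetch) are set while bit 0 (page present) is clear, so the CPU faulted while fetching an instruction from an unmapped page. That matches RIP and si_addr both being 0x0000000000043566.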
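Decoding ERR by hand is error-prone, so a tiny decoder helps. The sketch below covers the low five bits of the x86 page-fault error code and only applies when TRAPNO is 14 (0xe); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_TRAP_PAGE_FAULT 14   /* printed as TRAPNO=0xe */

/* Decoded x86 page-fault error code (the ERR value in the report). */
struct pf_error {
    bool protection;   /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;        /* bit 1: 1 = write access, 0 = read */
    bool user;         /* bit 2: 1 = fault raised in user mode */
    bool reserved;     /* bit 3: reserved bit set in a page-table entry */
    bool instr_fetch;  /* bit 4: 1 = instruction fetch */
};

static bool is_page_fault(uint64_t trapno) {
    return trapno == X86_TRAP_PAGE_FAULT;
}

static struct pf_error decode_pf_error(uint64_t err) {
    struct pf_error e = {
        .protection  = (err & 0x1) != 0,
        .write       = (err & 0x2) != 0,
        .user        = (err & 0x4) != 0,
        .reserved    = (err & 0x8) != 0,
        .instr_fetch = (err & 0x10) != 0,
    };
    return e;
}
```

For the log above, decode_pf_error(0x14) reports a user-mode instruction fetch from a non-present page, while an ordinary bad write from user code would typically show ERR=0x6.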

The report also prints part of the machine code around the executing instruction:

Instructions: (pc=0x00007f48f14ef51a)
0x00007f48f14ef4fa: 90 90 55 48 89 e5 48 81 ec 98 9f 00 00 48 89 bd 
0x00007f48f14ef50a: f8 5f ff ff 48 89 b5 f0 5f ff ff b8 00 00 00 00 
0x00007f48f14ef51a: c7 00 01 00 00 00 c6 85 00 60 ff ff ff c9 c3 90 
0x00007f48f14ef52a: 90 90 90 90 90 90 55 48 89 e5 53 48 8d 1d 94 00 

The Instructions section shows some of the machine code, 16 bytes per line, in the format:

address: machine code bytes
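When working with a longer dump, a small parser can turn each line back into a start address plus raw bytes, ready to feed to a disassembler. The sketch below is a hypothetical helper that assumes exactly the "0x...: xx xx ..." layout shown above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse one dump line such as
 *   0x00007f48f14ef4fa: 90 90 55 48 ...
 * into its start address and up to `max` bytes. Returns the number of bytes
 * parsed, or -1 if the line does not start with an address and a colon. */
static int parse_dump_line(const char *line, uint64_t *addr,
                           unsigned char *bytes, int max) {
    char *end;
    *addr = strtoull(line, &end, 16);      /* base 16 accepts the "0x" prefix */
    if (end == line || *end != ':') return -1;
    const char *p = end + 1;
    int n = 0;
    while (n < max) {
        char *next;
        unsigned long v = strtoul(p, &next, 16);
        if (next == p || v > 0xff) break;  /* no further byte on this line */
        bytes[n++] = (unsigned char)v;
        p = next;
    }
    return n;
}
```

On the line that contains pc, the byte at index pc - addr is the first byte of the faulting instruction; for the dump above that is index 0 of the line starting at 0x00007f48f14ef51a.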

Option 1: disassemble with udcli, the command-line tool that comes with the udis86 library.

Command:

echo '90 90 55 48 89 e5 48 81 ec 98 9f 00 00 48 89 bd' | udcli -intel -x -64 -o 0x00007f48f14ef4fa 

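This prints the corresponding assembly. Decoding the bytes at pc gives a clear picture: c7 00 01 00 00 00 at pc is mov dword [rax], 0x1, and the instruction just before it, b8 00 00 00 00, is mov eax, 0x0, which also clears the upper half of rax. The code therefore stores through a null pointer, consistent with rax=0x0000000000000000 in section d.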

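Option 2: disassemble the whole library with objdump:

objdump -d -C libjvm.so >> jvmsodisass.dump

Then search the output for offset 0x593045 (for a shared library, the addresses objdump prints are normally the same offsets used in libjvm.so+0x593045). That is the instruction that was executing; from the surrounding code and the JVM source you can infer which statement caused the problem.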

d. What the registers point to

RAX=0x0000000000000000 is an unknown value
RBX=0x000000041a07d1e8 is an oop
{method}
 - klass: {other class}
RCX=0x0000000000000000 is an unknown value
RDX=0x0000000040111800 is a thread
RSP=0x0000000041261b88 is pointing into the stack for thread: 0x0000000040111800
RBP=0x000000004126bb20 is pointing into the stack for thread: 0x0000000040111800
RSI=0x000000004126bb80 is pointing into the stack for thread: 0x0000000040111800
RDI=0x00000000401119d0 is an unknown value
R8 =0x0000000040111c40 is an unknown value
R9 =0x00007f48fcc8b550: <offset 0xa85550> in /usr/java/jdk1.6.0_30/jre/lib/amd64/server/libjvm.so at 0x00007f48fc206000
R10=0x00007f48f8ca7d41 is an Interpreter codelet
method entry point (kind = native) [0x00007f48f8ca7ae0, 0x00007f48f8ca8320] 2112 bytes
R11=0x00007f48fc98f270: <offset 0x789270> in /usr/java/jdk1.6.0_30/jre/lib/amd64/server/libjvm.so at 0x00007f48fc206000
R12=0x0000000000000000 is an unknown value
R13=0x000000041a07d1e8 is an oop
{method}
 - klass: {other class}
R14=0x000000004126bb88 is pointing into the stack for thread: 0x0000000040111800
R15=0x0000000040111800 is a thread

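The JVM tries to identify what each register value points to (an oop, a thread, a thread's stack, a location in a library, interpreter code), which is another useful clue.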
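The "pointing into the stack for thread" check amounts to a range test, because thread stacks grow downward from a base address. The sketch below assumes a hypothetical per-thread record holding the stack's highest address and its size; the JVM keeps similar information for each thread, but the struct, and the stack base and size used in the example, are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record; the stack grows down from stack_base. */
struct thread_info {
    uint64_t id;           /* e.g. the thread address printed in the log */
    uint64_t stack_base;   /* highest stack address (exclusive) */
    uint64_t stack_size;   /* bytes reserved for the stack */
};

static bool in_stack(const struct thread_info *t, uint64_t addr) {
    return addr < t->stack_base && addr >= t->stack_base - t->stack_size;
}

/* Return the thread whose stack contains addr, or NULL if none does. */
static const struct thread_info *find_stack_owner(const struct thread_info *threads,
                                                  size_t count, uint64_t addr) {
    for (size_t i = 0; i < count; i++) {
        if (in_stack(&threads[i], addr)) return &threads[i];
    }
    return NULL;
}
```

With a made-up 1 MiB stack ending at 0x41270000 for thread 0x40111800, RSP=0x41261b88 falls inside it, while RDI=0x401119d0 does not, mirroring how the log labels those two registers.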

e. Other information

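The error file also contains thread information and the process's memory map at the time of the crash, both of which are useful for the analysis.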
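On Linux the memory-map part of the report follows the /proc/<pid>/maps layout (start-end, permissions, file offset, device, inode, path), so a mapping line can be used to work out which library an address belongs to. The sketch below is a hypothetical parser; the end address, device and inode in the example line are made up, and paths containing spaces are truncated at the first space.

```c
#include <inttypes.h>
#include <stdio.h>

/* One line of /proc/<pid>/maps, e.g.
 * 7f48fc206000-7f48fcd4c000 r-xp 00000000 08:01 1234 /usr/.../libjvm.so */
struct mapping {
    uint64_t start, end, file_offset;
    char perms[5];
    char path[256];
};

/* Parse a maps line; returns 0 on success, -1 on malformed input.
 * path is left empty for anonymous mappings. */
static int parse_maps_line(const char *line, struct mapping *m) {
    int n = sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %255s",
                   &m->start, &m->end, m->perms, &m->file_offset, m->path);
    if (n < 4) return -1;
    if (n == 4) m->path[0] = '\0';
    return 0;
}

/* If addr lies inside m, store its offset within the mapped file. */
static int addr_in_mapping(const struct mapping *m, uint64_t addr, uint64_t *offset) {
    if (addr < m->start || addr >= m->end) return 0;
    *offset = addr - m->start + m->file_offset;
    return 1;
}
```

For the libjvm.so text mapping starting at 0x7f48fc206000 with file offset 0, R9=0x7f48fcc8b550 resolves to offset 0xa85550, the same value the report prints in section d.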

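A crash report gives a rough picture of the state at the time of the crash and is especially helpful when no core dump is available. If a core dump exists, though, it is ultimately the fastest and most accurate way to find the root cause.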
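Whether a core dump is written at all depends on the process's RLIMIT_CORE limit (what `ulimit -c` reports in the shell that starts java) and on the kernel's core_pattern setting. As a sketch, a process can raise its own soft limit up to the hard limit with setrlimit; the helper name is hypothetical, and in practice this is usually done with `ulimit -c unlimited` before launching the JVM.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit so that a crash
 * can leave a core dump. Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) return -1;
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit itself is 0, raising it requires privileges, which is why the limit is normally configured in the environment that launches the JVM.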
