Using oradebug short_stack and strace -p to check whether an Oracle process is dead or faulty

程序员文章站 2022-05-25 20:09:56
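1. Use oradebug or strace -p to trace a background or foreground process and determine whether it is dead or hung.

2. When a process fails, it normally writes its latest state to the corresponding trc file, which provides important information for further analysis and diagnosis.

The trace files are located in the directory given by background_dump_dest.

3. Run ll -lhrt *lgwr*|tail -10f to find the most recent trc files for the process.

4. A failure is also usually recorded in the alert log, so checking it is an important first step in troubleshooting.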

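5. Attach to the process and dump its call stack:

oradebug setospid ospid

oradebug short_stack

This prints the call stack of the process. Note: run it several times at intervals. If the stack is identical every time, the process is very likely dead or faulty.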

6. Trace the process with strace -p ospid.

---Typical output when the process is hung or faulty

semtimedop(9273344, 0x7fffe66199d0, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)

---Typical output when the process is healthy

times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440015944
semtimedop(9273344, 0x7fffe661b1f0, 1, {1, 800000000}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440017025
open("/proc/4385/stat", O_RDONLY) = 35
read(35, "4385 (oracle) S 1 4385 4385 0 -1"..., 999) = 225

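In short, watch whether the output keeps changing. If it changes, the process is working normally; if it stays the same, something is wrong.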
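The "has the output changed?" check applies both to repeated oradebug short_stack samples (step 5) and to strace output (step 6). The Python sketch below shows one way it might be automated. It assumes the samples have already been captured as strings, for example by redirecting the tool output to files. The helper names snapshots_differ and times_return_values are hypothetical and are not part of oradebug or strace.

```python
import re


def snapshots_differ(snapshots):
    """Return True if at least one snapshot differs from the first one.

    Whitespace is normalized so that differences in line wrapping alone
    do not count as a change.
    """
    normalized = [" ".join(s.split()) for s in snapshots]
    return any(n != normalized[0] for n in normalized[1:])


# Matches strace lines such as: times({...}) = 440016124
TIMES_RE = re.compile(r"^times\(.*\)[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def times_return_values(strace_text):
    """Return the distinct times() return values, in order of first appearance."""
    values = []
    for match in TIMES_RE.finditer(strace_text):
        value = int(match.group(1))
        if value not in values:
            values.append(value)
    return values
```

If snapshots_differ returns False for several short_stack samples taken a few seconds apart, or times_return_values yields fewer than two values over the observation window, the process deserves a closer look in its trc file and the alert log.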

Test

SQL> select * from v$version where rownum=1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

List the background processes

SQL> select pid,spid,pname,username from v$process order by 1;

       PID SPID       PNAME      USERNAME
---------- ---------- ---------- ------------------------------
         1
         2 4385       PMON       oracle
         3 4387       VKTM       oracle
         4 4391       GEN0       oracle
         5 4393       DIAG       oracle
         6 4395       DBRM       oracle
         7 4397       PSP0       oracle
         8 4399       DIA0       oracle
         9 4401       MMAN       oracle
        10 4403       DBW0       oracle
        11 4405       LGWR       oracle
        12 4407       CKPT       oracle
        13 4409       SMON       oracle
        14 4411       RECO       oracle
        15 4413       MMON       oracle
        16 4415       MMNL       oracle
        17 4417       D000       oracle
        18 4419       S000       oracle
        19 4652       SMCO       oracle
        20 5266       W000       oracle
        21 4936                  oracle
        27 4468       ARC0       oracle
        28 4481       ARC1       oracle
        29 4486       ARC2       oracle
        30 4489       ARC3       oracle
        31 4496       QMNC       oracle
        32 4549       Q000       oracle
        33 4551       Q001       oracle
        34 4568                  oracle

29 rows selected.

SQL>

---Check the trc file directory

[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall  213 Dec 14 19:05 guowang_lgwr_5297.trm
-rw-r----- 1 oracle oinstall 2.4K Dec 14 19:05 guowang_lgwr_5297.trc
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall  27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall   63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall  903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall   63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall  906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall   62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall  887 Dec 15 03:27 guowang_lgwr_1032.trc
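The same "newest files last" listing can be produced from a script when the check needs to be automated. The following Python sketch sorts matching files by modification time; the function name newest_trace_files and its parameters are assumptions made for illustration, and the directory to pass in is whatever background_dump_dest points to on the system being examined.

```python
from pathlib import Path


def newest_trace_files(trace_dir, pattern="*lgwr*", count=10):
    """Return up to `count` matching files, oldest first, newest last.

    This mirrors the ordering of `ls -rt ... | tail`.
    """
    if count <= 0:
        return []
    files = [p for p in Path(trace_dir).glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[-count:]
```

The last element of the returned list is the file written most recently, which is normally the one to open first after a suspected failure.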

---Suspend lgwr to simulate a hang

SQL> oradebug setospid 4405
Oracle pid: 11, Unix process pid: 4405, image: oracle@seconary (LGWR)

SQL> oradebug suspend
Statement processed.

---The alert log records the suspension at the same moment

Tue Dec 15 04:46:15 2015
Unix process pid: 4405, image: oracle@seconary (LGWR) flash frozen [ command #1 ]
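When scanning an alert log for entries like the one above, a regular expression can extract the OS pid and the process name. This is only a sketch: the pattern is derived from the single sample line shown here, and other Oracle versions may word the message differently. The names FROZEN_RE and find_frozen_processes are hypothetical.

```python
import re

# Pattern based on the one alert log line shown in this article.
# Matching is case-insensitive so it tolerates differently cased copies.
FROZEN_RE = re.compile(
    r"unix process pid:\s*(\d+),\s*image:\s*(\S+)\s*\((\w+)\)\s*flash frozen",
    re.IGNORECASE,
)


def find_frozen_processes(alert_text):
    """Return (os_pid, image, process_name) tuples for 'flash frozen' entries."""
    return [
        (int(m.group(1)), m.group(2), m.group(3).lower())
        for m in FROZEN_RE.finditer(alert_text)
    ]
```

The pid returned here is the value to pass to oradebug setospid or strace -p for a closer look.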

---A new trc file for pid 4405 also appears in the trc directory

[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall  27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall   63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall  903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall   63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall  906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall   62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall  887 Dec 15 03:27 guowang_lgwr_1032.trc
-rw-r----- 1 oracle oinstall   63 Dec 15 04:46 guowang_lgwr_4405.trm
-rw-r----- 1 oracle oinstall  896 Dec 15 04:46 guowang_lgwr_4405.trc
[oracle@seconary trace]$