欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  数据库

使用Linux Strace跟踪调试Oracle程序进程

程序员文章站 2024-04-05 14:32:30
...

Oracle自身已经提供了很多这类型的工具,如oradebug、各种等待事件和跟踪方式。此外,各类型的操作系统提供出很多系统级别工具,

所谓操作系统,是应用程序与服务器硬件进行沟通的中间层。应用程序的所有操作,都是和操作系统进行沟通交互。操作系统负责将所有交互转化为设备语言,进行硬件交互。

我们在进行Oracle故障调试和内核原理工作的时候,经常需要了解后台运行的动作和细节。一些故障场景,如程序进程hang住、无法登陆等问题,就需要操作系统级别监控,检查定位问题。

Oracle自身已经提供了很多这类型的工具,如oradebug、各种等待事件和跟踪方式。此外,各类型的操作系统提供出很多系统级别工具,帮助我们进行监控。本篇主要介绍Linux环境中的strace工具典型使用方法,留待需要朋友待查。

1、环境介绍

笔者使用红帽Red Hat 6.5进行测试。

[root@XXX-Standby-Asm ~]# cat /etc/RedHat-release

Red Hat Enterprise Linux Server release 6.5 (Santiago)

[root@XXX-Standby-Asm ~]# uname -a

Linux XXX-Standby-Asm 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

对应使用Oracle版本为11.2.0.4。

[oracle@XXX-Standby-Asm ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jun 16 16:39:21 2015

Copyright (c) 1982, 2013, Oracle. All rights reserved.

SQL> conn / as sysdba

Connected.

SQL> select * from v$version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

PL/SQL Release 11.2.0.4.0 - Production

CORE 11.2.0.4.0 Production

TNS for Linux: Version 11.2.0.4.0 - Production

NLSRTL Version 11.2.0.4.0 – Production

2、Strace命令信息

使用-h参数,可以调用出strace命令的语法格式和各个使用参数。

[oracle@XXX-Standby-Asm ~]$ strace -h

usage: strace [-dDffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file]

[-p pid] ... [-s strsize] [-u username] [-E var=val] ...

[command [arg ...]]

or: strace -c [-D] [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ...

[command [arg ...]]

-c -- count time, calls, and errors for each syscall and report summary

-f -- follow forks, -ff -- with output into separate files

-F -- attempt to follow vforks, -h -- print help message

-i -- print instruction pointer at time of syscall

-q -- suppress messages about attaching, detaching, etc.

-r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs

-T -- print time spent in each syscall, -V -- print version

-v -- verbose mode: print unabbreviated argv, stat, termio[s], etc. args

-x -- print non-ascii strings in hex, -xx -- print all strings in hex

-a column -- alignment COLUMN for printing syscall results (default 40)

-e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...

options: trace, abbrev, verbose, raw, signal, read, or write

-o file -- send trace output to FILE instead of stderr

-O overhead -- set overhead for tracing syscalls to OVERHEAD usecs

-p pid -- trace process with process id PID, may be repeated

-D -- run tracer process as a detached grandchild, not as parent

-s strsize -- limit length of print strings to STRSIZE chars (default 32)

-S sortby -- sort syscall counts by: time, calls, name, nothing (default time)

-u username -- run command as username handling setuid and/or setgid

-E var=val -- put var=val in the environment for command

-E var -- remove var from the environment for command

在诸多参数中,比较常用的有如下几个:

ü -t参数:可以显示调用操作系统过程的时间。这个在诊断中很重要,如果发现某一个执行过程时间消耗大,就可以定位到是什么问题;

ü -p参数:strace除了可以从进程启动时候开始监控之外,还可以根据操作系统进程编号,进入一个运行中的进程进行debug;

ü -o参数:输出结果如果需要以文件方式保存,就通过-o参数指定文件名称;

ü -e参数:监控表达式。提供trace、signal、read和write等内容,反映监控方面项目。常用的参数是trace=file,集中在操作系统文件访问层面。

下面,通过两个典型应用场景,来演示一下strace应用。

3、启动程序监控

strace监控研究的一个重要范例是sqlplus,很多前辈使用这个工具来测试跟踪sqlplus启动过程访问的文件形式。启动strace监控sqlplus,就是在strace命令后直接输入sqlplus命令即可。

--启动命令

[oracle@localhost ~]$ strace -t -e trace=file sqlplus /nolog

14:21:14 execve("/u01/app/oracle/bin/sqlplus", ["sqlplus", "/nolog"], [/* 31 vars */]) = 0 –启动命令

14:21:14 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/tls/x86_64/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/lib/tls/x86_64", 0x7fff3076da50) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/tls/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/lib/tls", 0x7fff3076da50) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/x86_64/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/lib/x86_64", 0x7fff3076da50) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/lib/libsqlplus.so", O_RDONLY) = 3

14:21:14 open("/u01/app/oracle/lib/libclntsh.so.11.1", O_RDONLY) = 3

14:21:14 open("/u01/app/oracle/lib/libnnz11.so", O_RDONLY) = 3

14:21:14 open("/u01/app/oracle/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)

(篇幅原因,有省略……)

14:21:14 access("/etc/tnsnav.ora", F_OK) = -1 ENOENT (No such file or directory)

14:21:14 access("/u01/app/oracle/network/admin/tnsnav.ora", F_OK) = -1 ENOENT (No such file or directory)

14:21:14 open("/u01/app/oracle/oracore/zoneinfo", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezlrg_1.dat", O_RDONLY) = 7

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezlrg_2.dat", O_RDONLY) = 7

(读取各种时区数据……)

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezone_14.dat", O_RDONLY) = 7

14:21:14 open("/u01/app/oracle/oracore/zoneinfo/timezlrg_14.dat", O_RDONLY) = 6

14:21:14 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 open("/usr/lib64/libnuma.so.1", O_RDONLY) = 6

14:21:14 open("/proc/self/status", O_RDONLY) = 6

14:21:14 open("/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6

14:21:14 open("/sys/devices/system/node/node0/meminfo", O_RDONLY) = 7

14:21:14 open("/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6

14:21:14 open("/proc/self/status", O_RDONLY) = 6

14:21:14 open("/proc/sys/kernel/shmmax", O_RDONLY) = 6

14:21:14 open("/proc/meminfo", O_RDONLY) = 6

14:21:14 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)

14:21:14 open("/etc/localtime", O_RDONLY) = 6

14:21:14 open("/u01/app/oracle/rdbms/mesg/ocius.msb", O_RDONLY) = 6

SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 12 14:21:14 2015

Copyright (c) 1982, 2013, Oracle. All rights reserved.

--启动之后,读取的文件。

14:21:14 stat("login.sql", 0x7fff3076c740) = -1 ENOENT (No such file or directory)

14:21:14 stat("/u01/app/oracle/sqlplus/admin/glogin.sql", {st_mode=S_IFREG|0644, st_size=368, ...}) = 0

14:21:14 access("/u01/app/oracle/sqlplus/admin/glogin.sql", F_OK) = 0