给大家分享一个oracle进程异常占用内存,险些造成分区宕机的案例
http://www.loveunix.net/thread-126455-1-1.html 昨晚收到手机短信,有个数据库节点的paginspace占用率到了31%(告警阀是30%),早晨到现场后发现交换空间的利用率已经从31%增长到58%,而且物理内存的占用率到了100%,这个是IBM P595上的一个分区AIX5304,HA
http://www.loveunix.net/thread-126455-1-1.html
昨晚收到手机短信,有个数据库节点的paginspace占用率到了31%(告警阀值是30%),早晨到现场后发现交换空间的利用率已经从31%增长到58%,而且物理内存的占用率到了100%,这个是IBM P595上的一个分区AIX5304,HACMP5.3,ORACLE 9208 RAC
ossresdb2:[/]lsps -a
Page Space Physical Volume Volume Group Size %Used Active Auto Type
hd6 hdisk0 rootvg 32768MB 58 yes yes lv
查看进程占用内存情况时,发现pid为438434的oralce进程占用了系统53%的内存,共计19G内存:
ossresdb2:[/]ps aux | head -1 ; ps aux | sort -rn +3 | head
USER PID %CPU %MEM SZ RSS TTY STAT STIME TIME COMMAND
oracle 438434 0.9 53.0 19353060 17625500 - A Mar 02 2137:42 oracleresdb2 (L
zte 450808 0.0 0.0 720 756 pts/0 A 10:02:26 0:00 -ksh
zte 327788 0.3 0.0 9728 9744 pts/0 A 10:05:04 0:14 topas
root 6914224 0.0 0.0 1988 1956 - A Feb 28 5:27 /usr/sbin/rsct/
root 5878006 0.0 0.0 52 48 - A Jan 30 0:09 aioserver
root 5419188 0.0 0.0 60 32 - A Jan 31 2:49 aioserver
root 4788268 0.0 0.0 1988 1956 - A Feb 28 5:28 /usr/sbin/rsct/
root 4755470 0.0 0.0 1888 1860 - A Feb 28 9:02 /usr/sbin/rsct/
root 4616446 0.0 0.0 48 32 - A 15:55:52 0:00 aioserver
root 3989894 0.0 0.0 320 108 - A Jan 24 0:00 storwatchd star
在数据库中查询该进程对应的sql语句:
SQL> select sql_text from v$sqlarea where address in (select sql_address from v$session where paddr in (select addr from v$process where spid = 438434));
SQL_TEXT
--------------------------------------------------------------------------------
BEGIN PG_TOPO_430021.ReCreateTopoDate(:1,:2,:3); END;
将该存储过程异常占用内存的情况先应用人员反映,和应用人员确认后,将该进程杀掉,杀掉改进程后,释放了大量内存,交换空间利用率下降到35.5%,物理内存的利用率下降到50.6%
nmon--------l=LongTerm-CPU-----Host=ossresdb2------Refresh=2 secs---10:53.23-----------------------------------------------------+
| Memory --------------------------------------------------------------------------------------------------------------------------|
| Physical PageSpace | pages/sec In Out | FileSystemCache |
|% Used 50.6% 35.5% | to Paging Space 1.5 0.0 | (numperm) 0.1% |
|% Free 49.4% 64.5% | to File System 0.0 0.0 | Process 36.2% |
|MB Used 16588.3MB 11632.4MB | Page Scans 0.0 | System 14.3% |
|MB Free 16179.6MB 21135.6MB | Page Cycles 0.0 | Free 49.4% |
|Total(MB) 32768.0MB 32768.0MB | Page Steals 0.0 | ------ |
| | Page Faults 154.9 | Total 100.0% |
|------------------------------------------------------------ | numclient 0.2% |
|Min/Maxperm 1555MB( 5%) 3110MB( 9%)
|Min/Maxfree 960 1088 Total Virtual 64.0GB | User 32.2% |
|Min/Maxpgahead 2 8 Accessed Virtual 24.3GB 38.0% Pinned 18.0% |
| |
|---------------------------------------------------------------------------------
之前我们的这个分区就出现过pagingspace到达100%,造成分区宕机,当时也不知道原因,今天总算是找到真凶了,应用上的问题也会造成系统宕机,大家可要当心了哦,尤其要注意pagingspace的利用率