数据库死锁严重引发中间件连接池满故障诊断
1、故障现象 前台系统应用无法登陆,weblogic服务器应用程序的运行状态显示为overload,线程连接池满。 2、故障原因分析 根据上述故障现象,分析基础可以确定为是Weblogic有过多的连接连到数据库,因为会话一直保持未释放,将连接池占满后,导致新的连接无法
1、故障现象
前台系统应用无法登陆,weblogic服务器应用程序的运行状态显示为overload,线程连接池满。
2、故障原因分析
根据上述故障现象,分析基础可以确定为是Weblogic有过多的连接连到数据库,因为会话一直保持未释放,将连接池占满后,导致新的连接无法请求到连接池。在此关键是分析为什么会有大量的会话占满连接池而不释放。
3、问题分析过程
3.1 session数超过1000
|
Snap Id |
Snap Time |
Sessions |
Cursors/Session |
Begin Snap: |
38194 |
2014/12/10 8:00 |
1010 |
13.8 |
End Snap: |
38196 |
2014/12/10 10:00 |
1004 |
14.4 |
Elapsed: |
|
119.65 (mins) |
|
|
DB Time: |
|
20,686.23 (mins) |
|
|
3.2 每秒执行次数超过15000多次
|
Per Second |
Per Transaction |
Redo size: |
18,144,642.95 |
459,254.26 |
Logical reads: |
199,155.23 |
5,040.77 |
Block changes: |
100,209.94 |
2,536.39 |
Physical reads: |
5,241.51 |
132.67 |
Physical writes: |
2,182.48 |
55.24 |
User calls: |
16,134.45 |
408.37 |
Parses: |
177.18 |
4.48 |
Hard parses: |
69.02 |
1.75 |
Sorts: |
106.97 |
2.71 |
Logons: |
0.58 |
0.01 |
Executes: |
15,955.55 |
403.85 |
Transactions: |
39.51 |
|
3.3 94%的等待时间都是花在“enq: TX -row lock contention”
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
enq: TX - row lock contention |
2,419,702 |
1,174,344 |
485 |
94.6 |
Application |
CPU time |
|
28,795 |
|
2.3 |
|
db file sequential read |
2,507,054 |
13,011 |
5 |
1 |
User I/O |
db file scattered read |
2,358,449 |
11,234 |
5 |
0.9 |
User I/O |
db file parallel write |
475,780 |
7,760 |
16 |
0.6 |
System I/O |
3.4 消耗时间最长的SQL语句
Elapsed Time (s) |
CPU Time (s) |
Executions |
Elap per Exec (s) |
% Total DB Time |
SQL Id |
SQL Text |
744,943 |
51 |
226 |
3296.21 |
60.02 |
2d0msz9tv9z5c |
DELETE FROM PUB_XXLINE WHERE U... |
429,987 |
29 |
483 |
890.24 |
34.64 |
1yucnh7p2mjdk |
DELETE FROM PUB_XXLINE WHERE S... |
2d0msz9tv9z5c文本如下:
DELETE FROM PUB_XXLINE WHERE USER_ID=:1
1yucnh7p2mjdk文本如下:
DELETE FROM PUB_XXLINE WHERE SESSION_ID=:1
以上两条SQL语句产生严重行争用,并且在会话级产生死锁
3.5 数据库内验证会话死锁
(1)查看被阻塞的sid:800会话是被哪个会话所阻塞
select sid,serial#,sql_id,status,blocking_session from v$sessionwhere sid=800;
sid |
serial# |
sql_id |
status |
blocking_session |
800 |
28777 |
2d0msz9tv9z5c |
ACTIVE |
347 |
从上面看来,sid:800执行的SQL语句的ID是2d0msz9tv9z5c,它被session id 347所阻塞
(2)查看session id 347是否被阻塞
select sid,serial#,sql_id,status,blocking_session from v$sessionwhere sid=347;
sid |
serial# |
sql_id |
status |
blocking_session |
347 |
15367 |
2d0msz9tv9z5c |
ACTIVE |
800 |
Session 347执行的语句的ID也是2d0msz9tv9z5c,它被sid 800所阻塞,在这里已经形成死锁。
3.6 数据库日志中发现大量死锁信息
Wed Dec 10 09:55:05 2014 Global Enqueue Services Deadlock detected. More info in file /oracle/admin/XXTH/bdump/XXth1_lmd0_1541770.trc. Wed Dec 10 09:55:36 2014 Thread 1 advanced to log sequence 493665 (LGWR switch) Current log# 16 seq# 493665 mem# 0: +REDOGROUP/XXth/onlinelog/group_16.263.837169415 Wed Dec 10 09:56:04 2014 Thread 1 advanced to log sequence 493666 (LGWR switch) Current log# 14 seq# 493666 mem# 0: +REDOGROUP/XXth/onlinelog/group_14.261.837169357 Wed Dec 10 09:56:05 2014 Global Enqueue Services Deadlock detected. More info in file /oracle/admin/XXTH/bdump/XXth1_lmd0_1541770.trc. |
4、问题原因诊断总结
语句DELETEFROM PUB_XXLINE WHERE USER_ID=:1每次操作的数据行有多行,多个会话同时执行该语句,在数据库内形成死锁,并且玥塞单次删除一行数据的语句DELETEFROM PUB_XXLINE WHERE SESSION_ID=:1从而使Weblogic(中间件)的连接池消耗殆尽。无法新增连接。
上一篇: 双路四核数据库服务器,正睿只卖15999
下一篇: PHP计划任务