Analysis of an Oracle 11g Control File Corruption Problem

程序员文章站 2024-02-16 09:04:28

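On Oracle versions earlier than 11g, a corrupted control file produces an obvious ORA-600 error when we mount the database, so the corruption is easy to recognize. On Oracle 11g R2, however, the symptom is much less obvious.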

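The system in question was an Oracle 11g RAC system. When the problem occurred, the database instance could be started in NOMOUNT state, but mounting the control file produced the following error: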

ORA-3113 "end of file on communication channel"

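The whole SQL*Plus connection then terminates and we have to reconnect. We know, of course, that when the mount stage cannot complete, the cause is usually corruption in the control file itself. For a professional, though, stopping at that assumption is clearly not good enough, so the problem needs further analysis.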

In the ASM alert log, we can see the following messages:

Tue Mar 27 13:35:11 2012
NOTE: client PROD1:PROD registered, osid 6726, mbr 0x1
Tue Mar 27 13:35:24 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_6726.trc
Tue Mar 27 13:40:35 2012
NOTE: client PROD1:PROD registered, osid 7477, mbr 0x1
Tue Mar 27 13:41:45 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_7477.trc
Tue Mar 27 13:41:47 2012
NOTE: client PROD1:PROD registered, osid 7736, mbr 0x1
Tue Mar 27 13:42:01 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_7736.tr
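
The pattern in this log (a client registers and is dropped unexpectedly shortly afterwards) can be pulled out mechanically when the alert log is long. The following is a minimal Python sketch, not an Oracle tool: it assumes the timestamp and NOTE line formats shown in the excerpt above, and the function name find_unexpected_disconnects is made up for illustration.

```python
import re
from datetime import datetime

# Lines copied from the ASM alert log excerpt above (trace-file lines omitted).
ASM_LOG = """\
Tue Mar 27 13:35:11 2012
NOTE: client PROD1:PROD registered, osid 6726, mbr 0x1
Tue Mar 27 13:35:24 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
Tue Mar 27 13:40:35 2012
NOTE: client PROD1:PROD registered, osid 7477, mbr 0x1
Tue Mar 27 13:41:45 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
Tue Mar 27 13:41:47 2012
NOTE: client PROD1:PROD registered, osid 7736, mbr 0x1
Tue Mar 27 13:42:01 2012
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
"""

# Assumption: timestamps always look like "Tue Mar 27 13:35:11 2012".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
REGISTERED = re.compile(r"NOTE: client (\S+) registered, osid (\d+)")
DISCONNECTED = re.compile(r"NOTE: ASM client (\S+) disconnected unexpectedly")


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp line, else None."""
    try:
        return datetime.strptime(line, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def find_unexpected_disconnects(log_text):
    """Return (client, osid, seconds_connected) for each unexpected disconnect."""
    results = []
    current_time = None
    registrations = {}  # client name -> (osid, time of registration)
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        timestamp = parse_timestamp(line)
        if timestamp is not None:
            current_time = timestamp
            continue
        registered = REGISTERED.search(line)
        if registered and current_time is not None:
            registrations[registered.group(1)] = (int(registered.group(2)), current_time)
            continue
        disconnected = DISCONNECTED.search(line)
        if disconnected and disconnected.group(1) in registrations:
            osid, started = registrations.pop(disconnected.group(1))
            seconds = int((current_time - started).total_seconds())
            results.append((disconnected.group(1), osid, seconds))
    return results


if __name__ == "__main__":
    for client, osid, seconds in find_unexpected_disconnects(ASM_LOG):
        print(f"{client} (osid {osid}) disconnected after {seconds} s")
```

For the excerpt above, the three sessions last 13, 70 and 14 seconds, which matches the picture of an instance that dies on every mount attempt.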

The generated trace file shows only the following information:

2012-03-27 13:41:08.022438 :802EEFE8:KFNS:kfn.c@702:kfnDispatch(): calling server stub for KFNOP=5
2012-03-27 13:41:13.027006 :802EF0F4:KFNU:kfns.c@1924:kfnsBackground(): kfnsBackground completed in 5 seconds (KFNPM=0)
2012-03-27 13:41:13.027012 :802EF0F5:KFNS:kfn.c@729:kfnDispatch(): completed KFNOP=5
2012-03-27 13:41:13.027122 :802EF0F6:KFNS:kfn.c@702:kfnDispatch(): calling server stub for KFNOP=5

This is clearly of little use for the problem at hand, and the cause most likely still lies on the database side.

So we checked the alert log of the database instance. When alter database mount is executed, the log shows the following:

Tue Mar 27 11:42:01 2012
alter database mount
This instance was first to mount
Tue Mar 27 11:42:01 2012
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Loaded library: System
Tue Mar 27 11:42:01 2012
SUCCESS: diskgroup PRODDATA was mounted
Tue Mar 27 11:42:01 2012
NOTE: dependency between database PROD and diskgroup resource ora.PRODDATA.dg is established
USER (ospid: 26774): terminating the instance
Tue Mar 27 11:42:07 2012
System state dump requested by (instance=1, osid=26774), summary=[abnormal instance termination].
System State dumped to trace file /d01/oracle/11.2.0/admin/PROD1_db01/diag/rdbms/prod/PROD1/trace/PROD1_diag_26656.trc
Dumping diagnostic data in directory=[cdmp_20120327114207], requested by (instance=1, osid=26774), summary=[abnormal instance termination].
Instance terminated by USER, pid = 26774

The log still gives no obvious hint, and the trace file /d01/oracle/11.2.0/admin/PROD1_db01/diag/rdbms/prod/PROD1/trace/PROD1_diag_26656.trc contains no detailed information either.

Finally, we used event 10046 to trace the mount process, and only then did a more detailed message appear:

alter session set events='10046 trace name context forever,level 12';
Trace file /d01/oracle/11.2.0/admin/PROD1_db01/diag/rdbms/prod/PROD1/trace/PROD1_ora_7764.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /d01/oracle/11.2.0
System name: Linux
Node name: db01.clc.com
Release: 2.6.18-238.el5
Version: #1 SMP Sun Dec 19 14:22:44 EST 2010
Machine: x86_64
Instance name: PROD1
Redo thread mounted by this instance: 0
Oracle process number: 31
Unix process pid: 7764, image: oracle@db01.clc.com (TNS V1-V3)


*** 2012-03-27 13:41:55.101
*** SESSION ID:(1751.3) 2012-03-27 13:41:55.101
*** CLIENT ID:() 2012-03-27 13:41:55.101
*** SERVICE NAME:() 2012-03-27 13:41:55.101
*** MODULE NAME:(oraagent.bin@db01.clc.com (TNS V1-V3)) 2012-03-27 13:41:55.101
*** ACTION NAME:() 2012-03-27 13:41:55.101

Error: kccpb_sanity_check_2
Control file sequence number mismatch!
fhcsq: 312916 bhcsq: 313137 cfn 0


*** 2012-03-27 13:41:55.101
Submitting synchronized dump request [268435460]. summary=[Controlfile header dump (kccpbsc)].

*** 2012-03-27 13:41:57.102
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461----- End of Abridged Call Stack Trace -----

*** 2012-03-27 13:41:57.141
USER (ospid: 7764): terminating the instance
ksuitm: waiting up to [5] seconds before killing DIAG(7652)

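The lines "Error: kccpb_sanity_check_2" and "Control file sequence number mismatch!" in the trace above show the cause: the sequence numbers in the control file do not match (fhcsq 312916 versus bhcsq 313137), so the control file fails its consistency check and the database cannot be mounted.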
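
Because this message is buried in a long trace file, a small script can look for it. The following is a minimal Python sketch, assuming the "fhcsq: ... bhcsq: ... cfn ..." line format seen in the trace above (the format is not documented and may differ between versions). The function name find_sequence_mismatches and the "gap" field are made up for illustration.

```python
import re

# Lines copied from the 10046 trace excerpt above.
TRACE_EXCERPT = """\
Error: kccpb_sanity_check_2
Control file sequence number mismatch!
fhcsq: 312916 bhcsq: 313137 cfn 0
"""

# Assumption: the two sequence values and cfn always appear on one line in this order.
SEQ_LINE = re.compile(r"fhcsq:\s*(\d+)\s+bhcsq:\s*(\d+)\s+cfn\s+(\d+)")


def find_sequence_mismatches(trace_text):
    """Return one dict per fhcsq/bhcsq line whose two values differ."""
    findings = []
    for match in SEQ_LINE.finditer(trace_text):
        fhcsq, bhcsq, cfn = (int(group) for group in match.groups())
        if fhcsq != bhcsq:
            findings.append(
                {"fhcsq": fhcsq, "bhcsq": bhcsq, "cfn": cfn, "gap": bhcsq - fhcsq}
            )
    return findings


if __name__ == "__main__":
    for finding in find_sequence_mismatches(TRACE_EXCERPT):
        print(finding)
```

In practice the text would be read from the trace file named in the session output, for example with open(path).read(), rather than from an embedded string.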

With the cause identified, the database can be opened by repairing or re-creating the control file.
