
11gR2 Data Guard: a case of handling datafile corruption on the standby

程序员文章站 (Programmer Article Site), 2024-01-22 12:54:58
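A customer's 11gR2 Data Guard environment ran into trouble. Investigation showed that datafiles on the standby were corrupted and managed recovery could not run normally; the standby had been out of sync for more than a month. Let's start with the standby's alert log: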

....... (part of the output omitted)
see note 411.1 at my oracle support for error and packaging details.
slave exiting with ora-600 exception
errors in file /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/crjnew_pr0p_9892.trc:
ora-00600: internal error code, arguments: [3020], [3], [6118], [12589030], [], [], [], [], [], [], [], []
ora-10567: redo is inconsistent with data block (file# 3, block# 6118, file offset is 50118656 bytes)
ora-10564: tablespace undotbs1
ora-01110: data file 3: '/u01/app/oracle/oradata/crjnew/datafile/o1_mf_undotbs1_859l2yrm_.dbf'
ora-10560: block type 'ktu undo block'
use adrci or support workbench to package the incident.
see note 411.1 at my oracle support for error and packaging details.
slave exiting with ora-600 exception
errors in file /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/crjnew_pr1p_9964.trc:
ora-00600: internal error code, arguments: [3020], [3], [3740], [12586652], [], [], [], [], [], [], [], []
ora-10567: redo is inconsistent with data block (file# 3, block# 3740, file offset is 30638080 bytes)
ora-10564: tablespace undotbs1
ora-01110: data file 3: '/u01/app/oracle/oradata/crjnew/datafile/o1_mf_undotbs1_859l2yrm_.dbf'
ora-10560: block type 'ktu undo block'
recovery interrupted!
recovered data files to a consistent state at change 12331596958128
mrp0: background media recovery process shutdown (crjnew)
..... (part of the output omitted)
tue may 27 19:30:03 2014
errors in file /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/crjnew_pr1e_21956.trc (incident=444672):
ora-00600: internal error code, arguments: [3020], [16], [1016759], [68125623], [], [], [], [], [], [], [], []
ora-10567: redo is inconsistent with data block (file# 16, block# 1016759, file offset is 4034322432 bytes)
ora-10564: tablespace crj
ora-01110: data file 16: '/u01/app/oracle/oradata/crjnew/datafile/crj_data09.dbf'
ora-10561: block type 'transaction managed index block', data object# 77037
incident details in: /u01/app/oracle/diag/rdbms/crjnew/crjnew/incident/incdir_444672/crjnew_pr1e_21956_i444672.trc
tue may 27 19:30:06 2014
dumping diagnostic data in directory=[cdmp_20140527193006], requested by (instance=1, osid=21956 (pr1e)), summary=[incident=444672].
use adrci or support workbench to package the incident.
see note 411.1 at my oracle support for error and packaging details.
slave exiting with ora-600 exception
errors in file /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/crjnew_pr1e_21956.trc:
ora-00600: internal error code, arguments: [3020], [16], [1016759], [68125623], [], [], [], [], [], [], [], []
ora-10567: redo is inconsistent with data block (file# 16, block# 1016759, file offset is 4034322432 bytes)
ora-10564: tablespace crj
ora-01110: data file 16: '/u01/app/oracle/oradata/crjnew/datafile/crj_data09.dbf'
ora-10561: block type 'transaction managed index block', data object# 77037
tue may 27 19:30:06 2014
errors in file /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/crjnew_mrp0_21854.trc (incident=444262):
ora-00600: internal error code, arguments: [3020], [16], [1016759], [68125623], [], [], [], [], [], [], [], []
ora-10567: redo is inconsistent with data block (file# 16, block# 1016759, file offset is 4034322432 bytes)
ora-10564: tablespace crj
ora-01110: data file 16: '/u01/app/oracle/oradata/crjnew/datafile/crj_data09.dbf'
ora-10561: block type 'transaction managed index block', data object# 77037
incident details in: /u01/app/oracle/diag/rdbms/crjnew/crjnew/incident/incdir_444262/crjnew_mrp0_21854_i444262.trc
use adrci or support workbench to package the incident.
see note 411.1 at my oracle support for error and packaging details.
recovery slave pr1e previously exited with exception 600
tue may 27 19:30:07 2014
mrp0: background media recovery terminated with error 448
errors in file /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/crjnew_pr00_21856.trc:
ora-00448: normal completion of background process
recovery interrupted!
recovered data files to a consistent state at change 12331596967112
mrp0: background media recovery process shutdown (crjnew)
tue may 27 19:30:11 2014
sweep [inc][444672]: completed
sweep [inc][444262]: completed
sweep [inc2][444672]: completed
sweep [inc2][444262]: completed
tue may 27 19:32:08 2014
primary database is in maximum performance mode
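Each ORA-10567 line reports both a block number and a byte offset. Assuming the 8 KB block size that is passed to dbv further down, the offset is simply block# × 8192, which is easy to verify with shell arithmetic. For file 16 the printed offset equals the true offset minus 2^32, so the message appears to truncate offsets to 32 bits; that is an inference from the numbers, not documented behaviour. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: byte offset of a block inside a datafile.
# Assumption: db_block_size = 8192, blocks numbered from 0 at offset 0.

block_offset() {
  local block=$1 bs=${2:-8192}
  echo $(( block * bs ))
}

# Same offset reduced modulo 2^32 (our reading of the file 16 messages).
block_offset_32() {
  local block=$1 bs=${2:-8192}
  echo $(( (block * bs) % 4294967296 ))
}

block_offset 6118          # 50118656, as reported for file 3
block_offset 3740          # 30638080, as reported for file 3
block_offset 1016759       # 8329289728, the full offset
block_offset_32 1016759    # 4034322432, as reported for file 16
```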

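As you can see, the errors above appear whenever managed recovery is started manually with recover managed standby database disconnect from session. It is also clear that MRP keeps failing because some datafiles contain bad blocks. If you check the datafiles with dbv, you get results like this: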
[oracle@gscrj01 ~]$ dbv file=/u01/app/oracle/oradata/crjnew/datafile/o1_mf_sysaux_859l29lq_.dbf blocksize=8192

dbverify: release 11.2.0.3.0 - production on tue may 27 18:02:42 2014

copyright (c) 1982, 2011, oracle and/or its affiliates. all rights reserved.

dbverify - verification starting : file = /u01/app/oracle/oradata/crjnew/datafile/o1_mf_sysaux_859l29lq_.dbf
page 121298 is influx - most likely media corrupt
corrupt block relative dba: 0x0081d9d2 (file 2, block 121298)
fractured block found during dbv:
data in bad block:
type: 6 format: 2 rdba: 0x0081d9d2
last change scn: 0x0b37.2c742a38 seq: 0x3 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x441f0601
check value in block header: 0xf89f
computed block checksum: 0x2281

 

dbverify - verification complete

total pages examined : 655360
total pages processed (data) : 77609
total pages failing (data) : 0
total pages processed (index): 66328
total pages failing (index): 0
total pages processed (lob) : 9344
total pages failing (lob) : 0
total pages processed (other): 108285
total pages processed (seg) : 0
total pages failing (seg) : 0
total pages empty : 393793
total pages marked corrupt : 1
total pages influx : 1
total pages encrypted : 0
highest block scn : 745850569 (2871.745850569)
[oracle@gscrj01 ~]$ dbv file=/u01/app/oracle/oradata/crjnew/datafile/crj_data07.dbf blocksize=8192

dbverify: release 11.2.0.3.0 - production on tue may 27 18:12:41 2014

copyright (c) 1982, 2011, oracle and/or its affiliates. all rights reserved.

dbverify - verification starting : file = /u01/app/oracle/oradata/crjnew/datafile/crj_data07.dbf


dbverify - verification complete

total pages examined : 3932160
total pages processed (data) : 47043
total pages failing (data) : 0
total pages processed (index): 22456
total pages failing (index): 0
total pages processed (other): 3862660
total pages processed (seg) : 0
total pages failing (seg) : 0
total pages empty : 1
total pages marked corrupt : 0
total pages influx : 0
total pages encrypted : 0
highest block scn : 745794635 (2871.745794635)
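The fields dbv printed for the fractured SYSAUX block can be cross-checked by hand. The sketch below assumes the usual smallfile RDBA layout (10-bit relative file number, 22-bit block number) and the usual block-tail layout (low 16 bits of the SCN base, then block type, then sequence); both are assumptions, and the helper names are hypothetical. Decoding 0x0081d9d2 gives back file 2, block 121298, and the tail a consistent block would carry (0x2a380603) differs from the 0x441f0601 that dbv read, which is consistent with dbv reporting a fractured block.

```shell
# Hypothetical helpers for reading the dbv block report above.
# Assumption: smallfile RDBA = relative file# (top 10 bits) | block# (low 22 bits).
rdba_decode() {
  local rdba=$(( $1 ))
  echo "file=$(( rdba >> 22 )) block=$(( rdba & 0x3FFFFF ))"
}

# Assumption: a consistent block ends with a 4-byte tail built from
# (low 16 bits of the SCN base) << 16 | block type << 8 | sequence.
expected_tail() {
  local base=$(( $1 )) btype=$(( $2 )) seq=$(( $3 ))
  printf '0x%08x\n' $(( ((base & 0xFFFF) << 16) | (btype << 8) | seq ))
}

rdba_decode 0x0081d9d2            # file=2 block=121298
expected_tail 0x2c742a38 6 0x3    # 0x2a380603; dbv read 0x441f0601 instead
```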


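I checked 2 of the files that reported errors. The SYSAUX file has one bad block, but dbv reports no corruption in the other datafile, so why does it fail? Nearly all of the errors here are of the form ORA-10567: redo is inconsistent with data block. That is not necessarily a problem with the block itself; it may be that what was recorded in the redo no longer agrees with the contents of the block.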
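Because redo apply failed on more files than the two checked here, it helps to be able to scan many datafiles quickly. A small filter that pulls the "marked corrupt" and "influx" counters out of a dbv report makes that easy. This is a sketch under assumptions: the function name is hypothetical and it relies on dbv's summary labels as shown above. As this case shows, a clean dbv result does not rule out ORA-10567 during apply.

```shell
# Hypothetical helper: reduce a dbv report (on stdin) to "corrupt=<n> influx=<n>".
# Assumption: the summary labels look like the ones above; matching ignores case.
dbv_summary() {
  awk -F: '
    tolower($1) ~ /total pages marked corrupt/ { v = $2; gsub(/ /, "", v); c = v }
    tolower($1) ~ /total pages influx/         { v = $2; gsub(/ /, "", v); i = v }
    END { printf "corrupt=%d influx=%d\n", c, i }
  '
}

# Two summary lines copied from the SYSAUX report above.
printf 'total pages marked corrupt : 1\ntotal pages influx : 1\n' | dbv_summary
# corrupt=1 influx=1

# Possible use over every datafile (directory taken from this article):
#   for f in /u01/app/oracle/oradata/crjnew/datafile/*.dbf; do
#     echo "$f $(dbv file="$f" blocksize=8192 2>&1 | dbv_summary)"
#   done
```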

At first only 3 files were reporting errors, so my idea was: why not simply scp these 3 files from the primary to the standby and then recover? Roughly like this:

-- on the standby

alter database datafile n offline drop;
mv xxxx.dbf xxxx.dbf.bak

-- on the primary
scp /xxx/xxxx/xxxx.dbf oracle@x.x.x.x:/xxx/xxx/xxx.dbf

-- on the standby

alter database datafile n online;

alter database recover managed standby database disconnect from session;

The operation itself is fine. The problem was that once these 3 files had been dealt with, recovery started failing on other datafiles. Argh.

The database totals 2.2 TB, 80 datafiles of 30 GB each. There was no way I was going to scp the entire database over.

So what now?


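It's actually simple. Some time ago I wrote about using RMAN incremental backups to repair a Data Guard environment with a gap caused by missing archived logs. A similar method works here; these are my basic steps: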

--- find the SCN up to which the standby has applied redo
sql> col first_change# for 9999999999999999999
sql> col next_change# for 9999999999999999
sql> /

 sequence# applied          first_change#      next_change#
---------- --------- -------------------- -----------------
      6141 yes             12331596661580    12331596717210
      6142 yes             12331596717210    12331596758421
      6143 yes             12331596758421    12331596805008
      6144 yes             12331596805008    12331596838849
      6145 yes             12331596838849    12331596901470
      6146 yes             12331596901470    12331596958127
      6147 no              12331596958127    12331597090365
      6148 no              12331597090365    12331597133130
      6149 no              12331597133130    12331597176234
      6150 no              12331597176234    12331597220783
      6151 no              12331597220783    12331597276144
..... (part of the output omitted)
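The SCN used for the incremental backup is the first_change# of the first sequence not yet applied (6147), which is also the next_change# of the last applied one (6146) and one below the "recovered data files to a consistent state at change 12331596958128" line in the alert log. The sketch below picks that value out of a listing shaped like the one above and converts it to the wrap.base notation dbv prints: 12331596958127 is 2871.745851311, just above the "highest block scn" values dbv reported (2871.745850569 and 2871.745794635). The helper names are hypothetical, and the input is assumed to be whitespace-separated columns.

```shell
# Hypothetical helper: read the v$archived_log listing (sequence#, applied,
# first_change#, next_change#) on stdin and print the first_change# of the
# first row whose applied column is "no".
first_unapplied_scn() {
  awk 'tolower($2) == "no" { print $3; exit }'
}

# Hypothetical helper: split a decimal SCN into the wrap.base form dbv prints
# (wrap = bits above 32, base = low 32 bits).
scn_wrap_base() {
  local scn=$1
  echo "$(( scn >> 32 )).$(( scn & 0xFFFFFFFF ))"
}

printf '%s\n' '6146 yes 12331596901470 12331596958127' \
              '6147 no 12331596958127 12331597090365' | first_unapplied_scn
# 12331596958127

scn_wrap_base 12331596958127
# 2871.745851311
```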

--- on the primary: take an incremental backup starting from that SCN

rman target / << EOF
run
{
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup as compressed backupset incremental from scn 12331596958127 database format '/oraclenew/datadir3/rmanback/db_incr_%d_%T_%U.bak'
include current controlfile for standby filesperset=5 tag 'forstandby0527';
release channel d1;
release channel d2;
release channel d3;
release channel d4;
}

exit
EOF
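To reuse this step with a different start SCN, the run block can be generated from a variable and reviewed before it is fed to rman. This is a minimal sketch: the function name, the default directory and the default tag are assumptions, and it allocates two channels instead of four for brevity (channels allocated inside a run block are released automatically when the block ends).

```shell
# Hypothetical helper: print an RMAN run block for an incremental backup from a
# given SCN, mirroring the script above. Defaults for the directory and tag are
# assumptions; two channels are used instead of four to keep it short.
incr_backup_script() {
  local scn=$1
  local dir=${2:-/oraclenew/datadir3/rmanback}
  local tag=${3:-forstandby}
  cat <<EOF
run
{
allocate channel d1 type disk;
allocate channel d2 type disk;
backup as compressed backupset incremental from scn $scn database
format '$dir/db_incr_%d_%T_%U.bak'
include current controlfile for standby filesperset=5 tag '$tag';
}
exit
EOF
}

# Review the generated script first, then pipe it to RMAN on the primary:
#   incr_backup_script 12331596958127 | rman target /
incr_backup_script 12331596958127
```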

---- scp the backup pieces from the primary to the standby, then register them with catalog start with
rman> catalog start with '/oraclenew/datadir3/temp/';

using target database control file instead of recovery catalog
searching for all files that match the pattern /oraclenew/datadir3/temp/

list of files unknown to the database
=====================================
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0sp9btk8_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0kp9botr_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0op9brdj_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0mp9bqlg_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0up9butr_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_10p9c01g_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_11p9c37k_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0lp9bqhs_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0gp9bmtn_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0jp9boid_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0ip9bmtn_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0tp9bul8_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0pp9bsg4_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0rp9btan_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0qp9bsul_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0np9br09_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0fp9bmtn_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0vp9bvp7_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0hp9bmtn_1_1.bak

do you really want to catalog the above files (enter yes or no)? yes
cataloging files...
cataloging done

list of cataloged files
=======================
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0sp9btk8_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0kp9botr_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0op9brdj_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0mp9bqlg_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0up9butr_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_10p9c01g_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_11p9c37k_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0lp9bqhs_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0gp9bmtn_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0jp9boid_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0ip9bmtn_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0tp9bul8_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0pp9bsg4_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0rp9btan_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0qp9bsul_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0np9br09_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0fp9bmtn_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0vp9bvp7_1_1.bak
file name: /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0hp9bmtn_1_1.bak

--- recover the standby
rman> recover database noredo;

starting recover at 28-may-14
allocated channel: ora_disk_1
channel ora_disk_1: sid=1261 device type=disk
channel ora_disk_1: starting incremental datafile backup set restore
channel ora_disk_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /u01/app/oracle/oradata/crjnew/datafile/o1_mf_system_859l1ovo_.dbf
destination for restore of datafile 00015: /u01/app/oracle/oradata/crjnew/datafile/crj_data08.dbf
destination for restore of datafile 00016: /u01/app/oracle/oradata/crjnew/datafile/crj_data09.dbf
destination for restore of datafile 00060: /oraclenew/datadir1/crj_data50.dbf
destination for restore of datafile 00062: /oraclenew/datadir1/crj_data51.dbf
channel ora_disk_1: reading from backup piece /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0op9brdj_1_1.bak
channel ora_disk_1: piece handle=/oraclenew/datadir3/temp/db_incr_crjnew_20140527_0op9brdj_1_1.bak tag=forstandby0527
channel ora_disk_1: restored backup piece 1
channel ora_disk_1: restore complete, elapsed time: 00:00:35
channel ora_disk_1: starting incremental datafile backup set restore
channel ora_disk_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00002: /u01/app/oracle/oradata/crjnew/datafile/o1_mf_sysaux_859l29lq_.dbf
destination for restore of datafile 00017: /u01/app/oracle/oradata/crjnew/datafile/crj_data10.dbf
destination for restore of datafile 00018: /u01/app/oracle/oradata/crjnew/datafile/crj_data11.dbf
destination for restore of datafile 00063: /oraclenew/datadir1/crj_data52.dbf
destination for restore of datafile 00064: /oraclenew/datadir1/crj_data53.dbf
channel ora_disk_1: reading from backup piece /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0pp9bsg4_1_1.bak
channel ora_disk_1: piece handle=/oraclenew/datadir3/temp/db_incr_crjnew_20140527_0pp9bsg4_1_1.bak tag=forstandby0527
channel ora_disk_1: restored backup piece 1
channel ora_disk_1: restore complete, elapsed time: 00:00:35
channel ora_disk_1: starting incremental datafile backup set restore
channel ora_disk_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00004: /u01/app/oracle/oradata/crjnew/datafile/o1_mf_users_859l57gz_.dbf
destination for restore of datafile 00006: /u01/app/oracle/oradata/crjnew/datafile/dzzj_index01.dbf
destination for restore of datafile 00008: /u01/app/oracle/oradata/crjnew/datafile/crj_data01.dbf
destination for restore of datafile 00054: /oraclenew/datadir1/dzzj_data02.dbf
destination for restore of datafile 00076: /oraclenew/datadir3/crjnew_bin01.dbf
channel ora_disk_1: reading from backup piece /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0up9butr_1_1.bak

...... (part of the output omitted)
channel ora_disk_1: reading from backup piece /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0hp9bmtn_1_1.bak
channel ora_disk_1: piece handle=/oraclenew/datadir3/temp/db_incr_crjnew_20140527_0hp9bmtn_1_1.bak tag=forstandby0527
channel ora_disk_1: restored backup piece 1
channel ora_disk_1: restore complete, elapsed time: 00:00:45
channel ora_disk_1: starting incremental datafile backup set restore
channel ora_disk_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00034: /u01/app/oracle/oradata/crjnew/datafile/crj_data27.dbf
destination for restore of datafile 00035: /u01/app/oracle/oradata/crjnew/datafile/crj_data28.dbf
destination for restore of datafile 00056: /oraclenew/datadir1/dzzj_index02.dbf
destination for restore of datafile 00061: /oraclenew/datadir1/zzsb_data01.dbf
channel ora_disk_1: reading from backup piece /oraclenew/datadir3/temp/db_incr_crjnew_20140527_0qp9bsul_1_1.bak
channel ora_disk_1: piece handle=/oraclenew/datadir3/temp/db_incr_crjnew_20140527_0qp9bsul_1_1.bak tag=forstandby0527
channel ora_disk_1: restored backup piece 1
channel ora_disk_1: restore complete, elapsed time: 00:02:25

finished recover at 28-may-14


If you look at the alert log at this point (after restarting managed recovery), you'll see messages like these:
started logmerger process
wed may 28 14:30:22 2014
managed standby recovery not using real time apply
parallel media recovery started with 64 slaves
waiting for all non-current orls to be archived...
all non-current orls have been archived.
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6147_9nv894go_.arc
completed: alter database recover managed standby database disconnect from session
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6148_9nv88s4v_.arc
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6149_9nv88zkm_.arc
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6150_9nv894yk_.arc
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6151_9nv896bo_.arc
wed may 28 14:30:34 2014
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6152_9nv89fv0_.arc
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6153_9nv89g10_.arc
......
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_21/o1_mf_1_6208_9o9mnqhc_.arc
media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_21/o1_mf_1_6209_9obb1c7s_.arc
......

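You'll see that Oracle still walks through the archived logs covering the month-plus gap and skips past them. This is fast: it finished in under 10 minutes.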
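While MRP works through the gap, progress can be followed from the "media recovery log" lines in the alert log. A sketch, assuming OMF archived-log names of the form o1_mf_<thread>_<sequence>_..._.arc as in the excerpt above; the function name and the alert log path in the comment are hypothetical.

```shell
# Hypothetical helper: print the log sequence numbers from "media recovery log"
# lines of an alert log read on stdin.
# Assumption: OMF names like o1_mf_<thread>_<sequence>_<suffix>_.arc, as above.
applied_sequences() {
  grep -i 'media recovery log' |
    sed -nE 's/.*o1_mf_[0-9]+_([0-9]+)_.*/\1/p'
}

printf '%s\n' \
  'media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_15/o1_mf_1_6147_9nv894go_.arc' \
  'completed: alter database recover managed standby database disconnect from session' \
  'media recovery log /u01/app/oracle/fast_recovery_area/crjnew/archivelog/2014_04_21/o1_mf_1_6209_9obb1c7s_.arc' |
  applied_sequences
# 6147
# 6209

# Latest sequence reached so far (alert log path is an assumption):
#   applied_sequences < /u01/app/oracle/diag/rdbms/crjnew/crjnew/trace/alert_crjnew.log | tail -n 1
```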

With that, this case was closed.


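Note: starting with Oracle 11gR2 (11.2.0.2, to be precise), Active Data Guard introduced automatic block repair. However, the mechanism only works when certain conditions are met; the official documentation describes them as follows: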
If a corrupt data block is discovered on a primary database, then:
A physical standby database operating in real-time query mode can be used to repair corrupt data blocks in a primary database. If possible, any corrupt data block encountered when a primary database is accessed will be automatically replaced with an uncorrupted copy of that block from a physical standby database operating in real-time query mode. An ORA-1578 error is returned when automatic repair is not possible.

If a corrupt data block is discovered on a physical standby database, then:
The server attempts to automatically repair the corruption by obtaining a copy of the block from the primary database if the following database initialization parameters are configured on the standby database:

- Configure the LOG_ARCHIVE_CONFIG parameter with a DG_CONFIG list
- Configure a LOG_ARCHIVE_DEST_n parameter for the primary database

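In practice there may be further special cases; in any event, this customer was not running the standby in real-time query mode, so automatic block repair could not help here.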
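The two standby-side prerequisites quoted above are plain initialization parameters, so a text pfile (for example one produced by create pfile from spfile) can be checked for them with a short filter. This is only a rough sketch: the function name and the DB_UNIQUE_NAME values crjnew/crjpri in the example are hypothetical, and "a destination for the primary" is approximated as any LOG_ARCHIVE_DEST_n that has a SERVICE= attribute.

```shell
# Hypothetical helper: check a text pfile on stdin for the two standby-side
# settings quoted above. Rough approximation: any LOG_ARCHIVE_DEST_n with a
# SERVICE= attribute is taken as "a destination for the primary".
abr_prereqs() {
  awk '
    { line = tolower($0) }
    line ~ /log_archive_config *=/ && line ~ /dg_config/      { cfg = 1 }
    line ~ /log_archive_dest_[0-9]+ *=/ && line ~ /service *=/ { dest = 1 }
    END {
      if (cfg && dest) { print "ok"; exit }
      if (!cfg)  print "missing: log_archive_config with dg_config"
      if (!dest) print "missing: log_archive_dest_n with service="
    }
  '
}

# Example with hypothetical DB_UNIQUE_NAMEs crjnew (standby) and crjpri (primary).
printf '%s\n' \
  "*.log_archive_config='dg_config=(crjnew,crjpri)'" \
  "*.log_archive_dest_2='service=crjpri db_unique_name=crjpri'" | abr_prereqs
# ok
```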