allaboutoracleblockcorruption
ORA-01578错误是Oracle中常见的物理坏块讹误(Corruption)错误,从10g以后在拥有完整备份和归档日志的情况下可以通过blockrecover/recover命令在线恢复该坏块,前提是数据块所在磁道在物理上仍可用。 以下是一个在没有充分备份情况下的ORA-01578错误的解决,
ORA-01578错误是Oracle中常见的物理坏块讹误(Corruption)错误,从10g以后在拥有完整备份和归档日志的情况下可以通过blockrecover/recover命令在线恢复该坏块,前提是数据块所在磁道在物理上仍可用。
以下是一个在没有充分备份情况下的ORA-01578错误的解决,前提是能够容忍坏块所在数据的丢失:
SQL> exec DBMS_STATS.GATHER_DATABASE_STATS;
BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 870212)
ORA-01110: data file 4:
'/s01/oradata/G10R25/datafile/o1_mf_users_7ch7d4nx_.dbf'
ORA-06512: at "SYS.DBMS_STATS", line 15188
ORA-06512: at "SYS.DBMS_STATS", line 15530
ORA-06512: at "SYS.DBMS_STATS", line 15674
ORA-06512: at "SYS.DBMS_STATS", line 15638
ORA-06512: at line 1
使用RMAN blockreocver命令试图修改该物理坏块:
RMAN> blockrecover datafile 4 block 870212;
Starting blockrecover at 08-NOV-12
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/s01/flash_recovery_area/G10R25/backupset/2012_08_06/o1_mf_nnndf_TAG20120806T075500_81zd4njn_.bkp tag=TAG20120806T075500
channel ORA_DISK_1: block restore complete, elapsed time: 00:01:16
starting media recovery
archive log thread 1 sequence 467 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_10_31/o1_mf_1_467_893571cm_.arc
archive log thread 1 sequence 468 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_10_31/o1_mf_1_468_893pc84l_.arc
archive log thread 1 sequence 469 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_11_01/o1_mf_1_469_894zsbym_.arc
archive log thread 1 sequence 470 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_11_01/o1_mf_1_470_896b944y_.arc
4_.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at 11/08/2012 06:19:40
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 466 lowscn 27762151 found to restore
RMAN-06025: no backup of log thread 1 seq 465 lowscn 27762145 found to restore
RMAN-06025: no backup of log thread 1 seq 464 lowscn 27762142 found to restore
由于缺少必要的归档日志导致blockrecover无法成功,需要另想办法。
首先确认该数据块属于哪个SEGMENT,如果是INDEX那么完全可以重建也不会丢失数据,但是如果是表数据则需要容忍丢失该坏块内的数据:
SQL> col tablespace_name for a20
SQL> col segment_type for a10
SQL> col segment_name for a20
SQL> col owner for a8
SQL> SELECT tablespace_name, segment_type, owner, segment_name
2 FROM dba_extents
3 WHERE file_id = &fileid
4 and &blockid between block_id AND block_id + blocks - 1;
Enter value for fileid: 4
old 3: WHERE file_id = &fileid
new 3: WHERE file_id = 4
Enter value for blockid: 870212
old 4: and &blockid between block_id AND block_id + blocks - 1
new 4: and 870212 between block_id AND block_id + blocks - 1
TABLESPACE_NAME SEGMENT_TY OWNER SEGMENT_NAME
-------------------- ---------- -------- --------------------
USERS TABLE SYS CORRUPT_ME
SQL> select count(*) from CORRUPT_ME;
select count(*) from CORRUPT_ME
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 870212)
ORA-01110: data file 4:
'/s01/oradata/G10R25/datafile/o1_mf_users_7ch7d4nx_.dbf'
SQL> analyze table corrupt_me validate structure;
analyze table corrupt_me validate structure
*
ERROR at line 1:
ORA-01498: block check failure - see trace file
SQL> oradebug setmypid
Statement processed.
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_19749.trc
Corrupt block relative dba: 0x010d4744 (file 4, block 870212)
Bad header found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x000d4744
last change scn: 0x0000.00000000 seq: 0xff flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x000006ff
check value in block header: 0x6323
computed block checksum: 0x0
Reread of rdba: 0x010d4744 (file 4, block 870212) found same corrupted data
*** 2012-11-08 06:23:12.564
table scan: segment: file# 4 block# 870211
skipping corrupt block file# 4 block# 870212
*** 2012-11-08 06:23:36.955
table scan: segment: file# 4 block# 870211
skipping corrupt block file# 4 block# 870212
skipping corrupted block at rdba: 0x010d4744
下面使用10231 level 10事件来避免发生ORA-01578错误,并将原坏块表复制出来:
SQL> alter session set events '10231 trace name context forever,level 10';
Session altered.
SQL> select count(*) from CORRUPT_ME;
COUNT(*)
----------
50857
SQL> create table corrupt_me_copy tablespace users as select * from CORRUPT_ME;
Table created.
SQL> analyze table corrupt_me_copy validate structure;
Table analyzed.
之后仅需要将新表rename为旧表,并重建索引即可:
SQL> alter table corrupt_me rename to corrupt_me_copy1;
Table altered.
SQL> alter table corrupt_me_copy rename to corrupt_me;
Table altered.
SQL> rebuild indexs
利用RMAN检测数据库坏块的脚本
============================
虽然我们也可以通过dbv(db file verify)工具做到对单个数据文件的坏块检测,但是直接使用RMAN的”backup validate check logical database;”结合V$DATABASE_BLOCK_CORRUPTION视图要方便地多。
Script:
1) $ rman target / nocatalog
2) RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}
3) select * from V$DATABASE_BLOCK_CORRUPTION ;
4) If V$DATABASE_BLOCK_CORRUPTION contains rows please run this query to
find the objects that contains the corrupted blocks:
SELECT e.owner,
e.segment_type,
e.segment_name,
e.partition_name,
c.file#,
greatest(e.block_id, c.block#) corr_start_block#,
least(e.block_id + e.blocks - 1, c.block# + c.blocks - 1) corr_end_block#,
least(e.block_id + e.blocks - 1, c.block# + c.blocks - 1) -
greatest(e.block_id, c.block#) + 1 blocks_corrupted,
null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner,
s.segment_type,
s.segment_name,
s.partition_name,
c.file#,
header_block corr_start_block#,
header_block corr_end_block#,
1 blocks_corrupted,
'Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner,
null segment_type,
null segment_name,
null partition_name,
c.file#,
greatest(f.block_id, c.block#) corr_start_block#,
least(f.block_id + f.blocks - 1, c.block# + c.blocks - 1) corr_end_block#,
least(f.block_id + f.blocks - 1, c.block# + c.blocks - 1) -
greatest(f.block_id, c.block#) + 1 blocks_corrupted,
'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id
AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#;
SELECT tablespace_name, segment_type, owner, segment_name
FROM dba_extents
WHERE file_id = &fileid
and &blockid between block_id AND block_id + blocks - 1;
http://docs.oracle.com/cd/B28359_01/backup.111/b28270/rcmvalid.htm
Basic Concepts of RMAN Validation
The database prevents operations that result in unusable backup files or corrupted restored datafiles. The database automatically does the following:
Blocks access to datafiles while they are being restored or recovered
Permits only one restore operation for each datafile at a time
Ensures that incremental backups are applied in the correct order
Stores information in backup files to allow detection of corruption
Checks a block every time it is read or written in an attempt to report a corruption as soon as it has been detected
Checksums and Corrupt Blocks
A corrupt block is a block that has been changed so that it differs from what Oracle Database expects to find. Block corruptions can be caused by a number of different failures including, but not limited to the following:
Faulty disks and disk controllers
Faulty memory
Oracle Database software defects
DB_BLOCK_CHECKSUM is a database initialization parameter that controls the writing of checksums for the blocks in datafiles and online redo log files in the database (not backups). If DB_BLOCK_CHECKSUM is typical, then the database computes a checksum for each block during normal operations and stores it in the header of the block before writing it to disk. When the database reads the block from disk later, it recomputes the checksum and compares it to the stored value. If the values do not match, then the block is corrupt.
By default, the BACKUP command computes a checksum for each block and stores it in the backup. The BACKUP command ignores the values of DB_BLOCK_CHECKSUM because this initialization parameter applies to datafiles in
the database, not backups.
Physical and Logical Block Corruption
In a physical corruption, which is also called a media corruption, the database does not recognize the block at all: the checksum is invalid, the block contains all zeros, or the header and footer of the block do not match.
Note:
By default, the BACKUP command computes a checksum for each block and stores it in the backup. If you specify the NOCHECKSUM option, then RMAN does not perform a checksum of the blocks when creating the backup.
In a logical corruption, the contents of the block are logically inconsistent. Examples of logical corruption include corruption of a row piece or index entry. If RMAN detects logical corruption, then it logs the block in the alert log and server session trace file.
By default, RMAN does not check for logical corruption. If you specify CHECK LOGICAL on the BACKUP command, however, then RMAN tests data and index blocks for logical corruption, such as corruption of a row piece or index entry, and log them in the alert log located in the Automatic Diagnostic Repository (ADR). If you use RMAN with the following configuration when backing up or restoring files, then it detects all types of block corruption that are possible to detect:
In the initialization parameter file of a database, set DB_BLOCK_CHECKSUM=typical so that the database calculates datafile checksums automatically (not for backups, but for datafiles in use by the database)
Do not precede the BACKUP or RESTORE command with SET MAXCORRUPT so that RMAN does not tolerate any block corruptions
In a BACKUP command, do not specify the NOCHECKSUM option so that RMAN calculates a checksum when writing backups
In BACKUP and RESTORE commands, specify the CHECK LOGICAL option so that RMAN checks for logical as well as physical corruption