一次简单的Oracle恢复Case实战记录
程序员文章站
2022-05-28 20:08:16
发现问题
某网友的数据库由于坏盘了,并且存储掉电,导致数据库无法open了。单看其数据库alert log的错误来看,是非常之简单的,如下:
fri oct 2...
发现问题
某网友的数据库由于坏盘了,并且存储掉电,导致数据库无法open了。单看其数据库alert log的错误来看,是非常之简单的,如下:
fri oct 26 10:33:53 2018 recovery of online redo log: thread 1 group 3 seq 39 reading mem 0 mem# 0: /fs/fs/oradata/orcl/redo03.log block recovery stopped at eot rba 39.77.16 block recovery completed at rba 39.77.16, scn 0.1002048587 oracle instance orcl (pid = 8) - error 600 encountered while recovering transaction (9, 30) on object 9149. fri oct 26 10:33:53 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_smon_192644.trc: ora-00600: internal error code, arguments: [6856], [0], [43], [], [], [], [], [] fri oct 26 10:33:56 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_smon_192644.trc: ora-00600: internal error code, arguments: [4194], [33], [36], [], [], [], [], [] doing block recovery for file 2 block 713 block recovery from logseq 39, block 82 to scn 1002048595
对于这种错误,很明显,屏蔽回滚段即可,屏蔽之后可顺利打开数据库,不过后面很快又会crash掉,因此重建undo也就绕过这个问题了。
打开数据库之后,再去观察数据库,会发现alert log有不少的错误,如下所示:
fri oct 26 11:01:46 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc: ora-00600: internal error code, arguments: [17147], [0x110549070], [], [], [], [], [], [] fri oct 26 11:01:46 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_m001_373218.trc: ora-00600: internal error code, arguments: [kdddgb5], [196650], [0], [], [], [], [], [] ora-600 encountered when generating server alert smg-4120 fri oct 26 11:01:47 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc: ora-00600: internal error code, arguments: [kghalo4], [0x11047f6f0], [], [], [], [], [], [] ora-600 encountered when generating server alert smg-4121 fri oct 26 11:01:48 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc: ora-00600: internal error code, arguments: [kghalo4], [0x11047f6f0], [], [], [], [], [], [] ora-600 encountered when generating server alert smg-4121 fri oct 26 11:01:50 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_m001_373218.trc: ora-00600: internal error code, arguments: [kdddgb5], [196650], [0], [], [], [], [], [] fri oct 26 11:02:22 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc: ora-00600: internal error code, arguments: [17114], [0x110549070], [], [], [], [], [], [] fri oct 26 11:02:23 2018 errors in file /fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc: ora-00600: internal error code, arguments: [kebm_mmon_main_1], [39], [], [], [], [], [], [] ora-00039: error during periodic action ora-00600: internal error code, arguments: [17114], [0x110549070], [], [], [], [], [], [] fri oct 26 11:03:30 2018 restarting dead background process mmon
除此之外,由于之外alert log有坏块报错,因此对system进行了dbv检查,发现确实存在少量坏块,如下:
dbverify: release 10.2.0.4.0 - production on fri oct 26 10:37:20 2018 copyright (c) 1982, 2007, oracle. all rights reserved. dbverify - verification starting : file = system01.dbf dbv-00200: block, dba 4255202, already marked corrupt block checking: dba = 4258751, block type = ktb-managed data block data header at 0x11022a05c kdbchk: fsbo(596) wrong, (hsz 4178) page 64447 failed with check code 6129 block checking: dba = 4259386, block type = ktb-managed data block **** kdxcofbo = 208 != 24 ---- end index block validation page 65082 failed with check code 6401 block checking: dba = 4269609, block type = unlimited data segment header incorrect extent count in the extent map: 16777317 block checking: dba = 4269612, block type = ktb-managed data block **** kdxcofbo = 224 != 216 ---- end index block validation page 75308 failed with check code 6401 block checking: dba = 4269615, block type = ktb-managed data block **** actual rows locked by itl 2 = 1 != # in trans. header = 0 ---- end index block validation page 75311 failed with check code 6401 page 85271 is influx - most likely media corrupt corrupt block relative dba: 0x00414d17 (file 1, block 85271) fractured block found during dbv: data in bad block: type: 6 format: 2 rdba: 0x00414d17 last change scn: 0x0000.3afaf495 seq: 0x1 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xfe830601 check value in block header: 0x96c6 computed block checksum: 0x3c6b page 85383 is influx - most likely media corrupt corrupt block relative dba: 0x00414d87 (file 1, block 85383) fractured block found during dbv: data in bad block: type: 6 format: 2 rdba: 0x00414d87 last change scn: 0x0000.3b6b9d19 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x970f0601 check value in block header: 0xe825 computed block checksum: 0x3c6b dbverify - verification complete total pages examined : 640000 total pages processed (data) : 116312 total pages failing (data) : 1 total pages processed (index): 65914 total pages failing (index): 3 total pages processed (other): 64634 total pages processed (seg) : 0 total pages failing (seg) : 0 total pages empty : 393138 total pages marked corrupt : 3 total pages influx : 2 highest block scn : 1002028510 (0.1002028510)
这部分错误,其实处理起来也不困难,部分是业务表的index,但是其他的几乎都是awr相关基表,有2个坏块跟是system相关的基表和索引,分别是i_h_obj#_col#和com$ ,histgrm$。
对于业务索引,很简单,直接drop 重建即可,对于这个sys的index,可以通过设置38003 event进行drop重建。
对于基表com$,histgrm$,由于是非bootstrap$核心对象,其实也可以处理掉的。
处理方法
不过考虑到这种毕竟是存储掉电,undo异常的情况,还是重建库更稳妥一些。最后补充一点,这个库稍微有点奇葩的地方是全库1.2tb,其中有个表的lob自动980gb,重建数据库是相对较慢的。对于大表,且有lob自动,通常建议基于分片,否则会报ora-01555错误的,如下是常用的一个基于rowid的分片脚本,供大家参考:
set verify off undefine rowid_ranges undefine segment_name undefine owner set head off set pages 0 set trimspool on select 'where rowid between ''' || sys.dbms_rowid.rowid_create(1, d.oid, c.fid1, c.bid1, 0) || ''' and ''' || sys.dbms_rowid.rowid_create(1, d.oid, c.fid2, c.bid2, 9999) || '''' || ';' from (select distinct b.rn, first_value(a.fid) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) fid1, last_value(a.fid) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) fid2, first_value(decode(sign(range2 - range1), 1, a.bid + ((b.rn - a.range1) * a.chunks1), a.bid)) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) bid1, last_value(decode(sign(range2 - range1), 1, a.bid + ((b.rn - a.range1 + 1) * a.chunks1) - 1, (a.bid + a.blocks - 1))) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) bid2 from (select fid, bid, blocks, chunks1, trunc((sum2 - blocks + 1 - 0.1) / chunks1) range1, trunc((sum2 - 0.1) / chunks1) range2 from (select /*+ rule */ relative_fno fid, block_id bid, blocks, sum(blocks) over() sum1, trunc((sum(blocks) over()) / &&rowid_ranges) chunks1, sum(blocks) over(order by relative_fno, block_id) sum2 from dba_extents where segment_name = upper('&&segment_name') and owner = upper('&&owner')) where sum1 > &&rowid_ranges) a, (select rownum - 1 rn from dual connect by level <= &&rowid_ranges) b where b.rn between a.range1 and a.range2) c, (select max(data_object_id) oid from dba_objects where object_name = upper('&&segment_name') and owner = upper('&&owner') and data_object_id is not null) d /
总结
以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作具有一定的参考学习价值,如果有疑问大家可以留言交流,谢谢大家对的支持。
下一篇: Java 多线程(二)—— 线程的同步