欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

ORA-00349|激活 ADG 备库时遇到的问题及处理方法

程序员文章站 2022-03-05 17:08:27
近日有一套实时同步的 asm 管理的单机 11204 adg 备库,由于业务需要,想要脱离主库的约束,想激活拉成读写库直接升级成 asm 管理的 19c,闪回快照模式无法满足要求,只能 alter d...

近日有一套实时同步的 asm 管理的单机 11204 adg 备库,由于业务需要,想要脱离主库的约束,想激活拉成读写库直接升级成 asm 管理的 19c,闪回快照模式无法满足要求,只能 alter database activate standby database 强制切成可读写的主库。说干就干,先将其切成主库,升级过程等下次在一起讨论。

--主库
--主库设置为 defer, 取消备库日志应用,关库启动到 mount 状态进行。
show parameter log_archive_dest_state_2
alter system set log_archive_dest_state_2=defer scope=both sid='*';
 
--备库
alter database recover managed standby database cancel;
shu immediate
startup mount
 
--强制拉成主库,很遗憾报错 ora-00349
sql> alter database activate standby database;
alter database activate standby database
*
error at line 1:
ora-00349: failure obtaining block size for '+jieke_data'
ora-15001: diskgroup "jieke_data" does not exist or is not mounted
ora-15001: diskgroup "jieke_data" does not exist or is not mounted

使用 activate 命令想强制拉成主库,很遗憾如下图报错 ora-00349。alert 日志中发现有很多清理 redo log 的报错,“ora-00313: open failed…”无法打开日志组 5、6、23,于是查看日志组成员确实发现 redolog 创建的有问题,member 成员显示的为不存在的磁盘组 “+jieke_data” 而不是具体路径,真是存在的磁盘组“+jieker_data”。这就是问题所在,redolog 创建错误,切成主库时 redolog 又是必须的,故报错了,那么现在就是将这个错误的 redolog 重建,问题就会得到解决。但实际上不是这样的,折腾了好久也没解决,继续往下看。

ORA-00349|激活 ADG 备库时遇到的问题及处理方法

group# member
---------- ---------------------------------------------------------------------------------------------------
  5 +jieke_data
  5 +jieke_data
  6 +jieke_data
  6 +jieke_data
 23 +jieke_data
 23 +jieke_data
 11 +jieker_data/jiekexu/onlinelog/group_11.1621.1065127343
 11 +jieker_arch/jiekexu/onlinelog/group_11.389.1065127355
 12 +jieker_data/jiekexu/onlinelog/group_12.1620.1065127363
 12 +jieker_arch/jiekexu/onlinelog/group_12.395.1065127371
 13 +jieker_data/jiekexu/onlinelog/group_13.1619.1065127381
 
 select to_char(f.group#) groupno, to_char(l.thread#) thread, f.member member, f.type redo_file_type, l.status group_status, f.status member_status, l.bytes/1024/1024 bytes, l.archived archived 
from v$logfile f, v$log l where f.group# = l.grsql> oup# order by f.group#, f.member;
sql> sql> sql> sql> sql> sql> sql> sql> sql> 2 
group thread member  redo type group status member status size(m) archived
------ ------ -------------------- ---------- ------------ --------------- -------- ----------
5 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
6 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
23 1 +jieke_data  online clearing_cur invalid  4,096 yes
      rent
 
 1 +jieke_data  online clearing_cur invalid  4,096 yes
      rent
 
27 2 +jieker_data/jiekexu/ online unused    4,096 yes
  onlinelog/group_27.1
  741.1065129955
 
28 2 +jieker_data/jiekexu/ online unused    4,096 yes
  onlinelog/group_28.1
  742.1065129973

alert 日志如下:

errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 5 of thread 1
clearing online redo logfile 5 +jieke_data
clearing online log 5 of thread 1 sequence number 4751
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 5 of thread 1
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 5 of thread 1
clearing online redo logfile 5 complete
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 6 of thread 2
clearing online redo logfile 6 +jieke_data
clearing online log 6 of thread 2 sequence number 2592
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 6 of thread 2
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 6 of thread 2
clearing online redo logfile 6 complete
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 23 of thread 1
clearing online redo logfile 23 +jieke_data
clearing online log 23 of thread 1 sequence number 4752
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 23 of thread 1
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 23 of thread 1
clearing online redo logfile 23 complete
resetting resetlogs activation id 2008461997 (0x77b6b2ad)
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 5 of thread 1
errors in file /app/oracle/diag/rdbms/jiekexu/jiekexu/trace/jiekexu_ora_30872.trc:
ora-00313: open failed for members of log group 5 of thread 1
ora-349 signalled during: alter database activate physical standby database...
tue mar 02 22:42:30 2021
alter database drop logfile group 23
ora-1623 signalled during: alter database drop logfile group 23...
tue mar 02 22:45:07 2021
rfs[7]: assigned to rfs process 10180
rfs[7]: opened log for thread 2 sequence 2592 dbid 1797812601 branch 1063804222
archived log entry 1100 added for thread 2 sequence 2592 rlc 1063804222 id 0x77b6b2ad dest 2:
tue mar 02 22:45:08 2021
rfs[8]: assigned to rfs process 10277
rfs[8]: selected log 17 for thread 2 sequence 2593 dbid 1797812601 branch 1063804222
tue mar 02 22:45:08 2021
primary database is in maximum performance mode

重建备库 redolog

那么,这里将上演的是重建备库 redolog 的相关操作步骤。

sql> alter database drop logfile group 27;
alter database drop logfile group 27
*
error at line 1:
ora-01156: recovery or flashback in progress may need access to files
--由于开启了日志应用进程,直接删除会报错,故需要停止日志应用,修改参数 standby_file_management 为手动。
sql> alter database recover managed standby database cancel;
 
database altered.
sql> alter system set standby_file_management='manual' scope=both sid='*';
 
database altered.
sql> show parameter standby_file_management
 
name     type   value
------------------------------------ ---------------------- ------------------------------
standby_file_management  string   manual
sql> alter database drop logfile group 27;
 
database altered.
 
--由于日志组不能少于 2 个,故 日志组 28 不不能够删除。
sql> alter database drop logfile group 28;
alter database drop logfile group 28
*
error at line 1:
ora-01567: dropping log 28 would leave less than 2 log files for instance jiekexu2 (thread 2)
ora-00312: online log 28 thread 2: '+jieker_data/jiekexu/onlinelog/group_28.1742.1065129973'

下面则通过继续应用日志、重启、主库切日志、重命名等各种手段继续尝试删除这三个有问题的日志组。

--重启备库删除日志组 6、23,由于是当前日志组无法删除,庆幸日志组 5 成功删除了。
sql> alter database drop logfile group 6;
alter database drop logfile group 6
*
error at line 1:
ora-01623: log 6 is current log for instance jiekexu2 (thread 2) - cannot drop
ora-00312: online log 6 thread 2: '+jieke_data'
ora-00312: online log 6 thread 2: '+jieke_data'
 
sql> alter database drop logfile group 5;
 
database altered.
 
sql> alter database drop logfile group 23;
alter database drop logfile group 23
*
error at line 1:
ora-01623: log 23 is current log for instance jiekexu (thread 1) - cannot drop
ora-00312: online log 23 thread 1: '+jieke_data'
ora-00312: online log 23 thread 1: '+jieke_data'
 
--只剩两组日志组也是当前日志组,则当前日志组无法删除,尝试进行 rename 操作,但也是无效或者缺失命令。
 
sql> alter database rename '+jieke_data' to '+jieker_data';
alter database rename '+jieke_data' to '+jieker_data'
   *
error at line 1:
ora-02231: missing or invalid option to alter database
 
sql> alter database rename '+jieke_data' to '+jieker_data/jiekexu/onlinelog/group_6.dbf';
alter database rename '+jieke_data' to '+jieker_data/jiekexu/onlinelog/group_6.dbf'
   *
error at line 1:
ora-02231: missing or invalid option to alter database
 
--当然继续激活为主库肯定也是报错。那就继续开启日志同步模式,先保持备库同步吧。
sql> alter database activate standby database;
alter database activate standby database
*
error at line 1:
ora-00349: failure obtaining block size for '+jieke_data'
ora-15001: diskgroup "jieke_data" does not exist or is not mounted
ora-15001: diskgroup "jieke_data" does not exist or is not mounted
 
sql> alter database recover managed standby database using current logfile disconnect from session;
 
database altered.

重建备库控制文件解决

第二日早晨,睡醒之后头脑清晰想到备库既然无法删除,那主库肯定是可以删除的。通过主库删除日志组 6、23 之后,再重建一个备库的控制文件就可以解决。这样也很简单,主库删除备库有问题的两组日志组后使用 rman 备份一个备库的 控制文件,然后 scp 到备库,备库重启到 nomount 恢复控制文件,启动到 mount 就好了。

--主库:
sql> alter database drop logfile group 6;
database altered.
sql> alter database drop logfile group 23;
database altered.
 
rman target /
rman> backup current controlfile for standby format '/home/oracle/backup20210303%d_%i_%s_%p.ctl';
scp /home/oracle/backup20210303%d_%i_%s_%p.ctl jiekeadg:/home/oracle/
 
--备库
sql> alter database recover managed standby database cancel;
database altered.
 
sql> shu immediate 
ora-01109: database not open
database dismounted.
oracle instance shut down.
sql> startup nomount 
oracle instance started.
total system global area 1.0689e+11 bytes
fixed size   2265864 bytes
variable size  4.2144e+10 bytes
database buffers  6.4425e+10 bytes
redo buffers  323678208 bytes
sql> exit
 
jiekeadg:/home/oracle(jiekexu)>rman target /
recovery manager: release 11.2.0.4.0 - production on wed mar 3 10:07:34 2021
copyright (c) 1982, 2011, oracle and/or its affiliates. all rights reserved.
connected to target database: jiekexu (not mounted)
rman> restore standby controlfile from '/home/oracle/backup20210303jiekexu_1797812601_106_1.ctl';
starting restore at 2021-03-03 10:08:03
using target database control file instead of recovery catalog
allocated channel: ora_disk_1
channel ora_disk_1: sid=3009 device type=disk
channel ora_disk_1: restoring control file
channel ora_disk_1: restore complete, elapsed time: 00:00:01
output file name=+jieker_data/jiekexu/controlfile/current.1739.1065125909
output file name=+jieker_arch/jiekexu/controlfile/current.323.1065125911
finished restore at 2021-03-03 10:08:04
rman> sql'alter database mount';
sql statement: alter database mount
released channel: ora_disk_1
rman> exit
 
-- sqlplus 里应用 mrp0 进程同步数据。
jiekeadg:/home/oracle(jiekexu)>sqlplus / as sysdba
sql> alter database recover managed standby database using current logfile disconnect from session;
 
database altered.
sql> /
 
name  value  unit    time_computed
------------- -------------------- ------------------------------ ------------------------------
transport lag +00 00:00:00  day(2) to second(0) interval 03/03/2021 10:19:08
apply lag +00 00:00:00  day(2) to second(0) interval 03/03/2021 10:19:08

重建控制文件后恢复的备库中就没有了错误的磁盘组,但这样时间长一些则备库日志应用会延迟,因为 standby_log 日志状态全部为 unassigned 的。需要重建备库 standby redolog 日志组后它的状态才会变成 active。
取消日志应用,修改参数为 manual 删除原有日志组,重建日志组改回参数应用日志同步进程即可。

select to_char(f.group#) groupno, to_char(l.thread#) thread, f.member member, f.type redo_file_type, l.status group_status, f.status member_status, l.bytes/1024/1024 bytes, l.archived archived 
from v$logfile f, v$log l where f.group# = l.group# order by f.group#, f.member;sql> 
sql> sql> sql> sql> sql> sql> sql> sql> sql> 2 
group thread member  redo type group status member status size(m) archived
------ ------ -------------------- ---------- ------------ --------------- -------- ----------
1 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
2 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
3 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
4 1 +jieke_data  online current invalid  4,096 yes
 1 +jieke_data  online current invalid  4,096 yes
5 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
7 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
8 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
9 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
10 2 +jieke_data  online current invalid  4,096 yes
 2 +jieke_data  online current invalid  4,096 yes
24 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
 
sql> alter database recover managed standby database cancel;
database altered.
sql> show parameter standby_file_management
name     type   value
------------------------------------ ---------------------- ------------------------------standby_file_management  string   manual
sql> 
sql> select inst_id,group#,thread#,sequence#,used,archived,status from gv$standby_log; 
 inst_id group# thread# sequence# used archived status
---------- ---------- ---------- ---------- ---------- ---------- --------------------
  1  11  1  0  0 yes unassigned
  1  12  1  0  0 yes unassigned
  1  13  1  0  0 yes unassigned
  1  14  1  0  0 yes unassigned
  1  15  1  0  0 yes unassigned
 
alter database drop logfile group 11;
alter database drop logfile group 12; 
alter database drop logfile group 13;
alter database drop logfile group 14;
alter database drop logfile group 15;
alter database add standby logfile thread 1 ('+jieker_data','+jieker_arch') size 4g;
alter database add standby logfile thread 1 ('+jieker_data','+jieker_arch') size 4g;
alter database add standby logfile thread 1 ('+jieker_data','+jieker_arch') size 4g;
alter database add standby logfile thread 1 ('+jieker_data','+jieker_arch') size 4g;
alter database add standby logfile thread 1 ('+jieker_data','+jieker_arch') size 4g;
 
sql> select inst_id,group#,thread#,sequence#,used,archived,status from gv$standby_log; 
 inst_id group# thread# sequence# used archived status
---------- ---------- ---------- ---------- ---------- ---------- --------------------
  1  6  1 4797 3026358272 yes active
  1  11  1  0  0 yes unassigned
  1  12  1  0  0 yes unassigned
  1  13  1  0  0 yes unassigned
  1  14  1  0  0 yes unassigned
sql> alter system set standby_file_management='auto' scope=both sid='*';
sql> alter database open;
alter database recover managed standby database using current logfile disconnect from session;
database altered.

找到最终问题所在

完成重建后,本以为已经万事大吉了,但一开库应用日志却发现后台日志中所有 redolog 日志组都被清理了。这才意识到问题的根源所在,立马查看了路径相关的参数发现 db_create_online_log_dest 参数设置错误,导致创建出了错误的磁盘组

jiekeadg:/home/oracle(jiekexu)>sqlplus / as sysdba 
 
sql*plus: release 11.2.0.4.0 production on wed mar 3 10:52:04 2021
 
copyright (c) 1982, 2013, oracle. all rights reserved.
 
connected to:
oracle database 11g enterprise edition release 11.2.0.4.0 - 64bit production
with the partitioning, automatic storage management, olap, data mining
and real application testing options
 
sql> show parameter db_create_online_log_dest_ 
 
name     type value
------------------------------------ ----------- ------------------------------
db_create_online_log_dest_1  string +jieke_data
db_create_online_log_dest_2  string +jieke_data
db_create_online_log_dest_3  string
db_create_online_log_dest_4  string
db_create_online_log_dest_5  string
sql> alter system set db_create_online_log_dest_1='+jieker_data' scope=spfile;
 
system altered.
 
sql> alter system set db_create_online_log_dest_2='+jieker_data' scope=spfile;
 
system altered.

但是现在修改完参数所有的日志组成员也都出现在错误的磁盘组了,因为有了当前日志组 4、10 占用了,重建日志组也行不通,故只能再次重建备库控制文件了。以上主库备份控制文件传到备库恢复的过程再来一遍即可。

select to_char(f.group#) groupno, to_char(l.thread#) thread, f.member member, f.type redo_file_type, l.status group_status, f.status member_status, l.bytes/1024/1024 bytes, l.archived archived 
from v$logfile f, v$log l where f.group# = l.group# order by f.group#, f.member;sql> 
sql> sql> sql> sql> sql> sql> sql> sql> sql> 2 
group thread member  redo type group status member status size(m) archived
------ ------ -------------------- ---------- ------------ --------------- -------- ----------
1 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
2 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
3 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
4 1 +jieke_data  online current invalid  4,096 yes
 1 +jieke_data  online current invalid  4,096 yes
5 1 +jieke_data  online clearing invalid  4,096 yes
 1 +jieke_data  online clearing invalid  4,096 yes
7 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
8 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
9 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
10 2 +jieke_data  online current invalid  4,096 yes
 2 +jieke_data  online current invalid  4,096 yes
24 2 +jieke_data  online clearing invalid  4,096 yes
 2 +jieke_data  online clearing invalid  4,096 yes
 
20 rows selected.

最终激活备库

再一次重建后正常恢复同步,然后关闭实例启动到 mount 状态,激活 adg 备库,重启验证即可。

sql> shu immediate 
database closed.
database dismounted.
oracle instance shut down.
sql> 
sql> startup mount 
oracle instance started.
 
total system global area 1.0689e+11 bytes
fixed size   2265864 bytes
variable size  4.2144e+10 bytes
database buffers  6.4425e+10 bytes
redo buffers  323678208 bytes
database mounted.
sql> 
sql> alter database activate standby database;
 
database altered.
 
sql> alter database open;
 
database altered.
 
sql> select database_role from v$database;
 
database_role
--------------------------------
primary
 
sql> shu immediate 
database closed.
database dismounted.
oracle instance shut down.
sql> startup 
oracle instance started.
total system global area 1.0689e+11 bytes
fixed size   2265864 bytes
variable size  4.2144e+10 bytes
database buffers  6.4425e+10 bytes
redo buffers  323678208 bytes
database mounted.
database opened.
sql> exit

到此这篇关于ora-00349|激活 adg 备库时遇到的问题及处理方法的文章就介绍到这了,更多相关ora-00349内容请搜索以前的文章或继续浏览下面的相关文章希望大家以后多多支持!