Resolving HACMP volume groups stuck in non-concurrent mode in a RAC environment
2022-05-11 12:36:55
In a RAC environment, having the HACMP volume groups (VGs) in non-concurrent mode is about as bad as it gets, and that is exactly what happened on one production system.
Environment:
AIX 6.1, EMC CLARiiON storage, Oracle 10.2.0.5
Symptoms:
First, look at the state of each volume group.
data03vg
lsvg data03vg
VOLUME GROUP:       data03vg                 VG IDENTIFIER:   00f79d1100004c00000001386f00edfb
VG STATE:           active                   PP SIZE:         128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       5315 (680320 megabytes)
MAX LVs:            512                      FREE PPs:        711 (91008 megabytes)
LVs:                78                       USED PPs:        4604 (589312 megabytes)
OPEN LVs:           0                        QUORUM:          3 (Enabled)
TOTAL PVs:          5                        VG DESCRIPTORS:  5
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         5                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per VG:     130048
MAX PPs per PV:     2032                     MAX PVs:         64
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable
PV RESTRICTION:     none                     INFINITE RETRY:  no

data01vg
lsvg data01vg
VOLUME GROUP:       data01vg                 VG IDENTIFIER:   00f79d1100004c00000001386effcc48
VG STATE:           active                   PP SIZE:         128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       6378 (816384 megabytes)
MAX LVs:            512                      FREE PPs:        1146 (146688 megabytes)
LVs:                88                       USED PPs:        5232 (669696 megabytes)
OPEN LVs:           0                        QUORUM:          4 (Enabled)
TOTAL PVs:          6                        VG DESCRIPTORS:  6
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         6                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per VG:     130048
MAX PPs per PV:     2032                     MAX PVs:         64
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable
PV RESTRICTION:     none                     INFINITE RETRY:  no

data02vg
lsvg data02vg
VOLUME GROUP:       data02vg                 VG IDENTIFIER:   00f79d1100004c00000001386f007c90
VG STATE:           active                   PP SIZE:         128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       2126 (272128 megabytes)
MAX LVs:            512                      FREE PPs:        18 (2304 megabytes)
LVs:                39                       USED PPs:        2108 (269824 megabytes)
OPEN LVs:           0                        QUORUM:          2 (Enabled)
TOTAL PVs:          2                        VG DESCRIPTORS:  3
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         2                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per VG:     130048
MAX PPs per PV:     2032                     MAX PVs:         64
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable
PV RESTRICTION:     none                     INFINITE RETRY:  no

State of the PVs in each VG:
data03vg
lsvg -p data03vg
data03vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower11      active            1063        43          01..00..00..00..42
hdiskpower17      removed           1063        167         21..00..00..00..146
hdiskpower18      removed           1063        167         21..00..00..00..146
hdiskpower19      removed           1063        167         21..00..00..00..146
hdiskpower20      removed           1063        167         21..00..00..00..146

data01vg
lsvg -p data01vg
data01vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower7       active            1063        0           00..00..00..00..00
hdiskpower8       active            1063        20          00..00..00..00..20
hdiskpower9       active            1063        24          02..00..00..00..22
hdiskpower10      active            1063        0           00..00..00..00..00
hdiskpower16      missing           1063        551         21..00..105..212..213
hdiskpower21      missing           1063        551

data02vg
lsvg -p data02vg
data02vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower0       active            1063        0           00..00..00..00..00
hdiskpower24      active            1063        18          00..00..00..00..18

Many of the disks are in either the missing or the removed state, and the database log reports:
Thu Mar 21 17:53:58 BEIST 2013
Errors in file /oracle/app/oracle/admin/ctsdb/bdump/ctsdb2_m000_19595456.trc:
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
On both nodes the gsclvmd subsystem was inoperative, so the only option appeared to be restarting HACMP to bring gsclvmd back up.
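The three checks above (VG mode, PV state, gsclvmd status) can be collected into small helpers, which is handy when the same diagnosis has to be repeated on both nodes. The following is only a sketch: the function names vg_mode, bad_pvs and subsys_status are hypothetical, and it assumes that the output of lssrc -s carries the status in the last column, as it does for an inoperative subsystem.

```shell
#!/bin/sh
# Sketch only: parse the text output of lsvg, lsvg -p and lssrc.
# The function names below are hypothetical, not part of AIX or HACMP.

# Read `lsvg <vg>` output on stdin and print the value that follows "VG Mode:",
# for example Concurrent or Non-Concurrent.
vg_mode() {
    awk '{
        for (i = 1; i < NF - 1; i++)
            if ($i == "VG" && $(i + 1) == "Mode:")
                print $(i + 2)
    }'
}

# Read `lsvg -p <vg>` output on stdin and print "name state" for every PV
# whose state is not "active". The first two lines (the "<vg>:" line and the
# PV_NAME header) are skipped.
bad_pvs() {
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1, $2 }'
}

# Read `lssrc -s <subsystem>` output on stdin and print the status column.
# Assumption: the status is the last field of each line after the header.
subsys_status() {
    awk 'NR > 1 && NF > 0 { print $NF }'
}
```

On a cluster node these would be used as lsvg data03vg | vg_mode, lsvg -p data03vg | bad_pvs and lssrc -s gsclvmd | subsys_status. For the system described here, the expected answers are Non-Concurrent, a list of removed or missing hdiskpower devices, and inoperative.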
Resolution:
1. Back up the database first, then shut it down:
Node 1:
su - oracle
srvctl stop listener -n ctscrm1
ps -ef | grep "LOCAL=NO" | grep -v grep | awk '{print $2}' | xargs kill -9
oracle> alter system switch logfile;
oracle> alter system checkpoint;
srvctl stop instance -d ctsdb -i ctsdb1
Node 2:
su - oracle
srvctl stop listener -n ctscrm2
ps -ef | grep "LOCAL=NO" | grep -v grep | awk '{print $2}' | xargs kill -9
oracle> alter system switch logfile;
oracle> alter system checkpoint;
srvctl stop instance -d ctsdb -i ctsdb2
Stop CRS on node 1 and node 2:
crsctl stop crs
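The kill pipeline in step 1 terminates every process whose command line contains LOCAL=NO, regardless of which instance it belongs to. A more guarded variant, sketched below, restricts the match to one instance and only prints the PIDs unless explicitly told to kill. The function names and the DO_KILL variable are hypothetical, and the sketch assumes that remote server processes show up in ps -ef as oracle<SID> (LOCAL=NO).

```shell
#!/bin/sh
# Sketch only: preview (and optionally kill) remote-session server processes
# of a single Oracle instance. Names here are hypothetical helpers.

# Read `ps -ef` output on stdin and print the PID (2nd column) of every line
# containing "oracle<SID> (LOCAL=NO)". The search string is assembled inside
# awk so that it never appears literally on the awk command line; otherwise
# the awk process could match itself, which is why the original pipeline
# needs `grep -v grep`.
remote_session_pids() {
    awk -v sid="$1" '
        BEGIN { pat = "oracle" sid " (LOCAL" "=NO)" }
        index($0, pat) > 0 { print $2 }
    '
}

# Dry run by default: only report what would be killed.
# Set DO_KILL=1 in the environment to actually send SIGKILL.
kill_remote_sessions() {
    for pid in $(ps -ef | remote_session_pids "$1"); do
        if [ "${DO_KILL:-0}" = "1" ]; then
            kill -9 "$pid"
        else
            echo "would kill $pid"
        fi
    done
}
```

Running kill_remote_sessions ctsdb1 as the oracle user lists the candidate PIDs on node 1. Running DO_KILL=1 kill_remote_sessions ctsdb1 then performs the kill, after which the log switch, checkpoint and srvctl stop instance follow as in the article.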
2. Restart HACMP
smit clstop