11.2.0.3 Linux RAC error CRS-5018:(:CLSN00037:) Removed unused HAIP route

程序员文章站 2024-02-20 20:07:28

One node of an 11.2.0.3 RAC cluster behind a payment system suddenly failed to start.

1. Stop the cluster stack on the node before restarting it

[root@rac2 ~]$ crsctl stop crs -f

2. Start the cluster stack again

[root@rac2 ~]$ crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

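The stack started without any other errors (CRS-4123 only confirms that OHAS itself started), and at this point ASM appeared to be up: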

[grid@rac2 ~]$ ps -ef | grep smon

root 8472 1 1 14:14 ? 00:00:01 /u01/app/11.2/product/crs_1/bin/osysmond.bin

grid 9238 1 0 14:15 ? 00:00:00 asm_smon_+ASM2

grid 9500 6212 0 14:16 pts/5 00:00:00 grep smon

Connecting to ASM to check the disk group status then showed something odd: the login succeeded, but querying v$asm_disk failed.

[grid@rac2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Wed May 21 15:35:05 2014

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Real Application Clusters and Automatic Storage Management options

SQL> select status from v$asm_disk;

select status from v$asm_disk

*

ERROR at line 1:

ORA-01034: ORACLE not available

Process ID: 14465

Session ID: 2707 Serial number: 3

Checking the d.bin processes showed that crsd had not started.

[root@rac01 ~]# ps -ef | grep d.bin

root 4142 1 0 09:27 ? 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/ohasd.bin reboot

grid 4548 1 0 09:28 ? 00:00:00 /u01/app/11.2.0/grid/product/db_1/bin/mdnsd.bin

grid 4558 1 0 09:28 ? 00:00:01 /u01/app/11.2.0/grid/product/db_1/bin/gpnpd.bin

grid 4568 1 0 09:28 ? 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/gipcd.bin

root 4590 1 0 09:28 ? 00:00:10 /u01/app/11.2.0/grid/product/db_1/bin/osysmond.bin
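
The same check can be scripted. The following is a minimal sketch in Python and is not part of the original troubleshooting session: it takes `ps -ef` output as text and reports which Grid Infrastructure daemons are absent. The `EXPECTED_DAEMONS` list and the `missing_daemons` helper are assumptions made for this illustration (a typical set of 11.2 daemons) and should be adjusted to the stack being examined.

```python
import re

# Assumed set of daemons for a running 11.2 Grid Infrastructure stack;
# this list is illustrative and may differ on a given system.
EXPECTED_DAEMONS = [
    "ohasd.bin", "mdnsd.bin", "gpnpd.bin", "gipcd.bin",
    "ocssd.bin", "octssd.bin", "evmd.bin", "crsd.bin",
]


def missing_daemons(ps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in `ps -ef` output."""
    running = set()
    for line in ps_output.splitlines():
        # Collect the executable name of every /path/to/<name>.bin token.
        for name in re.findall(r"/([\w.]+\.bin)\b", line):
            running.add(name)
    return [d for d in expected if d not in running]


# Sample copied from the ps listing above.
sample = """\
root 4142 1 0 09:27 ? 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/ohasd.bin reboot
grid 4548 1 0 09:28 ? 00:00:00 /u01/app/11.2.0/grid/product/db_1/bin/mdnsd.bin
grid 4558 1 0 09:28 ? 00:00:01 /u01/app/11.2.0/grid/product/db_1/bin/gpnpd.bin
grid 4568 1 0 09:28 ? 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/gipcd.bin
root 4590 1 0 09:28 ? 00:00:10 /u01/app/11.2.0/grid/product/db_1/bin/osysmond.bin
"""

print(missing_daemons(sample))
```

For the listing above, the returned list includes `crsd.bin`, which matches the observation that crsd had not started.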

Checking again, the ASM instance processes were still present.

[grid@rac2 ~]$ ps -ef | grep smon

root 17377 1 2 14:31 ? 00:00:09 /u01/app/11.2/product/crs_1/bin/osysmond.bin

grid 21518 1 0 14:37 ? 00:00:00 asm_smon_+ASM2

grid 21615 18834 0 14:37 pts/5 00:00:00 grep smon

Launching asmcmd raised the same error.

[grid@rac2 ~]$ asmcmd

ORA-01034: ORACLE not available

Process ID: 21716

Session ID: 2707 Serial number: 1 (DBD ERROR: OCIStmtExecute/Describe)

ocrcheck could not complete normally either.

[root@rac2 bin]# ./ocrcheck

PROT-602: Failed to retrieve data from the cluster registry

PROC-26: Error while accessing the physical storage

ORA-15077: could not locate ASM instance serving a required diskgroup

After shutting ASM down, a manual startup of the ASM instance failed as well.

[grid@rac2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Wed May 21 14:33:37 2014

Copyright (c) 1982, 2011, Oracle. All rights reserved.

SQL> shutdown abort

ASM instance shutdown

SQL> startup

ORA-27103: internal error

Linux-x86_64 Error: 2: No such file or directory

Additional information: 1

Additional information: 25919497

Additional information: 2

A new pfile was created from the spfile of the +ASM1 instance and used to start +ASM2, which also failed.

SQL> startup nomount pfile='/tmp/init+asm2.ora';

ORA-24324: service handle not initialized

ORA-01041: internal error. hostdef extension doesn't exist

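Storage was the first suspect, so the storage status was checked. This RAC uses a multipath plus ASMLib architecture.

Querying the multipath status showed every path as active.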

[root@rac2 bin]# multipath -ll

360a9800044336b327a24446172587864 dm-5 NETAPP,LUN

[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:0 sdau 66:224 [active][ready]

\_ 3:0:1:0 sdq 65:0 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:0 sdaf 65:240 [active][ready]

\_ 3:0:0:0 sdb 8:16 [active][ready]

360a9800044336b327a24446172587862 dm-6 NETAPP,LUN

[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:1 sdav 66:240 [active][ready]

\_ 3:0:1:1 sdr 65:16 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:1 sdag 66:0 [active][ready]

\_ 3:0:0:1 sdc 8:32 [active][ready]

360a9800044336b327a2444617258786e dm-13 NETAPP,LUN

[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:8 sdbc 67:96 [active][ready]

\_ 3:0:1:8 sdy 65:128 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:8 sdan 66:112 [active][ready]

\_ 3:0:0:8 sdj 8:144 [active][ready]

360a9800044336b327a24446172587876 dm-11 NETAPP,LUN

[size=2.0G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:6 sdba 67:64 [active][ready]

\_ 3:0:1:6 sdw 65:96 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:6 sdal 66:80 [active][ready]

\_ 3:0:0:6 sdh 8:112 [active][ready]

360a9800044336b327a24446172587870 dm-9 NETAPP,LUN

[size=2.0G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:4 sday 67:32 [active][ready]

\_ 3:0:1:4 sdu 65:64 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:4 sdaj 66:48 [active][ready]

\_ 3:0:0:4 sdf 8:80 [active][ready]

360a9800044336b327a24446172587970 dm-1 NETAPP,LUN

\_ round-robin 0 [prio=50][active]

\_ 3:0:1:15 sdae 65:224 [active][ready]

\_ 4:0:1:15 sdbi 67:192 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:15 sdat 66:208 [active][ready]

\_ 3:0:0:15 sdp 8:240 [active][ready]

360a9800044336b327a24446172587a55 dm-3 NETAPP,LUN

[size=5.0G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 3:0:1:14 sdad 65:208 [active][ready]

\_ 4:0:1:14 sdbh 67:176 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:14 sdas 66:192 [active][ready]

\_ 3:0:0:14 sdo 8:224 [active][ready]

360a9800044336b327a2444617258786a dm-8 NETAPP,LUN

[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:3 sdax 67:16 [active][ready]

\_ 3:0:1:3 sdt 65:48 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:3 sdai 66:32 [active][ready]

\_ 3:0:0:3 sde 8:64 [active][ready]

360a9800044336b327a24446172587872 dm-14 NETAPP,LUN

[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 4:0:1:9 sdbd 67:112 [active][ready]

\_ 3:0:1:9 sdz 65:144 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:9 sdao 66:128 [active][ready]

\_ 3:0:0:9 sdk 8:160 [active][ready]

360a9800044336b327a24446172587972 dm-2 NETAPP,LUN

[size=10G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 3:0:1:13 sdac 65:192 [active][ready]

\_ 4:0:1:13 sdbg 67:160 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:13 sdar 66:176 [active][ready]

\_ 3:0:0:13 sdn 8:208 [active][ready]

360a9800044336b327a24446172587868 dm-0 NETAPP,LUN

[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]

\_ round-robin 0 [prio=50][active]

\_ 3:0:1:10 sdaa 65:160 [active][ready]

\_ 4:0:1:10 sdbe 67:128 [active][ready]

\_ round-robin 0 [prio=10][enabled]

\_ 4:0:0:10 sdap 66:144 [active][ready]

\_ 3:0:0:10 sdl 8:176 [active][ready]
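
With this many LUNs, scanning the listing by eye is error-prone. The sketch below is an illustration in Python rather than anything used in the original session: it parses text in the legacy `multipath -ll` layout shown above and returns every path that is not `[active][ready]`. The `unhealthy_paths` helper and the `degraded` sample line are hypothetical and exist only for this example; newer multipath-tools versions print a different layout that this pattern does not cover.

```python
import re

# Matches a path line in the legacy `multipath -ll` layout, e.g.
#   \_ 4:0:1:0 sdau 66:224 [active][ready]
PATH_LINE = re.compile(
    r"\\_\s+(\d+:\d+:\d+:\d+)\s+(\S+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)


def unhealthy_paths(multipath_output):
    """Return (hctl, device, dm_state, path_state) for paths not [active][ready]."""
    bad = []
    for line in multipath_output.splitlines():
        m = PATH_LINE.search(line)
        if m is None:
            continue  # map headers and path-group lines are skipped
        hctl, dev, dm_state, path_state = m.groups()
        if (dm_state, path_state) != ("active", "ready"):
            bad.append((hctl, dev, dm_state, path_state))
    return bad


# Sample copied from the first map in the listing above.
sample = r"""360a9800044336b327a24446172587864 dm-5 NETAPP,LUN
[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
\_ 4:0:1:0 sdau 66:224 [active][ready]
\_ 3:0:1:0 sdq 65:0 [active][ready]
\_ round-robin 0 [prio=10][enabled]
\_ 4:0:0:0 sdaf 65:240 [active][ready]
\_ 3:0:0:0 sdb 8:16 [active][ready]
"""

# Hypothetical degraded path, invented here to show a non-empty result.
degraded = sample + r"\_ 3:0:0:1 sdc 8:32 [failed][faulty]" + "\n"

print(unhealthy_paths(sample))
print(unhealthy_paths(degraded))
```

An empty result only says that the multipath layer sees every path as usable; it does not by itself prove that ASM can read the disks, which is why the ASMLib check that follows is still needed.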

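The ASM disks registered in ASMLib were all recognized normally as well. That ruled out a storage fault, so attention turned to the Grid Infrastructure logs.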

[root@rac2 bin]# /etc/init.d/multipathd restart

Stopping multipathd daemon: [ OK ]

Starting multipathd daemon: [ OK ]

[root@rac2 bin]# /etc/init.d/oracleasm restart

Dropping Oracle ASMLib disks: [ OK ]

Shutting down the Oracle ASMLib driver: [ OK ]

Initializing the Oracle ASMLib driver: [ OK ]

Scanning the system for Oracle ASMLib disks: [ OK ]

[root@rac2 bin]# oracleasm scandisks

Reloading disk partitions: done

Cleaning any stale ASM disks...

Scanning system for ASM disks...

[root@rac2 bin]# oracleasm listdisks

ASMLB_OCR01

ASMLB_OCR02

ASMLB_OCR03

ASMLB_ORCL01

ASMLB_ORCL02

ASMLB_ORCL03

ASMLB_ORCLCTL

ASMLB_ORCLFRA

ASMLB_ORCLREDO

ASMLB_YYZF01

ASMLB_YYZF02

ASMLB_YYZF03

ASMLB_YYZFCTL

ASMLB_YYZFFRA

ASMLB_YYZREDO

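After CRS was restarted, the ASM processes were still present, but ASM could not mount its disk groups and crsd still failed to start.

The alertrac2.log contents were as follows: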

[/u01/app/11.2/product/crs_1/bin/orarootagent.bin(13382)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0

2014-05-21 13:18:09.087

[/u01/app/11.2/product/crs_1/bin/orarootagent.bin(13382)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0

2014-05-21 13:22:37.716

[ctssd(17298)]CRS-2409:The clock on host rac2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

2014-05-21 13:38:09.677

[/u01/app/11.2/product/crs_1/bin/orarootagent.bin(13382)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0

2014-05-21 13:52:38.315

[ctssd(17298)]CRS-2409:The clock on host rac2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

2014-05-21 13:58:10.245

[/u01/app/11.2/product/crs_1/bin/orarootagent.bin(13382)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0

[gpnpd(8438)]CRS-2328:GPNPD started on node rac2.

2014-05-21 14:14:29.618

[cssd(8520)]CRS-1713:CSSD daemon is started in clustered mode

2014-05-21 14:14:31.419

[ohasd(8217)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE

2014-05-21 14:14:50.409

[cssd(8520)]CRS-1707:Lease acquisition for node rac2 number 2 completed

2014-05-21 14:14:51.693

[cssd(8520)]CRS-1621:The IPMI configuration data for this node stored in the Oracle registry is incomplete; details at (:CSSNK00002:) in /u01/app/11.2/product/crs_1/log/rac2/cssd/ocssd.log

2014-05-21 14:14:51.693

[cssd(8520)]CRS-1617:The information required to do node kill for node rac2 is incomplete; details at (:CSSNM00004:) in /u01/app/11.2/product/crs_1/log/rac2/cssd/ocssd.log

2014-05-21 14:14:51.696

[cssd(8520)]CRS-1605:CSSD voting file is online: ORCL:ASMLB_OCR03; details in /u01/app/11.2/product/crs_1/log/rac2/cssd/ocssd.log.

2014-05-21 14:14:51.700

[cssd(8520)]CRS-1605:CSSD voting file is online: ORCL:ASMLB_OCR02; details in /u01/app/11.2/product/crs_1/log/rac2/cssd/ocssd.log.

2014-05-21 14:14:51.704

[cssd(8520)]CRS-1605:CSSD voting file is online: ORCL:ASMLB_OCR01; details in /u01/app/11.2/product/crs_1/log/rac2/cssd/ocssd.log.

2014-05-21 14:14:56.235

[cssd(8520)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac2 rac2 .

2014-05-21 14:14:58.695

[ctssd(8722)]CRS-2403:The Cluster Time Synchronization Service on host rac2 is in observer mode.

2014-05-21 14:14:58.973

[ctssd(8722)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac2.

2014-05-21 14:14:58.974

[ctssd(8722)]CRS-2401:The Cluster Time Synchronization Service started on host rac2.

[client(8758)]CRS-10001:21-May-14 14:14 ACFS-9391: Checking for existing ADVM/ACFS installation.

[client(8763)]CRS-10001:21-May-14 14:14 ACFS-9392: Validating ADVM/ACFS installation files for operating system.

[client(8765)]CRS-10001:21-May-14 14:14 ACFS-9393: Verifying ASM Administrator setup.

[client(8768)]CRS-10001:21-May-14 14:14 ACFS-9308: Loading installed ADVM/ACFS drivers.

[client(8771)]CRS-10001:21-May-14 14:14 ACFS-9154: Loading 'oracleoks.ko' driver.

[client(8812)]CRS-10001:21-May-14 14:15 ACFS-9154: Loading 'oracleadvm.ko' driver.

[client(8854)]CRS-10001:21-May-14 14:15 ACFS-9154: Loading 'oracleacfs.ko' driver.

[client(8966)]CRS-10001:21-May-14 14:15 ACFS-9327: Verifying ADVM/ACFS devices.

[client(8971)]CRS-10001:21-May-14 14:15 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.

[client(8975)]CRS-10001:21-May-14 14:15 ACFS-9156: Detecting control device '/dev/ofsctl'.

[client(8981)]CRS-10001:21-May-14 14:15 ACFS-9322: completed

2014-05-21 14:26:29.083

[/u01/app/11.2/product/crs_1/bin/orarootagent.bin(14037)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0

[cssd(14105)]CRS-1612:Network communication with node rac2 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.010 seconds

2014-05-21 14:28:40.898

[cssd(14105)]CRS-1662:Member kill requested by node rac2 for member number 1, group DB+ASM

2014-05-21 14:28:42.637

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5019:All OCR locations are on ASM disk groups [OCRDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log".

2014-05-21 14:28:42.637

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log"

2014-05-21 14:28:42.919

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5019:All OCR locations are on ASM disk groups [OCRDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log".

2014-05-21 14:28:42.920

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log"

2014-05-21 14:28:43.147

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5019:All OCR locations are on ASM disk groups [OCRDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log".

2014-05-21 14:28:43.147

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log"

2014-05-21 14:28:43.429

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5019:All OCR locations are on ASM disk groups [OCRDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log".

2014-05-21 14:28:43.430

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log"

2014-05-21 14:30:44.648

[cssd(14105)]CRS-1662:Member kill requested by node rac2 for member number 1, group DB+ASM

2014-05-21 14:30:47.200

[/u01/app/11.2/product/crs_1/bin/oraagent.bin(13997)]CRS-5019:All OCR locations are on ASM disk groups [OCRDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2/product/crs_1/log/rac2/agent/ohasd/oraagent_grid/oraagent_grid.log".

2014-05-21 14:30:47.201