
Instance restart caused by IP addresses being cleared

程序员文章站 2022-05-19 11:06:35

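A customer's Oracle 10.2.0.4 RAC on Solaris 10 suddenly experienced instance restarts.
The database ran normally until about 3 p.m., after which the two nodes restarted one after the other, and the instance on one of them failed to start automatically. The alert logs of both instances show that, before the restarts, both nodes had reported prominent ORA-27504 errors: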

Wed Apr 10 15:00:05 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_10997.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11007.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11009.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11011.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
.
.
.
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25688
Receiver: inst 2 binc 427282 ospid 11838
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25724
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25680
Receiver: inst 2 binc 431591 ospid 11822
Receiver: inst 2 binc 431795 ospid 11874
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25684
Receiver: inst 2 binc 428985 ospid 11826
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25708
Receiver: inst 2 binc 430048 ospid 11858
Wed Apr 10 15:07:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.1 inc 4 for msg type 44 from opid 7
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.12 inc 4 for msg type 44 from opid 21
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.2 inc 4 for msg type 44 from opid 8
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.3 inc 4 for msg type 44 from opid 10
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.8 inc 4 for msg type 44 from opid 15
Wed Apr 10 15:08:13 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:08:16 2013
IPC Send timeout detected.Sender: ospid 25748
Receiver: inst 2 binc 430164 ospid 11890
.
.
.
Wed Apr 10 15:08:53 2013
IPC Send timeout to 1.13 inc 4 for msg type 36 from opid 176
Wed Apr 10 15:08:53 2013
IPC Send timeout to 1.15 inc 4 for msg type 36 from opid 167
Wed Apr 10 15:08:57 2013
IPC Send timeout to 1.4 inc 4 for msg type 32 from opid 180
.
.
.
Wed Apr 10 15:15:51 2013
Evicting instance 2 from cluster
Wed Apr 10 15:16:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:16:40 2013
Waiting for instances to leave:
Wed Apr 10 15:17:00 2013
Waiting for instances to leave:
Wed Apr 10 15:17:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:17:20 2013
Waiting for instances to leave:

The errors on node 2 were similar:

.
.
.
Wed Apr 10 15:19:07 2013
Errors in file /oracle/admin/orcl/udump/orcl2_ora_14065.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:19:08 2013
Errors in file /oracle/admin/orcl/udump/orcl2_ora_14057.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:19:46 2013
ospid 11820: network interface with IP address 192.168.168.4 no longer operational
requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:20:46 2013
ospid 11820: network interface with IP address 192.168.168.4 no longer operational
requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_lmon_11818.trc:
ORA-29740: evicted by member 0, group incarnation 6
Wed Apr 10 15:20:55 2013
LMON: terminating instance due to error 29740
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_smon_11924.trc:
ORA-29740: evicted by member , group incarnation 
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_lmse_11886.trc:
ORA-29740: evicted by member , group incarnation 
Wed Apr 10 16:11:37 2013
Starting ORACLE instance (normal)
Wed Apr 10 16:11:45 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:45 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:45 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:11:45 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:11:50 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:50 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:50 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:11:50 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:11:54 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:54 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:54 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:11:54 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:29 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:29 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:29 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:29 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:47 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:47 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:47 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:47 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:52 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:52 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:52 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:52 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:56 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:56 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:56 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:56 2013
Failed to acquire instance startup/shutdown serialization primitive

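The error messages make the cause easy to identify: the IP address on node 2 had been changed, which broke the interconnect (heartbeat) communication. Node 1 tried to evict node 2 from the cluster, but because it could no longer communicate with node 2, it could only wait for node 2 to restart.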
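The diagnosis above comes down to reading the ORA-27303 and "no longer operational" messages in the alert logs. A minimal Python sketch of that step follows, assuming the alert log is available as a list of lines in which 10g-style timestamp lines such as "Wed Apr 10 15:00:05 2013" precede each entry. The helper name first_interface_loss is hypothetical; it reports, for each interconnect address that was reported missing, the timestamp of the first such report. Matching is case-insensitive so it also works on copies of the log whose keywords were upper-cased.

```python
import re
from datetime import datetime

# Assumed 10g alert-log timestamp line, e.g. "Wed Apr 10 15:00:05 2013".
TS_RE = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# The two messages in the logs that name a missing interconnect address.
IF_RE = re.compile(
    r"requested interface (\d{1,3}(?:\.\d{1,3}){3}) not found"
    r"|network interface with IP address (\d{1,3}(?:\.\d{1,3}){3}) no longer operational",
    re.IGNORECASE,
)


def first_interface_loss(lines):
    """Return {ip: datetime of first report} for interfaces reported missing."""
    current_ts = None  # timestamp of the most recent timestamp line seen
    first_seen = {}
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        m = IF_RE.search(line)
        if m:
            ip = m.group(1) or m.group(2)
            first_seen.setdefault(ip, current_ts)  # keep only the earliest report
    return first_seen


if __name__ == "__main__":
    sample = [
        "Wed Apr 10 15:00:05 2013",
        "ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command",
        "Wed Apr 10 15:07:09 2013",
        "ospid 25678: network interface with IP address 192.168.168.3 no longer operational",
    ]
    print(first_interface_loss(sample))
```

Run against each node's alert log, a summary like this makes it easy to line up when each interconnect address disappeared against the operating system log entries.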
Checking the operating system log on node 2:

Apr 10 15:00:04 bj-sst-xhm-3f2-m5k-02 ip: [ID 482227 kern.notice] ip_arp_done: init failed
Apr 10 15:07:37 bj-sst-xhm-3f2-m5k-02 Had[4135]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 CPU usage on bj-sst-xhm-3f2-m5k-02 is 92%
Apr 10 15:18:41 bj-sst-xhm-3f2-m5k-02 sshd[13485]: [ID 800047 auth.error] error: Failed to allocate internet-domain X11 display socket.

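The "ip_arp_done: init failed" message at 15:00:04 indicates that the network interfaces had been configured using the hostname, and that the host's IP addresses had been changed while the system was online.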
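Finally, the shell history confirmed the cause: someone had logged in as root intending to run ifconfig -a6 to check the IPv6 addresses, but mistyped it as ifconfig -a 6, with an extra space between a and 6. This set every IP address on the host to 0.0.0.0, which produced the errors above.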
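The whole incident came down to one stray space. As a hedged illustration of how such a command could be spotted when reviewing shell history, the following Python sketch flags ifconfig invocations that combine an all-interfaces flag with extra positional arguments. The helper name risky_ifconfig_calls, the set of flags treated as "all interfaces" (-a, -a4, -a6) and the words treated as read-only (inet, inet6) are assumptions for illustration; this is a heuristic, not a parser for Solaris ifconfig.

```python
import re
import shlex

# Assumption: these flags mean "apply to all interfaces" (all, IPv4-only, IPv6-only).
ALL_IFACES_FLAG = re.compile(r"-a[46]?")
# Assumption: these positional words only select an address family and change nothing.
READ_ONLY_WORDS = {"inet", "inet6"}


def risky_ifconfig_calls(history_lines):
    """Return (command, extra_args) for ifconfig calls that pair an
    all-interfaces flag with positional arguments, e.g. "ifconfig -a 6"."""
    flagged = []
    for line in history_lines:
        try:
            tokens = shlex.split(line)
        except ValueError:  # e.g. unbalanced quotes: skip instead of guessing
            continue
        # Accept both "ifconfig" and full paths such as "/usr/sbin/ifconfig".
        if not tokens or tokens[0].rsplit("/", 1)[-1] != "ifconfig":
            continue
        args = tokens[1:]
        all_ifaces = any(ALL_IFACES_FLAG.fullmatch(a) for a in args)
        # Assumption: positional arguments after an all-interfaces flag
        # would be applied to every interface, so they deserve a review.
        extra = [a for a in args if not a.startswith("-") and a not in READ_ONLY_WORDS]
        if all_ifaces and extra:
            flagged.append((line, extra))
    return flagged


if __name__ == "__main__":
    history = ["ifconfig -a6", "ifconfig -a 6", "ifconfig e1000g0", "ls -a 6"]
    for cmd, extra in risky_ifconfig_calls(history):
        print(f"review: {cmd!r} (extra arguments: {extra})")
```

Fed the lines of root's shell history file (its location depends on the shell and on HISTFILE), the sketch flags ifconfig -a 6 with the extra argument 6, while the intended ifconfig -a6 passes.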
This shows once again that for a user with root privileges, a single moment of carelessness can have very serious consequences.