
Different Behavior of Oracle 12c RAC vs. 11g/10g RAC After an Interconnect Network Failure


I ran an experiment in an Oracle 12c RAC environment to see how RAC behaves when the private interconnect network fails, and found that 12c RAC behaves differently from 11g/10g RAC. When the interconnect fails, 11g/10g RAC evicts the problem node from the cluster, and that node also attempts an automatic reboot in order to recover. Oracle 12c RAC, by contrast, only evicts the problem node from the cluster; it does not reboot the node. The experiment follows.
Current RAC status:

[grid@12crac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACCRS.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACDATA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACFRA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.asm
               ONLINE  ONLINE       12crac1                  Started,STABLE
               ONLINE  ONLINE       12crac2                  Started,STABLE
ora.net1.network
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.ons
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12crac1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.12crac2.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12crac1                  169.254.88.173 192.1
                                                             68.80.150,STABLE
ora.cvu
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.luocs12c.db
      1        ONLINE  ONLINE       12crac2                  Open,STABLE
      2        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------

Both database instances are running:

[grid@12crac1 ~]$ srvctl status database -d luocs12c
Instance luocs12c1 is running on node 12crac1
Instance luocs12c2 is running on node 12crac2


The RAC IP layout, as recorded in the hosts file:

[root@12crac1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# For Public
192.168.1.150   12crac1.luocs.com       12crac1
192.168.1.151   12crac2.luocs.com       12crac2
# For VIP
192.168.1.152   12crac1-vip.luocs.com   12crac1-vip.luocs.com
192.168.1.153   12crac2-vip.luocs.com   12crac2-vip.luocs.com
# For Private IP
192.168.80.150  12crac1-priv.luocs.com 12crac1-priv
192.168.80.154  12crac2-priv.luocs.com 12crac2-priv
# For SCAN IP
# 192.168.1.154 scan.luocs.com
# 192.168.1.155 scan.luocs.com
# 192.168.1.155 scan.luocs.com
# For DNS Server
192.168.1.158   dns12c.luocs.com        dns12c
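
The hosts file only covers name resolution; which subnet the clusterware actually treats as the interconnect is recorded in the cluster registry. It can be cross-checked with oifcfg (the command is standard; the output below is what the layout above implies, not captured from this cluster):

[grid@12crac1 ~]$ oifcfg getif
eth0  192.168.1.0  global  public
eth1  192.168.80.0  global  cluster_interconnect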

Now bring down the private network interface eth1 on node 2:

[root@12crac2 12crac2]# ifdown eth1
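
If you prefer not to change the interface state itself, a common alternative for simulating interconnect loss is to drop the traffic with iptables instead (a sketch; flushing all rules afterwards is only acceptable on a test box like this one):

[root@12crac2 ~]# iptables -A INPUT  -i eth1 -j DROP    # drop inbound interconnect traffic
[root@12crac2 ~]# iptables -A OUTPUT -o eth1 -j DROP    # drop outbound interconnect traffic
[root@12crac2 ~]# iptables -F                           # restore connectivity when the test is done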


Check the clusterware alert log on node 2:

[root@12crac2 12crac2]# pwd
/u01/app/12.1.0/grid/log/12crac2
[root@12crac2 12crac2]# vi alert12crac2.log 
2013-07-20 16:22:51.603:
[cssd(2260)]CRS-1612:Network communication with node 12crac1 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.240 seconds
2013-07-20 16:22:58.638:
[cssd(2260)]CRS-1611:Network communication with node 12crac1 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 7.200 seconds
2013-07-20 16:23:03.640:
[cssd(2260)]CRS-1610:Network communication with node 12crac1 (1) missing for 90% of timeout interval.  Removal of this node from cluster in 2.200 seconds
2013-07-20 16:23:05.865:
[cssd(2260)]CRS-1608:This node was evicted by node 1, 12crac1; details at (:CSSNM00005:) in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log.
2013-07-20 16:23:05.865:
[cssd(2260)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log
2013-07-20 16:23:05.872:
[cssd(2260)]CRS-1652:Starting clean up of CRSD resources.
2013-07-20 16:23:07.418:
[/u01/app/12.1.0/grid/bin/oraagent.bin(2573)]CRS-5016:Process "/u01/app/12.1.0/grid/opmn/bin/onsctli" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:07.448:
[cssd(2260)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log.
2013-07-20 16:23:07.943:
[/u01/app/12.1.0/grid/bin/oraagent.bin(2573)]CRS-5016:Process "/u01/app/12.1.0/grid/bin/lsnrctl" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:07.951:
[/u01/app/12.1.0/grid/bin/oraagent.bin(2573)]CRS-5016:Process "/u01/app/12.1.0/grid/bin/lsnrctl" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:08.143:
[cssd(2260)]CRS-1654:Clean up of CRSD resources finished successfully.
2013-07-20 16:23:08.144:
[cssd(2260)]CRS-1655:CSSD on node 12crac2 detected a problem and started to shutdown.
2013-07-20 16:23:08.279:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(2589)]CRS-5822:Agent '/u01/app/12.1.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:34} in /u01/app/12.1.0/grid/log/12crac2/agent/crsd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:23:08.919:
[cssd(2260)]CRS-1625:Node 12crac2, number 2, was manually shut down
2013-07-20 16:23:13.838:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5011:Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:14.125:
[crsd(8139)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:23:14.581:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5011:Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:20.507:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5011:Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:21.982:
[cssd(8199)]CRS-1713:CSSD daemon is started in hub mode
2013-07-20 16:23:29.976:
[cssd(8199)]CRS-1707:Lease acquisition for node 12crac2 number 2 completed
2013-07-20 16:23:31.349:
[cssd(8199)]CRS-1605:CSSD voting file is online: /dev/asm-crs; details in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log.
2013-07-20 16:24:21.977:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(2021)]CRS-5818:Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:15:28} in /u01/app/12.1.0/grid/log/12crac2/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:25:23.820:
[ohasd(1825)]CRS-2767:Resource state recovery not attempted for 'ora.cluster_interconnect.haip' as its target state is OFFLINE
2013-07-20 16:25:23.825:
[ohasd(1825)]CRS-2769:Unable to failover resource 'ora.cluster_interconnect.haip'.
2013-07-20 16:26:23.467:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(8263)]CRS-5818:Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:21:2} in /u01/app/12.1.0/grid/log/12crac2/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:28:24.760:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(8314)]CRS-5818:Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:23:2} in /u01/app/12.1.0/grid/log/12crac2/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:28:51.161:
[cssd(8199)]CRS-1601:CSSD Reconfiguration complete. Active nodes are 12crac2 .
2013-07-20 16:28:53.827:
[ctssd(8344)]CRS-2407:The new Cluster Time Synchronization Service reference node is host 12crac2.
2013-07-20 16:28:53.831:
[ctssd(8344)]CRS-2401:The Cluster Time Synchronization Service started on host 12crac2.
2013-07-20 16:28:53.978:
[ohasd(1825)]CRS-2878:Failed to restart resource 'ora.ctssd'
2013-07-20 16:28:55.046:
[ohasd(1825)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2013-07-20 16:28:55.046:
[ohasd(1825)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2013-07-20 16:29:08.433:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:29:09.471:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:29:10.182:
[crsd(8448)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:22.138:
[crsd(8454)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:34.036:
[crsd(8465)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:39.506:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:29:45.968:
[crsd(8472)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:56.937:
[crsd(8480)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:08.808:
[crsd(8508)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:09.557:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:30:20.621:
[crsd(8533)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:32.604:
[crsd(8563)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:39.610:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:30:49.527:
[crsd(8593)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:59.779:
[ohasd(1825)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2013-07-20 16:30:59.779:
[ohasd(1825)]CRS-2769:Unable to failover resource 'ora.crsd'.
2013-07-20 16:31:09.649:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:31:39.791:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
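
The 50%/75%/90% countdown in the CRS-1612/1611/1610 messages above is driven by the CSS misscount parameter, which defaults to 30 seconds on Linux: once network heartbeats have been missed for the whole interval, the node is removed from the cluster. The current value can be queried with crsctl (the reply shown is typical for a default installation, not captured here):

[grid@12crac1 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.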

Contents of /var/log/messages on node 2:

Jul 20 16:22:46 12crac2 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295436896, last ping 4295441900, now 4295446912
Jul 20 16:22:46 12crac2 kernel: connection1:0: detected conn error (1011)
Jul 20 16:22:46 12crac2 iscsid: Kernel reported iSCSI connection 1:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jul 20 16:22:49 12crac2 iscsid: connection1:0 is operational after recovery (1 attempts)
Jul 20 16:23:13 12crac2 abrt[8140]: Saved core dump of pid 2260 (/u01/app/12.1.0/grid/bin/ocssd.bin) to /var/spool/abrt/ccpp-2013-07-20-16:23:09-2260 (75399168 bytes)
Jul 20 16:23:13 12crac2 abrtd: Directory 'ccpp-2013-07-20-16:23:09-2260' creation detected
Jul 20 16:23:14 12crac2 abrtd: Executable '/u01/app/12.1.0/grid/bin/ocssd.bin' doesn't belong to any package
Jul 20 16:23:14 12crac2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-07-20-16:23:09-2260' exited with 1
Jul 20 16:23:14 12crac2 abrtd: Corrupted or bad directory '/var/spool/abrt/ccpp-2013-07-20-16:23:09-2260', deleting

ASM instance alert log on node 2:

Sat Jul 20 16:23:00 2013
SKGXP: ospid 2406: network interface with IP address 169.254.171.71 no longer running (check cable)
SKGXP: ospid 2406: network interface with IP address 169.254.171.71 is DOWN
Sat Jul 20 16:23:08 2013
NOTE: ASMB process exiting, either shutdown is in progress or foreground connected to ASMB was killed.
NOTE: ASMB clearing idle groups before exit
Sat Jul 20 16:23:08 2013
NOTE: client exited [2489]
NOTE: force a map free for map id 2
Sat Jul 20 16:23:10 2013
Instance Critical Process (pid: 11, ospid: 2404, LMON) died unexpectedly
PMON (ospid: 2379): terminating the instance due to error 481
Sat Jul 20 16:23:14 2013
Instance terminated by PMON, pid = 2379
Sat Jul 20 16:28:55 2013
MEMORY_TARGET defaulting to 1128267776.
* instance_number obtained from CSS = 2, checking for the existence of node 0...
* node 0 does not exist. instance_number = 2
Starting ORACLE instance (normal)
Sat Jul 20 16:28:55 2013
CLI notifier numLatches:7 maxDescs:620
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 4
Number of processor cores in the system is 4
Number of processor sockets in the system is 2
Public Interface 'eth0' configured from GPnP for use as a public interface.
  [name='eth0', type=1, ip=192.168.1.151, mac=00-0c-29-a1-81-7c, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
  WARNING: No cluster interconnect has been specified. Depending on
           the communication driver configured Oracle cluster traffic
           may be directed to the public interface of this machine.
           Oracle recommends that RAC clustered databases be configured
           with a private interconnect for enhanced security and
           performance.
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
    NUMA status: non-NUMA system
    cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
    Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/12.1.0/grid/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NOTE: remote asm mode is local (mode 0x301; from cluster type)
NOTE: Volume support  enabled
Starting up:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/12.1.0/grid
System name:    Linux
Node name:      12crac2.luocs.com
Release:        2.6.39-400.17.1.el6uek.x86_64
Version:        #1 SMP Fri Feb 22 18:16:18 PST 2013
Machine:        x86_64
Using parameter settings in server-side spfile +RACCRS/scan12c/ASMPARAMETERFILE/registry.253.819592477
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/asm*"
  asm_diskgroups           = "RACFRA"
  asm_diskgroups           = "RACDATA"
  asm_power_limit          = 1
NOTE: remote asm mode is local (mode 0x301; from cluster type)
Sat Jul 20 16:28:56 2013
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.151
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
NOTE: PatchLevel of this instance 0
Starting background process PMON
Sat Jul 20 16:28:56 2013
PMON started with pid=2, OS id=8370
Starting background process PSP0
Sat Jul 20 16:28:56 2013
PSP0 started with pid=3, OS id=8373
Starting background process VKTM
Sat Jul 20 16:28:57 2013
VKTM started with pid=4, OS id=8383 at elevated priority
Starting background process GEN0
Sat Jul 20 16:28:57 2013
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Sat Jul 20 16:28:57 2013
GEN0 started with pid=5, OS id=8387
Starting background process MMAN
Sat Jul 20 16:28:57 2013
MMAN started with pid=6, OS id=8389
Starting background process DIAG
Sat Jul 20 16:28:57 2013
DIAG started with pid=8, OS id=8393
Starting background process PING
Sat Jul 20 16:28:58 2013
PING started with pid=9, OS id=8395
Starting background process DIA0
Starting background process LMON
Sat Jul 20 16:28:58 2013
DIA0 started with pid=10, OS id=8397
Starting background process LMD0
Sat Jul 20 16:28:58 2013
LMON started with pid=11, OS id=8399
Starting background process LMS0
Sat Jul 20 16:28:58 2013
LMD0 started with pid=12, OS id=8401
Sat Jul 20 16:28:58 2013
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [3840 - 5120]
Sat Jul 20 16:28:58 2013
LMS0 started with pid=13, OS id=8403 at elevated priority
Starting background process LMHB
Sat Jul 20 16:28:58 2013
LMHB started with pid=14, OS id=8407
Starting background process LCK1
Sat Jul 20 16:28:58 2013
LCK1 started with pid=15, OS id=8409
Starting background process DBW0
Sat Jul 20 16:28:58 2013
DBW0 started with pid=17, OS id=8413
Starting background process LGWR
Starting background process CKPT
Sat Jul 20 16:28:58 2013
LGWR started with pid=18, OS id=8415
Sat Jul 20 16:28:58 2013
CKPT started with pid=19, OS id=8417
Starting background process SMON
Starting background process LREG
Sat Jul 20 16:28:58 2013
SMON started with pid=20, OS id=8419
Sat Jul 20 16:28:58 2013
LREG started with pid=21, OS id=8421
Starting background process RBAL
Starting background process GMON
Sat Jul 20 16:28:58 2013
RBAL started with pid=22, OS id=8423
Sat Jul 20 16:28:58 2013
GMON started with pid=23, OS id=8425
Starting background process MMON
Starting background process MMNL
Sat Jul 20 16:28:58 2013
MMON started with pid=24, OS id=8427
Sat Jul 20 16:28:58 2013
MMNL started with pid=25, OS id=8429
Sat Jul 20 16:28:59 2013
lmon registered with NM - instance number 2 (internal mem no 1)
Sat Jul 20 16:28:59 2013
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Begin lmon rcfg omni enqueue reconfig stage1
 End lmon rcfg omni enqueue reconfig stage1
 Master broadcasted resource hash value bitmaps
 Begin lmon rcfg omni enqueue reconfig stage2
 End lmon rcfg omni enqueue reconfig stage2
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Begin lmon rcfg omni enqueue reconfig stage3
 End lmon rcfg omni enqueue reconfig stage3
 Submitted all remote-enqueue requests
 Begin lmon rcfg omni enqueue reconfig stage4
 End lmon rcfg omni enqueue reconfig stage4
 Dwn-cvts replayed, VALBLKs dubious
 Begin lmon rcfg omni enqueue reconfig stage5
 End lmon rcfg omni enqueue reconfig stage5
 All grantable enqueues granted
 Begin lmon rcfg omni enqueue reconfig stage6
 End lmon rcfg omni enqueue reconfig stage6
Sat Jul 20 16:28:59 2013
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Begin lmon rcfg omni enqueue reconfig stage7
 End lmon rcfg omni enqueue reconfig stage7
 Fix write in gcs resources
Sat Jul 20 16:28:59 2013
Reconfiguration complete (total time 0.1 secs)
Starting background process LCK0
Sat Jul 20 16:29:00 2013
LCK0 started with pid=26, OS id=8431
Sat Jul 20 16:29:01 2013
Instance started by oraagent
ORACLE_BASE from environment = /u01/app/grid
Sat Jul 20 16:29:01 2013
Using default pga_aggregate_limit of 2048 MB
Sat Jul 20 16:29:01 2013
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:9:6} */
Sat Jul 20 16:29:01 2013
NOTE: Diskgroup used for Voting files is:
         RACCRS
Diskgroup with spfile:RACCRS
NOTE: Diskgroup used for OCR is:RACCRS
NOTE: Diskgroups listed in ASM_DISKGROUP are
         RACFRA
         RACDATA
NOTE: cache registered group RACCRS 1/0x8AC9E683
NOTE: cache began mount (first) of group RACCRS 1/0x8AC9E683
NOTE: cache registered group RACDATA 2/0x8AF9E684
NOTE: cache began mount (first) of group RACDATA 2/0x8AF9E684
NOTE: cache registered group RACFRA 3/0x8B19E685
NOTE: cache began mount (first) of group RACFRA 3/0x8B19E685
NOTE: Assigning number (1,0) to disk (/dev/asm-crs)
NOTE: Assigning number (2,0) to disk (/dev/asm-data)
NOTE: Assigning number (3,0) to disk (/dev/asm-fra)
Sat Jul 20 16:29:08 2013
ERROR: GMON found another heartbeat for grp 1 (RACCRS)
Sat Jul 20 16:29:08 2013
ERROR: GMON could not check any PST heartbeat (grp 1)
Sat Jul 20 16:29:08 2013
NOTE: cache dismounting (clean) group 1/0x8AC9E683 (RACCRS)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8433, image: oracle@12crac2.luocs.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x8AC9E683 (RACCRS)
NOTE: cache ending mount (fail) of group RACCRS number=1 incarn=0x8ac9e683
NOTE: cache deleting context for group RACCRS 1/0x8ac9e683
Sat Jul 20 16:29:08 2013
GMON dismounting group 1 at 5 for pid 7, osid 8433
Sat Jul 20 16:29:08 2013
NOTE: Disk RACCRS_0000 in mode 0x8 marked for de-assignment
ERROR: diskgroup RACCRS was not mounted
Sat Jul 20 16:29:08 2013
ERROR: GMON found another heartbeat for grp 2 (RACDATA)
Sat Jul 20 16:29:08 2013
ERROR: GMON could not check any PST heartbeat (grp 2)
Sat Jul 20 16:29:08 2013
NOTE: cache dismounting (clean) group 2/0x8AF9E684 (RACDATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8433, image: oracle@12crac2.luocs.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x8AF9E684 (RACDATA)
NOTE: cache ending mount (fail) of group RACDATA number=2 incarn=0x8af9e684
NOTE: cache deleting context for group RACDATA 2/0x8af9e684
Sat Jul 20 16:29:08 2013
GMON dismounting group 2 at 7 for pid 7, osid 8433
Sat Jul 20 16:29:08 2013
NOTE: Disk RACDATA_0000 in mode 0x8 marked for de-assignment
ERROR: diskgroup RACDATA was not mounted
Sat Jul 20 16:29:08 2013
ERROR: GMON found another heartbeat for grp 3 (RACFRA)
Sat Jul 20 16:29:08 2013
ERROR: GMON could not check any PST heartbeat (grp 3)
Sat Jul 20 16:29:08 2013
NOTE: cache dismounting (clean) group 3/0x8B19E685 (RACFRA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8433, image: oracle@12crac2.luocs.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 3/0x8B19E685 (RACFRA)
NOTE: cache ending mount (fail) of group RACFRA number=3 incarn=0x8b19e685
NOTE: cache deleting context for group RACFRA 3/0x8b19e685
Sat Jul 20 16:29:08 2013
GMON dismounting group 3 at 9 for pid 7, osid 8433
Sat Jul 20 16:29:08 2013
NOTE: Disk RACFRA_0000 in mode 0x8 marked for de-assignment
ERROR: diskgroup RACFRA was not mounted
Sat Jul 20 16:29:08 2013
WARNING: Disk Group RACCRS containing spfile for this instance is not mounted
Sat Jul 20 16:29:08 2013
WARNING: Disk Group RACCRS containing configured OCR is not mounted
Sat Jul 20 16:29:08 2013
WARNING: Disk Group RACCRS containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "RACFRA" cannot be mounted
ORA-15003: diskgroup "RACFRA" already mounted in another lock name space
ORA-15017: diskgroup "RACDATA" cannot be mounted
ORA-15003: diskgroup "RACDATA" already mounted in another lock name space
ORA-15017: diskgroup "RACCRS" cannot be mounted
ORA-15003: diskgroup "RACCRS" already mounted in another lock name space
Sat Jul 20 16:29:08 2013
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:9:6} */
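
The ORA-15003 errors are expected: with the interconnect down, the restarted ASM instance on node 2 cannot join the cluster, while node 1 still holds the disk groups mounted under the cluster-wide lock name space, so node 2 is refused the mount. From the surviving node each disk group should still report MOUNTED; a quick check (a sketch, assuming the grid user's environment points at the local ASM instance):

[grid@12crac1 ~]$ sqlplus -s / as sysasm <<'EOF'
SELECT name, state FROM v$asm_diskgroup;
EOF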

Database instance alert log on node 2:

Sat Jul 20 16:22:46 2013
SKGXP: ospid 5480: network interface with IP address 169.254.171.71 no longer running (check cable)
SKGXP: ospid 5480: network interface with IP address 169.254.171.71 is DOWN
Sat Jul 20 16:23:08 2013
NOTE: ASMB terminating
Sat Jul 20 16:23:08 2013
Errors in file /u01/app/oracle/diag/rdbms/luocs12c/luocs12c2/trace/luocs12c2_asmb_5556.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 149 Serial number: 15
Sat Jul 20 16:23:08 2013
Errors in file /u01/app/oracle/diag/rdbms/luocs12c/luocs12c2/trace/luocs12c2_asmb_5556.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 149 Serial number: 15
USER (ospid: 5556): terminating the instance due to error 15064
Sat Jul 20 16:23:10 2013
Instance terminated by USER, pid = 5556
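
With both the ASM and database instances on node 2 gone, the view from the surviving node should match; a sketch of the check (the output is what srvctl would be expected to report, not captured):

[grid@12crac1 ~]$ srvctl status database -d luocs12c
Instance luocs12c1 is running on node 12crac1
Instance luocs12c2 is not running on node 12crac2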


Clusterware alert log on node 1:

2013-07-20 16:22:51.478:
[cssd(2195)]CRS-1612:Network communication with node 12crac2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.230 seconds
2013-07-20 16:22:58.484:
[cssd(2195)]CRS-1611:Network communication with node 12crac2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.230 seconds
2013-07-20 16:23:03.487:
[cssd(2195)]CRS-1610:Network communication with node 12crac2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.220 seconds
2013-07-20 16:23:05.711:
[cssd(2195)]CRS-1607:Node 12crac2 is being evicted in cluster incarnation 269802665; details at (:CSSNM00007:) in /u01/app/12.1.0/grid/log/12crac1/cssd/ocssd.log.
2013-07-20 16:23:08.780:
[cssd(2195)]CRS-1625:Node 12crac2, number 2, was manually shut down
2013-07-20 16:23:08.847:
[cssd(2195)]CRS-1601:CSSD Reconfiguration complete. Active nodes are 12crac1 .
2013-07-20 16:23:09.128:
[crsd(2535)]CRS-5504:Node down event reported for node '12crac2'.
2013-07-20 16:23:20.404:
[crsd(2535)]CRS-2773:Server '12crac2' has been removed from pool 'Generic'.
2013-07-20 16:23:20.425:
[crsd(2535)]CRS-2773:Server '12crac2' has been removed from pool 'ora.luocs12c'.


The logs above show that node 2 has been evicted from the cluster. Note that, unlike 11g/10g, the node itself was not rebooted; only its clusterware stack attempted to restart. The RAC status at this point:

[grid@12crac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       12crac1                  STABLE
ora.RACCRS.dg
               ONLINE  ONLINE       12crac1                  STABLE
ora.RACDATA.dg
               ONLINE  ONLINE       12crac1                  STABLE
ora.RACFRA.dg
               ONLINE  ONLINE       12crac1                  STABLE
ora.asm
               ONLINE  ONLINE       12crac1                  Started,STABLE
ora.net1.network
               ONLINE  ONLINE       12crac1                  STABLE
ora.ons
               ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12crac1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.12crac2.vip
      1        ONLINE  INTERMEDIATE 12crac1                  FAILED OVER,STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12crac1                  169.254.88.173 192.1
                                                             68.80.150,STABLE
ora.cvu
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.luocs12c.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------
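
Node membership can be double-checked with olsnodes as well; after the eviction, node 2 would be expected to show as Inactive (a sketch, not captured output):

[grid@12crac1 ~]$ olsnodes -s
12crac1 Active
12crac2 Inactive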

Now recover node 2. There are two options: reboot the node 2 server manually, or bring the private interface eth1 back up and then restart the clusterware stack. Taking the second route, on node 2:

[root@12crac2 ~]# ifup eth1
[root@12crac2 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
[root@12crac2 bin]# ./crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on '12crac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on '12crac2'
CRS-2673: Attempting to stop 'ora.crf' on '12crac2'
CRS-2673: Attempting to stop 'ora.ctssd' on '12crac2'
CRS-2673: Attempting to stop 'ora.evmd' on '12crac2'
CRS-2673: Attempting to stop 'ora.asm' on '12crac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on '12crac2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on '12crac2'
CRS-2677: Stop of 'ora.drivers.acfs' on '12crac2' succeeded
CRS-2677: Stop of 'ora.crf' on '12crac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.gpnpd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.evmd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.asm' on '12crac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on '12crac2'
CRS-2677: Stop of 'ora.cssd' on '12crac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on '12crac2'
CRS-2677: Stop of 'ora.gipcd' on '12crac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on '12crac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@12crac2 bin]# ./crsctl start has
CRS-4123: Oracle High Availability Services has been started.
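
Startup of the full stack takes a few minutes; to confirm it has completed, re-run the check until all services report online (the healthy output shown is a sketch of what to expect, not a capture):

[root@12crac2 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online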

With that, the RAC is back to normal:

[grid@12crac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACCRS.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACDATA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACFRA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.asm
               ONLINE  ONLINE       12crac1                  Started,STABLE
               ONLINE  ONLINE       12crac2                  Started,STABLE
ora.net1.network
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.ons
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12crac1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.12crac2.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12crac1                  169.254.88.173 192.1
                                                             68.80.150,STABLE
ora.cvu
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.luocs12c.db
      1        ONLINE  ONLINE       12crac2                  Open,STABLE
      2        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------