How Oracle 12c RAC Behaves Differently from 11g/10g RAC After an Interconnect Failure
I ran an experiment in an Oracle 12c RAC environment to see how RAC behaves when the interconnect (private network) fails, and found that Oracle 12c RAC differs from 11g/10g RAC. When the interconnect fails, 11g/10g RAC evicts the problem node from the cluster, and that node also reboots itself automatically in an attempt to recover. Oracle 12c RAC, however, merely evicts the problem node from the cluster and does not reboot the server. The experiment follows.
Current RAC status:
[grid@12crac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACCRS.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACDATA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACFRA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.asm
               ONLINE  ONLINE       12crac1                  Started,STABLE
               ONLINE  ONLINE       12crac2                  Started,STABLE
ora.net1.network
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.ons
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12crac1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.12crac2.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12crac1                  169.254.88.173 192.1
                                                             68.80.150,STABLE
ora.cvu
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.luocs12c.db
      1        ONLINE  ONLINE       12crac2                  Open,STABLE
      2        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------
Both database instances are up and running:
[grid@12crac1 ~]$ srvctl status database -d luocs12c
Instance luocs12c1 is running on node 12crac1
Instance luocs12c2 is running on node 12crac2
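The same information can be cross-checked from SQL*Plus; a minimal sketch, assuming the oracle user's environment is set for the local luocs12c instance (GV$INSTANCE returns one row per running instance cluster-wide):

[oracle@12crac1 ~]$ sqlplus -s / as sysdba <<'EOF'
-- one row per open instance across the cluster
select inst_id, instance_name, host_name, status from gv$instance;
EOF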
The RAC IP plan can be read from the hosts file:
[root@12crac1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# For Public
192.168.1.150 12crac1.luocs.com 12crac1
192.168.1.151 12crac2.luocs.com 12crac2
# For VIP
192.168.1.152 12crac1-vip.luocs.com 12crac1-vip.luocs.com
192.168.1.153 12crac2-vip.luocs.com 12crac2-vip.luocs.com
# For Private IP
192.168.80.150 12crac1-priv.luocs.com 12crac1-priv
192.168.80.154 12crac2-priv.luocs.com 12crac2-priv
# For SCAN IP
# 192.168.1.154 scan.luocs.com
# 192.168.1.155 scan.luocs.com
# 192.168.1.155 scan.luocs.com
# For DNS Server
192.168.1.158 dns12c.luocs.com dns12c
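Before the test it is also worth confirming which network the clusterware has registered as the interconnect. oifcfg getif reports this; the output shown below is what the hosts layout above suggests, not a capture from the test system:

[grid@12crac1 ~]$ oifcfg getif
eth0  192.168.1.0   global  public
eth1  192.168.80.0  global  cluster_interconnect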
Now bring down the private network interface eth1 on node 2:
[root@12crac2 12crac2]# ifdown eth1
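ifdown removes the interface entirely. An alternative way to simulate the same failure, if you prefer something revertible without reconfiguring the interface, is to silently drop all traffic on it with iptables; this is only a sketch of the approach, not what was done in this test:

[root@12crac2 ~]# iptables -A INPUT  -i eth1 -j DROP   # drop inbound interconnect traffic
[root@12crac2 ~]# iptables -A OUTPUT -o eth1 -j DROP   # drop outbound interconnect traffic
# revert after the test:
[root@12crac2 ~]# iptables -D INPUT  -i eth1 -j DROP
[root@12crac2 ~]# iptables -D OUTPUT -o eth1 -j DROP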
Check the cluster alert log on node 2:
[root@12crac2 12crac2]# pwd
/u01/app/12.1.0/grid/log/12crac2
[root@12crac2 12crac2]# vi alert12crac2.log
2013-07-20 16:22:51.603:
[cssd(2260)]CRS-1612:Network communication with node 12crac1 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.240 seconds
2013-07-20 16:22:58.638:
[cssd(2260)]CRS-1611:Network communication with node 12crac1 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 7.200 seconds
2013-07-20 16:23:03.640:
[cssd(2260)]CRS-1610:Network communication with node 12crac1 (1) missing for 90% of timeout interval.  Removal of this node from cluster in 2.200 seconds
2013-07-20 16:23:05.865:
[cssd(2260)]CRS-1608:This node was evicted by node 1, 12crac1; details at (:CSSNM00005:) in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log.
2013-07-20 16:23:05.865:
[cssd(2260)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log
2013-07-20 16:23:05.872:
[cssd(2260)]CRS-1652:Starting clean up of CRSD resources.
2013-07-20 16:23:07.418:
[/u01/app/12.1.0/grid/bin/oraagent.bin(2573)]CRS-5016:Process "/u01/app/12.1.0/grid/opmn/bin/onsctli" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:07.448:
[cssd(2260)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log.
2013-07-20 16:23:07.943:
[/u01/app/12.1.0/grid/bin/oraagent.bin(2573)]CRS-5016:Process "/u01/app/12.1.0/grid/bin/lsnrctl" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:07.951:
[/u01/app/12.1.0/grid/bin/oraagent.bin(2573)]CRS-5016:Process "/u01/app/12.1.0/grid/bin/lsnrctl" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:08.143:
[cssd(2260)]CRS-1654:Clean up of CRSD resources finished successfully.
2013-07-20 16:23:08.144:
[cssd(2260)]CRS-1655:CSSD on node 12crac2 detected a problem and started to shutdown.
2013-07-20 16:23:08.279:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(2589)]CRS-5822:Agent '/u01/app/12.1.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:34} in /u01/app/12.1.0/grid/log/12crac2/agent/crsd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:23:08.919:
[cssd(2260)]CRS-1625:Node 12crac2, number 2, was manually shut down
2013-07-20 16:23:13.838:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5011:Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:14.125:
[crsd(8139)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:23:14.581:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5011:Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:20.507:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5011:Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-07-20 16:23:21.982:
[cssd(8199)]CRS-1713:CSSD daemon is started in hub mode
2013-07-20 16:23:29.976:
[cssd(8199)]CRS-1707:Lease acquisition for node 12crac2 number 2 completed
2013-07-20 16:23:31.349:
[cssd(8199)]CRS-1605:CSSD voting file is online: /dev/asm-crs; details in /u01/app/12.1.0/grid/log/12crac2/cssd/ocssd.log.
2013-07-20 16:24:21.977:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(2021)]CRS-5818:Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:15:28} in /u01/app/12.1.0/grid/log/12crac2/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:25:23.820:
[ohasd(1825)]CRS-2767:Resource state recovery not attempted for 'ora.cluster_interconnect.haip' as its target state is OFFLINE
2013-07-20 16:25:23.825:
[ohasd(1825)]CRS-2769:Unable to failover resource 'ora.cluster_interconnect.haip'.
2013-07-20 16:26:23.467:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(8263)]CRS-5818:Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:21:2} in /u01/app/12.1.0/grid/log/12crac2/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:28:24.760:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(8314)]CRS-5818:Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:23:2} in /u01/app/12.1.0/grid/log/12crac2/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-07-20 16:28:51.161:
[cssd(8199)]CRS-1601:CSSD Reconfiguration complete. Active nodes are 12crac2 .
2013-07-20 16:28:53.827:
[ctssd(8344)]CRS-2407:The new Cluster Time Synchronization Service reference node is host 12crac2.
2013-07-20 16:28:53.831:
[ctssd(8344)]CRS-2401:The Cluster Time Synchronization Service started on host 12crac2.
2013-07-20 16:28:53.978:
[ohasd(1825)]CRS-2878:Failed to restart resource 'ora.ctssd'
2013-07-20 16:28:55.046:
[ohasd(1825)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2013-07-20 16:28:55.046:
[ohasd(1825)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2013-07-20 16:29:08.433:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:29:09.471:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:29:10.182:
[crsd(8448)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:22.138:
[crsd(8454)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:34.036:
[crsd(8465)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:39.506:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:29:45.968:
[crsd(8472)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:29:56.937:
[crsd(8480)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:08.808:
[crsd(8508)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:09.557:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:30:20.621:
[crsd(8533)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:32.604:
[crsd(8563)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:39.610:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:30:49.527:
[crsd(8593)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/12.1.0/grid/log/12crac2/crsd/crsd.log.
2013-07-20 16:30:59.779:
[ohasd(1825)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2013-07-20 16:30:59.779:
[ohasd(1825)]CRS-2769:Unable to failover resource 'ora.crsd'.
2013-07-20 16:31:09.649:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-07-20 16:31:39.791:
[/u01/app/12.1.0/grid/bin/oraagent.bin(1962)]CRS-5019:All OCR locations are on ASM disk groups [RACCRS], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/12.1.0/grid/log/12crac2/agent/ohasd/oraagent_grid/oraagent_grid.log".
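The 50%/75%/90% countdown in the CRS-1612/1611/1610 messages above is governed by the CSS misscount parameter, which defaults to 30 seconds on Linux; by the time the first warning appears about half of it has already elapsed. The current value can be queried with crsctl (the response shown is typical for a default installation):

[grid@12crac1 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.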
Contents of /var/log/messages on node 2:
Jul 20 16:22:46 12crac2 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295436896, last ping 4295441900, now 4295446912
Jul 20 16:22:46 12crac2 kernel: connection1:0: detected conn error (1011)
Jul 20 16:22:46 12crac2 iscsid: Kernel reported iSCSI connection 1:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jul 20 16:22:49 12crac2 iscsid: connection1:0 is operational after recovery (1 attempts)
Jul 20 16:23:13 12crac2 abrt[8140]: Saved core dump of pid 2260 (/u01/app/12.1.0/grid/bin/ocssd.bin) to /var/spool/abrt/ccpp-2013-07-20-16:23:09-2260 (75399168 bytes)
Jul 20 16:23:13 12crac2 abrtd: Directory 'ccpp-2013-07-20-16:23:09-2260' creation detected
Jul 20 16:23:14 12crac2 abrtd: Executable '/u01/app/12.1.0/grid/bin/ocssd.bin' doesn't belong to any package
Jul 20 16:23:14 12crac2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-07-20-16:23:09-2260' exited with 1
Jul 20 16:23:14 12crac2 abrtd: Corrupted or bad directory '/var/spool/abrt/ccpp-2013-07-20-16:23:09-2260', deleting
ASM instance alert log on node 2:
Sat Jul 20 16:23:00 2013
SKGXP: ospid 2406: network interface with IP address 169.254.171.71 no longer running (check cable)
SKGXP: ospid 2406: network interface with IP address 169.254.171.71 is DOWN
Sat Jul 20 16:23:08 2013
NOTE: ASMB process exiting, either shutdown is in progress or foreground connected to ASMB was killed.
NOTE: ASMB clearing idle groups before exit
Sat Jul 20 16:23:08 2013
NOTE: client exited [2489]
NOTE: force a map free for map id 2
Sat Jul 20 16:23:10 2013
Instance Critical Process (pid: 11, ospid: 2404, LMON) died unexpectedly
PMON (ospid: 2379): terminating the instance due to error 481
Sat Jul 20 16:23:14 2013
Instance terminated by PMON, pid = 2379
Sat Jul 20 16:28:55 2013
MEMORY_TARGET defaulting to 1128267776.
* instance_number obtained from CSS = 2, checking for the existence of node 0...
* node 0 does not exist. instance_number = 2
Starting ORACLE instance (normal)
Sat Jul 20 16:28:55 2013
CLI notifier numLatches:7 maxDescs:620
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 4
Number of processor cores in the system is 4
Number of processor sockets in the system is 2
Public Interface 'eth0' configured from GPnP for use as a public interface.
 [name='eth0', type=1, ip=192.168.1.151, mac=00-0c-29-a1-81-7c, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
WARNING: No cluster interconnect has been specified. Depending on
         the communication driver configured Oracle cluster traffic
         may be directed to the public interface of this machine.
         Oracle recommends that RAC clustered databases be configured
         with a private interconnect for enhanced security and
         performance.
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
    NUMA status: non-NUMA system
    cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
    Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/12.1.0/grid/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NOTE: remote asm mode is local (mode 0x301; from cluster type)
NOTE: Volume support enabled
Starting up:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/12.1.0/grid
System name:    Linux
Node name:      12crac2.luocs.com
Release:        2.6.39-400.17.1.el6uek.x86_64
Version:        #1 SMP Fri Feb 22 18:16:18 PST 2013
Machine:        x86_64
Using parameter settings in server-side spfile +RACCRS/scan12c/ASMPARAMETERFILE/registry.253.819592477
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/asm*"
  asm_diskgroups           = "RACFRA"
  asm_diskgroups           = "RACDATA"
  asm_power_limit          = 1
NOTE: remote asm mode is local (mode 0x301; from cluster type)
Sat Jul 20 16:28:56 2013
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.151
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
NOTE: PatchLevel of this instance 0
Starting background process PMON
Sat Jul 20 16:28:56 2013
PMON started with pid=2, OS id=8370
Starting background process PSP0
Sat Jul 20 16:28:56 2013
PSP0 started with pid=3, OS id=8373
Starting background process VKTM
Sat Jul 20 16:28:57 2013
VKTM started with pid=4, OS id=8383 at elevated priority
Starting background process GEN0
Sat Jul 20 16:28:57 2013
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Sat Jul 20 16:28:57 2013
GEN0 started with pid=5, OS id=8387
Starting background process MMAN
Sat Jul 20 16:28:57 2013
MMAN started with pid=6, OS id=8389
Starting background process DIAG
Sat Jul 20 16:28:57 2013
DIAG started with pid=8, OS id=8393
Starting background process PING
Sat Jul 20 16:28:58 2013
PING started with pid=9, OS id=8395
Starting background process DIA0
Starting background process LMON
Sat Jul 20 16:28:58 2013
DIA0 started with pid=10, OS id=8397
Starting background process LMD0
Sat Jul 20 16:28:58 2013
LMON started with pid=11, OS id=8399
Starting background process LMS0
Sat Jul 20 16:28:58 2013
LMD0 started with pid=12, OS id=8401
Sat Jul 20 16:28:58 2013
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [3840 - 5120]
Sat Jul 20 16:28:58 2013
LMS0 started with pid=13, OS id=8403 at elevated priority
Starting background process LMHB
Sat Jul 20 16:28:58 2013
LMHB started with pid=14, OS id=8407
Starting background process LCK1
Sat Jul 20 16:28:58 2013
LCK1 started with pid=15, OS id=8409
Starting background process DBW0
Sat Jul 20 16:28:58 2013
DBW0 started with pid=17, OS id=8413
Starting background process LGWR
Starting background process CKPT
Sat Jul 20 16:28:58 2013
LGWR started with pid=18, OS id=8415
Sat Jul 20 16:28:58 2013
CKPT started with pid=19, OS id=8417
Starting background process SMON
Starting background process LREG
Sat Jul 20 16:28:58 2013
SMON started with pid=20, OS id=8419
Sat Jul 20 16:28:58 2013
LREG started with pid=21, OS id=8421
Starting background process RBAL
Starting background process GMON
Sat Jul 20 16:28:58 2013
RBAL started with pid=22, OS id=8423
Sat Jul 20 16:28:58 2013
GMON started with pid=23, OS id=8425
Starting background process MMON
Starting background process MMNL
Sat Jul 20 16:28:58 2013
MMON started with pid=24, OS id=8427
Sat Jul 20 16:28:58 2013
MMNL started with pid=25, OS id=8429
Sat Jul 20 16:28:59 2013
lmon registered with NM - instance number 2 (internal mem no 1)
Sat Jul 20 16:28:59 2013
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 2 (myinst: 2)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Begin lmon rcfg omni enqueue reconfig stage1
End lmon rcfg omni enqueue reconfig stage1
Master broadcasted resource hash value bitmaps
Begin lmon rcfg omni enqueue reconfig stage2
End lmon rcfg omni enqueue reconfig stage2
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Begin lmon rcfg omni enqueue reconfig stage3
End lmon rcfg omni enqueue reconfig stage3
Submitted all remote-enqueue requests
Begin lmon rcfg omni enqueue reconfig stage4
End lmon rcfg omni enqueue reconfig stage4
Dwn-cvts replayed, VALBLKs dubious
Begin lmon rcfg omni enqueue reconfig stage5
End lmon rcfg omni enqueue reconfig stage5
All grantable enqueues granted
Begin lmon rcfg omni enqueue reconfig stage6
End lmon rcfg omni enqueue reconfig stage6
Sat Jul 20 16:28:59 2013
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Begin lmon rcfg omni enqueue reconfig stage7
End lmon rcfg omni enqueue reconfig stage7
Fix write in gcs resources
Sat Jul 20 16:28:59 2013
Reconfiguration complete (total time 0.1 secs)
Starting background process LCK0
Sat Jul 20 16:29:00 2013
LCK0 started with pid=26, OS id=8431
Sat Jul 20 16:29:01 2013
Instance started by oraagent
ORACLE_BASE from environment = /u01/app/grid
Sat Jul 20 16:29:01 2013
Using default pga_aggregate_limit of 2048 MB
Sat Jul 20 16:29:01 2013
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:9:6} */
Sat Jul 20 16:29:01 2013
NOTE: Diskgroup used for Voting files is:
   RACCRS
Diskgroup with spfile:RACCRS
NOTE: Diskgroup used for OCR is:RACCRS
NOTE: Diskgroups listed in ASM_DISKGROUP are RACFRA RACDATA
NOTE: cache registered group RACCRS 1/0x8AC9E683
NOTE: cache began mount (first) of group RACCRS 1/0x8AC9E683
NOTE: cache registered group RACDATA 2/0x8AF9E684
NOTE: cache began mount (first) of group RACDATA 2/0x8AF9E684
NOTE: cache registered group RACFRA 3/0x8B19E685
NOTE: cache began mount (first) of group RACFRA 3/0x8B19E685
NOTE: Assigning number (1,0) to disk (/dev/asm-crs)
NOTE: Assigning number (2,0) to disk (/dev/asm-data)
NOTE: Assigning number (3,0) to disk (/dev/asm-fra)
Sat Jul 20 16:29:08 2013
ERROR: GMON found another heartbeat for grp 1 (RACCRS)
Sat Jul 20 16:29:08 2013
ERROR: GMON could not check any PST heartbeat (grp 1)
Sat Jul 20 16:29:08 2013
NOTE: cache dismounting (clean) group 1/0x8AC9E683 (RACCRS)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8433, image: oracle@12crac2.luocs.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x8AC9E683 (RACCRS)
NOTE: cache ending mount (fail) of group RACCRS number=1 incarn=0x8ac9e683
NOTE: cache deleting context for group RACCRS 1/0x8ac9e683
Sat Jul 20 16:29:08 2013
GMON dismounting group 1 at 5 for pid 7, osid 8433
Sat Jul 20 16:29:08 2013
NOTE: Disk RACCRS_0000 in mode 0x8 marked for de-assignment
ERROR: diskgroup RACCRS was not mounted
Sat Jul 20 16:29:08 2013
ERROR: GMON found another heartbeat for grp 2 (RACDATA)
Sat Jul 20 16:29:08 2013
ERROR: GMON could not check any PST heartbeat (grp 2)
Sat Jul 20 16:29:08 2013
NOTE: cache dismounting (clean) group 2/0x8AF9E684 (RACDATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8433, image: oracle@12crac2.luocs.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x8AF9E684 (RACDATA)
NOTE: cache ending mount (fail) of group RACDATA number=2 incarn=0x8af9e684
NOTE: cache deleting context for group RACDATA 2/0x8af9e684
Sat Jul 20 16:29:08 2013
GMON dismounting group 2 at 7 for pid 7, osid 8433
Sat Jul 20 16:29:08 2013
NOTE: Disk RACDATA_0000 in mode 0x8 marked for de-assignment
ERROR: diskgroup RACDATA was not mounted
Sat Jul 20 16:29:08 2013
ERROR: GMON found another heartbeat for grp 3 (RACFRA)
Sat Jul 20 16:29:08 2013
ERROR: GMON could not check any PST heartbeat (grp 3)
Sat Jul 20 16:29:08 2013
NOTE: cache dismounting (clean) group 3/0x8B19E685 (RACFRA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8433, image: oracle@12crac2.luocs.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 3/0x8B19E685 (RACFRA)
NOTE: cache ending mount (fail) of group RACFRA number=3 incarn=0x8b19e685
NOTE: cache deleting context for group RACFRA 3/0x8b19e685
Sat Jul 20 16:29:08 2013
GMON dismounting group 3 at 9 for pid 7, osid 8433
Sat Jul 20 16:29:08 2013
NOTE: Disk RACFRA_0000 in mode 0x8 marked for de-assignment
ERROR: diskgroup RACFRA was not mounted
Sat Jul 20 16:29:08 2013
WARNING: Disk Group RACCRS containing spfile for this instance is not mounted
Sat Jul 20 16:29:08 2013
WARNING: Disk Group RACCRS containing configured OCR is not mounted
Sat Jul 20 16:29:08 2013
WARNING: Disk Group RACCRS containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "RACFRA" cannot be mounted
ORA-15003: diskgroup "RACFRA" already mounted in another lock name space
ORA-15017: diskgroup "RACDATA" cannot be mounted
ORA-15003: diskgroup "RACDATA" already mounted in another lock name space
ORA-15017: diskgroup "RACCRS" cannot be mounted
ORA-15003: diskgroup "RACCRS" already mounted in another lock name space
Sat Jul 20 16:29:08 2013
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:9:6} */
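The ORA-15003 errors are expected here: node 1 still has all three disk groups mounted, and the restarted ASM instance on node 2 is no longer a cluster member, so GMON sees a foreign PST heartbeat on every group. A quick way to confirm that nothing is mounted on node 2, sketched here assuming the grid user's environment points at the +ASM2 instance:

[grid@12crac2 ~]$ sqlplus -s / as sysasm <<'EOF'
-- on the evicted node every group should report DISMOUNTED
select name, state from v$asm_diskgroup;
EOF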
Database instance alert log on node 2:
Sat Jul 20 16:22:46 2013
SKGXP: ospid 5480: network interface with IP address 169.254.171.71 no longer running (check cable)
SKGXP: ospid 5480: network interface with IP address 169.254.171.71 is DOWN
Sat Jul 20 16:23:08 2013
NOTE: ASMB terminating
Sat Jul 20 16:23:08 2013
Errors in file /u01/app/oracle/diag/rdbms/luocs12c/luocs12c2/trace/luocs12c2_asmb_5556.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 149 Serial number: 15
Sat Jul 20 16:23:08 2013
Errors in file /u01/app/oracle/diag/rdbms/luocs12c/luocs12c2/trace/luocs12c2_asmb_5556.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 149 Serial number: 15
USER (ospid: 5556): terminating the instance due to error 15064
Sat Jul 20 16:23:10 2013
Instance terminated by USER, pid = 5556
Cluster alert log on node 1:
2013-07-20 16:22:51.478:
[cssd(2195)]CRS-1612:Network communication with node 12crac2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.230 seconds
2013-07-20 16:22:58.484:
[cssd(2195)]CRS-1611:Network communication with node 12crac2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.230 seconds
2013-07-20 16:23:03.487:
[cssd(2195)]CRS-1610:Network communication with node 12crac2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.220 seconds
2013-07-20 16:23:05.711:
[cssd(2195)]CRS-1607:Node 12crac2 is being evicted in cluster incarnation 269802665; details at (:CSSNM00007:) in /u01/app/12.1.0/grid/log/12crac1/cssd/ocssd.log.
2013-07-20 16:23:08.780:
[cssd(2195)]CRS-1625:Node 12crac2, number 2, was manually shut down
2013-07-20 16:23:08.847:
[cssd(2195)]CRS-1601:CSSD Reconfiguration complete. Active nodes are 12crac1 .
2013-07-20 16:23:09.128:
[crsd(2535)]CRS-5504:Node down event reported for node '12crac2'.
2013-07-20 16:23:20.404:
[crsd(2535)]CRS-2773:Server '12crac2' has been removed from pool 'Generic'.
2013-07-20 16:23:20.425:
[crsd(2535)]CRS-2773:Server '12crac2' has been removed from pool 'ora.luocs12c'.
The logs above show that node 2 has been evicted from the cluster. The RAC status at this point:
[grid@12crac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       12crac1                  STABLE
ora.RACCRS.dg
               ONLINE  ONLINE       12crac1                  STABLE
ora.RACDATA.dg
               ONLINE  ONLINE       12crac1                  STABLE
ora.RACFRA.dg
               ONLINE  ONLINE       12crac1                  STABLE
ora.asm
               ONLINE  ONLINE       12crac1                  Started,STABLE
ora.net1.network
               ONLINE  ONLINE       12crac1                  STABLE
ora.ons
               ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12crac1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.12crac2.vip
      1        ONLINE  INTERMEDIATE 12crac1                  FAILED OVER,STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12crac1                  169.254.88.173 192.1
                                                             68.80.150,STABLE
ora.cvu
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.luocs12c.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------
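Cluster membership can also be cross-checked with olsnodes; illustrative output for this point in the test, not captured from the system:

[grid@12crac1 ~]$ olsnodes -s
12crac1 Active
12crac2 Inactive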
Next, recover node 2. There are two options: reboot the node 2 server manually, or bring the private interface eth1 back up and then restart the clusterware stack.
On node 2:
[root@12crac2 ~]# ifup eth1
[root@12crac2 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
[root@12crac2 bin]# ./crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on '12crac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on '12crac2'
CRS-2673: Attempting to stop 'ora.crf' on '12crac2'
CRS-2673: Attempting to stop 'ora.ctssd' on '12crac2'
CRS-2673: Attempting to stop 'ora.evmd' on '12crac2'
CRS-2673: Attempting to stop 'ora.asm' on '12crac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on '12crac2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on '12crac2'
CRS-2677: Stop of 'ora.drivers.acfs' on '12crac2' succeeded
CRS-2677: Stop of 'ora.crf' on '12crac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.gpnpd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.evmd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on '12crac2' succeeded
CRS-2677: Stop of 'ora.asm' on '12crac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on '12crac2'
CRS-2677: Stop of 'ora.cssd' on '12crac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on '12crac2'
CRS-2677: Stop of 'ora.gipcd' on '12crac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on '12crac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@12crac2 bin]# ./crsctl start has
CRS-4123: Oracle High Availability Services has been started.
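After HAS starts it takes a short while for the whole stack to come up. Before checking resources, the daemons on both nodes can be verified with crsctl check cluster -all; the output below is what a healthy cluster is expected to show:

[root@12crac2 bin]# ./crsctl check cluster -all
**************************************************************
12crac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12crac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************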
With that, the RAC is back to normal:
[grid@12crac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACCRS.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACDATA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.RACFRA.dg
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.asm
               ONLINE  ONLINE       12crac1                  Started,STABLE
               ONLINE  ONLINE       12crac2                  Started,STABLE
ora.net1.network
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
ora.ons
               ONLINE  ONLINE       12crac1                  STABLE
               ONLINE  ONLINE       12crac2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12crac1.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.12crac2.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12crac1                  169.254.88.173 192.1
                                                             68.80.150,STABLE
ora.cvu
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.luocs12c.db
      1        ONLINE  ONLINE       12crac2                  Open,STABLE
      2        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12crac1                  Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12crac2                  STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12crac1                  STABLE
--------------------------------------------------------------------------------