CRS fails to start after a network cable swap: clssnmSendingThread: sending status msg to all nodes

程序员文章站 2022-03-21 17:29:49
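Before my colleague replaced the network cables, I shut node 2 down cleanly. Once the cables had been swapped and I was told the work was done, I found that node 2 would not come back up no matter what. I went through the logs below and a number of forum posts without solving it. I tried rebooting, unplugging and replugging the cables, checking that the storage was healthy, and remounting the storage. One post suggested that the OCR information might have changed, but I had no earlier backup and did not dig further in that direction.
In the end I still could not fix it, mainly because of my limited skills: without having pinned down the actual problem, I did not dare change things at random.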
20xx-12-16 19:01:05.792: [ cssd][3786819328]clssnmsendingthread: sending join msg to all nodes
20xx-12-16 19:01:05.792: [ cssd][3786819328]clssnmsendingthread: sent 5 join msgs to all nodes
20xx-12-16 19:01:06.295: [gipchalo][3811858176] gipchalowerprocessnode: no valid interfaces found to node for 7286464 ms, node 0x7fecd0028450 { host 'myrac1', haname 'css_myrac-cluster', srcluid fac66ea4-f1a960af, dstluid 00000000-00000000 numinf 0, contigseq 0, lastack 0, lastvalidack 0, sendseq [249 : 249], createtime 7037424, sentregister 1, localmonitor 1, flags 0x4 }
20xx-12-16 19:01:06.303: [ cssd][3789973248]clssgmwaitoneventvalue: after cminfo state val 3, eval 1 waited 0
20xx-12-16 19:01:06.420: [ cssd][3799754496]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618800, lats 7286584, lastseqno 211618797, uniqueness 1576485880, timestamp 1576494065/8540734
20xx-12-16 19:01:06.435: [ cssd][3804591872]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618802, lats 7286594, lastseqno 211618799, uniqueness 1576485880, timestamp 1576494066/8541524
20xx-12-16 19:01:07.304: [ cssd][3789973248]clssgmwaitoneventvalue: after cminfo state val 3, eval 1 waited 0
20xx-12-16 19:01:07.421: [ cssd][3799754496]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618803, lats 7287584, lastseqno 211618800, uniqueness 1576485880, timestamp 1576494066/8541734
20xx-12-16 19:01:07.435: [ cssd][3804591872]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618805, lats 7287604, lastseqno 211618802, uniqueness 1576485880, timestamp 1576494067/8542524
20xx-12-16 19:01:08.304: [ cssd][3789973248]clssgmwaitoneventvalue: after cminfo state val 3, eval 1 waited 0
20xx-12-16 19:01:08.422: [ cssd][3799754496]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618806, lats 7288584, lastseqno 211618803, uniqueness 1576485880, timestamp 1576494067/8542734
20xx-12-16 19:01:08.436: [ cssd][3804591872]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618808, lats 7288604, lastseqno 211618805, uniqueness 1576485880, timestamp 1576494068/8543524
20xx-12-16 19:01:09.304: [ cssd][3789973248]clssgmwaitoneventvalue: after cminfo state val 3, eval 1 waited 0
20xx-12-16 19:01:09.422: [ cssd][3799754496]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618809, lats 7289584, lastseqno 211618806, uniqueness 1576485880, timestamp 1576494068/8543744
20xx-12-16 19:01:09.437: [ cssd][3804591872]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618811, lats 7289604, lastseqno 211618808, uniqueness 1576485880, timestamp 1576494069/8544524
20xx-12-16 19:01:09.803: [ cssd][3785242368]clssnmrcfgmgrthread: local join
20xx-12-16 19:01:09.803: [ cssd][3785242368]clssnmlocaljoinevent: begin on node(2), waittime 193000
20xx-12-16 19:01:09.803: [ cssd][3785242368]clssnmlocaljoinevent: set curtime (7289964) for my node
20xx-12-16 19:01:09.803: [ cssd][3785242368]clssnmlocaljoinevent: scanning 32 nodes
20xx-12-16 19:01:09.803: [ cssd][3785242368]clssnmlocaljoinevent: node myrac1, number 1, is in an existing cluster with disk state 3
20xx-12-16 19:01:09.803: [ cssd][3785242368]clssnmlocaljoinevent: takeover aborted due to cluster member node found on disk
20xx-12-16 19:01:10.305: [ cssd][3789973248]clssgmwaitoneventvalue: after cminfo state val 3, eval 1 waited 0
20xx-12-16 19:01:10.423: [ cssd][3799754496]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618812, lats 7290584, lastseqno 211618809, uniqueness 1576485880, timestamp 1576494069/8544744
20xx-12-16 19:01:10.437: [ cssd][3804591872]clssnmvdhbvalidatencopy: node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, wrtcnt, 211618814, lats 7290604, lastseqno 211618811, uniqueness 1576485880, timestamp 1576494070/8545524
20xx-12-16 19:01:10.794: [ cssd][3786819328]clssnmsendingthread: sending join msg to all nodes
20xx-12-16 19:01:10.794: [ cssd][3786819328]clssnmsendingthread: sent 5 join msgs to all nodes


20xx-12-16 20:36:02.919: [ cssd][2756265728]clssgmupdategrpdata: grock(clsn.onsnetproc.master), commissioner(-1/0)
20xx-12-16 20:36:02.919: [ cssd][2756265728]clssgmhandlegrockrcfgupdate: grock(clsn.onsnetproc.master), updateseq(118), status(0), sendresp(1)
20xx-12-16 20:36:02.920: [ cssd][2756265728]clssgmtestsetlastgrockupdate: grock(clsn.onsnetproc.master), updateseq(118) msgseq(119), lastupdt<0x7fbb58031e10>, ignoreseq(0)
20xx-12-16 20:36:02.920: [ cssd][2756265728]clssgmgrockoptagprocess: request to commission member(1) using key(1) for grock(clsn.onsnetproc.master)
20xx-12-16 20:36:02.920: [ cssd][2756265728]clssgmupdategrpdata: grock(clsn.onsnetproc.master), commissioner(1/1)
20xx-12-16 20:36:02.920: [ cssd][2756265728]clssgmhandlegrockrcfgupdate: grock(clsn.onsnetproc.master), updateseq(119), status(0), sendresp(1)
20xx-12-16 20:36:02.921: [ cssd][2756265728]clssgmtestsetlastgrockupdate: grock(clsn.onsnetproc.master), updateseq(119) msgseq(120), lastupdt<0x7fbb5804d490>, ignoreseq(0)
20xx-12-16 20:36:02.921: [ cssd][2756265728]clssgmupdategrpdata: grock(clsn.onsnetproc.master), private data(2052), incarn(40)
20xx-12-16 20:36:02.921: [ cssd][2756265728]clssgmhandlegrockrcfgupdate: grock(clsn.onsnetproc.master), updateseq(120), status(0), sendresp(1)
20xx-12-16 20:36:02.922: [ cssd][2756265728]clssgmtestsetlastgrockupdate: grock(clsn.onsnetproc.master), updateseq(120) msgseq(121), lastupdt<0x7fbb5803dee0>, ignoreseq(0)
20xx-12-16 20:36:02.922: [ cssd][2756265728]clssgmgrockoptagprocess: request to commission member(-1) using key(1) for grock(clsn.onsnetproc.master)
20xx-12-16 20:36:02.922: [ cssd][2756265728]clssgmupdategrpdata: grock(clsn.onsnetproc.master), commissioner(-1/0)
20xx-12-16 20:36:02.922: [ cssd][2756265728]clssgmhandlegrockrcfgupdate: grock(clsn.onsnetproc.master), updateseq(121), status(0), sendresp(1)
20xx-12-16 20:36:05.064: [ cssd][2753111808]clssnmsendingthread: sending status msg to all nodes
20xx-12-16 20:36:05.064: [ cssd][2753111808]clssnmsendingthread: sent 5 status msgs to all nodes
20xx-12-16 20:36:09.065: [ cssd][2753111808]clssnmsendingthread: sending status msg to all nodes
20xx-12-16 20:36:09.065: [ cssd][2753111808]clssnmsendingthread: sent 4 status msgs to all nodes
20xx-12-16 20:36:14.066: [ cssd][2753111808]clssnmsendingthread: sending status msg to all nodes
...

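Can you tell from the logs that the bond configuration had changed? I did not notice it at the time and could not work it out. In the end my colleague said the bond had been modified! Wasn't the plan just to swap a cable and tidy up the wiring? I suggested changing it back, and sure enough, after a restart everything was back to normal.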
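The logs do not name the bond, but the first excerpt does repeat two symptoms: "has a disk hb, but no network hb" and "no valid interfaces found to node". Both say that node 2 could see myrac1 through the shared disks but not over the network, which points toward the private interconnect rather than the storage or the OCR. The following is a minimal sketch of how those lines could be counted per remote node. It assumes log lines in the format pasted above; the function name `summarize_cssd_log` and the regular expressions are illustrative and not part of any Oracle tool. Lines are lower-cased before matching because the excerpt here is lower-case, while a real ocssd.log may use mixed case.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt in this post (lower-case, as pasted there).
NO_NET_HB = re.compile(r"node \d+, ([^,\s]+), has a disk hb, but no network hb")
NO_IFACE = re.compile(r"no valid interfaces found to node")
HOST = re.compile(r"host '([^']+)'")

# Three lines copied from the excerpt above, split only for readability.
SAMPLE = [
    "20xx-12-16 19:01:06.295: [gipchalo][3811858176] gipchalowerprocessnode: "
    "no valid interfaces found to node for 7286464 ms, node 0x7fecd0028450 "
    "{ host 'myrac1', haname 'css_myrac-cluster', srcluid fac66ea4-f1a960af, "
    "dstluid 00000000-00000000 numinf 0, contigseq 0, lastack 0, lastvalidack 0, "
    "sendseq [249 : 249], createtime 7037424, sentregister 1, localmonitor 1, "
    "flags 0x4 }",
    "20xx-12-16 19:01:06.420: [ cssd][3799754496]clssnmvdhbvalidatencopy: "
    "node 1, myrac1, has a disk hb, but no network hb, dhb has rcfg 471981092, "
    "wrtcnt, 211618800, lats 7286584, lastseqno 211618797, "
    "uniqueness 1576485880, timestamp 1576494065/8540734",
    "20xx-12-16 19:01:10.794: [ cssd][3786819328]clssnmsendingthread: "
    "sending join msg to all nodes",
]


def summarize_cssd_log(lines):
    """Count interconnect-related symptoms per remote host name.

    Returns two Counters: (disk heartbeat without network heartbeat,
    no valid interfaces found). Host names come back lower-cased.
    """
    no_net_hb = Counter()
    no_iface = Counter()
    for line in lines:
        text = line.lower()
        match = NO_NET_HB.search(text)
        if match:
            no_net_hb[match.group(1)] += 1
            continue
        if NO_IFACE.search(text):
            host = HOST.search(text)
            no_iface[host.group(1) if host else "unknown"] += 1
    return no_net_hb, no_iface


if __name__ == "__main__":
    hb, iface = summarize_cssd_log(SAMPLE)
    print("disk hb but no network hb:", dict(hb))
    print("no valid interfaces:", dict(iface))
```

A high count for both symptoms against the same host is a hint to compare the interconnect configuration (interfaces, bonds, cabling) on the two nodes. It does not prove on its own that a bond was changed.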

I tried a blind restart, and it still did not come up...
[root@myrac2 bin]# ./crsctl query crs activeversion
oracle cluster registry initialization failed accessing oracle cluster registry device: proc-26: error while accessing the physical storage
ora-15077: could not locate asm instance serving a required diskgroup

[root@myrac2 bin]# ./ocrcheck
prot-602: failed to retrieve data from the cluster registry
proc-26: error while accessing the physical storage
ora-15077: could not locate asm instance serving a required diskgroup

[grid@myrac2 ~]$ cd /u01/app/11.2.0/grid/bin/
[grid@myrac2 bin]$ srvctl start nodeapps -n myrac2
prcr-1070 : failed to check if resource ora.gsd is registered
cannot communicate with crsd
prcr-1070 : failed to check if resource ora.net1.network is registered
cannot communicate with crsd
prcr-1035 : failed to look up crs resource myrac2 for ora.cluster_vip.type
prcr-1068 : failed to query resources
cannot communicate with crsd
prcr-1070 : failed to check if resource ora.ons is registered
cannot communicate with crsd


[grid@myrac2 bin]$ srvctl start asm -n myrac2
prcr-1070 : failed to check if resource ora.asm is registered
cannot communicate with crsd


[grid@myrac2 bin]$ srvctl start database -d testdb2
prcd-1027 : failed to retrieve database testdb2
prcr-1115 : failed to find entities of type resource that match filters ((name == ora.testdb2.db) && (type == ora.database.type)) and contain attributes version,oracle_home,database_type
cannot communicate with crsd
[grid@myrac2 bin]$

Node 2, with the modified bond, clearly differs from node 1:
[root@myrac2 11.2.0]# service network status
configured devices:
lo bond0 bond1 em1 em2 em3 em4
currently active devices:
lo em1 em2 em3 em4 bond0 bond1
[root@myrac2 11.2.0]#

Node 1:
[root@myrac1 ~]# service network status
configured devices:
lo bond0 em1 em2 em3 em4 idrac
currently active devices:
lo em1 em2 em3 bond0
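The difference between the two outputs is easy to miss by eye, so here is a minimal sketch that parses both `service network status` outputs and lists the devices present on only one node. It assumes the output format shown above; `parse_network_status` and `diff_nodes` are hypothetical helper names, and the two input strings are copied from this post. It compares device names only and says nothing about which interfaces are slaves of which bond.

```python
NODE1 = """\
configured devices:
lo bond0 em1 em2 em3 em4 idrac
currently active devices:
lo em1 em2 em3 bond0
"""

NODE2 = """\
configured devices:
lo bond0 bond1 em1 em2 em3 em4
currently active devices:
lo em1 em2 em3 em4 bond0 bond1
"""


def parse_network_status(output):
    """Parse `service network status` text into configured/active device sets."""
    result = {"configured": set(), "active": set()}
    section = None
    for raw in output.splitlines():
        line = raw.strip().lower()
        if line.startswith("configured devices"):
            section = "configured"
        elif line.startswith("currently active devices"):
            section = "active"
        elif line and section and not line.startswith("["):
            # Device names are whitespace-separated; shell prompts are skipped.
            result[section].update(line.split())
    return result


def diff_nodes(a, b):
    """Return, per section, the devices found only in a and only in b."""
    return {
        section: {
            "only_a": sorted(a[section] - b[section]),
            "only_b": sorted(b[section] - a[section]),
        }
        for section in ("configured", "active")
    }


if __name__ == "__main__":
    diff = diff_nodes(parse_network_status(NODE1), parse_network_status(NODE2))
    for section, sides in diff.items():
        print(section, "only on node 1:", sides["only_a"],
              "only on node 2:", sides["only_b"])
```

With the outputs above, node 2 is the only one with `bond1` configured and active, and the only one with `em4` active, while `idrac` is configured only on node 1.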

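Leaving technical skill aside, what this incident shows is that cooperation between colleagues sometimes matters more. If you are not careful, you end up digging a pit for someone else or falling into one that someone else dug for you.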