
Hadoop HA cluster: NameNode cannot automatically switch to active

程序员文章站 2022-03-24 10:21:11

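I set up a Hadoop HA cluster and manually ran kill -9 on the NameNode on machine 1, but the NameNode on machine 2 did not automatically become active. The log showed the following error: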

2018-10-31 14:11:02,098 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2018-10-31 14:11:02,098 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2018-10-31 14:11:02,099 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to create SSH session
com.jcraft.jsch.JSchException: java.io.FileNotFoundException: /home/root/.ssh/id_rsa (No such file or directory)
at com.jcraft.jsch.KeyPair.load(KeyPair.java:543)
at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:40)
at com.jcraft.jsch.JSch.addIdentity(JSch.java:407)
at com.jcraft.jsch.JSch.addIdentity(JSch.java:367)
at org.apache.hadoop.ha.SshFenceByTcpPort.createSession(SshFenceByTcpPort.java:122)
at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:91)
at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:532)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.io.FileNotFoundException: /home/root/.ssh/id_rsa (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at com.jcraft.jsch.Util.fromFile(Util.java:508)
at com.jcraft.jsch.KeyPair.load(KeyPair.java:540)
... 15 more
2018-10-31 14:11:02,099 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2018-10-31 14:11:02,099 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2018-10-31 14:11:02,099 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop1/192.168.150.151:9000
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2018-10-31 14:11:02,099 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2018-10-31 14:11:02,102 INFO org.apache.zookeeper.ZooKeeper: Session: 0x166c8a424df00fb closed
2018-10-31 14:11:03,102 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@14671dfe
2018-10-31 14:11:03,103 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop2/192.168.150.152:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-31 14:11:03,104 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop2/192.168.150.152:2181, initiating session
2018-10-31 14:11:03,106 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop2/192.168.150.152:2181, sessionid = 0x266c8a421c9010f, negotiated timeout = 5000
2018-10-31 14:11:03,106 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2018-10-31 14:11:03,107 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2018-10-31 14:11:03,107 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2018-10-31 14:11:03,108 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a086d796861646f6f7012036e6e311a076861646f6f703120a84628d33e
2018-10-31 14:11:03,108 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop1/192.168.150.151:9000

The key line is this one:

com.jcraft.jsch.JSchException: java.io.FileNotFoundException: /home/root/.ssh/id_rsa (No such file or directory)

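I tested SSH between the machines, and all of them could log in to each other without a password. I then realized that the sshfence setting in hdfs-site.xml is used to SSH into the previous active NameNode and kill it off, which guarantees that only one NameNode is active. dfs.ha.fencing.ssh.private-key-files specifies where the local private key file is stored. My private key path was misconfigured, so fencing failed. As the log shows, the ZKFailoverController on the standby side could not fence the old active, so the standby NameNode could not be sure it was the only one alive and did not dare to transition to active.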

My private key is at /root/.ssh/id_rsa, but I had previously configured it as /home/root/.ssh/id_rsa.
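
A misconfiguration like this can be caught before a failover is ever needed. The following is a minimal sketch, not part of Hadoop: it parses an hdfs-site.xml, reads dfs.ha.fencing.ssh.private-key-files (treated here as a comma-separated list), and reports any listed file that does not exist on the local machine. The function names and the sample configuration are assumptions made for illustration; the sample simply uses the corrected path from this post.

```python
import os
import xml.etree.ElementTree as ET

KEY_PROP = "dfs.ha.fencing.ssh.private-key-files"

# Illustrative fragment only; a real hdfs-site.xml contains many more properties.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
</configuration>"""


def configured_key_files(hdfs_site_xml):
    """Return the paths listed in dfs.ha.fencing.ssh.private-key-files."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == KEY_PROP:
            value = prop.findtext("value", "")
            # Split the comma-separated value and drop empty entries.
            return [p.strip() for p in value.split(",") if p.strip()]
    return []


def missing_key_files(hdfs_site_xml):
    """Return the configured key paths that are not regular files on this host."""
    return [p for p in configured_key_files(hdfs_site_xml) if not os.path.isfile(p)]


if __name__ == "__main__":
    for path in missing_key_files(SAMPLE_HDFS_SITE):
        print("missing private key:", path)
```

To check a real cluster, read the actual hdfs-site.xml into a string and pass it to missing_key_files, running the script on each NameNode host as the user that runs the ZKFC, since the key file must be readable there.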