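mesos-master fails at startup: Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
Background
Each master node has Java 8, Mesos, and ZooKeeper installed. All I wanted to do was start mesos-master on every master node, but I ran into the error below and spent several hours on it, so I am writing it down.
mesos-master nodes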
master-1 192.168.137.20
master-2 192.168.137.21
master-3 192.168.137.22
Version information
JDK 1.8
zookeeper 3.4.10
mesos 1.9.0
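Prerequisite: ZooKeeper is already running
Run the following commands from the bin directory of the ZooKeeper installation.
Start: ./zkServer.sh start
Check status: ./zkServer.sh status
master-1 is the leader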
[root@master-1 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
master-2 is a follower
[root@master-2 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
master-3 is also a follower
[root@master-3 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
PS: you can also check ZooKeeper's status with echo stat | nc 192.168.137.22 2181
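To check all three nodes in one pass, the nc one-liner above can be wrapped in a small loop. This is only a sketch: the helper names zk_mode and zk_check_all are my own, and it assumes nc is installed (with support for the -w timeout option) and that the stat command is answered on port 2181, as it is in the setup above.

```shell
# zk_mode: read the output of ZooKeeper's "stat" command on stdin
# and print only the role reported on the "Mode:" line.
zk_mode() {
    awk '/^Mode:/ { print $2 }'
}

# zk_check_all: query each node and print "<ip> <mode>".
# Hypothetical helper; the IPs are the three masters listed in this post.
zk_check_all() {
    for ip in 192.168.137.20 192.168.137.21 192.168.137.22; do
        # -w 2: give up after 2 seconds if the node does not answer
        mode=$(echo stat | nc -w 2 "$ip" 2181 | zk_mode)
        echo "$ip ${mode:-unreachable}"
    done
}

# Example usage:
#   zk_check_all
```

A healthy three-node ensemble should report exactly one leader and two followers.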
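Starting Mesos in cluster mode
I installed Mesos by following the official documentation, so I will not go into detail here. The official site has a simple example that starts a single-node Mesos, and that worked. But when I started it in cluster mode, I used this command: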
/root/mesos-1.9.0/build/bin/mesos-master.sh --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --ip=0.0.0.0 --log_dir=/var/log/mesos/master --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master
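--zk: addresses of the ZooKeeper ensemble
--log_dir: directory where the logs are written
--quorum: the size of the quorum of replicas when using the replicated_log based registry. The value must be more than half of the total number of masters; I have 3 masters, so I set it to 2.
--work_dir: working directory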
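The quorum rule above (more than half of the masters) is simple integer arithmetic. A minimal sketch, with a hypothetical helper name majority_quorum that is not part of Mesos:

```shell
# majority_quorum: smallest integer strictly greater than half of N masters.
# Integer division rounds down, so N/2 + 1 is always a strict majority.
majority_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Example usage:
#   majority_quorum 3    (the value to pass as --quorum for 3 masters)
```

For the 3 masters in this post this gives 2, which matches the --quorum=2 in the command.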
The error log is as follows
F1211 21:35:52.226524 10404 master.cpp:1655] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
@ 0x7fb632ba9290 google::LogMessage::Fail()
@ 0x7fb632ba91ec google::LogMessage::SendToLog()
@ 0x7fb632ba8ba8 google::LogMessage::Flush()
@ 0x7fb632babe90 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fb6301bd0d5 mesos::internal::master::fail()
@ 0x7fb6302e5bec _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvIS1_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
@ 0x7fb6302b8be8 std::_Bind<>::operator()<>()
@ 0x7fb63028139d _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENKUlOSE_S6_E_clESK_S6_
@ 0x7fb63035514e _ZN5cpp176invokeIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS8_EPKcSt12_PlaceholderILi1EEEEvEERKS4_OT_NS4_6PreferEEUlOSG_S8_E_ISG_S8_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSO_DpOSP_
@ 0x7fb63034a18f _ZN6lambda8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS9_EPKcSt12_PlaceholderILi1EEEEvEERKS5_OT_NS5_6PreferEEUlOSH_S9_E_JSH_SF_EE13invoke_expandISO_St5tupleIJSH_SF_EESR_IJS9_EEJLm0ELm1EEEEDTcl6invokecl7forwardISK_Efp_Espcl6expandcl3getIXT2_EEcl7forwardIT0_Efp0_EEcl7forwardIT1_Efp2_EEEESL_OSU_N5cpp1416integer_sequenceImJXspT2_EEEEOSV_
@ 0x7fb630343710 _ZNO6lambda8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS9_EPKcSt12_PlaceholderILi1EEEEvEERKS5_OT_NS5_6PreferEEUlOSH_S9_E_JSH_SF_EEclIJS9_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOSU_
@ 0x7fb63033f18f _ZN5cpp176invokeIN6lambda8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsSB_EPKcSt12_PlaceholderILi1EEEEvEERKS7_OT_NS7_6PreferEEUlOSJ_SB_E_ISJ_SH_EEEISB_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSS_DpOST_
@ 0x7fb63033c6c1 _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsSC_EPKcSt12_PlaceholderILi1EEEEvEERKS8_OT_NS8_6PreferEEUlOSK_SC_E_ISK_SI_EEEISC_EEEvOT_DpOT0_
@ 0x7fb63033971e _ZNO6lambda12CallableOnceIFvRKSsEE10CallableFnINS_8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS2_S2_EPKcSt12_PlaceholderILi1EEEEvEERKSB_OT_NSB_6PreferEEUlOSL_S2_E_ISL_SJ_EEEEclES2_
@ 0x55f1e95286e8 _ZNO6lambda12CallableOnceIFvRKSsEEclES2_
@ 0x55f1e9523747 process::internal::run<>()
@ 0x55f1e951cb66 process::Future<>::fail()
@ 0x7fb62fbc87f4 process::Promise<>::fail()
@ 0x7fb6302e0f9d process::internal::thenf<>()
@ 0x7fb63034e385 _ZN5cpp176invokeIPFvON6lambda12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS3_7PromiseIS5_EESt14default_deleteISH_EERKNS4_IS9_EEEJSD_SK_SN_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSQ_DpOSR_
@ 0x7fb630346c26 _ZN6lambda8internal7PartialIPFvONS_12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS3_7PromiseIS5_EESt14default_deleteISH_EERKNS4_IS9_EEEJSD_SK_St12_PlaceholderILi1EEEE13invoke_expandISP_St5tupleIJSD_SK_SR_EESU_IJSN_EEJLm0ELm1ELm2EEEEDTcl6invokecl7forwardIT_Efp_Espcl6expandcl3getIXT2_EEcl7forwardIT0_Efp0_EEcl7forwardIT1_Efp2_EEEEOSX_OSY_N5cpp1416integer_sequenceImJXspT2_EEEEOSZ_
@ 0x7fb6303412ba _ZNO6lambda8internal7PartialIPFvONS_12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS3_7PromiseIS5_EESt14default_deleteISH_EERKNS4_IS9_EEEISD_SK_St12_PlaceholderILi1EEEEclIISN_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImILm0ELm1ELm2EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOSX_
@ 0x7fb63033e2a9 _ZN5cpp176invokeIN6lambda8internal7PartialIPFvONS1_12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS5_7PromiseIS7_EESt14default_deleteISJ_EERKNS6_ISB_EEEISF_SM_St12_PlaceholderILi1EEEEEISP_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSV_DpOSW_
@ 0x7fb63033b6cf lambda::internal::Invoke<>::operator()<>()
@ 0x7fb630337a7a _ZNO6lambda12CallableOnceIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEE10CallableFnINS_8internal7PartialIPFvONS0_IFNS2_I7NothingEERKS5_EEESt10unique_ptrINS1_7PromiseISE_EESt14default_deleteISN_EES8_EISJ_SQ_St12_PlaceholderILi1EEEEEEclES8_
@ 0x7fb630306f00 _ZNO6lambda12CallableOnceIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEEclES8_
@ 0x7fb630402572 process::internal::run<>()
@ 0x7fb6303f5fc1 process::Future<>::fail()
@ 0x7fb63042841f std::_Mem_fn<>::operator()<>()
@ 0x7fb6304265f0 _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
@ 0x7fb630422dc2 std::_Bind<>::operator()<>()
@ 0x7fb63041d411 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENKUlOSG_S9_E_clESM_S9_
Aborted
[root@master-3 master]#
I tried checking the log directory configured in the command, "/var/log/mesos/master", for any errors. The ERROR-level log contained the following:
[root@master-3 master]# cat lt-mesos-master.master-3.root.log.ERROR.20191211-213552.10388
Log file created at: 2019/12/11 21:35:52
Running on machine: master-3
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1211 21:35:52.226524 10404 master.cpp:1655] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
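The header of the log file documents the line format: the first character is the severity (I, W, E or F), followed by the date as mmdd. Based on that, here is a minimal sketch of a filter that keeps only the fatal lines; the function name fatal_lines is hypothetical.

```shell
# fatal_lines: print only FATAL entries from stdin, i.e. lines that start
# with "F", a 4-digit mmdd date, a space and an hh:mm:ss. timestamp.
fatal_lines() {
    grep -E '^F[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.'
}

# Example usage (directory taken from --log_dir in the command above):
#   cat /var/log/mesos/master/*.ERROR.* | fatal_lines
```

This skips the file header lines and any lower-severity entries, which helps when the same search is run over the much larger INFO log.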
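Cause of the error
My first impression from these logs was a connection timeout: was the connection to ZooKeeper timing out? The material I found online pointed to the --ip parameter.
Solution
On master-1, run: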
/root/mesos-1.9.0/build/bin/mesos-master.sh --ip=192.168.137.20 --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --log_dir=/var/log/mesos/master --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master
On master-2, run:
/root/mesos-1.9.0/build/bin/mesos-master.sh --ip=192.168.137.21 --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --log_dir=/var/log/mesos/master --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master
On master-3, run:
/root/mesos-1.9.0/build/bin/mesos-master.sh --ip=192.168.137.22 --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --log_dir=/var/log/mesos/master --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master
In the end, the only change was --ip: instead of 0.0.0.0, its value is the node's own IP address.
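Since the three commands differ only in --ip, they can be generated from one template to avoid copy-and-paste mistakes. A sketch under the assumptions of this post (same paths, cluster name and quorum); master_cmd is a hypothetical helper, not something shipped with Mesos.

```shell
# master_cmd: print the mesos-master command line for the node whose IP
# is passed as the first argument. All other flags are the ones used above.
master_cmd() {
    node_ip=$1
    zk_url="zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos"
    echo "/root/mesos-1.9.0/build/bin/mesos-master.sh" \
        "--ip=$node_ip" \
        "--zk=$zk_url" \
        "--log_dir=/var/log/mesos/master" \
        "--cluster=test-cluster" \
        "--quorum=2" \
        "--work_dir=/var/lib/mesos/master"
}

# Example usage on master-2 (prints the command; run it once it looks right):
#   master_cmd 192.168.137.21
```

Printing the command instead of executing it directly makes it easy to inspect before starting the master.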
To verify the startup, visit port 5050 on a master node
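The same check can be done from a terminal instead of a browser. The sketch below assumes curl is installed and uses the master's /master/health endpoint; the helper names health_url and check_masters are my own.

```shell
# health_url: build the health-check URL for a master IP
# (5050 is the port mentioned above).
health_url() {
    echo "http://$1:5050/master/health"
}

# check_masters: print "<ip> <http status>" for each master in this post.
# curl reports 000 when the connection fails or times out (-m 3 = 3 seconds).
check_masters() {
    for ip in 192.168.137.20 192.168.137.21 192.168.137.22; do
        code=$(curl -s -o /dev/null -m 3 -w '%{http_code}' "$(health_url "$ip")")
        echo "$ip $code"
    done
}

# Example usage:
#   check_masters
```

A status of 200 from each node suggests the master process is up; it does not by itself show which master was elected leader.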
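Addendum: if the web page on port 5050 shows a "failed to connect …" pop-up, a packet capture shows that the page's periodic requests are sent to the hostnames of my virtual machines. Adding those hostnames to the local hosts file fixes it.
The hosts file is located in C:\Windows\System32\drivers\etc ; add the following entries:
# virtual machine configuration start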
192.168.137.20 master-1
192.168.137.21 master-2
192.168.137.22 master-3
# virtual machine configuration end