[Big Data Study Notes] 4. ZooKeeper: A Powerful Tool for Coordinating Distributed Services

Programmer Article Station (程序员文章站), 2022-08-05 23:52:17


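I originally planned to use this installment to cover setting up a highly available distributed Hadoop environment. Halfway through writing it, I realized that ZooKeeper needs an introduction first.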

ZooKeeper concepts

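ZooKeeper provides coordination services for distributed applications, and ZooKeeper itself is distributed: it runs on at least three machines, which together form an ensemble that works by quorum (majority agreement), much like a theater troupe. The troupe has a director, which is the leader role; everyone else is a follower. Every member keeps the same information in their head (ZooKeeper holds its data in memory) and stays in sync with the leader, so a client can connect to any one of the servers. Each member has an ID number, myid. If the leader goes down, the remaining members elect a new leader so that the service keeps running.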


1. Installing ZooKeeper

Installing ZooKeeper is simple: unpack the archive and set the environment variables.

[root@hadoop100 bin]# tar -zxvf /opt/software/zookeeper-3.4.10.tar.gz -C /opt/modules/

Open /etc/profile and add the following environment variables:

#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin

#HADOOP_HOME
export HADOOP_HOME=/opt/modules/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

#ZOOKEEPER
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Then run source /etc/profile so the change takes effect. Remember to sync the change to the whole cluster.

[root@hadoop100 bin]# xsync /etc/profile
[root@hadoop100 bin]# xcall source /etc/profile
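
A quick sanity check can catch a mistyped PATH entry before it causes confusing "command not found" errors later. The sketch below is only an illustration: path_contains is a hypothetical helper name, not something shipped with ZooKeeper, and the directory is the install location used in this article.

```shell
#!/bin/sh
# path_contains DIR [PATHLIST]: succeed if DIR is one of the colon-separated
# entries of PATHLIST (defaults to the current $PATH).
# Hypothetical helper for illustration only.
path_contains() {
    dir=$1
    list=${2-$PATH}
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example using the install location from this article.
ZK_BIN=/opt/modules/zookeeper-3.4.10/bin
if path_contains "$ZK_BIN"; then
    echo "OK: $ZK_BIN is on PATH"
else
    echo "MISSING: $ZK_BIN is not on PATH"
fi
```

Wrapping the list in colons lets the case pattern match whole entries only, so a relative entry such as zookeeper_home/bin is not mistaken for the real directory.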

2. Configuring ZooKeeper

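1. ZooKeeper needs a data directory to store snapshots of its in-memory database and its logs. Create it, then edit zoo.cfg. The unpacked distribution ships with /opt/modules/zookeeper-3.4.10/conf/zoo_sample.cfg; copy or rename it to zoo.cfg and point dataDir at the data directory.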

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/modules/zookeeper-3.4.10/zkdata
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
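
The copy-and-edit step described above can also be scripted. This is a minimal sketch rather than the way the author did it: make_zoo_cfg is a hypothetical function name, and it assumes the sample file contains a line starting with dataDir= and that the install path contains no "|" or "&" characters (they would confuse the sed expression).

```shell
#!/bin/sh
# make_zoo_cfg ZK_HOME: create ZK_HOME/zkdata and derive conf/zoo.cfg from
# conf/zoo_sample.cfg with dataDir pointing at the new directory.
# Hypothetical helper for illustration only.
make_zoo_cfg() {
    zk_home=$1
    mkdir -p "$zk_home/zkdata" || return 1
    # Rewrite only the dataDir line; every other line is copied unchanged.
    sed "s|^dataDir=.*|dataDir=$zk_home/zkdata|" \
        "$zk_home/conf/zoo_sample.cfg" > "$zk_home/conf/zoo.cfg"
}

# Usage (path taken from this article):
#   make_zoo_cfg /opt/modules/zookeeper-3.4.10
```

Writing to a new file instead of editing in place keeps zoo_sample.cfg untouched, so the step can be repeated safely.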

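A ZooKeeper ensemble should have an odd number of servers, and at least three. Decisions need a majority, so an even-sized ensemble tolerates no more failures than the odd-sized one just below it.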
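
The arithmetic behind the odd-number advice is easy to check. The sketch below assumes the default majority quorum; quorum_size and tolerated_failures are illustrative names, not ZooKeeper commands.

```shell
#!/bin/sh
# quorum_size N: smallest number of servers that forms a strict majority of N.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# tolerated_failures N: servers that can be lost while a majority remains.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5 6; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 3 or 4 servers one failure is tolerated, and with 5 or 6 servers two are, so the extra even-numbered machine adds no fault tolerance.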

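# Number of ticks a server may take to connect to the leader; a server that has
# not connected within this many ticks is considered dead
initLimit = 5
# Maximum number of ticks a server may lag behind while syncing with the leader;
# beyond this it is considered to have fallen behind
syncLimit = 2
server.100=hadoop100:2888:3888
server.101=hadoop101:2888:3888
server.102=hadoop102:2888:3888
server.103=hadoop103:2888:3888
server.104=hadoop104:2888:3888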

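Each server entry has this format:

server.A=B:C:D
A is a number that identifies the server;
B is the server's hostname or IP address;
C is the port this server uses to exchange information with the leader of the ensemble;
D is the port the servers use to talk to one another during leader election, for example when the current leader goes down and a new one has to be chosen.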
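
To make the four fields concrete, here is a small sketch that splits one entry using only POSIX parameter expansion. parse_server_line is a hypothetical helper; it assumes the plain server.A=B:C:D form shown above and does not handle IPv6 literals or an optional trailing role suffix.

```shell
#!/bin/sh
# parse_server_line LINE: split "server.A=B:C:D" into its four fields.
# Hypothetical helper for illustration only.
parse_server_line() {
    line=$1
    id=${line#server.}; id=${id%%=*}   # A: server id, must match myid
    rest=${line#*=}                    # B:C:D
    host=${rest%%:*}                   # B: hostname or IP
    ports=${rest#*:}                   # C:D
    peer_port=${ports%%:*}             # C: port for talking to the leader
    election_port=${ports#*:}          # D: leader-election port
    echo "id=$id host=$host peer_port=$peer_port election_port=$election_port"
}

parse_server_line "server.102=hadoop102:2888:3888"
# prints: id=102 host=hadoop102 peer_port=2888 election_port=3888
```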

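Don't forget to distribute the changes to the cluster once you are done. ZooKeeper itself is distributed, so first copy the relevant files to the other machines in the cluster.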

[root@hadoop100 modules]# xsync zookeeper-3.4.10/

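Then give each machine its own identity: create a myid file in the data directory whose content is that machine's server number from the configuration file above.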

[root@hadoop100 zookeeper-3.4.10]# cd zkdata/
[root@hadoop100 zkdata]# echo 100 > myid

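On the other machines in the cluster, change the content of myid so that it matches that machine's number in the configuration file. There is no shortcut at this point: log in to each machine in turn and create the myid file in its data directory.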

[root@hadoop100 zkdata]# ssh hadoop101
Last login: Thu Sep 19 14:10:35 2019 from gateway
[root@hadoop101 ~]# echo 101 > /opt/modules/zookeeper-3.4.10/zkdata/myid
[root@hadoop101 ~]# exit
logout
Connection to hadoop101 closed.
[root@hadoop100 zkdata]# ssh hadoop102
Last login: Tue Sep 17 13:26:48 2019 from hadoop100
[root@hadoop102 ~]# echo 102 > /opt/modules/zookeeper-3.4.10/zkdata/myid
[root@hadoop102 ~]# exit
logout
Connection to hadoop102 closed.
[root@hadoop100 zkdata]# ssh hadoop103
Last login: Tue Sep 17 13:17:00 2019 from hadoop100
[root@hadoop103 ~]# echo 103 > /opt/modules/zookeeper-3.4.10/zkdata/myid
[root@hadoop103 ~]# exit
logout
Connection to hadoop103 closed.
[root@hadoop100 zkdata]# ssh hadoop104
Last login: Tue Sep 17 11:04:38 2019 from hadoop100
[root@hadoop104 ~]# echo 104 > /opt/modules/zookeeper-3.4.10/zkdata/myid
[root@hadoop104 ~]# exit
logout
Connection to hadoop104 closed.
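
Because the host names in this cluster happen to end in the server number, the repetitive logins above could be replaced by a loop. The following is a dry-run sketch under that naming assumption: it only prints the ssh commands, and myid_for_host and write_myid_cmd are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: hosts are named hadoopNNN and NNN is the id used in zoo.cfg,
# as in this article. ZK_DATA is the dataDir from zoo.cfg.
ZK_DATA=/opt/modules/zookeeper-3.4.10/zkdata

# myid_for_host HOST: strip the "hadoop" prefix, e.g. hadoop103 -> 103.
myid_for_host() { echo "${1#hadoop}"; }

# write_myid_cmd HOST: print the ssh command that would write myid on HOST.
write_myid_cmd() {
    echo "ssh $1 \"echo $(myid_for_host "$1") > $ZK_DATA/myid\""
}

# Dry run: print the commands instead of executing them.
for h in hadoop100 hadoop101 hadoop102 hadoop103 hadoop104; do
    write_myid_cmd "$h"
done
```

After reviewing the printed commands, piping the output to sh would run them, provided passwordless ssh between the nodes is already set up.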

Check that everything is correct:

[root@hadoop100 bin]# xcall cat /opt/modules/zookeeper-3.4.10/zkdata/myid
---------running at localhost--------
100
---------running at hadoop101-------
101
---------running at hadoop102-------
102
---------running at hadoop103-------
103
---------running at hadoop104-------
104
[root@hadoop100 bin]#

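The basic configuration is done and we are ready to start. The ZooKeeper service has to be started on every machine in the cluster. I used xcall, the helper script introduced in an earlier installment. (I later found this approach unreliable: it reported the service as started when it actually was not.)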

[root@hadoop100 zkdata]# xcall /opt/modules/zookeeper-3.4.10/bin/zkServer.sh start
---------running at localhost--------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop101-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop102-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop103-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop104-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop100 zkdata]#

Troubleshooting: Error contacting service. It is probably not running.

Check the running status. Uh-oh, why is nothing running?

[root@hadoop100 bin]# xcall /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
---------running at localhost--------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop101-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop102-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop103-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop104-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
[root@hadoop100 bin]#

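It turned out that logging in to each machine over ssh and starting the service there works; perhaps xcall is not always reliable. One tip: zkServer.sh start-foreground shows the startup process in detail, which makes troubleshooting easier.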

[root@hadoop101 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh start-foreground
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
2019-09-19 14:52:29,093 [myid:] - INFO  [main:QuorumPeerConfig@134] - Reading configuration from: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
2019-09-19 14:52:29,122 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop104 to address: hadoop104/192.168.56.104
2019-09-19 14:52:29,123 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop103 to address: hadoop103/192.168.56.103
2019-09-19 14:52:29,123 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop102 to address: hadoop102/192.168.56.102
2019-09-19 14:52:29,124 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop101 to address: hadoop101/192.168.56.101
2019-09-19 14:52:29,124 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop100 to address: hadoop100/192.168.56.100
2019-09-19 14:52:29,124 [myid:] - INFO  [main:QuorumPeerConfig@396] - Defaulting to majority quorums
2019-09-19 14:52:29,134 [myid:101] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2019-09-19 14:52:29,135 [myid:101] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2019-09-19 14:52:29,135 [myid:101] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-09-19 14:52:29,150 [myid:101] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2019-09-19 14:52:29,171 [myid:101] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-09-19 14:52:29,172 [myid:101] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:130)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
[root@hadoop101 ~]#

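The BindException (address already in use) in the log above means that port 2181 was already taken on hadoop101, which usually indicates that an earlier instance was still running there. If jps shows QuorumPeerMain, the ZooKeeper process is up.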

[root@hadoop100 bin]# jps
1885 QuorumPeerMain
2029 Jps

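After logging in to each server over ssh, starting the service, and checking the status, I can see that in my cluster hadoop102 is the leader and the others are followers: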

[root@hadoop100 bin]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop100 bin]# ssh hadoop101
Last login: Thu Sep 19 15:04:12 2019 from hadoop100
[root@hadoop101 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop101 ~]# exit
logout
Connection to hadoop101 closed.
[root@hadoop100 bin]# ssh hadoop102
Last login: Thu Sep 19 15:04:48 2019 from hadoop100
[root@hadoop102 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
[root@hadoop102 ~]# exit
logout
Connection to hadoop102 closed.
[root@hadoop100 bin]# ssh hadoop103
Last login: Thu Sep 19 15:05:07 2019 from hadoop100
[root@hadoop103 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop103 ~]# exit
logout
Connection to hadoop103 closed.
[root@hadoop100 bin]# ssh hadoop104
Last login: Thu Sep 19 15:05:51 2019 from hadoop100
[root@hadoop104 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop104 ~]# exit
logout
Connection to hadoop104 closed.
[root@hadoop100 bin]#
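
Reading the role out of each status report by eye gets tedious with five nodes. The sketch below shows one possible way to reduce a status report to a single word; zk_mode is a hypothetical helper, and it relies only on the "Mode:" line visible in the output above.

```shell
#!/bin/sh
# zk_mode: read `zkServer.sh status` output on stdin and print the role
# (leader, follower, ...) or "down" when no "Mode:" line is present.
# Hypothetical helper for illustration only.
zk_mode() {
    mode=$(sed -n 's/^[Mm]ode: *//p' | head -n 1)
    echo "${mode:-down}"
}

# Example: classify captured status output.
printf 'ZooKeeper JMX enabled by default\nMode: leader\n' | zk_mode
printf 'Error contacting service. It is probably not running.\n' | zk_mode
```

In a real check the input would come from something like ssh HOST /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status 2>&1, run once per host; that wiring is left out here because it depends on the ssh setup.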

That's it: my ZooKeeper cluster is now up and running.

A side note

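Virtual machines are fine for study and experiments, but for serious work you will want the cloud, for example Alibaba Cloud (阿里云). If you need it, you can use my link below, which comes with a discount and cashback.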

https://promotion.aliyun.com/ntms/yunparter/invite.html?usercode=vltv9frd
