Setting Up a Hadoop HA Environment
1. Environment Preparation
I built this cluster on a QingCloud VPC network; for the detailed network setup see the official QingCloud documentation: https://docs.qingcloud.com/product/network/vpc.html
2. Software Preparation
Software | Version |
---|---|
OS | centos6.8 |
java | 1.8.0_191 |
hadoop | hadoop-2.6.0-cdh5.7.0.tar.gz |
zookeeper | zookeeper-3.4.6.tar.gz |
3. Node Planning
Node IP | Hostname | Software | Processes |
---|---|---|---|
192.168.110.2 | hadoop001 | hadoop,zookeeper | NameNode DFSZKFailoverController JournalNode DataNode ResourceManager JobHistoryServer NodeManager QuorumPeerMain |
192.168.110.3 | hadoop002 | hadoop,zookeeper | NameNode DFSZKFailoverController JournalNode DataNode ResourceManager JobHistoryServer NodeManager QuorumPeerMain |
192.168.110.4 | hadoop003 | hadoop,zookeeper | JournalNode DataNode NodeManager QuorumPeerMain |
4. Directory Planning
Name | Path | Notes |
---|---|---|
$HADOOP_HOME | /home/hadoop/app/hadoop-2.6.0-cdh5.7.0 | |
data | /home/hadoop/data | |
logs | /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs | |
hadoop.tmp.dir | /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp | must be created manually, mode 777, owner hadoop:hadoop |
$ZOOKEEPER_HOME | /home/hadoop/app/zookeeper-3.4.6 | |
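The hadoop.tmp.dir entry is the only directory in the table that must be created by hand. A minimal sketch of that setup, rehearsed under a scratch prefix (on the real hosts PREFIX would be empty, the commands would run as root, and a chown to hadoop:hadoop would follow):

```shell
# Rehearse the directory layout from the table under a scratch prefix.
PREFIX="$(mktemp -d)"

mkdir -p "$PREFIX/home/hadoop/data"
mkdir -p "$PREFIX/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp"
chmod 777 "$PREFIX/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp"
# chown -R hadoop:hadoop "$PREFIX/home/hadoop"   # requires root; skipped in the rehearsal

stat -c '%a' "$PREFIX/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp"
```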
5. Passwordless SSH Between Hosts
Log in to the first server and create the hadoop user:
[root@hadoop001 ~]# useradd hadoop
Switch to the hadoop user:
[root@hadoop001 ~]# su - hadoop
In the hadoop user's home directory, generate a key pair:
[hadoop@hadoop001 ~]$ ssh-keygen -t dsa
Press Enter through all the prompts; a .ssh directory with the key files is created in the user's home directory. Perform the same steps on the other two machines, then copy their public keys over to the first machine.
Append the public key to the authorized_keys file:
[hadoop@hadoop001 .ssh]$ pwd
/home/hadoop/.ssh
[hadoop@hadoop001 .ssh]$ cat id_dsa.pub >> authorized_keys
Configure the hosts file (switch to the root user):
[root@hadoop001 ~]# vi /etc/hosts
Add the following entries:
192.168.110.2 hadoop001
192.168.110.3 hadoop002
192.168.110.4 hadoop003
Then copy the authorized_keys file and the hosts file to the other two servers:
[hadoop@hadoop001 .ssh]$ scp authorized_keys hadoop@hadoop002:/home/hadoop/.ssh/
[hadoop@hadoop001 .ssh]$ scp authorized_keys hadoop@hadoop003:/home/hadoop/.ssh/
[root@hadoop001 ~]# scp /etc/hosts root@hadoop002:/etc/
[root@hadoop001 ~]# scp /etc/hosts root@hadoop003:/etc/
After this, the machines can log in to each other without a password.
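One pitfall worth noting: sshd silently ignores authorized_keys when the .ssh directory or the file itself is too permissive, which is the usual reason "passwordless" login still prompts for a password. A sketch of the expected permissions, rehearsed in a scratch directory standing in for /home/hadoop:

```shell
# Rehearse the SSH permission requirements in a scratch HOME.
DEMO_HOME="$(mktemp -d)"
mkdir -p "$DEMO_HOME/.ssh"
touch "$DEMO_HOME/.ssh/authorized_keys"

chmod 700 "$DEMO_HOME/.ssh"                   # directory: owner-only access
chmod 600 "$DEMO_HOME/.ssh/authorized_keys"   # key file: owner read/write only

stat -c '%a %n' "$DEMO_HOME/.ssh" "$DEMO_HOME/.ssh/authorized_keys"
```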
6. Installing the Software
Upload the local files to the cloud hosts with scp (from Windows you can use rz/sz instead). Everything is uploaded as the root user, since the hadoop user has no password yet. For everything except the JDK, the ownership of the unpacked files must be changed afterwards.
6.1 Installing the JDK
Create the JDK directory:
[root@hadoop001 ~]# mkdir -p /usr/java/
Unpack the archive:
[root@hadoop001 ~]# tar -zxvf /home/hadoop/software/jdk-8u191-linux-x64.tar.gz -C /usr/java/
Change the ownership:
[root@hadoop001 ~]# chown -R root:root /usr/java/jdk1.8.0_191/
Edit the profile:
[root@hadoop001 ~]# vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/java/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
Reload the file:
[root@hadoop001 ~]# source /etc/profile
Verify the installation:
[root@hadoop001 ~]# java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b13, mixed mode)
Repeat the same steps on the other two servers.
6.2 Installing ZooKeeper
Switch users:
[root@hadoop001 ~]# su - hadoop
Unpack the archive:
[hadoop@hadoop001 ~]$ tar -zxvf software/zookeeper-3.4.6.tar.gz -C app/
Change the ownership:
[hadoop@hadoop001 ~]$ chown -R hadoop:hadoop app/
Configure the environment variables:
[hadoop@hadoop001 zookeeper-3.4.6]$ vi ~/.bash_profile
Add:
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Reload the file:
[hadoop@hadoop001 zookeeper-3.4.6]$ source ~/.bash_profile
Edit the configuration file:
[hadoop@hadoop001 app]$ cd zookeeper-3.4.6/conf/
[hadoop@hadoop001 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop001 conf]$ vi zoo.cfg
dataDir=/home/hadoop/app/zookeeper-3.4.6/data
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
Create the data directory and the myid file:
[hadoop@hadoop001 zookeeper-3.4.6]$ mkdir data
[hadoop@hadoop001 zookeeper-3.4.6]$ cd data/
[hadoop@hadoop001 data]$ touch myid
[hadoop@hadoop001 data]$ vi myid
Set the file's content to 1; on the other two machines set it to 2 and 3 respectively, matching the server.N lines in zoo.cfg.
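The host-to-myid mapping can be sketched as a loop; this rehearses it under a scratch directory standing in for the three machines (on the real cluster each host writes only its own file):

```shell
# Rehearse the server.N -> myid mapping from zoo.cfg in a scratch directory.
BASE="$(mktemp -d)"
i=1
for host in hadoop001 hadoop002 hadoop003; do
    mkdir -p "$BASE/$host/zookeeper-3.4.6/data"
    echo "$i" > "$BASE/$host/zookeeper-3.4.6/data/myid"
    i=$((i + 1))
done

cat "$BASE/hadoop002/zookeeper-3.4.6/data/myid"   # prints 2
```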
Copy the zookeeper directory from hadoop001 to the other two machines:
[hadoop@hadoop001 app]$ scp -r zookeeper-3.4.6/ hadoop@hadoop002:/home/hadoop/app/
[hadoop@hadoop001 app]$ scp -r zookeeper-3.4.6/ hadoop@hadoop003:/home/hadoop/app/
6.3 Installing Hadoop
Unpack the archive:
[hadoop@hadoop001 ~]$ tar -zxvf software/hadoop-2.6.0-cdh5.7.0.tar.gz -C app/
Change the ownership:
[hadoop@hadoop001 ~]$ chown -R hadoop:hadoop app/
Configure the environment variables:
[hadoop@hadoop001 ~]$ vi ~/.bash_profile
Add:
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the file:
[hadoop@hadoop001 ~]$ source ~/.bash_profile
Edit the configuration files:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
Edit hadoop-env.sh:
export JAVA_HOME="/usr/java/jdk1.8.0_191"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
Edit yarn-site.xml; the file content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- NodeManager configuration ================================================= -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<!--shuffle handler class-->
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
<description>Address where the localizer IPC is.</description>
</property>
<property>
<!--NM web UI address-->
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
<description>NM Webapp address.</description>
</property>
<!-- HA configuration =============================================================== -->
<!-- Resource Manager Configs -->
<property>
<!--retry interval-->
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<!--enable YARN HA-->
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!--enable automatic failover for YARN-->
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Use embedded automatic failover; in an HA setup it works with the ZKRMStateStore to handle fencing -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<!-- Cluster name; ensures HA leader election corresponds to the right cluster -->
<property>
<!--name of the YARN cluster-->
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!--The RM id can be pinned per node here individually (optional)
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
-->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<!-- ZKRMStateStore 配置 -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- Client访问RM的RPC地址 (applications manager interface) -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop001:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop002:23140</value>
</property>
<!-- AM访问RM的RPC地址(scheduler interface) -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop001:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop002:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop001:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop002:23141</value>
</property>
<!--NM访问RM的RPC端口 -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop001:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop002:23125</value>
</property>
<!-- RM web application 地址 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop001:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop002:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>hadoop001:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>hadoop002:23189</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop001:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>Minimum memory a single container can request; default 1024 MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory a single container can request; default 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
</configuration>
Edit core-site.xml; the file content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!--YARN needs fs.defaultFS to locate the NameNode URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ruozeclusterg5</value>
</property>
<!--==============================Trash configuration======================================= -->
<property>
<!--How often the CheckPointer running on the NameNode creates a checkpoint from the Current folder; default 0 means the value of fs.trash.interval is used -->
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
<!--Minutes after which checkpoint directories under .Trash are deleted; the server-side setting takes precedence over the client; default 0 means never delete -->
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<!--Hadoop temp directory: hadoop.tmp.dir is the base setting that many other paths depend on. If hdfs-site.xml does not configure the namenode and datanode storage locations, they default to this path -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!--ZooKeeper session timeout, in milliseconds -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
</configuration>
Edit hdfs-site.xml; the content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!--HDFS superuser group -->
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<!--enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name</value>
<description>Local directory where the namenode stores the name table (fsimage) (adjust as needed)</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the namenode stores the transaction file (edits) (adjust as needed)</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data</value>
<description>Local directory where the datanode stores blocks (adjust as needed)</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block size 256 MB (default 128 MB) -->
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<!--======================================================================= -->
<!--HDFS high-availability configuration -->
<!--The HDFS nameservice is ruozeclusterg5; it must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ruozeclusterg5</value>
</property>
<property>
<!--NameNode IDs; this version supports at most two NameNodes -->
<name>dfs.ha.namenodes.ruozeclusterg5</name>
<value>nn1,nn2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC address -->
<property>
<name>dfs.namenode.rpc-address.ruozeclusterg5.nn1</name>
<value>hadoop001:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ruozeclusterg5.nn2</name>
<value>hadoop002:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP address -->
<property>
<name>dfs.namenode.http-address.ruozeclusterg5.nn1</name>
<value>hadoop001:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ruozeclusterg5.nn2</name>
<value>hadoop002:50070</value>
</property>
<!--==================NameNode editlog synchronization ============================================ -->
<!--ensures the edit log can be recovered -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<!--JournalNode server addresses; the QuorumJournalManager stores the editlog there -->
<!--Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>, same port as journalnode.rpc-address -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/ruozeclusterg5</value>
</property>
<property>
<!--directory where the JournalNodes store their data -->
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn</value>
</property>
<!--==================Client failover ============================================ -->
<property>
<!--Strategy used by DataNodes and clients to identify and select the active NameNode -->
<!-- implementation of automatic failover on failure -->
<name>dfs.client.failover.proxy.provider.ruozeclusterg5</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--==================NameNode fencing:=============================================== -->
<!--After a failover, prevents the stopped NameNode from starting again and producing two active services -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<!--milliseconds after which fencing is considered to have failed -->
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!--==================Automatic NameNode failover based on ZKFC and ZooKeeper====================== -->
<!--enable ZooKeeper-based automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!--list of datanodes allowed to connect to the namenode -->
<property>
<name>dfs.hosts</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves</value>
</property>
</configuration>
Edit mapred-site.xml; the content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- MapReduce application configuration -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory Server ============================================================== -->
<!-- MapReduce JobHistory Server address; default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop001:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop001:19888</value>
</property>
<!-- Compress map-side output with snappy -->
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
Edit the slaves file; the content is as follows:
hadoop001
hadoop002
hadoop003
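After hand-editing this many XML files, a quick well-formedness check catches an unclosed tag before it surfaces as a cryptic daemon startup error. A sketch, run against a throwaway copy here; in practice CONF_DIR would point at $HADOOP_HOME/etc/hadoop (assumes python3 is on the PATH):

```shell
# Validate every *-site.xml in the config directory for XML well-formedness.
CONF_DIR="$(mktemp -d)"   # stand-in for $HADOOP_HOME/etc/hadoop
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ruozeclusterg5</value>
  </property>
</configuration>
EOF

for f in "$CONF_DIR"/*-site.xml; do
    if python3 -c "import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])" "$f"; then
        echo "OK: $f"
    else
        echo "BROKEN: $f"
    fi
done
```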
Verify the installation:
[hadoop@hadoop001 ~]$ hadoop version
Hadoop 2.6.0-cdh5.7.0
Subversion http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a
Compiled by jenkins on 2016-03-23T18:41Z
Compiled with protoc 2.5.0
From source with checksum b2eabfa328e763c88cb14168f9b372
This command was run using /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-common-2.6.0-cdh5.7.0.jar
Copy the hadoop directory to the other two machines:
[hadoop@hadoop001 app]$ scp -r hadoop-2.6.0-cdh5.7.0/ hadoop@hadoop002:/home/hadoop/app/
[hadoop@hadoop001 app]$ scp -r hadoop-2.6.0-cdh5.7.0/ hadoop@hadoop003:/home/hadoop/app/
7. Startup
7.1 Starting ZooKeeper
Start zk on every machine:
[hadoop@hadoop001 bin]$ pwd
/home/hadoop/app/zookeeper-3.4.6/bin
[hadoop@hadoop001 bin]$ ./zkServer.sh start
Format the ZKFC state in ZooKeeper (this can be run on any one machine):
[hadoop@hadoop001 app]$ hdfs zkfc -formatZK
Check the zk status:
[hadoop@hadoop001 bin]$ ./zkServer.sh status
The three machines should report: Mode: leader (hadoop001) and Mode: follower (hadoop002, hadoop003).
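Checking the three status outputs by hand gets tedious; the expected invariant (exactly one leader, two followers) can be sketched like this, with the live `zkServer.sh status` calls replaced by sample output so the logic can be rehearsed offline:

```shell
# Count leader/follower modes across the ensemble. The per-host status
# strings stand in for the output of each host's ./zkServer.sh status.
status_hadoop001="Mode: leader"
status_hadoop002="Mode: follower"
status_hadoop003="Mode: follower"

leaders=0; followers=0
for s in "$status_hadoop001" "$status_hadoop002" "$status_hadoop003"; do
    case "$s" in
        *leader*)   leaders=$((leaders + 1)) ;;
        *follower*) followers=$((followers + 1)) ;;
    esac
done
echo "leaders=$leaders followers=$followers"   # prints leaders=1 followers=2
```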
Start ZKFC
ZKFC (ZooKeeperFailoverController) monitors NameNode state and coordinates the active/standby switch, so it only needs to run on the two NameNode hosts.
Start zkfc on hadoop001 and hadoop002:
[hadoop@hadoop001 sbin]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start zkfc
Start the JournalNodes, the shared storage system that synchronizes metadata between the active and standby NameNodes. On each JN node:
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start journalnode
7.2 Starting Hadoop
Format HDFS:
[hadoop@hadoop001 sbin]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop001 sbin]$ hadoop namenode -format
Start the NameNode on the primary NN node:
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start namenode
On the standby NN (hadoop002), sync the metadata from the primary NN:
[hadoop@hadoop002 sbin]$ hdfs namenode -bootstrapStandby
Then start the NameNode on the standby:
[hadoop@hadoop002 sbin]$ ./hadoop-daemon.sh start namenode
Check which NameNode is active:
[hadoop@hadoop001 sbin]$ hdfs haadmin -getServiceState nn1
[hadoop@hadoop001 sbin]$ hdfs haadmin -getServiceState nn2
Since automatic failover is configured, ZooKeeper has already elected one node as the active NN, so a manual transition can be skipped; to promote a node manually you would run: hdfs haadmin -transitionToActive nn1
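The two haadmin calls above can be wrapped in a small helper that reports which NameNode ID is currently active; here the real `hdfs haadmin -getServiceState` call is stubbed with sample output so the logic can be rehearsed without a running cluster:

```shell
# Find the active NameNode among nn1/nn2.
get_state() {
    # Real cluster: hdfs haadmin -getServiceState "$1"
    # Stubbed sample output for the rehearsal:
    case "$1" in
        nn1) echo "active" ;;
        nn2) echo "standby" ;;
    esac
}

active_nn=""
for nn in nn1 nn2; do
    if [ "$(get_state "$nn")" = "active" ]; then
        active_nn="$nn"
    fi
done
echo "active NameNode: $active_nn"   # prints: active NameNode: nn1
```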
Start the DataNode on the primary NN node:
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.ssh start datanode
To start all the datanodes at once, use:
[hadoop@hadoop001 sbin]$ ./hadoop-daemons.sh start datanode
7.3 Starting YARN
Option 1: start the ResourceManager and NodeManagers together: start-yarn.sh
Option 2: start them separately:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager (with multiple nodes, use yarn-daemons.sh)
The ResourceManager is also configured for HA; check a node's state with:
[hadoop@hadoop001 sbin]$ yarn rmadmin -getServiceState rm1
Start the MR JobHistory Server
Run it on hadoop001:
[hadoop@hadoop001 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
Finally, open the web UIs to confirm everything came up, and then verify the Hadoop HA and YARN HA failover behavior.
Reprinted from: https://www.jianshu.com/p/1a634da52d57