
Hadoop HA Environment Setup


1. Environment Preparation

I built this cluster on QingCloud servers inside a VPC network. For the detailed steps of creating the VPC, see the QingCloud documentation: https://docs.qingcloud.com/product/network/vpc.html

2. Software Preparation

Software     Version
OS           CentOS 6.8
Java         1.8.0_191
Hadoop       hadoop-2.6.0-cdh5.7.0.tar.gz
ZooKeeper    zookeeper-3.4.6.tar.gz

3. Node Plan

Node IP        Hostname   Installed software  Processes
192.168.110.2  hadoop001  hadoop, zookeeper   NameNode DFSZKFailoverController JournalNode DataNode ResourceManager JobHistoryServer NodeManager QuorumPeerMain
192.168.110.3  hadoop002  hadoop, zookeeper   NameNode DFSZKFailoverController JournalNode DataNode ResourceManager JobHistoryServer NodeManager QuorumPeerMain
192.168.110.4  hadoop003  hadoop, zookeeper   JournalNode DataNode NodeManager QuorumPeerMain

4. Directory Plan

Name              Path                                         Notes
$HADOOP_HOME      /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
data              /home/hadoop/data
logs              /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs
hadoop.tmp.dir    /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp   must be created manually, mode 777, owner hadoop:hadoop
$ZOOKEEPER_HOME   /home/hadoop/app/zookeeper-3.4.6
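The hadoop.tmp.dir entry has to be created by hand once Hadoop has been unpacked (section 6.3). A minimal sketch, run as root on every node (adjust if your paths differ):
mkdir -p /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp
chmod 777 /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp
chown hadoop:hadoop /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp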

5. Passwordless SSH Between Hosts

Log in to the first server and create the hadoop user:
[root@hadoop001 ~]# useradd hadoop
Switch to the hadoop user:
[root@hadoop001 ~]# su - hadoop
In the hadoop user's home directory, generate the SSH key pair:
[hadoop@hadoop001 ~]$ ssh-keygen -t dsa
Press Enter through all the prompts; a .ssh directory with the key files is created under the user's home directory. Do the same on the other two machines, then copy their key files to the first machine.
Append the public key to the authorized_keys file:
[hadoop@hadoop001 .ssh]$ pwd
/home/hadoop/.ssh
[hadoop@hadoop001 .ssh]$ cat id_dsa.pub >> authorized_keys
Configure the hosts file (switch to the root user):
[root@hadoop001 ~]# vi /etc/hosts
Add the following entries:
192.168.110.2  hadoop001
192.168.110.3  hadoop002
192.168.110.4  hadoop003
Then copy the authorized_keys file and the hosts file to the other two servers:
[hadoop@hadoop001 .ssh]$ scp -r authorized_keys hadoop@hadoop002:/home/hadoop/.ssh/
[hadoop@hadoop001 .ssh]$ scp -r authorized_keys hadoop@hadoop003:/home/hadoop/.ssh/
[root@hadoop001 ~]# scp -r /etc/hosts root@hadoop002:/etc/
[root@hadoop001 ~]# scp -r /etc/hosts root@hadoop003:/etc/
After that you can log in between the machines without typing a password.
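A quick way to confirm the passwordless login works (a sketch; it assumes the hostnames above resolve from every node):
[hadoop@hadoop001 ~]$ for h in hadoop001 hadoop002 hadoop003; do ssh $h hostname; done
Each hostname should be printed without a password prompt. sshd on CentOS 6 is strict about permissions, so if you are still asked for a password, run chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys on every node.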

6. Software Installation

Upload the local files to the cloud hosts with scp (on Windows you can use rz/sz instead). Upload everything under the root user, since the hadoop user's password is not set yet. Apart from the JDK, the other packages need their ownership changed afterwards.

6.1 Installing the JDK

Create the JDK directory:
[root@hadoop001 ~]# mkdir -p /usr/java/
Unpack the archive:
[root@hadoop001 ~]# tar -zxvf /home/hadoop/software/jdk-8u191-linux-x64.tar.gz -C /usr/java/
Change the ownership:
[root@hadoop001 ~]# chown -R root:root /usr/java/jdk1.8.0_191/
Edit the profile:
[root@hadoop001 ~]# vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/java/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
Reload the file:
[root@hadoop001 ~]# source /etc/profile
Verify:
[root@hadoop001 ~]# java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b13, mixed mode)
Repeat the same steps on the other two servers.
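Instead of redoing every step by hand, the JDK directory and the profile change can be copied over from hadoop001. A sketch, assuming root is allowed to scp to the other nodes (you will be prompted for the root password otherwise):
[root@hadoop001 ~]# scp -r /usr/java/jdk1.8.0_191 root@hadoop002:/usr/java/
[root@hadoop001 ~]# scp -r /usr/java/jdk1.8.0_191 root@hadoop003:/usr/java/
[root@hadoop001 ~]# scp /etc/profile root@hadoop002:/etc/profile
[root@hadoop001 ~]# scp /etc/profile root@hadoop003:/etc/profile
Then run source /etc/profile and java -version on hadoop002 and hadoop003 to confirm.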

6.2 Installing ZooKeeper

Switch user:
[root@hadoop001 ~]# su - hadoop
Unpack the archive:
[hadoop@hadoop001 ~]$ tar -zxvf software/zookeeper-3.4.6.tar.gz -C app/
Change the ownership:
[hadoop@hadoop001 ~]$ chown -R hadoop:hadoop app/
Configure the environment variables:
[hadoop@hadoop001 zookeeper-3.4.6]$ vi ~/.bash_profile
Add:
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Reload the file:
[hadoop@hadoop001 zookeeper-3.4.6]$ source ~/.bash_profile
Edit the configuration file:
[hadoop@hadoop001 app]$ cd zookeeper-3.4.6/conf/
[hadoop@hadoop001 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop001 conf]$ vi zoo.cfg
dataDir=/home/hadoop/app/zookeeper-3.4.6/data
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
Create the data directory and the myid file:
[hadoop@hadoop001 zookeeper-3.4.6]$ mkdir -p data
[hadoop@hadoop001 zookeeper-3.4.6]$ cd data/
[hadoop@hadoop001 data]$ touch myid
[hadoop@hadoop001 data]$ vi myid
Put 1 in this file; on the other two machines the file should contain 2 and 3 respectively.
Copy the files: transfer the zookeeper directory from hadoop001 to the other two machines:
[hadoop@hadoop001 app]$ scp -r zookeeper-3.4.6/ hadoop@hadoop002:/home/hadoop/app/
[hadoop@hadoop001 app]$ scp -r zookeeper-3.4.6/ hadoop@hadoop003:/home/hadoop/app/
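The copy also brings over myid=1, so the value still has to be corrected on the other two nodes. A sketch, using the passwordless SSH from section 5:
[hadoop@hadoop001 app]$ ssh hadoop002 'echo 2 > /home/hadoop/app/zookeeper-3.4.6/data/myid'
[hadoop@hadoop001 app]$ ssh hadoop003 'echo 3 > /home/hadoop/app/zookeeper-3.4.6/data/myid'
The ZOOKEEPER_HOME lines in ~/.bash_profile are not carried over by scp either, so add them on hadoop002 and hadoop003 as well.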

6.3 Installing Hadoop

Unpack the archive:
[hadoop@hadoop001 ~]$ tar -zxvf software/hadoop-2.6.0-cdh5.7.0.tar.gz -C app/
Change the ownership:
[hadoop@hadoop001 ~]$ chown -R hadoop:hadoop app/
Configure the environment variables:
[hadoop@hadoop001 ~]$ vi ~/.bash_profile
Add:
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
Reload the file:
[hadoop@hadoop001 ~]$ source ~/.bash_profile
Edit the configuration files:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
Edit hadoop-env.sh:
export JAVA_HOME="/usr/java/jdk1.8.0_191"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
Edit yarn-site.xml; the file content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- NodeManager configuration ================================================= -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <!-- shuffle handler class -->
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.nodemanager.localizer.address</name>
    <value>0.0.0.0:23344</value>
    <description>Address where the localizer IPC is.</description>
</property>
<property>
    <!-- NM web UI address -->
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:23999</value>
    <description>NM Webapp address.</description>
</property>

<!-- HA configuration =============================================================== -->
<!-- Resource Manager Configs -->
<property>
    <!-- retry interval -->
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
</property>
<property>
    <!-- enable ResourceManager HA -->
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <!-- enable automatic failover for YARN -->
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<!-- Use embedded automatic failover; in an HA setup it works with ZKRMStateStore to handle fencing -->
<property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
</property>
<!-- Cluster name, so that HA elections map to the right cluster -->
<property>
    <!-- name of the YARN cluster -->
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>


<!-- The active/standby RM id can be specified explicitly on each node (optional):
<property>
     <name>yarn.resourcemanager.ha.id</name>
     <value>rm2</value>
 </property>
 -->

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
</property>
<!-- ZKRMStateStore configuration -->
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- RPC address used by clients to reach the RM (applications manager interface) -->
<property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop001:23140</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop002:23140</value>
</property>
<!-- RPC address used by AMs to reach the RM (scheduler interface) -->
<property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop001:23130</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop002:23130</value>
</property>
<!-- RM admin interface -->
<property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>hadoop001:23141</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>hadoop002:23141</value>
</property>
<!-- RPC port used by NMs to reach the RM -->
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop001:23125</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop002:23125</value>
</property>
<!-- RM web application address -->
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop001:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop002:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>hadoop001:23189</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>hadoop002:23189</value>
</property>

<property>
   <name>yarn.log-aggregation-enable</name>
   <value>true</value>
</property>
<property>
     <name>yarn.log.server.url</name>
     <value>http://hadoop001:19888/jobhistory/logs</value>
</property>


<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum memory a single task can request; default 1024 MB</description>
 </property>


<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory a single task can request; default 8192 MB</description>
</property>

<property>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>2</value>
</property>

 </configuration>
Edit core-site.xml; the file content is as follows:
 <?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- YARN needs fs.defaultFS to specify the NameNode URI -->
    <property>
            <name>fs.defaultFS</name>
            <value>hdfs://ruozeclusterg5</value>
    </property>
    <!--==============================Trash mechanism======================================= -->
    <property>
            <!-- How often the CheckPointer running on the NameNode creates a checkpoint from the Current folder; default 0, which means it follows fs.trash.interval -->
            <name>fs.trash.checkpoint.interval</name>
            <value>0</value>
    </property>
    <property>
            <!-- How many minutes before a checkpoint under .Trash is deleted; the server-side setting takes precedence over the client's; default 0, never delete -->
            <name>fs.trash.interval</name>
            <value>1440</value>
    </property>

     <!-- Hadoop temporary directory. hadoop.tmp.dir is the base setting that the Hadoop filesystem relies on, and many other paths derive from it. If hdfs-site.xml does not configure the namenode and datanode storage locations, they default to this path -->
    <property>   
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
    </property>

     <!-- ZooKeeper quorum addresses -->
    <property>
            <name>ha.zookeeper.quorum</name>
            <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
     <!-- ZooKeeper session timeout, in milliseconds -->
    <property>
            <name>ha.zookeeper.session-timeout.ms</name>
            <value>2000</value>
    </property>

    <property>
       <name>hadoop.proxyuser.hadoop.hosts</name>
       <value>*</value> 
    </property> 
    <property> 
        <name>hadoop.proxyuser.hadoop.groups</name> 
        <value>*</value> 
   </property> 


  <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        org.apache.hadoop.io.compress.SnappyCodec
      </value>
  </property>
  </configuration>
Edit hdfs-site.xml; the content is as follows:
  <?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
<!-- HDFS superuser group -->
<property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
</property>

<!-- enable WebHDFS -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name</value>
    <description>Local directory where the namenode stores the name table (fsimage); adjust as needed</description>
</property>
<property>
    <name>dfs.namenode.edits.dir</name>
    <value>${dfs.namenode.name.dir}</value>
    <description>Local directory where the namenode stores the transaction files (edits); adjust as needed</description>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data</value>
    <description>Local directory where the datanode stores blocks; adjust as needed</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Block size 256 MB (default 128 MB) -->
<property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
</property>
<!--======================================================================= -->
<!-- HDFS high-availability configuration -->
<!-- The HDFS nameservice is ruozeclusterg5; it must match the value in core-site.xml -->
<property>
    <name>dfs.nameservices</name>
    <value>ruozeclusterg5</value>
</property>
<property>
    <!-- NameNode IDs; this version supports at most two NameNodes -->
    <name>dfs.ha.namenodes.ruozeclusterg5</name>
    <value>nn1,nn2</value>
</property>

<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC address -->
<property>
    <name>dfs.namenode.rpc-address.ruozeclusterg5.nn1</name>
    <value>hadoop001:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ruozeclusterg5.nn2</name>
    <value>hadoop002:8020</value>
</property>

<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP address -->
<property>
    <name>dfs.namenode.http-address.ruozeclusterg5.nn1</name>
    <value>hadoop001:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.ruozeclusterg5.nn2</name>
    <value>hadoop002:50070</value>
</property>

<!--==================NameNode editlog synchronization ============================================ -->
<!-- ensures the edit log can be recovered -->
<property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
</property>
<property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
</property>
<property>
    <!-- JournalNode servers; the QuorumJournalManager stores the editlog here -->
    <!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/ruozeclusterg5</value>
</property>

<property>
    <!-- directory where the JournalNode stores its data -->
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn</value>
</property>
<!--==================DataNode editlog synchronization ============================================ -->
<property>
    <!-- Strategy DataNodes and clients use to identify the active NameNode -->
    <!-- i.e. how automatic failover is implemented on the client side -->
    <name>dfs.client.failover.proxy.provider.ruozeclusterg5</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--==================NameNode fencing =============================================== -->
<!-- Prevents the stopped NameNode from coming back up after a failover and leaving two active services -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
    <!-- after how many milliseconds fencing is considered to have failed -->
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
</property>

<!--==================NameNode auto failover base ZKFC and Zookeeper====================== -->
<!-- enable automatic failover based on ZooKeeper -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<!-- whitelist of datanodes allowed to connect to the namenode -->
 <property>
   <name>dfs.hosts</name>
   <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves</value>
 </property>
    </configuration>
Edit mapred-site.xml; the content is as follows:
  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
<!-- MapReduce application configuration -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- JobHistory Server ============================================================== -->
<!-- MapReduce JobHistory Server address; default port 10020 -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop001:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop001:19888</value>
</property>

   <!-- Compress map output with snappy -->
  <property>
     <name>mapreduce.map.output.compress</name> 
     <value>true</value>
  </property>
          
<property>
  <name>mapreduce.map.output.compress.codec</name> 
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
 </property>

 </configuration>
Edit the slaves file; the content is as follows:
  hadoop001
  hadoop002
  hadoop003
Verify the installation:
  [[email protected] ~]$ hadoop version
  Hadoop 2.6.0-cdh5.7.0
  Subversion http://github.com/cloudera/hadoop -r           c00978c67b0d3fe9f3b896b5030741bd40bf541a
  Compiled by jenkins on 2016-03-23T18:41Z
  Compiled with protoc 2.5.0
  From source with checksum b2eabfa328e763c88cb14168f9b372
  This command was run using /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-common-2.6.0-cdh5.7.0.jar
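Before copying the installation to the other nodes, it can be worth checking that none of the edited XML files is malformed. A sketch, assuming xmllint (from libxml2) is installed:
[hadoop@hadoop001 hadoop]$ cd /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
[hadoop@hadoop001 hadoop]$ for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do xmllint --noout $f && echo "$f OK"; done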
Copy the Hadoop directory to the other two machines:
[hadoop@hadoop001 app]$ scp -r hadoop-2.6.0-cdh5.7.0/ hadoop@hadoop002:/home/hadoop/app/
[hadoop@hadoop001 app]$ scp -r hadoop-2.6.0-cdh5.7.0/ hadoop@hadoop003:/home/hadoop/app/

7. Startup

7.1 Start ZooKeeper

Start zk on every machine:
[hadoop@hadoop001 bin]$ pwd
/home/hadoop/app/zookeeper-3.4.6/bin
[hadoop@hadoop001 bin]$ ./zkServer.sh start
Format the ZKFC state in ZooKeeper (this can be run on any one of the machines):
[hadoop@hadoop001 app]$ hdfs zkfc -formatZK
Check the zk status:
[hadoop@hadoop001 bin]$ ./zkServer.sh status
The three machines report the zk mode: Mode: leader (hadoop001) and Mode: follower (hadoop002, hadoop003).
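To check the status of all three nodes from one place, a sketch that relies on the passwordless SSH from section 5 (it sources ~/.bash_profile because a non-interactive ssh session does not read it):
[hadoop@hadoop001 ~]$ for h in hadoop001 hadoop002 hadoop003; do echo "== $h =="; ssh $h 'source ~/.bash_profile; zkServer.sh status'; done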
Start the ZKFC.
ZKFC (ZooKeeper Failover Controller) monitors the NameNode state and drives the active/standby switchover, so it only needs to run on the two NameNode hosts.
Start zkfc on hadoop001 and hadoop002:
[hadoop@hadoop001 sbin]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start zkfc
Start the JournalNodes, the shared storage used to synchronize edit-log metadata between the active and standby NameNodes. On every JN node run:
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start journalnode

7.2 Start Hadoop

Format HDFS:
[hadoop@hadoop001 sbin]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop001 sbin]$ hadoop namenode -format
Start the NameNode on the primary NN node:
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start namenode
On the standby NN (hadoop002), sync the metadata from the primary NN:
[hadoop@hadoop002 sbin]$ hdfs namenode -bootstrapStandby
Then start the NameNode on the standby NN:
[hadoop@hadoop002 sbin]$ ./hadoop-daemon.sh start namenode
Confirm which NN is active:
[hadoop@hadoop001 sbin]$ hdfs haadmin -getServiceState nn1
[hadoop@hadoop001 sbin]$ hdfs haadmin -getServiceState nn2
This setup uses automatic failover, so ZK has already elected one node as the active NN and a manual transition is not needed. To force it by hand you would run: hdfs haadmin -transitionToActive nn1
Start the DataNode on the primary NN node:
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start datanode
To start all the DataNodes in one go, use:
[hadoop@hadoop001 sbin]$ ./hadoop-daemons.sh start datanode
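At this point the processes on each node can be compared against the plan in section 3. A sketch, again using the passwordless SSH from section 5 (use the full path /usr/java/jdk1.8.0_191/bin/jps if jps is not on the PATH in a non-interactive session):
[hadoop@hadoop001 ~]$ for h in hadoop001 hadoop002 hadoop003; do echo "== $h =="; ssh $h 'jps | grep -v Jps'; done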

7.3 Start YARN

Option 1: start the ResourceManager and the NodeManagers in one go: start-yarn.sh
Option 2: start them separately:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager (with several nodes, use yarn-daemons.sh)
Note that start-yarn.sh only starts a ResourceManager on the node where it is run; the second RM (hadoop002 here) still has to be started there with yarn-daemon.sh start resourcemanager.
The ResourceManager is also configured for HA; check the state of each RM (rm1/rm2, as defined in yarn-site.xml) with:
[hadoop@hadoop001 sbin]$ yarn rmadmin -getServiceState rm1
[hadoop@hadoop001 sbin]$ yarn rmadmin -getServiceState rm2
Start the MR JobHistory Server.
On hadoop001 run:
[hadoop@hadoop001 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
Finally, check that the web UIs come up, and then verify the HDFS HA and YARN HA failover behaviour.
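A simple HDFS HA check is to kill the active NameNode and watch the standby take over. A sketch; nn1/nn2 are the NameNode IDs from hdfs-site.xml, and it assumes nn1 on hadoop001 is currently the active one:
[hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn1        # expect "active"
[hadoop@hadoop001 ~]$ kill -9 $(jps | awk '$2=="NameNode" {print $1}')   # simulate a crash of the active NN
[hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn2        # should report "active" within a few seconds
[hadoop@hadoop001 ~]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode  # bring nn1 back; it rejoins as standby
The same idea applies to YARN HA: kill the active ResourceManager and confirm the other one becomes active with yarn rmadmin -getServiceState rm1 / rm2.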

Reposted from: https://www.jianshu.com/p/1a634da52d57