Hadoop Cluster Setup


1. Basic Environment for the Hadoop Cluster

Setting up a Hadoop cluster requires the JDK and SSH tools.

1.1 Installing the JDK

Hadoop is written in Java, so it needs a JDK platform to run on.

  1. I used jdk-8u144-linux-x64.tar.gz, downloaded from the official site: https://www.oracle.com/technetwork/java/javase/downloads/index.html

  2. Extract the archive. The target directory is up to you, as long as you use the same path in the configuration later. I extracted to /usr/java with the command tar -xzvf jdk-8u144-linux-x64.tar.gz, which produces /usr/java/jdk1.8.0_144.

  3. Configure the environment variables

    vi  /etc/profile
    

    Add the following:

    export JAVA_HOME=/usr/java/jdk1.8.0_144
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
    export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
    export PATH=$PATH:${JAVA_PATH}
    
    

    To make the environment variables take effect, run:

    source /etc/profile
    
  4. Check whether the configuration worked

    java -version
    The output should look like:
    java version "1.8.0_144"
    Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
    
    

    This means it succeeded.

  5. All three machines need the same JDK configuration; you can copy it to the other nodes directly with the scp command, as sketched below.
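
    For example, a minimal sketch assuming the two slave IPs used later in this article and the root account; adjust to your environment:

    scp -r /usr/java root@101.10.113.62:/usr
    scp -r /usr/java root@101.10.113.63:/usr
    scp /etc/profile root@101.10.113.62:/etc/profile
    scp /etc/profile root@101.10.113.63:/etc/profile
    # then run: source /etc/profile   on each node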

1.2 Setting Up Passwordless SSH Login

  1. Set the cluster hostnames and IPs
   vi /etc/hosts

On the master node, change the server name to master with:

hostname master

Do the same on the other two nodes:

hostname slave1

hostname slave2

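Note that the hostname command only changes the name until the next reboot. A sketch of making it permanent, assuming a systemd-based distribution:

    hostnamectl set-hostname master   # run on the master node
    hostnamectl set-hostname slave1   # run on slave1
    hostnamectl set-hostname slave2   # run on slave2
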
Add the following entries based on the IPs of the machines you prepared for the cluster:

   101.10.113.61 master
   101.10.113.62 slave1
   101.10.113.63 slave2
  2. Set up SSH

    Run the following on each of the three machines:

    ssh-keygen -t rsa
    

    Just press Enter at every prompt.

    Then, on the master node, copy the generated public key to the other nodes:

    ssh-copy-id -i ~/.ssh/id_rsa.pub master
    ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
    ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
    

    After that, try ssh-ing to the slave nodes to confirm it works.
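
    For example, from the master node:

    ssh slave1 hostname   # should print slave1 without asking for a password
    ssh slave2 hostname   # should print slave2 without asking for a password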

2. Installing Hadoop

2.1 Downloading Hadoop

Official site: http://hadoop.apache.org/. I downloaded hadoop-2.8.5.tar.gz.

After downloading, extract it to the /usr directory and rename the directory to hadoop, so the installed Hadoop directory is /usr/hadoop.
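
For example, a sketch assuming the tarball is in the current directory:

    tar -xzvf hadoop-2.8.5.tar.gz -C /usr
    mv /usr/hadoop-2.8.5 /usr/hadoop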

2.2 Configuring the Hadoop Environment Variables

As before, edit the /etc/profile file with vi /etc/profile; after modifying it, run source /etc/profile to apply the changes:

export JAVA_HOME=/usr/java/jdk1.8.0_144
export HADOOP_HOME=/usr/hadoop

export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}:$HADOOP_HOME/bin

If the configuration succeeded, running the hadoop command prints the following usage message:

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

This shows the configuration works.
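
You can also confirm the version:

    hadoop version
    # the first line of the output should read: Hadoop 2.8.5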

2.3 Hadoop Configuration

For the Hadoop configuration itself, I recommend following the official cluster-setup documentation for your version. I originally copied someone else's configuration from the web, which left the NodeManager on the slave nodes unable to start; it only worked after I redid the configuration files according to the official docs.

Official docs: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

That link may break when versions change; if it does, go to http://hadoop.apache.org/ and click Getting started.

Here we set up the important configuration files:

etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml

  1. Configure core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>

        <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>

        <!-- Hadoop temporary directory; create it yourself (see the note below) -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/hadoop/tmp</value>
        </property>
    </configuration>
    

If the hadoop.tmp.dir parameter is not set, the system falls back to the default temporary directory /tmp/hadoop-${user.name}. That directory can be wiped on every reboot, after which the NameNode has to be reformatted or errors will occur.
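
Create the temporary directory before starting the cluster:

    mkdir -p /usr/hadoop/tmp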

  2. Configure hdfs-site.xml

    <configuration>
        <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>master:50090</value>
        </property>
        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>
        <!-- Create these directories yourself; see the command after this block -->
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/usr/hadoop/name</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/usr/hadoop/data</value>
        </property>
    </configuration>
    
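    Create the NameNode and DataNode directories referenced above:

    mkdir -p /usr/hadoop/name /usr/hadoop/data
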
  3. Configure yarn-site.xml

    <configuration>
         <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
         </property>
         <property>
               <name>yarn.resourcemanager.address</name>
               <value>master:8032</value>
         </property>
         <property>
              <name>yarn.resourcemanager.scheduler.address</name>
              <value>master:8030</value>
          </property>
         <property>
             <name>yarn.resourcemanager.resource-tracker.address</name>
             <value>master:8031</value>
         </property>
         <property>
             <name>yarn.resourcemanager.admin.address</name>
             <value>master:8033</value>
         </property>
         <property>
             <name>yarn.resourcemanager.webapp.address</name>
             <value>master:8088</value>
         </property>
    </configuration>
    
  4. Configure mapred-site.xml

    <configuration>
     <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
         <name>mapreduce.jobhistory.address</name>
         <value>master:10020</value>
      </property>
      <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>master:19888</value>
      </property>
    </configuration>
    

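A few extra steps are usually needed with the standard Hadoop 2.x layout (these are assumptions about a typical install, adjust to your setup): mapred-site.xml has to be created from its bundled template, the slave hostnames have to be listed in etc/hadoop/slaves so the startup scripts launch the DataNode and NodeManager daemons on them, and JAVA_HOME is typically set explicitly in etc/hadoop/hadoop-env.sh so daemons started over ssh can find Java. A sketch:

    # create mapred-site.xml from the bundled template (if not already done)
    cp /usr/hadoop/etc/hadoop/mapred-site.xml.template /usr/hadoop/etc/hadoop/mapred-site.xml

    # list the slave nodes, one hostname per line
    echo -e "slave1\nslave2" > /usr/hadoop/etc/hadoop/slaves

    # hard-code JAVA_HOME for the Hadoop daemons
    echo "export JAVA_HOME=/usr/java/jdk1.8.0_144" >> /usr/hadoop/etc/hadoop/hadoop-env.sh
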
2.4 Copying Hadoop to the Other Nodes

Once the configuration is done, copy the configured Hadoop directory to the other nodes with scp:

scp -r /usr/hadoop root@slave1:/usr
scp -r /usr/hadoop root@slave2:/usr

Make sure the directory path is the same on every node.
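
Since /etc/profile gained the HADOOP_HOME settings in section 2.2, it also needs to be synced to the slaves so the hadoop command works there. A sketch:

    scp /etc/profile root@slave1:/etc/profile
    scp /etc/profile root@slave2:/etc/profile
    # then log in to each slave and run: source /etc/profile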

3. Starting the Hadoop Cluster

  1. Format the NameNode

    On the master node, run:

    hdfs namenode -format
    
  2. Start the cluster

    Change to the /usr/hadoop/sbin directory and, on the master node, run:

    ./start-all.sh
    

    The output looks like this:

    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    master: starting namenode, logging to /usr/hadoop/logs/hadoop-root-namenode-master.out
    slave2: starting datanode, logging to /usr/hadoop/logs/hadoop-root-datanode-slave2.out
    slave1: starting datanode, logging to /usr/hadoop/logs/hadoop-root-datanode-slave1.out
    Starting secondary namenodes [master]
    master: starting secondarynamenode, logging to /usr/hadoop/logs/hadoop-root-secondarynamenode-master.out
    starting yarn daemons
    starting resourcemanager, logging to /usr/hadoop/logs/yarn-root-resourcemanager-master.out
    slave2: starting nodemanager, logging to /usr/hadoop/logs/yarn-root-nodemanager-slave2.out
    slave1: starting nodemanager, logging to /usr/hadoop/logs/yarn-root-nodemanager-slave1.out
    
    

    Check with the jps command on the master node:

    2449 NameNode
    2664 SecondaryNameNode
    3118 Jps
    2847 ResourceManager
    

    On slave1 and slave2, check with jps:

    2389 NodeManager
    2281 DataNode
    2543 Jps
    

    This shows the cluster was configured and started successfully.
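
    To verify further, you can check the HDFS report and the web UIs (the ports assume the Hadoop 2.x defaults plus the yarn-site.xml above):

    hdfs dfsadmin -report            # should report 2 live datanodes
    # NameNode web UI:        http://master:50070
    # ResourceManager web UI: http://master:8088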
