Setting Up a Hadoop Cluster
1. Prerequisites for Running a Hadoop Cluster
Setting up a Hadoop cluster requires the JDK and ssh.
1.1 Installing the JDK
Hadoop is written in Java, so it needs a JDK to run.
- I use jdk-8u144-linux-x64.tar.gz, downloaded from the official site: https://www.oracle.com/technetwork/java/javase/downloads/index.html
- Unpack the archive. The target directory is up to you, as long as the configuration later matches it. I unpack to /usr/java with the command tar -xzvf jdk-8u144-linux-x64.tar.gz, which produces /usr/java/jdk1.8.0_144.
- Configure the environment variables. Open the profile with
vi /etc/profile
and add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_144
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
To make the changes take effect, run:
source /etc/profile
- Verify the installation with
java -version
which should print something like:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
If you see this, the JDK is set up correctly.
- All three machines need the same configuration; you can scp the JDK directory and /etc/profile directly to the other nodes.
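The copy step can be sketched as a small loop. `distribute_jdk` is a hypothetical helper; it assumes root ssh access and the paths and hostnames used in this guide:

```shell
# distribute_jdk: copy the unpacked JDK and /etc/profile to each worker node.
# Hypothetical helper; assumes root ssh access and the paths used above.
distribute_jdk() {
  for node in "$@"; do
    scp -r /usr/java/jdk1.8.0_144 "root@${node}:/usr/java/"
    scp /etc/profile "root@${node}:/etc/profile"
  done
}

# usage (once passwordless ssh from section 1.2 is in place):
# distribute_jdk slave1 slave2
```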
1.2 Setting Up Passwordless SSH
- Set the hostnames and cluster IPs. On the master node, change the server name with:
hostname master
Do the same on the other two nodes:
hostname slave1
hostname slave2
Then edit the hosts file with
vi /etc/hosts
and add entries matching the IPs of your cluster:
101.10.113.61 master
101.10.113.62 slave1
101.10.113.63 slave2
- Set up ssh. On each of the three machines run
ssh-keygen -t rsa
and just press Enter at every prompt.
Then, on the master node, copy the generated public key to every node:
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
When that is done, try ssh-ing to the slave nodes to confirm that no password is requested.
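The manual check above can be scripted. `check_ssh` is a hypothetical helper; `-o BatchMode=yes` makes ssh fail instead of prompting, so a node that still asks for a password is reported as a failure:

```shell
# check_ssh: verify passwordless login by running `hostname` on each node.
# Hypothetical helper; BatchMode makes ssh fail rather than prompt.
check_ssh() {
  for node in "$@"; do
    if ssh -o BatchMode=yes "$node" hostname >/dev/null 2>&1; then
      echo "$node: ok"
    else
      echo "$node: passwordless ssh FAILED"
    fi
  done
}

# usage: check_ssh master slave1 slave2
```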
2. Installing Hadoop
2.1 Downloading Hadoop
Official site: http://hadoop.apache.org/. I downloaded hadoop-2.8.5.tar.gz.
After downloading, unpack it under /usr and rename the directory to hadoop, so the installation ends up in /usr/hadoop.
2.2 Configuring the Hadoop Environment Variables
As before, edit /etc/profile (vi /etc/profile) and, after saving, apply the changes with source /etc/profile:
export JAVA_HOME=/usr/java/jdk1.8.0_144
export HADOOP_HOME=/usr/hadoop
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}:$HADOOP_HOME/bin
Once the configuration works, running the hadoop command prints the following usage:
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
If you see this, the environment variables are configured correctly.
2.3 Hadoop Configuration
For the configuration itself, follow the official cluster-setup guide for your version. I initially copied someone else's configuration from the web, and the NodeManager on the slave nodes refused to start; it only worked after I redid the configuration files according to the official docs.
Official guide: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
The link may break when versions change; if so, go to http://hadoop.apache.org/ and click Getting started.
Here we fill in the important configuration files: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml, and etc/hadoop/mapred-site.xml.
- Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <!-- Hadoop temp directory; create it yourself -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
  </property>
</configuration>
If hadoop.tmp.dir is not set, Hadoop falls back to a default temp directory under /tmp (/tmp/hadoop-${user.name}). That directory is wiped on every reboot, after which the namenode has to be formatted again or errors follow.
- Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- create these directories yourself -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/hadoop/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/hadoop/data</value>
  </property>
</configuration>
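The tmp, name, and data directories referenced in core-site.xml and hdfs-site.xml are not created automatically, so create them on every node. A minimal sketch, assuming the /usr/hadoop layout used in this guide:

```shell
# Create the directories referenced by hadoop.tmp.dir, dfs.namenode.name.dir
# and dfs.datanode.data.dir (run on every node; path as used in this guide).
HADOOP_DIR=${HADOOP_DIR:-/usr/hadoop}
mkdir -p "$HADOOP_DIR/tmp" "$HADOOP_DIR/name" "$HADOOP_DIR/data"
```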
- Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
- Configure mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
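Two more files matter before starting the cluster, although they are not in the list above: in Hadoop 2.x, etc/hadoop/slaves tells the start scripts where the worker nodes are, and it is common practice to pin JAVA_HOME in etc/hadoop/hadoop-env.sh, since daemons launched over ssh may not pick it up from /etc/profile. A sketch, using the hostnames and paths from this guide:

```shell
# List the worker nodes, one hostname per line, so start-dfs.sh/start-yarn.sh
# know where to launch DataNodes and NodeManagers.
CONF_DIR=${HADOOP_CONF_DIR:-/usr/hadoop/etc/hadoop}
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/slaves" <<'EOF'
slave1
slave2
EOF

# Pin JAVA_HOME explicitly for daemons started over ssh:
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_144' >> "$CONF_DIR/hadoop-env.sh"
```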
2.4 Copying Hadoop to the Other Nodes
Once the configuration is done, copy the configured Hadoop directory to the other nodes with scp:
scp -r /usr/hadoop [email protected]:/usr
scp -r /usr/hadoop [email protected]:/usr
Keep the directory layout identical on every node.
3. Starting the Hadoop Cluster
- Format the namenode. On the master node, run:
hdfs namenode -format
- Start the cluster. Switch to /usr/hadoop/sbin and, on the master node, run:
./start-all.sh
The output looks like this:
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/hadoop/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /usr/hadoop/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/hadoop/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /usr/hadoop/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /usr/hadoop/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /usr/hadoop/logs/yarn-root-nodemanager-slave1.out
Check with the jps command on the master node:
2449 NameNode
2664 SecondaryNameNode
3118 Jps
2847 ResourceManager
And on slave1 and slave2:
2389 NodeManager
2281 DataNode
2543 Jps
If all of these daemons are present, the cluster started successfully.
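The jps check can also be scripted. `missing_daemons` is a hypothetical helper that reads jps output on stdin and prints any expected daemon that is absent, so empty output means everything is running:

```shell
# missing_daemons: read `jps` output on stdin and print every daemon from the
# expected list (first argument) that does not appear. Hypothetical helper.
missing_daemons() {
  local out missing=""
  out=$(cat)
  for d in $1; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    echo "$out" | grep -qw "$d" || missing="$missing $d"
  done
  # print missing daemons (trimmed); empty output means all are running
  echo $missing
}

# usage on the master node:
# jps | missing_daemons "NameNode SecondaryNameNode ResourceManager"
# usage on a slave node:
# jps | missing_daemons "DataNode NodeManager"
```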