Building a Hadoop 2.7.3 Fully Distributed Cluster
System and software configuration:
CentOS 7
jdk-8u131-linux-x64.tar.gz
Hadoop 2.7.3
Nodes:
spark1(192.168.6.137)
spark2(192.168.6.138)
spark3(192.168.6.139)
1. Set a static IP address on each node
The nodes use 192.168.6.137, 192.168.6.138 and 192.168.6.139 respectively.
The procedure was covered in an earlier post and is not repeated here; for details see "Setting a static IP address for a VMware virtual machine".
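Before moving on: CentOS 7 enables firewalld by default, which blocks the ports the Hadoop daemons use to talk to each other. For a test cluster like this one, the simplest option is to stop and disable it on every node (or open just the required ports instead):
systemctl stop firewalld
systemctl disable firewalld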
2. Configure hosts
Edit /etc/hosts on spark1 and add the following lines:
192.168.6.137 spark1
192.168.6.138 spark2
192.168.6.139 spark3
3. Install the JDK
Upload the JDK to spark1 with SecureCRT and extract the downloaded jdk1.8.0_131:
tar -zxvf jdk-8u131-linux-x64.tar.gz -C /usr/local
Configure the environment variables on the spark1 node: vi /etc/profile
# jdk environment
alias cdha='cd /usr/local/hadoop-2.7.3/etc/hadoop'
export JAVA_HOME=/usr/local/jdk1.8.0_131
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH
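Since the later steps call hdfs, start-dfs.sh and start-yarn.sh directly, it is convenient to put Hadoop on the PATH in the same profile; a minimal sketch, assuming the Hadoop tarball is extracted to /usr/local/hadoop-2.7.3 as in step 4:
# hadoop environment (assumes /usr/local/hadoop-2.7.3 from step 4)
export HADOOP_HOME=/usr/local/hadoop-2.7.3
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
After saving, reload the profile and verify the JDK:
source /etc/profile
java -version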
4. Distributed cluster configuration on the spark1 node
Edit hadoop-env.sh (in /usr/local/hadoop-2.7.3/etc/hadoop):
export JAVA_HOME=/usr/local/jdk1.8.0_131
Edit core-site.xml (in /usr/local/hadoop-2.7.3/etc/hadoop):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://spark1:9000</value>
</property>
</configuration>
Edit hdfs-site.xml (in /usr/local/hadoop-2.7.3/etc/hadoop):
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/data/datanode</value>
</property>
<property>
<name>dfs.tmp.dir</name>
<value>/usr/local/data/tmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
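The daemons expect these directories to exist and be writable; creating them on spark1 now (the clones in step 5 will inherit them) avoids startup surprises:
mkdir -p /usr/local/data/namenode /usr/local/data/datanode /usr/local/data/tmp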
Edit mapred-site.xml (in /usr/local/hadoop-2.7.3/etc/hadoop):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
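Note: the stock Hadoop 2.7.3 distribution ships only mapred-site.xml.template in this directory; if mapred-site.xml does not exist yet, create it from the template before editing:
cd /usr/local/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml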
Edit yarn-site.xml (in /usr/local/hadoop-2.7.3/etc/hadoop):
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>spark1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit slaves (in /usr/local/hadoop-2.7.3/etc/hadoop). Because spark1 is listed here as well, it will run a DataNode and NodeManager alongside the master daemons:
spark1
spark2
spark3
5. Clone the whole spark1 system with VMware
Right-click the spark1 virtual machine → Manage → Clone.
Clone source → the current state in the virtual machine.
Clone type → create a full clone.
Clone spark1 twice to produce spark2 and spark3, then adjust each clone as described below.
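Each clone still carries spark1's hostname and static IP, so change both before continuing. For the clone that becomes spark2 (repeat on spark3 with its own name and 192.168.6.139), CentOS 7's hostnamectl does the hostname part:
hostnamectl set-hostname spark2
Then change the clone's static IP to 192.168.6.138 the same way as in step 1, and reboot.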
6. Configure passwordless SSH login to the local machine and between the cluster machines
On spark1:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
On spark2:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
On spark3:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
Then copy each node's key to the other two machines:
On spark1:
ssh-copy-id -i spark2
ssh-copy-id -i spark3
On spark2:
ssh-copy-id -i spark1
ssh-copy-id -i spark3
On spark3:
ssh-copy-id -i spark1
ssh-copy-id -i spark2
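To confirm the keys work, run a remote command from spark1 (and likewise from the other nodes); it should complete without prompting for a password:
ssh spark2 date
ssh spark3 date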
7. Start the Hadoop cluster
Format the NameNode on spark1:
hdfs namenode -format
Start HDFS:
start-dfs.sh
[root@spark1 hadoop]# jps
3345 Jps
3236 SecondaryNameNode
3078 DataNode
2952 NameNode
[root@spark2 hadoop]# jps
2025 Jps
1951 DataNode
[root@spark3 hadoop]# jps
1970 DataNode
2035 Jps
Start YARN:
start-yarn.sh
[root@spark1 hadoop]# jps
1840 ResourceManager
1521 DataNode
2289 Jps
2005 NodeManager
1658 SecondaryNameNode
1389 NameNode
[root@spark2 ~]# jps
1173 NodeManager
1063 DataNode
1319 Jps
[root@spark3 ~]# jps
1312 Jps
1176 NodeManager
1066 DataNode
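With all daemons up, a quick smoke test from spark1 is to request an HDFS cluster report (it should show all three DataNodes) and list the root directory:
hdfs dfsadmin -report
hdfs dfs -ls /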
8. Check the cluster through the web UIs: the HDFS NameNode UI at http://spark1:50070 and the YARN ResourceManager UI at http://spark1:8088 (use the IP addresses instead if the host machine cannot resolve the spark hostnames).