Spark in Action: Spark 2.0 Environment Setup Tutorial

Software preparation for the environment

CentOS-7-x86_64-Everything-1611.iso

spark-2.0.1-bin-hadoop2.7.tgz

hadoop-2.7.3.tar.gz

scala-2.11.8.tgz

jdk-8u91-linux-x64.tar.gz

Create the Linux virtual machines (all nodes)

Guest operating system: CentOS-7-x86_64.

Network and hostname settings:

General tab: check "Automatically connect to this network when it is available".

Configure the IPv4 tab as follows:

hostname      Address          Netmask          Gateway
sparkmaster   192.168.169.221  255.255.255.0
sparknode1    192.168.169.222  255.255.255.0
sparknode2    192.168.169.223  255.255.255.0

Installation type: Minimal Install.

Create the user (all nodes)
su root
useradd spark
passwd spark
su spark
cd ~
pwd
mkdir softwares
Change the locale to English (all nodes)
# Show the current locale settings
locale

LANG=en_US.utf8
export LC_ALL=en_US.utf8

# Check the system default; /etc/locale.conf should contain the line below
cat /etc/locale.conf

LANG=en_US.utf8
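
On CentOS 7 the system default can also be persisted with localectl instead of editing /etc/locale.conf by hand; a minimal sketch:

# Write the default locale to /etc/locale.conf
localectl set-locale LANG=en_US.utf8
# Confirm the active settings
localectl status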
Modify the hostname (all nodes)
vi /etc/hostname

# 192.168.169.221
sparkmaster
# 192.168.169.222
sparknode1
# 192.168.169.223
sparknode2
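
Equivalently, the hostname can be set with hostnamectl so the change also takes effect immediately; run the matching command on each node:

# On 192.168.169.221
hostnamectl set-hostname sparkmaster
# On 192.168.169.222
hostnamectl set-hostname sparknode1
# On 192.168.169.223
hostnamectl set-hostname sparknode2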
Modify hosts (all nodes)
su root
vi /etc/hosts

192.168.169.221 sparkmaster
192.168.169.222 sparknode1
192.168.169.223 sparknode2

To make the cluster reachable by hostname from Windows as well, add the same entries to the Windows hosts file located under C:\Windows\System32\drivers\etc.

Configure a static IP (all nodes)
vi /etc/sysconfig/network-scripts/ifcfg-ens33

# BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR0=xxx
GATEWAY0=xxx
NETMASK=xxx
DNS1=xxx
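
For example, a filled-in ifcfg-ens33 for sparkmaster might look like the sketch below; the gateway and DNS values are placeholders for your own network and must be adapted:

BOOTPROTO=static
ONBOOT=yes
IPADDR0=192.168.169.221
NETMASK=255.255.255.0
# Hypothetical gateway/DNS -- replace with the values for your network
GATEWAY0=192.168.169.1
DNS1=192.168.169.1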

systemctl restart network
Disable the firewall (all nodes)
systemctl status firewalld.service

systemctl stop firewalld.service
systemctl disable firewalld.service
Configure passwordless SSH login (all nodes)
su spark
cd ~
ssh-keygen -t rsa -P ''

Copy the contents of the id_rsa.pub generated on each node, and append the public keys from all nodes into the ~/.ssh/authorized_keys file of the spark user on every node. The permissions of authorized_keys on each node must be changed to 600: chmod 600 authorized_keys

Upload the software (master node)

Upload the packages prepared earlier (JDK, Hadoop, Spark, Scala) to sparkmaster:/home/spark/softwares.

Install the JDK (master node)
tar -zxvf jdk-8u91-linux-x64.tar.gz
vi ~/.bashrc

export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91
export PATH=$PATH:$JAVA_HOME/bin

source ~/.bashrc
which java
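
To double-check that the new JDK (rather than any system Java) is being picked up, print the version as well:

java -version    # should report version 1.8.0_91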
Install Scala (master node)
tar -zxvf scala-2.11.8.tgz
vi ~/.bashrc

export SCALA_HOME=/home/spark/softwares/scala-2.11.8
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin

source ~/.bashrc
which scala
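
Similarly, confirm the Scala version:

scala -version   # should report Scala code runner version 2.11.8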
Install Hadoop (master node)
tar -zxvf hadoop-2.7.3.tar.gz

The Hadoop configuration files live in /home/spark/softwares/hadoop-2.7.3/etc/hadoop.

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://sparkmaster:8082</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>sparkmaster:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

masters

sparkmaster

slaves

sparkmaster
sparknode1
sparknode2

hadoop-env.sh

# ${JAVA_HOME} only resolves if the variable is visible to the daemon's shell;
# setting the path explicitly is more robust:
export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91

Environment variables

vi ~/.bashrc

export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin

source ~/.bashrc
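
A quick check that the Hadoop binaries are now on the PATH:

which hadoop
hadoop version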
Install Spark (master node)
tar -zxvf spark-2.0.1-bin-hadoop2.7.tgz

# Work in the Spark conf directory
cd /home/spark/softwares/spark-2.0.1-bin-hadoop2.7/conf

# slaves is created from the shipped template
cp slaves.template slaves
vi slaves

sparkmaster
sparknode1
sparknode2

# spark-env.sh is also created from the shipped template
cp spark-env.sh.template spark-env.sh
vi spark-env.sh

# These reuse the values already exported in ~/.bashrc
export SPARK_HOME=$SPARK_HOME
export HADOOP_HOME=$HADOOP_HOME
export MASTER=spark://sparkmaster:7077
export SCALA_HOME=$SCALA_HOME
export SPARK_MASTER_IP=sparkmaster
vi ~/.bashrc

export SPARK_HOME=/home/spark/softwares/spark-2.0.1-bin-hadoop2.7
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin

source ~/.bashrc
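
A quick check that the Spark binaries are on the PATH:

which spark-submit
spark-submit --version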
Set up a local yum repository (local file method) (master node)

Mount the ISO image file and copy out its contents

su root
mkdir -p /mnt/CentOS /mnt/dvd
mount /dev/cdrom /mnt/dvd
df -h
cp -av /mnt/dvd/* /mnt/CentOS
umount /mnt/dvd

Back up the existing yum configuration files

cd /etc/yum.repos.d
rename .repo .repo.bak *.repo

Create a new yum configuration file

vi /etc/yum.repos.d/local.repo

[local]
name=CentOS-$releasever - Local
baseurl=file:///mnt/CentOS
enabled=1
gpgcheck=0

# Verify
yum list | grep mysql
Set up a local yum repository (HTTP method) (master node)

Start the httpd service

# Check whether httpd is already installed
rpm -qa | grep httpd
# Install httpd if it is missing
yum install -y httpd
# Start the httpd service (on CentOS 6 this would be: service httpd start)
systemctl status httpd.service
systemctl start httpd.service
# Enable httpd to start on boot (on CentOS 6: chkconfig httpd on)
systemctl is-enabled httpd.service
systemctl enable httpd.service

Populate the yum repository

# Create the CentOS7 directory under /var/www/html/
mkdir -p /var/www/html/CentOS7

# Move the ISO contents into CentOS7 (or copy them and remove the source, as below)
# cp -av /mnt/CentOS/* /var/www/html/CentOS7/
# rm -rf /mnt/CentOS/*
mv /mnt/CentOS/* /var/www/html/CentOS7/

The yum repository built from the ISO image is now ready. Verify access in a browser:

http://sparkmaster/CentOS7/

Use the yum repository

# Back up the existing repo files
# mkdir -p /etc/yum.repos.d/repo.bak
# cd /etc/yum.repos.d/
# cp *.repo *.repo.bak repo.bak/
# rm -rf *.repo *.repo.bak

cd /etc/yum.repos.d/
# Create the file CentOS-http.repo
vi CentOS-http.repo

[http]
name=CentOS-$releasever - http
baseurl=http://sparkmaster:80/CentOS7/
enabled=1
gpgcheck=1
gpgkey=http://sparkmaster:80/CentOS7/RPM-GPG-KEY-CentOS-7

# Disable the local repository created earlier by setting enabled=0 in local.repo

# Refresh the yum metadata
yum clean all
yum repolist
Cluster yum repository configuration (HTTP method) (all nodes)
# sparknode1/sparknode2
cd /etc/yum.repos.d
rename .repo .repo.bak *.repo

# sparkmaster
scp /etc/yum.repos.d/*.repo sparknode1:/etc/yum.repos.d/
scp /etc/yum.repos.d/*.repo sparknode2:/etc/yum.repos.d/
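
Afterwards, each worker node can refresh its metadata and confirm the HTTP repository is visible:

# sparknode1/sparknode2
yum clean all
yum repolist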
File synchronization tool rsync (all nodes)

Use rsync to push the software installed under /home/spark/softwares on the master node (JDK, Hadoop, Spark, Scala) to the other nodes.

rpm -qa | grep rsync
yum list | grep rsync
yum install -y rsync

vi sync_tools.sh

echo "-----begin to sync jobs to other workplat-----"
SERVER_LIST='sparknode1 sparknode2'
for SERVER in $SERVER_LIST
do
    rsync -avz ./* $SERVER:/home/spark/softwares
done
echo "-----sync jobs is done-----"
cd ~/softwares
chmod 700 sync_tools.sh
./sync_tools.sh
Synchronize the environment variable configuration (all nodes)
# sparknode1/sparknode2
mv ~/.bashrc ~/.bashrc.bak

# sparkmaster
su spark
scp ~/.bashrc sparknode1:~/.bashrc
scp ~/.bashrc sparknode2:~/.bashrc

# sparknode1/sparknode2
source ~/.bashrc
Start Spark and verify
cd $SPARK_HOME
cd sbin
./stop-all.sh
./start-all.sh
jps

Verify in a browser:

http://sparkmaster:8080/
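
On sparkmaster, jps should show a Master and a Worker process (the other nodes show only a Worker). As an optional smoke test, a tiny job can be piped into spark-shell against the standalone master, for example:

echo 'sc.parallelize(1 to 100).count()' | spark-shell --master spark://sparkmaster:7077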

Start HDFS and verify
cd $HADOOP_HOME
# Format the NameNode (first run only)
hadoop namenode -format
cd sbin
./stop-all.sh
./start-dfs.sh
jps

Verify in a browser:

http://sparkmaster:50070
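
As a simple HDFS smoke test, write a small file and read it back:

echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /user/spark
hdfs dfs -put /tmp/hello.txt /user/spark/
hdfs dfs -ls /user/spark
hdfs dfs -cat /user/spark/hello.txt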

At this point, the Spark 2.0 environment setup is complete.