Spark in Practice: Setting Up a Spark 2.0 Environment
Software needed for the installation:
CentOS-7-x86_64-Everything-1611.iso
spark-2.0.1-bin-hadoop2.7.tgz
hadoop-2.7.3.tar.gz
scala-2.11.8.tgz
jdk-8u91-linux-x64.tar.gz
Create the Linux virtual machines (all nodes). Guest operating system: CentOS-7-x86_64.
Network and hostname settings:
General tab: check "Automatically connect to this network when it is available".
IPv4 tab settings:
hostname      Address            Netmask          Gateway
sparkmaster   192.168.169.221    255.255.255.0
sparknode1    192.168.169.222    255.255.255.0
sparknode2    192.168.169.223    255.255.255.0

Installation type: minimal install.
Create the user (all nodes)
su root
useradd spark
passwd spark
su spark
cd ~
pwd
mkdir softwares

Switch the system locale to English (all nodes)
# Show the locales currently supported
locale
LANG=en_US.utf8
export LC_ALL=en_US.utf8
# Change the system default
cat /etc/locale.conf
LANG=en_US.utf8

Change the hostname (all nodes)
vi /etc/hostname
# 192.168.169.221 sparkmaster
# 192.168.169.222 sparknode1
# 192.168.169.223 sparknode2

Edit the hosts file (all nodes)
su root
vi /etc/hosts
192.168.169.221 sparkmaster
192.168.169.222 sparknode1
192.168.169.223 sparknode2
So that the cluster can also be reached by hostname from Windows, add the same entries to the Windows hosts file located under C:\Windows\System32\drivers\etc.
Configure a static IP (all nodes)
vi /etc/sysconfig/network-scripts/ifcfg-ens33
# BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR0=xxx
GATEWAY0=xxx
NETMASK=xxx
DNS1=xxx
systemctl restart network

Disable the firewall (all nodes)
systemctl status firewalld.service
systemctl stop firewalld.service
systemctl disable firewalld.service
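With the static addresses in place and the firewall off, a quick connectivity check can be run from any node. This is an optional sanity check, not part of the original steps:

# Each hostname should resolve via /etc/hosts and answer one ping
ping -c 1 sparkmaster
ping -c 1 sparknode1
ping -c 1 sparknode2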
Configure passwordless SSH login (all nodes)
su spark
cd ~
ssh-keygen -t rsa -P ''
Copy the contents of the id_rsa.pub generated on each node, and append the public keys of all nodes into the ~/.ssh/authorized_keys file on every node. On each node the permissions of authorized_keys must be changed to 600: chmod 600 authorized_keys. A scripted sketch of this key distribution follows.
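The same distribution can be done with ssh-copy-id, which appends the local public key to the remote authorized_keys and keeps its permissions restrictive. This is only a sketch using the node names from the hosts file above, not part of the original steps:

# Run on every node as the spark user
for HOST in sparkmaster sparknode1 sparknode2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub spark@$HOST
done
# Verify: these should return the remote hostname without prompting for a password
ssh spark@sparknode1 hostname
ssh spark@sparknode2 hostname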
Upload the software (master node)
Upload the JDK, Hadoop, Spark, and Scala packages prepared earlier to sparkmaster:/home/spark/softwares.
Install the JDK (master node)
tar -zxvf jdk-8u91-linux-x64.tar.gz
vi ~/.bashrc
export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bashrc
which java

Install Scala (master node)
tar -zxvf scala-2.11.8.tgz
vi ~/.bashrc
export SCALA_HOME=/home/spark/softwares/scala-2.11.8
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
source ~/.bashrc
which scala
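Besides checking the paths with which, the versions can be confirmed as well; this is an optional check that is not in the original steps:

java -version     # should report 1.8.0_91
scala -version    # should report Scala 2.11.8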
Install Hadoop (master node)
tar -zxvf hadoop-2.7.3.tar.gz
The Hadoop configuration files are located in /home/spark/softwares/hadoop-2.7.3/etc/hadoop.
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sparkmaster:8082</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>sparkmaster:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
masters
sparkmaster
slaves
sparkmaster
sparknode1
sparknode2
hadoop-env.sh
# Set the JDK path explicitly; daemons started over ssh do not always inherit the login environment
export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91
Environment variables
vi ~/.bashrc
export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin
source ~/.bashrc
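A quick way to confirm the Hadoop binaries are on the PATH, again optional and not in the original steps:

which hadoop
hadoop version    # should report Hadoop 2.7.3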
Install Spark (master node)
tar -zxvf spark-2.0.1-bin-hadoop2.7.tgz
# /home/spark/softwares/spark-2.0.1-bin-hadoop2.7/conf
vi slaves
sparkmaster
sparknode1
sparknode2
vi spark-env.sh
export SPARK_HOME=/home/spark/softwares/spark-2.0.1-bin-hadoop2.7
export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
export MASTER=spark://sparkmaster:7077
export SCALA_HOME=/home/spark/softwares/scala-2.11.8
export SPARK_MASTER_IP=sparkmaster
vi ~/.bashrc
export SPARK_HOME=/home/spark/softwares/spark-2.0.1-bin-hadoop2.7
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
source ~/.bashrc

Set up a local yum repository (local file method) (master node)
Mount the ISO image and copy its contents:
su root
mkdir -p /mnt/CentOS /mnt/dvd
mount /dev/cdrom /mnt/dvd
df -h
cp -av /mnt/dvd/* /mnt/CentOS
umount /mnt/dvd
Back up the existing yum configuration files:
cd /etc/yum.repos.d
rename .repo .repo.bak *.repo
Create a new yum configuration file:
vi /etc/yum.repos.d/local.repo
[local]
name=CentOS-$releasever - Local
baseurl=file:///mnt/CentOS
enabled=1
gpgcheck=0
# Verify
yum list | grep mysql

Set up a local yum repository (http method) (master node)
Start the httpd service:
# Check whether the httpd service is already installed
rpm -qa | grep httpd
# yum install -y httpd
yum install -y httpd
# Start the httpd service
# service httpd start
systemctl status httpd.service
systemctl start httpd.service
# Enable the httpd service at boot
# chkconfig httpd on
systemctl is-enabled httpd.service
systemctl enable httpd.service
Set up the yum repository:
# Create the directory CentOS7 under /var/www/html/
mkdir -p /var/www/html/CentOS7
# Move the contents of the ISO copy into CentOS7
# cp -av /mnt/CentOS/* /var/www/html/CentOS7/
# rm -rf /mnt/CentOS/*
mv /mnt/CentOS/* /var/www/html/CentOS7/
The yum repository built from the ISO image is now in place. Verify it in a browser:
http://sparkmaster/CentOS7/
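On a machine without a browser, the same check can be done with curl. This is an optional check and assumes the repodata directory from the ISO was copied along with everything else:

# The HTTP status should be 200 and the repository metadata reachable
curl -I http://sparkmaster/CentOS7/repodata/repomd.xml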
Use the yum repository:
# Back up the existing repo files
# mkdir -p /etc/yum.repos.d/repo.bak
# cd /etc/yum.repos.d/
# cp *.repo *.repo.bak repo.bak/
# rm -rf *.repo *.repo.bak
cd /etc/yum.repos.d/
# Create a new file CentOS-http.repo
vi CentOS-http.repo
[http]
name=CentOS-$releasever - http
baseurl=http://sparkmaster:80/CentOS7/
enabled=1
gpgcheck=1
gpgkey=http://sparkmaster:80/CentOS7/RPM-GPG-KEY-CentOS-7
# Disable the local yum repository built earlier by setting enabled=0 in local.repo
# Refresh the yum metadata
yum clean all
yum repolist

Cluster yum configuration (http method) (all nodes)
# On sparknode1/sparknode2
cd /etc/yum.repos.d
rename .repo .repo.bak *.repo
# On sparkmaster
scp /etc/yum.repos.d/*.repo sparknode1:/etc/yum.repos.d/
scp /etc/yum.repos.d/*.repo sparknode2:/etc/yum.repos.d/

File synchronization tool (all nodes)
Use rsync to synchronize the JDK, Hadoop, Spark, and Scala installations under /home/spark/softwares on the master node to the other nodes.
rpm -qa | grep rsync
yum list | grep rsync
yum install -y rsync
vi sync_tools.sh
#!/bin/bash
echo "-----begin to sync files to the other nodes-----"
SERVER_LIST='sparknode1 sparknode2'
for SERVER in $SERVER_LIST
do
    rsync -avz ./* $SERVER:/home/spark/softwares
done
echo "-----sync done-----"
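Before running the script against the workers, rsync's dry-run flag can preview what would be transferred; this step is optional and not part of the original instructions:

cd ~/softwares
# -n (dry run) lists the files that would be copied without transferring anything
rsync -avzn ./* sparknode1:/home/spark/softwares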
cd ~/softwares
chmod 700 sync_tools.sh
./sync_tools.sh

Synchronize the environment variable configuration (all nodes)
# On sparknode1/sparknode2
mv ~/.bashrc ~/.bashrc.bak
# On sparkmaster
su spark
scp ~/.bashrc sparknode1:~/.bashrc
scp ~/.bashrc sparknode2:~/.bashrc
# On sparknode1/sparknode2
source ~/.bashrc

Start Spark and verify
cd $SPARK_HOME
cd sbin
./stop-all.sh
./start-all.sh
jps
Verify in a browser:
http://sparkmaster:8080/
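Beyond the web UI, submitting the bundled SparkPi example confirms that the standalone cluster actually accepts and runs jobs. This is an optional check, not in the original steps; the examples jar name below matches the Spark 2.0.1 binary distribution but may differ slightly for other builds:

cd $SPARK_HOME
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://sparkmaster:7077 \
  examples/jars/spark-examples_2.11-2.0.1.jar 100
# The driver output should contain a line such as "Pi is roughly 3.14..."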
Start HDFS and verify
cd $HADOOP_HOME
# Format the NameNode (first start only)
hadoop namenode -format
cd sbin
./stop-all.sh
./start-dfs.sh
jps
Verify in a browser:
http://sparkmaster:50070
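A small round trip through HDFS additionally confirms that the DataNodes are writable; the file and directory names here are arbitrary and the check is not part of the original steps:

echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /tmp/smoke
hdfs dfs -put /tmp/hello.txt /tmp/smoke/
hdfs dfs -cat /tmp/smoke/hello.txt
hdfs dfs -rm -r /tmp/smoke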
This completes the Spark 2.0 environment setup.