Spark 3.0.1 Highly Available Distributed Cluster Setup (HA Mode)
A Spark Standalone cluster uses a master-slaves architecture. Failures on the Worker side are handled elastically: tasks that fail on one Worker are automatically rescheduled onto other Workers. The Master, however, is a single point of failure.
Here we implement high availability with ZooKeeper-based standby Masters (Standby Masters with ZooKeeper). The idea is to connect the Spark cluster to a single ZooKeeper ensemble and start multiple Masters; ZooKeeper's leader election and state storage make one Master ACTIVE while the others stay in STANDBY. If the active Master dies, a new Master is elected, recovers the old Master's state, and then resumes scheduling. The whole failover can take 1-2 minutes.
1. Modify the Configuration File
Starting from the existing Spark Standalone configuration, edit /moudle/spark-3.0.1/conf/spark-env.sh: comment out SPARK_MASTER_HOST and add the SPARK_DAEMON_JAVA_OPTS setting.
vim /moudle/spark-3.0.1/conf/spark-env.sh
# Append the following at the end of the file
export JAVA_HOME=/moudle/jdk1.8
export HADOOP_HOME=/moudle/hadoop-3.3.0
export HADOOP_CONF_DIR=/moudle/hadoop-3.3.0/etc/hadoop
# Comment out SPARK_MASTER_HOST; the active Master is now decided by ZooKeeper
#export SPARK_MASTER_HOST=bigdata1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_DRIVER_MEMORY=1g
export SPARK_WORKER_MEMORY=1g
export SPARK_MASTER_WEBUI_PORT=8080
# Spark History Server (application event log) settings
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://bigdata1:9000/sparklog"
# Add the following setting:
# JVM options for the Spark daemons that enable ZooKeeper-based recovery and point to the ZooKeeper ensemble
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bigdata1:2181,bigdata2:2181,bigdata3:2181 -Dspark.deploy.zookeeper.dir=/spark"
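The updated spark-env.sh must be present on every node. A minimal sketch of distributing it, assuming the same /moudle/spark-3.0.1 path on all nodes and passwordless SSH (as used for the Standalone setup):
# Copy the modified spark-env.sh to the other nodes
scp /moudle/spark-3.0.1/conf/spark-env.sh bigdata2:/moudle/spark-3.0.1/conf/
scp /moudle/spark-3.0.1/conf/spark-env.sh bigdata3:/moudle/spark-3.0.1/conf/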
2. Start the ZooKeeper Cluster
Start ZooKeeper on all three nodes: bigdata1, bigdata2, and bigdata3.
zkServer.sh start
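Before moving on, it is worth confirming that the ensemble has elected a leader. A quick check, assuming zkServer.sh is on the PATH of every node:
# Run on each node: one node should report Mode: leader, the others Mode: follower
zkServer.sh status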
3. Start the Hadoop Cluster
Spark mainly relies on HDFS here.
start-all.sh
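The history server configured above reads from hdfs://bigdata1:9000/sparklog, so that directory must exist in HDFS. If it was not already created during the Standalone setup, a sketch of the check and creation (directory name taken from spark-env.sh above):
# Create the event-log directory if it is missing and verify HDFS is reachable
hdfs dfs -mkdir -p /sparklog
hdfs dfs -ls /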
4. Start the Spark Cluster
Start the Spark cluster from the bigdata1 node:
start-spark-all.sh
Then start an additional Master on the bigdata2 node:
start-master.sh
Start the Spark History Server:
[root@bigdata1 conf]# start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-bigdata1.out
# The history server's web address can be found in /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-bigdata1.out
cat /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-bigdata1.out
#2020-11-11 11:11:59,943 INFO history.HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://bigdata1:4000
# So the history server web UI is at http://bigdata1:4000
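The history server only lists applications that write their event logs to hdfs://bigdata1:9000/sparklog. If event logging was not already enabled in the Standalone setup, a sketch of the corresponding entries in /moudle/spark-3.0.1/conf/spark-defaults.conf (standard Spark properties, using the directory configured above):
# Write application event logs to HDFS so the history server can display finished jobs
spark.eventLog.enabled true
spark.eventLog.dir hdfs://bigdata1:9000/sparklog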
5. Check the Processes
Processes on bigdata1:
[root@bigdata1 conf]# jps
2272 QuorumPeerMain
3459 DFSZKFailoverController
17523 Worker
3860 ResourceManager
2949 DataNode
4006 NodeManager
18470 HistoryServer
17433 Master
18747 Jps
2797 NameNode
3213 JournalNode
Processes on bigdata2:
[root@bigdata2 conf]# jps
2114 DataNode
2036 NameNode
8120 Worker
2569 NodeManager
2219 JournalNode
2347 DFSZKFailoverController
2491 ResourceManager
8204 Master
8780 Jps
1646 QuorumPeerMain
Processes on bigdata3:
[root@bigdata3 conf]# jps
13027 Jps
12134 Worker
2567 NodeManager
1946 QuorumPeerMain
2428 JournalNode
2319 DataNode
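Both bigdata1 and bigdata2 run a Master process in addition to a Worker, while bigdata3 only runs a Worker. To check the Spark daemons on all nodes from a single shell, one option (a sketch, assuming passwordless SSH between the nodes) is:
# List the Spark Master/Worker processes on every node
for h in bigdata1 bigdata2 bigdata3; do echo "== $h =="; ssh $h "jps | grep -E 'Master|Worker'"; done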
6. View the Spark Cluster in a Browser
http://192.168.239.131:8081/
The Master status on the bigdata1 node is:
Status: ALIVE
http://192.168.239.132:8082/
The Master status on the bigdata2 node is:
Status: STANDBY
View the Spark application history page:
http://192.168.239.131:4000/
Test automatic failover of the Master:
[root@bigdata1 conf]# stop-master.sh
stopping org.apache.spark.deploy.master.Master
After stopping the Master on bigdata1, its web page is no longer reachable.
Check the bigdata2 page: the bigdata2 Master has switched from STANDBY to ALIVE.
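To restore redundancy after the test, simply start the stopped Master again; it rejoins the cluster as a standby because bigdata2 is now the active Master:
# On bigdata1: restart the Master; it should come back in STANDBY mode
start-master.sh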
7. Running a Spark Application
In HA mode there are multiple Masters, and an application needs to know the address and port of the one that is currently active, so a list of Masters is passed to SparkContext. The application polls the list until it finds the live Master; list every Master as host:port after the --master parameter.
[root@bigdata1 conf]# spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://bigdata1:7077,bigdata2:7077,bigdata3:7077 \
--executor-memory 1G \
--total-executor-cores 1 \
/moudle/spark-3.0.1/examples/jars/spark-examples_2.12-3.0.1.jar \
100
The run produces output like the following:
[root@bigdata1 conf]# spark-submit --class org.apache.spark.examples.SparkPi --master spark://bigdata1:7077,bigdata2:7077,bigdata3:7077 --executor-memory 1G --total-executor-cores 1 /moudle/spark-3.0.1/examples/jars/spark-examples_2.12-3.0.1.jar 100
2020-11-11 12:07:29,138 INFO spark.SparkContext: Running Spark version 3.0.1
2020-11-11 12:07:29,255 INFO resource.ResourceUtils: ==============================================================
2020-11-11 12:07:29,257 INFO resource.ResourceUtils: Resources for spark.driver:
2020-11-11 12:07:29,257 INFO resource.ResourceUtils: ==============================================================
2020-11-11 12:07:30,019 INFO util.Utils: Successfully started service 'sparkDriver' on port 34971.
2020-11-11 12:07:30,246 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-74311b0f-855c-43a8-8090-88596c5b2d19
2020-11-11 12:07:30,307 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MiB
ServerConnector@42a9a63e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-11-11 12:07:30,823 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
2020-11-11 12:07:30,943 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://bigdata1:4040
2020-11-11 12:07:30,979 INFO spark.SparkContext: Added JAR file:/moudle/spark-3.0.1/examples/jars/spark-examples_2.12-3.0.1.jar at spark://bigdata1:34971/jars/spark-examples_2.12-3.0.1.jar with timestamp 1605067650978
2020-11-11 12:07:31,035 WARN spark.SparkContext: Please ensure that the number of slots available on your executors is limited by the number of cores to task cpus and not another custom resource. If cores is not the limiting resource then dynamic allocation will not work properly!
2020-11-11 12:07:31,592 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://bigdata1:7077...
2020-11-11 12:07:31,593 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://bigdata2:7077...
2020-11-11 12:07:31,594 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://bigdata3:7077...
2020-11-11 12:07:31,759 INFO client.TransportClientFactory: Successfully created connection to bigdata2/192.168.239.132:7077 after 98 ms (0 ms spent in bootstraps)
2020-11-11 12:07:31,762 INFO client.TransportClientFactory: Successfully created connection to bigdata1/192.168.239.131:7077 after 90 ms (0 ms spent in bootstraps)
2020-11-11 12:07:31,780 INFO client.TransportClientFactory: Successfully created connection to bigdata3/192.168.239.133:7077 after 104 ms (0 ms spent in bootstraps)
2020-11-11 12:07:32,168 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20201111120732-0003
2020-11-11 12:07:32,188 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20201111120732-0003/0 on worker-20201111104311-192.168.239.132-37607 (192.168.239.132:37607) with 1 core(s)
2020-11-11 12:07:32,193 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20201111120732-0003/0 on hostPort 192.168.239.132:37607 with 1 core(s), 1024.0 MiB RAM
2020-11-11 12:07:32,253 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36803.
2020-11-11 12:07:32,255 INFO netty.NettyBlockTransferService: Server created on bigdata1:36803
2020-11-11 12:07:32,282 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-11-11 12:07:32,307 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20201111120732-0003/0 is now RUNNING
2020-11-11 12:07:36,882 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 100 tasks
2020-11-11 12:07:39,581 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2020-11-11 12:07:40,922 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.239.132:50830) with ID 0
2020-11-11 12:07:41,359 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.239.132:36510 with 413.9 MiB RAM, BlockManagerId(0, 192.168.239.132, 36510, None)
2020-11-11 12:07:41,518 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.239.132, executor 0, partition 0, PROCESS_LOCAL, 7397 bytes)
2020-11-11 12:07:42,615 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.239.132:36510 (size: 1815.0 B, free: 413.9 MiB)
2020-11-11 12:07:43,978 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.239.132, executor 0, partition 1, PROCESS_LOCAL, 7397 bytes)
2020-11-11 12:07:43,994 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2510 ms on 192.168.239.132 (executor 0) (1/100)
2020-11-11 12:07:49,057 INFO scheduler.TaskSetManager: Finished task 99.0 in stage 0.0 (TID 99) in 45 ms on 192.168.239.132 (executor 0) (100/100)
2020-11-11 12:07:49,059 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
2020-11-11 12:07:49,110 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 12.548 s
2020-11-11 12:07:49,124 INFO scheduler.DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
2020-11-11 12:07:49,125 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
2020-11-11 12:07:49,131 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 13.017405 s
# The computed result:
Pi is roughly 3.1415607141560713
2020-11-11 12:07:49,174 INFO server.AbstractConnector: Stopped Spark@42a9a63e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-11-11 12:07:49,177 INFO ui.SparkUI: Stopped Spark web UI at http://bigdata1:4040
2020-11-11 12:07:49,186 INFO cluster.StandaloneSchedulerBackend: Shutting down all executors
2020-11-11 12:07:49,762 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-63cbd8f4-e267-4d7a-99fe-12d519afd369
2020-11-11 12:07:49,767 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2c9874ac-4195-4f9b-acba-e2ac9f139c6a
This completes the Spark 3.0.1 highly available distributed cluster setup (HA mode).