
Spark 3.0.1 High-Availability Distributed Cluster Setup (HA Mode)



A Spark Standalone cluster uses a Master-Slaves architecture. Worker failures are handled elastically: tasks on a failed Worker are automatically rescheduled onto other Workers. The Master, however, remains a single point of failure.
Here we remove that single point of failure with ZooKeeper-based standby Masters (Standby Masters with ZooKeeper). The basic idea is to let ZooKeeper elect one Master: connect the Spark cluster to the same ZooKeeper ensemble and start multiple Masters, and ZooKeeper's leader election and state storage keep one Master active while the others stay in standby. If the active Master dies, a new Master is elected, recovers the old Master's state (registered Workers and running applications), and resumes scheduling. The whole failover can take one to two minutes.
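
Once the cluster is up (sections 2-4 below), you can see the state ZooKeeper keeps for Spark. A quick sketch with ZooKeeper's own CLI; the znode names shown are what Spark typically creates under the spark.deploy.zookeeper.dir=/spark directory configured in section 1:

zkCli.sh -server bigdata1:2181
# At the zkCli prompt, list Spark's znodes:
ls /spark
# Expected (names may vary slightly by Spark version): [leader_election, master_status]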

1. Modify the Configuration File

Building on the Spark Standalone configuration, edit /moudle/spark-3.0.1/conf/spark-env.sh. The two changes are: comment out SPARK_MASTER_HOST, and add a SPARK_DAEMON_JAVA_OPTS entry.

vim /moudle/spark-3.0.1/conf/spark-env.sh
# Append the following at the end of the file
export JAVA_HOME=/moudle/jdk1.8
export HADOOP_HOME=/moudle/hadoop-3.3.0
export HADOOP_CONF_DIR=/moudle/hadoop-3.3.0/etc/hadoop
# Comment out SPARK_MASTER_HOST (with HA enabled, ZooKeeper decides which Master is active)
#export SPARK_MASTER_HOST=bigdata1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_DRIVER_MEMORY=1g
export SPARK_WORKER_MEMORY=1g
export SPARK_MASTER_WEBUI_PORT=8080
# History server (log aggregation)
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 \
-Dspark.history.retainedApplications=3 \
-Dspark.history.fs.logDirectory=hdfs://bigdata1:9000/sparklog"

# New for HA: daemon options telling Spark where ZooKeeper is
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bigdata1:2181,bigdata2:2181,bigdata3:2181 -Dspark.deploy.zookeeper.dir=/spark"
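
spark-env.sh must be identical on every node. A minimal way to sync it, assuming the same install path on all three hosts:

scp /moudle/spark-3.0.1/conf/spark-env.sh bigdata2:/moudle/spark-3.0.1/conf/
scp /moudle/spark-3.0.1/conf/spark-env.sh bigdata3:/moudle/spark-3.0.1/conf/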

2. Start the ZooKeeper Cluster

Start ZooKeeper on all three nodes (bigdata1, bigdata2, bigdata3):

zkServer.sh start
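
To confirm the ensemble is healthy, run the status command on each node; exactly one should report leader and the rest follower:

zkServer.sh status
# Mode: leader    (on one node)
# Mode: follower  (on the other two)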

3. Start the Hadoop Cluster

Here Spark mainly relies on HDFS:

start-all.sh
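
The history server's log directory configured above (hdfs://bigdata1:9000/sparklog) must exist before anything writes to it. If it was not created during the earlier Standalone setup, create it once:

hdfs dfs -mkdir -p /sparklog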

4. Start the Spark Cluster

Start the Spark cluster from the bigdata1 node (start-spark-all.sh here is presumably Spark's sbin/start-all.sh renamed to avoid clashing with Hadoop's script of the same name):

start-spark-all.sh

Then start a second Master on the bigdata2 node:

start-master.sh
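
The second Master should come up as a standby rather than becoming active. A quick sanity check against its daemon log; the file name is assumed to follow Spark's default naming, like the HistoryServer log shown below:

grep -i leader /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.master.Master-1-bigdata2.out
# An active Master logs "I have been elected leader!"; a standby stays quiet until failover.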

Start the history server (log aggregation):

[root@bigdata1 conf]# start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-bigdata1.out

# The log file /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-bigdata1.out shows the address of the history web UI

cat /moudle/spark-3.0.1/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-bigdata1.out
#2020-11-11 11:11:59,943 INFO history.HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://bigdata1:4000

# So the history UI is at http://bigdata1:4000
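
The history server only shows applications that actually write event logs to hdfs://bigdata1:9000/sparklog. The matching client-side settings, assumed to already be in place from the Standalone setup, belong in /moudle/spark-3.0.1/conf/spark-defaults.conf:

spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://bigdata1:9000/sparklog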

5. Check the Processes

Processes on bigdata1:

[root@bigdata1 conf]# jps
2272 QuorumPeerMain
3459 DFSZKFailoverController
17523 Worker
3860 ResourceManager
2949 DataNode
4006 NodeManager
18470 HistoryServer
17433 Master
18747 Jps
2797 NameNode
3213 JournalNode

Processes on bigdata2:

[root@bigdata2 conf]# jps
2114 DataNode
2036 NameNode
8120 Worker
2569 NodeManager
2219 JournalNode
2347 DFSZKFailoverController
2491 ResourceManager
8204 Master
8780 Jps
1646 QuorumPeerMain

Processes on bigdata3:

[root@bigdata3 conf]# jps
13027 Jps
12134 Worker
2567 NodeManager
1946 QuorumPeerMain
2428 JournalNode
2319 DataNode
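
The same check can be run from a single shell, assuming passwordless SSH between the nodes:

for h in bigdata1 bigdata2 bigdata3; do echo "== $h =="; ssh $h jps; done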

6. View the Spark Cluster in a Browser

(SPARK_MASTER_WEBUI_PORT is set to 8080 in spark-env.sh; Spark binds the next free port when the configured one is already taken, which is likely why the Master UIs here answer on 8081 and 8082.)

http://192.168.239.131:8081/
The bigdata1 Master's status:
Status: ALIVE
(Screenshot: bigdata1 Master web UI showing Status: ALIVE)
http://192.168.239.132:8082/
The bigdata2 Master's status:
Status: STANDBY
(Screenshot: bigdata2 Master web UI showing Status: STANDBY)
View the Spark application history page:
http://192.168.239.131:4000/
(Screenshot: Spark history server web UI)

Testing automatic Master failover

[root@bigdata1 conf]# stop-master.sh
stopping org.apache.spark.deploy.master.Master

With the Master on bigdata1 stopped, its web UI is no longer reachable.
The bigdata2 page now shows that bigdata2 has switched from STANDBY to ALIVE.
(Screenshot: bigdata2 Master web UI now showing Status: ALIVE)
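
The switch can also be verified from the command line: each Master web UI serves a JSON view of its state under /json. A sketch using the ports observed above:

curl -s http://192.168.239.132:8082/json/ | grep '"status"'
# "status" : "ALIVE"

Restarting the stopped Master on bigdata1 with start-master.sh brings it back as a STANDBY, restoring the HA pair.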

7. Running a Spark Application

In HA mode there are multiple Masters, and an application needs the address and port of whichever one is currently active. The solution is to hand SparkContext a list of Masters: the client polls the list and registers with the live one. Write every Master as host:port after the --master flag, comma-separated. (The example below also lists bigdata3:7077 even though this setup runs Masters only on bigdata1 and bigdata2; entries without an active Master are simply ignored.)

[root@bigdata1 conf]# spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://bigdata1:7077,bigdata2:7077,bigdata3:7077 \
--executor-memory 1G \
--total-executor-cores 1 \
/moudle/spark-3.0.1/examples/jars/spark-examples_2.12-3.0.1.jar \
100

A run produces output like the following:

[root@bigdata1 conf]# spark-submit --class org.apache.spark.examples.SparkPi --master spark://bigdata1:7077,bigdata2:7077,bigdata3:7077 --executor-memory 1G --total-executor-cores 1 /moudle/spark-3.0.1/examples/jars/spark-examples_2.12-3.0.1.jar 100

2020-11-11 12:07:29,138 INFO spark.SparkContext: Running Spark version 3.0.1
2020-11-11 12:07:29,255 INFO resource.ResourceUtils: ==============================================================
2020-11-11 12:07:29,257 INFO resource.ResourceUtils: Resources for spark.driver:

2020-11-11 12:07:29,257 INFO resource.ResourceUtils: ==============================================================
2020-11-11 12:07:30,019 INFO util.Utils: Successfully started service 'sparkDriver' on port 34971.

2020-11-11 12:07:30,246 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-74311b0f-855c-43a8-8090-88596c5b2d19
2020-11-11 12:07:30,307 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MiB

ServerConnector@42a9a63e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-11-11 12:07:30,823 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

2020-11-11 12:07:30,943 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://bigdata1:4040
2020-11-11 12:07:30,979 INFO spark.SparkContext: Added JAR file:/moudle/spark-3.0.1/examples/jars/spark-examples_2.12-3.0.1.jar at spark://bigdata1:34971/jars/spark-examples_2.12-3.0.1.jar with timestamp 1605067650978
2020-11-11 12:07:31,035 WARN spark.SparkContext: Please ensure that the number of slots available on your executors is limited by the number of cores to task cpus and not another custom resource. If cores is not the limiting resource then dynamic allocation will not work properly!
2020-11-11 12:07:31,592 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://bigdata1:7077...
2020-11-11 12:07:31,593 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://bigdata2:7077...
2020-11-11 12:07:31,594 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://bigdata3:7077...
2020-11-11 12:07:31,759 INFO client.TransportClientFactory: Successfully created connection to bigdata2/192.168.239.132:7077 after 98 ms (0 ms spent in bootstraps)
2020-11-11 12:07:31,762 INFO client.TransportClientFactory: Successfully created connection to bigdata1/192.168.239.131:7077 after 90 ms (0 ms spent in bootstraps)
2020-11-11 12:07:31,780 INFO client.TransportClientFactory: Successfully created connection to bigdata3/192.168.239.133:7077 after 104 ms (0 ms spent in bootstraps)
2020-11-11 12:07:32,168 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20201111120732-0003
2020-11-11 12:07:32,188 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20201111120732-0003/0 on worker-20201111104311-192.168.239.132-37607 (192.168.239.132:37607) with 1 core(s)
2020-11-11 12:07:32,193 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20201111120732-0003/0 on hostPort 192.168.239.132:37607 with 1 core(s), 1024.0 MiB RAM
2020-11-11 12:07:32,253 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36803.
2020-11-11 12:07:32,255 INFO netty.NettyBlockTransferService: Server created on bigdata1:36803
2020-11-11 12:07:32,282 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-11-11 12:07:32,307 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20201111120732-0003/0 is now RUNNING

2020-11-11 12:07:36,882 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 100 tasks
2020-11-11 12:07:39,581 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2020-11-11 12:07:40,922 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.239.132:50830) with ID 0
2020-11-11 12:07:41,359 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.239.132:36510 with 413.9 MiB RAM, BlockManagerId(0, 192.168.239.132, 36510, None)
2020-11-11 12:07:41,518 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.239.132, executor 0, partition 0, PROCESS_LOCAL, 7397 bytes)
2020-11-11 12:07:42,615 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.239.132:36510 (size: 1815.0 B, free: 413.9 MiB)
2020-11-11 12:07:43,978 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.239.132, executor 0, partition 1, PROCESS_LOCAL, 7397 bytes)
2020-11-11 12:07:43,994 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2510 ms on 192.168.239.132 (executor 0) (1/100)
2020-11-11 12:07:49,057 INFO scheduler.TaskSetManager: Finished task 99.0 in stage 0.0 (TID 99) in 45 ms on 192.168.239.132 (executor 0) (100/100)
2020-11-11 12:07:49,059 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
2020-11-11 12:07:49,110 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 12.548 s
2020-11-11 12:07:49,124 INFO scheduler.DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
2020-11-11 12:07:49,125 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
2020-11-11 12:07:49,131 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 13.017405 s
# Result:
Pi is roughly 3.1415607141560713

2020-11-11 12:07:49,174 INFO server.AbstractConnector: Stopped Spark@42a9a63e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-11-11 12:07:49,177 INFO ui.SparkUI: Stopped Spark web UI at http://bigdata1:4040
2020-11-11 12:07:49,186 INFO cluster.StandaloneSchedulerBackend: Shutting down all executors

2020-11-11 12:07:49,762 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-63cbd8f4-e267-4d7a-99fe-12d519afd369
2020-11-11 12:07:49,767 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2c9874ac-4195-4f9b-acba-e2ac9f139c6a

This completes the Spark 3.0.1 high-availability distributed cluster setup (HA mode).

Original article: https://blog.csdn.net/zhengzaifeidelushang/article/details/109589327