
Spark job stuck: SparkException: Failed to connect to driver! unable to launch application master


1. Background

I submitted a Spark job to the cluster and it simply hung there, making no progress.
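The original post does not show the submit command. The stack traces below (SparkSubmit on the client side, ExecutorLauncher acting as the ApplicationMaster) imply the job was submitted to YARN in client deploy mode; a hypothetical invocation of that shape, with placeholder class name, jar path and resource sizes, would look roughly like this:

# Hypothetical spark-submit command; only --master yarn and --deploy-mode client
# are implied by the logs, everything else is a placeholder.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 2g \
  --executor-memory 2g \
  --num-executors 2 \
  --class com.example.StreamingJob \
  /path/to/job.jar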
I then checked the YARN web UI.
The UI showed that cluster resources were sufficient. (When resources are not sufficient the symptom is different: the application just sits in the ACCEPTED state, as described in the earlier post "Spark学习-SparkSQL–05-SparkSQL CLI Application report for application_15_0022 (state: ACCEPTED)".)
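Besides the web UI, the application state and diagnostics can also be checked from the command line. Using the application id that appears in the logs below:

# Print the current state, queue and diagnostics of the application
yarn application -status application_1573720456567_0048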
Clicking through to the container logs showed the following error:

19/11/15 16:03:37 ERROR yarn.ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Failed to connect to driver!
	at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:629)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:489)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:303)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:241)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:241)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:241)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:782)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:781)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:240)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:806)
	at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:836)
	at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
19/11/15 16:03:37 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
19/11/15 16:03:37 INFO util.ShutdownHookManager: Shutdown hook called
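The ExecutorLauncher entry point in this stack trace confirms yarn-client mode: the driver stays in the spark-submit JVM on the submitting machine, and the ApplicationMaster launched on YARN only needs to connect back to it. "Failed to connect to driver!" therefore means that the driver address registered with YARN (spark.driver.host / spark.driver.port) is not reachable, or not resolvable, from the NodeManager hosts. In this log the driver host appears to be stream_test_nb (see the Spark UI line in the second log below), and a hostname containing an underscore may well fail to resolve on the cluster nodes. Assuming an unreachable driver address is indeed the cause, one common mitigation is to advertise an address the cluster can reach, for example:

# Sketch only: the IP and port are placeholders. spark.driver.host must be an
# address reachable from the YARN NodeManagers; spark.driver.bindAddress lets the
# driver bind locally while advertising that reachable address.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.driver.host=192.168.1.10 \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.port=40000 \
  ...   # remaining arguments unchanged

Submitting with --deploy-mode cluster sidesteps the problem as well, since the driver then runs inside a YARN container. Either way, after this exception the ApplicationMaster exits with code 13, which is exactly what shows up in the YARN diagnostics below.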

YARN keeps retrying the ApplicationMaster until the attempt limit is reached (ten attempts here), after which the application finally fails on the client side:

19/11/14 09:23:52 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: Application application_1573720456567_0048 failed 10 times due to AM Container for appattempt_1573720456567_0048_000010 exited with  exitCode: 13
For more detailed output, check application tracking page:http://xxx:8088/proxy/application_1573720456567_0048/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1573720456567_0048_10_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.default
	 start time: 1573804911750
	 final status: FAILED
	 tracking URL: http://xxx:8088/cluster/app/application_1573720456567_0048
	 user: deploy
19/11/14 09:23:52 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at com.dtwave.cheetah.node.spark.structured.streaming.runtime.TopoSparkSubmitter$.submit(TopoSparkSubmitter.scala:88)
	at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor$.main(StructureStreamingExecutor.scala:49)
	at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor.main(StructureStreamingExecutor.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/14 09:23:52 INFO server.AbstractConnector: Stopped Spark@…{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/11/14 09:23:52 INFO ui.SparkUI: Stopped Spark web UI at http://stream_test_nb:4040
19/11/14 09:23:53 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/11/14 09:23:53 INFO storage.BlockManager: BlockManager stopped
19/11/14 09:23:53 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
19/11/14 09:23:53 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/11/14 09:23:53 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at com.dtwave.cheetah.node.spark.structured.streaming.runtime.TopoSparkSubmitter$.submit(TopoSparkSubmitter.scala:88)
	at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor$.main(StructureStreamingExecutor.scala:49)
	at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor.main(StructureStreamingExecutor.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The job therefore ends with final status FAILED.
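If the tracking page does not show enough detail, the full ApplicationMaster and container logs for all attempts can be pulled from the command line (assuming YARN log aggregation is enabled):

# Fetch the aggregated container logs for the failed application
yarn logs -applicationId application_1573720456567_0048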

For the solution, see the linked reference.