Spark job hangs: SparkException: Failed to connect to driver! unable to launch application master
2022-04-29 18:53:37
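1. Background
I submitted a Spark job to the cluster and found that it stayed stuck and never made progress.
Next, I checked the YARN web UI.
It showed that the cluster had sufficient resources.
For the error reported when resources are insufficient, see: Spark Learning - SparkSQL - 05 - SparkSQL CLI Application report for application_15_0022 (state: ACCEPTED)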
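From the client side, a job waiting in the queue shows up as a run of `Application report ... (state: ACCEPTED)` lines, which is why the resource check in the YARN UI matters before looking further. A minimal sketch for spotting that pattern in saved client output; the function names `report_states` and `stuck_in_accepted` and the `min_reports` threshold are illustrative assumptions, not part of Spark or YARN.

```python
import re

# Matches yarn.Client report lines such as:
#   Application report for application_15_0022 (state: ACCEPTED)
REPORT_RE = re.compile(
    r"Application report for (application_\d+_\d+) \(state: (\w+)\)"
)


def report_states(log_text):
    """Return (application id, state) pairs in the order they appear."""
    return REPORT_RE.findall(log_text)


def stuck_in_accepted(log_text, min_reports=3):
    """True if there are at least min_reports reports and all say ACCEPTED.

    min_reports is an arbitrary illustrative threshold.
    """
    states = [state for _, state in report_states(log_text)]
    return len(states) >= min_reports and all(s == "ACCEPTED" for s in states)
```

A `True` result only says the application never left ACCEPTED; it does not say why, so the YARN UI and the ApplicationMaster logs are still needed to tell a resource shortage from a startup failure.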
I clicked through to the logs.
They showed this error:
19/11/15 16:03:37 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:629)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:489)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:303)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:241)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:241)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:241)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:782)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:781)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:240)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:806)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:836)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
19/11/15 16:03:37 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
19/11/15 16:03:37 INFO util.ShutdownHookManager: Shutdown hook called
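The trace passes through `ExecutorLauncher`, the ApplicationMaster entry point used in yarn-client mode. In that mode the driver runs on the submitting machine and the ApplicationMaster has to connect back to it. One common reason for that connection to fail is that the cluster node cannot resolve or reach the driver's address. Below is a minimal reachability sketch to run from a cluster node. The helper name `can_reach` is hypothetical, and the host and port passed to it should be the job's `spark.driver.host` and `spark.driver.port` values, which this post does not show.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, refused connections and timeouts all raise OSError
    subclasses, so they are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (placeholder values, not taken from the post):
#   can_reach("driver-host-name", 40000)
```

If this returns `False` from a NodeManager host while the driver is still running, name resolution or a firewall between the cluster and the submitting machine is a reasonable next thing to check.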
In the end the job failed:
19/11/14 09:23:52 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1573720456567_0048 failed 10 times due to AM Container for appattempt_1573720456567_0048_000010 exited with exitCode: 13
For more detailed output, check application tracking page:http://xxx:8088/proxy/application_1573720456567_0048/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1573720456567_0048_10_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 13
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1573804911750
final status: FAILED
tracking URL: http://xxx:8088/cluster/app/application_1573720456567_0048
user: deploy
19/11/14 09:23:52 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at com.dtwave.cheetah.node.spark.structured.streaming.runtime.TopoSparkSubmitter$.submit(TopoSparkSubmitter.scala:88)
at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor$.main(StructureStreamingExecutor.scala:49)
at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor.main(StructureStreamingExecutor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/14 09:23:52 INFO server.AbstractConnector: Stopped aaa@qq.com{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/11/14 09:23:52 INFO ui.SparkUI: Stopped Spark web UI at http://stream_test_nb:4040
19/11/14 09:23:53 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/11/14 09:23:53 INFO storage.BlockManager: BlockManager stopped
19/11/14 09:23:53 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
19/11/14 09:23:53 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/11/14 09:23:53 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at com.dtwave.cheetah.node.spark.structured.streaming.runtime.TopoSparkSubmitter$.submit(TopoSparkSubmitter.scala:88)
at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor$.main(StructureStreamingExecutor.scala:49)
at com.dtwave.cheetah.node.spark.structured.streaming.StructureStreamingExecutor.main(StructureStreamingExecutor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The job failed (FAILED).
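The most useful facts in the client output above sit in the `diagnostics:` line: how many ApplicationMaster attempts were made and the exit code of the last container. A minimal sketch that extracts them from saved client output. The function name `parse_am_failure` is illustrative, and the regular expression is written against the message wording shown in this post, so other Hadoop versions may need a different pattern.

```python
import re

# Written against the diagnostics wording in this post, e.g.:
#   Application application_1573720456567_0048 failed 10 times due to
#   AM Container for appattempt_1573720456567_0048_000010 exited with exitCode: 13
DIAG_RE = re.compile(
    r"Application (application_\d+_\d+) failed (\d+) times "
    r"due to AM Container for (appattempt_\d+_\d+_\d+) "
    r"exited with\s+exitCode: (-?\d+)"
)


def parse_am_failure(log_text):
    """Return a dict describing the AM failure, or None if no match is found."""
    match = DIAG_RE.search(log_text)
    if match is None:
        return None
    return {
        "application": match.group(1),
        "attempts": int(match.group(2)),
        "last_attempt": match.group(3),
        "exit_code": int(match.group(4)),
    }
```

The application id returned here is the one to look up on the tracking page to reach the per-attempt ApplicationMaster logs.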
For the solution, see: click here