Troubleshooting the Spark exception: A master URL must be set in your configuration
程序员文章站
2022-06-29 16:54:57
Problem description:
A colleague on the project committed some code structured as an abstract class holding a SparkContext and a SparkSession, plus three subclasses extending that abstract class, each handling one processing task. A single Main class then calls the task methods of these three subclasses to run the computation. Everything works when run locally (local mode), but once deployed to the test server the following exception is thrown:
18/07/03 14:11:58 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, emr-worker-1.cluster-65494, executor 1): java.lang.ExceptionInInitializerError
    at task.api_monitor.HttpStatusTask$$anonfun$2.apply(HttpStatusTask.scala:91)
    at task.api_monitor.HttpStatusTask$$anonfun$2.apply(HttpStatusTask.scala:85)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
    at task.AbstractApiMonitorTask.<init>(AbstractApiMonitorTask.scala:22)
    at task.api_monitor.HttpStatusTask$.<init>(HttpStatusTask.scala:18)
    at task.api_monitor.HttpStatusTask$.<clinit>(HttpStatusTask.scala)
    ... 12 more
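The class names below match the stack trace, but the bodies are a hypothetical sketch of the structure described above, not the actual project code: the SparkSession is built during class initialization, so when a closure running on an executor touches the task object, the object (and with it the SparkSession) is initialized on the executor side, where no master URL has been configured.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reconstruction of the problematic structure.
abstract class AbstractApiMonitorTask {
  // Built when the class is initialized. No .master(...) is set here:
  // the master URL is expected from spark-submit, which only configures
  // the driver-side JVM, not the executors.
  val spark: SparkSession = SparkSession.builder()
    .appName("api-monitor")
    .getOrCreate()
  val sc = spark.sparkContext
}

object HttpStatusTask extends AbstractApiMonitorTask {
  val label = "status"

  def process(): Unit = {
    val codes = sc.parallelize(Seq("200", "404", "500"))
    // Referencing HttpStatusTask inside this closure forces the object
    // (and hence the SparkSession) to initialize on the executor,
    // which throws ExceptionInInitializerError caused by
    // "A master URL must be set in your configuration".
    codes.map(c => s"${HttpStatusTask.label}: $c").collect()
  }
}
```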
Analyzing the exception shows that the subclasses fail to initialize because no master URL was specified where they were being initialized.
Solution: After searching online and reviewing our code structure, I noticed that the Spark run logs (we run on YARN) showed three yarn.Client instances, meaning each subclass task had its own corresponding driver: each subclass instantiated its own SparkSession when its task started. But one Spark application corresponds to one main function running in a single driver, and that driver holds the single instance (the SparkContext); the driver is responsible for distributing resources and data to the worker nodes. If you create that instance outside the main function, the driver has no way to distribute it, which is why this structure succeeds in local mode but fails in a distributed deployment. (Reference: https://blog.csdn.net/sinat_33761963/article/details/51723175) The fix was therefore to restructure the code so that the shared resources from the abstract class are created inside the main function, which resolved the problem.
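A minimal sketch of the fix, using the same hypothetical names as above: the shared SparkSession is created exactly once inside main, on the driver, and passed to the tasks instead of being built during class initialization.

```scala
import org.apache.spark.sql.SparkSession

// Tasks now receive the shared session instead of building their own.
class HttpStatusTask(spark: SparkSession) {
  def process(): Unit = {
    val codes = spark.sparkContext.parallelize(Seq("200", "404", "500"))
    // The closure captures only a plain String, not the task object,
    // so nothing Spark-related is initialized on the executors.
    val prefix = "status"
    codes.map(c => s"$prefix: $c").collect().foreach(println)
  }
}

object Main {
  def main(args: Array[String]): Unit = {
    // Created once, on the driver. The master URL comes from spark-submit
    // (e.g. --master yarn); for a local run you could add .master("local[*]").
    val spark = SparkSession.builder().appName("api-monitor").getOrCreate()
    try {
      new HttpStatusTask(spark).process()
      // ... the other two task classes would be invoked the same way
    } finally {
      spark.stop()
    }
  }
}
```

With this structure there is a single driver and a single SparkSession for the whole application, so only one yarn.Client appears in the logs.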
Summary: the root cause of this problem was an incomplete understanding of how Spark runs in distributed mode; more study is needed!