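Notes on common errors when submitting Spark jobs to YARN
We often use Spark on YARN for development and job scheduling, and a variety of errors tend to come up along the way.
This post collects those problems and the fixes that worked for me.
First, here is the script used to submit a Spark job to YARN: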
nohup /data_dev/software/spark-2.2.0-bin-2.6.0-cdh5.14.0/bin/spark-submit \
--class log_analysis.Ktr_Log \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 2 \
/data_dev/software/kettle_log_analysis.jar \
> /data_dev/software/logs/SparkTest.log 2>&1 &
1. Spark cannot find the main class:
20/06/29 09:24:21 ERROR yarn.ApplicationMaster: Uncaught exception:
java.lang.ClassNotFoundException: src/main/scala/log_analysis/Ktr_Log
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:629)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:394)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
20/06/29 09:24:21 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: src/main/scala/log_analysis/Ktr_Log)
20/06/29 09:24:21 INFO util.ShutdownHookManager: Shutdown hook called
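Solution:
The exception says the main class cannot be found. Note that the name in the log is a source path (src/main/scala/log_analysis/Ktr_Log), not a class name; --class expects the fully qualified class name, log_analysis.Ktr_Log, as in the script above. In my case the code also still had .master("local[20]") hard-coded when it was submitted to YARN, and the submission did not succeed until that line was commented out: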
lazy val spark = SparkSession.builder()
// .master("local[20]")
.appName("ReadKettleKjb")
.config("spark.debug.maxToStringFields", "400")
.getOrCreate()
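Instead of commenting the line in and out between local runs and cluster submissions, one option is to set a local master only when none was supplied from outside. The following is a minimal sketch, assuming Spark 2.x is on the classpath; the object name SessionFactory and the local[2] fallback are hypothetical and only for illustration. SparkConf.setIfMissing leaves a master passed through spark-submit untouched.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of the original project.
object SessionFactory {
  // new SparkConf() picks up spark.* system properties, which is where a
  // master passed via spark-submit ends up. The local fallback is applied
  // only when no master was provided at all.
  def conf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .setIfMissing("spark.master", "local[2]")

  // Call this from inside main, not from an object-level field.
  def session(appName: String): SparkSession =
    SparkSession.builder().config(conf(appName)).getOrCreate()
}
```

With this, the same jar should run from the IDE with a local master and under --master yarn without code changes.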
2. The input file cannot be found:
20/06/29 09:31:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at log_analysis.Ktr_Log.main(Ktr_Log.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://192.168.21.30:8020/data_dev/software/kettle_dev_1.log
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:44)
at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1294)
at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1294)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1293)
at utils.DataFrameUtils$.withLineNOColumn(DataFrameUtils.scala:24)
at log_analysis.Ktr_Log$.<init>(Ktr_Log.scala:60)
at log_analysis.Ktr_Log$.<clinit>(Ktr_Log.scala)
... 6 more
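Solution:
When pointing at a file on the server, use an absolute path. Note that in the trace above the path without a scheme was resolved against the default file system and became hdfs://192.168.21.30:8020/data_dev/software/kettle_dev_1.log, so a file that exists only on the local disk will not be found. A file:// prefix reads from the local disk instead, but in cluster mode the file then has to exist on the nodes where the code runs.
If the file is read from HDFS, check the server IP and port, the file path, and so on.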
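A quick way to see which file system a path actually resolves to is to qualify it and check that it exists before handing it to Spark. This is a minimal sketch using the Hadoop FileSystem API, assuming hadoop-common is on the classpath (it is when running under Spark); InputCheck and requireExists are hypothetical names.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of the original project.
object InputCheck {
  // Fills in scheme and authority from the configured default file system
  // when `raw` has none, then fails early with the fully qualified path.
  def requireExists(raw: String, hadoopConf: Configuration): Path = {
    val path = new Path(raw)
    val fs = path.getFileSystem(hadoopConf)
    val qualified = fs.makeQualified(path)
    require(fs.exists(qualified), s"Input path does not exist: $qualified")
    qualified
  }
}
```

Inside a job it could be called as InputCheck.requireExists("/data_dev/software/kettle_dev_1.log", spark.sparkContext.hadoopConfiguration), so a wrong path fails with a clear message on the driver instead of deep inside an RDD call chain.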
3. A dependency jar cannot be found:
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:549)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:301)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
at log_analysis.ReadKettleKjb$.analysis_kjb(ReadKettleKjb.scala:117)
at log_analysis.ReadKettleKjb$.get_kjb_result(ReadKettleKjb.scala:41)
at log_analysis.Ktr_Log$.main(Ktr_Log.scala:77)
at log_analysis.Ktr_Log.main(Ktr_Log.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.xml.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
... 14 more
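Because the code names the data source format com.databricks.spark.xml explicitly, the jar that provides it has to be included when the project is packaged in IDEA.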
val df = spark.sqlContext
.read
.format("com.databricks.spark.xml")
.option("rowTag", "book")
.schema(customSchema)
.load("data/books_complex.xml")
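If the project happens to be built with sbt rather than packaged through IDEA artifacts (an assumption; the original does not say), the data source can be declared as an ordinary dependency and bundled into the submitted jar. The spark-xml version below is also an assumption; pick one that matches your Spark and Scala versions.

```scala
// build.sbt (sketch). Scala 2.11 matches the Spark 2.2.0 build used above.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Provided by the cluster at runtime, so it is kept out of the fat jar.
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  // Has to end up inside the submitted jar, for example via sbt-assembly.
  "com.databricks" %% "spark-xml" % "0.4.1"
)
```

Alternatively, the --packages option of spark-submit can fetch the same artifact at submit time, provided the cluster can reach a Maven repository.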
4. No master URL is set
20/06/29 09:54:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 1): java.lang.ExceptionInInitializerError
at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:73)
at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:64)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at log_analysis.ReadKettleKjb$.spark$lzycompute(ReadKettleKjb.scala:27)
at log_analysis.ReadKettleKjb$.spark(ReadKettleKjb.scala:23)
at log_analysis.ReadKettleKjb$.context$lzycompute(ReadKettleKjb.scala:29)
at log_analysis.ReadKettleKjb$.context(ReadKettleKjb.scala:29)
at log_analysis.ReadKettleKjb$.<init>(ReadKettleKjb.scala:30)
at log_analysis.ReadKettleKjb$.<clinit>(ReadKettleKjb.scala)
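The error says a master URL must be set in the configuration. But I had clearly passed --master when submitting the application, so why did it claim I had not? I was quite surprised and wondered whether the computer had gone mad.
This is actually a mistake beginners make easily, and it comes from not really understanding how Spark runs in distributed or pseudo-distributed mode. People who hit it have usually put the creation of the Spark session, or calls such as sc.textFile, outside the main function.
So even when the code does not hard-code a master, the SparkSession has to be created inside the main method.
A Spark application has one main function, which runs in the driver, and the driver holds the one SparkContext and schedules tasks on the executors. If the session is a field of a Scala object outside main, that object is not shipped to the executors. Each executor JVM initializes it again the first time task code touches it and tries to build a new session there, where no master is configured. The trace above shows exactly this: the task was lost on executor 1 inside ReadKettleKjb$.<clinit>. In local mode the driver and the tasks share one JVM, so the same code works; on a cluster it fails.
The correct approach is as follows: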
def main(args: Array[String]): Unit = {
val spark = SparkSession
.builder()
.config("spark.sql.shuffle.partitions", 300)
// .master("yarn-cluster")
// .master("local[10]")
.appName("ktr")
// .enableHiveSupport()
.getOrCreate()
val context = spark.sparkContext
context.setLogLevel("WARN")
// ... (business logic)
}
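To see why a session created at object level fails only on the cluster, the behaviour can be reproduced without Spark. The sketch below is an illustration, not the original code: SessionHolder stands in for an object with an object-level session, and the system property demo.master is a made-up stand-in for the master setting that exists on the driver but not on an executor.

```scala
// The body of a Scala object runs once per JVM, on first access.
object SessionHolder {
  // Stand-in for an object-level SparkSession.builder().getOrCreate().
  // The property is unset here, which simulates an executor JVM.
  val session: String =
    sys.props.getOrElse(
      "demo.master",
      throw new IllegalStateException("A master URL must be set"))
}

object InitDemo {
  // The first access to SessionHolder runs its initializer in this JVM.
  // The JVM wraps an exception thrown there in ExceptionInInitializerError.
  def touch(): String =
    try SessionHolder.session
    catch {
      case e: ExceptionInInitializerError =>
        s"init failed: ${e.getCause.getMessage}"
    }

  def main(args: Array[String]): Unit = println(touch())
}
```

This is the same shape as the trace above: an ExceptionInInitializerError whose cause is the real problem. Creating the session inside main avoids it because main runs only on the driver.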
5. The Scala environment is not in effect
20/06/29 10:15:20 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 2): java.lang.ExceptionInInitializerError
at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:74)
at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Library directory '/data_dev/software/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas/nm-local-dir/usercache/root/appcache/application_1593315523377_0017/container_1593315523377_0017_01_000003/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:347)
at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:526)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:814)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at log_analysis.ReadKettleKjb$.spark$lzycompute(ReadKettleKjb.scala:28)
at log_analysis.ReadKettleKjb$.spark(ReadKettleKjb.scala:23)
at log_analysis.ReadKettleKjb$.context$lzycompute(ReadKettleKjb.scala:30)
at log_analysis.ReadKettleKjb$.context(ReadKettleKjb.scala:30)
at log_analysis.ReadKettleKjb$.<init>(ReadKettleKjb.scala:31)
at log_analysis.ReadKettleKjb$.<clinit>(ReadKettleKjb.scala)
... 14 more
The error suggests that the Scala-related jars are missing. In that case, check whether the Scala SDK is installed on the server:
[aaa@qq.com XXF_EDW_SS_1]# scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
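If the version is printed as above, Scala is installed and its environment variables are in effect.
If it is not installed, bundle the Scala-related jars into the target jar when packaging the code in IDEA.
It is also worth reading the trace closely: the failure again happens in a task on an executor, inside ReadKettleKjb$.<clinit>, where SparkSession.getOrCreate tries to start a YARN client from within the container. That is the same pattern as in the "A master URL must be set" case above, so moving the session creation into main is worth checking here as well.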
6. Exception from container-launch. Exit code: 1 Stack trace: ExitCodeException exitCode=1:
The YARN web UI shows the application status as FAILED, with the following diagnostics:
Diagnostics: Exception from container-launch.
Container id: container_1574829788169_0011_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
However, the job log details show the status as SUCCEEDED.
None of the many documents I read solved this. In the end I went back to the code and found:
SparkConf sparkConf = new SparkConf()
// .setMaster("local[2]")
.setAppName("javaSparkWordcount");
Commenting out setMaster("local[2]") solved the problem.