A Roundup of Errors When Submitting Spark Jobs to YARN

We often develop and schedule jobs with Spark on YARN, and all sorts of errors crop up along the way. This post collects the ones I have hit, together with their fixes.

First, the script I use to submit a Spark job to YARN:

nohup /data_dev/software/spark-2.2.0-bin-2.6.0-cdh5.14.0/bin/spark-submit \
  --class log_analysis.Ktr_Log \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 2g \
  --executor-cores 2 \
  /data_dev/software/kettle_log_analysis.jar \
  > /data_dev/software/logs/SparkTest.log 2>&1 &
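Note that with --deploy-mode cluster the driver runs inside the YARN ApplicationMaster, so SparkTest.log captures only spark-submit's own output. The driver and executor errors collected below come from the YARN container logs, which can be pulled with:

yarn logs -applicationId <applicationId>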


1. Spark cannot find the main class

20/06/29 09:24:21 ERROR yarn.ApplicationMaster: Uncaught exception: 
java.lang.ClassNotFoundException: src/main/scala/log_analysis/Ktr_Log
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:629)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:394)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
20/06/29 09:24:21 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: src/main/scala/log_analysis/Ktr_Log)
20/06/29 09:24:21 INFO util.ShutdownHookManager: Shutdown hook called

Fix:

The ClassNotFoundException reports the class name as a file path (src/main/scala/log_analysis/Ktr_Log), so first make sure --class is given the fully qualified class name (log_analysis.Ktr_Log) rather than a source path. In my case the submission also failed because .master("local[20]") had been left uncommented in the code when submitting to YARN; commenting it out fixed the submission:

lazy val spark = SparkSession.builder()
//  .master("local[20]")  // keep this commented out so --master yarn takes effect
    .appName("ReadKettleKjb")
    .config("spark.debug.maxToStringFields", "400")
    .getOrCreate()
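If you want the same code to run both from the IDE and on YARN without editing it each time, a minimal sketch is to fall back to a local master only when none was supplied. This assumes spark-submit propagates --master into the spark.master system property for the driver JVM, which it normally does:

lazy val spark = {
  val builder = SparkSession.builder()
    .appName("ReadKettleKjb")
    .config("spark.debug.maxToStringFields", "400")
  // spark-submit sets spark.master for the driver; default to a local master only
  // when running outside spark-submit (e.g. directly from the IDE)
  if (sys.props.contains("spark.master")) builder.getOrCreate()
  else builder.master("local[20]").getOrCreate()
}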

2. Input file not found

20/06/29 09:31:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
	at log_analysis.Ktr_Log.main(Ktr_Log.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://192.168.21.30:8020/data_dev/software/kettle_dev_1.log
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:44)
	at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1294)
	at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1294)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1293)
	at utils.DataFrameUtils$.withLineNOColumn(DataFrameUtils.scala:24)
	at log_analysis.Ktr_Log$.<init>(Ktr_Log.scala:60)
	at log_analysis.Ktr_Log$.<clinit>(Ktr_Log.scala)
	... 6 more

Fix:

When pointing the job at a file on the server, specify an absolute path. When reading from HDFS, also check the NameNode IP and port as well as the file path; the path in the error above simply does not exist in HDFS.
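Making the URI scheme explicit removes any ambiguity about which filesystem a path resolves against. A minimal sketch (the paths are illustrative, not from the original job):

// Local filesystem: the file must exist on the node where this code actually runs
val localLines = spark.sparkContext.textFile("file:///data_dev/software/kettle_dev_1.log")
// HDFS: scheme, NameNode address, and port spelled out
val hdfsLines = spark.sparkContext.textFile("hdfs://192.168.21.30:8020/user/root/kettle_dev_1.log")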

3. Missing jar dependency

java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:549)
	at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
	at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:301)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
	at log_analysis.ReadKettleKjb$.analysis_kjb(ReadKettleKjb.scala:117)
	at log_analysis.ReadKettleKjb$.get_kjb_result(ReadKettleKjb.scala:41)
	at log_analysis.Ktr_Log$.main(Ktr_Log.scala:77)
	at log_analysis.Ktr_Log.main(Ktr_Log.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.xml.DefaultSource
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
	at scala.util.Try$.apply(Try.scala:192)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
	at scala.util.Try.orElse(Try.scala:84)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
	... 14 more

Because my code specifies an explicit data-source format for reading, the corresponding dependency jar must also be bundled into the jar built in IDEA:

// This read needs the spark-xml data source on both the driver and executor classpath
val df = spark.sqlContext
  .read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .schema(customSchema)
  .load("data/books_complex.xml")
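Instead of building a fat jar, you can also ship the dependency at submit time. A sketch based on the submit script at the top of this post (the spark-xml coordinates and version are illustrative; match them to your Scala version):

spark-submit \
  --class log_analysis.Ktr_Log \
  --master yarn \
  --deploy-mode cluster \
  --packages com.databricks:spark-xml_2.11:0.4.1 \
  /data_dev/software/kettle_log_analysis.jar

--packages resolves the artifact and its transitive dependencies from Maven Central and distributes them to the driver and executors; --jars does the same for a local jar file you already have.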

4. No master URL set

20/06/29 09:54:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 1): java.lang.ExceptionInInitializerError
	at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:73)
	at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:64)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
	at log_analysis.ReadKettleKjb$.spark$lzycompute(ReadKettleKjb.scala:27)
	at log_analysis.ReadKettleKjb$.spark(ReadKettleKjb.scala:23)
	at log_analysis.ReadKettleKjb$.context$lzycompute(ReadKettleKjb.scala:29)
	at log_analysis.ReadKettleKjb$.context(ReadKettleKjb.scala:29)
	at log_analysis.ReadKettleKjb$.<init>(ReadKettleKjb.scala:30)
	at log_analysis.ReadKettleKjb$.<clinit>(ReadKettleKjb.scala)

The error says a master URL must be set in the configuration. But I clearly passed --master when submitting the application, so why does Spark insist that I did not? For a moment I wondered whether the machine had simply lost its mind.

In fact this is a mistake beginners make easily, and it stems from not really understanding how Spark runs in distributed (or pseudo-distributed) mode: the SparkSession was created, or sc.textFile was called, outside the main function.

The stack trace shows the mechanism: the failure is an ExceptionInInitializerError inside a task on an executor, triggered by ReadKettleKjb$.<clinit>. A Spark application has one main function that runs in the driver; the driver holds the SparkContext and distributes work and data to the nodes. Anything placed in the object body outside main is not driver-only setup. It is re-run wherever the object is first touched, including on an executor that references the object from a closure, and on the executor no master URL is configured. That is why such code succeeds in local mode but fails on the cluster: the SparkSession must be created inside main.
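For contrast, here is a minimal sketch of the failing shape, reconstructed from the stack trace above; the details are illustrative:

object ReadKettleKjb {
  // Anti-pattern: fields in the object body are initialized wherever the object
  // is first touched -- including on an executor that references ReadKettleKjb
  // from inside a closure, where no master URL has been configured.
  lazy val spark = SparkSession.builder().appName("ktr").getOrCreate()
  lazy val context = spark.sparkContext
}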
 

The correct pattern:

def main(args: Array[String]): Unit = {
  val spark = SparkSession
    .builder()
    .config("spark.sql.shuffle.partitions", 300)
    // .master("yarn-cluster")
    // .master("local[10]")
    .appName("ktr")
    // .enableHiveSupport()
    .getOrCreate()

  val context = spark.sparkContext
  context.setLogLevel("WARN")

  // ... business logic ...
}
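With this shape the jar stays master-agnostic: submit with --master yarn in production, and pass --master local[10] on the command line for local debugging, instead of hard-coding either in the source.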

 

5. Scala environment not taking effect

20/06/29 10:15:20 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 2): java.lang.ExceptionInInitializerError
	at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:74)
	at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Library directory '/data_dev/software/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas/nm-local-dir/usercache/root/appcache/application_1593315523377_0017/container_1593315523377_0017_01_000003/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
	at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
	at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:347)
	at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:526)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:814)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
	at log_analysis.ReadKettleKjb$.spark$lzycompute(ReadKettleKjb.scala:28)
	at log_analysis.ReadKettleKjb$.spark(ReadKettleKjb.scala:23)
	at log_analysis.ReadKettleKjb$.context$lzycompute(ReadKettleKjb.scala:30)
	at log_analysis.ReadKettleKjb$.context(ReadKettleKjb.scala:30)
	at log_analysis.ReadKettleKjb$.<init>(ReadKettleKjb.scala:31)
	at log_analysis.ReadKettleKjb$.<clinit>(ReadKettleKjb.scala)
	... 14 more

The error indicates that the Scala-related jar directory cannot be found. First check whether the Scala SDK is installed on the server:

[aaa@qq.com XXF_EDW_SS_1]# scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Output like the above means Scala is installed and the environment variables have taken effect.
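If the command is missing instead, install a Scala SDK and expose it on the PATH; a typical setup (the install path is illustrative):

export SCALA_HOME=/usr/local/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin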

If it is not installed, then when packaging in IDEA you should bundle the Scala runtime jars into the target jar, as in the screenshot below:

[Screenshot: IDEA artifact settings, with the Scala library jars added to the output jar]

6. Exception from container-launch, exit code 1 (ExitCodeException exitCode=1)

On the YARN web UI the application status shows FAILED, with the following diagnostics in the YARN error log:

Diagnostics: Exception from container-launch.
Container id: container_1574829788169_0011_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.

Yet clicking through to the job's log details shows a status of SUCCEEDED.

None of the documents I consulted resolved this; in the end, reviewing the code turned up the cause:

SparkConf sparkConf = new SparkConf()
        // .setMaster("local[2]")  // must stay commented out when submitting with --master yarn
        .setAppName("javaSparkWordcount");


Commenting out setMaster("local[2]") resolved it completely; it is the same hard-coded-master pitfall as in problem 1.