Exception in thread "main" org.apache.spark.SparkException: Task not serializable

When running a Spark application, the following error was thrown:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:769)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:768)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.PairRDDFunctions.flatMapValues(PairRDDFunctions.scala:768)
    at com.my.spark.project.userbehaviour.UserClickTrackETL$.main(UserClickTrackETL.scala:72)
    at com.my.spark.project.userbehaviour.UserClickTrackETL.main(UserClickTrackETL.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@1192b58e)
    - field (class: com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, name: sc$1, type: class org.apache.spark.SparkContext)
    - object (class com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 20 more

The exception indicates that org.apache.spark.SparkContext is not serializable. Here is the offending code snippet:

    val sessionRDD = parsedLogRDD.groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner).flatMapValues {
      case iter =>
        // Sort each cookie's logs by log time, ascending
        val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
        // Walk each cookie's logs and cut them into sessions at 30-minute gaps
        val sessionParsedLogsResult = cutSession(sortedParseLogs)
        // Read cookie_label.txt from HDFS
        // BUG: sc (a SparkContext) is captured by this Executor-side closure
        val cookieLabelRDD = sc.textFile(s"${trackerDataPath}/cookie_label.txt")
        val cLArray = cookieLabelRDD.map(line => {
          val labels = line.split("\\|")
          labels(0) -> labels(1)
        }).collect()

Here, the variable sc is a SparkContext instance, which lives on the Driver, while the closure passed to flatMapValues is serialized and executed on the Executors. To ship the closure, Spark must serialize everything it captures, including sc, which is not serializable, hence the error. The fix is to move the use of sc out of flatMapValues.
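A minimal sketch of the fix, reusing the names from the snippet above (parsedLogRDD, partitioner, cutSession, trackerDataPath, and TrackerLog are assumed to be defined as in the original program): read cookie_label.txt on the Driver before the transformation, collect it into a plain Map, and broadcast it so the closure captures only the serializable broadcast handle.

    // Driver side: build the cookie -> label map before any Executor-side code runs.
    val cookieLabelMap: Map[String, String] =
      sc.textFile(s"${trackerDataPath}/cookie_label.txt")
        .map { line =>
          val labels = line.split("\\|")
          labels(0) -> labels(1)
        }
        .collect()
        .toMap

    // Broadcast it so every Executor receives one read-only copy.
    val cookieLabelBC = sc.broadcast(cookieLabelMap)

    val sessionRDD = parsedLogRDD
      .groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner)
      .flatMapValues { iter =>
        val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
        val sessionParsedLogsResult = cutSession(sortedParseLogs)
        // The closure now captures only cookieLabelBC, which is serializable;
        // sc never leaves the Driver.
        val cookieLabels: Map[String, String] = cookieLabelBC.value
        sessionParsedLogsResult // label the sessions with cookieLabels here as needed
      }

The broadcast is optional: since the original code already collect()ed cLArray to a local array, simply moving that read-and-collect above the transformation and capturing the resulting array would equally remove the SparkContext from the closure; broadcasting just avoids re-shipping the data with every task.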
In summary, we need to be clear about which parts of a Spark application run on the Driver and which run on the Executors; in general, the functions passed to RDD APIs run on the Executors.
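As a self-contained illustration of the Driver/Executor split (a hypothetical example, not from the original program):

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverVsExecutorDemo {
      def main(args: Array[String]): Unit = {
        // Everything at this level runs on the Driver.
        val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))
        val data = sc.parallelize(1 to 10) // builds the RDD lineage on the Driver

        val threshold = 5 // a plain Int: serializable, safe to capture

        // The function passed to filter is serialized and shipped to Executors.
        // Capturing threshold is fine; capturing sc here would raise
        // "Task not serializable", exactly as in the stack trace above.
        val big = data.filter(_ > threshold)

        println(big.collect().mkString(",")) // collect() returns results to the Driver
        sc.stop()
      }
    }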
