Exception in thread "main" org.apache.spark.SparkException: Task not serializable
When running a Spark application, the following error was thrown:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:769)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:768)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.PairRDDFunctions.flatMapValues(PairRDDFunctions.scala:768)
at com.my.spark.project.userbehaviour.UserClickTrackETL$.main(UserClickTrackETL.scala:72)
at com.my.spark.project.userbehaviour.UserClickTrackETL.main(UserClickTrackETL.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@1192b58e)
- field (class: com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, name: sc$1, type: class org.apache.spark.SparkContext)
- object (class com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 20 more
The exception indicates that org.apache.spark.SparkContext is not serializable.
The offending code snippet:
val sessionRDD = parsedLogRDD.groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner).flatMapValues {
  case iter =>
    // sort each cookie's logs by log server time, ascending
    val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
    // walk through each cookie's logs and cut them into sessions at 30-minute gaps
    val sessionParsedLogsResult = cutSession(sortedParseLogs)
    // read cookie_label.txt from HDFS -- this call uses sc inside the closure and triggers the error
    val cookieLabelRDD = sc.textFile(s"${trackerDataPath}/cookie_label.txt")
    val cLArray = cookieLabelRDD.map(line => {
      val labels = line.split("\\|")
      labels(0) -> labels(1)
    }).collect()
Here the variable sc is a SparkContext instance, which lives on the Driver, while the closure passed to flatMapValues is serialized and shipped to the Executors. Because the closure captures sc, Spark tries to serialize the SparkContext and fails. The fix is to move every use of sc out of the flatMapValues closure, as sketched below.
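A minimal sketch of that fix, assuming the same names as in the snippet above (parsedLogRDD, partitioner, trackerDataPath, cutSession, TrackerLog) and that cutSession returns a collection of session records (the original snippet is truncated after the lookup, so the rest of the body is omitted here):

// 1. Use sc only on the Driver: load and collect cookie_label.txt up front.
val cookieLabelMap: Map[String, String] = sc
  .textFile(s"${trackerDataPath}/cookie_label.txt")
  .map { line =>
    val labels = line.split("\\|")
    labels(0) -> labels(1)
  }
  .collect()
  .toMap

// 2. Optionally broadcast the small lookup table so each Executor keeps one copy.
val cookieLabelBC = sc.broadcast(cookieLabelMap)

// 3. The closure now captures only serializable values (the broadcast handle),
//    not the SparkContext itself.
val sessionRDD = parsedLogRDD
  .groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner)
  .flatMapValues { iter =>
    // sort each cookie's logs by log server time, ascending
    val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
    // cut the sorted logs into sessions at 30-minute gaps
    val sessionParsedLogsResult = cutSession(sortedParseLogs)
    // look up labels from the broadcast map instead of reading HDFS inside the closure
    val cookieLabels = cookieLabelBC.value
    sessionParsedLogsResult
  }

The key change is that sc.textFile and collect() run on the Driver before the transformation, and only the already-collected (or broadcast) data is referenced inside the closure.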
In summary, we need to be clear about which parts of a Spark application run on the Driver and which run on the Executors: SparkContext and the RDD API calls themselves run on the Driver, while the functions passed to RDD transformations (such as flatMapValues) are serialized and executed on the Executors.