Spark Cluster-Mode Debugging and Remote Deployment Configuration
I've been learning Spark recently. After a program ran fine in local mode, I wanted to test it on the cluster, but it kept failing with the error below:
java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
at CF$$anonfun$3.apply(CF.scala:33)
at CF$$anonfun$3.apply(CF.scala:24)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/08/28 23:48:39 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
        ... (same stack trace as above)
Thanks to a tip from an expert, I finally found the problem: the Scala and Spark versions did not match. My Scala version was 2.11.4, and before the switch my Spark version was 1.6.3. Checking the official documentation afterwards confirmed it: Spark 1.6.3 uses Scala 2.10 and requires a compatible 2.10.x version, so it does not work with Scala 2.11.x. The failure makes sense in hindsight: scala.runtime.ObjectRef.create was only added in Scala 2.11, so bytecode compiled against 2.11 throws NoSuchMethodError on the Scala 2.10 runtime that ships with the pre-built Spark 1.6.3. After switching Spark to 2.2.0 (which is built against Scala 2.11) and resubmitting the job to the cluster, no errors appeared. This version pairing really deserves attention.
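To catch this kind of mismatch before it blows up mid-shuffle, a small diagnostic job can print the Scala version on both the driver and the executors. This is a minimal sketch of my own (the VersionCheck object name and app name are illustrative, not from the original program):

import org.apache.spark.{SparkConf, SparkContext}

// Diagnostic job: prints the Scala runtime version seen by the driver and by
// the executors. A mismatch (e.g. code compiled for 2.11 running on a 2.10
// runtime) is exactly what produces the NoSuchMethodError above.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("VersionCheck"))
    println(s"Driver Scala: ${scala.util.Properties.versionString}")
    val executorVersions = sc.parallelize(1 to 10, 10)   // force tasks onto executors
      .map(_ => scala.util.Properties.versionString)     // read version inside each task
      .distinct()
      .collect()
    println(s"Executor Scala: ${executorVersions.mkString(", ")}")
    sc.stop()
  }
}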
Also, once the Spark application is packaged and uploaded to the cluster, it has to be submitted with spark-submit, or with a run script; the principle is the same either way, since such scripts are just wrappers around spark-submit.
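For reference, a typical submission might look like the command below; the master URL and jar directory are placeholders for illustration, while CF is the main class configured in the pom later in this post:

spark-submit \
  --class CF \
  --master spark://master:7077 \
  /opt/jars/scalaTest-1.0-SNAPSHOT-jar-with-dependencies.jar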
Developing Spark in IDEA: Packaging, Uploading to the Cluster, and Debugging in Cluster Mode
Next, here is how to package on a Windows or Mac host and upload the package to the remote cluster.
1. Configure the remote host: in IDEA, open Tools > Deployment and choose Configuration.
Then click the + button, select SFTP, and enter a name for the remote connection; here I enter master.
Next, configure the host and the top-level upload path. Root Path is the top-level directory: it determines where an uploaded package lands on the server. There is also a second-level path, but I usually just set Root Path to the directory I want and leave the second-level path unset.
I leave the second-level path at its default of /.
Finally, click Apply and then OK. Since we are packaging the jar ourselves, there is no need to enable automatic upload; we will upload manually, as described below. To open the upload panel, use Tools > Deployment > Browse Remote Host, which opens a Remote Host panel on the far right of the IDE.
Finally, the manual upload itself: right-click the packaged jar in the Project view and choose Deployment > Upload to master (the server name configured above).
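If you prefer to skip the IDE entirely, the same manual upload can be done from a terminal with scp; the user name and target directory here are placeholders, not values from my actual setup:

scp target/scalaTest-1.0-SNAPSHOT-jar-with-dependencies.jar root@master:/opt/jars/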
Next, packaging. If you configure the pom without a build section, you will find that the required classes and dependencies do not make it into the jar. Add the build configuration shown below so the dependencies are included, then recompile and repackage.
Here, select your own main class; it corresponds to the mainClass entry in the assembly plugin configuration below.
Then click OK. To build, I usually use the Build button at the top right: click it, then run Maven install. Alternatively, you can run the equivalent goals from the command line, as shown below.
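One note on the command-line route: with the pom below, the assembly plugin has no execution binding, so the fat jar has to be built by invoking the plugin goal directly (this also assumes a Scala compiler plugin is configured; see the note after the pom):

mvn clean compile assembly:single

The result lands in target/scalaTest-1.0-SNAPSHOT-jar-with-dependencies.jar, which is the file to upload and submit.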
Finally, here is the pom file for my local IDEA Spark project built with Maven:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.scalaTest</groupId>
    <artifactId>scalaTest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <spark.version>2.2.0</spark.version>
        <scala.version>2.11</scala.version>
    </properties>
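    <!-- scala.version is the Scala binary version used as the artifact suffix
         below; it must match the Scala version the cluster's Spark was built
         against (2.11 for Spark 2.2.0), per the NoSuchMethodError fix above. -->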
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
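    <!-- maven-assembly-plugin produces a single jar-with-dependencies, so
         spark-submit only needs one file uploaded to the cluster. -->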
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>CF</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
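Two caveats about this pom, both my own additions rather than part of the original post. First, it contains no Scala compiler plugin, so IDEA will compile the .scala sources itself but a plain command-line mvn build will not; a commonly used fix is to add scala-maven-plugin to the plugins section, for example:

    <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
            <execution>
                <goals>
                    <goal>compile</goal>      <!-- compiles src/main/scala -->
                    <goal>testCompile</goal>  <!-- compiles src/test/scala -->
                </goals>
            </execution>
        </executions>
    </plugin>

Second, because the Spark dependencies are not marked <scope>provided</scope>, the jar-with-dependencies bundles all of Spark and gets very large; the cluster already supplies those jars at runtime, so marking them provided keeps the upload small.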