Example: Writing a Spark Program in IDEA and Submitting It to a YARN Cluster
程序员文章站
2024-02-22 12:07:46
- JDK and Scala must be installed beforehand.
1. Create a new project
2. Add Maven support
3. Edit pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>hzmt-demo</groupId>
    <artifactId>hzmt-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.8</scala.version>
        <scala.binary.version>2.11</scala.binary.version>
        <spark.version>2.0.0</spark.version>
    </properties>

    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <!-- "jar-with-dependencies" builds a single fat jar -->
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
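If the cluster already provides the Spark runtime (as it does on HDP, via spark.yarn.archive in the submission log below), the Spark dependencies can instead be marked as provided so Maven leaves them out of the assembled jar. A sketch of the change, shown for spark-core only; the same scope line would go in the other two dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>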
4. Write WordCount.scala
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by drguo on 2019/3/4.
  */
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    // Without the leading "/" in "/install.log", the path resolves relative to the
    // submitting user's HDFS home directory, e.g. hdfs://master.hadoop:8020/user/hdfs/install.log
    val wc = sc.textFile("/install.log").flatMap(line => line.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wc.take(10).foreach(println)
  }
}
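The flatMap → map → reduceByKey pipeline above is plain word counting. On a tiny inline sample (not the real /install.log), the same computation can be sanity-checked with a shell one-liner before touching the cluster:

```shell
# Tokenize on spaces, then count occurrences per word: the coreutils
# analogue of flatMap(_.split(" ")) + map((_, 1)) + reduceByKey(_ + _)
printf 'a b a\nb c\n' | tr -s ' ' '\n' | sort | uniq -c | sort -rn
```

This prints each distinct word with its count (here: a → 2, b → 2, c → 1), which is exactly what the Spark job emits for the input file.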
5. Add a JAR artifact
(No need to select a main class; it is specified at submission time.)
6. Remove the JARs that are already present on the cluster
7.1 Building the jar, option 1
After clicking Build, the generated jar looks like this:
7.2 Building the jar, option 2
Do not bundle the jars that are already present on the cluster.
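The jar can also be built from the command line instead of from IDEA. A minimal sketch, assuming Maven is on the PATH and the pom above is unchanged; the fat-jar name (artifactId-version-assemblyId) is inferred from that pom:

```shell
# The package phase triggers the assembly plugin's "single" goal; with
# appendAssemblyId=true, the assembly id is appended to the jar name.
FAT_JAR="target/hzmt-demo-1.0-SNAPSHOT-jar-with-dependencies.jar"
mvn clean package || echo "mvn build did not run in this environment"
echo "expected fat jar: $FAT_JAR"
```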
8. Upload the jar to the server and submit it in yarn-cluster mode
$ spark-submit --class WordCount --master yarn --deploy-mode cluster \
    --driver-memory 500m --executor-memory 500m --executor-cores 1 \
    --queue default hzmt-demo.jar
19/03/03 21:48:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/03 21:48:38 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/03/03 21:48:38 INFO RMProxy: Connecting to ResourceManager at master.hadoop/192.168.10.101:8050
19/03/03 21:48:38 INFO Client: Requesting a new application from cluster with 3 NodeManagers
19/03/03 21:48:38 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1024 MB per container)
19/03/03 21:48:38 INFO Client: Will allocate AM container, with 884 MB memory including 384 MB overhead
19/03/03 21:48:38 INFO Client: Setting up container launch context for our AM
19/03/03 21:48:38 INFO Client: Setting up the launch environment for our AM container
19/03/03 21:48:38 INFO Client: Preparing resources for our AM container
19/03/03 21:48:38 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
19/03/03 21:48:38 INFO Client: Source and destination file systems are the same. Not copying hdfs:/hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
19/03/03 21:48:38 INFO Client: Uploading resource file:/home/hdfs/hzmt-demo.jar -> hdfs://master.hadoop:8020/user/hdfs/.sparkStaging/application_1551244095161_0012/hzmt-demo.jar
19/03/03 21:48:39 INFO Client: Uploading resource file:/tmp/spark-588f171c-11f4-45ea-a78b-465cb8c862e0/__spark_conf__7552022607007331155.zip -> hdfs://master.hadoop:8020/user/hdfs/.sparkStaging/application_1551244095161_0012/__spark_conf__.zip
19/03/03 21:48:39 INFO SecurityManager: Changing view acls to: hdfs
19/03/03 21:48:39 INFO SecurityManager: Changing modify acls to: hdfs
19/03/03 21:48:39 INFO SecurityManager: Changing view acls groups to:
19/03/03 21:48:39 INFO SecurityManager: Changing modify acls groups to:
19/03/03 21:48:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); groups with view permissions: Set(); users with modify permissions: Set(hdfs); groups with modify permissions: Set()
19/03/03 21:48:39 INFO Client: Submitting application application_1551244095161_0012 to ResourceManager
19/03/03 21:48:39 INFO YarnClientImpl: Submitted application application_1551244095161_0012
19/03/03 21:48:40 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:40 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1551678519752
final status: UNDEFINED
tracking URL: http://Master.Hadoop:8088/proxy/application_1551244095161_0012/
user: hdfs
19/03/03 21:48:41 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
... (the ACCEPTED report repeats once per second until the AM starts) ...
19/03/03 21:48:49 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:49 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.10.103
ApplicationMaster RPC port: 0
queue: default
start time: 1551678519752
final status: UNDEFINED
tracking URL: http://Master.Hadoop:8088/proxy/application_1551244095161_0012/
user: hdfs
19/03/03 21:48:50 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
... (the RUNNING report repeats until the job finishes) ...
19/03/03 21:49:07 INFO Client: Application report for application_1551244095161_0012 (state: FINISHED)
19/03/03 21:49:07 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.10.103
ApplicationMaster RPC port: 0
queue: default
start time: 1551678519752
final status: SUCCEEDED
tracking URL: http://Master.Hadoop:8088/proxy/application_1551244095161_0012/
user: hdfs
19/03/03 21:49:07 INFO ShutdownHookManager: Shutdown hook called
19/03/03 21:49:07 INFO ShutdownHookManager: Deleting directory /tmp/spark-588f171c-11f4-45ea-a78b-465cb8c862e0
9. Check the execution results
The code prints its result to the console (wc.take(10).foreach(println)), but because the job runs in yarn-cluster mode, the output does not appear on the submitting console; it is written to the application logs on the Hadoop cluster. These logs are also the place to find detailed error information when the console output alone is not enough to diagnose a failure.
For example: Exception in thread "main" org.apache.spark.SparkException: Application application_1551244095161_0010 finished with failed status
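Since the driver's stdout ends up in the application's container logs, the usual way to read it is the yarn CLI. A sketch, assuming YARN log aggregation is enabled; the application ID is the one reported during submission above:

```shell
# Fetch the aggregated driver/executor logs for the finished application;
# the wc.take(10) output appears in the driver container's stdout section.
APP_ID=application_1551244095161_0012
yarn logs -applicationId "$APP_ID" > wordcount-logs.txt 2>/dev/null || true
echo "requested logs for $APP_ID"
```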