
Writing a Spark Program in IDEA and Submitting It to a YARN Cluster: an Example

程序员文章站 2024-02-22 12:07:46
  • Prerequisites: JDK and Scala must be installed in advance.

1. Create a new project
(screenshots omitted)


2. Add Maven support
(screenshots omitted)


3. Edit the pom.xml file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>hzmt-demo</groupId>
    <artifactId>hzmt-demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <scala.version>2.11.8</scala.version>
        <scala.binary.version>2.11</scala.binary.version>
        <spark.version>2.0.0</spark.version>
    </properties>
    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    
</project>
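With the assembly plugin bound to the package phase as above, building the fat jar is a single Maven invocation (the output file name below assumes the default naming for a jar-with-dependencies assembly with this groupId/version):

```shell
# Compile and package; the assembly plugin runs in the package phase and
# produces target/hzmt-demo-1.0-SNAPSHOT-jar-with-dependencies.jar
mvn clean package
```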


4. Write WordCount.scala

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by drguo on 2019/3/4.
  */
object WordCount {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    // Without the leading "/" in "/install.log", the path resolves relative to the
    // submitting user's HDFS home directory, e.g. hdfs://master.hadoop:8020/user/hdfs/install.log
    val wc = sc.textFile("/install.log").flatMap(line => line.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wc.take(10).foreach(println)
    sc.stop()
  }
}
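To see what the job computes without a cluster, the same flatMap → map → reduceByKey chain can be sketched on plain Scala collections (groupBy plus a per-key sum stands in for reduceByKey here; the object and method names are just for illustration):

```scala
object WordCountLocal {
  // Same transformation chain as the Spark job, on an in-memory Seq:
  // split each line into words, pair each word with 1, then sum per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)                                // local stand-in for reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    wordCount(Seq("hello spark", "hello yarn")).foreach(println)
}
```

On the cluster, reduceByKey performs this per-key aggregation across partitions, combining counts locally before the shuffle rather than grouping whole value lists as groupBy does here.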


5. Add JAR
(screenshots omitted)
(There is no need to select a main class.)


6. Exclude the jars already present on the cluster
(screenshot omitted)


7.1 Building the jar, method one
(screenshots omitted)
After clicking Build, the jar is generated as follows:
(screenshot omitted)


7.2 Building the jar, method two
(screenshots omitted)
Do not bundle the jars that are already present on the cluster:
(screenshot omitted)


8. Upload the jar to the server and submit it in yarn-cluster mode
$ spark-submit --class WordCount --master yarn --deploy-mode cluster --driver-memory 500m --executor-memory 500m --executor-cores 1 --queue default hzmt-demo.jar

19/03/03 21:48:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/03 21:48:38 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/03/03 21:48:38 INFO RMProxy: Connecting to ResourceManager at master.hadoop/192.168.10.101:8050
19/03/03 21:48:38 INFO Client: Requesting a new application from cluster with 3 NodeManagers
19/03/03 21:48:38 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1024 MB per container)
19/03/03 21:48:38 INFO Client: Will allocate AM container, with 884 MB memory including 384 MB overhead
19/03/03 21:48:38 INFO Client: Setting up container launch context for our AM
19/03/03 21:48:38 INFO Client: Setting up the launch environment for our AM container
19/03/03 21:48:38 INFO Client: Preparing resources for our AM container
19/03/03 21:48:38 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
19/03/03 21:48:38 INFO Client: Source and destination file systems are the same. Not copying hdfs:/hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
19/03/03 21:48:38 INFO Client: Uploading resource file:/home/hdfs/hzmt-demo.jar -> hdfs://master.hadoop:8020/user/hdfs/.sparkStaging/application_1551244095161_0012/hzmt-demo.jar
19/03/03 21:48:39 INFO Client: Uploading resource file:/tmp/spark-588f171c-11f4-45ea-a78b-465cb8c862e0/__spark_conf__7552022607007331155.zip -> hdfs://master.hadoop:8020/user/hdfs/.sparkStaging/application_1551244095161_0012/__spark_conf__.zip
19/03/03 21:48:39 INFO SecurityManager: Changing view acls to: hdfs
19/03/03 21:48:39 INFO SecurityManager: Changing modify acls to: hdfs
19/03/03 21:48:39 INFO SecurityManager: Changing view acls groups to: 
19/03/03 21:48:39 INFO SecurityManager: Changing modify acls groups to: 
19/03/03 21:48:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hdfs); groups with view permissions: Set(); users  with modify permissions: Set(hdfs); groups with modify permissions: Set()
19/03/03 21:48:39 INFO Client: Submitting application application_1551244095161_0012 to ResourceManager
19/03/03 21:48:39 INFO YarnClientImpl: Submitted application application_1551244095161_0012
19/03/03 21:48:40 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:40 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1551678519752
	 final status: UNDEFINED
	 tracking URL: http://Master.Hadoop:8088/proxy/application_1551244095161_0012/
	 user: hdfs
19/03/03 21:48:41 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:42 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:43 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:44 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:45 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:46 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:47 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:48 INFO Client: Application report for application_1551244095161_0012 (state: ACCEPTED)
19/03/03 21:48:49 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:49 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.10.103
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1551678519752
	 final status: UNDEFINED
	 tracking URL: http://Master.Hadoop:8088/proxy/application_1551244095161_0012/
	 user: hdfs
19/03/03 21:48:50 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:54 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:55 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:56 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:57 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:58 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:48:59 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:00 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:01 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:02 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:03 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:04 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:05 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:06 INFO Client: Application report for application_1551244095161_0012 (state: RUNNING)
19/03/03 21:49:07 INFO Client: Application report for application_1551244095161_0012 (state: FINISHED)
19/03/03 21:49:07 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.10.103
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1551678519752
	 final status: SUCCEEDED
	 tracking URL: http://Master.Hadoop:8088/proxy/application_1551244095161_0012/
	 user: hdfs
19/03/03 21:49:07 INFO ShutdownHookManager: Shutdown hook called
19/03/03 21:49:07 INFO ShutdownHookManager: Deleting directory /tmp/spark-588f171c-11f4-45ea-a78b-465cb8c862e0


9. View the execution results
The code prints its results to the console (wc.take(10).foreach(println)), but in yarn-cluster mode the driver runs inside the cluster, so the output does not appear in the submitting console; it is written to the Hadoop cluster's logs instead. Those logs are also where to find the detailed stack trace when the console output alone is not enough to diagnose a failure, for example:
Exception in thread "main" org.apache.spark.SparkException: Application application_1551244095161_0010 finished with failed status
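Once the application has finished, the aggregated container logs (including the driver's stdout with the printed word counts) can be pulled with the YARN CLI, using the application ID reported by spark-submit:

```shell
# Fetch the aggregated logs of the finished application; in cluster mode
# the driver's stdout lives in the ApplicationMaster container's log.
yarn logs -applicationId application_1551244095161_0012
```

Alternatively, follow the tracking URL shown in the submission output to browse the same logs through the ResourceManager web UI.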
(screenshots omitted)