
Spark —— Notes on Submitting Applications with spark-submit


Configuration Loading Priority

As we know, a Spark application loads configuration from several places when it is submitted:

  1. Via the configuration file conf/spark-defaults.conf

    ...
    # Default system properties included when running spark-submit.
    # This is useful for setting default environmental settings.
    
    # Example:
    # spark.master                     spark://master:7077
    # spark.eventLog.enabled           true
    # spark.eventLog.dir               hdfs://namenode:8021/directory
    # spark.serializer                 org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory              5g
    # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    
  2. Via --conf options on the spark-submit command line (each --conf sets one property explicitly, e.g. the spark.sql.shuffle.partitions line in the command below)

    bin/spark-submit \
    --class com.my.spark.App \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 2g \
    --num-executors 60 \
    --executor-cores 2 \
    --executor-memory 4g \
    --conf "spark.sql.shuffle.partitions=20" \ //指定配置
    --conf "spark.dynamicAllocation.enabled=false" \
    --conf "spark.sql.autoBroadcastJoinThreshold=20" \
    /opt/bigdata.jar
    
  3. Via the SparkConf/SparkSession classes in the Spark application code

    // via SparkConf
    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf()
      .set("hive.metastore.uris", "thrift://master-1:9083") // set the property explicitly
    val sc = new SparkContext(conf)
    
    // via SparkSession
    import org.apache.spark.sql.SparkSession
    val spark = SparkSession
      .builder()
      .appName("name")
      .config("hive.metastore.uris", "thrift://master-1:9083") // set the property explicitly
      .enableHiveSupport()
      .getOrCreate()
    

So what is the order of precedence among these three ways of setting configuration? From highest to lowest it is (a small verification sketch follows the list):

  1. SparkConf/SparkSession settings in the Spark application code
  2. --conf options on the spark-submit command line
  3. The configuration file conf/spark-defaults.conf
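
A quick way to confirm this ordering is to set the same property in more than one place and read the effective value back at runtime. Below is a minimal Scala sketch, assuming the job is submitted with --conf "spark.sql.shuffle.partitions=20" as in the command above; the object name and the value 10 are illustrative only:

import org.apache.spark.sql.SparkSession

object ConfPriorityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("conf-priority-check")
      .config("spark.sql.shuffle.partitions", "10") // set in code: highest precedence
      .getOrCreate()

    // Prints 10, not 20: the code-level setting overrides both --conf and spark-defaults.conf
    println(spark.conf.get("spark.sql.shuffle.partitions"))

    spark.stop()
  }
}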

Specifying Multiple Dependency JARs

When submitting a Spark application, multiple dependency jars are specified with --jars. The jar paths must be separated by commas ",", and there must be no spaces before or after the commas:

./bin/spark-submit \
--class com.my.spark.App \
--master yarn \
--deploy-mode cluster \
--driver-memory 2g \
--num-executors 60 \
--executor-cores 2 \
--executor-memory 4g \
--jars /opt/dependencies/fastjson-1.2.62.jar,/opt/dependencies/hbase-spark-1.2.0-cdh5.16.2.jar,/opt/dependencies/elasticsearch-spark-20_2.11-7.1.0.jar \
--conf "spark.sql.shuffle.partitions=20" \
/opt/jar/bigdata.jar
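
The same comma-separated, no-spaces format also applies if the dependency list is assembled in code rather than on the command line, via the spark.jars property (the programmatic counterpart of --jars). Below is a minimal Scala sketch that reuses the paths from the command above; building the value with mkString(",") keeps the format correct:

import org.apache.spark.sql.SparkSession

// Dependency paths taken from the spark-submit example above (illustrative only)
val dependencyJars = Seq(
  "/opt/dependencies/fastjson-1.2.62.jar",
  "/opt/dependencies/hbase-spark-1.2.0-cdh5.16.2.jar",
  "/opt/dependencies/elasticsearch-spark-20_2.11-7.1.0.jar"
)

val spark = SparkSession
  .builder()
  .appName("multi-jar-app")
  .config("spark.jars", dependencyJars.mkString(",")) // comma-separated, no spaces
  .getOrCreate()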