Spark: Notes on Submitting Applications with spark-submit
Configuration Loading Priority
A Spark application loads configuration from several places when it is submitted:
- Via the configuration file conf/spark-defaults.conf
...
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
- Via the spark-submit command-line option --conf
# each --conf flag sets one configuration property
bin/spark-submit \
--class com.my.spark.App \
--master yarn \
--deploy-mode cluster \
--driver-memory 2g \
--num-executors 60 \
--executor-cores 2 \
--executor-memory 4g \
--conf "spark.sql.shuffle.partitions=20" \
--conf "spark.dynamicAllocation.enabled=false" \
--conf "spark.sql.autoBroadcastJoinThreshold=20" \
/opt/bigdata.jar
- Via the SparkConf / SparkSession classes in the application code
// Via SparkConf
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .set("hive.metastore.uris", "thrift://master-1:9083") // set the configuration here
val sc = new SparkContext(conf)

// Via SparkSession
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("name")
  .config("hive.metastore.uris", "thrift://master-1:9083") // set the configuration here
  .enableHiveSupport()
  .getOrCreate()
So what is the priority among these three ways of loading configuration? From highest to lowest, the order is as follows (a short sketch after the list illustrates this):
- The SparkConf / SparkSession classes in the application code
- The spark-submit command-line option --conf
- The configuration file conf/spark-defaults.conf
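A quick way to check this order is to set the same property in all three places and read back the effective value. The following is a minimal sketch (the object name, application name, and the value 50 for spark.sql.shuffle.partitions are only illustrative):

import org.apache.spark.sql.SparkSession

object ConfPriorityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("conf-priority-check")
      .config("spark.sql.shuffle.partitions", "50") // set in application code
      .getOrCreate()

    // Prints 50 even if spark-defaults.conf and --conf set different values,
    // because the value set in code has the highest priority.
    println(spark.conf.get("spark.sql.shuffle.partitions"))

    spark.stop()
  }
}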
Specifying Multiple Dependency JARs
When submitting a Spark application, multiple dependency JARs are passed via --jars. The JARs must be separated by commas, with no spaces before or after the commas:
./bin/spark-submit \
--class com.my.spark.App \
--master yarn \
--deploy-mode cluster \
--driver-memory 2g \
--num-executors 60 \
--executor-cores 2 \
--executor-memory 4g \
--jars /opt/dependencies/fastjson-1.2.62.jar,/opt/dependencies/hbase-spark-1.2.0-cdh5.16.2.jar,/opt/dependencies/elasticsearch-spark-20_2.11-7.1.0.jar \
--conf "spark.sql.shuffle.partitions=20" \
/opt/jar/bigdata.jar
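When the dependency directory holds many JARs, the comma-separated list can also be built in the shell instead of being typed by hand. This is a minimal sketch, assuming all dependency JARs live under /opt/dependencies (the paths and class name are only placeholders):

# Hypothetical helper: collect every JAR under /opt/dependencies into a
# comma-separated list (no spaces) and pass it to --jars.
JARS=$(ls /opt/dependencies/*.jar | tr '\n' ',' | sed 's/,$//')

./bin/spark-submit \
--class com.my.spark.App \
--master yarn \
--deploy-mode cluster \
--jars "$JARS" \
/opt/jar/bigdata.jar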