Hive Configuration Explained

程序员文章站 2022-05-12 07:49:39

Hive has many configuration properties that can help improve performance. The most useful ones are described below:

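1. hive.auto.convert.join (default: true)

Whether Hive automatically converts a reduce-side common join into a map join based on the size of the small input table, which speeds up joins between a large table and a small table.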
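
As a rough illustration, the session below enables automatic conversion and joins a large table to a small one. The tables orders and dim_region and their columns are hypothetical. hive.mapjoin.smalltable.filesize is a size threshold for the small table; the value shown is only an example, and depending on the Hive version and on hive.auto.convert.join.noconditionaltask a different threshold may apply.

```sql
-- Let Hive turn a common join into a map join when one side is small enough.
SET hive.auto.convert.join=true;
-- Size threshold in bytes for the small table (example value, about 25 MB).
SET hive.mapjoin.smalltable.filesize=25000000;

-- orders (large) and dim_region (small) are hypothetical tables.
SELECT o.order_id, r.region_name
FROM orders o
JOIN dim_region r ON o.region_id = r.region_id;
```

Prefixing the query with EXPLAIN is one way to check whether the plan contains a map join operator.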

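2. hive.groupby.skewindata (default: false)

Determines whether GROUP BY is planned to cope with skewed data, that is, whether the aggregation load is balanced across reducers. When set to true, the query plan uses two MapReduce jobs: the first distributes map output randomly among the reducers and performs a partial aggregation, so rows sharing a hot key are spread over several reducers; the second groups the partial results by the actual GROUP BY key and completes the aggregation.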
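
A minimal sketch of turning the option on for one aggregation. The table page_views and the assumption that a few user_id values dominate it are hypothetical.

```sql
-- Ask Hive to balance the aggregation load for skewed GROUP BY keys.
SET hive.groupby.skewindata=true;

-- page_views is a hypothetical table; assume a few user_id values account
-- for most of the rows, which would otherwise overload a single reducer.
SELECT user_id, COUNT(*) AS views
FROM page_views
GROUP BY user_id;
```

Because the setting adds work to the plan, it is probably best enabled per query, only where skew is actually observed.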

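3. hive.default.fileformat (default: TextFile)

The default file format for tables created without an explicit STORED AS clause. Options are 'TextFile', 'SequenceFile', and 'RCFile'; newer Hive releases also accept ORC and Parquet.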
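
The sketch below shows how the default could interact with CREATE TABLE. The table names events_seq and events_txt are hypothetical.

```sql
-- Tables created without STORED AS pick up this format.
SET hive.default.fileformat=SequenceFile;

-- No STORED AS clause: the session default applies.
CREATE TABLE events_seq (id BIGINT, payload STRING);

-- An explicit STORED AS clause still overrides the default.
CREATE TABLE events_txt (id BIGINT, payload STRING) STORED AS TEXTFILE;
```

Running DESCRIBE FORMATTED on each table shows the input and output format classes that were actually chosen.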

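4. hive.mapred.mode (default: nonstrict)

The mode in which Hive operations are performed. When set to strict, Cartesian products are not allowed; strict mode also rejects queries on partitioned tables that have no partition filter, and ORDER BY without LIMIT.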
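
A small sketch of the Cartesian product check, assuming two hypothetical tables a and b that each have an id column.

```sql
SET hive.mapred.mode=strict;

-- A join with no ON condition is a Cartesian product; in strict mode
-- this statement is expected to be rejected at compile time.
SELECT a.id, b.id FROM a JOIN b;

-- Adding a join condition makes the query acceptable again.
SELECT a.id, b.id FROM a JOIN b ON a.id = b.id;
```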

5. hive.exec.reducers.max (default: 999)

The upper limit on the number of reducers a job may use. Hive 0.14.0 and later ship with a default of 1009.
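
When the reducer count is not fixed explicitly, Hive estimates it from the input size divided by hive.exec.reducers.bytes.per.reducer and then caps the result at hive.exec.reducers.max. The values and the sales table below are illustrative assumptions, not recommendations.

```sql
-- Cap the number of reducers Hive may launch for a job (example value).
SET hive.exec.reducers.max=200;
-- Bytes of input handled per reducer when Hive estimates the count (256 MB here).
SET hive.exec.reducers.bytes.per.reducer=268435456;

-- sales is a hypothetical table.
SELECT category, SUM(amount) AS total
FROM sales
GROUP BY category;
```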

6. hive.exec.compress.output (default: false)

Determines whether the output of the final map/reduce job of a query is compressed.
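
A sketch of compressing query output. The codec is selected through a Hadoop property rather than a Hive one; Gzip is only an example choice, and the directory path and the logs table are hypothetical.

```sql
SET hive.exec.compress.output=true;
-- Hadoop-side property that selects the codec (Gzip chosen as an example).
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- /tmp/hive_export and logs are hypothetical.
INSERT OVERWRITE DIRECTORY '/tmp/hive_export'
SELECT * FROM logs WHERE dt = '2022-05-01';
```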

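7. hive.exec.parallel and hive.exec.parallel.thread.number

hive.exec.parallel controls whether the jobs of a query may run in parallel. It defaults to false.

hive.exec.parallel.thread.number (default: 8) sets the degree of parallelism, that is, the maximum number of jobs run at the same time. It only takes effect when hive.exec.parallel=true.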
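
Parallel execution only helps when a query compiles into stages that do not depend on one another. The sketch below uses two hypothetical tables, t1 and t2, whose counts can be computed independently before being combined.

```sql
SET hive.exec.parallel=true;
-- At most 8 jobs run at once (this matches the default).
SET hive.exec.parallel.thread.number=8;

-- The two branches of the UNION ALL read different tables, so their
-- stages are candidates for concurrent execution.
SELECT u.src, u.cnt
FROM (
  SELECT 't1' AS src, COUNT(*) AS cnt FROM t1
  UNION ALL
  SELECT 't2' AS src, COUNT(*) AS cnt FROM t2
) u;
```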

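8. hive.exec.max.dynamic.partitions (default: 1000)

The maximum total number of dynamic partitions that may be created. The value can be raised manually when a job needs more partitions.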

9. hive.exec.max.dynamic.partitions.pernode (default: 100)
The maximum number of dynamic partitions that may be created on each mapper or reducer node.

10. hive.exec.default.partition.name
The name of the default dynamic partition, used when the value of the dynamic partition column is '' (empty string) or null: '__HIVE_DEFAULT_PARTITION__'
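
The dynamic partition settings are usually adjusted together. The following is a sketch under assumed names: logs_by_day is a hypothetical table partitioned by dt, raw_logs is a hypothetical source table, and the limits are example values.

```sql
-- Dynamic partitioning itself has to be enabled first.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Raise the total and per-node limits (example values).
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=500;

-- dt is the dynamic partition column and must come last in the SELECT list.
INSERT OVERWRITE TABLE logs_by_day PARTITION (dt)
SELECT user_id, url, dt
FROM raw_logs;
```

Rows whose dt is null or an empty string end up in the partition dt=__HIVE_DEFAULT_PARTITION__.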

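11. hive.exec.max.created.files (default: 100000)

The maximum number of files that a job may create. When the number of files exceeds this limit, the job fails with an error like the following: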

total number of created files now is 100013, which exceeds 100000

The simple fix is to set a larger value.
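
A sketch of that fix, with an optional way to reduce the file count instead of only raising the limit. The tables logs_by_day and raw_logs are hypothetical, and 200000 is an arbitrary example value.

```sql
-- Raise the cap on the number of files one job may create.
SET hive.exec.max.created.files=200000;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Optional: DISTRIBUTE BY sends all rows of one partition to the same
-- reducer, so each partition is written by fewer tasks and fewer files result.
INSERT OVERWRITE TABLE logs_by_day PARTITION (dt)
SELECT user_id, url, dt
FROM raw_logs
DISTRIBUTE BY dt;
```

Raising the limit treats the symptom; if the error keeps returning, the number of small files being written is probably worth investigating as well.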

That's all for now; more will be added later.
