hive根据现有数据表创建分区表,进行动态或静态分区插入数据
1:现有数据表结构定义:
CREATE TABLE `tab_user`(
`name` string,
`age` int,
`sex` string,
`addr` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://node:9000/user/hive/warehouse/daxin.db/tab_user'
2:现有数据表的数据:
daxin 18 male beijing
mali 28 female shandong
wangsan 34 male beijing
lisi 45 male liaoning
liwu 58 female beijing
maoliu 43 male anhui
zhouba 62 female beijing
3:对现有用户表数据按照位置信息进行分区,创建新的用户分区表:
CREATE TABLE `user_partition_tab`(
`name` string,
`age` int,
`sex` string) PARTITIONED BY(addr STRING);
4:插入用户数据:
insert overwrite table ptab PARTITION (addr) select name,age,sex,addr from user_partition_tab;
执行上面代码会报错:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
拓展:
对于hive分区表插入数据时候,对于分区字段可以自行指定一个静态字段或者根据分区字段的具体值进行插入分区表,对于前者指定一个分区值的插入则成为静态分区插入,而后者根据分区字段的具体值插入则成为动态分区插入。
a:静态插入:
insert overwrite table ptab PARTITION (addr='qiqihaer') select name,age,sex from tab_user;
指定分区字段addr的值为qiqihaer,如果表中该分区不存在的话则创建该分区。
b:动态插入:
insert overwrite table ptab PARTITION (addr) select name,age,sex,addr from tab_user;
该分区字段是根据select出来的具体值进行动态分区,因此就需要开启:set hive.exec.dynamic.partition.mode=nonstrict。
Configuration property |
Default |
Note |
---|---|---|
|
|
Needs to be set to |
|
|
In |
|
100 |
Maximum number of dynamic partitions allowed to be created in each mapper/reducer node |
|
1000 |
Maximum number of dynamic partitions allowed to be created in total |
|
100000 |
Maximum number of HDFS files created by all mappers/reducers in a MapReduce job |
|
|
Whether to throw an exception if dynamic partition insert generates empty results |
hive.exec.dynamic.partition.mode默认是strict,必须制定一个分区进行插入数据,以避免覆盖所有的分区数据;但是如果需要动态分区插入数据就必须设置nonstrict,nonstrict表示不是严格的必须指定一个静态分区,言外之意就是动态分区插入。其他属性容易理解不解释。