hive根据现有数据表创建分区表,进行动态或静态分区插入数据

程序员文章站 2024-03-09 09:47:41

...

1:现有数据表结构定义：

CREATE TABLE `tab_user`(
  `name` string, 
  `age` int, 
  `sex` string, 
  `addr` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
  LINES TERMINATED BY '\n' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://node:9000/user/hive/warehouse/daxin.db/tab_user'

2：现有数据表的数据：

daxin	18	male	beijing
mali	28	female	shandong
wangsan	34	male	beijing
lisi	45	male	liaoning
liwu	58	female	beijing
maoliu	43	male	anhui
zhouba	62	female	beijing

3：对现有用户表数据按照位置信息进行分区，创建新的用户分区表：

CREATE TABLE `user_partition_tab`(
`name` string, 
`age` int, 
`sex` string)  PARTITIONED BY(addr STRING);

4：插入用户数据：

insert overwrite table ptab PARTITION (addr)  select name,age,sex,addr from user_partition_tab;

执行上面代码会报错：

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

拓展：

对于hive分区表插入数据时候，对于分区字段可以自行指定一个静态字段或者根据分区字段的具体值进行插入分区表，对于前者指定一个分区值的插入则成为静态分区插入，而后者根据分区字段的具体值插入则成为动态分区插入。

a：静态插入：

insert overwrite table ptab PARTITION (addr='qiqihaer')  select name,age,sex from tab_user;

指定分区字段addr的值为qiqihaer，如果表中该分区不存在的话则创建该分区。

b：动态插入：

insert overwrite table ptab PARTITION (addr)  select name,age,sex,addr from tab_user;

该分区字段是根据select出来的具体值进行动态分区，因此就需要开启：set hive.exec.dynamic.partition.mode=nonstrict。

Configuration property	Default	Note
`hive.exec.dynamic.partition`	`true`	Needs to be set to `true` to enable dynamic partition inserts
`hive.exec.dynamic.partition.mode`	`strict`	In `strict` mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in `nonstrict` mode all partitions are allowed to be dynamic
`hive.exec.max.dynamic.partitions.pernode`	100	Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
`hive.exec.max.dynamic.partitions`	1000	Maximum number of dynamic partitions allowed to be created in total
`hive.exec.max.created.files`	100000	Maximum number of HDFS files created by all mappers/reducers in a MapReduce job
`hive.error.on.empty.partition`	`false`	Whether to throw an exception if dynamic partition insert generates empty results

hive.exec.dynamic.partition.mode默认是strict，必须制定一个分区进行插入数据，以避免覆盖所有的分区数据；但是如果需要动态分区插入数据就必须设置nonstrict，nonstrict表示不是严格的必须指定一个静态分区，言外之意就是动态分区插入。其他属性容易理解不解释。

参考：https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts

上一篇：虚拟机扩容（/dev/mapper/centos-root 空间不足）

下一篇： Hibernate批量处理海量数据的方法