Hive的动态分区
一、简介
当使用静态分区时,在向分区表中插入数据时,我们需要指定具体分区列的值。此外,hive还支持动态提供分区值(即在插入数据时,不指定具体的分区列值,而是仅仅指定分区字段)。动态分区在默认情况下是禁用的(在hive2.3.4版本中默认是开启的,在hive-default.xml.template文件中进行配置),所以需要将hive.exec.dynamic.partition设为true。默认情况下,用户必须至少指定一个静态分区列,这是为了避免意外覆盖分区。要禁用此限制,可以设置分区模式为非严格模式(即将hive.exec.dynamic.partition.mode设为nonstrict,默认值为strict)。可以选择在命令行终端方式设置:
>SET hive.exec.dynamic.partition=true;
>SET hive.exec.dynamic.partition.mode=nonstrict;
或者在hive-site.xml中设置:
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
<description>Whether or not to allow dynamic partitions in DML/DDL.</description>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
<description>
In strict mode, the user must specify at least one static partition
in case the user accidentally overwrites all partitions.
In nonstrict mode all partitions are allowed to be dynamic.
</description>
</property>
二、例子
2.1准备数据
001,tom,23,2019-03-16
002,jack,12,2019-03-13
003,robin,14,2018-08-13
004,justin,34,2018-10-12
005,jarry,24,2017-11-11
006,jasper,24,2017-12-12
2.2创建表
(1)创建普通表,用于load数据
create table dynamic_tbl_temp(id int,name string,age int,start_date date)
row format delimited
fields terminated by ','
(2)创建分区表
create table dynamic_tbl(id int,name string,age int,start_date date)
partitioned by (year string,month string)
row format delimited
fields terminated by ','
2.3加载数据
(1)向普通表加载数据
load data local inpath '/opt/modules/data/dynamic.txt' into table dynamic_tbl_temp;
(2)向分区表中插入数据
insert into dynamic_tbl
partition(year,month)
select
id,
name,
age,
start_date,
year(start_date),
month(start_date)
from
dynamic_tbl_temp;
2.4查询数据
(1)查询year=2018的数据
select * from dynamic_tbl where year = 2018;
--结果
4 justin 34 2018-10-12 2018 10
3 robin 14 2018-08-13 2018 8
(2)查询year=2017,month=11的数据
select * from dynamic_tbl where year = 2017 and month = 11;
--结果:
5 jarry 24 2017-11-11 2017 11
上一篇: Hive创建动态分区