欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

hive中判断A表时间字段是否在B表的两个时间字段中

程序员文章站 2024-03-12 20:32:14
...

问题:在hive中,A表中有一个时间的字段,类型位string,格式为2018-12-0 13:34:12;在B表中有字段start_time和end_time,类型为string,格式为2018-12-06 13:34:12,先需要将两表按id关联后新增一个标记字段(当A表的时间在B表的start_time和end_time之间就标记为1,不在区间内就标记为0)

解决方法:使用join和case when then
先在hive中创建两张表并插入模拟数据:

-- 创建表
create table A (
id int,
time string
);
create table B (
id int,
start_time string,
end_time string
);

-- 插入模拟数据
insert into A values(1, '2018-04-02 13:34:12');
insert into A values(2, '2018-04-02 12:34:12');
insert into A values(3, '2018-04-02 14:34:12');
insert into A values(4, '2018-04-03 15:34:12');
insert into B values(1, '2018-04-02 12:34:12', '2018-04-02 14:34:12');
insert into B values(2, '2018-04-02 13:34:12', '2018-04-02 14:34:12');
insert into B values(3, '2018-04-02 13:34:12', '2018-04-02 15:34:12');
insert into B values(4, '2018-04-02 13:34:12', '2018-04-02 15:34:12');
insert into B values(5, '2018-04-02 15:34:12', '2018-04-02 19:34:12');

-- 查找结果
-- 方法一:不将时间字段转换为时间戳
select 
A.id id, 
A.time time, 
B.start_time start_time, 
B.end_time end_time,
case when A.time between B.start_time and B.end_time then 1 else 0 end sign
from 
A
join
B
on
A.id = B.id;

-- 结果:运行时间42.854 seconds(电脑不咋地)
OK
id  time                  start_time             end_time               sign 
1	2018-04-02 13:34:12	2018-04-02 12:34:12	2018-04-02 14:34:12	1
4	2018-04-03 15:34:12	2018-04-02 13:34:12	2018-04-02 15:34:12	0
2	2018-04-02 12:34:12	2018-04-02 13:34:12	2018-04-02 14:34:12	0
3	2018-04-02 14:34:12	2018-04-02 13:34:12	2018-04-02 15:34:12	1
Time taken: 42.854 seconds, Fetched: 4 row(s)

-- 方法二:将时间字段转换为时间戳
select 
A.id id, 
A.time time, 
B.start_time start_time, 
B.end_time end_time,
case when unix_timestamp(A.time) between unix_timestamp(B.start_time) and unix_timestamp(B.end_time) then 1 else 0 end sign
from 
A
join
B
on
A.id = B.id;

-- 结果:运行时间43.477 seconds
OK
id  time                  start_time             end_time               sign     
1	2018-04-02 13:34:12	2018-04-02 12:34:12	2018-04-02 14:34:12	1
4	2018-04-03 15:34:12	2018-04-02 13:34:12	2018-04-02 15:34:12	0
2	2018-04-02 12:34:12	2018-04-02 13:34:12	2018-04-02 14:34:12	0
3	2018-04-02 14:34:12	2018-04-02 13:34:12	2018-04-02 15:34:12	1
Time taken: 11.101 seconds, Fetched: 4 row(s)

可以看出将时间字段转换成时间戳后运行速度快些。

相关标签: hive case when then