hive row_number()
程序员文章站
2022-04-20 21:12:01
...
问题背景:想对字段A分组,然后根据字段B排序
最笨的方法:
SELECT b.*
FROM (
SELECT
A
, MAX(B) AS B
FROM mytable
WHERE
SOME CONDITIONS
GROUP BY A
) a
JOIN (
SELECT
A
, C
, B
FROM mytable
WHERE
SOME CONDITIONS
GROUP BY A, B, C
)b
ON a.B = b.B AND a.A = b.A
;
后来学会了row_num(),是hive自带的函数,它的作用是按指定的列进行分组生成行序列。在ROW_NUMBER(a,b) 时,若两条记录的a,b列相同,则行序列+1,否则重新计数。
一开始还是写出来很笨很丑的代码:
SELECT *, rank
FROM(
SELECT
*
, row_number() over (partition by A order by B desc ) rank
FROM (
SELECT
A
, C
, B
FROM mytable
WHERE
SOME CONDITIONS
GROUP BY
A
, C
, B
) tb_1
) tb_2
WHERE rank < 2
其实正确的应该是如下方法:
select * from(
SELECT
A
, C
, B
, row_number() over (partition by A order by B desc ) rank
FROM mytable
WHERE
SOME CONDITION
) a where a.rank=1
下一篇: ROW_NUMBER() OVER()