Hive SQL usage summary
程序员文章站
2022-05-18 17:16:20
Setting the number of reducers in Hive: set mapred.reduce.tasks = 2;
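A minimal sketch of how this setting is typically used within a session, assuming the MapReduce execution engine. The property `mapreduce.job.reduces` is the newer Hadoop name for the same setting, and the value -1 is assumed to be the Hive default, which lets Hive estimate the reducer count itself. Behaviour may differ on other engines or versions.

```sql
-- Fix the number of reducers for the queries that follow in this session.
SET mapred.reduce.tasks = 2;

-- Newer Hadoop releases use this name for the same property (assumption: your
-- cluster accepts it; the older name is usually still honoured as deprecated).
SET mapreduce.job.reduces = 2;

-- Print the current value to confirm the setting took effect.
SET mapred.reduce.tasks;

-- Hand the decision back to Hive: -1 means "estimate from the input size".
SET mapred.reduce.tasks = -1;
```

The setting applies per session, so it needs to be issued again in each new connection or script.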
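(1) Differences between order by / distribute by / sort by / cluster by
order by #global sort: the whole result is totally ordered
sort by #local sort: rows are sorted within each reducer only
distribute by #distribution only: records with the same key are sent to the same reducer; it does not sort by itself
cluster by = distribute by + sort by on the same columns #distribute, then sort within each reducer
cluster by id,name sorts in ascending order by default; asc or desc cannot be specified
group by #plain grouping, usually combined with aggregate functions such as AVG()/COUNT()/MAX()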
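The clauses above can be compared side by side in a short sketch. It assumes a table `table_test` with columns `id`, `name` and `dt`; the table and column names are borrowed from elsewhere in this article, and their types are not stated in the source.

```sql
-- Assumed schema: table_test(id, name, dt).

-- ORDER BY: one totally ordered result set.
SELECT id, name, dt FROM table_test ORDER BY dt DESC;

-- SORT BY: each reducer sorts its own rows; with more than one reducer
-- the combined output is not globally ordered.
SELECT id, name, dt FROM table_test SORT BY dt DESC;

-- DISTRIBUTE BY: rows with the same id land on the same reducer,
-- but no ordering is applied.
SELECT id, name, dt FROM table_test DISTRIBUTE BY id;

-- DISTRIBUTE BY + SORT BY: group rows by id on a reducer, then order
-- each reducer's rows; the sort direction can be chosen freely.
SELECT id, name, dt FROM table_test DISTRIBUTE BY id SORT BY id, dt DESC;

-- CLUSTER BY id: shorthand for DISTRIBUTE BY id SORT BY id, ascending only.
SELECT id, name, dt FROM table_test CLUSTER BY id;

-- GROUP BY: collapse rows per id and aggregate.
SELECT id, COUNT(*) AS cnt, MAX(dt) AS latest_dt
FROM table_test
GROUP BY id;
```

When a descending order is needed within each reducer, the explicit `DISTRIBUTE BY ... SORT BY ...` form is the one to use, since `CLUSTER BY` does not accept a direction.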
(2) Window functions
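SELECT
  -- RANK: tied rows share a rank, and the following rank is skipped
  RANK() OVER (PARTITION BY id ORDER BY dt DESC) AS rn1,
  -- DENSE_RANK: tied rows share a rank, with no gap afterwards
  DENSE_RANK() OVER (PARTITION BY id ORDER BY dt DESC) AS rn2,
  -- ROW_NUMBER: every row gets a distinct number, even when dt ties
  ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn3
FROM table_test;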
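To make the difference between the three functions concrete: if one `id` has three rows whose `dt` values, in descending order, are 2022-05-03, 2022-05-03 and 2022-05-01, then RANK yields 1, 1, 3, DENSE_RANK yields 1, 1, 2, and ROW_NUMBER yields 1, 2, 3 (the order of the two tied rows is not guaranteed).

A common use of ROW_NUMBER is keeping only the latest row per key. The sketch below assumes `table_test` has the columns `id`, `name` and `dt`; the `name` column and the alias `t` are illustrative.

```sql
-- Keep the most recent row for each id.
SELECT id, name, dt
FROM (
  SELECT
    id,
    name,
    dt,
    -- Number the rows of each id from newest to oldest.
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn
  FROM table_test
) t
WHERE rn = 1;
```

ROW_NUMBER is chosen here because it returns exactly one row per `id`; with RANK or DENSE_RANK, rows tied on the latest `dt` would all be kept.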