ORDER BY, SORT BY, DISTRIBUTE BY and CLUSTER BY in Hive
程序员文章站
2022-04-29 18:54:01
...
Order By syntax
colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST) -- (Note: Available in Hive 2.1.0 and later)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy
ORDER BY sorts the entire result set globally, so all rows pass through a single reducer.
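To make the single-reducer behaviour and the optional NULLS FIRST / NULLS LAST clause concrete, here is a small Python simulation. It is a sketch of the semantics only, not Hive's implementation. It assumes Hive's documented default null ordering (NULLS FIRST for ASC, NULLS LAST for DESC); the function `order_by` and the sample `emp` rows are hypothetical.

```python
# Minimal simulation of Hive ORDER BY semantics (not Hive's implementation):
# every row goes to one "reducer", which sorts the whole data set at once.
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def order_by(rows: List[Row], col: str, desc: bool = False,
             nulls_first: Optional[bool] = None) -> List[Row]:
    """Globally sort rows on one column, modelling ASC/DESC and NULLS FIRST/LAST.

    If nulls_first is None, use the assumed Hive default:
    NULLS FIRST for ASC, NULLS LAST for DESC.
    """
    if nulls_first is None:
        nulls_first = not desc
    nulls = [r for r in rows if r[col] is None]
    non_nulls = [r for r in rows if r[col] is not None]
    # The single reducer sees every non-null row and sorts them in one pass.
    non_nulls.sort(key=lambda r: r[col], reverse=desc)
    return nulls + non_nulls if nulls_first else non_nulls + nulls


# Hypothetical sample rows.
emp = [
    {"empno": 7839, "sal": 5000},
    {"empno": 7369, "sal": None},
    {"empno": 7499, "sal": 1600},
    {"empno": 7521, "sal": 1250},
]

print([r["empno"] for r in order_by(emp, "sal")])             # ASC, NULLS FIRST
print([r["empno"] for r in order_by(emp, "sal", desc=True)])  # DESC, NULLS LAST
```

Because one reducer holds all the data, the result is totally ordered, which is also why ORDER BY can become a bottleneck on large tables.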
Sort By syntax
The SORT BY syntax is similar to the ORDER BY syntax in SQL.
colOrder: ( ASC | DESC )
sortBy: SORT BY colName colOrder? ( ',' colName colOrder?)*
query: SELECT expression ( ',' expression)* FROM src sortBy
SORT BY sorts the rows within each reducer; the overall result set is not globally sorted.
Set the number of reducers:
set mapreduce.job.reduces=<number>
Sort By example
set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/datas/hive_exp_emp0308'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n'
select * from emp sort by empno asc;
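The effect of this query with three reducers can be sketched in Python: each reducer's output is sorted by empno, but concatenating the outputs does not give a sorted list. This is a simplified model, not Hive's code. Without DISTRIBUTE BY the real row-to-reducer assignment is up to Hive, so the round-robin assignment below is an assumption made only for illustration; `sort_by` and the sample rows are hypothetical.

```python
# Minimal simulation of SORT BY with several reducers (not Hive's implementation).
# ASSUMPTION: rows are dealt to reducers round-robin; real Hive does not
# guarantee any particular assignment when DISTRIBUTE BY is absent.
from typing import Any, Dict, List

Row = Dict[str, Any]


def sort_by(rows: List[Row], col: str, num_reducers: int) -> List[List[Row]]:
    """Return one list per reducer; each list is sorted on col, the union is not."""
    buckets: List[List[Row]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)  # hypothetical round-robin assignment
    for bucket in buckets:
        bucket.sort(key=lambda r: r[col])      # sort only inside each reducer
    return buckets


# Hypothetical sample rows.
emp = [{"empno": n} for n in [7839, 7369, 7499, 7521, 7566, 7654, 7698, 7782, 7788]]
outputs = sort_by(emp, "empno", num_reducers=3)  # mirrors mapreduce.job.reduces=3
for i, bucket in enumerate(outputs):
    # Hive typically names reducer output files 000000_0, 000001_0, ...
    print(f"{i:06d}_0", [r["empno"] for r in bucket])
```

Each printed list is ascending, yet reading the three files one after another jumps back down (7839 is followed by 7369), which is exactly the "sorted per reducer, not globally" behaviour.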
Distribute By
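DISTRIBUTE BY controls partitioning, much like the partitioner in MapReduce: it decides which reducer each row is sent to, so all rows with the same DISTRIBUTE BY value end up in the same reducer. It is typically combined with SORT BY, which then sorts the rows within each partition; DISTRIBUTE BY must come before SORT BY in the query.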
insert overwrite local directory '/opt/datas/hive_exp_distribute_emp0308'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n'
select * from emp distribute by deptno sort by empno asc;
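Output file of the first partition (reducer): 000000_0
Output file of the second partition: 000001_0
Output file of the third partition: 000002_0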
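The DISTRIBUTE BY deptno SORT BY empno query can be mimicked with a small Python simulation. It assumes a hash partitioner of the form `hash(key) % num_reducers`, in the spirit of MapReduce's default HashPartitioner; Hive's actual hash function may differ, so which department lands in which file here need not match real Hive output. The function `distribute_sort_by` and the sample rows are hypothetical.

```python
# Minimal simulation of DISTRIBUTE BY deptno SORT BY empno (not Hive's code).
# ASSUMPTION: partition = hash(key) % num_reducers, like a MapReduce
# HashPartitioner; Hive's real hash function may assign keys differently.
from typing import Any, Dict, List

Row = Dict[str, Any]


def distribute_sort_by(rows: List[Row], dist_col: str, sort_col: str,
                       num_reducers: int) -> List[List[Row]]:
    buckets: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # Every row with the same dist_col value goes to the same reducer.
        buckets[hash(row[dist_col]) % num_reducers].append(row)
    for bucket in buckets:
        # SORT BY then orders the rows inside each reducer independently.
        bucket.sort(key=lambda r: r[sort_col])
    return buckets


# Hypothetical sample rows.
emp = [
    {"empno": 7839, "deptno": 10}, {"empno": 7369, "deptno": 20},
    {"empno": 7499, "deptno": 30}, {"empno": 7782, "deptno": 10},
    {"empno": 7566, "deptno": 20}, {"empno": 7521, "deptno": 30},
]
out = distribute_sort_by(emp, "deptno", "empno", num_reducers=3)
for i, bucket in enumerate(out):
    print(f"{i:06d}_0", [(r["deptno"], r["empno"]) for r in bucket])
```

The property that carries over to Hive is that a department never spans two output files and each file is sorted by empno; the concrete department-to-file mapping is an artifact of the assumed hash.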
Cluster By
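When DISTRIBUTE BY and SORT BY use the same column(s), the pair can be replaced with CLUSTER BY. Note that CLUSTER BY always sorts in ascending order; ASC/DESC cannot be specified.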
insert overwrite local directory '/opt/datas/hive_exp_cluster_emp0308'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n'
select * from emp cluster by empno;
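The equivalence can be stated as code: a simulated `cluster_by(col)` is just DISTRIBUTE BY col followed by an ascending SORT BY col. This is a sketch under the same assumption of a `hash(key) % num_reducers` partitioner (not necessarily Hive's hash); `cluster_by`, `distribute_sort_by` and the sample rows are hypothetical.

```python
# Minimal sketch: CLUSTER BY col == DISTRIBUTE BY col SORT BY col (ascending).
# ASSUMPTION: hash(key) % num_reducers stands in for Hive's partitioner.
from typing import Any, Dict, List

Row = Dict[str, Any]


def distribute_sort_by(rows: List[Row], dist_col: str, sort_col: str,
                       num_reducers: int) -> List[List[Row]]:
    buckets: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(row[dist_col]) % num_reducers].append(row)  # partition
    for bucket in buckets:
        bucket.sort(key=lambda r: r[sort_col])  # ascending sort per reducer
    return buckets


def cluster_by(rows: List[Row], col: str, num_reducers: int) -> List[List[Row]]:
    # Same column for distribution and sorting; no ASC/DESC option.
    return distribute_sort_by(rows, col, col, num_reducers)


# Hypothetical sample rows.
emp = [{"empno": n} for n in [7839, 7369, 7499, 7782, 7566, 7521]]
clustered = cluster_by(emp, "empno", num_reducers=3)
for i, bucket in enumerate(clustered):
    print(f"{i:06d}_0", [r["empno"] for r in bucket])
```

Since the sort direction is fixed, a descending order inside each reducer still requires writing DISTRIBUTE BY and SORT BY ... DESC explicitly.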
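Summary
Sorting-related SELECT features in Hive
Order By
Global sort through a single reducer
Sort By
Sorts within each reducer; the overall result is not globally sorted
Distribute By
Partitions rows like the MapReduce partitioner; used together with Sort By
Cluster By
Shorthand for Distribute By + Sort By when both use the same column(s)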