Hive操作
程序员文章站
2022-07-14 14:36:40
...
一、基本命令
查询数据库
show databases;
模糊搜索
类似正则表达式
show tables like 'xx'
删除数据库:
drop database database_name;
删除数据表:
drop table table_name
查看表结构信息
desc table_name
查看详细表结构信息
desc formatted table_name
查看分区信息
show partition table_name;
插入数据
insert into table values(xxx),(xxxx),;
二、创建表命令
创建内部表
create table person_inside (id string,name string,age int) row format delimited fields terminated by ',' stored as textfile;
注意是fields不是field
创建外部表
create external table external_table_name (filed_name field_type,) row fromat delimited field terminated by ',' stored as textfile location 'hdfs_location';
location后面跟的是目录不是文件
加载本地数据
load data local inpath 'file:///xx.txt' into table person_inside;
删除表
drop table table_name;
创建分区表
creat table table_name(filed_name field_type,) partitioned by (field_name field_type,)
注意 partition中的filed前面的filed不能重复
创建分桶表
先设置 :
set hive.enforce.bucketing =true;
create table person_buckets(id string,name string,sex string) clustered by(id) into 5 buckets row format delimited fields terminated by ',' stored as textfile;
clustered by+ into buckets
注意:
要生成桶的数据只能是由其他表通过insert into 或是insert overwrite导入数据,如果使用loda data加载数据,则不能生成桶数据
三、导出表命令
导出到本地文件系统
insert overwrite local directory 'location' select * from table_name;
导出的数据默认分隔符是^(ascii码是\001)
导出到HDFS
insert overwrite directory '/' select from table_name;
导出到另一个Hive表
insert into table table1_name partition(filed=value) select * from table2_name;
四、查询命令
基于partition的查询
select * from table_name where partition_name=value;
限制条数查询
select * from table_name limit 5;
Top N查询
select * from table_name sort by field_name desc|asc limit 5;
五、连接命令
Hive多表关联使用join…on语句,只支持等值连接,即on句子用等号连接,不支持非等值连接,如果连接中有where子句,会先执行joind子句,在执行where句子
内连接
select * from table1_name join table2_name on table1_name.filed_name=table2_name.filed_name
左外连接
左边有数据而右边没有数据,则左边有数据的记录对应的返回列为空
select * from table_name left outer join table2_name on table1_name.filed_name =table2_name.filed_name;
右外连接
右边有数据而左变没有数据,则左变的数据记录对用返回列为空
select * from table_name right outer join table2_name on table1_name.filed_name =table2_name.filed_name;
全外连接
select * from table1_name full outer join table2_name on table1_name.filed_name =table2_name.field_name;
左半连接
select * from table_name left semi join table2_name on table1_name.field_name = table2_name.field;
相当于:
select * from table1_name where field_name in select field_name in table2_name;
六、其他
创建表
create [external] table [if not exists] table_name
(col_name type)
[partitioned by (col_name,)]
[sorted by (col_name [asc|desc])]
[into number buckets]
[row format row_format(such as delimited terminated by)]
[stored as file_fromat]
[location hdfs_path]
加载数据:
load data [local] inpath 'filepath' [overwrite] into table table_name [partition(filed1=value1,)]