欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Hive操作

程序员文章站 2022-07-14 14:36:40
...

一、基本命令

查询数据库
show databases;
模糊搜索

类似正则表达式

show tables like 'xx'
删除数据库:
drop database database_name;
删除数据表:
drop table table_name
查看表结构信息
desc table_name
查看详细表结构信息
desc formatted table_name
查看分区信息
show partition table_name;
插入数据
insert into table values(xxx),(xxxx),;

二、创建表命令

创建内部表
create table person_inside (id string,name string,age int) row format delimited fields terminated by ',' stored as textfile;

注意是fields不是field

创建外部表
create external table external_table_name (filed_name field_type,) row fromat delimited field terminated by ',' stored  as textfile location 'hdfs_location';

location后面跟的是目录不是文件

加载本地数据
load data local inpath 'file:///xx.txt' into table person_inside;
删除表
drop table table_name;
创建分区表
creat table table_name(filed_name field_type,) partitioned by (field_name field_type,)

注意 partition中的filed前面的filed不能重复

创建分桶表

先设置 :

set hive.enforce.bucketing =true;
create table person_buckets(id string,name string,sex string)  clustered by(id) into 5 buckets row format delimited fields terminated by ',' stored as textfile;

clustered by+ into buckets

注意:

要生成桶的数据只能是由其他表通过insert into 或是insert overwrite导入数据,如果使用loda data加载数据,则不能生成桶数据

三、导出表命令

导出到本地文件系统
insert overwrite local directory 'location' select * from table_name;

导出的数据默认分隔符是^(ascii码是\001)

导出到HDFS
insert overwrite directory '/' select from table_name;
导出到另一个Hive表
insert into table table1_name partition(filed=value) select * from table2_name;

四、查询命令

基于partition的查询
select * from table_name where partition_name=value;
限制条数查询
select * from table_name limit 5;
Top N查询
select * from table_name sort by field_name desc|asc limit 5;

五、连接命令

Hive多表关联使用join…on语句,只支持等值连接,即on句子用等号连接,不支持非等值连接,如果连接中有where子句,会先执行joind子句,在执行where句子

内连接
select * from table1_name join table2_name on table1_name.filed_name=table2_name.filed_name
左外连接

左边有数据而右边没有数据,则左边有数据的记录对应的返回列为空

select * from table_name left outer join table2_name on table1_name.filed_name =table2_name.filed_name;
右外连接

右边有数据而左变没有数据,则左变的数据记录对用返回列为空

select * from table_name right outer join table2_name on table1_name.filed_name =table2_name.filed_name;
全外连接
select * from table1_name full outer join table2_name on table1_name.filed_name =table2_name.field_name;
左半连接
select * from table_name left semi join table2_name on table1_name.field_name = table2_name.field;

相当于:

select * from table1_name where field_name in select field_name in table2_name;

六、其他

创建表
create [external] table [if not exists] table_name
(col_name type) 
[partitioned by (col_name,)]
[sorted by (col_name [asc|desc])]
[into number buckets]
[row format row_format(such as delimited terminated by)]
[stored as file_fromat]
[location hdfs_path]
加载数据:
load data [local] inpath 'filepath' [overwrite] into table table_name [partition(filed1=value1,)]
相关标签: Hive