Hive Basic Syntax and Hive Partitions

程序员文章站 2022-06-04 22:54:34


Hive summary

Hive basic syntax

Hive's syntax is very similar to MySQL's.
Table creation rules

  create [external] table [if not exists] table_name 
  [(col_name data_type [comment col_comment], ...)] 
  [comment table_comment] 
  [partitioned by (col_name data_type [comment col_comment], ...)] 
  [clustered by (col_name, col_name, ...) 
  [sorted by (col_name [asc|desc], ...)] into num_buckets buckets] 
  [row format row_format] 
  [stored as file_format] 
  [location hdfs_path]

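create table creates a table with the given name. If a table with the same name already exists, an exception is thrown; the if not exists option lets you ignore that exception.
The external keyword creates an external table; when creating it you also specify a path (location) that points to the actual data.
like lets you copy the structure of an existing table without copying its data.
comment adds a description to a table or a column.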
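
To make the optional clauses concrete, here is a minimal sketch that combines several of them in one statement. The table name, column names, tab delimiter, and HDFS path are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table, columns, and HDFS path, for illustration only.
create external table if not exists page_view_demo (
    user_id   int     comment 'id of the visiting user',
    page_url  string  comment 'page that was visited'
)
comment 'demo table showing the optional clauses'
partitioned by (dt string comment 'load date, e.g. 20170101')
row format delimited
    fields terminated by '\t'
stored as textfile
location '/tmp/hive_demo/page_view_demo';
```

Because of if not exists, running the statement a second time should leave the existing table untouched instead of raising an exception.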

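Create a table
hive> create table if not exists test1
> (id int,name string);

Drop a table
drop table test1;
Show the table structure
desc test1;
Rename a table
alter table test1 rename to test2;
Change the table structure (add columns)
alter table test1 add columns(address string ,grade string);
Create a table with the same structure as an existing table
create table test3 like test1;
Load local data
load data local inpath '/home/date/' into table test1;
Note: adding overwrite before into replaces the data already in test1; without it, the loaded data is appended after the existing data.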
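
As a small illustration of the overwrite variant just described, the statement below reuses the same local path and table as the load example above:

```sql
-- Replaces the current contents of test1 instead of appending to them.
load data local inpath '/home/date/' overwrite into table test1;
```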
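Load a file from HDFS
First upload the file to the corresponding directory on HDFS
hadoop fs -put /home/.txt /usr/
Then load the data from HDFS (the path must be a quoted string)
load data inpath '/usr/' into table test1;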

Insert data
insert overwrite table test2 select * from test1;
Query data
The syntax is essentially the same as MySQL's.

Query a single column
where conditions
all and distinct
limit to cap the number of rows
group by
order by
sort by
distribute by
cluster by
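
The clauses listed above can be sketched as follows. The queries assume the table test1 (id int, name string) created earlier; the filter values and aliases are illustrative only.

```sql
-- Assumes test1 (id int, name string) as created earlier.

-- single column with a where condition
select name from test1 where id > 1;

-- distinct removes duplicate values; all (the default) keeps them
select distinct name from test1;

-- limit caps the number of returned rows
select id, name from test1 limit 10;

-- group by with an aggregate
select name, count(*) as cnt from test1 group by name;

-- order by: one total ordering over the whole result
select id, name from test1 order by id desc;

-- sort by: rows are ordered only within each reducer
select id, name from test1 sort by id;

-- distribute by: rows with the same name go to the same reducer
select id, name from test1 distribute by name sort by id;

-- cluster by name is shorthand for distribute by name sort by name
select id, name from test1 cluster by name;
```

The last four are where Hive differs most from MySQL: order by produces a global ordering, while sort by, distribute by, and cluster by control ordering and placement per reducer.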

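Hive partitions

Hive partitions make data easier to manage; the common schemes are partitioning by time and partitioning by business line.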

    create table t1(
    id      int
    ,name    string
    ,hobby   array<string>
    ,add     map<string,string>
    )
    partitioned by (pt_d string)

Note that a partition column must not have the same name as any regular column of the table; otherwise Hive reports an error:

    FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

Data can also be loaded into a specific partition:

load data local inpath '/home/hadoop/desktop/data' overwrite into table t1 partition ( pt_d = '201701');

Next, load the same data into a different partition:

load data local inpath '/home/hadoop/desktop/data' overwrite into table t1 partition ( pt_d = '000000');

Query the data: select * from t1;

1   xiaoming    ["book","tv","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  000000
2   lilei   ["book","code"] {"nanjing":"jiangning","*":"taibei"}   000000
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  000000
1   xiaoming    ["book","tv","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  201701
2   lilei   ["book","code"] {"nanjing":"jiangning","*":"taibei"}   201701
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  201701
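
Since t1 has an array column and a map column, here is a short sketch of how their elements might be read. The aliases are hypothetical; the column names and key values come from the table definition and the rows shown above.

```sql
-- hobby is array<string>: index from 0, size() gives the element count
select name, hobby[0] as first_hobby, size(hobby) as hobby_count
from t1
where pt_d = '201701';

-- `add` is map<string,string>: look a value up by key
-- (backticks because add is also a Hive keyword)
select name, `add`['beijing'] as beijing_district
from t1
where pt_d = '201701';

-- array_contains filters on array membership
select name from t1
where pt_d = '201701' and array_contains(hobby, 'music');
```

Filtering on pt_d lets Hive read only that partition's directory rather than scanning every partition.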

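Besides declaring partition columns with partitioned by when the table is created, you can add a partition to an existing partitioned table. add partition takes a partition value, not a column type:
alter table t1 add partition (pt_d = '333333');
This creates one partition, and Hive creates the corresponding directory on HDFS.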

Query the data of a given partition

select * from t1 where pt_d = '000000';

Add a partition (this adds one partition directory)

alter table t1 add partition (pt_d = '333333');

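Drop a partition (this deletes the corresponding partition files)
Note: for an external table, drop partition does not delete the files on HDFS, and the partition can be synced back from HDFS with msck repair table table_name.

alter table test1 drop partition (pt_d = '20170101');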

List partitions

show partitions table_name;

Repair partitions
Repairing partitions re-syncs the partition information found on HDFS into the metastore.

msck repair table table_name;
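
The interplay between drop partition on an external table and msck repair can be sketched as below. The table name logs_ext, its single column, and the HDFS path are hypothetical; the comments describe the behavior as explained above rather than verified output.

```sql
-- Hypothetical external table and path, for illustration only.
create external table if not exists logs_ext (
    msg string
)
partitioned by (pt_d string)
location '/tmp/hive_demo/logs_ext';

-- registers the partition; its directory is .../logs_ext/pt_d=20170101
alter table logs_ext add partition (pt_d = '20170101');

-- removes the partition from the metastore only; because the table is
-- external, the pt_d=20170101 directory stays on HDFS
alter table logs_ext drop partition (pt_d = '20170101');

-- scans the table location and re-registers pt_d=... directories
-- that exist on HDFS but are missing from the metastore
msck repair table logs_ext;

show partitions logs_ext;
```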

Insert data into a partition

insert overwrite table partition_test partition(stat_date='2015-01-18',province='jiangsu') 
select member_id,name from partition_test_input 
where stat_date='2015-01-18' 
and province='jiangsu';
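
The insert statement above does not show the two tables it uses. A minimal sketch of definitions that would make it valid follows; the column types are assumptions, and only the table and column names come from the statement itself.

```sql
-- Assumed definitions; the original statement does not show them.
create table if not exists partition_test_input (
    member_id  string,
    name       string,
    stat_date  string,
    province   string
);

create table if not exists partition_test (
    member_id  string,
    name       string
)
partitioned by (stat_date string, province string);
```

With these definitions, the select list contains only member_id and name because the values of both partition columns are fixed in the partition clause.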

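Differences between internal and external tables

Differences between a (managed) table and an external table in Hive:
1. When data is loaded into an external table, it is not moved into Hive's warehouse directory; in other words, the data of an external table is not managed by Hive itself. A managed table is the opposite.
2. Dropping a managed table deletes both its metadata and its data; dropping an external table deletes only the metadata and leaves the data in place.
So which kind of table should you use? In most cases there is little difference, so the choice is largely a matter of preference. As a rule of thumb, if all processing is done by Hive, create a managed table; otherwise use an external table.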
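
The difference in drop behavior can be tried with a sketch like the following. The table names and the HDFS path are hypothetical.

```sql
-- Hypothetical table names and path, for illustration only.
create table managed_demo (id int, name string);

create external table external_demo (id int, name string)
location '/tmp/hive_demo/external_demo';

-- the "Table Type" field reads MANAGED_TABLE or EXTERNAL_TABLE
describe formatted managed_demo;
describe formatted external_demo;

-- removes the metadata and the table's directory in the warehouse
drop table managed_demo;

-- removes the metadata only; files under the location remain
drop table external_demo;
```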
