Hive Basics

Programmer Article Station (程序员文章站), 2022-07-14 14:40:28


1. Hive has four types of tables

1.1 Managed (internal) tables: dropping the table also deletes its data

create table if not exists cl_stu(id int,name string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' location '/hive_data';

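location is an HDFS directory path. If no location is specified when a managed table is created, Hive creates a directory for the table under the path defined by hive.metastore.warehouse.dir, for example /user/hive/warehouse/.

Whether or not location is specified, load data places the files in the table's HDFS directory (load data local inpath copies them from the local filesystem, while load data inpath moves them within HDFS). When the table is dropped, these HDFS files are deleted too.

1.2 External tables: dropping the table does not delete the data, which makes it easy to share data; only the table's metadata is removed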

create external table if not exists hff_rbi_roaming_all(
acc_nbr                       string,
date_time                       string,
roamingname string,
create_time string
)row format delimited fields terminated by '\001' location '/yw/hff_rbi_roaming_all';

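load data places the files in the HDFS directory given by location. When the table is dropped, these HDFS files are not deleted, so creating a new external table that points to the same path makes the data available to select again.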
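The behaviour described above can be sketched as follows. This is only an illustration, assuming data files have already been loaded under /yw/hff_rbi_roaming_all; the table definition is the one from the example above.

```sql
-- Dropping an external table removes only its metadata;
-- the files under /yw/hff_rbi_roaming_all stay on HDFS.
drop table if exists hff_rbi_roaming_all;

-- Re-create the external table over the same directory.
create external table if not exists hff_rbi_roaming_all(
  acc_nbr     string,
  date_time   string,
  roamingname string,
  create_time string
)
row format delimited fields terminated by '\001'
location '/yw/hff_rbi_roaming_all';

-- The previously loaded rows are visible again.
select * from hff_rbi_roaming_all limit 10;
```

Running the same drop on a managed table would delete the directory as well, so the final select would return nothing.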

1.3 Partitioned tables, which improve query performance

create table if not exists hff_rbi_roaming_all(
acc_nbr                       string,
date_time                       string,
roamingname string,
create_time string)partitioned by (month string)
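A short sketch of how such a partitioned table is typically used. The local file path and the month value are hypothetical; the table is the one defined above. Each partition is stored as its own subdirectory (for example month=201702) under the table directory, which is why filtering on the partition column avoids scanning the other partitions.

```sql
-- Load a local file into one partition (the file path is hypothetical).
load data local inpath '/home/roaming_201702.txt'
into table hff_rbi_roaming_all partition (month='201702');

-- List the partitions Hive knows about.
show partitions hff_rbi_roaming_all;

-- Filtering on the partition column lets Hive read only month=201702.
select acc_nbr, roamingname
from hff_rbi_roaming_all
where month = '201702';
```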

1.4 External partitioned tables

create external table if not exists dgdw.YFF_RBI_SJ(
SESSIONID                string
)partitioned by (receive_day string) row format delimited fields terminated by '|' location '/yw/YFF_RBI_SJ'

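To view a table's structure, use desc <table_name>. For more detail, use describe formatted <table_name>.

2. Four ways to import data

(1) Import data from the local filesystem into a Hive table:

load data local inpath '<path>' [overwrite] into table <table_name>;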

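For example: load data local inpath '/home/cl_stu.txt' into table test.cl_student;

Note: run this command in a Hive session on the host that holds the file; otherwise it fails with an error such as FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'txt'

(2) Import data from HDFS into a Hive table:

load data inpath '<path>' [overwrite] into table <table_name>;

(3) Query data from another table and insert it into a Hive table: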

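insert overwrite table tb1 select a.acc_nbr acc_nbr from acc_nbr2 a;

insert into table tb1 select a.acc_nbr acc_nbr from acc_nbr2 a;

insert overwrite replaces the table's existing data, while insert into appends to it.

(4) Create a table and populate it in one step from a query on another table: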

create table b as select * from a

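Exporting data

-- Merge small files at the end of a MapReduce job
set hive.merge.mapredfiles=true;

-- Target size of the merged files
set hive.merge.size.per.task=256000000;

-- When the average output file size is below this value, start a separate MapReduce job to merge the files
set hive.merge.smallfiles.avgsize=16000000;

(1) hadoop fs -get <hdfs_path> <local_path>

(2) insert overwrite local directory '/home/' select * from b2i_overnet_4;

Note that insert overwrite replaces the existing contents of the target directory, so point it at a dedicated export directory rather than a shared one such as /home/.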
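As a hedged variation on export method (2), the sketch below writes tab-separated text into a dedicated directory. The path /tmp/hive_export/b2i_overnet_4 is a hypothetical choice, and the row format clause on insert overwrite directory assumes Hive 0.11 or later.

```sql
-- Export into a dedicated directory (hypothetical path); its existing
-- contents are replaced, so avoid pointing this at a shared directory.
insert overwrite local directory '/tmp/hive_export/b2i_overnet_4'
row format delimited fields terminated by '\t'
select * from b2i_overnet_4;
```

Without the row format clause, the exported files use Hive's default field delimiter '\001', which is awkward to read in most tools.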

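3. insert and update operations

Early versions of Hive did not support inserting rows one at a time with insert statements, nor did they support update. Starting with Hive 0.14, insert ... values is available, and update and delete are supported on transactional (ACID) tables.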
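For readers on Hive 0.14 or later, row-level statements might look like the sketch below. The table cl_stu_acid and its rows are hypothetical, and the session settings are assumptions about what is not yet configured. Depending on the version, further settings (for example hive.enforce.bucketing on pre-2.0 releases, or compactor settings on the metastore) may also be needed.

```sql
-- Session settings commonly required for ACID tables (assumed not yet set).
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical transactional table; before Hive 3, ACID tables must be
-- bucketed and stored as ORC.
create table if not exists cl_stu_acid (id int, name string)
clustered by (id) into 2 buckets
stored as orc
tblproperties ('transactional'='true');

-- Row-level insert (Hive 0.14+).
insert into table cl_stu_acid values (1, 'zhangsan'), (2, 'lisi');

-- Row-level update and delete, available only on transactional tables.
update cl_stu_acid set name = 'wangwu' where id = 2;
delete from cl_stu_acid where id = 1;
```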

4. Partitions: partitioning improves query performance

Add a partition

alter table dw.RBI_ZQ add if NOT EXISTS partition(receive_day = 20170213);

Drop a partition

alter table dw.tb drop if exists partition(receive_day=201612);

Modify partitions (change the table location)

alter table tb_comm set location 'hdfs://yw/tb_comm';

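For example: if the table was created with the wrong location and the statement above is used to correct it, the existing partitions must be dropped and added again, because they still record their old locations in the metastore; otherwise the data in those partitions cannot be queried.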
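The repair step described above can be sketched in two alternative ways. The partition column receive_day and its value are hypothetical, since the definition of tb_comm is not shown; the second option is a swapped-in alternative, not the procedure from the text.

```sql
-- Inspect where an existing partition currently points
-- (receive_day is a hypothetical partition column for tb_comm).
describe formatted tb_comm partition (receive_day='20170213');

-- Option 1: drop and re-add the partition so it picks up the new table
-- location. On a managed table, dropping a partition also deletes its
-- files, so this is only safe for external tables.
alter table tb_comm drop if exists partition (receive_day='20170213');
alter table tb_comm add if not exists partition (receive_day='20170213');

-- Option 2 (use instead of option 1): repoint the existing partition.
alter table tb_comm partition (receive_day='20170213')
set location 'hdfs://yw/tb_comm/receive_day=20170213';
```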

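5. Running HDFS and shell commands from within Hive

In the Hive CLI, HDFS commands are run with dfs, for example: dfs -ls /yw;

From a shell, HDFS commands are run with: hadoop fs -ls

Hive can also run shell commands, for example: !pwd;

6. Type conversion

A string field can be converted to bigint explicitly with cast; Hive also converts types implicitly by default.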

select cast(name as bigint)-100,age from cl_stu where id=120;
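A few probe queries help show what cast does with good and bad input. This is a minimal sketch, assuming Hive 0.13 or later (where select without a from clause is allowed) and the cl_stu table defined earlier.

```sql
-- A numeric string converts cleanly: 123 - 100 = 23.
select cast('123' as bigint) - 100;

-- A non-numeric string does not raise an error; cast returns NULL.
select cast('abc' as bigint);

-- Find rows whose name cannot be converted, to catch silent NULLs
-- before doing arithmetic on the column.
select id, name
from cl_stu
where name is not null and cast(name as bigint) is null;
```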

7. Adding a new column

alter table cl_stu add columns (subject string comment 'Add new column');

8. Fetching a table's files from HDFS

hadoop fs -get /yw/shcz/ /home/chenli/

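9. LZO files

On a cluster with output compression enabled (hive.exec.compress.output=true, as appears to be the case here), create table as select stores its output as compressed files (LZO in this case), which reduces storage space. There are two ways to get uncompressed data:

1. Decompress the files manually: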

hadoop fs -get /ywzc/dgdw/tf_daily_move_list/*.lzo /data/etl/tmp/tf_daily_move_list/

lzop -cd /data/etl/tmp/tf_daily_move_list/*.lzo >> /data/etl/tmp/tf_daily_move_list/tf_daily_move_list.dat

hadoop fs -rm /ywzc/dgdw/temp/tf_daily_move_list/*

hadoop fs -put /data/etl/tmp/tf_daily_move_list/tf_daily_move_list.dat /ywzc/dgdw/temp/tf_daily_move_list/

rm /data/etl/tmp/tf_daily_move_list/*

2. Temporarily disable the compression; this only affects the current session:

set hive.exec.compress.output=false;
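Put together, the session-level approach might look like the sketch below. The table names tf_daily_move_list and tf_daily_move_list_plain are hypothetical, inferred only from the directory names used above.

```sql
-- Show the current value of the setting.
set hive.exec.compress.output;

-- Disable output compression for this session only, then run the CTAS.
set hive.exec.compress.output=false;
create table tf_daily_move_list_plain as
select * from tf_daily_move_list;
```

Because the setting is scoped to the session, other jobs on the cluster keep writing compressed output.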

10. Common Hive configuration

Reposted from: https://my.oschina.net/iamchenli/blog/845303
