Big Data Technology: Hive DML

程序员文章站 2022-05-01 10:12:37

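Chapter 5 DML Data Operations

5.1 Importing Data

5.1.1 Loading Data into a Table (Load)

1. Syntax

load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

(1) load data: loads data into a table.

(2) local: load from the local file system into the Hive table; without it, the path is read from HDFS.

(3) inpath: the path of the data to load.

(4) overwrite: replace the data already in the table; without it, the new data is appended.

(5) into table: introduces the table to load into.

(6) student: the target table in this example.

(7) partition: load into the specified partition.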
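
As a sketch of how the optional clauses combine, the statements below load the same file into a single partition. The table student_by_month and its month column are hypothetical names introduced only for this illustration, and the comments assume the statements are run as a script (for example with hive -f).

```sql
-- Hypothetical partitioned table, used only for this illustration.
create table if not exists student_by_month(id string, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- local: read from the local file system.
-- overwrite: replace whatever the target partition already holds.
load data local inpath '/opt/module/datas/student.txt'
overwrite into table student_by_month partition (month='201709');

-- List the partitions to confirm that the target partition was created.
show partitions student_by_month;
```

Dropping overwrite from the second statement would append the file to the partition instead of replacing its contents.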

2. Hands-on examples

(0) Create a table

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

(1) Load a local file into Hive

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table default.student;

(2) Load an HDFS file into Hive

Upload the file to HDFS:

hive (default)> dfs -put /opt/module/datas/student.txt /user/atguigu/hive;

Note: the /user/atguigu/hive directory must already exist.

Load the data from HDFS:

hive (default)> load data inpath '/user/atguigu/hive/student.txt' into table default.student;

Note: loading a local file copies it into the table's directory, whereas loading a file that is already on HDFS moves (cuts) it from its original path into the table's directory.
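
One way to observe the copy-versus-move behavior is to list the source and target paths after each load. This is a minimal sketch for the Hive CLI; it assumes the default warehouse directory /user/hive/warehouse, which is the path used elsewhere in this chapter but may differ on other installations.

```sql
-- After "load data local inpath ...": the local source file should still be in place.
-- (In the Hive CLI, the "!" prefix runs a shell command.)
!ls /opt/module/datas/student.txt;

-- After "load data inpath ...": the file should no longer be listed here ...
dfs -ls /user/atguigu/hive;

-- ... because it was moved under the table's directory.
-- Assumption: the table lives in the default warehouse path.
dfs -ls /user/hive/warehouse/student;
```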

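(3) Load data and overwrite the existing contents of the table

Upload the file to HDFS again (the previous load moved it out of this directory):

hive (default)> dfs -put /opt/module/datas/student.txt /user/atguigu/hive;

Load the data, overwriting what the table already holds:

hive (default)> load data inpath '/user/atguigu/hive/student.txt' overwrite into table default.student;

Note: with overwrite, the table's previous data is gone.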

5.1.2 Inserting Data into a Table from a Query (Insert)

1. Create a partitioned table (this reuses the name student, so drop the table from 5.1.1 first if it still exists)

hive (default)> create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';

2. Basic insert

hive (default)> insert into table student partition(month='201709') values(1,'wangwu');

3. Insert from a query on a single table

hive (default)> insert overwrite table student partition(month='201708') select id, name from student where month='201709';

4. Multi-insert (scan the source table once and write several results)

hive (default)> from student
insert overwrite table student partition(month='201707')
select id, name where month='201709'
insert overwrite table student partition(month='201706')
select id, name where month='201709';

Note: this statement scans the student table once, reads the month='201709' partition, and writes the result into two other partitions, month='201707' and month='201706'.
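
A quick way to check the result of the inserts is to list the partitions and count the rows in each. The following is only a sketch and assumes the partitioned student table created in step 1 of this section.

```sql
-- The partitions written by the insert statements above should all be listed.
show partitions student;

-- Count the rows per partition to compare the copies with the source partition.
select month, count(*) as cnt
from student
group by month;
```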

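5.1.3 Creating a Table and Loading Data from a Query (As Select)

See section 4.5.1, Creating Tables, for details.

Create a table from a query result (the rows returned by the query are written to the new table):

hive (default)> create table if not exists student3 as select id, name from student;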

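5.1.4 Specifying the Data Path with Location at Table Creation

1. Create the table and specify its location on HDFS

create table if not exists student5(
id int, name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student5';

2. Upload data to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;

3. Query the data

hive (default)> select * from student5;

Note: this approach suits data that already exists on HDFS. To analyze it, there is no need to create a table and then move the data in; pointing the new table's location at the existing data is enough.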

5.1.5 Importing Data into a Specified Hive Table (Import)

Note: the data must first be exported with export before it can be imported.

hive (default)> import table student2 partition(month='201709') from '/user/hive/warehouse/export/student';

5.2 Exporting Data

5.2.1 Exporting with Insert

1. Export query results to the local file system

hive (default)> insert overwrite local directory '/opt/module/datas/export/student' select * from student;

Note: the target directory is created automatically; because of overwrite, any existing contents of that directory are replaced.

2. Export query results to the local file system with formatting

hive (default)> insert overwrite local directory '/opt/module/datas/export/student' row format delimited fields terminated by '\t' select * from student;

3. Export query results to HDFS (without local)

hive (default)> insert overwrite directory '/user/atguigu/student2' row format delimited fields terminated by '\t' select * from student;

5.2.2 Exporting to the Local File System with Hadoop Commands

hive (default)> dfs -get /user/hive/warehouse/student/month=201709/000000_0 /opt/module/datas/export/student3.txt;

5.2.3 Exporting with Hive Shell Commands

Basic syntax: hive -e 'statement' > file, or hive -f script_file > file

$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt;

5.2.4 Exporting to HDFS with Export

hive (default)> export table default.student to '/user/hive/warehouse/export/student';

Note: the export contains the table's metadata along with the data, which is what allows Import to load it back.
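
To illustrate the round trip, the export written above can be imported into a new table. This is a sketch under stated assumptions: student_copy is a hypothetical table name, and the import is expected to succeed only if that table does not exist yet or is empty.

```sql
-- Import the exported data and metadata into a new table.
-- student_copy is a hypothetical name chosen for this illustration.
import table student_copy from '/user/hive/warehouse/export/student';

-- The schema and partitions come from the metadata stored in the export.
desc student_copy;
show partitions student_copy;
```

Without a partition clause, the import brings in every partition contained in the export, whereas the statement in 5.1.5 restricts it to one.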

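5.2.5 Exporting with Sqoop

Sqoop is a tool for transferring data between relational databases and Hadoop-side storage such as HDFS and Hive. It can import data from MySQL into Hive and export data from Hive into MySQL.

5.3 Clearing Table Data (Truncate)

Note: Truncate only clears the data of managed tables; it cannot clear the data of external tables.

hive (default)> truncate table student;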
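
Since truncate depends on the table type, it can help to check the type first. The sketch below also shows truncating a single partition rather than the whole table; it assumes the partitioned student table from 5.1.2.

```sql
-- Check the table type: the output should contain a "Table Type" row
-- showing MANAGED_TABLE for a managed table.
desc formatted student;

-- Clear one partition only; the other partitions keep their data.
truncate table student partition (month='201709');

-- Confirm that the partition no longer holds rows.
select count(*) from student where month='201709';
```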

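Metadata in MySQL

Metadata is the mapping itself: it records which tables and partitions correspond to which files, rather than being a separate copy of the data. The metadata and the files on HDFS map to each other.

Tables in the metastore database in MySQL:

TBLS: table information

USE `metastore`;
SELECT * FROM `TBLS`;

TBL_ID is the auto-increment primary key.

DBS: databases

SELECT * FROM `DBS`;

SDS: storage locations

SELECT * FROM `SDS`;

PARTITIONS: partition information

SELECT * FROM `PARTITIONS`;

TBL_ID is a foreign key that references the table in TBLS.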
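
The mapping between metadata and HDFS files becomes easier to see when these tables are joined. The queries below are a sketch to run in MySQL; the column names (DB_ID, SD_ID, TBL_NAME, TBL_TYPE, LOCATION, PART_NAME) are the ones commonly found in the Hive metastore schema and are an assumption here, since they can vary between Hive versions.

```sql
USE `metastore`;

-- Resolve each table to its database and its HDFS location.
SELECT d.`NAME`     AS db_name,
       t.`TBL_NAME` AS table_name,
       t.`TBL_TYPE` AS table_type,
       s.`LOCATION` AS hdfs_location
FROM `TBLS` t
JOIN `DBS` d ON t.`DB_ID` = d.`DB_ID`
JOIN `SDS` s ON t.`SD_ID` = s.`SD_ID`;

-- Resolve each partition of the student table to its own HDFS directory.
SELECT t.`TBL_NAME`  AS table_name,
       p.`PART_NAME` AS partition_name,
       s.`LOCATION`  AS hdfs_location
FROM `PARTITIONS` p
JOIN `TBLS` t ON p.`TBL_ID` = t.`TBL_ID`
JOIN `SDS`  s ON p.`SD_ID`  = s.`SD_ID`
WHERE t.`TBL_NAME` = 'student';
```

These queries only read the metastore; editing its tables by hand can leave Hive's metadata inconsistent with the files on HDFS.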
