Hive常规操作(DDL、DML)

程序员文章站 2022-07-13 14:12:27

...

DDL

DataBase

1.查看数据库

0: jdbc:hive2://Hbase:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| test           |
+----------------+--+
2 rows selected (2.96 seconds)

2.使用数据库

0: jdbc:hive2://Hbase:10000> use test;
No rows affected (0.127 seconds)

3.新建数据库

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name   --DATABASE|SCHEMA 是等价的
  [COMMENT database_comment] --数据库注释
  [LOCATION hdfs_path] --存储在 HDFS 上的位置
  [WITH DBPROPERTIES (property_name=property_value, ...)]; --指定额外属性

0: jdbc:hive2://Hbase:10000> create database if not exists hive_test comment "this id test" WITH DBPROPERTIES("create"="tangc");
No rows affected (0.406 seconds)

4.查看数据库信息

DESC DATABASE [EXTENDED] db_name; --EXTENDED 表示是否显示额外属性

0: jdbc:hive2://Hbase:10000> desc database hive_test;
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
|  db_name   |    comment    |                      location                       | owner_name  | owner_type  | parameters  |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
| hive_test  | this id test  | hdfs://Hbase:9000/user/hive/warehouse/hive_test.db  | root        | USER        |             |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.388 seconds)

使用extended

0: jdbc:hive2://Hbase:10000> desc database extended hive_test;
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
|  db_name   |    comment    |                      location                       | owner_name  | owner_type  |   parameters    |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
| hive_test  | this id test  | hdfs://Hbase:9000/user/hive/warehouse/hive_test.db  | root        | USER        | {create=tangc}  |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
1 row selected (0.093 seconds)

5.删除数据库
RESTRICT：默认行为，如果存在数据库中表，则删除失败。
CASCADE：级联删除。

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

0: jdbc:hive2://Hbase:10000> drop database if exists hive_test;
No rows affected (0.371 seconds)

6.查看当前数据库

0: jdbc:hive2://Hbase:10000> select current_database();
+-------+--+
|  _c0  |
+-------+--+
| test  |
+-------+--+
1 row selected (0.61 seconds)

Table

创建表

管理表

是内部表，也称为MANAGED_TABLE;默认存储在/user/hive/warehouse下，也可以通过location指定；删除表时，会删除表数据以及元数据；

create table if not exists t_user(
  id int,
  name string,
  sex boolean,
  age int,
  salary double,
  hobbies array<string>,
  card map<string,string>,
  address struct<country:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;

Hive常规操作(DDL、DML)

外部表
外部表称之为EXTERNAL_TABLE；在创建表时可以自己指定目录位置(LOCATION)；删除表时，只会删除元数据不会删除表数据；

create external table if not exists t_access(
     ip string,
     app varchar(32),
     service string,
     last_time timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex"="^(.*)\\s(.*)\\s(.*)\\s(.*\\s.*)"
)
LOCATION '/hive/t_access';

Hive常规操作(DDL、DML)

分区表
Hive中的表对应为HDFS上的指定目录，在查询数据时候，默认会对全表进行扫描，这样时间和性能的消耗都非常大。分区为HDFS上表目录的子目录，数据按照分区存储在子目录中。如果查询的where字句的中包含分区条件，则直接从该分区去查找，而不是扫描整个表目录，合理的分区设计可以极大提高查询速度和性能。这里说明一下分区表并非Hive独有的概念，实际上这个概念非常常见。比如在我们常用的Oracle数据库中，当表中的数据量不断增大，查询数据的速度就会下降，这时也可以对表进行分区。表进行分区后，逻辑上表仍然是一张完整的表，只是将表中的数据存放到多个表空间（物理文件上），这样查询数据时，就不必要每次都扫描整张表，从而提升查询性能。在Hive中可以使用PARTITIONED BY子句创建分区表。表可以包含一个或多个分区列，程序会为分区列中的每个不同值组合创建单独的数据目录。

7369	SMITH	CLERK	7902	1980-12-17 00:00:00	800.00
7499	ALLEN	SALESMAN	7698	1981-02-20 00:00:00	1600.00	300.00
7521	WARD	SALESMAN	7698	1981-02-22 00:00:00	1250.00	500.00
7566	JONES	MANAGER	7839	1981-04-02 00:00:00	2975.00
7654	MARTIN	SALESMAN	7698	1981-09-28 00:00:00	1250.00	1400.00
7698	BLAKE	MANAGER	7839	1981-05-01 00:00:00	2850.00
7782	CLARK	MANAGER	7839	1981-06-09 00:00:00	2450.00
7788	SCOTT	ANALYST	7566	1987-04-19 00:00:00	1500.00
7839	KING	PRESIDENT		1981-11-17 00:00:00	5000.00
7844	TURNER	SALESMAN	7698	1981-09-08 00:00:00	1500.00	0.00
7876	ADAMS	CLERK	7788	1987-05-23 00:00:00	1100.00
7900	JAMES	CLERK	7698	1981-12-03 00:00:00	950.00
7902	FORD	ANALYST	7566	1981-12-03 00:00:00	3000.00
7934	MILLER	CLERK	7782	1982-01-23 00:00:00	1300.00

此表中有7个字段，包含deptno部门。

 CREATE EXTERNAL TABLE t_employee(
    id INT,
    name STRING,
    job STRING,
    manager INT,
    hiredate TIMESTAMP,
    salary DECIMAL(7,2)
    )
    PARTITIONED BY (deptno INT)   
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    LOCATION '/hive/t_employee';

0: jdbc:hive2://Hbase:10000> select id ,name,job,manager,salary,deptno from t_employee;
+-------+---------+------------+----------+---------+---------+--+
|  id   |  name   |    job     | manager  | salary  | deptno  |
+-------+---------+------------+----------+---------+---------+--+
| 7369  | SMITH   | CLERK      | 7902     | 800     | 10      |
| 7499  | ALLEN   | SALESMAN   | 7698     | 1600    | 10      |
| 7521  | WARD    | SALESMAN   | 7698     | 1250    | 10      |
| 7566  | JONES   | MANAGER    | 7839     | 2975    | 10      |
| 7654  | MARTIN  | SALESMAN   | 7698     | 1250    | 10      |
| 7698  | BLAKE   | MANAGER    | 7839     | 2850    | 10      |
| 7782  | CLARK   | MANAGER    | 7839     | 2450    | 10      |
| 7788  | SCOTT   | ANALYST    | 7566     | 1500    | 10      |
| 7839  | KING    | PRESIDENT  | NULL     | 5000    | 10      |
| 7844  | TURNER  | SALESMAN   | 7698     | 1500    | 10      |
| 7876  | ADAMS   | CLERK      | 7788     | 1100    | 10      |
| 7900  | JAMES   | CLERK      | 7698     | 950     | 10      |
| 7902  | FORD    | ANALYST    | 7566     | 3000    | 10      |
| 7934  | MILLER  | CLERK      | 7782     | 1300    | 10      |
+-------+---------+------------+----------+---------+---------+--+
14 rows selected (0.57 seconds)

Hive常规操作(DDL、DML)

分桶表
分区表是为了将文件按照分区文件夹进行粗粒度文件隔离，但是分桶表是将数据按照某个字段进行hash计算出所属的桶，然后在对桶内的数据进行排序。

 CREATE EXTERNAL TABLE t_employee_bucket(
    id INT,
    name STRING,
    job STRING,
    manager INT,
    hiredate TIMESTAMP,
    salary DECIMAL(7,2),
    deptno INT)
    CLUSTERED BY(id) SORTED BY(salary ASC) INTO 4 BUCKETS  
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    LOCATION '/hive/employee_bucket';

set hive.enforce.bucketing = true; //让hive强制使用分桶
//将部门id为10号的数据插入到t_employee_bucket表中
INSERT INTO TABLE t_employee_bucket SELECT *  FROM t_employee WHERE deptno=10;

0: jdbc:hive2://Hbase:10000> select id,name,deptno from t_employee_bucket;
+-------+---------+---------+--+
|  id   |  name   | deptno  |
+-------+---------+---------+--+
| 7788  | SCOTT   | 10      |
| 7900  | JAMES   | 10      |
| 7876  | ADAMS   | 10      |
| 7844  | TURNER  | 10      |
| 7369  | SMITH   | 10      |
| 7521  | WARD    | 10      |
| 7934  | MILLER  | 10      |
| 7902  | FORD    | 10      |
| 7698  | BLAKE   | 10      |
| 7654  | MARTIN  | 10      |
| 7566  | JONES   | 10      |
| 7782  | CLARK   | 10      |
| 7499  | ALLEN   | 10      |
| 7839  | KING    | 10      |
+-------+---------+---------+--+
14 rows selected (0.313 seconds)

Hive常规操作(DDL、DML)

临时表
临时表仅对当前session可见，临时表的数据将存储在用户的暂存目录中，并在会话结束后删除。如果临时表与永久表表名相同，则对该表名的任何引用都将解析为临时表，而不是永久表。临时表还具有以下两个限制：不支持分区列；不支持创建索引.

  CREATE TEMPORARY TABLE if not exists emp_temp(
    id INT,
    name STRING,
    job STRING,
    manager INT,
    hiredate TIMESTAMP,
    salary DECIMAL(7,2),
    deptno INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    LOCATION '/hive/emp_temp';

CTAS创建表
创建的表结构和查询的信息类似，但是并不会拷贝表的结构

0: jdbc:hive2://CentOS:10000> create  TEMPORARY TABLE t_employee_copy1 as select * from t_employee;

复制表结构
仅仅是复制表的结构，但是不拷贝数据。

0: jdbc:hive2://CentOS:10000> CREATE TEMPORARY EXTERNAL TABLE  IF NOT EXISTS  t_empoyee_copy2  LIKE t_employee_bucket location '/hive/t_empoyee_copy2';

0: jdbc:hive2://Hbase:10000> select id,name,deptno from t_empoyee_copy2;
+-----+-------+---------+--+
| id  | name  | deptno  |
+-----+-------+---------+--+
+-----+-------+---------+--+
No rows selected (0.14 seconds)
0: jdbc:hive2://Hbase:10000> desc t_empoyee_copy2;
+-----------+---------------+----------+--+
| col_name  |   data_type   | comment  |
+-----------+---------------+----------+--+
| id        | int           |          |
| name      | string        |          |
| job       | string        |          |
| manager   | int           |          |
| hiredate  | timestamp     |          |
| salary    | decimal(7,2)  |          |
| deptno    | int           |          |
+-----------+---------------+----------+--+
7 rows selected (0.106 seconds)

修改表

重命名表

0: jdbc:hive2://CentOS:10000> ALTER TABLE t_user RENAME TO t_u;

0: jdbc:hive2://Hbase:10000> show tables;
+--------------------+--+
|      tab_name      |
+--------------------+--+
| t_access           |
| t_employee         |
| t_employee_bucket  |
| t_employee_copy1   |
| t_empoyee_copy2    |
| t_u                |
+--------------------+--+
6 rows selected (0.076 seconds)

修改列(修改类型、顺序、新增)

0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE id  eid INT;--修改列名&类型
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE eid id decimal(7,2)  AFTER name;--修改顺序
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee ADD COLUMNS (address STRING);

清空表

0: jdbc:hive2://CentOS:10000> truncate table t_employee partition(deptno=10);
只能截断managed-table

删除

0: jdbc:hive2://CentOS:10000> drop table t_employee PURGE;
PURGE表示数据会直接删除，不会放置在垃圾箱中

0: jdbc:hive2://Hbase:10000> drop table t_empoyee_copy2 ;
No rows affected (0.024 seconds)
0: jdbc:hive2://Hbase:10000> show tables;
+--------------------+--+
|      tab_name      |
+--------------------+--+
| t_access           |
| t_employee         |
| t_employee_bucket  |
| t_employee_copy1   |
| t_u                |
+--------------------+--+
5 rows selected (0.098 seconds)

删除分区

0: jdbc:hive2://Hbase:10000> show partitions t_employee;
+------------+--+
| partition  |
+------------+--+
| deptno=10  |
| deptno=20  |
+------------+--+
2 rows selected (0.241 seconds)
0: jdbc:hive2://Hbase:10000> alter table t_employee drop partition(deptno=10);
INFO  : Dropped the partition deptno=10
No rows affected (0.538 seconds)
0: jdbc:hive2://Hbase:10000> show partitions t_employee;
+------------+--+
| partition  |
+------------+--+
| deptno=20  |
+------------+--+
1 row selected (0.174 seconds)

其他命令

Describe

查看数据库

DESCRIBE|DESC DATABASE [EXTENDED] db_name;

0: jdbc:hive2://CentOS:10000> desc database hive_test;
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
|  db_name   |      comment       |                       location                       | owner_name  | owner_type  | parameters  |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| hive_test  | database for test  | hdfs://CentOS:9000/user/hive/warehouse/hive_test.db  | root        | USER        |             |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.039 seconds)

查看表

DESCRIBE|DESC [EXTENDED|FORMATTED] table_name

0: jdbc:hive2://CentOS:10000> desc  t_user;
+-----------+-------------------------------------+----------+--+
| col_name  |              data_type              | comment  |
+-----------+-------------------------------------+----------+--+
| id        | int                                 |          |
| name      | string                              |          |
| sex       | boolean                             |          |
| age       | int                                 |          |
| salary    | double                              |          |
| hobbies   | array<string>                       |          |
| card      | map<string,string>                  |          |
| address   | struct<country:string,city:string>  |          |
+-----------+-------------------------------------+----------+--+
8 rows selected (0.06 seconds)

show

查看数据库列表

SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];

0: jdbc:hive2://CentOS:10000> show schemas like '*'
0: jdbc:hive2://CentOS:10000> ;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| hive_test      |
| test           |
+----------------+--+
3 rows selected (0.03 seconds)

查看表的列表

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];

0: jdbc:hive2://CentOS:10000> show tables;
+--------------------+--+
|      tab_name      |
+--------------------+--+
| t_access           |
| t_employ_copy      |
| t_employee_bucket  |
| t_employee_copy1   |
| t_user             |
+--------------------+--+
5 rows selected (0.054 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test;
+-------------+--+
|  tab_name   |
+-------------+--+
| t_access    |
| t_employee  |
| t_product   |
| t_student   |
| t_user      |
+-------------+--+
5 rows selected (0.043 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test like 't_*';
+-------------+--+
|  tab_name   |
+-------------+--+
| t_access    |
| t_employee  |
| t_product   |
| t_student   |
| t_user      |
+-------------+--+
5 rows selected (0.04 seconds)

查看分区

0: jdbc:hive2://CentOS:10000> show partitions t_employee;
+------------+--+
| partition  |
+------------+--+
| deptno=10  |
+------------+--+
1 row selected (0.065 seconds)

查看建表语句

0: jdbc:hive2://CentOS:10000> show create table t_employee;
+-----------------------------------------------------------------+--+
|                         createtab_stmt                          |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `t_employee`(                             |
|   `id` int,                                                     |
|   `name` string,                                                |
|   `job` string,                                                 |
|   `manager` int,                                                |
|   `hiredate` timestamp,                                         |
|   `salary` decimal(7,2))                                        |
| PARTITIONED BY (                                                |
|   `deptno` int)                                                 |
| ROW FORMAT DELIMITED                                            |
|   FIELDS TERMINATED BY '\t'                                     |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   'hdfs://CentOS:9000/hive/t_employee'                          |
| TBLPROPERTIES (                                                 |
|   'transient_lastDdlTime'='1576961129')                         |
+-----------------------------------------------------------------+--+
19 rows selected (0.117 seconds)

Hive常规操作(DDL、DML)

DDL

DataBase

Table

创建表

修改表

清空表

删除

其他命令

Hive-1.2.1_04_DML操作

Hive-1.2.1_03_DDL操作

使用sqlplus创建DDL和DML操作方法

使用sqlplus创建DDL和DML操作技巧

Oracle ddl 和 dml 操作

MYSQL入门操作和常规DML、DDL、DQL使用

SQL语句的数据操作语言 (DML) 和数据定义语言 (DDL)使用介绍

Oracle数据库基本操作（一） —— Oracle数据库体系结构介绍、DDL、DCL、DML

Hive的DDL操作

Hive常规操作(DDL、DML)

Hive常规操作(DDL、DML)

DDL

DataBase

Table

创建表

修改表

清空表

删除

其他命令

Hive-1.2.1_04_DML操作

Hive-1.2.1_03_DDL操作

使用sqlplus创建DDL和DML操作方法

使用sqlplus创建DDL和DML操作技巧

Oracle ddl 和 dml 操作

MYSQL入门操作和常规DML、DDL、DQL使用

SQL语句的数据操作语言 (DML) 和 数据定义语言 (DDL)使用介绍

Oracle数据库基本操作（一） —— Oracle数据库体系结构介绍、DDL、DCL、DML

Hive的DDL操作

Hive常规操作(DDL、DML)

SQL语句的数据操作语言 (DML) 和数据定义语言 (DDL)使用介绍