欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Hive常规操作(DDL、DML)

程序员文章站 2022-07-13 14:12:27
...

DDL

DataBase

1.查看数据库

0: jdbc:hive2://Hbase:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| test           |
+----------------+--+
2 rows selected (2.96 seconds)

2.使用数据库

0: jdbc:hive2://Hbase:10000> use test;
No rows affected (0.127 seconds)

3.新建数据库

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name   --DATABASE|SCHEMA 是等价的
  [COMMENT database_comment] --数据库注释
  [LOCATION hdfs_path] --存储在 HDFS 上的位置
  [WITH DBPROPERTIES (property_name=property_value, ...)]; --指定额外属性
0: jdbc:hive2://Hbase:10000> create database if not exists hive_test comment "this id test" WITH DBPROPERTIES("create"="tangc");
No rows affected (0.406 seconds)

4.查看数据库信息

DESC DATABASE [EXTENDED] db_name; --EXTENDED 表示是否显示额外属性
0: jdbc:hive2://Hbase:10000> desc database hive_test;
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
|  db_name   |    comment    |                      location                       | owner_name  | owner_type  | parameters  |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
| hive_test  | this id test  | hdfs://Hbase:9000/user/hive/warehouse/hive_test.db  | root        | USER        |             |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.388 seconds)

使用extended

0: jdbc:hive2://Hbase:10000> desc database extended hive_test;
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
|  db_name   |    comment    |                      location                       | owner_name  | owner_type  |   parameters    |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
| hive_test  | this id test  | hdfs://Hbase:9000/user/hive/warehouse/hive_test.db  | root        | USER        | {create=tangc}  |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
1 row selected (0.093 seconds)

5.删除数据库
RESTRICT:默认行为,如果存在数据库中表,则删除失败。
CASCADE:级联删除。

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
0: jdbc:hive2://Hbase:10000> drop database if exists hive_test;
No rows affected (0.371 seconds)

6.查看当前数据库

0: jdbc:hive2://Hbase:10000> select current_database();
+-------+--+
|  _c0  |
+-------+--+
| test  |
+-------+--+
1 row selected (0.61 seconds)

Table

创建表

  • 管理表

是内部表,也称为MANAGED_TABLE;默认存储在/user/hive/warehouse下,也可以通过location指定;删除表时,会删除表数据以及元数据

create table if not exists t_user(
  id int,
  name string,
  sex boolean,
  age int,
  salary double,
  hobbies array<string>,
  card map<string,string>,
  address struct<country:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;

Hive常规操作(DDL、DML)

  • 外部表
    外部表称之为EXTERNAL_TABLE;在创建表时可以自己指定目录位置(LOCATION);删除表时,只会删除元数据不会删除表数据;
create external table if not exists t_access(
     ip string,
     app varchar(32),
     service string,
     last_time timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex"="^(.*)\\s(.*)\\s(.*)\\s(.*\\s.*)"
)
LOCATION '/hive/t_access';

Hive常规操作(DDL、DML)

  • 分区表
    Hive中的表对应为HDFS上的指定目录,在查询数据时候,默认会对全表进行扫描,这样时间和性能的消耗都非常大。分区为HDFS上表目录的子目录,数据按照分区存储在子目录中。如果查询的where字句的中包含分区条件,则直接从该分区去查找,而不是扫描整个表目录,合理的分区设计可以极大提高查询速度和性能。这里说明一下分区表并非Hive独有的概念,实际上这个概念非常常见。比如在我们常用的Oracle数据库中,当表中的数据量不断增大,查询数据的速度就会下降,这时也可以对表进行分区。表进行分区后,逻辑上表仍然是一张完整的表,只是将表中的数据存放到多个表空间(物理文件上),这样查询数据时,就不必要每次都扫描整张表,从而提升查询性能。在Hive中可以使用PARTITIONED BY子句创建分区表。表可以包含一个或多个分区列,程序会为分区列中的每个不同值组合创建单独的数据目录。
7369	SMITH	CLERK	7902	1980-12-17 00:00:00	800.00
7499	ALLEN	SALESMAN	7698	1981-02-20 00:00:00	1600.00	300.00
7521	WARD	SALESMAN	7698	1981-02-22 00:00:00	1250.00	500.00
7566	JONES	MANAGER	7839	1981-04-02 00:00:00	2975.00
7654	MARTIN	SALESMAN	7698	1981-09-28 00:00:00	1250.00	1400.00
7698	BLAKE	MANAGER	7839	1981-05-01 00:00:00	2850.00
7782	CLARK	MANAGER	7839	1981-06-09 00:00:00	2450.00
7788	SCOTT	ANALYST	7566	1987-04-19 00:00:00	1500.00
7839	KING	PRESIDENT		1981-11-17 00:00:00	5000.00
7844	TURNER	SALESMAN	7698	1981-09-08 00:00:00	1500.00	0.00
7876	ADAMS	CLERK	7788	1987-05-23 00:00:00	1100.00
7900	JAMES	CLERK	7698	1981-12-03 00:00:00	950.00
7902	FORD	ANALYST	7566	1981-12-03 00:00:00	3000.00
7934	MILLER	CLERK	7782	1982-01-23 00:00:00	1300.00

此表中有7个字段,包含deptno部门。

 CREATE EXTERNAL TABLE t_employee(
    id INT,
    name STRING,
    job STRING,
    manager INT,
    hiredate TIMESTAMP,
    salary DECIMAL(7,2)
    )
    PARTITIONED BY (deptno INT)   
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    LOCATION '/hive/t_employee';
0: jdbc:hive2://Hbase:10000> select id ,name,job,manager,salary,deptno from t_employee;
+-------+---------+------------+----------+---------+---------+--+
|  id   |  name   |    job     | manager  | salary  | deptno  |
+-------+---------+------------+----------+---------+---------+--+
| 7369  | SMITH   | CLERK      | 7902     | 800     | 10      |
| 7499  | ALLEN   | SALESMAN   | 7698     | 1600    | 10      |
| 7521  | WARD    | SALESMAN   | 7698     | 1250    | 10      |
| 7566  | JONES   | MANAGER    | 7839     | 2975    | 10      |
| 7654  | MARTIN  | SALESMAN   | 7698     | 1250    | 10      |
| 7698  | BLAKE   | MANAGER    | 7839     | 2850    | 10      |
| 7782  | CLARK   | MANAGER    | 7839     | 2450    | 10      |
| 7788  | SCOTT   | ANALYST    | 7566     | 1500    | 10      |
| 7839  | KING    | PRESIDENT  | NULL     | 5000    | 10      |
| 7844  | TURNER  | SALESMAN   | 7698     | 1500    | 10      |
| 7876  | ADAMS   | CLERK      | 7788     | 1100    | 10      |
| 7900  | JAMES   | CLERK      | 7698     | 950     | 10      |
| 7902  | FORD    | ANALYST    | 7566     | 3000    | 10      |
| 7934  | MILLER  | CLERK      | 7782     | 1300    | 10      |
+-------+---------+------------+----------+---------+---------+--+
14 rows selected (0.57 seconds)

Hive常规操作(DDL、DML)

  • 分桶表
    分区表是为了将文件按照分区文件夹进行粗粒度文件隔离,但是分桶表是将数据按照某个字段进行hash计算出所属的桶,然后在对桶内的数据进行排序。
 CREATE EXTERNAL TABLE t_employee_bucket(
    id INT,
    name STRING,
    job STRING,
    manager INT,
    hiredate TIMESTAMP,
    salary DECIMAL(7,2),
    deptno INT)
    CLUSTERED BY(id) SORTED BY(salary ASC) INTO 4 BUCKETS  
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    LOCATION '/hive/employee_bucket';
set hive.enforce.bucketing = true; //让hive强制使用分桶
//将部门id为10号的数据插入到t_employee_bucket表中
INSERT INTO TABLE t_employee_bucket SELECT *  FROM t_employee WHERE deptno=10;

0: jdbc:hive2://Hbase:10000> select id,name,deptno from t_employee_bucket;
+-------+---------+---------+--+
|  id   |  name   | deptno  |
+-------+---------+---------+--+
| 7788  | SCOTT   | 10      |
| 7900  | JAMES   | 10      |
| 7876  | ADAMS   | 10      |
| 7844  | TURNER  | 10      |
| 7369  | SMITH   | 10      |
| 7521  | WARD    | 10      |
| 7934  | MILLER  | 10      |
| 7902  | FORD    | 10      |
| 7698  | BLAKE   | 10      |
| 7654  | MARTIN  | 10      |
| 7566  | JONES   | 10      |
| 7782  | CLARK   | 10      |
| 7499  | ALLEN   | 10      |
| 7839  | KING    | 10      |
+-------+---------+---------+--+
14 rows selected (0.313 seconds)

Hive常规操作(DDL、DML)

  • 临时表
    临时表仅对当前session可见,临时表的数据将存储在用户的暂存目录中,并在会话结束后删除。如果临时表与永久表表名相同,则对该表名的任何引用都将解析为临时表,而不是永久表。临时表还具有以下两个限制:不支持分区列;不支持创建索引.
  CREATE TEMPORARY TABLE if not exists emp_temp(
    id INT,
    name STRING,
    job STRING,
    manager INT,
    hiredate TIMESTAMP,
    salary DECIMAL(7,2),
    deptno INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    LOCATION '/hive/emp_temp';

  • CTAS创建表
    创建的表结构和查询的信息类似,但是并不会拷贝表的结构
0: jdbc:hive2://CentOS:10000> create  TEMPORARY TABLE t_employee_copy1 as select * from t_employee;
  • 复制表结构
    仅仅是复制表的结构,但是不拷贝数据。
0: jdbc:hive2://CentOS:10000> CREATE TEMPORARY EXTERNAL TABLE  IF NOT EXISTS  t_empoyee_copy2  LIKE t_employee_bucket location '/hive/t_empoyee_copy2';
0: jdbc:hive2://Hbase:10000> select id,name,deptno from t_empoyee_copy2;
+-----+-------+---------+--+
| id  | name  | deptno  |
+-----+-------+---------+--+
+-----+-------+---------+--+
No rows selected (0.14 seconds)
0: jdbc:hive2://Hbase:10000> desc t_empoyee_copy2;
+-----------+---------------+----------+--+
| col_name  |   data_type   | comment  |
+-----------+---------------+----------+--+
| id        | int           |          |
| name      | string        |          |
| job       | string        |          |
| manager   | int           |          |
| hiredate  | timestamp     |          |
| salary    | decimal(7,2)  |          |
| deptno    | int           |          |
+-----------+---------------+----------+--+
7 rows selected (0.106 seconds)

修改表

  • 重命名表
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_user RENAME TO t_u;
0: jdbc:hive2://Hbase:10000> show tables;
+--------------------+--+
|      tab_name      |
+--------------------+--+
| t_access           |
| t_employee         |
| t_employee_bucket  |
| t_employee_copy1   |
| t_empoyee_copy2    |
| t_u                |
+--------------------+--+
6 rows selected (0.076 seconds)

  • 修改列(修改类型、顺序、新增)
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE id  eid INT;--修改列名&类型
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE eid id decimal(7,2)  AFTER name;--修改顺序
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee ADD COLUMNS (address STRING);

清空表

0: jdbc:hive2://CentOS:10000> truncate table t_employee partition(deptno=10);
只能截断managed-table

删除

0: jdbc:hive2://CentOS:10000> drop table t_employee PURGE;
PURGE表示数据会直接删除,不会放置在垃圾箱中
0: jdbc:hive2://Hbase:10000> drop table t_empoyee_copy2 ;
No rows affected (0.024 seconds)
0: jdbc:hive2://Hbase:10000> show tables;
+--------------------+--+
|      tab_name      |
+--------------------+--+
| t_access           |
| t_employee         |
| t_employee_bucket  |
| t_employee_copy1   |
| t_u                |
+--------------------+--+
5 rows selected (0.098 seconds)

删除分区

0: jdbc:hive2://Hbase:10000> show partitions t_employee;
+------------+--+
| partition  |
+------------+--+
| deptno=10  |
| deptno=20  |
+------------+--+
2 rows selected (0.241 seconds)
0: jdbc:hive2://Hbase:10000> alter table t_employee drop partition(deptno=10);
INFO  : Dropped the partition deptno=10
No rows affected (0.538 seconds)
0: jdbc:hive2://Hbase:10000> show partitions t_employee;
+------------+--+
| partition  |
+------------+--+
| deptno=20  |
+------------+--+
1 row selected (0.174 seconds)

其他命令

Describe

  • 查看数据库
DESCRIBE|DESC DATABASE [EXTENDED] db_name; 
0: jdbc:hive2://CentOS:10000> desc database hive_test;
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
|  db_name   |      comment       |                       location                       | owner_name  | owner_type  | parameters  |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| hive_test  | database for test  | hdfs://CentOS:9000/user/hive/warehouse/hive_test.db  | root        | USER        |             |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.039 seconds)
  • 查看表
DESCRIBE|DESC [EXTENDED|FORMATTED] table_name 
0: jdbc:hive2://CentOS:10000> desc  t_user;
+-----------+-------------------------------------+----------+--+
| col_name  |              data_type              | comment  |
+-----------+-------------------------------------+----------+--+
| id        | int                                 |          |
| name      | string                              |          |
| sex       | boolean                             |          |
| age       | int                                 |          |
| salary    | double                              |          |
| hobbies   | array<string>                       |          |
| card      | map<string,string>                  |          |
| address   | struct<country:string,city:string>  |          |
+-----------+-------------------------------------+----------+--+
8 rows selected (0.06 seconds)

show

  • 查看数据库列表
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
0: jdbc:hive2://CentOS:10000> show schemas like '*'
0: jdbc:hive2://CentOS:10000> ;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| hive_test      |
| test           |
+----------------+--+
3 rows selected (0.03 seconds)
  • 查看表的列表
SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
0: jdbc:hive2://CentOS:10000> show tables;
+--------------------+--+
|      tab_name      |
+--------------------+--+
| t_access           |
| t_employ_copy      |
| t_employee_bucket  |
| t_employee_copy1   |
| t_user             |
+--------------------+--+
5 rows selected (0.054 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test;
+-------------+--+
|  tab_name   |
+-------------+--+
| t_access    |
| t_employee  |
| t_product   |
| t_student   |
| t_user      |
+-------------+--+
5 rows selected (0.043 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test like 't_*';
+-------------+--+
|  tab_name   |
+-------------+--+
| t_access    |
| t_employee  |
| t_product   |
| t_student   |
| t_user      |
+-------------+--+
5 rows selected (0.04 seconds)
  • 查看分区
0: jdbc:hive2://CentOS:10000> show partitions t_employee;
+------------+--+
| partition  |
+------------+--+
| deptno=10  |
+------------+--+
1 row selected (0.065 seconds)
  • 查看建表语句
0: jdbc:hive2://CentOS:10000> show create table t_employee;
+-----------------------------------------------------------------+--+
|                         createtab_stmt                          |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `t_employee`(                             |
|   `id` int,                                                     |
|   `name` string,                                                |
|   `job` string,                                                 |
|   `manager` int,                                                |
|   `hiredate` timestamp,                                         |
|   `salary` decimal(7,2))                                        |
| PARTITIONED BY (                                                |
|   `deptno` int)                                                 |
| ROW FORMAT DELIMITED                                            |
|   FIELDS TERMINATED BY '\t'                                     |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   'hdfs://CentOS:9000/hive/t_employee'                          |
| TBLPROPERTIES (                                                 |
|   'transient_lastDdlTime'='1576961129')                         |
+-----------------------------------------------------------------+--+
19 rows selected (0.117 seconds)