Hive常规操作(DDL、DML)
程序员文章站
2022-07-13 14:12:27
...
DDL
DataBase
1.查看数据库
0: jdbc:hive2://Hbase:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| test |
+----------------+--+
2 rows selected (2.96 seconds)
2.使用数据库
0: jdbc:hive2://Hbase:10000> use test;
No rows affected (0.127 seconds)
3.新建数据库
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name --DATABASE|SCHEMA 是等价的
[COMMENT database_comment] --数据库注释
[LOCATION hdfs_path] --存储在 HDFS 上的位置
[WITH DBPROPERTIES (property_name=property_value, ...)]; --指定额外属性
0: jdbc:hive2://Hbase:10000> create database if not exists hive_test comment "this id test" WITH DBPROPERTIES("create"="tangc");
No rows affected (0.406 seconds)
4.查看数据库信息
DESC DATABASE [EXTENDED] db_name; --EXTENDED 表示是否显示额外属性
0: jdbc:hive2://Hbase:10000> desc database hive_test;
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
| hive_test | this id test | hdfs://Hbase:9000/user/hive/warehouse/hive_test.db | root | USER | |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.388 seconds)
使用extended
0: jdbc:hive2://Hbase:10000> desc database extended hive_test;
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
| hive_test | this id test | hdfs://Hbase:9000/user/hive/warehouse/hive_test.db | root | USER | {create=tangc} |
+------------+---------------+-----------------------------------------------------+-------------+-------------+-----------------+--+
1 row selected (0.093 seconds)
5.删除数据库
RESTRICT:默认行为,如果存在数据库中表,则删除失败。
CASCADE:级联删除。
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
0: jdbc:hive2://Hbase:10000> drop database if exists hive_test;
No rows affected (0.371 seconds)
6.查看当前数据库
0: jdbc:hive2://Hbase:10000> select current_database();
+-------+--+
| _c0 |
+-------+--+
| test |
+-------+--+
1 row selected (0.61 seconds)
Table
创建表
- 管理表
是内部表,也称为MANAGED_TABLE;默认存储在/user/hive/warehouse下,也可以通过location指定;删除表时,会删除表数据以及元数据;
create table if not exists t_user(
id int,
name string,
sex boolean,
age int,
salary double,
hobbies array<string>,
card map<string,string>,
address struct<country:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;
- 外部表
外部表称之为EXTERNAL_TABLE;在创建表时可以自己指定目录位置(LOCATION);删除表时,只会删除元数据不会删除表数据;
create external table if not exists t_access(
ip string,
app varchar(32),
service string,
last_time timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex"="^(.*)\\s(.*)\\s(.*)\\s(.*\\s.*)"
)
LOCATION '/hive/t_access';
- 分区表
Hive中的表对应为HDFS上的指定目录,在查询数据时候,默认会对全表进行扫描,这样时间和性能的消耗都非常大。分区为HDFS上表目录的子目录,数据按照分区存储在子目录中。如果查询的where字句的中包含分区条件,则直接从该分区去查找,而不是扫描整个表目录,合理的分区设计可以极大提高查询速度和性能。这里说明一下分区表并非Hive独有的概念,实际上这个概念非常常见。比如在我们常用的Oracle数据库中,当表中的数据量不断增大,查询数据的速度就会下降,这时也可以对表进行分区。表进行分区后,逻辑上表仍然是一张完整的表,只是将表中的数据存放到多个表空间(物理文件上),这样查询数据时,就不必要每次都扫描整张表,从而提升查询性能。在Hive中可以使用PARTITIONED BY子句创建分区表。表可以包含一个或多个分区列,程序会为分区列中的每个不同值组合创建单独的数据目录。
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800.00
7499 ALLEN SALESMAN 7698 1981-02-20 00:00:00 1600.00 300.00
7521 WARD SALESMAN 7698 1981-02-22 00:00:00 1250.00 500.00
7566 JONES MANAGER 7839 1981-04-02 00:00:00 2975.00
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250.00 1400.00
7698 BLAKE MANAGER 7839 1981-05-01 00:00:00 2850.00
7782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450.00
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 1500.00
7839 KING PRESIDENT 1981-11-17 00:00:00 5000.00
7844 TURNER SALESMAN 7698 1981-09-08 00:00:00 1500.00 0.00
7876 ADAMS CLERK 7788 1987-05-23 00:00:00 1100.00
7900 JAMES CLERK 7698 1981-12-03 00:00:00 950.00
7902 FORD ANALYST 7566 1981-12-03 00:00:00 3000.00
7934 MILLER CLERK 7782 1982-01-23 00:00:00 1300.00
此表中有7个字段,包含deptno部门。
CREATE EXTERNAL TABLE t_employee(
id INT,
name STRING,
job STRING,
manager INT,
hiredate TIMESTAMP,
salary DECIMAL(7,2)
)
PARTITIONED BY (deptno INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/hive/t_employee';
0: jdbc:hive2://Hbase:10000> select id ,name,job,manager,salary,deptno from t_employee;
+-------+---------+------------+----------+---------+---------+--+
| id | name | job | manager | salary | deptno |
+-------+---------+------------+----------+---------+---------+--+
| 7369 | SMITH | CLERK | 7902 | 800 | 10 |
| 7499 | ALLEN | SALESMAN | 7698 | 1600 | 10 |
| 7521 | WARD | SALESMAN | 7698 | 1250 | 10 |
| 7566 | JONES | MANAGER | 7839 | 2975 | 10 |
| 7654 | MARTIN | SALESMAN | 7698 | 1250 | 10 |
| 7698 | BLAKE | MANAGER | 7839 | 2850 | 10 |
| 7782 | CLARK | MANAGER | 7839 | 2450 | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1500 | 10 |
| 7839 | KING | PRESIDENT | NULL | 5000 | 10 |
| 7844 | TURNER | SALESMAN | 7698 | 1500 | 10 |
| 7876 | ADAMS | CLERK | 7788 | 1100 | 10 |
| 7900 | JAMES | CLERK | 7698 | 950 | 10 |
| 7902 | FORD | ANALYST | 7566 | 3000 | 10 |
| 7934 | MILLER | CLERK | 7782 | 1300 | 10 |
+-------+---------+------------+----------+---------+---------+--+
14 rows selected (0.57 seconds)
- 分桶表
分区表是为了将文件按照分区文件夹进行粗粒度文件隔离,但是分桶表是将数据按照某个字段进行hash计算出所属的桶,然后在对桶内的数据进行排序。
CREATE EXTERNAL TABLE t_employee_bucket(
id INT,
name STRING,
job STRING,
manager INT,
hiredate TIMESTAMP,
salary DECIMAL(7,2),
deptno INT)
CLUSTERED BY(id) SORTED BY(salary ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/hive/employee_bucket';
set hive.enforce.bucketing = true; //让hive强制使用分桶
//将部门id为10号的数据插入到t_employee_bucket表中
INSERT INTO TABLE t_employee_bucket SELECT * FROM t_employee WHERE deptno=10;
0: jdbc:hive2://Hbase:10000> select id,name,deptno from t_employee_bucket;
+-------+---------+---------+--+
| id | name | deptno |
+-------+---------+---------+--+
| 7788 | SCOTT | 10 |
| 7900 | JAMES | 10 |
| 7876 | ADAMS | 10 |
| 7844 | TURNER | 10 |
| 7369 | SMITH | 10 |
| 7521 | WARD | 10 |
| 7934 | MILLER | 10 |
| 7902 | FORD | 10 |
| 7698 | BLAKE | 10 |
| 7654 | MARTIN | 10 |
| 7566 | JONES | 10 |
| 7782 | CLARK | 10 |
| 7499 | ALLEN | 10 |
| 7839 | KING | 10 |
+-------+---------+---------+--+
14 rows selected (0.313 seconds)
- 临时表
临时表仅对当前session可见,临时表的数据将存储在用户的暂存目录中,并在会话结束后删除。如果临时表与永久表表名相同,则对该表名的任何引用都将解析为临时表,而不是永久表。临时表还具有以下两个限制:不支持分区列;不支持创建索引.
CREATE TEMPORARY TABLE if not exists emp_temp(
id INT,
name STRING,
job STRING,
manager INT,
hiredate TIMESTAMP,
salary DECIMAL(7,2),
deptno INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/hive/emp_temp';
- CTAS创建表
创建的表结构和查询的信息类似,但是并不会拷贝表的结构
0: jdbc:hive2://CentOS:10000> create TEMPORARY TABLE t_employee_copy1 as select * from t_employee;
- 复制表结构
仅仅是复制表的结构,但是不拷贝数据。
0: jdbc:hive2://CentOS:10000> CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS t_empoyee_copy2 LIKE t_employee_bucket location '/hive/t_empoyee_copy2';
0: jdbc:hive2://Hbase:10000> select id,name,deptno from t_empoyee_copy2;
+-----+-------+---------+--+
| id | name | deptno |
+-----+-------+---------+--+
+-----+-------+---------+--+
No rows selected (0.14 seconds)
0: jdbc:hive2://Hbase:10000> desc t_empoyee_copy2;
+-----------+---------------+----------+--+
| col_name | data_type | comment |
+-----------+---------------+----------+--+
| id | int | |
| name | string | |
| job | string | |
| manager | int | |
| hiredate | timestamp | |
| salary | decimal(7,2) | |
| deptno | int | |
+-----------+---------------+----------+--+
7 rows selected (0.106 seconds)
修改表
- 重命名表
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_user RENAME TO t_u;
0: jdbc:hive2://Hbase:10000> show tables;
+--------------------+--+
| tab_name |
+--------------------+--+
| t_access |
| t_employee |
| t_employee_bucket |
| t_employee_copy1 |
| t_empoyee_copy2 |
| t_u |
+--------------------+--+
6 rows selected (0.076 seconds)
- 修改列(修改类型、顺序、新增)
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE id eid INT;--修改列名&类型
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee CHANGE eid id decimal(7,2) AFTER name;--修改顺序
0: jdbc:hive2://CentOS:10000> ALTER TABLE t_employee ADD COLUMNS (address STRING);
清空表
0: jdbc:hive2://CentOS:10000> truncate table t_employee partition(deptno=10);
只能截断managed-table
删除
0: jdbc:hive2://CentOS:10000> drop table t_employee PURGE;
PURGE表示数据会直接删除,不会放置在垃圾箱中
0: jdbc:hive2://Hbase:10000> drop table t_empoyee_copy2 ;
No rows affected (0.024 seconds)
0: jdbc:hive2://Hbase:10000> show tables;
+--------------------+--+
| tab_name |
+--------------------+--+
| t_access |
| t_employee |
| t_employee_bucket |
| t_employee_copy1 |
| t_u |
+--------------------+--+
5 rows selected (0.098 seconds)
删除分区
0: jdbc:hive2://Hbase:10000> show partitions t_employee;
+------------+--+
| partition |
+------------+--+
| deptno=10 |
| deptno=20 |
+------------+--+
2 rows selected (0.241 seconds)
0: jdbc:hive2://Hbase:10000> alter table t_employee drop partition(deptno=10);
INFO : Dropped the partition deptno=10
No rows affected (0.538 seconds)
0: jdbc:hive2://Hbase:10000> show partitions t_employee;
+------------+--+
| partition |
+------------+--+
| deptno=20 |
+------------+--+
1 row selected (0.174 seconds)
其他命令
Describe
- 查看数据库
DESCRIBE|DESC DATABASE [EXTENDED] db_name;
0: jdbc:hive2://CentOS:10000> desc database hive_test;
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
| hive_test | database for test | hdfs://CentOS:9000/user/hive/warehouse/hive_test.db | root | USER | |
+------------+--------------------+------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.039 seconds)
- 查看表
DESCRIBE|DESC [EXTENDED|FORMATTED] table_name
0: jdbc:hive2://CentOS:10000> desc t_user;
+-----------+-------------------------------------+----------+--+
| col_name | data_type | comment |
+-----------+-------------------------------------+----------+--+
| id | int | |
| name | string | |
| sex | boolean | |
| age | int | |
| salary | double | |
| hobbies | array<string> | |
| card | map<string,string> | |
| address | struct<country:string,city:string> | |
+-----------+-------------------------------------+----------+--+
8 rows selected (0.06 seconds)
show
- 查看数据库列表
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
0: jdbc:hive2://CentOS:10000> show schemas like '*'
0: jdbc:hive2://CentOS:10000> ;
+----------------+--+
| database_name |
+----------------+--+
| default |
| hive_test |
| test |
+----------------+--+
3 rows selected (0.03 seconds)
- 查看表的列表
SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
0: jdbc:hive2://CentOS:10000> show tables;
+--------------------+--+
| tab_name |
+--------------------+--+
| t_access |
| t_employ_copy |
| t_employee_bucket |
| t_employee_copy1 |
| t_user |
+--------------------+--+
5 rows selected (0.054 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test;
+-------------+--+
| tab_name |
+-------------+--+
| t_access |
| t_employee |
| t_product |
| t_student |
| t_user |
+-------------+--+
5 rows selected (0.043 seconds)
0: jdbc:hive2://CentOS:10000> show tables in test like 't_*';
+-------------+--+
| tab_name |
+-------------+--+
| t_access |
| t_employee |
| t_product |
| t_student |
| t_user |
+-------------+--+
5 rows selected (0.04 seconds)
- 查看分区
0: jdbc:hive2://CentOS:10000> show partitions t_employee;
+------------+--+
| partition |
+------------+--+
| deptno=10 |
+------------+--+
1 row selected (0.065 seconds)
- 查看建表语句
0: jdbc:hive2://CentOS:10000> show create table t_employee;
+-----------------------------------------------------------------+--+
| createtab_stmt |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `t_employee`( |
| `id` int, |
| `name` string, |
| `job` string, |
| `manager` int, |
| `hiredate` timestamp, |
| `salary` decimal(7,2)) |
| PARTITIONED BY ( |
| `deptno` int) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '\t' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://CentOS:9000/hive/t_employee' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1576961129') |
+-----------------------------------------------------------------+--+
19 rows selected (0.117 seconds)
上一篇: hive:动态分区与静态分区