【Hive】Apache Hive系列之Hive中表的操作

程序员文章站 2022-07-14 14:44:42

...

指路牌

Hive支持 原始数据类型和 复杂类型。
原始类型包括 数值型、 Boolean、 字符串、 时间戳等。
复杂类型包括数组、 Map集合、 struct等。
下面是Hive数据类型的一个总结：

原始类型

类型	描述	字面量示例
BOOLEAN	布尔值，可选值true/false	true
TINYINT	1字节的有符号整数 -128~127	1Y
SMALLINT	2个字节的有符号整数，-32768~32767	1S
INT	4个字节的带符号整数	1
BIGINT	8字节带符号整数	1L
FLOAT	4字节单精度浮点数1.0
DOUBLE	8字节双精度浮点数	1.0
DEICIMAL	任意精度的带符号小数	1.0
STRING	字符串，变长	“Abc”
VARCHAR	变长字符串	“Aac”
CHAR	固定长度字符串	“a”,’b’
BINARY	字节数组
TIMESTAMP	时间戳，纳秒精度	122327493795
DATE	日期	‘2020-01-01

复杂类型

类型	描述	字面量示例
ARRAY	有序的的同类型的集合	array(1,2)
MAP	key-value,key必须为原始类型，value可以任意类型	map(‘a’,1,’b’,2)
STRUCT	字段集合,类型可以不同	struct(‘1’,1,1.0), named_stract(‘col1’,’1’,’col2’,1,’clo3’,1.0)

案例测试

①创建库

0: jdbc:hive2://CentOS:10000> create database test;
No rows affected (0.324 seconds)
0: jdbc:hive2://CentOS:10000> desc database test;

②使用库

0: jdbc:hive2://CentOS:10000> use test;
No rows affected (0.092 seconds)

# 查看当前使用的是哪个数据库
0: jdbc:hive2://CentOS:10000> select current_database();
+---------+--+
|   _c0   |
+---------+--+
| test    |
+---------+--+
1 row selected (0.877 seconds)

③创建表

create table if not exists employee (
  id int,
  name varchar(32),
  age int,
  salary double,
  birthDay date,
  hobbies array<string>,
  address struct<street:string,country:string>,
  detail map<string,double>
);

④查看所有表

0: jdbc:hive2://CentOS:10000> show tables;
+-------------+--+
|  tab_name   |
+-------------+--+
|  employee   |
+-------------+--+
1 row selected (0.105 seconds)

⑤查看表的建表详情

0: jdbc:hive2://CentOS:10000> desc employee;
+-----------+---------------------------------------+----------+--+
| col_name  |               data_type               | comment  |
+-----------+---------------------------------------+----------+--+
| id        | int                                   |          |
| name      | varchar(32)                           |          |
| age       | int                                   |          |
| salary    | double                                |          |
| birthday  | date                                  |          |
| hobbies   | array<string>                         |          |
| address   | struct<street:string,country:string>  |          |
| detail    | map<string,double>                    |          |
+-----------+---------------------------------------+----------+--+
8 rows selected (0.318 seconds)

如果用户需要看到更多建表信息可以使用desc formatted 表名查看

数据格式

默认分隔符

在创建完表之后，需要准备数据，其中hive默认的字段分割符号如下,因此上述建表等价写法如下：

分隔符	描述
\n	对于文本文件而言，一行表示一条文本记录，因此可以使用\n表示
^A(Ctrl+A)	用于分割字符的列，在创建表的时候可以使用八进制’\001’表示
^B(Ctrl+B)	用于分割ARRAY、STRUCT或者MAP中的元素，在创建表的时候可以使用八进制’\002’表示
^C(Ctrl+C)	用于分割MAP中的key,value，在创建表的时候可以使用八进制’\003’表示

create table if not exists employee_1 (
  id int,
  name varchar(32),
  age int,
  salary double,
  birthDay date,
  hobbies array<string>,
  address struct<street:string,country:string>,
  detail map<string,double>
)
row format delimited
fields terminated by '\001'
collection items terminated by '\002'
map keys terminated by '\003'
lines terminated by '\n'
stored as textfile;

按照如下数据格式输入数据，然后倒入到指定的表中:

1^Azhangsan^A18^A5000^A2020-10-10^ATV^BPlayGame^Bbasketball^Abeijing^BChina^A001^Cjianshe^B002^Cjiaotong

加载以上数据至hive
load data local inpath '本地路径' overwrite into table 表名

local-表示本地读取省略则表示从hdfs读取

overwrite - 表示覆盖表目录下的数据，如果去掉则保留原有的数据文件

0: jdbc:hive2://CentOS:10000> load data local inpath '/root/employee' overwrite into table employee_1;
INFO  : Loading data to table baizhi.employee_1 from file:/root/employee
INFO  : Table test.employee_1 stats: [numFiles=1, numRows=0, totalSize=175, rawDataSize=0]
No rows affected (0.622 seconds)

自定义分割符号

1,zhangsan,true,18,15000,TV|Game,001>建设|002>招商,china|bj 
2,lisi,true,28,15000,TV|Game,001>建设|002>招商,china|bj 
3,wangwu,false,38,5000,TV|Game,001>建设|002>招商,china|sh

create table if not exists user (
  id int,
  name varchar(32),
  sex boolean,
  age int,
  salary double,
  hobbies array<string>,
  card map<string,string>,
  address struct<street:string,country:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;

正则格式数据

正则格式检测网址：regex101.com

192.168.0.1 qq com.xx.xx.XxxService#xx 2018-10-10 10:10:00
192.168.2.1 qq com.xx.xx.XxxService#xx 2018-10-10 10:10:00
192.168.0.1 xx com.xx.xx.XxxService#xx 2018-10-10 10:10:00
192.168.202.1 qq com.xx.xx.XxxService#xx 2018-10-10 10:10:00

create table if not exists access(
     ip string,
     app varchar(32),
     service string,
     last_time timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex"="^(.*)\\s(.*)\\s(.*)\\s(.*\\s.*)"
);

CSV格式文件

1,apple,3,20.5
2,orange,2,21.5
3,pear,2,22.5
4,banana,1,23.0

CREATE TABLE if not exists product(
  id int,
  item string,
  count int,
  price double
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "escapeChar"    = "\\"
);

JSON格式文件

{"id":1,"name":"zhangsan","sex":true,"register":"1991-02-08","score":100.0}
{"id":2,"name":"lisi","sex":true,"register":"1991-02-08","score":80.0}

create table if not exists student1(
   id int,
   name varchar(32),
   sex boolean,
   register date,
   score double
)ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

需要添加jar文件 add jar /usr/apache-hive-1.2.2-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.2.jar，删除的时候使用delete指令.

上一篇： clojure 初试

下一篇： Hive的安装-Hive配置

【Hive】Apache Hive系列之Hive中表的操作

指路牌

原始类型

复杂类型

案例测试

数据格式

默认分隔符

自定义分割符号

正则格式数据

CSV格式文件

JSON格式文件

Hadoop学习(7)-hive的安装和命令行使用和java操作

Hive中导入Amazon S3中的分区表数据的操作

hive中将string数据转为bigint的操作

Hive的DDL操作

hive系列-1.hive里踩的坑

【Hive】Apache Hive系列之Hive中表的操作

Hive的数据类型和常用操作

Hadoop学习(7)-hive的安装和命令行使用和java操作

Hive的DML操作

hive 学习系列六 hive 去重办法的思考