
Dynamic Partitioned Tables: Handling Strict Mode, and Exporting Data to the Local Filesystem or HDFS



hive (default)> drop database hive1 cascade;
OK
Time taken: 2.025 seconds
hive (default)> create database hive1;
OK
Time taken: 0.188 seconds

Create a partitioned table

hive (default)> create EXTERNAL table hive1.test1(id int,name string,age int) partitioned by (province string,city string) row format delimited fields TERMINATED by '\t' lines TERMINATED by '\n' stored as textfile;
OK
Time taken: 0.186 seconds

hive (default)> desc formatted hive1.test1;        // view the table's details
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
	 	 
id                  	int                 	                    
name                	string              	                    
age                 	int                 	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
province            	string              	                    
city                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	hive1               	 
Owner:              	hadoop              	 
CreateTime:         	Sat Jun 30 13:56:19 CST 2018	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 

Manually add a partition:

hive (default)> alter table hive1.test1 add partition(province='henan',city='zhengzhou');
OK
Time taken: 0.197 seconds
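
To check that the partition was actually registered in the metastore, SHOW PARTITIONS can be used; the line below is only added here for illustration and was not part of the original session:

-- Should list province=henan/city=zhengzhou
show partitions hive1.test1;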


Prepare the sample data

cat employees.txt 
1	tmo1	11	
2	tom2	12
3	tom3	13
cat employees2.txt 
4	tmo1	14
5	tom2	15
6	tom3	16

Upload the data to the table's partition directory

hive (default)> dfs -put /home/hadoop/employees.txt /user/hive/warehouse/hive1.db/test1/province=henan/city=zhengzhou;
hive (default)> dfs lsr /user/hive/warehouse/hive1.db/            // view the directory structure
              > ;
lsr: Unknown command
Did you mean -lsr?  This command begins with a dash.
Command lsr /user/hive/warehouse/hive1.db/ failed with exit code = -1
Query returned non-zero code: -1, cause: null
hive (default)> dfs -lsr /user/hive/warehouse/hive1.db/;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx   - hadoop supergroup          0 2018-06-30 13:59 /user/hive/warehouse/hive1.db/test1
drwxrwxrwx   - hadoop supergroup          0 2018-06-30 13:59 /user/hive/warehouse/hive1.db/test1/province=henan
drwxrwxrwx   - hadoop supergroup          0 2018-06-30 14:02 /user/hive/warehouse/hive1.db/test1/province=henan/city=zhengzhou
-rw-r--r--   3 hadoop supergroup         32 2018-06-30 14:02 /user/hive/warehouse/hive1.db/test1/province=henan/city=zhengzhou/employees.txt
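
Instead of placing the file with dfs -put, the same data could be loaded through Hive itself, which also takes care of the partition directory. A sketch, assuming the same local file and target partition:

-- LOAD DATA copies the local file into the partition's warehouse directory
load data local inpath '/home/hadoop/employees.txt' into table hive1.test1 partition(province='henan', city='zhengzhou');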
hive (default)> use hive1;
OK
Time taken: 0.028 seconds
hive (hive1)> show tables;
OK
tab_name
test1
Time taken: 0.044 seconds, Fetched: 1 row(s)
hive (hive1)> select * from test1;
OK
test1.id	test1.name	test1.age	test1.province	test1.city
1	tmo1	11	henan	zhengzhou
2	tom2	12	henan	zhengzhou
3	tom3	13	henan	zhengzhou
NULL	NULL	NULL	henan	zhengzhou
Time taken: 0.947 seconds, Fetched: 4 row(s)
hive (hive1)> select * from test2 where city='zhengzhou';
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'test2'
hive (hive1)> select * from test1 where city='zhengzhou';
OK
test1.id	test1.name	test1.age	test1.province	test1.city
1	tmo1	11	henan	zhengzhou
2	tom2	12	henan	zhengzhou
3	tom3	13	henan	zhengzhou
NULL	NULL	NULL	henan	zhengzhou
Time taken: 0.621 seconds, Fetched: 4 row(s)
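
Because province and city are partition columns, a filter on them prunes whole directories instead of scanning every file. This can be verified with EXPLAIN DEPENDENCY; the statement below is only added here for illustration:

-- Lists the tables and partitions the query will actually read
explain dependency select * from test1 where city='zhengzhou';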
hive (hive1)> create table test2(id int,name string,age int);
OK
Time taken: 0.117 seconds

Create table test2 with explicit field and line delimiters

hive (hive1)> create table test2(id int,name string,age int) row format delimited fields TERMINATED by '\t' lines TERMINATED by '\n';
OK
hive (hive1)> load data local inpath '/home/hadoop/shuju' into table test2;
Loading data to table hive1.test2
OK
Time taken: 0.275 seconds
hive (hive1)> select * from test2;
OK
test2.id	test2.name	test2.age
7	tmo1	17
8	tom2	18
Time taken: 0.084 seconds, Fetched: 2 row(s)


Add a partition
hive (hive1)> alter table test1 add partition(province='henan',city='shangqiu');
OK
Time taken: 0.12 seconds
hive (hive1)> insert overwrite table test1 partition(province='henan',city='shangqiu') select id,name,age from test2;        // insert data into the partitioned table
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630141009_7ce44be7-e8f7-4646-abf2-5836fcba59d2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0001, Tracking URL = http://master:8088/proxy/application_1530334968475_0001/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-30 14:10:22,868 Stage-1 map = 0%,  reduce = 0%
2018-06-30 14:10:30,644 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.51 sec
MapReduce Total cumulative CPU time: 1 seconds 510 msec
Ended Job = job_1530334968475_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/user/hive/warehouse/hive1.db/test1/province=henan/city=shangqiu/.hive-staging_hive_2018-06-30_14-10-09_319_3195589670637289657-1/-ext-10000
Loading data to table hive1.test1 partition (province=henan, city=shangqiu)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.51 sec   HDFS Read: 4002 HDFS Write: 116 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 510 msec
OK
id	name	age
Time taken: 24.046 seconds
hive (hive1)> select * from test1;        // query the data
OK
test1.id	test1.name	test1.age	test1.province	test1.city
7	tmo1	17	henan	shangqiu
8	tom2	18	henan	shangqiu
1	tmo1	11	henan	zhengzhou
2	tom2	12	henan	zhengzhou
3	tom3	13	henan	zhengzhou
NULL	NULL	NULL	henan	zhengzhou
Time taken: 0.109 seconds, Fetched: 6 row(s)

Dynamic partitioned tables

--------------------------------------------------------------

Create partitioned table test3

hive (hive1)> create EXTERNAL table hive1.test3(id int,name string,age int) partitioned by (province string,city string) row format delimited fields TERMINATED by '\t' lines TERMINATED by '\n' stored as textfile;
OK
Time taken: 0.071 seconds
hive (hive1)> insert into table test1 partition(province,city) select id,name,age, 'henan' as province,'kaifeng' as city from test2;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
hive (hive1)> set hive.exec.dynamic.partition.mode=nonstrict;        // turn off strict mode for dynamic partitioning
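
Disabling strict mode is not the only option: strict mode is satisfied as long as at least one partition column is given a static value. The statement below is a sketch (not run in the original session) that keeps province static and lets city be resolved dynamically:

-- Works under strict mode: one static partition column, one dynamic
insert into table test1 partition(province='henan', city)
select id, name, age, 'kaifeng' as city from test2;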
 
Insert using dynamic partitioning (here both partition values come from the SELECT)

hive (hive1)> insert into table test1 partition(province,city) select id,name,age, 'henan' as province,'kaifeng' as city from test2;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630141625_7604e89a-633d-46a7-b46c-a99dca4e5ce3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0002, Tracking URL = http://master:8088/proxy/application_1530334968475_0002/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-30 14:16:38,580 Stage-1 map = 0%,  reduce = 0%
2018-06-30 14:16:45,551 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.27 sec
MapReduce Total cumulative CPU time: 1 seconds 270 msec
Ended Job = job_1530334968475_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/user/hive/warehouse/hive1.db/test1/.hive-staging_hive_2018-06-30_14-16-25_833_3707515515591270176-1/-ext-10000
Loading data to table hive1.test1 partition (province=null, city=null)
Loaded : 1/1 partitions.
	 Time taken to load dynamic partitions: 0.278 seconds
	 Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.27 sec   HDFS Read: 4462 HDFS Write: 115 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 270 msec
OK
id	name	age	province	city
Time taken: 22.483 seconds
hive (hive1)> select * from test1;
OK
test1.id	test1.name	test1.age	test1.province	test1.city
7	tmo1	17	henan	kaifeng
8	tom2	18	henan	kaifeng
7	tmo1	17	henan	shangqiu
8	tom2	18	henan	shangqiu
1	tmo1	11	henan	zhengzhou
2	tom2	12	henan	zhengzhou
3	tom3	13	henan	zhengzhou
NULL	NULL	NULL	henan	zhengzhou
Time taken: 0.155 seconds, Fetched: 8 row(s)
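
For reference, these are the main settings that control dynamic partition inserts; the limit values shown are illustrative defaults, not values taken from the session above:

-- Master switch for dynamic partition inserts
set hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
-- Caps on how many partitions a single statement / a single node may create
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;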

Create a table on the fly from a query (CTAS) and write the query results into it

hive (hive1)> create table test4 as select id,name from test1 where province='henan' and city='shangqiu'; 
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630143212_1b601b4f-d67e-41d8-be41-3e7ba1524821
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0003, Tracking URL = http://master:8088/proxy/application_1530334968475_0003/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-30 14:32:20,758 Stage-1 map = 0%,  reduce = 0%
2018-06-30 14:32:29,396 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.33 sec
MapReduce Total cumulative CPU time: 1 seconds 330 msec
Ended Job = job_1530334968475_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/user/hive/warehouse/hive1.db/.hive-staging_hive_2018-06-30_14-32-12_161_5851638588673323325-1/-ext-10002
Moving data to directory hdfs://master:9000/user/hive/warehouse/hive1.db/test4
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.33 sec   HDFS Read: 3917 HDFS Write: 81 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 330 msec
OK
id	name
Time taken: 18.579 seconds
hive (hive1)> select * from test4;
OK
test4.id	test4.name
7	tmo1
8	tom2
Time taken: 0.086 seconds, Fetched: 2 row(s)
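
CTAS can also choose the storage format of the new table. The statement below is only a sketch added for illustration; the table name test4_orc and the ORC format are assumptions, not part of the original session:

-- CTAS with an explicit storage format (illustrative table name)
create table test4_orc stored as orc
as select id, name from test1 where province='henan' and city='shangqiu';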

Export Hive data to a local directory (download)
hive (hive1)> insert overwrite local directory '/home/hadoop/hive/' select * from test1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630144113_8db441cb-c72c-4976-ac90-acae880cd9cb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0005, Tracking URL = http://master:8088/proxy/application_1530334968475_0005/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-30 14:41:25,765 Stage-1 map = 0%,  reduce = 0%
2018-06-30 14:41:33,514 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.23 sec
MapReduce Total cumulative CPU time: 1 seconds 230 msec
Ended Job = job_1530334968475_0005
Moving data to local directory /home/hadoop/hive
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.23 sec   HDFS Read: 5085 HDFS Write: 201 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 230 msec
OK
test1.id	test1.name	test1.age	test1.province	test1.city
Time taken: 22.286 seconds
hive (hive1)> !cat /home/hadoop/hive/000000_0;
7tmo117henankaifeng
8tom218henankaifeng
7tmo117henanshangqiu
8tom218henanshangqiu
1tmo111henanzhengzhou
2tom212henanzhengzhou
3tom313henanzhengzhou
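
The fields above look glued together because an export without a ROW FORMAT clause uses Hive's default ^A (\001) field separator, which cat does not display. On Hive 0.11 and later the export statement can specify its own delimiter; the following is a sketch, with /home/hadoop/hive_tsv as an assumed output directory:

-- Export as tab-separated files so the result is readable with cat
insert overwrite local directory '/home/hadoop/hive_tsv'
row format delimited fields terminated by '\t'
select * from test1;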

Export Hive data to an HDFS directory

hive (hive1)> insert overwrite directory 'hdfs://master:9000/data/' select * from test1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630144453_85a3cfe9-caf5-4073-9a0d-5b5126ddfe79
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0006, Tracking URL = http://master:8088/proxy/application_1530334968475_0006/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-30 14:45:03,065 Stage-1 map = 0%,  reduce = 0%
2018-06-30 14:45:10,919 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.95 sec
MapReduce Total cumulative CPU time: 1 seconds 950 msec
Ended Job = job_1530334968475_0006
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/data/.hive-staging_hive_2018-06-30_14-44-53_487_5200227823836548262-1/-ext-10000
Moving data to directory hdfs://master:9000/data
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.95 sec   HDFS Read: 4901 HDFS Write: 201 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 950 msec
OK
test1.id	test1.name	test1.age	test1.province	test1.city
Time taken: 19.608 seconds
hive (hive1)> dfs -cat /data/000000_0;
7tmo117henankaifeng
8tom218henankaifeng
7tmo117henanshangqiu
8tom218henanshangqiu
1tmo111henanzhengzhou
2tom212henanzhengzhou
3tom313henanzhengzhou
\N\N\Nhenanzhengzhou
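
When both the data and the table definition need to be copied out, Hive's EXPORT command is another option. A sketch follows; the target path /tmp/test1_export is an assumption and must point to an empty HDFS directory:

-- Writes the table's data files plus a _metadata file describing the schema
export table test1 to '/tmp/test1_export';
-- The result can later be restored with IMPORT, e.g. on another cluster:
-- import table test1_copy from '/tmp/test1_export';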

Export Hive data to multiple directories in one pass (multi-insert)

hive (hive1)> from test1 t insert overwrite local directory '/home/hadoop/henan' select * where t.province='henan' insert overwrite local directory 'hpme/hadoop/beijing/' select * where t.province ='beijing';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630145601_5ee49ed3-8bf5-4eea-8c59-f4bf84b1a656
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0008, Tracking URL = http://master:8088/proxy/application_1530334968475_0008/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0008
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2018-06-30 14:56:10,066 Stage-2 map = 0%,  reduce = 0%
2018-06-30 14:56:18,620 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.42 sec
MapReduce Total cumulative CPU time: 2 seconds 420 msec
Ended Job = job_1530334968475_0008
Moving data to local directory /home/hadoop/henan
Moving data to local directory hpme/hadoop/beijing
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 2.42 sec   HDFS Read: 7184 HDFS Write: 257 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 420 msec
OK
t.id	t.name	t.age	t.province	t.city
Time taken: 18.808 seconds
hive (hive1)> from test1 t insert overwrite local directory '/home/hadoop/henan' select * where t.province='henan' insert overwrite local directory 'home/hadoop/beijing/' select * where t.province ='beijing';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180630145854_f1e87153-d1f4-411f-89f8-8cdbdae264f8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530334968475_0009, Tracking URL = http://master:8088/proxy/application_1530334968475_0009/
Kill Command = /usr/local/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1530334968475_0009
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2018-06-30 14:59:03,479 Stage-2 map = 0%,  reduce = 0%
2018-06-30 14:59:10,915 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_1530334968475_0009
Moving data to local directory /home/hadoop/henan
Moving data to local directory home/hadoop/beijing
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 1.53 sec   HDFS Read: 7184 HDFS Write: 257 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
t.id	t.name	t.age	t.province	t.city
Time taken: 17.917 seconds
hive (hive1)> !cat /home/hadoop/home/hadoop/beijing/000000_0 ;
7tmo117beijing*
8tom218beijing*
hive (hive1)> !cat /home/hadoop/henan/000000_0 
            > ;
7tmo117henankaifeng
8tom218henankaifeng
7tmo117henanshangqiu
8tom218henanshangqiu
1tmo111henanzhengzhou
2tom212henanzhengzhou
3tom313henanzhengzhou
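
In both runs above the second target was written without a leading slash ('hpme/hadoop/beijing/' and then 'home/hadoop/beijing/'), so Hive treated it as a path relative to the directory the CLI was started from, which is why the file ends up under /home/hadoop/home/hadoop/beijing. A corrected sketch with absolute paths (not part of the original session):

-- One scan of test1, two exports, both to absolute local directories
from test1 t
insert overwrite local directory '/home/hadoop/henan'   select * where t.province = 'henan'
insert overwrite local directory '/home/hadoop/beijing' select * where t.province = 'beijing';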



