欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

hive部分:hive建表的三种方式(直接建表,查询建表,like建表)

程序员文章站 2022-07-14 15:34:09
...

hive 创建表的三种方式

官网地址:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
 
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

准备数据为:

175.42.93.145	[25/Sep/2013:00:10:08 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
61.135.216.104	[25/Sep/2013:00:10:10 +0800]	"GET /search-engine/thrift-framework-intro/feed/ HTTP/1.1"	304	160
175.42.93.145	[25/Sep/2013:00:10:11 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
175.42.93.145	[25/Sep/2013:00:10:12 +0800]	"GET /mapreduce/hadoop-rumen-introduction/ HTTP/1.1"	200	20875

1、第一种方式

CREATE TABLE IF NOT EXISTS default.weblog(
ip string ,
time string ,
req_url string ,
status string ,
size string 
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
hive> CREATE TABLE IF NOT EXISTS default.weblog(
    > ip string ,
    > time string ,
    > req_url string ,
    > status string ,
    > size string 
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 33.812 seconds

加载数据:

hive> load data local inpath '/opt/hive-0.13.1/weblog.txt' into table default.weblog ;
Copying data from file:/opt/hive-0.13.1/weblog.txt
Copying file: file:/opt/hive-0.13.1/weblog.txt
Loading data to table default.weblog
Table default.weblog stats: [numFiles=1, numRows=0, totalSize=425, rawDataSize=0]
OK
Time taken: 8.374 seconds

查看数据:

hive> select * from default.weblog;
OK
175.42.93.145	[25/Sep/2013:00:10:08 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
61.135.216.104	[25/Sep/2013:00:10:10 +0800]	"GET /search-engine/thrift-framework-intro/feed/ HTTP/1.1"	304  160
175.42.93.145	[25/Sep/2013:00:10:11 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
175.42.93.145	[25/Sep/2013:00:10:12 +0800]	"GET /mapreduce/hadoop-rumen-introduction/ HTTP/1.1"	200	20875
Time taken: 2.6 seconds, Fetched: 4 row(s)

2、第二种方式

CREATE TABLE default.weblog_comm
AS select ip, time, req_url from default.weblog; 

 

hive> CREATE TABLE default.weblog_comm
    > AS select ip, time, req_url from default.weblog; 
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0005, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1445142802171_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-10-18 08:54:38,271 Stage-1 map = 0%,  reduce = 0%
2015-10-18 08:55:39,255 Stage-1 map = 0%,  reduce = 0%
2015-10-18 08:56:22,875 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.87 sec
MapReduce Total cumulative CPU time: 5 seconds 870 msec
Ended Job = job_1445142802171_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster/tmp/hive-root/hive_2015-10-18_08-52-50_940_3759961352595656019-1/-ext-10001
Moving data to: hdfs://cluster/user/hive/warehouse/weblog_comm
Table default.weblog_comm stats: [numFiles=1, numRows=0, totalSize=389, rawDataSize=0]
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 6.46 sec   HDFS Read: 631 HDFS Write: 473 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 460 msec
OK
Time taken: 232.372 seconds
查看表结构:
hive> desc default.weblog_comm;
OK
ip                  	string              	                    
time                	string              	                    
req_url             	string              	                    
Time taken: 0.56 seconds, Fetched: 3 row(s)

3.第三种方式:

CREATE TABLE IF NOT EXISTS default.weblog_20150923
    LIKE default.weblog ;
hive> CREATE TABLE IF NOT EXISTS default.weblog_20150923
    > LIKE default.weblog ;
OK
Time taken: 0.585 seconds

查看表数据

hive> select * from weblog_20150923;
OK
Time taken: 0.441 seconds
hive> desc weblog_20150923;
OK
ip                  	string              	                    
time                	string              	                    
req_url             	string              	                    
status              	string              	                    
size                	string              	                    
Time taken: 0.563 seconds, Fetched: 5 row(s)