hive部分:hive建表的三种方式(直接建表,查询建表,like建表)
程序员文章站
2022-07-14 15:34:09
...
hive 创建表的三种方式
官网地址:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
准备数据为:
175.42.93.145 [25/Sep/2013:00:10:08 +0800] "GET /mapreduce/hadoop-rumen-introduction HTTP/1.1" 301 427
61.135.216.104 [25/Sep/2013:00:10:10 +0800] "GET /search-engine/thrift-framework-intro/feed/ HTTP/1.1" 304 160
175.42.93.145 [25/Sep/2013:00:10:11 +0800] "GET /mapreduce/hadoop-rumen-introduction HTTP/1.1" 301 427
175.42.93.145 [25/Sep/2013:00:10:12 +0800] "GET /mapreduce/hadoop-rumen-introduction/ HTTP/1.1" 200 20875
1、第一种方式
CREATE TABLE IF NOT EXISTS default.weblog(
ip string ,
time string ,
req_url string ,
status string ,
size string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
hive> CREATE TABLE IF NOT EXISTS default.weblog(
> ip string ,
> time string ,
> req_url string ,
> status string ,
> size string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 33.812 seconds
加载数据:
hive> load data local inpath '/opt/hive-0.13.1/weblog.txt' into table default.weblog ;
Copying data from file:/opt/hive-0.13.1/weblog.txt
Copying file: file:/opt/hive-0.13.1/weblog.txt
Loading data to table default.weblog
Table default.weblog stats: [numFiles=1, numRows=0, totalSize=425, rawDataSize=0]
OK
Time taken: 8.374 seconds
查看数据:
hive> select * from default.weblog;
OK
175.42.93.145 [25/Sep/2013:00:10:08 +0800] "GET /mapreduce/hadoop-rumen-introduction HTTP/1.1" 301 427
61.135.216.104 [25/Sep/2013:00:10:10 +0800] "GET /search-engine/thrift-framework-intro/feed/ HTTP/1.1" 304 160
175.42.93.145 [25/Sep/2013:00:10:11 +0800] "GET /mapreduce/hadoop-rumen-introduction HTTP/1.1" 301 427
175.42.93.145 [25/Sep/2013:00:10:12 +0800] "GET /mapreduce/hadoop-rumen-introduction/ HTTP/1.1" 200 20875
Time taken: 2.6 seconds, Fetched: 4 row(s)
2、第二种方式
CREATE TABLE default.weblog_comm
AS select ip, time, req_url from default.weblog;
hive> CREATE TABLE default.weblog_comm
> AS select ip, time, req_url from default.weblog;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0005, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-10-18 08:54:38,271 Stage-1 map = 0%, reduce = 0%
2015-10-18 08:55:39,255 Stage-1 map = 0%, reduce = 0%
2015-10-18 08:56:22,875 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.87 sec
MapReduce Total cumulative CPU time: 5 seconds 870 msec
Ended Job = job_1445142802171_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster/tmp/hive-root/hive_2015-10-18_08-52-50_940_3759961352595656019-1/-ext-10001
Moving data to: hdfs://cluster/user/hive/warehouse/weblog_comm
Table default.weblog_comm stats: [numFiles=1, numRows=0, totalSize=389, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 6.46 sec HDFS Read: 631 HDFS Write: 473 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 460 msec
OK
Time taken: 232.372 seconds
查看表结构:
hive> desc default.weblog_comm;
OK
ip string
time string
req_url string
Time taken: 0.56 seconds, Fetched: 3 row(s)
3.第三种方式:
CREATE TABLE IF NOT EXISTS default.weblog_20150923
LIKE default.weblog ;
hive> CREATE TABLE IF NOT EXISTS default.weblog_20150923
> LIKE default.weblog ;
OK
Time taken: 0.585 seconds
查看表数据
hive> select * from weblog_20150923;
OK
Time taken: 0.441 seconds
hive> desc weblog_20150923;
OK
ip string
time string
req_url string
status string
size string
Time taken: 0.563 seconds, Fetched: 5 row(s)
上一篇: sql server 数据控制 权限
下一篇: ROS安装出现依赖包问题解决方法参考