
Advanced Hive: Importing Data into Hive


Importing data with the LOAD statement

Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

For example:

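A minimal sketch, assuming a local file /home/hadoop/emp.csv and an existing table emp (both names are illustrative):

LOAD DATA LOCAL INPATH '/home/hadoop/emp.csv' OVERWRITE INTO TABLE emp;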


Note: if you do not specify a delimiter when creating the table, Hive uses its default field separator (the non-printing control character '\001'). If the file you load is comma-separated, the fields will not be split and every column comes back as NULL.

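The fix is to declare a delimiter that matches the file when creating the table; a sketch (table and columns are illustrative):

CREATE TABLE emp (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';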

Importing all files under a directory


Note: omitting LOCAL means the path is read from HDFS; in that case the files are moved (not copied) into the table's directory.
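A sketch of both variants (the directory paths are assumptions):

-- load every file under a local directory
LOAD DATA LOCAL INPATH '/home/hadoop/data/' INTO TABLE emp;

-- no LOCAL: load (move) the files from an HDFS directory
LOAD DATA INPATH '/input/emp/' INTO TABLE emp;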

Importing data into a partition

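A sketch, assuming emp is partitioned by a string column dt:

LOAD DATA LOCAL INPATH '/home/hadoop/emp.csv'
INTO TABLE emp PARTITION (dt='2017-06-27');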


Importing data from a relational database with Sqoop

Download:
http://sqoop.apache.org/
For installation, see the Sqoop installation post.

Importing MySQL data into HDFS

Note: sqoop is executed from the command line, not inside the Hive CLI. I kept running it inside Hive at first, and it kept failing with this error:
hive> sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by',';
NoViableAltException(aaa@qq.com[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:999)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'sqoop' 'import' '<EOF>'
It should actually be run from the shell, like this:

zj-db0236deMacBook-Pro:sbin zj-db0236$ sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table trade_detail --hive-import --hive-overwrite -m 1 --hive-table trade_detail --fields-terminated-by ','
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/27 15:25:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/27 15:25:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/27 15:25:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/27 15:25:35 INFO tool.CodeGenTool: Beginning code generation
17/06/27 15:25:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:25:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:25:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/zj-db0236/Downloads/hadoop-2.7.2
Note: /tmp/sqoop-zj-db0236/compile/da5649c40aae421516a4a7b09474d590/trade_detail.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/27 15:25:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zj-db0236/compile/da5649c40aae421516a4a7b09474d590/trade_detail.jar
17/06/27 15:25:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/06/27 15:25:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/06/27 15:25:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/06/27 15:25:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/06/27 15:25:36 INFO mapreduce.ImportJobBase: Beginning import of trade_detail
17/06/27 15:26:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/27 15:26:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/27 15:26:08 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 15:26:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/27 15:26:10 INFO db.DBInputFormat: Using read commited transaction isolation
17/06/27 15:26:10 INFO mapreduce.JobSubmitter: number of splits:1
17/06/27 15:26:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498547617140_0002
17/06/27 15:26:10 INFO impl.YarnClientImpl: Submitted application application_1498547617140_0002
17/06/27 15:26:10 INFO mapreduce.Job: The url to track the job: http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498547617140_0002/
17/06/27 15:26:10 INFO mapreduce.Job: Running job: job_1498547617140_0002
17/06/27 15:26:48 INFO mapreduce.Job: Job job_1498547617140_0002 running in uber mode : false
17/06/27 15:26:49 INFO mapreduce.Job:  map 0% reduce 0%
17/06/27 15:27:24 INFO mapreduce.Job:  map 100% reduce 0%
17/06/27 15:27:24 INFO mapreduce.Job: Job job_1498547617140_0002 completed successfully
17/06/27 15:27:24 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=137758
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=119
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=33155
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=33155
		Total vcore-milliseconds taken by all map tasks=33155
		Total megabyte-milliseconds taken by all map tasks=33950720
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=41
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=149422080
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=119
17/06/27 15:27:24 INFO mapreduce.ImportJobBase: Transferred 119 bytes in 76.2361 seconds (1.5609 bytes/sec)
17/06/27 15:27:24 INFO mapreduce.ImportJobBase: Retrieved 5 records.
17/06/27 15:27:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:27:24 INFO hive.HiveImport: Loading uploaded data into Hive
17/06/27 15:27:26 INFO hive.HiveImport: 
17/06/27 15:27:26 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/Users/zj-db0236/Downloads/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/06/27 15:28:00 INFO hive.HiveImport: OK
17/06/27 15:28:00 INFO hive.HiveImport: Time taken: 0.679 seconds
17/06/27 15:28:00 INFO hive.HiveImport: Loading data to table default.trade_detail
17/06/27 15:28:01 INFO hive.HiveImport: rmr: DEPRECATED: Please use 'rm -r' instead.
17/06/27 15:28:01 INFO hive.HiveImport: Deleted hdfs://localhost:9000/user/hive/warehouse/trade_detail
17/06/27 15:28:01 INFO hive.HiveImport: Table default.trade_detail stats: [numFiles=2, numRows=0, totalSize=119, rawDataSize=0]
17/06/27 15:28:01 INFO hive.HiveImport: OK
17/06/27 15:28:01 INFO hive.HiveImport: Time taken: 0.456 seconds
17/06/27 15:28:01 INFO hive.HiveImport: Hive import complete.

Note: -m 1 tells Sqoop to run the import with a single map task. Without it, Sqoop defaults to 4 parallel mappers, and each mapper writes its own part file under the target directory, so a small table ends up split across several files.
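For instance, the warehouse directory from the import above would contain one part file per mapper (a sketch; this listing is not taken from a real run):

hdfs dfs -ls /user/hive/warehouse/trade_detail
# with the default of 4 mappers: part-m-00000 ... part-m-00003
# with -m 1: a single part-m-00000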
Sqoop import options

--append  Append data to a dataset that already exists in HDFS. Sqoop first imports into a temporary directory and then renames the files into the target directory, so they don't clash with files already there.
--as-avrodatafile  Import the data as Avro data files.
--as-sequencefile  Import the data as SequenceFiles.
--as-textfile  Import the data as plain text files; once written, they can be queried from Hive with SQL.
--boundary-query <statement>  Boundary query: run a SQL query before the import and import only the rows in its result set, e.g. --boundary-query 'select id,no from t where id = 3' imports the records with id=3; or use select min(<split-by>), max(<split-by>) from <table name>. The queried columns must not be string-typed, or the import fails.
--columns <col,col>  The columns to import, e.g. --columns id,username
--direct  Direct mode: use the database's own bulk import/export tool. The docs say this is faster.
--direct-split-size  On top of --direct, split the imported stream into chunks of the given number of bytes; especially when importing from PostgreSQL in direct mode, a file that reaches the configured size is split into several separate files.
--inline-lob-limit  Maximum size for inline large-object (LOB) columns.
-m,--num-mappers  Launch N map tasks to import in parallel; the default is 4. It's best not to set it higher than the number of nodes in the cluster.
--query,-e <sql>  Import the result set of a query. Must be used with --target-dir and --hive-table, and the query must have a WHERE clause that includes \$CONDITIONS, e.g. --query 'select * from t where \$CONDITIONS' --target-dir /tmp/t --hive-table t
--split-by <column>  The table column used to split work units, usually the primary key.
--table <table-name>  The source table in the relational database.
--delete-target-dir  Delete the target directory if it already exists.
--target-dir <dir>  The target HDFS directory.
--warehouse-dir <dir>  Cannot be used together with --target-dir; sets the parent directory for the imported data. Suited to plain HDFS imports, not imports into a Hive directory.
--where  Filter condition applied when importing from the relational database, e.g. --where "id = 2"
-z,--compress  Compression flag. Data is uncompressed by default; with this flag it is compressed with gzip. Applies to SequenceFile, text, and Avro files.
--compression-codec  The Hadoop compression codec to use; the default is gzip.
--null-string <null-string>  Optional; if not specified, the string "null" is written for null string columns.
--null-non-string <null-string>  Optional; if not specified, the string "null" is written for null non-string columns.
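A sketch combining several of these options (the database, table, and column names are assumptions; -P prompts for the password, as the warning in the log above suggests):

sqoop import --connect jdbc:mysql://localhost:3306/test --username root -P \
  --table trade_detail --columns id,account,income \
  --where "id > 100" \
  --delete-target-dir --target-dir /tmp/trade_detail_subset \
  -m 1 --as-textfile --fields-terminated-by ','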
Exporting Hive data to MySQL
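Note that sqoop export does not create the target table: hiveToMysql must already exist in MySQL, with columns matching the exported fields. A sketch of the DDL (the column list is an assumption):

CREATE TABLE hiveToMysql (
  id INT,
  account VARCHAR(50),
  income DOUBLE,
  expenses DOUBLE
);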

sqoop export --connect "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --table hiveToMysql --password 123456 --export-dir /user/hive/warehouse/trade_detail/ --fields-terminated-by ','


Result:

Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/27 17:17:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/27 17:17:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/27 17:17:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/27 17:17:07 INFO tool.CodeGenTool: Beginning code generation
17/06/27 17:17:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hiveToMysql` AS t LIMIT 1
17/06/27 17:17:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hiveToMysql` AS t LIMIT 1
17/06/27 17:17:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/zj-db0236/Downloads/hadoop-2.7.2
Note: /tmp/sqoop-zj-db0236/compile/2f26ed69134261e462cebf51c09deff7/hiveToMysql.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/27 17:17:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zj-db0236/compile/2f26ed69134261e462cebf51c09deff7/hiveToMysql.jar
17/06/27 17:17:10 INFO mapreduce.ExportJobBase: Beginning export of hiveToMysql
17/06/27 17:17:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 17:17:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/27 17:17:43 INFO input.FileInputFormat: Total input paths to process : 1
17/06/27 17:17:43 INFO input.FileInputFormat: Total input paths to process : 1
17/06/27 17:17:43 INFO mapreduce.JobSubmitter: number of splits:4
17/06/27 17:17:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/06/27 17:17:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498547617140_0003
17/06/27 17:17:44 INFO impl.YarnClientImpl: Submitted application application_1498547617140_0003
17/06/27 17:17:44 INFO mapreduce.Job: The url to track the job: http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498547617140_0003/
17/06/27 17:17:44 INFO mapreduce.Job: Running job: job_1498547617140_0003
17/06/27 17:18:22 INFO mapreduce.Job: Job job_1498547617140_0003 running in uber mode : false
17/06/27 17:18:22 INFO mapreduce.Job:  map 0% reduce 0%
17/06/27 17:19:03 INFO mapreduce.Job:  map 75% reduce 0%
17/06/27 17:19:04 INFO mapreduce.Job:  map 100% reduce 0%
17/06/27 17:19:04 INFO mapreduce.Job: Job job_1498547617140_0003 completed successfully
17/06/27 17:19:04 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=549964
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1009
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=19
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=152967
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=152967
		Total vcore-milliseconds taken by all map tasks=152967
		Total megabyte-milliseconds taken by all map tasks=156638208
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=676
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=206
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=577241088
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
17/06/27 17:19:04 INFO mapreduce.ExportJobBase: Transferred 1,009 bytes in 82.6365 seconds (12.2101 bytes/sec)
17/06/27 17:19:04 INFO mapreduce.ExportJobBase: Exported 5 records.
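Once the export completes, the result can be checked from the MySQL client (a usage sketch; the count should match the 'Exported 5 records' line above):

SELECT COUNT(*) FROM hiveToMysql;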