Advanced Hive: Importing Data into Hive

程序员文章站 2022-06-15 10:16:58

Importing data with the LOAD statement

Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
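
As a rough illustration of the syntax above, the sketch below writes a three-row comma-separated sample file and a small HiveQL script that loads it from the local file system. The table `demo_student`, its columns, and the `/tmp/hive_load_demo` directory are assumptions made up for this example; they do not come from the original article. The script calls `hive` only if the CLI is found on the PATH.

```shell
#!/bin/sh
# Hypothetical working directory and table name, chosen only for this sketch.
demo_dir=/tmp/hive_load_demo
mkdir -p "$demo_dir"

# Three comma-separated sample rows: id,name,age
printf '1,Tom,23\n2,Mary,20\n3,Mike,24\n' > "$demo_dir/student.csv"

# HiveQL: create a comma-delimited table, then load the local file into it.
cat > "$demo_dir/load_local.hql" <<EOF
CREATE TABLE IF NOT EXISTS demo_student (id INT, name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '$demo_dir/student.csv'
OVERWRITE INTO TABLE demo_student;
EOF

# Run the script only when the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$demo_dir/load_local.hql"
else
  echo "hive not found; run later with: hive -f $demo_dir/load_local.hql"
fi
```

OVERWRITE replaces whatever the table already holds; leave it out to append instead.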

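Note: if no field delimiter was specified when the table was created, Hive uses its default delimiter, Ctrl-A (\001), not a comma or a tab. If the file you load is separated by ',', each line is treated as a single field and the columns come back as NULL.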
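
The effect of a delimiter mismatch can be reproduced outside Hive. The sketch below is only an analogy using `awk`, not Hive itself: it counts how many fields one sample line yields when split on Hive's default text delimiter Ctrl-A (\001) versus on a comma. The sample line is made up for the example.

```shell
#!/bin/sh
line='1,Tom,23'

# Hive's default field delimiter for text tables is Ctrl-A (octal 001).
ctrl_a=$(printf '\001')

# Splitting on Ctrl-A finds no delimiter, so the whole line is a single field.
default_fields=$(printf '%s\n' "$line" | awk -v FS="$ctrl_a" '{ print NF }')

# Splitting on a comma recovers the three intended columns.
comma_fields=$(printf '%s\n' "$line" | awk -F',' '{ print NF }')

echo "Ctrl-A split: $default_fields field(s)"
echo "comma split:  $comma_fields field(s)"
```

With only one field found, Hive cannot parse `1,Tom,23` as the first column's type and has no data left for the remaining columns, which is why they show up as NULL. Declaring `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` when creating the table avoids this.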

Importing all files in a directory

Note: without the LOCAL keyword, the path is read from HDFS rather than from the local file system.
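
A minimal sketch of both variants follows: loading every file in a local directory, and loading from an HDFS path by dropping LOCAL. The table `demo_student`, the directory `/tmp/hive_load_demo/data`, and the HDFS path `/input/student.csv` are all hypothetical names used only for illustration. The script just generates and prints the HiveQL; run it with `hive -f` on a machine where Hive is installed.

```shell
#!/bin/sh
# Hypothetical paths and table name, for illustration only.
demo_dir=/tmp/hive_load_demo
mkdir -p "$demo_dir/data"

# Two small comma-separated files in the same local directory.
printf '1,Tom,23\n2,Mary,20\n' > "$demo_dir/data/part1.csv"
printf '3,Mike,24\n'           > "$demo_dir/data/part2.csv"

cat > "$demo_dir/load_dir.hql" <<EOF
CREATE TABLE IF NOT EXISTS demo_student (id INT, name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- With LOCAL: the path is on the local file system and every file in the directory is loaded.
LOAD DATA LOCAL INPATH '$demo_dir/data/' OVERWRITE INTO TABLE demo_student;

-- Without LOCAL: the path is resolved on HDFS (assumed path), and the file is moved into the table location.
LOAD DATA INPATH '/input/student.csv' INTO TABLE demo_student;
EOF

cat "$demo_dir/load_dir.hql"
```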

Loading data into a partition
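
A partition load might look like the sketch below. The partitioned table `demo_student_part`, its `gender` partition column, and the sample files are assumptions invented for this example. The script only writes and prints the HiveQL; run it with `hive -f` where Hive is available.

```shell
#!/bin/sh
# Hypothetical working directory, table, and partition column.
demo_dir=/tmp/hive_load_demo
mkdir -p "$demo_dir"

# One sample file per partition value; the files hold only id,name.
printf '1,Tom\n2,Mike\n' > "$demo_dir/male.csv"
printf '3,Mary\n'        > "$demo_dir/female.csv"

cat > "$demo_dir/load_partition.hql" <<EOF
CREATE TABLE IF NOT EXISTS demo_student_part (id INT, name STRING)
PARTITIONED BY (gender STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- The partition value comes from the PARTITION clause, not from the file contents.
LOAD DATA LOCAL INPATH '$demo_dir/male.csv'
INTO TABLE demo_student_part PARTITION (gender='M');

LOAD DATA LOCAL INPATH '$demo_dir/female.csv'
INTO TABLE demo_student_part PARTITION (gender='F');
EOF

cat "$demo_dir/load_partition.hql"
```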

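Importing data from a relational database with Sqoop

Download:

http://sqoop.apache.org/

For installation, see the separate Sqoop installation article.

Importing MySQL data into HDFS

Note that sqoop is run from the command line, not from inside hive. I kept running it at the hive prompt and kept getting this error: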

hive> sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by',';
NoViableAltException(aaa@qq.com[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:999)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'sqoop' 'import' '<EOF>'

It should actually be run from the shell, like this:

zj-db0236deMacBook-Pro:sbin zj-db0236$ sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table trade_detail --hive-import --hive-overwrite -m 1 --hive-table trade_detail --fields-terminated-by ','
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/27 15:25:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/27 15:25:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/27 15:25:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/27 15:25:35 INFO tool.CodeGenTool: Beginning code generation
17/06/27 15:25:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:25:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:25:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/zj-db0236/Downloads/hadoop-2.7.2
Note: /tmp/sqoop-zj-db0236/compile/da5649c40aae421516a4a7b09474d590/trade_detail.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/27 15:25:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zj-db0236/compile/da5649c40aae421516a4a7b09474d590/trade_detail.jar
17/06/27 15:25:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/06/27 15:25:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/06/27 15:25:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/06/27 15:25:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/06/27 15:25:36 INFO mapreduce.ImportJobBase: Beginning import of trade_detail
17/06/27 15:26:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/27 15:26:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/27 15:26:08 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 15:26:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/27 15:26:10 INFO db.DBInputFormat: Using read commited transaction isolation
17/06/27 15:26:10 INFO mapreduce.JobSubmitter: number of splits:1
17/06/27 15:26:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498547617140_0002
17/06/27 15:26:10 INFO impl.YarnClientImpl: Submitted application application_1498547617140_0002
17/06/27 15:26:10 INFO mapreduce.Job: The url to track the job: http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498547617140_0002/
17/06/27 15:26:10 INFO mapreduce.Job: Running job: job_1498547617140_0002
17/06/27 15:26:48 INFO mapreduce.Job: Job job_1498547617140_0002 running in uber mode : false
17/06/27 15:26:49 INFO mapreduce.Job:  map 0% reduce 0%
17/06/27 15:27:24 INFO mapreduce.Job:  map 100% reduce 0%
17/06/27 15:27:24 INFO mapreduce.Job: Job job_1498547617140_0002 completed successfully
17/06/27 15:27:24 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=137758
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=119
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=33155
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=33155
		Total vcore-milliseconds taken by all map tasks=33155
		Total megabyte-milliseconds taken by all map tasks=33950720
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=41
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=149422080
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=119
17/06/27 15:27:24 INFO mapreduce.ImportJobBase: Transferred 119 bytes in 76.2361 seconds (1.5609 bytes/sec)
17/06/27 15:27:24 INFO mapreduce.ImportJobBase: Retrieved 5 records.
17/06/27 15:27:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:27:24 INFO hive.HiveImport: Loading uploaded data into Hive
17/06/27 15:27:26 INFO hive.HiveImport: 
17/06/27 15:27:26 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/Users/zj-db0236/Downloads/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/06/27 15:28:00 INFO hive.HiveImport: OK
17/06/27 15:28:00 INFO hive.HiveImport: Time taken: 0.679 seconds
17/06/27 15:28:00 INFO hive.HiveImport: Loading data to table default.trade_detail
17/06/27 15:28:01 INFO hive.HiveImport: rmr: DEPRECATED: Please use 'rm -r' instead.
17/06/27 15:28:01 INFO hive.HiveImport: Deleted hdfs://localhost:9000/user/hive/warehouse/trade_detail
17/06/27 15:28:01 INFO hive.HiveImport: Table default.trade_detail stats: [numFiles=2, numRows=0, totalSize=119, rawDataSize=0]
17/06/27 15:28:01 INFO hive.HiveImport: OK
17/06/27 15:28:01 INFO hive.HiveImport: Time taken: 0.456 seconds
17/06/27 15:28:01 INFO hive.HiveImport: Hive import complete.

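Note: `-m 1` tells Sqoop to run a single map task. Without it, Sqoop uses its default of 4 map tasks, each of which writes its own output file, so a small table like this one (5 rows) ends up scattered across several tiny files, which wastes resources.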
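
To make the difference concrete, the sketch below prints (but does not execute) the import command for one mapper and for four. The connection details are copied from the example above; the split column `id` is an assumption, since the article does not show the schema of `trade_detail`, and `-P` is used in place of `--password` so the password is prompted for, as the Sqoop warning in the log suggests.

```shell
#!/bin/sh
# Connection details copied from the article's example; adjust for your setup.
jdbc_url='jdbc:mysql://localhost:3306/test'
table='trade_detail'

# Print a sqoop import command for a given number of map tasks.
# With more than one mapper, --split-by names the column used to divide the work
# ("id" is an assumed column name; the article does not show the table schema).
build_import_cmd() {
  mappers=$1
  cmd="sqoop import --connect $jdbc_url --username root -P --table $table"
  cmd="$cmd --hive-import --hive-overwrite --hive-table $table"
  cmd="$cmd --fields-terminated-by ',' -m $mappers"
  if [ "$mappers" -gt 1 ]; then
    cmd="$cmd --split-by id"
  fi
  printf '%s\n' "$cmd"
}

single=$(build_import_cmd 1)
parallel=$(build_import_cmd 4)

echo "$single"
echo "$parallel"
```

For a table of a few rows a single mapper is the simpler choice; parallel mappers pay off only when the table is large enough for the split to matter.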

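Sqoop import parameters

`--append`	Appends data to a dataset that already exists in HDFS. With this option, Sqoop first imports the data into a temporary directory and then renames the files into the target directory, so that they do not clash with file names already present there.
`--as-avrodatafile`	Imports data to Avro data files.
`--as-sequencefile`	Imports data to SequenceFiles.
`--as-textfile`	Imports data as plain text files (the default); once written, the data can be queried from Hive with SQL.
`--boundary-query <statement>`	Boundary query used to create splits. It should return the minimum and maximum of the split column in a single row; by default Sqoop runs `select min(<split-by>), max(<split-by>) from <table name>`, and this option replaces that query with your own, e.g. `--boundary-query 'select min(id), max(id) from t'`. The returned values should not be of a string type, otherwise an error is reported.
`--columns <col,col,...>`	Columns to import, e.g. `--columns id,username`.
`--direct`	Direct import mode, which uses the relational database's own import/export tool. According to the official site, importing this way is faster.
`--direct-split-size`	On top of direct mode, splits the imported stream by number of bytes. In particular, when importing from PostgreSQL in direct mode, a file that reaches the given size is split into several separate files.
`--inline-lob-limit`	Sets the maximum size of an inline large object (LOB).
`-m,--num-mappers`	Number of map tasks (N) used to import in parallel; the default is 4. Preferably do not set it higher than the number of nodes in the cluster.
`--query, -e <sql>`	Imports the result of a query. `--target-dir` must be given (plus `--hive-table` when importing into Hive), and the query must have a WHERE clause containing `$CONDITIONS`. Inside single quotes write `$CONDITIONS`; inside double quotes escape it as `\$CONDITIONS`. Example: `--query 'select * from t where $CONDITIONS' --target-dir /tmp/t --hive-table t`
`--split-by <column>`	Column used to split the work units; usually the primary key ID.
`--table <table-name>`	Name of the relational database table to read from.
`--delete-target-dir`	Deletes the target directory if it exists.
`--target-dir <dir>`	HDFS destination directory.
`--warehouse-dir <dir>`	Cannot be used together with `--target-dir`. Specifies the parent directory for imported data; suited to imports into HDFS, not to imports into the Hive directory.
`--where`	Condition applied when importing from the relational database, e.g. `--where "id = 2"`.
`-z,--compress`	Enables compression. Data is uncompressed by default; with this option it is compressed with gzip. Works for SequenceFile, text, and Avro files.
`--compression-codec`	Hadoop compression codec to use; the default is gzip.
`--null-string <null-string>`	Optional. String written for a null value in string columns; if not specified, the string "null" is used.
`--null-non-string <null-string>`	Optional. String written for a null value in non-string columns; if not specified, the string "null" is used.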
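
The quoting of `$CONDITIONS` in `--query` is a common stumbling block, so here is a small sketch of how the shell treats it. The table `t` and the directory `/tmp/t` come from the parameter description above; the connection string is the one used elsewhere in this article. The final command is only printed, not executed.

```shell
#!/bin/sh
# Single quotes: the shell leaves $CONDITIONS untouched, so no backslash is needed.
single_quoted='select * from t where $CONDITIONS'

# Double quotes: escape the dollar sign, otherwise the shell would expand
# $CONDITIONS (normally to an empty string) before Sqoop ever sees it.
double_quoted="select * from t where \$CONDITIONS"

# Both forms hand Sqoop exactly the same text.
echo "$single_quoted"
echo "$double_quoted"

# Dry run: print the resulting command instead of executing it.
# -m 1 is used so that no --split-by column is required for the free-form query.
echo sqoop import --connect jdbc:mysql://localhost:3306/test --username root -P \
  --query "'$single_quoted'" --target-dir /tmp/t -m 1
```

Sqoop replaces the `$CONDITIONS` token with a range condition for each map task, which is why it has to reach Sqoop literally.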

Exporting Hive data to MySQL

sqoop export --connect "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --table hiveToMysql --password 123456 --export-dir /user/hive/warehouse/trade_detail/ --fields-terminated-by ','

Result:

Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/27 17:17:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/27 17:17:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/27 17:17:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/27 17:17:07 INFO tool.CodeGenTool: Beginning code generation
17/06/27 17:17:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hiveToMysql` AS t LIMIT 1
17/06/27 17:17:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hiveToMysql` AS t LIMIT 1
17/06/27 17:17:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/zj-db0236/Downloads/hadoop-2.7.2
Note: /tmp/sqoop-zj-db0236/compile/2f26ed69134261e462cebf51c09deff7/hiveToMysql.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/27 17:17:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zj-db0236/compile/2f26ed69134261e462cebf51c09deff7/hiveToMysql.jar
17/06/27 17:17:10 INFO mapreduce.ExportJobBase: Beginning export of hiveToMysql
17/06/27 17:17:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 17:17:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/27 17:17:43 INFO input.FileInputFormat: Total input paths to process : 1
17/06/27 17:17:43 INFO input.FileInputFormat: Total input paths to process : 1
17/06/27 17:17:43 INFO mapreduce.JobSubmitter: number of splits:4
17/06/27 17:17:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/06/27 17:17:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498547617140_0003
17/06/27 17:17:44 INFO impl.YarnClientImpl: Submitted application application_1498547617140_0003
17/06/27 17:17:44 INFO mapreduce.Job: The url to track the job: http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498547617140_0003/
17/06/27 17:17:44 INFO mapreduce.Job: Running job: job_1498547617140_0003
17/06/27 17:18:22 INFO mapreduce.Job: Job job_1498547617140_0003 running in uber mode : false
17/06/27 17:18:22 INFO mapreduce.Job:  map 0% reduce 0%
17/06/27 17:19:03 INFO mapreduce.Job:  map 75% reduce 0%
17/06/27 17:19:04 INFO mapreduce.Job:  map 100% reduce 0%
17/06/27 17:19:04 INFO mapreduce.Job: Job job_1498547617140_0003 completed successfully
17/06/27 17:19:04 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=549964
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1009
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=19
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=152967
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=152967
		Total vcore-milliseconds taken by all map tasks=152967
		Total megabyte-milliseconds taken by all map tasks=156638208
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=676
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=206
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=577241088
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
17/06/27 17:19:04 INFO mapreduce.ExportJobBase: Transferred 1,009 bytes in 82.6365 seconds (12.2101 bytes/sec)
17/06/27 17:19:04 INFO mapreduce.ExportJobBase: Exported 5 records.
