超简单的Sqoop入门教程
程序员文章站
2022-04-19 18:38:06
...
1、下载
http://sqoop.apache.org/
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
对于sqoop-1.4.x.tar.gz软件包,从1.4.5开始以后版本安装包的lib下就已经没有了sqoop-1.4.x.jar啦,我们这里需要下载sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz。
[root@node1 ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@node1 ~]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
将解压后的目录移动到/opt下
[root@node1 ~]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop-1.4.7
2、配置
进入目录
[aaa@qq.com ~]# cd sqoop-1.4.7
[aaa@qq.com sqoop-1.4.7]# ls
bin CHANGELOG.txt conf ivy lib NOTICE.txt README.txt sqoop-patch-review.py src
build.xml COMPILING.txt docs ivy.xml LICENSE.txt pom-old.xml sqoop-1.4.7.jar sqoop-test-1.4.7.jar testdata
[aaa@qq.com sqoop-1.4.7]#
编译配置文件
[root@node1 sqoop-1.4.7]# vi bin/configure-sqoop
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
# echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#f
3、测试MySQL数据库
(1)复制驱动程序
[root@node1 mysql-connector-java-5.1.46]# cp mysql-connector-java-5.1.46-bin.jar /opt/sqoop-1.4.7/lib/
(2)测试连接
[aaa@qq.com sqoop-1.4.7]# bin/sqoop list-databases --connect jdbc:mysql://node3:3306/ --username root --password 123456
18/04/03 10:34:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/04/03 10:34:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/04/03 10:34:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
test
webMagic
[aaa@qq.com sqoop-1.4.7]#
(3)将MySQL中数据表导入HDFS
[aaa@qq.com sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://node3:3306/webMagic --table csdnblog --username root --password 123456 -m 1
18/04/03 10:50:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/04/03 10:50:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/04/03 10:50:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/04/03 10:50:12 INFO tool.CodeGenTool: Beginning code generation
18/04/03 10:50:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `csdnblog` AS t LIMIT 1
18/04/03 10:50:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `csdnblog` AS t LIMIT 1
18/04/03 10:50:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/3e1f797b0f1cc2b27c58837b45ea81c1/csdnblog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/04/03 10:50:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/3e1f797b0f1cc2b27c58837b45ea81c1/csdnblog.jar
18/04/03 10:50:19 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/04/03 10:50:19 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/04/03 10:50:19 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/04/03 10:50:19 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/04/03 10:50:19 INFO mapreduce.ImportJobBase: Beginning import of csdnblog
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/03 10:50:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/04/03 10:50:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/04/03 10:50:36 INFO db.DBInputFormat: Using read commited transaction isolation
18/04/03 10:50:37 INFO mapreduce.JobSubmitter: number of splits:1
18/04/03 10:50:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522766299245_0001
18/04/03 10:50:41 INFO impl.YarnClientImpl: Submitted application application_1522766299245_0001
18/04/03 10:50:42 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1522766299245_0001/
18/04/03 10:50:42 INFO mapreduce.Job: Running job: job_1522766299245_0001
18/04/03 10:51:04 INFO mapreduce.Job: Job job_1522766299245_0001 running in uber mode : false
18/04/03 10:51:04 INFO mapreduce.Job: map 0% reduce 0%
18/04/03 10:51:48 INFO mapreduce.Job: map 100% reduce 0%
18/04/03 10:51:52 INFO mapreduce.Job: Job job_1522766299245_0001 completed successfully
18/04/03 10:51:52 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=138180
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=30206
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=40637
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=40637
Total vcore-milliseconds taken by all map tasks=40637
Total megabyte-milliseconds taken by all map tasks=41612288
Map-Reduce Framework
Map input records=182
Map output records=182
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=169
CPU time spent (ms)=2190
Physical memory (bytes) snapshot=140386304
Virtual memory (bytes) snapshot=2152366080
Total committed heap usage (bytes)=44564480
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=30206
18/04/03 10:51:52 INFO mapreduce.ImportJobBase: Transferred 29.498 KB in 89.0977 seconds (339.0213 bytes/sec)
18/04/03 10:51:52 INFO mapreduce.ImportJobBase: Retrieved 182 records.
[aaa@qq.com sqoop-1.4.7]#
(4)查看HDFS上新导入数据
[root@node1 sqoop-1.4.7]# hdfs dfs -ls /user/root
Found 3 items
drwxr-xr-x - root supergroup 0 2017-09-22 10:31 /user/root/.sparkStaging
drwxr-xr-x - root supergroup 0 2018-04-03 10:51 /user/root/csdnblog
drwxr-xr-x - root supergroup 0 2017-10-05 09:40 /user/root/input
[root@node1 sqoop-1.4.7]# hdfs dfs -ls /user/root/csdnblog
Found 2 items
-rw-r--r-- 3 root supergroup 0 2018-04-03 10:51 /user/root/csdnblog/_SUCCESS
-rw-r--r-- 3 root supergroup 30206 2018-04-03 10:51 /user/root/csdnblog/part-m-00000
[root@node1 sqoop-1.4.7]#
4、可能遇到的错误
[aaa@qq.com sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://10.17.12.160:3306/test --table users --username root --password 123456 -m 1
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/22 13:32:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/22 13:32:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/22 13:32:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/05/22 13:32:02 INFO tool.CodeGenTool: Beginning code generation
18/05/22 13:32:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
18/05/22 13:32:03 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3f197a46 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3f197a46 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2518)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1748)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2537)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2466)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383)
at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:2939)
at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:576)
at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:440)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:305)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
18/05/22 13:32:03 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1677)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[aaa@qq.com sqoop-1.4.7]#
解决办法,加上驱动参数
[aaa@qq.com sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://10.17.12.160:3306/test --table users --username root --password 123456 -m 1 --driver com.mysql.jdbc.Driver
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/22 13:35:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/22 13:35:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/22 13:35:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
18/05/22 13:35:48 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/22 13:35:48 INFO tool.CodeGenTool: Beginning code generation
18/05/22 13:35:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/054354d9b9af11ea0e4c96f467800c74/users.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/22 13:35:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/054354d9b9af11ea0e4c96f467800c74/users.jar
18/05/22 13:35:50 INFO mapreduce.ImportJobBase: Beginning import of users
18/05/22 13:35:50 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/22 13:35:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/22 13:35:51 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/22 13:35:56 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/22 13:35:56 INFO mapreduce.JobSubmitter: number of splits:1
18/05/22 13:35:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526097883376_0001
18/05/22 13:35:57 INFO impl.YarnClientImpl: Submitted application application_1526097883376_0001
18/05/22 13:35:57 INFO mapreduce.Job: The url to track the job: http://bigdata03-test:8088/proxy/application_1526097883376_0001/
18/05/22 13:35:57 INFO mapreduce.Job: Running job: job_1526097883376_0001
18/05/22 13:36:07 INFO mapreduce.Job: Job job_1526097883376_0001 running in uber mode : false
18/05/22 13:36:07 INFO mapreduce.Job: map 0% reduce 0%
18/05/22 13:36:18 INFO mapreduce.Job: map 100% reduce 0%
18/05/22 13:36:19 INFO mapreduce.Job: Job job_1526097883376_0001 completed successfully
18/05/22 13:36:19 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=142003
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=40
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=8202
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8202
Total vcore-milliseconds taken by all map tasks=8202
Total megabyte-milliseconds taken by all map tasks=8398848
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=72
CPU time spent (ms)=1510
Physical memory (bytes) snapshot=164237312
Virtual memory (bytes) snapshot=2122526720
Total committed heap usage (bytes)=108527616
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=40
18/05/22 13:36:19 INFO mapreduce.ImportJobBase: Transferred 40 bytes in 28.4436 seconds (1.4063 bytes/sec)
18/05/22 13:36:19 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[aaa@qq.com sqoop-1.4.7]#
上一篇: 现在我把眼光降低了