A Super-Simple Sqoop Tutorial

1. Download

http://sqoop.apache.org/
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/

Note that for the sqoop-1.4.x.tar.gz packages, from 1.4.5 onward the lib directory no longer contains sqoop-1.4.x.jar, so here we download the prebuilt binary package sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz instead.

[root@node1 ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@node1 ~]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Move the extracted directory to /opt:

[root@node1 ~]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop-1.4.7
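
Optionally, make sqoop available on the PATH so it can be run without the full bin/sqoop path. This tutorial keeps invoking bin/sqoop from the install directory, so this step can be skipped; the sketch below assumes /etc/profile is sourced by your shell, as is typical on CentOS:

[root@node1 ~]# echo 'export SQOOP_HOME=/opt/sqoop-1.4.7' >> /etc/profile
[root@node1 ~]# echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> /etc/profile
[root@node1 ~]# source /etc/profile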

2. Configuration

Enter the installation directory:

[root@node1 ~]# cd /opt/sqoop-1.4.7
[root@node1 sqoop-1.4.7]# ls
bin        CHANGELOG.txt  conf  ivy      lib          NOTICE.txt   README.txt       sqoop-patch-review.py  src
build.xml  COMPILING.txt  docs  ivy.xml  LICENSE.txt  pom-old.xml  sqoop-1.4.7.jar  sqoop-test-1.4.7.jar   testdata
[root@node1 sqoop-1.4.7]#

Edit the launcher script bin/configure-sqoop and comment out the environment checks for components that are not installed here (HCatalog, Accumulo, ZooKeeper), so Sqoop does not print warnings on every run:

[root@node1 sqoop-1.4.7]# vi bin/configure-sqoop
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi
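
If HADOOP_HOME is not already exported on this machine, Sqoop can also be pointed at the Hadoop installation through conf/sqoop-env.sh. A minimal sketch, with the Hadoop path matching the HADOOP_MAPRED_HOME that shows up in the import log below; adjust to your own layout:

[root@node1 sqoop-1.4.7]# cp conf/sqoop-env-template.sh conf/sqoop-env.sh
[root@node1 sqoop-1.4.7]# vi conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/opt/hadoop-2.7.3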

3. Testing against MySQL

(1) Copy the JDBC driver
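
Sqoop needs the MySQL JDBC driver on its classpath. If the connector is not already on the machine, it can be downloaded and unpacked first; the URL below follows MySQL's usual archive pattern, so verify the exact link on dev.mysql.com:

[root@node1 ~]# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
[root@node1 ~]# tar -zxvf mysql-connector-java-5.1.46.tar.gz
[root@node1 ~]# cd mysql-connector-java-5.1.46

Then copy the jar into Sqoop's lib directory: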

[root@node1 mysql-connector-java-5.1.46]# cp mysql-connector-java-5.1.46-bin.jar /opt/sqoop-1.4.7/lib/

(2) Test the connection

[root@node1 sqoop-1.4.7]# bin/sqoop list-databases --connect jdbc:mysql://node3:3306/ --username root --password 123456
18/04/03 10:34:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/04/03 10:34:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/04/03 10:34:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
test
webMagic
[root@node1 sqoop-1.4.7]#
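
The WARN line about --password is worth heeding outside of a test environment: -P makes Sqoop prompt for the password interactively, and --password-file reads it from a protected file instead of exposing it in the process list. A sketch, with an illustrative HDFS path:

[root@node1 sqoop-1.4.7]# bin/sqoop list-databases --connect jdbc:mysql://node3:3306/ --username root -P

[root@node1 sqoop-1.4.7]# echo -n "123456" > mysql.pwd
[root@node1 sqoop-1.4.7]# hdfs dfs -put mysql.pwd /user/root/mysql.pwd
[root@node1 sqoop-1.4.7]# hdfs dfs -chmod 400 /user/root/mysql.pwd
[root@node1 sqoop-1.4.7]# bin/sqoop list-databases --connect jdbc:mysql://node3:3306/ --username root --password-file /user/root/mysql.pwd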

(3) Import a MySQL table into HDFS

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://node3:3306/webMagic --table csdnblog --username root --password 123456 -m 1
18/04/03 10:50:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/04/03 10:50:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/04/03 10:50:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/04/03 10:50:12 INFO tool.CodeGenTool: Beginning code generation
18/04/03 10:50:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `csdnblog` AS t LIMIT 1
18/04/03 10:50:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `csdnblog` AS t LIMIT 1
18/04/03 10:50:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/3e1f797b0f1cc2b27c58837b45ea81c1/csdnblog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/04/03 10:50:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/3e1f797b0f1cc2b27c58837b45ea81c1/csdnblog.jar
18/04/03 10:50:19 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/04/03 10:50:19 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/04/03 10:50:19 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/04/03 10:50:19 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/04/03 10:50:19 INFO mapreduce.ImportJobBase: Beginning import of csdnblog
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/03 10:50:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/04/03 10:50:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/04/03 10:50:36 INFO db.DBInputFormat: Using read commited transaction isolation
18/04/03 10:50:37 INFO mapreduce.JobSubmitter: number of splits:1
18/04/03 10:50:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522766299245_0001
18/04/03 10:50:41 INFO impl.YarnClientImpl: Submitted application application_1522766299245_0001
18/04/03 10:50:42 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1522766299245_0001/
18/04/03 10:50:42 INFO mapreduce.Job: Running job: job_1522766299245_0001
18/04/03 10:51:04 INFO mapreduce.Job: Job job_1522766299245_0001 running in uber mode : false
18/04/03 10:51:04 INFO mapreduce.Job:  map 0% reduce 0%
18/04/03 10:51:48 INFO mapreduce.Job:  map 100% reduce 0%
18/04/03 10:51:52 INFO mapreduce.Job: Job job_1522766299245_0001 completed successfully
18/04/03 10:51:52 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138180
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=30206
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=40637
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=40637
        Total vcore-milliseconds taken by all map tasks=40637
        Total megabyte-milliseconds taken by all map tasks=41612288
    Map-Reduce Framework
        Map input records=182
        Map output records=182
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=169
        CPU time spent (ms)=2190
        Physical memory (bytes) snapshot=140386304
        Virtual memory (bytes) snapshot=2152366080
        Total committed heap usage (bytes)=44564480
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=30206
18/04/03 10:51:52 INFO mapreduce.ImportJobBase: Transferred 29.498 KB in 89.0977 seconds (339.0213 bytes/sec)
18/04/03 10:51:52 INFO mapreduce.ImportJobBase: Retrieved 182 records.
[root@node1 sqoop-1.4.7]#
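
The -m 1 flag runs the import with a single map task, which is why the log reports "number of splits:1" and why a single part-m-00000 file appears below. To parallelize, raise -m and tell Sqoop which column to partition on via --split-by; the id column here is an assumption about the table schema:

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://node3:3306/webMagic \
    --table csdnblog --username root -P -m 4 --split-by id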

(4) Inspect the newly imported data on HDFS

[root@node1 sqoop-1.4.7]# hdfs dfs -ls /user/root
Found 3 items
drwxr-xr-x   - root supergroup          0 2017-09-22 10:31 /user/root/.sparkStaging
drwxr-xr-x   - root supergroup          0 2018-04-03 10:51 /user/root/csdnblog
drwxr-xr-x   - root supergroup          0 2017-10-05 09:40 /user/root/input
[root@node1 sqoop-1.4.7]# hdfs dfs -ls /user/root/csdnblog
Found 2 items
-rw-r--r--   3 root supergroup          0 2018-04-03 10:51 /user/root/csdnblog/_SUCCESS
-rw-r--r--   3 root supergroup      30206 2018-04-03 10:51 /user/root/csdnblog/part-m-00000
[root@node1 sqoop-1.4.7]#
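
By default Sqoop writes comma-separated text files under /user/<user>/<table>. The rows can be inspected directly, and options such as --target-dir, --delete-target-dir, and --fields-terminated-by control where and how the data lands. A sketch reusing this tutorial's table:

[root@node1 sqoop-1.4.7]# hdfs dfs -cat /user/root/csdnblog/part-m-00000 | head
[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://node3:3306/webMagic \
    --table csdnblog --username root -P -m 1 \
    --target-dir /user/root/csdnblog_tsv --delete-target-dir \
    --fields-terminated-by '\t'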

4. Possible errors

[root@bigdata03-test sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://10.17.12.160:3306/test --table users --username root --password 123456 -m 1
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/22 13:32:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/22 13:32:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/22 13:32:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/05/22 13:32:02 INFO tool.CodeGenTool: Beginning code generation
18/05/22 13:32:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
18/05/22 13:32:03 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3f197a46 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3f197a46 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
    at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2518)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1748)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2537)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2466)
    at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383)
    at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:2939)
    at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:576)
    at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:440)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:305)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
    at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
18/05/22 13:32:03 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1677)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

[root@bigdata03-test sqoop-1.4.7]#

The solution is to add the --driver parameter. As the WARN and fetchSize lines in the log below show, Sqoop then falls back to the generic JDBC connection manager (org.apache.sqoop.manager.GenericJdbcManager) with a fixed fetch size, avoiding the MySQL-specific streaming result set path:

[root@bigdata03-test sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://10.17.12.160:3306/test --table users --username root --password 123456 -m 1 --driver com.mysql.jdbc.Driver
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/22 13:35:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/22 13:35:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/22 13:35:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
18/05/22 13:35:48 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/22 13:35:48 INFO tool.CodeGenTool: Beginning code generation
18/05/22 13:35:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/054354d9b9af11ea0e4c96f467800c74/users.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/22 13:35:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/054354d9b9af11ea0e4c96f467800c74/users.jar
18/05/22 13:35:50 INFO mapreduce.ImportJobBase: Beginning import of users
18/05/22 13:35:50 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/22 13:35:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/22 13:35:51 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/22 13:35:56 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/22 13:35:56 INFO mapreduce.JobSubmitter: number of splits:1
18/05/22 13:35:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526097883376_0001
18/05/22 13:35:57 INFO impl.YarnClientImpl: Submitted application application_1526097883376_0001
18/05/22 13:35:57 INFO mapreduce.Job: The url to track the job: http://bigdata03-test:8088/proxy/application_1526097883376_0001/
18/05/22 13:35:57 INFO mapreduce.Job: Running job: job_1526097883376_0001
18/05/22 13:36:07 INFO mapreduce.Job: Job job_1526097883376_0001 running in uber mode : false
18/05/22 13:36:07 INFO mapreduce.Job:  map 0% reduce 0%
18/05/22 13:36:18 INFO mapreduce.Job:  map 100% reduce 0%
18/05/22 13:36:19 INFO mapreduce.Job: Job job_1526097883376_0001 completed successfully
18/05/22 13:36:19 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=142003
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=40
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8202
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8202
        Total vcore-milliseconds taken by all map tasks=8202
        Total megabyte-milliseconds taken by all map tasks=8398848
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=72
        CPU time spent (ms)=1510
        Physical memory (bytes) snapshot=164237312
        Virtual memory (bytes) snapshot=2122526720
        Total committed heap usage (bytes)=108527616
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=40
18/05/22 13:36:19 INFO mapreduce.ImportJobBase: Transferred 40 bytes in 28.4436 seconds (1.4063 bytes/sec)
18/05/22 13:36:19 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[root@bigdata03-test sqoop-1.4.7]#
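
This streaming-result-set error is commonly associated with older mysql-connector-java 5.1.x releases, so an alternative to --driver is replacing the connector jar under lib/ with a newer one; the 5.1.47 version and source path below are illustrative, not something verified in this environment:

[root@bigdata03-test sqoop-1.4.7]# rm lib/mysql-connector-java-*.jar
[root@bigdata03-test sqoop-1.4.7]# cp /path/to/mysql-connector-java-5.1.47-bin.jar lib/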