欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Kettle与Hadoop(七)执行Sqoop作业

程序员文章站 2024-02-22 20:13:34
...

目录

一、Sqoop export

二、Sqoop import


一、Sqoop export

1. 建立一个作业,将HDFS文件导出到MySQL数据库。
(1)打开PDI,新建一个作业,如图1所示。

Kettle与Hadoop(七)执行Sqoop作业
图1

 

(2)编辑'Sqoop export'作业项,如图2所示。

Kettle与Hadoop(七)执行Sqoop作业
图2

        说明:

Kettle与Hadoop(七)执行Sqoop作业
图3
  • sales_order是MySQL中的目标表,结构与part-m-00000文件匹配,如图4所示。
Kettle与Hadoop(七)执行Sqoop作业
图4

2. 保存并执行作业,日志如下所示。

2020/06/08 09:28:40 - Spoon - Starting job...
2020/06/08 09:28:40 - sqoop_export - Start of job execution
2020/06/08 09:28:40 - sqoop_export - Starting entry [Sqoop export]
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Running Sqoop version: 1.4.7-cdh6.1.0
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Setting your password on the command-line is insecure. Consider using -P instead.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Preparing to use a MySQL streaming resultset.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Beginning code generation
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $HADOOP_MAPRED_HOME is not set
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Writing jar file: /tmp/sqoop-root/compile/7c09fc949771ba78a584cc07c1cd6b3f/sales_order.jar
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Beginning export of sales_order
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - SQOOP_HOME is unset. May not be able to find all job dependencies.
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Connecting to ResourceManager at manager/172.16.1.124:8032
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Disabling Erasure Coding for path: /user/root/.staging/job_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Total input files to process : 1
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Total input files to process : 1
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - number of splits:4
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Submitting tokens for job: job_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Executing with tokens: []
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Submitted application application_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - The url to track the job: http://manager:8088/proxy/application_1591323999364_0046/
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Running job: job_1591323999364_0046
2020/06/08 09:28:51 - Sqoop export - 2020/06/08 09:28:51 - Job job_1591323999364_0046 running in uber mode : false
2020/06/08 09:28:51 - Sqoop export - 2020/06/08 09:28:51 -  map 0% reduce 0%
2020/06/08 09:28:58 - Sqoop export - 2020/06/08 09:28:58 -  map 75% reduce 0%
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 -  map 100% reduce 0%
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Job job_1591323999364_0046 completed successfully
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Counters: 34
2020/06/08 09:28:59 - Sqoop export -     File System Counters
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of bytes read=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of bytes written=968088
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of read operations=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of large read operations=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of write operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes read=15868
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes written=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of read operations=16
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of large read operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of write operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes read erasure-coded=0
2020/06/08 09:28:59 - Sqoop export -     Job Counters 
2020/06/08 09:28:59 - Sqoop export -         Launched map tasks=4
2020/06/08 09:28:59 - Sqoop export -         Data-local map tasks=2
2020/06/08 09:28:59 - Sqoop export -         Rack-local map tasks=2
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all maps in occupied slots (ms)=18912
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all reduces in occupied slots (ms)=0
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all map tasks (ms)=18912
2020/06/08 09:28:59 - Sqoop export -         Total vcore-milliseconds taken by all map tasks=18912
2020/06/08 09:28:59 - Sqoop export -         Total megabyte-milliseconds taken by all map tasks=19365888
2020/06/08 09:28:59 - Sqoop export -     Map-Reduce Framework
2020/06/08 09:28:59 - Sqoop export -         Map input records=102
2020/06/08 09:28:59 - Sqoop export -         Map output records=102
2020/06/08 09:28:59 - Sqoop export -         Input split bytes=556
2020/06/08 09:28:59 - Sqoop export -         Spilled Records=0
2020/06/08 09:28:59 - Sqoop export -         Failed Shuffles=0
2020/06/08 09:28:59 - Sqoop export -         Merged Map outputs=0
2020/06/08 09:28:59 - Sqoop export -         GC time elapsed (ms)=351
2020/06/08 09:28:59 - Sqoop export -         CPU time spent (ms)=5660
2020/06/08 09:28:59 - Sqoop export -         Physical memory (bytes) snapshot=857292800
2020/06/08 09:28:59 - Sqoop export -         Virtual memory (bytes) snapshot=10344894464
2020/06/08 09:28:59 - Sqoop export -         Total committed heap usage (bytes)=739770368
2020/06/08 09:28:59 - Sqoop export -         Peak Map Physical memory (bytes)=228241408
2020/06/08 09:28:59 - Sqoop export -         Peak Map Virtual memory (bytes)=2586439680
2020/06/08 09:28:59 - Sqoop export -     File Input Format Counters 
2020/06/08 09:28:59 - Sqoop export -         Bytes Read=0
2020/06/08 09:28:59 - Sqoop export -     File Output Format Counters 
2020/06/08 09:28:59 - Sqoop export -         Bytes Written=0
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Transferred 15.4961 KB in 18.7657 seconds (845.5859 bytes/sec)
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Exported 102 records.
2020/06/08 09:28:59 - sqoop_export - Finished job entry [Sqoop export] (result=[true])
2020/06/08 09:28:59 - sqoop_export - Job execution finished
2020/06/08 09:28:59 - Spoon - Job has ended.

        从日志中可以看到,作业已经成功执行。在作业所在目录,会生成一个名为sales_order.java的文件。

3. 检查MySQL表,结果如图5所示。

Kettle与Hadoop(七)执行Sqoop作业
图5

        从图5可以看到,通过Sqoop export作业项将HDFS文件内容导出到MySQL表中。
 

二、Sqoop import

1. 建立一个作业,将MySQL表数据导入到HDFS。
(1)打开PDI,新建一个作业,如图6所示。

Kettle与Hadoop(七)执行Sqoop作业
图6

(2)编辑'Sqoop import'作业项,如图7所示。

Kettle与Hadoop(七)执行Sqoop作业
图7

        说明:

  • /user/root/sales_order是HDFS的目标目录,该目录应该尚不存在。

2. 保存并执行作业,日志如下所示。

2020/06/08 10:00:52 - Spoon - Starting job...
2020/06/08 10:00:52 - sqoop_import - Start of job execution
2020/06/08 10:00:52 - sqoop_import - Starting entry [Sqoop import]
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Running Sqoop version: 1.4.7-cdh6.1.0
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Setting your password on the command-line is insecure. Consider using -P instead.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Preparing to use a MySQL streaming resultset.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Beginning code generation
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $HADOOP_MAPRED_HOME is not set
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Writing jar file: /tmp/sqoop-root/compile/7c09fc949771ba78a584cc07c1cd6b3f/sales_order.jar
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Setting zero DATETIME behavior to convertToNull (mysql)
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Beginning import of sales_order
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - SQOOP_HOME is unset. May not be able to find all job dependencies.
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Connecting to ResourceManager at manager/172.16.1.124:8032
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Disabling Erasure Coding for path: /user/root/.staging/job_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Using read commited transaction isolation
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - BoundingValsQuery: SELECT MIN(`order_number`), MAX(`order_number`) FROM `sales_order`
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Split size: 25; Num splits: 4 from: 101 to: 202
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - number of splits:4
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Submitting tokens for job: job_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Executing with tokens: []
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Submitted application application_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - The url to track the job: http://manager:8088/proxy/application_1591323999364_0048/
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Running job: job_1591323999364_0048
2020/06/08 10:01:02 - Sqoop import - 2020/06/08 10:01:02 - Job job_1591323999364_0048 running in uber mode : false
2020/06/08 10:01:02 - Sqoop import - 2020/06/08 10:01:02 -  map 0% reduce 0%
2020/06/08 10:01:09 - Sqoop import - 2020/06/08 10:01:09 -  map 75% reduce 0%
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 -  map 100% reduce 0%
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Job job_1591323999364_0048 completed successfully
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Counters: 33
2020/06/08 10:01:11 - Sqoop import -     File System Counters
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of bytes read=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of bytes written=969236
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of read operations=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of large read operations=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of write operations=0
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes read=481
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes written=6120
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of read operations=24
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of large read operations=0
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of write operations=8
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes read erasure-coded=0
2020/06/08 10:01:11 - Sqoop import -     Job Counters 
2020/06/08 10:01:11 - Sqoop import -         Launched map tasks=4
2020/06/08 10:01:11 - Sqoop import -         Other local map tasks=4
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all maps in occupied slots (ms)=19048
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all reduces in occupied slots (ms)=0
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all map tasks (ms)=19048
2020/06/08 10:01:11 - Sqoop import -         Total vcore-milliseconds taken by all map tasks=19048
2020/06/08 10:01:11 - Sqoop import -         Total megabyte-milliseconds taken by all map tasks=19505152
2020/06/08 10:01:11 - Sqoop import -     Map-Reduce Framework
2020/06/08 10:01:11 - Sqoop import -         Map input records=102
2020/06/08 10:01:11 - Sqoop import -         Map output records=102
2020/06/08 10:01:11 - Sqoop import -         Input split bytes=481
2020/06/08 10:01:11 - Sqoop import -         Spilled Records=0
2020/06/08 10:01:11 - Sqoop import -         Failed Shuffles=0
2020/06/08 10:01:11 - Sqoop import -         Merged Map outputs=0
2020/06/08 10:01:11 - Sqoop import -         GC time elapsed (ms)=500
2020/06/08 10:01:11 - Sqoop import -         CPU time spent (ms)=5680
2020/06/08 10:01:11 - Sqoop import -         Physical memory (bytes) snapshot=783245312
2020/06/08 10:01:11 - Sqoop import -         Virtual memory (bytes) snapshot=10357563392
2020/06/08 10:01:11 - Sqoop import -         Total committed heap usage (bytes)=659554304
2020/06/08 10:01:11 - Sqoop import -         Peak Map Physical memory (bytes)=202842112
2020/06/08 10:01:11 - Sqoop import -         Peak Map Virtual memory (bytes)=2591666176
2020/06/08 10:01:11 - Sqoop import -     File Input Format Counters 
2020/06/08 10:01:11 - Sqoop import -         Bytes Read=0
2020/06/08 10:01:11 - Sqoop import -     File Output Format Counters 
2020/06/08 10:01:11 - Sqoop import -         Bytes Written=6120
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Transferred 5.9766 KB in 18.733 seconds (326.697 bytes/sec)
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Retrieved 102 records.
2020/06/08 10:01:11 - sqoop_import - Finished job entry [Sqoop import] (result=[true])
2020/06/08 10:01:11 - sqoop_import - Job execution finished
2020/06/08 10:01:11 - Spoon - Job has ended.

        从日志中可以看到,作业已经成功执行。在作业所在目录,会生成一个名为sales_order.java的文件。

3. 检查HDFS目标目录,结果如图8所示。

Kettle与Hadoop(七)执行Sqoop作业
图8

        从图8可以看到,通过Sqoop import作业项将MySQL表中数据导入到HDFS中。