Installing Sqoop and Basic Usage
I. Overview
1. Sqoop is short for "SQL to Hadoop".
2. It is a bridge between traditional relational databases and Hadoop: it imports data from a relational database into Hadoop systems (such as HDFS, HBase, and Hive), and exports data from Hadoop back out to a relational database.
3. It transfers data in batches using MapReduce.
4. Sqoop comes in two versions, Sqoop 1 (1.4.x) and Sqoop 2 (1.99.x). Because the project does not recommend Sqoop 2 for production use, this article uses Sqoop 1.
5. Under the hood, Sqoop 1 is implemented as MapReduce jobs that contain only map tasks, with no reduce phase.
6. Sqoop jobs are submitted to YARN for execution.
7. Before following along, start Hadoop (including YARN) and MySQL; a quick check is sketched after this list.
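To verify that everything is up, something like the following works (a minimal sketch; the service name mysqld is an assumption and may differ on your distribution):
# list the running JVM processes; expect NameNode, DataNode, ResourceManager, NodeManager, etc.
jps
# check the MySQL daemon
service mysqld status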
II. Installing Sqoop 1
1. Download location
http://mirror.bit.edu.cn/apache/sqoop/1.4.7/
2. Copy the downloaded tarball into a directory on the Linux host, then extract it and rename the extracted directory
[root@master sqoop]# pwd
/opt/softWare/sqoop
[root@master sqoop]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@master sqoop]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7
[root@master sqoop]# ls
sqoop-1.4.7  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
3. Configure environment variables
vim /etc/profile
#sqoop
export SQOOP_HOME=/opt/softWare/sqoop/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile
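You can confirm that the variables took effect:
# should print /opt/softWare/sqoop/sqoop-1.4.7
echo $SQOOP_HOME
# should resolve to the sqoop script under $SQOOP_HOME/bin
which sqoop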
4. Edit the Sqoop configuration file
cd conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
export HADOOP_COMMON_HOME=/opt/softWare/hadoop/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/opt/softWare/hadoop/hadoop-2.7.3
export HIVE_HOME=/opt/softWare/hive/apache-hive-2.1.1-bin
export HBASE_HOME=/opt/softWare/hbase/hbase-1.2.6
Append the lines above to the end of the configuration file, adjusting the paths to match your own installation.
5. Copy the MySQL JDBC driver into Sqoop's lib directory
mysql-connector-java-5.1.47.jar
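For example, assuming the jar has been downloaded to the current directory:
cp mysql-connector-java-5.1.47.jar $SQOOP_HOME/lib/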
6. View Sqoop's available commands
[root@master bin]# ./sqoop help
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 10:43:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
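For example, to see the full option list for a single tool:
./sqoop help import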
7. Use Sqoop to list the databases in MySQL
[root@master bin]# ./sqoop list-databases --connect 'jdbc:mysql://192.168.230.21:3306/test?characterEncoding=UTF-8' --username root --password '123456'
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 10:44:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/06/29 10:44:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/29 10:44:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/softWare/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/softWare/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
information_schema
hive
ke
mysql
performance_schema
test
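The list-tables tool works the same way for the tables inside one database (output omitted here; -P prompts for the password instead of passing it on the command line, as the warning above suggests):
./sqoop list-tables --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P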
8. The MySQL table stu contains the following data (in this setup stu lives in the database named mysql, as the connect string in step 9 shows):
mysql> select * from stu;
+----------+------+----------+
| name | age | address |
+----------+------+----------+
| zhangsan | 20 | henan |
| lisi | 20 | hebei |
| wangwu | 20 | beijing |
| liuqi | 20 | shandong |
| xuwei | 20 | fujian |
+----------+------+----------+
5 rows in set (0.00 sec)
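To reproduce this table, DDL along the following lines would do (a hypothetical sketch; the article does not show the original column types):
CREATE TABLE stu (name VARCHAR(20), age INT, address VARCHAR(20));
INSERT INTO stu VALUES
  ('zhangsan',20,'henan'),('lisi',20,'hebei'),('wangwu',20,'beijing'),
  ('liuqi',20,'shandong'),('xuwei',20,'fujian');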
9. Import the stu table from MySQL into HDFS
[root@master bin]# ./sqoop import -m 1 --connect jdbc:mysql://192.168.230.21:3306/mysql --username root --password 123456 --table stu --target-dir /user/sqoop/datatest
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 09:52:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/06/29 09:52:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/29 09:52:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/06/29 09:52:29 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/softWare/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/softWare/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/06/29 09:52:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `stu` AS t LIMIT 1
20/06/29 09:52:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `stu` AS t LIMIT 1
20/06/29 09:52:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/softWare/hadoop/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/de2079aebddc781b7b85a54da470b803/stu.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/06/29 09:52:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/de2079aebddc781b7b85a54da470b803/stu.jar
20/06/29 09:52:31 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/06/29 09:52:31 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/06/29 09:52:31 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/06/29 09:52:31 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/06/29 09:52:31 INFO mapreduce.ImportJobBase: Beginning import of stu
20/06/29 09:52:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/06/29 09:52:32 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/06/29 09:52:32 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.230.21:8032
20/06/29 09:52:46 INFO db.DBInputFormat: Using read commited transaction isolation
20/06/29 09:52:46 INFO mapreduce.JobSubmitter: number of splits:1
20/06/29 09:52:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1593394152340_0001
20/06/29 09:52:48 INFO impl.YarnClientImpl: Submitted application application_1593394152340_0001
20/06/29 09:52:48 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1593394152340_0001/
20/06/29 09:52:48 INFO mapreduce.Job: Running job: job_1593394152340_0001
20/06/29 09:53:00 INFO mapreduce.Job: Job job_1593394152340_0001 running in uber mode : false
20/06/29 09:53:00 INFO mapreduce.Job: map 0% reduce 0%
20/06/29 09:53:11 INFO mapreduce.Job: map 100% reduce 0%
20/06/29 09:53:11 INFO mapreduce.Job: Job job_1593394152340_0001 completed successfully
20/06/29 09:53:12 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138019
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=84
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7657
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=7657
        Total vcore-milliseconds taken by all map tasks=7657
        Total megabyte-milliseconds taken by all map tasks=7840768
    Map-Reduce Framework
        Map input records=5
        Map output records=5
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=77
        CPU time spent (ms)=990
        Physical memory (bytes) snapshot=114147328
        Virtual memory (bytes) snapshot=2082172928
        Total committed heap usage (bytes)=42663936
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=84
20/06/29 09:53:12 INFO mapreduce.ImportJobBase: Transferred 84 bytes in 39.4567 seconds (2.1289 bytes/sec)
20/06/29 09:53:12 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Note: -m (long form --num-mappers) sets how many map tasks run in parallel; a multi-mapper variant is sketched below.
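With -m greater than 1, Sqoop needs a column to split the input on, and a table without a primary key requires an explicit --split-by. A sketch (the target directory name is made up for illustration; splitting on a text column also requires the allow_text_splitter property in Sqoop 1.4.x):
./sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P \
  --table stu --split-by name -m 2 \
  --target-dir /user/sqoop/datatest_m2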
10. Check the result
[root@master ~]# hdfs dfs -ls /user/sqoop/datatest
Found 2 items
-rw-r--r-- 1 root supergroup 0 2020-06-29 09:53 /user/sqoop/datatest/_SUCCESS
-rw-r--r-- 1 root supergroup 84 2020-06-29 09:53 /user/sqoop/datatest/part-m-00000
[root@master ~]# hdfs dfs -cat /user/sqoop/datatest/part-m-00000
zhangsan,20,henan
lisi,20,hebei
wangwu,20,beijing
liuqi,20,shandong
xuwei,20,fujian
[root@master ~]#
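Going the other way, the export tool writes an HDFS directory back into a relational table. A minimal sketch (stu_copy is a hypothetical table that must already exist in MySQL with the same column layout as stu):
./sqoop export -m 1 --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P \
  --table stu_copy --export-dir /user/sqoop/datatest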