Installing the Hadoop and Spark Platforms on Ubuntu
Hardware and Software Environment
| Item | Value |
| --- | --- |
| System | Ubuntu 18.04.4 LTS |
| Memory | 7.5 GiB |
| Processor | Intel Core i7-8565U CPU @ 1.80GHz × 8 |
| Graphics | Intel UHD Graphics (Whiskey Lake 3×8 GT2) |
| GNOME | 3.28.2 |
| OS type | 64-bit |
| Disk | 251.0 GB |
| Hadoop | 2.10.0 |
| Spark | 2.3.4 |
Steps
① Install ssh
aaa@qq.com:~$ sudo apt-get install openssh-server
[sudo] password for acat:
Reading package lists... Done
Building dependency tree
Reading state information... Done
openssh-server is already the newest version (1:7.6p1-4ubuntu0.3).
The following packages were automatically installed and are no longer required:
  fonts-wine gir1.2-geocodeglib-1.0 libfwup1 libglade2.0-cil libglib2.0-cil
  libgtk2.0-cil libmono-cairo4.0-cil libstartup-notification0:i386 libwine
  libwine:i386 libxcb-util1:i386 ubuntu-web-launchers wine32:i386 wine64
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 83 not upgraded.
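Optionally, confirm that the ssh service is up before moving on (a quick check):
sudo systemctl status ssh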
② Configure passwordless ssh login
aaa@qq.com:~$ cd ~/.ssh/
aaa@qq.com:.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
aaa@qq.com:.ssh$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/acat/.ssh/id_rsa):
/home/acat/.ssh/id_rsa already exists.
Overwrite (y/n)?
aaa@qq.com:.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
aaa@qq.com:.ssh$ cat ./id_rsa.pub >> ./authorized_keys
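Logging in to the local machine should now work without a password. A quick test (assuming the default key locations; the chmod is only needed if the key file permissions are too open):
chmod 600 ~/.ssh/authorized_keys
ssh localhost
exit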
③ Configure the Java environment
Download the JDK package for Linux and extract it to /home/acat/softwares/jdk1.8.0_161. Then edit the .bashrc file in your home directory and add the following:
export JAVA_HOME=/home/acat/softwares/jdk1.8.0_161
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
Then save and exit with :wq.
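Reload the shell configuration so the new variables take effect in the current terminal (assuming bash):
source ~/.bashrc
echo $JAVA_HOME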
Check whether the Java configuration succeeded:
aaa@qq.com:~$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
④ Install Hadoop 2
Download the Hadoop 2.10.0 binary distribution (hadoop-2.10.0.tar.gz), extract it to /usr/local, and rename the extracted folder hadoop-2.10.0 to hadoop.
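For reference, the extraction and rename might look like this (a sketch, assuming the archive was downloaded to ~/Downloads and the current user is acat):
sudo tar -zxf ~/Downloads/hadoop-2.10.0.tar.gz -C /usr/local
cd /usr/local
sudo mv ./hadoop-2.10.0 ./hadoop
sudo chown -R acat ./hadoop    # give the current user ownership; adjust the user name as needed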
Configure the Hadoop-related environment variables in .bashrc:
export PATH=/usr/local/hadoop/sbin:$PATH
export PATH=/usr/local/hadoop/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
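As with the Java variables, reload .bashrc before running the commands below:
source ~/.bashrc
which hadoop    # should print /usr/local/hadoop/bin/hadoop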
Check the Hadoop version:
aaa@qq.com:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar
⑤ Hadoop pseudo-distributed configuration
Hadoop can run on a single node in pseudo-distributed mode: the Hadoop daemons run as separate Java processes, the node acts as both NameNode and DataNode, and the files it reads are stored in HDFS.
Hadoop's configuration files live in /usr/local/hadoop/etc/hadoop/. Pseudo-distributed mode requires changes to two of them: core-site.xml and hdfs-site.xml. The configuration files are XML; each setting is declared as a property with a name and a value.
First, change core-site.xml to:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
Then change hdfs-site.xml to:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
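Note: if start-dfs.sh later complains that JAVA_HOME is not set, it can also be set explicitly in Hadoop's own environment file (using the JDK path configured earlier):
echo 'export JAVA_HOME=/home/acat/softwares/jdk1.8.0_161' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh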
⑥ Format the NameNode
aaa@qq.com:hadoop$ stop-dfs.sh
aaa@qq.com:hadoop$ rm -r ./tmp
aaa@qq.com:hadoop$ hdfs namenode -format
... several lines omitted ...
20/05/27 23:46:49 INFO util.GSet: capacity = 2^15 = 32768 entries
20/05/27 23:46:49 INFO namenode.FSImage: Allocated new BlockPoolId: BP-335173629-127.0.1.1-1590594409666
20/05/27 23:46:49 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
20/05/27 23:46:49 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/05/27 23:46:49 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
20/05/27 23:46:49 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/05/27 23:46:49 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
20/05/27 23:46:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at acat-xx/127.0.1.1
************************************************************/
⑦ Next, start the NameNode and DataNode daemons.
aaa@qq.com:hadoop$ start-dfs.sh
20/05/27 23:47:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-acat-namenode-acat-xx.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-acat-datanode-acat-xx.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-acat-secondarynamenode-acat-xx.out
20/05/27 23:47:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ jps
8729 Jps
8588 SecondaryNameNode
8332 DataNode
8126 NameNode
As the jps output shows, starting the daemons adds three Java processes: NameNode, DataNode, and SecondaryNameNode. Note that despite its name, the SecondaryNameNode is not a hot standby for the NameNode; it periodically merges the NameNode's edit log into a new fsimage checkpoint so that the NameNode can recover and restart more quickly.
After a successful start, the HDFS web UI can be reached at http://localhost:50070/.
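The web UI can also be checked from the command line (optional):
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/    # 200 means the NameNode web UI is up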
⑧ Run a pseudo-distributed Hadoop example
First create the directories in HDFS and upload the input files:
aaa@qq.com:hadoop$ hdfs dfs -mkdir -p /usr/local/hadoop/
20/05/28 00:17:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -mkdir input
20/05/28 00:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -ls input
20/05/28 00:17:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -put ./etc/hadoop/*.xml ./input/
20/05/28 00:18:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -ls input
20/05/28 00:18:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 8 items
-rw-r--r--   1 acat supergroup       8814 2020-05-28 00:18 input/capacity-scheduler.xml
-rw-r--r--   1 acat supergroup       1076 2020-05-28 00:18 input/core-site.xml
-rw-r--r--   1 acat supergroup      10206 2020-05-28 00:18 input/hadoop-policy.xml
-rw-r--r--   1 acat supergroup       1133 2020-05-28 00:18 input/hdfs-site.xml
-rw-r--r--   1 acat supergroup        620 2020-05-28 00:18 input/httpfs-site.xml
-rw-r--r--   1 acat supergroup       3518 2020-05-28 00:18 input/kms-acls.xml
-rw-r--r--   1 acat supergroup       5939 2020-05-28 00:18 input/kms-site.xml
-rw-r--r--   1 acat supergroup        690 2020-05-28 00:18 input/yarn-site.xml
Run the MapReduce grep example:
aaa@qq.com:hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
... several lines omitted ...
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=219
        File Output Format Counters
                Bytes Written=77
View the results:
aaa@qq.com:hadoop$ hdfs dfs -cat output/*
20/05/28 00:19:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1       dfsadmin
1       dfs.replication
1       dfs.namenode.name.dir
1       dfs.datanode.data.dir
Copy the results back to the local file system:
aaa@qq.com:hadoop$ ls
abc  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  test.txt  tmp
aaa@qq.com:hadoop$ hdfs dfs -get output ./output
20/05/28 00:20:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ cat ./output/*
1       dfsadmin
1       dfs.replication
1       dfs.namenode.name.dir
1       dfs.datanode.data.dir
aaa@qq.com:hadoop$ ls
abc  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share  test.txt  tmp
⑨ Install Spark
First download spark-2.3.4-bin-without-hadoop.tgz, extract it to /usr/local, and rename the extracted folder to spark.
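The extraction mirrors the Hadoop step (again a sketch, assuming the archive is in ~/Downloads):
sudo tar -zxf ~/Downloads/spark-2.3.4-bin-without-hadoop.tgz -C /usr/local
cd /usr/local
sudo mv ./spark-2.3.4-bin-without-hadoop ./spark
sudo chown -R acat ./spark    # adjust the user name as needed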
Then create the script spark-env.sh in /usr/local/spark/conf and add the following line to it:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Once this is configured, Spark is ready to use; unlike Hadoop, no startup command needs to be run first.
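For example, the interactive shell can be launched straight away (quit with :quit):
cd /usr/local/spark
./bin/spark-shell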
To verify that Spark is installed correctly, run one of the bundled examples:
aaa@qq.com:spark$ ./bin/run-example SparkPi | grep "Pi is"
Pi is roughly 3.1446357231786157
The output shows that Spark has been configured successfully, which completes the installation of the Hadoop and Spark platforms.