
Installing Hadoop and Spark on Ubuntu


Hardware and software environment

 
 

Item            Version / Spec
System          Ubuntu 18.04.4 LTS
Memory          7.5 GiB
Processor       Intel Core i7-8565U CPU @ 1.80GHz × 8
Graphics        Intel UHD Graphics (Whiskey Lake 3x8 GT2)
GNOME           3.28.2
OS type         64-bit
Disk            251.0 GB
Hadoop          2.10.0
Spark           2.3.4

Steps

① Install SSH

aaa@qq.com:~$ sudo apt-get install openssh-server
[sudo] password for acat:
Reading package lists... Done
Building dependency tree
Reading state information... Done
openssh-server is already the newest version (1:7.6p1-4ubuntu0.3).
The following packages were automatically installed and are no longer required:
  fonts-wine gir1.2-geocodeglib-1.0 libfwup1 libglade2.0-cil libglib2.0-cil
  libgtk2.0-cil libmono-cairo4.0-cil libstartup-notification0:i386 libwine
  libwine:i386 libxcb-util1:i386 ubuntu-web-launchers wine32:i386 wine64
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 83 not upgraded.
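
If openssh-server was just installed, it is worth confirming that the SSH service is actually running before continuing (a quick check, not part of the original steps; on Ubuntu the service is named "ssh"):

# Check that the SSH daemon is active
sudo systemctl status ssh
# Or simply try logging in to the local machine (password login at this point)
ssh localhost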

② Configure passwordless SSH login

aaa@qq.com:~$ cd ~/.ssh/
aaa@qq.com:.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
aaa@qq.com:.ssh$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/acat/.ssh/id_rsa):
/home/acat/.ssh/id_rsa already exists.
Overwrite (y/n)?
aaa@qq.com:.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
aaa@qq.com:.ssh$ cat ./id_rsa.pub >> ./authorized_keys
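
After appending the public key, passwordless login can be verified as follows (a small check assuming the default key location; sshd ignores an authorized_keys file whose permissions are too open, so tightening them first avoids a common pitfall):

# Restrict permissions on the key directory and file
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# This should now log in without asking for a password
ssh localhost
exit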

③ Configure the Java environment

Download the Java (JDK) package for Linux and extract it to /home/acat/softwares/jdk1.8.0_161. Then edit the .bashrc file in your home directory and add the following:

export JAVA_HOME=/home/acat/softwares/jdk1.8.0_161
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

Save and exit with :wq.
Then check whether the Java configuration succeeded:
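
The new variables only take effect in a shell that has re-read .bashrc; reloading it explicitly avoids having to open a new terminal (assuming the lines above were added to ~/.bashrc):

# Reload the shell configuration so JAVA_HOME and PATH take effect in the current shell
source ~/.bashrc
echo $JAVA_HOME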

aaa@qq.com:~$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

④ Install Hadoop 2
Download the Hadoop 2.10.0 binary package (hadoop-2.10.0.tar.gz), extract it to /usr/local, and rename the extracted folder to hadoop.
Configure the Hadoop-related environment variables in .bashrc:
 

export PATH=/usr/local/hadoop/sbin:$PATH
export PATH=/usr/local/hadoop/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
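
As with the Java variables, reload .bashrc so the new PATH entries are picked up. Hadoop also reads JAVA_HOME from etc/hadoop/hadoop-env.sh; if the hadoop command later complains that JAVA_HOME is not set, setting it there explicitly usually fixes it (a hedged sketch, reusing the JDK path from step ③):

# Make the new PATH and HADOOP_HOME visible in the current shell
source ~/.bashrc

# Optional: set JAVA_HOME explicitly in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# (add or edit this line in that file)
export JAVA_HOME=/home/acat/softwares/jdk1.8.0_161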

Check the Hadoop version:

aaa@qq.com:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

⑤ Hadoop pseudo-distributed configuration
Hadoop can run in pseudo-distributed mode on a single node: the Hadoop daemons run as separate Java processes, the node acts as both NameNode and DataNode, and files are read from HDFS rather than from the local filesystem.
Hadoop's configuration files are located in /usr/local/hadoop/etc/hadoop/. Pseudo-distributed mode requires modifying two of them, core-site.xml and hdfs-site.xml. The configuration files are in XML format; each setting is declared as a property with a name and a value.
First, change core-site.xml to:
 

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Then change hdfs-site.xml to:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

⑥ Format the NameNode

aaa@qq.com:hadoop$ stop-dfs.sh
aaa@qq.com:hadoop$ rm -r ./tmp
aaa@qq.com:hadoop$ hdfs namenode -format
... several lines omitted ...
20/05/27 23:46:49 INFO util.GSet: capacity      = 2^15 = 32768 entries
20/05/27 23:46:49 INFO namenode.FSImage: Allocated new BlockPoolId: BP-335173629-127.0.1.1-1590594409666
20/05/27 23:46:49 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
20/05/27 23:46:49 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/05/27 23:46:49 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
20/05/27 23:46:49 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/05/27 23:46:49 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
20/05/27 23:46:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at acat-xx/127.0.1.1
************************************************************/

⑦ Start the NameNode and DataNode daemons.

aaa@qq.com:hadoop$ start-dfs.sh
20/05/27 23:47:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-acat-namenode-acat-xx.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-acat-datanode-acat-xx.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-acat-secondarynamenode-acat-xx.out
20/05/27 23:47:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ jps
8729 Jps
8588 SecondaryNameNode
8332 DataNode
8126 NameNode

As you can see, after starting the daemons three additional Java processes are running: SecondaryNameNode, DataNode, and NameNode. Note that, despite its name, the SecondaryNameNode is not a hot standby for the NameNode; it periodically merges the NameNode's edit log into the fsimage checkpoint so that the edit log stays small and the NameNode can restart quickly.
After a successful start, you can visit http://localhost:50070/ to view the HDFS web UI.
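
If no graphical browser is available, the same page can be fetched from the command line to confirm the web UI is up (a quick check, not in the original; assumes curl is installed, otherwise wget -qO- works the same way):

# The NameNode web UI listens on port 50070 in Hadoop 2.x
curl -s http://localhost:50070/ | head -n 5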

⑧ Run a Hadoop pseudo-distributed example
First, create the directories and files in HDFS.
 

aaa@qq.com:hadoop$ hdfs dfs -mkdir -p /usr/local/hadoop/
20/05/28 00:17:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -mkdir input
20/05/28 00:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -ls input
20/05/28 00:17:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -put ./etc/hadoop/*.xml ./input/
20/05/28 00:18:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ hdfs dfs -ls input
20/05/28 00:18:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 8 items
-rw-r--r--   1 acat supergroup       8814 2020-05-28 00:18 input/capacity-scheduler.xml
-rw-r--r--   1 acat supergroup       1076 2020-05-28 00:18 input/core-site.xml
-rw-r--r--   1 acat supergroup      10206 2020-05-28 00:18 input/hadoop-policy.xml
-rw-r--r--   1 acat supergroup       1133 2020-05-28 00:18 input/hdfs-site.xml
-rw-r--r--   1 acat supergroup        620 2020-05-28 00:18 input/httpfs-site.xml
-rw-r--r--   1 acat supergroup       3518 2020-05-28 00:18 input/kms-acls.xml
-rw-r--r--   1 acat supergroup       5939 2020-05-28 00:18 input/kms-site.xml
-rw-r--r--   1 acat supergroup        690 2020-05-28 00:18 input/yarn-site.xml
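
Note that HDFS paths without a leading slash, such as input above, are relative to the current user's HDFS home directory (/user/<username>). If hdfs dfs -mkdir input fails with a "No such file or directory" error, that home directory probably does not exist yet and can be created first (a hedged sketch using the username from this article):

# Create the HDFS home directory for the current user (acat in this article)
hdfs dfs -mkdir -p /user/acat
# Relative paths like "input" now resolve to /user/acat/input
hdfs dfs -ls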

Run the example job

aaa@qq.com:hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
... several lines omitted ...
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=219
    File Output Format Counters
        Bytes Written=77
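
Hadoop refuses to overwrite an existing output directory; re-running the example with the same output argument fails with a FileAlreadyExistsException, so delete the old results first (not shown in the original run):

# Remove the previous output directory in HDFS before re-running the job
hdfs dfs -rm -r output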

View the results

aaa@qq.com:hadoop$ hdfs dfs -cat output/*
20/05/28 00:19:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1   dfsadmin
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.datanode.data.dir

Copy the results to the local filesystem

aaa@qq.com:hadoop$ ls
abc  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  test.txt  tmp
aaa@qq.com:hadoop$ hdfs dfs -get output ./output
20/05/28 00:20:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa@qq.com:hadoop$ cat ./output/*
1   dfsadmin
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.datanode.data.dir
aaa@qq.com:hadoop$ ls
abc  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share  test.txt  tmp

⑨ Install Spark
First download spark-2.3.4-bin-without-hadoop.tgz, extract it to /usr/local, and rename the extracted folder to spark.
Then create the script file spark-env.sh in the /usr/local/spark/conf directory and add the following line to it:
 

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
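
Optionally (this is not part of the original steps), Spark's bin directory can also be added to PATH in ~/.bashrc, so that spark-shell and spark-submit can be started from any directory:

# Optional additions to ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH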

Once this is configured Spark can be used directly; unlike Hadoop, there is no separate startup command to run.
Verify that Spark is installed correctly by running one of its bundled examples.
 

aaa@qq.com:spark$ ./bin/run-example SparkPi | grep "Pi is"
Pi is roughly 3.1446357231786157

As you can see, Spark has been configured successfully.
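
Another quick sanity check is to run a one-line job through the interactive shell; spark-shell reads Scala statements from standard input, so a small computation can be piped in (a sketch assuming the working directory is /usr/local/spark):

# Sum the integers 1..100 with a tiny Spark job; the output should contain "res0: Int = 5050"
echo 'sc.parallelize(1 to 100).reduce(_ + _)' | ./bin/spark-shell 2>/dev/null | grep "res0"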

Experiment results: (screenshot omitted).
