Hadoop Single-Node Deployment
Set Up the JDK
$ tar -zxvf jdk-8u112-linux-x64.tar.gz
$ ln -s jdk1.8.0_112 jdk
$ vim ~/.bash_profile
Add the following to ~/.bash_profile:
export JAVA_HOME=/home/jdk
PATH=${JAVA_HOME}/bin:$PATH
export PATH
$ source ~/.bash_profile
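To verify the shell now resolves the new JDK (the version should match the 8u112 tarball above):
$ java -version
$ echo $JAVA_HOME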
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
1 dfsadmin
Set Up Passphraseless SSH
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
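A quick non-interactive check (BatchMode makes ssh fail instead of prompting if key authentication is not working):
$ ssh -o BatchMode=yes localhost 'echo ok'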
Pseudo-Distributed Operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configuration
etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/jdk
If JAVA_HOME is not set in hadoop-env.sh, Hadoop will not pick it up and will fail with an error on startup; the value must be an absolute path.
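One way to make the edit in place, assuming the stock hadoop-env.sh still has its default export line:
$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/home/jdk|' etc/hadoop/hadoop-env.sh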
Start HDFS
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-namenode-vsr264.out
localhost: starting datanode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-datanode-vsr264.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-vsr264.out
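To confirm the daemons started, jps (bundled with the JDK) should list NameNode, DataNode, and SecondaryNameNode:
$ jps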
Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
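On a headless server, one quick way to check that the web UI is serving (any HTTP client works; 200 means the NameNode is up):
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/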
Execute MapReduce Jobs
Make the HDFS directories required to execute MapReduce jobs (the home directory should match the user running the jobs; this walkthrough runs as root):
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/root
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop /input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /input/* output 'dfs[a-z.]+'
Check the result:
$ bin/hdfs dfs -cat output/*
or, using the full HDFS URI:
$ bin/hdfs dfs -cat hdfs://localhost:9000/user/root/output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
[root@vsr264 hadoop]# bin/hdfs dfs -ls hdfs://localhost:9000/user/root/output/*
-rw-r--r-- 1 root supergroup 0 2020-09-17 19:39 hdfs://localhost:9000/user/root/output/_SUCCESS
-rw-r--r-- 1 root supergroup 197 2020-09-17 19:39 hdfs://localhost:9000/user/root/output/part-r-00000
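MapReduce will not overwrite an existing output directory, so remove it before re-running the job:
$ bin/hdfs dfs -rm -r output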
YARN on a Single Node
Configuration
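In the Hadoop 2.7.x binary distribution, mapred-site.xml typically ships only as a template; create it first if it is missing:
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml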
etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start ResourceManager daemon and NodeManager daemon
$ sbin/start-yarn.sh
Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/
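jps should now also list ResourceManager and NodeManager. When finished, stop the daemons:
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh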