
Hadoop Single-Node Deployment


Setup JDK

tar -zxvf jdk-8u112-linux-x64.tar.gz
ln -s jdk1.8.0_112 jdk
vim ~/.bash_profile

export JAVA_HOME=/home/jdk
PATH=${JAVA_HOME}/bin:$PATH
export PATH

source ~/.bash_profile
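
To confirm that the JDK is active in the current shell, a quick check (paths assume the tarball was unpacked under /home as above):

  $ java -version
  $ echo $JAVA_HOME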

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
1	dfsadmin
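
Note that the job will fail if the output directory already exists, so remove it before re-running the example:

  $ rm -rf output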

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

./etc/hadoop/hadoop-env.sh

export JAVA_HOME=/home/jdk

If JAVA_HOME is not set in hadoop-env.sh itself, Hadoop will not pick it up and startup fails with an error; the value must be an absolute path.
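
A quick way to confirm that Hadoop picks up the JDK is the version command, run from the installation root:

  $ bin/hadoop version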

Start HDFS

Format the filesystem:

  $ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:

  $ sbin/start-dfs.sh

Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-namenode-vsr264.out
localhost: starting datanode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-datanode-vsr264.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-vsr264.out
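
The JDK's jps tool offers a quick sanity check that the daemons are up; its listing should include NameNode, DataNode, and SecondaryNameNode (each preceded by a PID):

  $ jps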

Browse the web interface for the NameNode; by default it is available at:

NameNode - http://localhost:50070/
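
If no browser is available on the host, reachability can be checked from the shell (an optional check, not part of the official steps):

  $ curl -s http://localhost:50070/ | head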

Execute MapReduce jobs

Make the HDFS directories required to execute MapReduce jobs:

 $ bin/hdfs dfs -mkdir /user
 $ bin/hdfs dfs -mkdir /user/root

Copy the input files into the distributed filesystem:

 $ bin/hdfs dfs -put etc/hadoop /input
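
The upload can be verified with a listing of the target directory:

 $ bin/hdfs dfs -ls /input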

Run some of the examples provided:

 $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /input/* output 'dfs[a-z.]+'

Check the result:

 $ bin/hdfs dfs -cat output/*

or

 $ bin/hdfs dfs -cat hdfs://localhost:9000/user/root/output/*
6	dfs.audit.logger
4	dfs.class
3	dfs.server.namenode.
2	dfs.period
2	dfs.audit.log.maxfilesize
2	dfs.audit.log.maxbackupindex
1	dfsmetrics.log
1	dfsadmin
1	dfs.servers
1	dfs.replication
1	dfs.file

 $ bin/hdfs dfs -ls hdfs://localhost:9000/user/root/output/*
-rw-r--r--   1 root supergroup          0 2020-09-17 19:39 hdfs://localhost:9000/user/root/output/_SUCCESS
-rw-r--r--   1 root supergroup        197 2020-09-17 19:39 hdfs://localhost:9000/user/root/output/part-r-00000
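
Alternatively, the output can be copied from the distributed filesystem and examined locally (assuming no local output directory is in the way):

 $ bin/hdfs dfs -get output output
 $ cat output/*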

YARN on a Single Node

Configuration

etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
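
In the stock 2.7.3 tarball mapred-site.xml does not exist yet; create it from the bundled template before editing:

 $ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml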

etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start ResourceManager daemon and NodeManager daemon:

 $ sbin/start-yarn.sh

Browse the web interface for the ResourceManager; by default it is available at:

ResourceManager - http://localhost:8088/
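
At this point jps should additionally list ResourceManager and NodeManager. When finished, the daemons can be stopped with the matching scripts:

 $ sbin/stop-yarn.sh
 $ sbin/stop-dfs.sh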
