Hadoop Single-Node Deployment
Set Up the JDK
$ tar -zxvf jdk-8u112-linux-x64.tar.gz
$ ln -s jdk1.8.0_112 jdk
$ vim ~/.bash_profile
Add the following to ~/.bash_profile:
export JAVA_HOME=/home/jdk
PATH=${JAVA_HOME}/bin:$PATH
export PATH
$ source ~/.bash_profile
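To verify the shell now resolves the new JDK (the version should match the 8u112 tarball above):
$ java -version
$ echo $JAVA_HOME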
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
1 dfsadmin
Set Up Passphraseless SSH
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
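A quick non-interactive check (BatchMode makes ssh fail instead of prompting if key authentication is not working):
$ ssh -o BatchMode=yes localhost 'echo ok'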
Pseudo-Distributed Operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configuration
etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/jdk
If JAVA_HOME is not set in hadoop-env.sh, Hadoop will not pick it up and will fail with an error on startup; the value must be an absolute path.
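One way to make the edit in place, assuming the stock hadoop-env.sh still has its default export line:
$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/home/jdk|' etc/hadoop/hadoop-env.sh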
Start HDFS
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-namenode-vsr264.out
localhost: starting datanode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-datanode-vsr264.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/packages/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-vsr264.out
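To confirm the daemons started, jps (bundled with the JDK) should list NameNode, DataNode, and SecondaryNameNode:
$ jps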
Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
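On a headless server, one quick way to check that the web UI is serving (any HTTP client works; 200 means the NameNode is up):
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/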
Execute MapReduce Jobs
Make the HDFS directories required to execute MapReduce jobs (the home directory should match the user running the jobs; this walkthrough runs as root):
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/root
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop /input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /input/* output 'dfs[a-z.]+'
Check the result:
$ bin/hdfs dfs -cat output/*
or, using the full HDFS URI:
$ bin/hdfs dfs -cat hdfs://localhost:9000/user/root/output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
[root@vsr264 hadoop]# bin/hdfs dfs -ls hdfs://localhost:9000/user/root/output/*
-rw-r--r-- 1 root supergroup 0 2020-09-17 19:39 hdfs://localhost:9000/user/root/output/_SUCCESS
-rw-r--r-- 1 root supergroup 197 2020-09-17 19:39 hdfs://localhost:9000/user/root/output/part-r-00000
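MapReduce will not overwrite an existing output directory, so remove it before re-running the job:
$ bin/hdfs dfs -rm -r output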
YARN on a Single Node
Configuration
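In the Hadoop 2.7.x binary distribution, mapred-site.xml typically ships only as a template; create it first if it is missing:
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml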
etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start ResourceManager daemon and NodeManager daemon
$ sbin/start-yarn.sh
Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/
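jps should now also list ResourceManager and NodeManager. When finished, stop the daemons:
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh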