Hadoop Docker 2019 Version 3.2.1
I am trying to set up HDFS in Docker so it can run on a single server and provide a distributed file system. That is it. The files there can then easily be shared with multiple machines.
Exception:
> systemctl start sshd
Failed to get D-Bus connection: Operation not permitted
Solution:
I could not fix that on CentOS, so I switched to Ubuntu instead.
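Since a plain container has no systemd, the usual workaround is to start the sshd binary directly rather than through systemctl, which is what start.sh does further down. A minimal sketch:
#run sshd directly instead of via systemctl
> mkdir -p /var/run/sshd
> /usr/sbin/sshd -D &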
Set Up Client and Try
> wget http://apache-mirror.8birdsvideo.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
> tar zxvf hadoop-3.2.1.tar.gz
> mv hadoop-3.2.1 ~/tool/
Place it in the working directory, and add the bin directory to PATH
PATH=$PATH:/opt/hadoop/bin
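To make that change stick across shells, it would go into the shell profile; a minimal sketch, assuming the tarball was unpacked under ~/tool as above:
> echo 'export PATH=$PATH:$HOME/tool/hadoop-3.2.1/bin' >> ~/.bashrc
> source ~/.bashrc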
Check version
> hdfs version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /home/carl/tool/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
List the files
> hdfs dfs -ls hdfs://localhost:9000/
Found 1 items
drwxr-xr-x - dr.who supergroup 0 2019-12-07 16:25 hdfs://localhost:9000/hello
The put command works
> hdfs dfs -put ./README.txt hdfs://localhost:9000/hello/README.txt
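A quick round trip to confirm the client can read the file back (same paths as above):
> hdfs dfs -cat hdfs://localhost:9000/hello/README.txt
> hdfs dfs -get hdfs://localhost:9000/hello/README.txt /tmp/README.txt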
But I can not upload or download from the web console. Checking the browser developer tools, I found it is using the Docker container hostname and port 9864.
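The browser on the client machine therefore has to resolve that container hostname itself. One workaround, my own assumption rather than something documented here, is to map the hostname given to the container (--hostname rancher-worker1 in the Makefile below) to the Docker host IP on the client machine:
# on the client machine; 192.168.0.10 is only a placeholder for the Docker host IP
> echo "192.168.0.10 rancher-worker1" | sudo tee -a /etc/hosts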
https://my.oschina.net/u/3163032/blog/1622221
Official Website
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html
https://note.louyj.com/blog/post/louyj/Authentication-for-Hadoop-HTTP-web-consoles
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.http.authentication.token.validity</name>
  <value>12000</value>
</property>
<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>false</value>
</property>
<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>/tool/hadoop/hadoop-http-auth-signature-secret</value>
</property>
<property>
  <name>hadoop.http.authentication.cookie.domain</name>
  <value></value>
</property>
The hadoop-http-auth-signature-secret is a text file containing hello!123
This works
http://rancher-worker1:9870/explorer.html?user.name=hello!123#/
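The same user.name query parameter works against the WebHDFS REST API on the NameNode; a quick check with curl, reusing the value from the URL above:
> curl 'http://rancher-worker1:9870/webhdfs/v1/hello?op=LISTSTATUS&user.name=hello!123'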
Warning:
2019-12-08 01:56:21,717 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
It is only an INFO message; I do not know how to disable it right now.
The most important file is conf/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
  <property>
    <name>hadoop.http.filter.initializers</name>
    <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
  </property>
  <property>
    <name>hadoop.http.authentication.type</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.http.authentication.token.validity</name>
    <value>12000</value>
  </property>
  <property>
    <name>hadoop.http.authentication.simple.anonymous.allowed</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.http.authentication.signature.secret.file</name>
    <value>/tool/hadoop/hadoop-http-auth-signature-secret</value>
  </property>
  <property>
    <name>hadoop.http.authentication.cookie.domain</name>
    <value></value>
  </property>
</configuration>
Nothing special in conf/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
case ${HADOOP_OS_TYPE} in
Darwin*)
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
;;
esac
Secret password file conf/hadoop-http-auth-signature-secret
hello123
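It can be created like this, with tightened permissions since it is a shared secret:
> echo 'hello123' > conf/hadoop-http-auth-signature-secret
> chmod 600 conf/hadoop-http-auth-signature-secret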
Configuration file conf/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>0.0.0.0:9870</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9864</value>
  </property>
</configuration>
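Inside the container, hdfs getconf is a quick way to confirm which values Hadoop actually picks up from these files, for example:
> hdfs getconf -confKey dfs.replication
> hdfs getconf -confKey dfs.namenode.http-address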
All steps are in the Dockerfile
#Run a Hadoop HDFS server
#Prepare the OS
FROM ubuntu:16.04
MAINTAINER Carl Luo <luohuazju@gmail.com>
ENV DEBIAN_FRONTEND noninteractive
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
RUN apt-get -qq update
RUN apt-get -qqy dist-upgrade
#Prepare the dependencies
RUN apt-get install -qy wget unzip vim
RUN apt-get install -qy iputils-ping
#Install JAVA
RUN apt-get update && \
apt-get install -y --no-install-recommends locales && \
locale-gen en_US.UTF-8 && \
apt-get dist-upgrade -y && \
apt-get install -qy openjdk-8-jdk
#Prepare for hadoop and spark
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
RUN mkdir /tool/
WORKDIR /tool/
RUN wget http://apache-mirror.8birdsvideo.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
RUN tar zxvf hadoop-3.2.1.tar.gz
RUN ln -s /tool/hadoop-3.2.1 /tool/hadoop
ADD conf/core-site.xml /tool/hadoop/etc/hadoop/
ADD conf/hdfs-site.xml /tool/hadoop/etc/hadoop/
ADD conf/hadoop-env.sh /tool/hadoop/etc/hadoop/
ADD conf/hadoop-http-auth-signature-secret /tool/hadoop/hadoop-http-auth-signature-secret
#set up the app
EXPOSE 9870 9000 9864
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
Makefile to help me build the images
IMAGE=sillycat/public
TAG=ubuntu-hadoop-1.0
NAME=ubuntu-hadoop-1.0
HOSTNAME=rancher-worker1
docker-context:

build: docker-context
	docker build -t $(IMAGE):$(TAG) .

run:
	docker run -d -p 9870:9870 -p 9000:9000 -p 9864:9864 --hostname ${HOSTNAME} --name $(NAME) $(IMAGE):$(TAG)

debug:
	docker run -ti -p 9870:9870 -p 9000:9000 -p 9864:9864 --hostname ${HOSTNAME} --name $(NAME) $(IMAGE):$(TAG) /bin/bash

clean:
	docker stop ${NAME}
	docker rm ${NAME}

logs:
	docker logs ${NAME}

publish:
	docker push ${IMAGE}
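Typical usage of these targets, in order:
> make build
> make run
> make logs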
Shell script to start the service start.sh
#!/bin/sh -ex
#prepare ENV
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
#start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &
#start the service
cd /tool/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
tail -f /dev/null
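Note that start.sh reformats the NameNode on every container start, which wipes any existing HDFS metadata. A hedged sketch of a guard, my own addition and assuming the default name directory under /tmp/hadoop-root, would be:
#format only if the NameNode has never been formatted before
if [ ! -d /tmp/hadoop-root/dfs/name/current ]; then
  bin/hdfs namenode -format -nonInteractive
fi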
References:
https://phoenixnap.com/kb/how-to-enable-ssh-centos-7
https://serverfault.com/questions/824975/failed-to-get-d-bus-connection-operation-not-permitted
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
https://serverfault.com/questions/562756/how-to-remove-the-path-with-an-nginx-proxy-pass
Security
https://www.jianshu.com/p/51c39dfecff2
https://www.twblogs.net/a/5cfed4aebd9eee14029f459f/zh-cn