Pseudo-distributed installation of Hadoop 2.9.1 and Hive 2.3.4 on Ubuntu 16.04
I. Install the Ubuntu operating system
Reference: https://www.cnblogs.com/alier/p/6337151.html
II. Download Hadoop and Hive
Hadoop: https://hadoop.apache.org/releases.html
Hive: http://hive.apache.org/downloads.html
III. Install Hadoop
1. Preparation
sudo useradd -m hadoop -s /bin/bash   # create the hadoop user
sudo passwd hadoop                    # set a password for the hadoop user; you will be asked to type it twice
sudo adduser hadoop sudo              # give the hadoop user administrator (sudo) rights
su - hadoop                           # switch to the hadoop user
sudo apt-get update
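Before moving on, it can help to confirm that the new account really ended up in the sudo group. The following is a minimal sketch; `user_in_group` is a hypothetical helper name that does not appear in the original steps, and it relies only on the standard `id` command.

```shell
# Hypothetical helper: succeed if user $1 belongs to group $2.
user_in_group() {
    # id -nG prints the user's group names separated by spaces
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Usage (assumes the hadoop user created above):
#   user_in_group hadoop sudo && echo "hadoop can use sudo"
```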
2. Install SSH and set up passwordless login
sudo apt-get install openssh-server    # install the SSH server
ssh localhost                          # log in over SSH; answer yes on the first login
exit                                   # leave the ssh localhost session
cd ~/.ssh/                             # if this directory does not exist, run ssh localhost once
ssh-keygen -t rsa
cat ./id_rsa.pub >> ./authorized_keys  # authorize the key
ssh localhost                          # no password should be needed now
hadoop@ge-hadoop:~$ ssh localhost
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)
 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
312 packages can be updated.
14 updates are security updates.
Last login: Mon Mar 11 21:37:12 2019 from 127.0.0.1
hadoop@ge-hadoop:~$
If you are logged in without a password prompt, as shown above, the setup works.
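If ssh localhost still asks for a password, one common cause is overly permissive modes on the .ssh directory or on authorized_keys. The sketch below tightens them; `fix_ssh_perms` is a hypothetical helper name, and the 700/600 modes are the conventional values rather than something stated in the steps above.

```shell
# Hypothetical helper: restrict permissions on an .ssh directory ($1, default ~/.ssh).
fix_ssh_perms() {
    ssh_dir="${1:-$HOME/.ssh}"
    chmod 700 "$ssh_dir" || return 1
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}

# Usage (not run automatically):
#   fix_ssh_perms ~/.ssh
#   ssh -o BatchMode=yes localhost true && echo "passwordless login works"
```

BatchMode=yes makes ssh fail instead of prompting, which is convenient for a non-interactive check.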
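3. Install and configure Java
Java was already configured when this post was written, so the installation itself is not shown.
Reference: https://www.linuxidc.com/linux/2015-01/112030.htm
The key part is setting the Java environment variables, as follows: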
sudo gedit ~/.bashrc      # add the following line to this file
export JAVA_HOME=/usr/java/jdk1.8.0_201   # your Java installation path
sudo gedit /etc/profile   # add the following lines to this file
export JAVA_HOME=/usr/java/jdk1.8.0_201
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
JAVA_HOME is the path where you installed Java.
After editing both files, run the following so the settings take effect immediately:
source ~/.bashrc
source /etc/profile
In a terminal, run:
hadoop@ge-hadoop:~$ java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
This output shows that Java is installed and configured correctly.
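Since Hadoop and Hive both read JAVA_HOME later on, a quick sanity check of the variable can save some debugging. This is only a sketch; `check_java_home` is a hypothetical helper name, and it assumes a JDK layout with bin/java under JAVA_HOME.

```shell
# Hypothetical helper: verify that JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java under $JAVA_HOME/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $JAVA_HOME"
}

# Usage: run after "source /etc/profile"
#   check_java_home
```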
4. Install and configure Hadoop
Extract the Hadoop archive into /usr/local:
sudo tar -zxvf hadoop-2.9.1.tar.gz -C /usr/local
cd /usr/local
sudo mv hadoop-2.9.1 hadoop
sudo chown -R hadoop ./hadoop   # make the hadoop user the owner of the files
Add the Hadoop environment variables:
sudo gedit /etc/profile   # add the following lines
export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
source /etc/profile       # apply the change immediately
Check the Hadoop version:
hadoop@ge-hadoop:~$ hadoop version
Hadoop 2.9.1
Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.9.1.jar
This output shows that Hadoop is installed.
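The profile edits in this section are easy to repeat by accident, which leaves duplicate export lines behind. As a sketch of one way to avoid that, the hypothetical helper `append_once` below adds a line to a file only if that exact line is not already present. Writing to /etc/profile itself needs root, so the function would have to be called from a root shell in that case.

```shell
# Hypothetical helper: append line $2 to file $1 unless the exact line already exists.
append_once() {
    file="$1"
    line="$2"
    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (file name chosen for illustration):
#   append_once ~/.bashrc 'export HADOOP_HOME=/usr/local/hadoop'
```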
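5. Configure pseudo-distributed mode
This mainly involves editing the Hadoop configuration files.
cd /usr/local/hadoop/etc/hadoop   # configuration directory
sudo vim hadoop-env.sh            # add the following line to this file
export JAVA_HOME=/usr/java/jdk1.8.0_201
sudo vim hdfs-site.xml
Change the file to: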
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>127.0.0.1:50090</value>
</property>
</configuration>
Next, edit core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
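Property names in these files are case-sensitive, so it is worth reading a value back after editing. The sketch below uses a hypothetical helper `get_prop` that assumes each name and its value sit on separate, consecutive lines, as in the listings above; it is not a general XML parser. On a working installation, hdfs getconf -confKey fs.defaultFS reports the value Hadoop actually resolved.

```shell
# Hypothetical helper: print the <value> that follows <name>$2</name> in file $1.
# Assumes <name> and <value> are on separate lines, as in the listings above.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # skip the 7 characters of <value> and drop the 8 of </value>
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Usage:
#   get_prop /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```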
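Finally, format the NameNode from the Hadoop installation directory:
cd /usr/local/hadoop
./bin/hdfs namenode -format   # running this more than once tends to cause errors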
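One commonly reported cause of errors after repeated formatting is that the NameNode gets a new cluster ID while the DataNode directory keeps the old one. A simple guard is to skip the format step when the name directory already holds metadata. The sketch below is an assumption-laden illustration: `namenode_formatted` is a hypothetical helper, and the default path is the dfs.namenode.name.dir value configured above.

```shell
# Hypothetical guard: succeed if the NameNode directory $1 already holds metadata.
namenode_formatted() {
    name_dir="${1:-/usr/local/hadoop/tmp/dfs/name}"
    # a formatted NameNode directory contains current/VERSION
    [ -f "$name_dir/current/VERSION" ]
}

# Usage (run from /usr/local/hadoop):
#   if namenode_formatted; then
#       echo "NameNode already formatted, skipping"
#   else
#       ./bin/hdfs namenode -format
#   fi
```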
./sbin/start-dfs.sh   # start the services
jps                   # check which daemons are running
hadoop@ge-hadoop:/usr/local/hadoop$ jps
5632 ResourceManager
5457 SecondaryNameNode
6066 Jps
5238 DataNode
5113 NameNode
5756 NodeManager
hadoop@ge-hadoop:/usr/local/hadoop$
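Reading the jps listing by eye works, but the check can also be scripted. The sketch below uses a hypothetical helper, `check_hdfs_daemons`, that reads jps output on standard input and reports which HDFS daemons are missing. ResourceManager and NodeManager are YARN daemons and are not started by start-dfs.sh, so they are deliberately left out of the check.

```shell
# Hypothetical helper: read jps output on stdin and verify the HDFS daemons are listed.
check_hdfs_daemons() {
    out="$(cat)"
    missing=0
    for d in NameNode DataNode SecondaryNameNode; do
        # compare the second column exactly, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$out" | awk '{print $2}' | grep -qx "$d"; then
            echo "not running: $d" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   jps | check_hdfs_daemons && echo "HDFS daemons are up"
```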
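Once the daemons are up, you can open the NameNode web UI (http://localhost:50070 by default in Hadoop 2.x) to view NameNode and DataNode information and to browse the files in HDFS.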
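IV. Install and configure Hive
1. Preparation
Install MySQL.
Download mysql-connector-java: https://dev.mysql.com/downloads/connector/j/ (pick a version that matches your MySQL server, otherwise connection errors are likely)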
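2. Configure MySQL
mysql -u root -p   # log in to MySQL as root
create database hive;
use hive;
create table user(host char(20),user char(10),password char(20));
insert into user(host,user,password) values("localhost","hive","hive");   # create user hive with password hive
flush privileges;
grant all privileges on *.* to 'hive'@'localhost' identified by 'hive';
flush privileges;
Note that it is the grant ... identified by statement that actually creates the hive account on MySQL 5.x; this form of grant was removed in MySQL 8.0.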
3. Install and configure Hive
sudo tar -zxvf apache-hive-2.3.4-bin.tar.gz -C /usr/local/
cd /usr/local
sudo mv apache-hive-2.3.4-bin hive
sudo vim /etc/profile   # add the following lines
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
After saving, remember to run source /etc/profile so the change takes effect.
Create the configuration files from the templates in hive/conf:
cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
Edit hive-env.sh and set the Hadoop installation path:
HADOOP_HOME=/usr/local/hadoop
Edit hive-site.xml and set the metastore database connection details:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
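It is best to find these properties where they already exist in the file and change only their value, rather than adding new entries; otherwise errors are likely.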
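Because hive-site.xml starts as a copy of the full template, pasting the properties in as new entries leaves each name defined twice. A quick way to spot that is to list repeated name elements. This is a rough sketch: `find_duplicate_props` is a hypothetical helper that works on the text of the file and does not parse XML.

```shell
# Hypothetical helper: print every <name>...</name> entry that occurs more than once in file $1.
find_duplicate_props() {
    grep -o '<name>[^<]*</name>' "$1" | sort | uniq -d
}

# Usage:
#   find_duplicate_props /usr/local/hive/conf/hive-site.xml
```

No output means no property name is repeated.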
Edit hive-config.sh under hive/bin and add:
export JAVA_HOME=/usr/java/jdk1.8.0_201
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
Extract mysql-connector-java-5.1.47.tar.gz:
sudo tar -zxvf mysql-connector-java-5.1.47.tar.gz -C /usr/local
Copy the mysql-connector-java-5.1.47.jar file from the extracted directory into hive/lib.
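If the driver jar is missing from hive/lib, the metastore cannot load the JDBC driver class, so a quick check before initializing the schema is useful. The sketch below is illustrative only: `has_mysql_connector` is a hypothetical helper, and the cp path in the usage comment assumes the archive unpacked into /usr/local/mysql-connector-java-5.1.47.

```shell
# Hypothetical helper: succeed if a mysql-connector-java jar exists in directory $1.
has_mysql_connector() {
    lib_dir="${1:-/usr/local/hive/lib}"
    for jar in "$lib_dir"/mysql-connector-java-*.jar; do
        # if nothing matches, the glob stays literal and -f fails
        [ -f "$jar" ] && return 0
    done
    echo "no mysql-connector-java jar in $lib_dir" >&2
    return 1
}

# Usage (source path is an assumption about where the archive unpacked):
#   sudo cp /usr/local/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar /usr/local/hive/lib/
#   has_mysql_connector /usr/local/hive/lib
```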
Initialize the Hive metastore database:
schematool -dbType mysql -initSchema
Start Hive:
hadoop@ge-hadoop:/usr/local/hive$ bin/hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
Done. You will probably run into quite a few errors during the installation; if you have questions, feel free to leave a comment.