spark-2.4.0-hadoop2.7: Installation and Deployment
1. Host planning
| Hostname | IP address | OS | Deployed software | Running processes | Remarks |
|---|---|---|---|---|---|
| mini01 | 172.16.1.11 (internal), 10.0.0.11 (external) | CentOS 7.5 | jdk-8, zookeeper-3.4.5, hadoop2.7.6, hbase-2.0.2, kafka_2.11-2.0.0, spark-2.4.0-hadoop2.7 (master) | QuorumPeerMain | |
| mini02 | 172.16.1.12 (internal), 10.0.0.12 (external) | CentOS 7.5 | jdk-8, zookeeper-3.4.5, hadoop2.7.6, hbase-2.0.2, kafka_2.11-2.0.0 | QuorumPeerMain | |
| mini03 | 172.16.1.13 (internal), 10.0.0.13 (external) | CentOS 7.5 | jdk-8, zookeeper-3.4.5, hadoop2.7.6, hbase-2.0.2, kafka_2.11-2.0.0, spark-2.4.0-hadoop2.7 | QuorumPeerMain | |
| mini04 | 172.16.1.14 (internal), 10.0.0.14 (external) | CentOS 7.5 | jdk-8, zookeeper-3.4.5, hadoop2.7.6, hbase-2.0.2, spark-2.4.0-hadoop2.7 | QuorumPeerMain | |
| mini05 | 172.16.1.15 (internal), 10.0.0.15 (external) | CentOS 7.5 | jdk-8, zookeeper-3.4.5, hadoop2.7.6, hbase-2.0.2, spark-2.4.0-hadoop2.7 | QuorumPeerMain | |
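Note
The Spark cluster installed this way works, but it has one major weakness: the master node is a single point of failure. Removing it requires ZooKeeper plus at least two master nodes, so that the cluster stays available if one master goes down. That deployment is covered in the next installment.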
2. Passwordless login
Set up key-based, passwordless SSH login from mini01 to mini02, mini03, mini04, and mini05.
See the article: hadoop2.7.6_01_部署 (Hadoop 2.7.6 deployment)
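The referenced article contains the author's actual steps. As a rough sketch only, the key distribution could look like the following, assuming the `yun` user seen in the later transcripts and the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id`. The `run` and `setup_passwordless_ssh` helpers and the `DRY_RUN` switch are illustrative names, not part of the original setup.

```shell
#!/bin/sh
# Sketch: key-based login from mini01 to the other hosts.
# Assumptions: user "yun", OpenSSH client tools installed, host names resolvable.
SSH_USER="yun"
TARGET_HOSTS="mini02 mini03 mini04 mini05"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print or execute one command, depending on DRY_RUN.
# (In dry-run output an empty argument such as -N "" is not visible.)
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a key pair if none exists yet, then push the public key to each host.
setup_passwordless_ssh() {
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    fi
    for host in $TARGET_HOSTS; do
        run ssh-copy-id "$SSH_USER@$host"
    done
}

setup_passwordless_ssh
```

With the default `DRY_RUN=1` the script only lists what it would do. Running it with `DRY_RUN=0` executes the commands, and `ssh-copy-id` then asks once for each host's password.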
3. JDK (Java 8)
See the article: hadoop2.7.6_01_部署 (Hadoop 2.7.6 deployment)
4. Spark deployment steps
4.1. Spark installation
    [yun@mini01 software]$ pwd
    /app/software
    [yun@mini01 software]$ ll
    total 238572
    -rw-r--r-- 1 yun yun 227893062 Nov 19 21:24 spark-2.4.0-bin-hadoop2.7.tgz
    [yun@mini01 software]$ tar xf spark-2.4.0-bin-hadoop2.7.tgz
    [yun@mini01 software]$ mv spark-2.4.0-bin-hadoop2.7 /app/
    [yun@mini01 software]$ cd /app/
    [yun@mini01 ~]$ ln -s spark-2.4.0-bin-hadoop2.7/ spark
    [yun@mini01 ~]$ ll -d spark*
    drwxr-xr-x 13 yun yun 211 Oct 29 14:36 spark-2.4.0-bin-hadoop2.7
    lrwxrwxrwx  1 yun yun  26 Nov 24 14:23 spark -> spark-2.4.0-bin-hadoop2.7/
4.2. Environment variables
According to the host plan, the environment variables must be set on mini01, mini03, mini04, and mini05.
    # Adding the environment variables requires root privileges
    [root@mini01 ~]# tail /etc/profile
    ………………
    # Spark environment variables
    export SPARK_HOME="/app/spark"
    export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

    [root@mini01 ~]# logout
    [yun@mini01 conf]$ source /etc/profile    # reload the environment variables
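After reloading the profile, it can be worth confirming that both Spark directories really ended up on `PATH`. The following is a minimal sketch; `path_has_dir` is a hypothetical helper written for this check, and `/app/spark` is used as a fallback only because it is the value set above.

```shell
#!/bin/sh
# Sketch: check that the Spark bin and sbin directories are on PATH.
# path_has_dir is an illustrative helper, not a Spark tool.

# Succeeds when directory $1 is one of the colon-separated PATH entries.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the location used in this article if SPARK_HOME is unset.
SPARK_HOME="${SPARK_HOME:-/app/spark}"

for dir in "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
    if path_has_dir "$dir"; then
        echo "on PATH: $dir"
    else
        echo "missing from PATH: $dir"
    fi
done
```

Wrapping `PATH` in colons before matching avoids false positives from entries that merely share a prefix, such as `/app/spark/bin2`.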
4.3. Configuration changes
    [yun@mini01 conf]$ pwd
    /app/spark/conf
    [yun@mini01 conf]$ cp -a spark-env.sh.template spark-env.sh
    [yun@mini01 conf]$ tail spark-env.sh    # edit the environment configuration
    # Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
    # You might get better performance to enable these options if using native BLAS (see SPARK-21305).
    # - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
    # - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS

    # Configuration added below
    # Set JAVA_HOME
    export JAVA_HOME=/app/jdk
    # Host name of the master
    # (Spark 2.x documents this setting as SPARK_MASTER_HOST; SPARK_MASTER_IP is the older, deprecated name)
    export SPARK_MASTER_IP=mini01
    # Maximum memory each worker may use. My virtual machine has only 2 GB,
    # so this is set to 1024m (or 1g).
    # On a real server with 128 GB you could set it to 100g.
    export SPARK_WORKER_MEMORY=1024m
    # Maximum number of CPU cores each worker may use. My virtual machine has only one.
    # On a real server with 32 cores you could set it to 32.
    export SPARK_WORKER_CORES=1
    # Port for submitting applications. 7077 is the default; change it here if needed.
    export SPARK_MASTER_PORT=7077

    [yun@mini01 conf]$ pwd
    /app/spark/conf
    [yun@mini01 conf]$ cp -a slaves.template slaves
    [yun@mini01 conf]$ tail slaves    # edit the slaves configuration
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #

    # A Spark Worker will be started on each of the machines listed below.
    mini03
    mini04
    mini05
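The `slaves` file mixes license comments with the actual worker list. When scripting around it, a small filter that returns only the host names can be handy. This is a sketch, not something Spark ships; `list_workers` is a hypothetical helper, and the demo runs against a temporary copy rather than the real file.

```shell
#!/bin/sh
# Sketch: extract worker host names from a slaves file,
# dropping comments, surrounding whitespace, and blank lines.
# list_workers is an illustrative helper name.
list_workers() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

# Demo on a throwaway file holding the entries from this section.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
# A Spark Worker will be started on each of the machines listed below.
mini03
mini04

mini05
EOF
list_workers "$demo_file"
rm -f "$demo_file"
```

On the real cluster the call would be `list_workers /app/spark/conf/slaves`, which should print mini03, mini04, and mini05, one per line.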
4.4. Distribute to the other machines
Copy the installation to mini03, mini04, and mini05.
    [yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini03:/app    # copy to mini03
    [yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini04:/app    # copy to mini04
    [yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini05:/app    # copy to mini05
Then run the following on mini03, mini04, and mini05 (mini04 shown here):
    [yun@mini04 ~]$ pwd
    /app
    [yun@mini04 ~]$ ll -d spark-2.4.0-bin-hadoop2.7
    drwxr-xr-x 13 yun yun 211 Oct 29 14:36 spark-2.4.0-bin-hadoop2.7
    [yun@mini04 ~]$ ln -s spark-2.4.0-bin-hadoop2.7/ spark
    [yun@mini04 ~]$ ll -d spark*
    drwxr-xr-x 13 yun yun 211 Oct 29 14:36 spark-2.4.0-bin-hadoop2.7
    lrwxrwxrwx  1 yun yun  26 Nov 24 23:39 spark -> spark-2.4.0-bin-hadoop2.7/
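The copy and symlink steps are identical for every worker, so they can be folded into one loop run from `/app` on mini01. The following is a sketch under the assumption that passwordless SSH for `yun` is already in place; `run`, `distribute_spark`, and `DRY_RUN` are illustrative names. It uses `ln -sfn` instead of the plain `ln -s` from the transcript so that a rerun replaces the link rather than creating a new one inside the linked directory (GNU `ln` behaviour).

```shell
#!/bin/sh
# Sketch: copy the Spark directory to each worker and create the symlink there.
# Assumptions: run from /app on mini01, passwordless SSH as "yun".
SPARK_DIR="spark-2.4.0-bin-hadoop2.7"
WORKERS="mini03 mini04 mini05"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print or execute one command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# For every worker: copy the directory, then (re)create /app/spark remotely.
distribute_spark() {
    for host in $WORKERS; do
        run scp -pr "$SPARK_DIR/" "yun@$host:/app"
        run ssh "yun@$host" "cd /app && ln -sfn $SPARK_DIR/ spark"
    done
}

distribute_spark
```

By default this only prints the six commands. Set `DRY_RUN=0` to execute them.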
4.5. Start Spark
Run the following on mini01:
    [yun@mini01 sbin]$ pwd
    /app/spark/sbin
    [yun@mini01 sbin]$ ./start-all.sh    # use the stop-all.sh script to shut down
    starting org.apache.spark.deploy.master.Master, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.master.Master-1-mini01.out
    mini03: starting org.apache.spark.deploy.worker.Worker, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.worker.Worker-1-mini03.out
    mini05: starting org.apache.spark.deploy.worker.Worker, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.worker.Worker-1-mini05.out
    mini04: starting org.apache.spark.deploy.worker.Worker, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.worker.Worker-1-mini04.out
    [yun@mini01 ~]$
    [yun@mini01 ~]$ jps    # check the process status
    3103 Master
    3183 Jps
Processes on mini03:
    [yun@mini03 ~]$ jps
    2387 Worker
    2437 Jps
Processes on mini04:
    [yun@mini04 ~]$ jps
    2183 Jps
    2125 Worker
Processes on mini05:
    [yun@mini05 ~]$ jps
    2212 Worker
    2261 Jps
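Checking each host by eye works for three workers. For a scripted check, the `jps` output can be tested for the expected process name. The following is a minimal sketch; `has_jvm` is a hypothetical helper that reads `jps` output on standard input.

```shell
#!/bin/sh
# Sketch: test whether jps output contains a JVM with a given main class name.
# has_jvm is an illustrative helper; it reads jps output from stdin.
has_jvm() {
    # The second column of jps output is the main class name.
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Demo with the mini03 output shown above.
if printf '2387 Worker\n2437 Jps\n' | has_jvm Worker; then
    echo "Worker is running"
else
    echo "Worker is NOT running"
fi
```

On the cluster the input would come from the remote host, for example `ssh yun@mini03 jps | has_jvm Worker`. Depending on how the remote shell is configured, `jps` may not be on `PATH` in a non-interactive SSH session, in which case the full path to the JDK's `jps` is needed.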
4.6. Browser access
    http://mini01:8080/
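Besides the web UI on port 8080, applications reach the master through the `spark://` URL built from the host and the `SPARK_MASTER_PORT` configured earlier. The sketch below only assembles and prints these addresses; `master_url` and `webui_url` are illustrative helper names. The printed `spark-submit` line uses the SparkPi example and the jar path as I would expect them in this distribution, so check the path under `$SPARK_HOME/examples/jars` before using it.

```shell
#!/bin/sh
# Sketch: derive the cluster addresses from the settings used in this article.
MASTER_HOST="mini01"
MASTER_PORT="7077"   # SPARK_MASTER_PORT from spark-env.sh
WEBUI_PORT="8080"    # web UI port from the URL above

# URL that applications pass to --master.
master_url() { echo "spark://$MASTER_HOST:$MASTER_PORT"; }

# URL of the master's web UI.
webui_url() { echo "http://$MASTER_HOST:$WEBUI_PORT/"; }

echo "Web UI:     $(webui_url)"
echo "Master URL: $(master_url)"

# Printed only, not executed: an example submission (jar path is an assumption).
echo "spark-submit --master $(master_url) --class org.apache.spark.examples.SparkPi \$SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 10"
```

From a shell on the cluster network, `curl -sf "$(webui_url)" > /dev/null && echo "master UI is up"` gives a quick check without opening a browser.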