
Installing Jupyter Notebook on CentOS 7 and connecting to a Spark cluster with pyspark

程序员文章站 2024-02-23 10:43:22

Jupyter Notebook installation tutorial

Install Jupyter

# Install Jupyter
pip install jupyter

First, open a Python shell and generate the Jupyter login password.

The login password is set to root-123456.
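In classic Jupyter Notebook the password hash is produced in a Python shell with notebook.auth.passwd(). The 'sha1:salt:digest' string it returns (the same format as the hash placed in the config file below) can be sketched in pure Python; a minimal sketch, assuming the classic sha1 scheme (newer Jupyter versions default to argon2):

```python
import hashlib
import random

def make_notebook_password_hash(password, salt_len=12):
    # Classic Jupyter stores passwords as 'algorithm:salt:digest'.
    # This mimics the sha1 variant that notebook.auth.passwd() produces.
    salt = '%0*x' % (salt_len, random.getrandbits(4 * salt_len))
    digest = hashlib.sha1((password + salt).encode('utf-8')).hexdigest()
    return 'sha1:%s:%s' % (salt, digest)

def verify(password, hashed):
    # Recompute the digest from the stored salt and compare.
    algo, salt, digest = hashed.split(':')
    return hashlib.new(algo, (password + salt).encode('utf-8')).hexdigest() == digest

h = make_notebook_password_hash('root-123456')
print(verify('root-123456', h))  # True
```

In practice you would run notebook.auth.passwd() interactively and paste the resulting string into c.NotebookApp.password.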

Next, generate a self-signed SSL certificate:

openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
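The command above writes the private key and the certificate into the same PEM file, prompting interactively for the subject fields. A quick way to sanity-check the result (the /tmp path and CN here are hypothetical; -subj is added only to skip the prompts):

```shell
# Generate a one-year self-signed certificate, key and cert in one PEM file.
openssl req -x509 -nodes -days 365 -newkey rsa:1024 \
  -subj "/CN=jupyter-test" \
  -keyout /tmp/mycert.pem -out /tmp/mycert.pem

# Print the subject to confirm the certificate parses correctly.
openssl x509 -in /tmp/mycert.pem -noout -subject
```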


Configure Jupyter Notebook

Generate the configuration file
jupyter-notebook --generate-config


Edit the /root/.jupyter/jupyter_notebook_config.py file
vim /root/.jupyter/jupyter_notebook_config.py
c.NotebookApp.password = u'sha1:bb281d407795:9fbe40980e25c5c092ad2c94e801b84d989e12ea'
c.NotebookApp.certfile = u'/moudle/jupyter/mycert.pem'
c.NotebookApp.ip = '10.177.33.45'  # IP of this machine
c.NotebookApp.port = 9999  # port


Start Jupyter

jupyter notebook --ip=10.177.33.45 --no-browser --allow-root


Open Jupyter in a local browser

https://10.177.33.45:9999/

Connecting Jupyter to pyspark

Set the environment variables (in /etc/profile)
# Spark installation directory
export SPARK_HOME=/moudle/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
# Point PYTHONPATH at the python folder under the Spark directory and the bundled py4j package
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
# Use python3
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=10.177.33.45 --no-browser --allow-root"
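What the export lines above compute can be mirrored in plain Python to check the resulting PATH and PYTHONPATH values; a minimal sketch (extend_env is a hypothetical helper, and /moudle/spark is the install path assumed throughout this article):

```python
import os

def extend_env(env, spark_home):
    """Mirror the shell exports: add Spark's sbin/bin to PATH and its
    python sources plus the bundled py4j zip to PYTHONPATH."""
    env = dict(env)  # do not mutate the caller's mapping
    env['SPARK_HOME'] = spark_home
    env['PATH'] = os.pathsep.join(
        [env.get('PATH', ''), spark_home + '/sbin', spark_home + '/bin'])
    env['PYTHONPATH'] = os.pathsep.join(filter(None, [
        spark_home + '/python',
        spark_home + '/python/lib/py4j-0.10.7-src.zip',
        env.get('PYTHONPATH', ''),
    ]))
    return env

env = extend_env({'PATH': '/usr/bin'}, '/moudle/spark')
print(env['PYTHONPATH'])
```

With PYSPARK_DRIVER_PYTHON set to jupyter, the pyspark launcher starts Jupyter as its driver front end using these paths.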

Apply the settings

source /etc/profile

Start pyspark

pyspark


Enter https://10.177.33.45:9999/ (the port configured above) in the browser

A SparkContext is started by default:

SparkContext(app=PySparkShell, master=local[*]) created by <module> at /home/biapp/miniconda3/lib/python3.7/site-packages/IPython/utils/py3compat.py:168 


Test pyspark code

# A SparkContext is started by default, so there is no need to import or create one
#from pyspark import SparkContext
#sc = SparkContext("local", "count app")
words = sc.parallelize(
    ["scala",
     "java",
     "hadoop",
     "spark",
     "python",
     "C++",
     "pyspark"]
)
words_map = words.map(lambda x: (x, 1))
mapping = words_map.collect()
print("key value pair -> %s" % (mapping))
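Without a cluster at hand, the map step above can be checked locally. The sketch below is a pure-Python stand-in (not Spark itself) and also shows the reduce step that an RDD's reduceByKey(lambda a, b: a + b) would perform on the same pairs:

```python
words = ["scala", "java", "hadoop", "spark", "python", "C++", "pyspark"]

# map step: what words.map(lambda x: (x, 1)) produces
mapping = [(w, 1) for w in words]

# reduce step: sum the counts per key, as reduceByKey would
counts = {}
for w, n in mapping:
    counts[w] = counts.get(w, 0) + n

print(sorted(counts.items()))
```

Each word appears once here, so every count is 1; with duplicated input words the reduce step would accumulate them per key.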
