jupyter notebook + pyspark environment setup
Main idea and steps:
1. Install the Anaconda environment as usual.
2. conda/pip install findspark
# This step is important. findspark's purpose: "Provides findspark.init() to make pyspark importable as a regular library." (A quick sanity check follows this list.)
3. Install and configure Jupyter Notebook as usual, and start it in the background.
4. Open the Jupyter Notebook page in a browser and write code.
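Before moving on, the findspark installation from step 2 can be verified in a notebook cell. This is a minimal sketch; it assumes SPARK_HOME is set, or that Spark is installed in a standard location findspark can discover:

import findspark
findspark.init()            # locate the Spark installation and add pyspark to sys.path
import pyspark
print(pyspark.__version__)  # should print the version of the Spark installation found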
wordCount example:
import findspark
findspark.init()

from pyspark import SparkConf, SparkContext

# Configure the application and point it at the standalone cluster master
conf = SparkConf().setAppName('testApp') \
    .setMaster('spark://mdw-1:7077') \
    .set('spark.executor.memory', '2g') \
    .set('spark.executor.cores', '2') \
    .set('spark.cores.max', '56')
sc = SparkContext(conf=conf)

# Word count on a local file
textFile = sc.textFile('file:///usr/local/spark/README.md')
wordCount = textFile.flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
for x in sorted(wordCount.collect()):
    print(x)

print('\n' * 2, '*' * 20, '\n' * 2)

# Word count on a file stored in HDFS
textFile = sc.textFile('hdfs://mdw-1:9000/user/bda/README.md')
wordCount = textFile.flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
for x in sorted(wordCount.collect()):
    print(x)

sc.stop()
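If a standalone cluster such as spark://mdw-1:7077 is not available yet, the same word count can be run in local mode. This is a minimal sketch under that assumption, reusing the local README path from above:

import findspark
findspark.init()
from pyspark import SparkConf, SparkContext

# Use all local cores instead of a standalone cluster master
conf = SparkConf().setAppName('testAppLocal').setMaster('local[*]')
sc = SparkContext(conf=conf)

counts = sc.textFile('file:///usr/local/spark/README.md') \
    .flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
print(counts.take(5))   # print a small sample of (word, count) pairs
sc.stop()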
5. Alternatively, pyspark can also be integrated into the IPython environment.
Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when edit_profile is set to true.
# Start IPython with the target profile (run in a shell)
ipython --profile=myprofile
# Inside that IPython session, write a startup file into the profile
findspark.init('/path/to/spark_home', edit_profile=True)
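Once the startup file is in place, a new IPython session started with the same profile can use pyspark directly, without calling findspark.init() again. A minimal sketch; the master setting and app name here are illustrative assumptions:

# In a fresh `ipython --profile=myprofile` session:
from pyspark import SparkContext
sc = SparkContext(master='local[*]', appName='profileTest')
print(sc.parallelize(range(10)).sum())  # expect 45
sc.stop()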
Reference articles:
How to configure an Apache Spark standalone cluster and integrate with Jupyter: Step-by-Step
Configuring IPython Notebook to run Python Spark programs (配置Ipython Notebook 运行 Python Spark 程序)
jupyter notebook + pyspark environment setup (jupyter notebook + pyspark 环境搭建)
Reposted from: https://my.oschina.net/goopand/blog/2963135