First Spark Program
程序员文章站
2022-04-01 15:41:35
There are two ways to run a program on Spark:
- import the dependency in spark-shell, write the code, and execute it there
- write the application in IDEA, then submit it to Spark with spark-submit
Spark-Shell
Open spark-shell and enter a program that reads the contents of the file data.txt:
val lines = sc.textFile("/Users/jeremy/Documents/data.txt")
lines.first()
lines.count()
The contents of the file (screenshot omitted):
The result (screenshot omitted):
In the shell a SparkContext has already been created for us; it is the sc used above.
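The two actions in the spark-shell session above can be sketched with plain Scala collections, with a List standing in for the RDD of lines (the sample lines here are hypothetical stand-ins for the contents of data.txt):

```scala
// Plain-Scala analogue of the spark-shell session above (no Spark needed).
// The sample lines are hypothetical stand-ins for the contents of data.txt.
val lines = List("hello spark", "hello world")
println(lines.head)  // what lines.first() returns: the first line of the file
println(lines.size)  // what lines.count() returns: the number of lines
```

On a real RDD, first() and count() are actions: they trigger computation and return results to the driver, unlike lazy transformations such as map or flatMap.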
Building a jar in IDEA and submitting it to Spark
Add the dependency in sbt:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1"
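A minimal build.sbt around that dependency line might look as follows; the project name, version, and Scala version are assumptions, not taken from the original article:

```scala
// Minimal build.sbt sketch; name and version are hypothetical.
name := "wordcount"
version := "0.1"
scalaVersion := "2.12.12"  // Spark 3.0.1 is built against Scala 2.12

// "provided" keeps spark-core out of the jar, since the cluster supplies it
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1" % "provided"
```

Marking spark-core as "provided" matters mainly when building a fat jar; for a thin jar submitted with spark-submit, the cluster's own Spark classes are used either way.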
Write the WordCount application:
import org.apache.spark._

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    val input = sc.textFile("/Users/jeremy/Documents/data.txt")
    // Split each line into words
    val lines = input.flatMap(line => line.split(' '))
    // Pair each word with 1, then sum the counts per word
    val count = lines.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // saveAsTextFile returns Unit and writes a directory of part-* files
    count.saveAsTextFile("/Users/jeremy/Documents/res.txt")
    sc.stop()
  }
}
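The transformations in this job can be checked locally with plain Scala collections; groupMapReduce (Scala 2.13+) plays the role of the map + reduceByKey pair, and the input lines here are made up for illustration:

```scala
// Local sketch of the WordCount pipeline using Scala collections (Scala 2.13+).
// The input lines are hypothetical; in the real job they come from textFile.
val input = List("hello spark", "hello world")
val words = input.flatMap(_.split(' '))                    // the flatMap step
val counts = words.groupMapReduce(identity)(_ => 1)(_ + _) // map + reduceByKey
// counts("hello") == 2, counts("spark") == 1, counts("world") == 1
```

The difference on a cluster is that reduceByKey combines counts per partition before shuffling, so only partial sums travel over the network.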
Packaging
Create a wordcount artifact: File -> Project Structure -> Artifacts -> + -> JAR -> Empty
Then run Build -> Build Artifacts and click Build; the jar is generated under the project's out directory.
Start Spark
Start the master:
./sbin/start-master.sh
Once the master is up, open http://localhost:8080/ to view its status; at this point no workers are registered.
Start a worker with ./bin/spark-class:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://JeremydeiMac-Pro.local:7077
The master UI now shows the worker.
Submit the packaged jar with ./bin/spark-submit:
./bin/spark-submit --master spark://JeremydeiMac-Pro.local:7077 --class WordCount /Users/jeremy/Project/IdeaProjects/Test/out/artifacts/wordcount/wordcount.jar
The master UI shows that the wordcount job has completed.
Result
Looking at the path passed to saveAsTextFile, the output is a res.txt directory containing part-* files, and the word counts in them are correct.