Implementing WordCount with Spark Operators
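All of the snippets below assume an existing SparkContext named sparkContext and an input file at hdfs://ifeng:9000/hdfsapi/wc.txt whose lines are comma-separated words. A minimal setup sketch (the application name and master URL are placeholders, not from the original):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master; adjust to the actual cluster.
val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
val sparkContext = new SparkContext(conf)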
1 map + reduceByKey
sparkContext.textFile("hdfs://ifeng:9000/hdfsapi/wc.txt")
  .flatMap(_.split(","))   // split each line into words on commas
  .map((_, 1))             // pair each word with an initial count of 1
  .reduceByKey(_ + _)      // sum the counts per word, with map-side combining
  .collect()
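For illustration only (not from the original), the same pipeline on a small in-memory collection shows the shape of the result without needing HDFS:

sparkContext.parallelize(Seq("spark,hadoop,spark", "hadoop,flink"))
  .flatMap(_.split(","))
  .map((_, 1))
  .reduceByKey(_ + _)
  .collect()               // e.g. Array((spark,2), (hadoop,2), (flink,1)); order may vary
  .foreach(println)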
2 countByValue instead of map + reduceByKey
countByValue() is an action: it counts each distinct element and returns a Map to the driver rather than an RDD, so it is only suitable when the set of distinct words fits in driver memory.
val RDDfile = sparkContext.textFile("hdfs://ifeng:9000/hdfsapi/wc.txt")
RDDfile.flatMap(_.split(",")).countByValue().foreach(println)
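Because the result is an ordinary Scala Map on the driver, it can be post-processed with plain collection operations. A hedged sketch (not in the original) printing the ten most frequent words:

RDDfile.flatMap(_.split(","))
  .countByValue()          // Map[String, Long] on the driver
  .toSeq
  .sortBy(-_._2)           // sort by descending count
  .take(10)
  .foreach(println)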
3 aggregateByKey
RDDfile.flatMap(_.split(",")).map((_, 1))
  .aggregateByKey(0)(_ + _, _ + _)   // zero value 0; seqOp adds within a partition, combOp adds across partitions
  .collect().foreach(println)
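Unlike reduceByKey, aggregateByKey lets the accumulator type differ from the value type. A hedged sketch (not from the original) that accumulates the Int 1s into Long counts:

RDDfile.flatMap(_.split(",")).map((_, 1))
  .aggregateByKey(0L)(
    (acc: Long, v: Int) => acc + v,   // seqOp: fold an Int value into the Long accumulator
    (a: Long, b: Long) => a + b       // combOp: merge accumulators from different partitions
  )
  .collect().foreach(println)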
4 foldByKey
foldByKey behaves like reduceByKey with an initial value; the value type and result type must be the same.
RDDfile.flatMap(_.split(",")).map((_, 1)).foldByKey(0)(_ + _).collect().foreach(println)
5 groupByKey + map
groupByKey shuffles every (word, 1) pair before the sums are computed, so for counting it is less efficient than reduceByKey, which combines on the map side.
RDDfile.flatMap(_.split(",")).map((_, 1)).groupByKey().map(tuple => {
  (tuple._1, tuple._2.sum)   // sum the grouped 1s for each word
}).collect().foreach(println)
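A slightly tidier variant (a sketch, not from the original) leaves the keys untouched by using mapValues, which also preserves the partitioner:

RDDfile.flatMap(_.split(",")).map((_, 1)).groupByKey().mapValues(_.sum).collect().foreach(println)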
6 combineByKey
RDDfile.flatMap(_.split(",")).map((_, 1)).combineByKey(
  x => x,                     // createCombiner: the first 1 seen for a word becomes its initial count
  (x: Int, y: Int) => x + y,  // mergeValue: add further 1s within a partition
  (x: Int, y: Int) => x + y   // mergeCombiners: add partial counts across partitions
).collect().foreach(println)
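combineByKey is the most general of these operators: the combiner type can differ from the value type. A hedged sketch (not in the original) that first collects the 1s for each word into a List and only then sums them, purely to illustrate the three functions:

RDDfile.flatMap(_.split(",")).map((_, 1)).combineByKey(
  (v: Int) => List(v),                     // createCombiner: start a List[Int] for a new word
  (acc: List[Int], v: Int) => v :: acc,    // mergeValue: prepend a value within a partition
  (a: List[Int], b: List[Int]) => a ::: b  // mergeCombiners: concatenate lists across partitions
).mapValues(_.sum).collect().foreach(println)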