Table join demo
程序员文章站
2022-07-03 23:21:26
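1. Create two tables: the first holds each student's name and home province, the second holds each student's name and English score. Use a map-reduce style program to compute the average English score of the students from each province.
// Table 1: student name, home province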
chenhanghang beijing
lihao shandong
qufang shanxi
zhangsan beijing
// Table 2: student name, English score
chenhanghang 100
lihao 87
qufang 99
zhangsan 88
import org.apache.spark.{SparkConf, SparkContext}

object Average {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("averageScore").setMaster("local")
    val sc = new SparkContext(conf)

    // Each input line has two fields separated by a single space
    val place = sc.textFile("F:\\1.txt")
    val score = sc.textFile("F:\\2.txt")

    // Table 1 as (name, province) pairs
    val placeRDD = place.map(line => {
      val words = line.split(" ")
      (words(0), words(1))
    })
    // Table 2 as (name, score) pairs
    val scoreRDD = score.map(line => {
      val words = line.split(" ")
      (words(0), words(1).toDouble)
    })

    // Join the two tables on name, keep only (province, score),
    // then group by province and average the scores
    val avgRDD = placeRDD.join(scoreRDD).values.groupByKey()
      .mapValues(scores => scores.sum / scores.size)

    avgRDD.foreach(println)
    sc.stop()
  }
}
Result (the order of the lines may differ between runs)
(shandong,87.0)
(shanxi,99.0)
(beijing,94.0)
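The numbers above can be cross-checked without a Spark installation. The following is only a sketch using plain Scala collections, not the Spark job itself: `AverageSketch` and `averageByProvince` are hypothetical names introduced here, and the sketch assumes each student name appears at most once in the score table.

```scala
object AverageSketch {
  // Hypothetical helper (not part of the original program):
  // joins (name, province) with (name, score) and averages per province.
  def averageByProvince(
      places: Seq[(String, String)],
      scores: Seq[(String, Double)]
  ): Map[String, Double] = {
    // Assumption: names are unique, so a Map lookup can stand in for the join
    val scoreByName: Map[String, Double] = scores.toMap

    // Inner join on name: students missing from either table are dropped
    val joined: Seq[(String, Double)] = places.flatMap {
      case (name, province) =>
        scoreByName.get(name).map(score => (province, score))
    }

    // Group by province and average the scores in each group
    joined.groupBy(_._1).map { case (province, rows) =>
      val values = rows.map(_._2)
      (province, values.sum / values.size)
    }
  }

  def main(args: Array[String]): Unit = {
    val places = Seq(
      ("chenhanghang", "beijing"),
      ("lihao", "shandong"),
      ("qufang", "shanxi"),
      ("zhangsan", "beijing")
    )
    val scores = Seq(
      ("chenhanghang", 100.0),
      ("lihao", 87.0),
      ("qufang", 99.0),
      ("zhangsan", 88.0)
    )
    averageByProvince(places, scores).foreach(println)
  }
}
```

For beijing the two joined scores are 100 and 88, so the average is (100 + 88) / 2 = 94.0, which matches the Spark output; shandong and shanxi each have a single student, so their averages are just that student's score.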