Writing WordCount for Spark in Eclipse
程序员文章站
2022-05-22 17:06:50
...
Install Spark: see the previous article
http://blackproof.iteye.com/blog/2182393
Set up the Windows development environment
Install Scala on Windows
Download Scala from http://www.scala-lang.org/files/archive/scala-2.10.4.msi
and run the installer.
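To check that the installation works, here is a minimal sketch (the object name ScalaVersionCheck is hypothetical) that prints the version of the Scala library it runs against. Assuming the installer put scalac and scala on the PATH, compile it with `scalac ScalaVersionCheck.scala` and run it with `scala ScalaVersionCheck`; with the 2.10.4 installer it should print a line like `version 2.10.4`.

```scala
// Hypothetical sanity check for a fresh Scala install:
// prints the version string of the scala-library on the classpath.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // versionString has the form "version <number>", e.g. "version 2.10.4"
    println(scala.util.Properties.versionString)
  }
}
```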
Set up Eclipse on Windows
Download Eclipse
and unzip it.
Write the WordCount Scala code
Create a Scala project.
Create a new Scala object named WordCount.
The code is as follows:
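package com.dirk.test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
// Implicit conversion that provides reduceByKey on RDD[(K, V)] (required before Spark 1.3)
import org.apache.spark.SparkContext.rddToPairRDDFunctions
import scala.collection.mutable.ListBuffer

object WordCount {
  def main(args: Array[String]): Unit = {
    if (args.length != 3) {
      println("usage: com.dirk.test.WordCount <master> <input> <output>")
      return
    }
    // Location where aa.jar is deployed on the Spark server
    val jars = ListBuffer[String]()
    jars += "/home/hadoop-cdh/app/test/sparktest/aa.jar"
    val conf = new SparkConf()
      .setMaster(args(0))        // Spark master URL
      .setAppName("word count")
      .setJars(jars)             // ship aa.jar to the executors; fixes the class-not-found error
      // Values set directly on SparkConf take precedence over spark-submit flags,
      // so this 200m overrides --executor-memory 512M in the launch script
      .set("spark.executor.memory", "200m")
    val sc = new SparkContext(conf)
    // Split each line on spaces, pair every word with 1, then sum the counts per word
    val result = sc.textFile(args(1))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(2))
    sc.stop()
  }
}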
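To see what the flatMap / map / reduceByKey chain computes, here is a sketch that runs the same steps on a plain Scala Seq, without Spark. LocalWordCount is a hypothetical name, and groupBy followed by a sum stands in for reduceByKey, which on a cluster merges the per-word counts across partitions.

```scala
// Hypothetical local stand-in for the Spark pipeline above, for checking the logic only.
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // one element per space-separated token
      .map(word => (word, 1))  // pair each word with 1
      .groupBy(_._1)           // local replacement for reduceByKey's shuffle
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val counts = count(Seq("hello spark", "hello scala"))
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running main prints (hello,2), (scala,1) and (spark,1) on separate lines. Note that split(" ") turns consecutive spaces into empty-string tokens, which are then counted as a "word"; splitting on "\\s+" and dropping empty strings avoids this, in the Spark version as well.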
Package the Scala project as a jar, the same way as for a Java project, and name it aa.jar.
Deploy the jar to the Spark server at
/home/hadoop-cdh/app/test/sparktest/aa.jar
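Since a missing class was one of the problems hit here, it can help to confirm that the exported aa.jar really contains the compiled object before deploying it. A sketch using the JDK's java.util.jar.JarFile; JarCheck and containsClass are hypothetical names.

```scala
import java.util.jar.JarFile

// Hypothetical helper: does the jar contain the .class file for a fully qualified class name?
object JarCheck {
  def containsClass(jarPath: String, className: String): Boolean = {
    // com.dirk.test.WordCount -> com/dirk/test/WordCount.class
    val entryName = className.replace('.', '/') + ".class"
    val jar = new JarFile(jarPath)
    try jar.getJarEntry(entryName) != null
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to the jar to inspect, e.g. aa.jar
    val found = containsClass(args(0), "com.dirk.test.WordCount")
    println(if (found) "WordCount found" else "WordCount missing")
  }
}
```

The JDK's own `jar tf aa.jar` lists the same entries from the command line; the helper is only useful if you want the check inside a build or deploy step.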
Write the launch script
#!/usr/bin/env bash
$SPARK_HOME/bin/spark-submit \
  --name SparkWordCount \
  --class com.dirk.test.WordCount \
  --master spark://host143:7077 \
  --executor-memory 512M \
  --total-executor-cores 1 \
  aa.jar \
  spark://host143:7077 \
  hdfs://XXX/user/dirk.zhang/data/word.txt \
  hdfs://XXX/user/dirk.zhang/output
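saveAsTextFile writes a directory of part-NNNNN files (typically plus a _SUCCESS marker), one line per element, and each (word, count) tuple is written via its toString, i.e. as (word,count). To inspect the result, here is a sketch that parses those lines, assuming the output directory was first copied to local disk, for example with `hdfs dfs -get hdfs://XXX/user/dirk.zhang/output`. ReadWordCountOutput is a hypothetical name.

```scala
import java.io.File
import scala.io.Source

// Hypothetical helper: read a locally copied saveAsTextFile output directory
// and parse each "(word,count)" line back into a map.
object ReadWordCountOutput {
  def parseLine(line: String): (String, Int) = {
    val body = line.stripPrefix("(").stripSuffix(")")
    val comma = body.lastIndexOf(',') // split on the last comma: the word may contain commas
    (body.substring(0, comma), body.substring(comma + 1).toInt)
  }

  def read(dir: File): Map[String, Int] = {
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.getName.startsWith("part-")) // skip _SUCCESS and other marker files
      .toList
    parts.flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().filter(_.nonEmpty).map(parseLine).toList
      finally src.close()
    }.toMap
  }
}
```

Because reduceByKey leaves each word in exactly one partition, merging all part files with toMap does not lose counts.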
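Problems encountered
1. Arguments: argument 1 is the Spark master URL, argument 2 the HDFS input path, and argument 3 the HDFS output path; XXX stands for the defaultFS of the HA (high-availability) HDFS cluster.
2. The class-not-found error was caused by the missing setJars(jars) call.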
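The three positional arguments from item 1 can also be validated up front, before any SparkContext is created. A sketch with the hypothetical names WordCountArgs and parse; it only checks the argument count, that no argument is blank, and that input and output differ, and it deliberately does not try to validate the master URL.

```scala
// Hypothetical sketch: turn the three positional arguments into a typed value,
// or an error message to print instead of starting the job.
final case class WordCountArgs(master: String, input: String, output: String)

object WordCountArgs {
  val Usage = "usage: com.dirk.test.WordCount <master> <input> <output>"

  def parse(args: Array[String]): Either[String, WordCountArgs] = args match {
    case Array(master, input, output) =>
      if (Seq(master, input, output).exists(_.trim.isEmpty))
        Left("arguments must not be blank\n" + Usage)
      else if (input == output)
        Left("input and output paths must differ")
      else
        Right(WordCountArgs(master, input, output))
    case _ => Left(Usage)
  }
}
```

In main, `WordCountArgs.parse(args) match { case Left(msg) => println(msg); case Right(a) => ... }` would replace the length check and give named fields instead of args(0), args(1), args(2).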
References
http://bit1129.iteye.com/blog/2172164
http://www.tuicool.com/articles/qq2mQj