Spark (Scala) Development in VS Code
IntelliJ IDEA offers a smart, pleasant Scala development experience, but its commercial license is expensive. To avoid licensing costs, consider a free, open-source IDE such as Eclipse or VS Code. VS Code is small yet capable, full-featured, fast, and cross-platform, so that is what we use here.
Install Java JDK 8
This is straightforward, so I won't belabor it.
Install Scala
Download Scala 2.12 or 2.11 from the official website: Spark 3.0 uses Scala 2.12, while versions before Spark 3.0 use Scala 2.11.
Download and run the msi installer; it is a point-and-click install, and the environment variables are configured automatically.
After installation, type scala in cmd; on success you will enter the Scala interactive REPL:
C:\Users\cross>scala
Welcome to Scala 2.12.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
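As a quick sanity check, you can evaluate an expression in the REPL (an illustrative session, not from the original post):

scala> 1 + 1
res0: Int = 2

scala> :quit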
Install sbt
Download the latest sbt msi installer from the official website; it is likewise a point-and-click install, and the environment variables are configured automatically.
After installation, run sbt --version in cmd; on success it prints version information:
C:\Users\cross>sbt --version
sbt version in this project: 1.4.5
sbt script version: 1.4.5
Configure sbt to use Aliyun's Maven mirror, otherwise downloading packages is very slow.
Create a file named repositories under C:\Users\cross\.sbt\ with the following content:
[repositories]
local
aliyun-maven: https://maven.aliyun.com/repository/central
maven-central: https://repo1.maven.org/maven2/
sbt-plugin-repo: https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
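sbt tries these resolvers in the order listed: the local Ivy cache first, then the Aliyun mirror, then Maven Central; the last line is an Ivy-style pattern for the sbt plugin repository.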
To make sbt use these global repositories and ignore resolvers defined inside individual builds, edit C:\Program Files (x86)\sbt\conf\sbtconfig.txt and add:
-Dsbt.override.build.repos=true
Install VS Code
Download the latest exe installer and install it directly.
- In the Extensions view on the left, search for Scala (Metals) and install it.
- A Metals button will then appear in the left sidebar; it is used to create and build Scala sbt projects.
- For a Chinese UI, search for and install the Chinese (Simplified) Language Pack extension.
- Click the Settings button, find Extensions > Metals > Custom Repositories, and enter in the input box:
https://maven.aliyun.com/repository/central
Separate multiple repositories with "|". Without this setting, Metals still uses the default overseas repositories, and downloading packages is very slow.
- Click the Metals button on the left; at the bottom, under Build Commands, there is New Scala Project. Click it, select the scala/hello-world.g8 template, and press Enter at the prompts; this creates a Scala hello-world project.
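The generated project typically looks like the following (the exact layout may vary with the template version):

.
├── build.sbt
├── project
│   └── build.properties
└── src
    └── main
        └── scala
            └── Main.scala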
Create a Spark Application
- In the project created above, edit project/build.properties and set the sbt version to the one you installed.
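For example, with the sbt 1.4.5 installed earlier, project/build.properties would contain:

sbt.version=1.4.5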
- Edit build.sbt in the project root and set the Scala version to the one you installed.
- Add the Spark dependencies to build.sbt:

scalaVersion := "2.12.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0"
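Note that %% appends the Scala binary version to the artifact name, so spark-core above resolves to spark-core_2.12; the Spark dependencies must therefore match your scalaVersion.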
When you save, you will be prompted to import the build; alternatively, click the Metals button on the left and click Import build under Build Commands. It may print some warnings about missing plugins and the like; these can be ignored.
Or type the following in the sbt console at the bottom of VS Code:
sbt:sbt-spark> reload  # reload the project files
[info] welcome to sbt 1.4.5 (Oracle Corporation Java 1.8.0_131)
[info] loading global plugins from C:\Users\cross\.sbt\1.0\plugins
[info] loading settings for project sbt-spark-build-build from metals.sbt ...
[info] loading project definition from D:\IdeaWorkspace\sbt-spark\project\project
[info] loading settings for project sbt-spark-build from metals.sbt ...
[info] loading project definition from D:\IdeaWorkspace\sbt-spark\project
[success] Generated .bloop\sbt-spark-build.json
[success] Total time: 1 s, completed 2020-12-24 16:20:52
[info] loading settings for project sbt-spark from build.sbt ...
[info] set current project to sbt-spark (in build file:/D:/IdeaWorkspace/sbt-spark/)
sbt:sbt-spark> update  # update and download dependencies
[warn] There may be incompatibilities among your library dependencies; run 'evicted' to see detailed eviction warnings.
[success] Total time: 3 s, completed 2020-12-24 16:21:06
sbt:sbt-spark>
- Write the code: edit src/main/scala/Main.scala as follows:
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]) = {
    println("hello scala!")
    // Build a local SparkSession.
    val ss = SparkSession.builder
      .appName("example")
      .master("local")
      .getOrCreate()
    // Implicit encoders for creating Datasets from Scala collections.
    import ss.implicits._
    ss.createDataset(1 to 10).show()
    ss.close()
  }
}
Then save the file.
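Once this runs, the same pattern extends to real transformations. Below is a minimal illustrative sketch (the object name SquaresExample and its column names are hypothetical, not part of the original project), showing a map over a Dataset:

import org.apache.spark.sql.SparkSession

// Hypothetical example: square each number and show the result as a DataFrame.
object SquaresExample {
  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder
      .appName("squares-example")
      .master("local")
      .getOrCreate()
    import ss.implicits._
    // Map each number to an (n, n*n) pair and name the columns.
    val squares = ss.createDataset(1 to 10)
      .map(n => (n, n * n))
      .toDF("n", "n_squared")
    squares.show()
    ss.close()
  }
}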
- Run the Spark application

In the sbt console at the bottom of VS Code, enter:
sbt:hello-world> compile  # compile the project
[success] Total time: 1 s, completed 2020-12-24 16:26:41
sbt:hello-world> run  # run the project
[info] running Main
hello scala!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/24 16:26:47 INFO SparkContext: Running Spark version 3.0.0
20/12/24 16:26:47 INFO ResourceUtils: ==============================================================
20/12/24 16:26:47 INFO ResourceUtils: Resources for spark.driver:
20/12/24 16:26:47 INFO ResourceUtils: ==============================================================
20/12/24 16:26:47 INFO SparkContext: Submitted application: example
20/12/24 16:26:47 INFO SecurityManager: Changing view acls to: cross
20/12/24 16:26:47 INFO SecurityManager: Changing modify acls to: cross
20/12/24 16:26:47 INFO SecurityManager: Changing view acls groups to:
20/12/24 16:26:47 INFO SecurityManager: Changing modify acls groups to:
20/12/24 16:26:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cross); groups with view permissions: Set(); users with modify permissions: Set(cross); groups with modify permissions: Set()
20/12/24 16:26:48 INFO Utils: Successfully started service 'sparkDriver' on port 14587.
20/12/24 16:26:48 INFO SparkEnv: Registering MapOutputTracker
20/12/24 16:26:48 INFO SparkEnv: Registering BlockManagerMaster
20/12/24 16:26:48 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/12/24 16:26:48 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/12/24 16:26:48 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/24 16:26:48 INFO DiskBlockManager: Created local directory at C:\Users\cross\AppData\Local\Temp\blockmgr-414c5e00-d699-48ac-8c04-fe2f9b0bc4a8
20/12/24 16:26:48 INFO MemoryStore: MemoryStore started with capacity 383.7 MiB
20/12/24 16:26:48 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/24 16:26:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/12/24 16:26:49 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DELL-G7:4040
20/12/24 16:26:49 INFO Executor: Starting executor ID driver on host DELL-G7
20/12/24 16:26:49 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 14607.
20/12/24 16:26:49 INFO NettyBlockTransferService: Server created on DELL-G7:14607
20/12/24 16:26:49 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/12/24 16:26:49 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DELL-G7, 14607, None)
20/12/24 16:26:49 INFO BlockManagerMasterEndpoint: Registering block manager DELL-G7:14607 with 383.7 MiB RAM, BlockManagerId(driver, DELL-G7, 14607, None)
20/12/24 16:26:49 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DELL-G7, 14607, None)
20/12/24 16:26:49 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DELL-G7, 14607, None)
20/12/24 16:26:50 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/D:/IdeaWorkspace/hello-world/spark-warehouse').
20/12/24 16:26:50 INFO SharedState: Warehouse path is 'file:/D:/IdeaWorkspace/hello-world/spark-warehouse'.
20/12/24 16:26:51 INFO CodeGenerator: Code generated in 200.9859 ms
20/12/24 16:26:51 INFO CodeGenerator: Code generated in 5.3017 ms
20/12/24 16:26:51 INFO CodeGenerator: Code generated in 21.0971 ms
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
|    9|
|   10|
+-----+
20/12/24 16:26:51 INFO SparkUI: Stopped Spark web UI at http://DELL-G7:4040
20/12/24 16:26:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/12/24 16:26:51 INFO MemoryStore: MemoryStore cleared
20/12/24 16:26:51 INFO BlockManager: BlockManager stopped
20/12/24 16:26:51 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/24 16:26:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/12/24 16:26:51 INFO SparkContext: Successfully stopped SparkContext
[success] Total time: 5 s, completed 2020-12-24 16:26:51
sbt:hello-world>
If you see the Dataset printed as above, congratulations: the project ran successfully!
References
Scala Getting Started: https://docs.scala-lang.org/getting-started/index.html#with-an-ide
Spark: http://spark.apache.org/