
Spark (Scala) Development in VS Code


IntelliJ IDEA offers a smart, pleasant Scala development experience, but a commercial license is expensive. To avoid licensing costs, a free, open-source editor such as Eclipse or VS Code is worth considering. VS Code is lightweight, full-featured, fast, and cross-platform, so it is used here.

Install Java JDK 8

This step is simple and needs no elaboration: install a JDK 8 and make sure java is on the PATH.
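
To confirm the installation, java -version should report a 1.8 build; the exact version string depends on your installer, and the remaining output lines are omitted here:

C:\Users\cross>java -version
java version "1.8.0_131"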

Install Scala

Download Scala 2.12 or 2.11 from the official site: Spark 3.0 uses Scala 2.12, while releases before Spark 3.0 use Scala 2.11.

Download and run the MSI installer; it is a simple wizard install and the environment variables are configured automatically.

After installation, type scala in cmd; on success it drops you into the interactive Scala REPL:

C:\Users\cross>scala
Welcome to Scala 2.12.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
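
Inside the REPL you can evaluate an expression to confirm everything works; an illustrative session (:quit leaves the REPL):

scala> 1 + 1
res0: Int = 2

scala> :quit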

Install sbt

Download the latest sbt MSI installer from the official site; again a simple wizard install, and the environment variables are configured automatically.

After installation, run sbt --version in cmd; on success it prints the version information:

C:\Users\cross>sbt --version
sbt version in this project: 1.4.5
sbt script version: 1.4.5

Configure sbt to use Aliyun's Maven mirror, otherwise dependency downloads are very slow.

Create a file named repositories in the directory C:\Users\cross\.sbt\ with the following content:

[repositories]
local
aliyun-maven: https://maven.aliyun.com/repository/central
maven-central: https://repo1.maven.org/maven2/
sbt-plugin-repo: https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]

To make sbt use only this global repositories file, edit C:\Program Files (x86)\sbt\conf\sbtconfig.txt and add the following line:

-Dsbt.override.build.repos=true

Install VS Code

Download the latest .exe installer and install it directly.

  • In the Extensions view on the left, search for Scala (Metals) and install it.

  • A Metals button will then appear below the extensions button in the left sidebar; it is used to create and build Scala sbt projects.

  • To install the Chinese language pack, search for Chinese (Simplified) Language Pack in the Extensions view.

  • Click the settings (gear) button, go to Extensions > Metals > Custom Repositories, and enter in the input box:

    https://maven.aliyun.com/repository/central
    

    Multiple repositories are separated with "|", e.g. https://maven.aliyun.com/repository/central|https://repo1.maven.org/maven2/. If this is not configured, Metals keeps using the default repositories and downloads are very slow.

  • Click the Metals button on the left; at the bottom under Build Commands there is a New Scala Project command. Click it, choose the scala/hello-world.g8 template, and confirm the prompts with Enter; this creates a Scala hello-world project (the same can be done from a terminal, as shown below).
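
    If you prefer a terminal, the same template can be generated with sbt's new command, run from the directory where the project should be created (the prompts may vary slightly between template versions):

    D:\IdeaWorkspace>sbt new scala/hello-world.g8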

Create a Spark Application

  • In the project just created, edit project/build.properties and set the sbt version to the one you installed (e.g. sbt.version=1.4.5).

  • Edit build.sbt in the project root and set the Scala version to the one you installed.

  • Add the Spark dependencies to build.sbt:

    scalaVersion := "2.12.12"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.0"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0"
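
    For orientation, a complete build.sbt for this project might look roughly like the following; the name and version values are illustrative, and the file generated from the template may contain additional settings:

    name := "sbt-spark"
    version := "0.1.0"
    scalaVersion := "2.12.12"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.0"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0"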
    

    Saving the file triggers a prompt to import the build; alternatively, click the Metals button on the left and run Import build under Build Commands. Some warnings (e.g. about plugins that cannot be found) may appear and can be ignored.

    Or type the following in the sbt console at the bottom of VS Code:

    sbt:sbt-spark> reload    # reload the project definition files
    [info] welcome to sbt 1.4.5 (Oracle Corporation Java 1.8.0_131)
    [info] loading global plugins from C:\Users\cross\.sbt\1.0\plugins
    [info] loading settings for project sbt-spark-build-build from metals.sbt ...
    [info] loading project definition from D:\IdeaWorkspace\sbt-spark\project\project
    [info] loading settings for project sbt-spark-build from metals.sbt ...
    [info] loading project definition from D:\IdeaWorkspace\sbt-spark\project
    [success] Generated .bloop\sbt-spark-build.json
    [success] Total time: 1 s, completed 2020-12-24 16:20:52
    [info] loading settings for project sbt-spark from build.sbt ...
    [info] set current project to sbt-spark (in build file:/D:/IdeaWorkspace/sbt-spark/)
    sbt:sbt-spark> update    # resolve and download the dependencies
    [warn] There may be incompatibilities among your library dependencies; run 'evicted' to see detailed eviction warnings.
    [success] Total time: 3 s, completed 2020-12-24 16:21:06
    sbt:sbt-spark> 
    
  • Write the code

    Edit src/main/scala/Main.scala:

    import org.apache.spark.sql.SparkSession

    object Main {
      def main(args: Array[String]): Unit = {
        println("hello scala!")
        // Build a local SparkSession for this example
        val ss = SparkSession.builder
          .appName("example")
          .master("local")
          .getOrCreate()
        // Bring in the implicit encoders needed by createDataset
        import ss.implicits._
        // Create a Dataset from the range 1 to 10 and print it
        ss.createDataset(1 to 10).show()
        ss.close()
      }
    }
    

    Save the file.

  • Run the Spark application

    Type the following in the sbt console command line at the bottom of VS Code:

    sbt:hello-world> compile    # compile the project code
    [success] Total time: 1 s, completed 2020-12-24 16:26:41
    sbt:hello-world> run        # run the project
    [info] running Main
    hello scala!
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    20/12/24 16:26:47 INFO SparkContext: Running Spark version 3.0.0
    20/12/24 16:26:47 INFO ResourceUtils: ==============================================================
    20/12/24 16:26:47 INFO ResourceUtils: Resources for spark.driver:
    
    20/12/24 16:26:47 INFO ResourceUtils: ==============================================================
    20/12/24 16:26:47 INFO SparkContext: Submitted application: example
    20/12/24 16:26:47 INFO SecurityManager: Changing view acls to: cross
    20/12/24 16:26:47 INFO SecurityManager: Changing modify acls to: cross
    20/12/24 16:26:47 INFO SecurityManager: Changing view acls groups to:
    20/12/24 16:26:47 INFO SecurityManager: Changing modify acls groups to:
    20/12/24 16:26:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cross); groups with view permissions: Set(); users  with modify permissions: Set(cross); groups with modify permissions: Set()
    20/12/24 16:26:48 INFO Utils: Successfully started service 'sparkDriver' on port 14587.
    20/12/24 16:26:48 INFO SparkEnv: Registering MapOutputTracker
    20/12/24 16:26:48 INFO SparkEnv: Registering BlockManagerMaster
    20/12/24 16:26:48 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    20/12/24 16:26:48 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    20/12/24 16:26:48 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
    20/12/24 16:26:48 INFO DiskBlockManager: Created local directory at C:\Users\cross\AppData\Local\Temp\blockmgr-414c5e00-d699-48ac-8c04-fe2f9b0bc4a8
    20/12/24 16:26:48 INFO MemoryStore: MemoryStore started with capacity 383.7 MiB
    20/12/24 16:26:48 INFO SparkEnv: Registering OutputCommitCoordinator
    20/12/24 16:26:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    20/12/24 16:26:49 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DELL-G7:4040
    20/12/24 16:26:49 INFO Executor: Starting executor ID driver on host DELL-G7
    20/12/24 16:26:49 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 14607.
    20/12/24 16:26:49 INFO NettyBlockTransferService: Server created on DELL-G7:14607
    20/12/24 16:26:49 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    20/12/24 16:26:49 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DELL-G7, 14607, None)
    20/12/24 16:26:49 INFO BlockManagerMasterEndpoint: Registering block manager DELL-G7:14607 with 383.7 MiB RAM, BlockManagerId(driver, DELL-G7, 14607, None)
    20/12/24 16:26:49 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DELL-G7, 14607, None)
    20/12/24 16:26:49 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DELL-G7, 14607, None)
    20/12/24 16:26:50 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/D:/IdeaWorkspace/hello-world/spark-warehouse').
    20/12/24 16:26:50 INFO SharedState: Warehouse path is 'file:/D:/IdeaWorkspace/hello-world/spark-warehouse'.
    20/12/24 16:26:51 INFO CodeGenerator: Code generated in 200.9859 ms
    20/12/24 16:26:51 INFO CodeGenerator: Code generated in 5.3017 ms
    20/12/24 16:26:51 INFO CodeGenerator: Code generated in 21.0971 ms
    +-----+
    |value|
    +-----+
    |    1|
    |    2|
    |    3|
    |    4|
    |    5|
    |    6|
    |    7|
    |    8|
    |    9|
    |   10|
    +-----+
    
    20/12/24 16:26:51 INFO SparkUI: Stopped Spark web UI at http://DELL-G7:4040
    20/12/24 16:26:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    20/12/24 16:26:51 INFO MemoryStore: MemoryStore cleared
    20/12/24 16:26:51 INFO BlockManager: BlockManager stopped
    20/12/24 16:26:51 INFO BlockManagerMaster: BlockManagerMaster stopped
    20/12/24 16:26:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    20/12/24 16:26:51 INFO SparkContext: Successfully stopped SparkContext
    [success] Total time: 5 s, completed 2020-12-24 16:26:51
    sbt:hello-world> 
    

    If you see the Dataset printed above, congratulations, the application ran successfully!

References

Scala official site: https://docs.scala-lang.org/getting-started/index.html#with-an-ide

Spark official site: http://spark.apache.org/
