How Spark Runs on MaxCompute

Source: 程序员文章站 (Programmer Article Station), 2022-06-30 10:13:08
I. Spark System Overview
========================

![image](https://yqfile.alicdn.com/635018bd6813c3e8568bb4771e296407741cf94a.png)

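The left side of the figure shows the architecture of native Spark. On the right, Spark on MaxCompute runs on Cupid, a platform developed in-house by Alibaba Cloud. Cupid natively supports the computing frameworks that open-source YARN supports, Spark among them.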

II. Configuring and Using Spark from the Client
===============================================

**2.1 Open the link and download the client to your local machine**

[http://odps-repo.oss-cn-hangzhou.aliyuncs.com/spark/2.3.0-odps0.30.0/spark-2.3.0-odps0.30.0.tar.gz?spm=a2c4g.11186623.2.12.666a4b69yO8Qur&file=spark-2.3.0-odps0.30.0.tar.gz](https://yq.aliyun.com/go/articleRenderRedirect?url=http%3A%2F%2Fodps-repo.oss-cn-hangzhou.aliyuncs.com%2Fspark%2F2.3.0-odps0.30.0%2Fspark-2.3.0-odps0.30.0.tar.gz%3Fspm%3Da2c4g.11186623.2.12.666a4b69yO8Qur%26amp%3Bfile%3Dspark-2.3.0-odps0.30.0.tar.gz)

**2.2 Upload the file to ECS**

![image](https://yqfile.alicdn.com/b94dd979ed68c55a5a9805c1b945977f4bbc92b1.png)

**2.3 Extract the file**

```
tar -zxvf spark-2.3.0-odps0.30.0.tar.gz

```

**2.4 Configure spark-defaults.conf**

```
# spark-defaults.conf
# In most cases, filling in the MaxCompute account information in the default template is enough to use Spark
spark.hadoop.odps.project.name =
spark.hadoop.odps.access.id =
spark.hadoop.odps.access.key =

# The remaining settings can usually keep the values that ship with the template
spark.hadoop.odps.end.point = http://service.cn.maxcompute.aliyun.com/api
spark.hadoop.odps.runtime.end.point = http://service.cn.maxcompute.aliyun-inc.com/api
spark.sql.catalogImplementation=odps
spark.hadoop.odps.task.major.version = cupid_v2
spark.hadoop.odps.cupid.container.image.enable = true
spark.hadoop.odps.cupid.container.vm.engine.type = hyper

```
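
Because the three account fields ship empty in the template, a quick check before submitting can catch a config that was never filled in. The following is a minimal sketch, not part of the client: the function name `check_odps_conf` and the default path `conf/spark-defaults.conf` are assumptions, and it only handles the `key = value` form used in the template above.

```shell
# Hypothetical helper: verify that the three MaxCompute account keys in a
# spark-defaults.conf file have non-empty values.
check_odps_conf() {
  local conf_file key value missing
  conf_file="${1:-conf/spark-defaults.conf}"   # assumed default location
  missing=0
  for key in spark.hadoop.odps.project.name \
             spark.hadoop.odps.access.id \
             spark.hadoop.odps.access.key; do
    # Find the last uncommented "key = value" line for this key and print
    # the value with surrounding whitespace removed (empty if unset).
    value=$(awk -F= -v k="$key" '
      { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
      name == k && index($0, "=") > 0 {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        found = v
      }
      END { print found }
    ' "$conf_file" 2>/dev/null)
    if [ -z "$value" ]; then
      echo "missing value: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_odps_conf path/to/spark-defaults.conf` returns a non-zero status and names each key that is still empty.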

**2.5 Download the corresponding code from GitHub**

[https://github.com/aliyun/MaxCompute-Spark](https://yq.aliyun.com/go/articleRenderRedirect?url=https%3A%2F%2Fgithub.com%2Faliyun%2FMaxCompute-Spark)

**2.6 Upload the code to ECS and unzip it**

```
unzip MaxCompute-Spark-master.zip
```

**2.7 Package the code into a JAR (make sure Maven is installed)**

```
cd MaxCompute-Spark-master/spark-2.x
mvn clean package
```

**2.8 Locate the JAR and run it**

```
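# Both paths are relative to the current directory; adjust them to where the client and the repo were extracted
bin/spark-submit --master yarn-cluster --class com.aliyun.odps.spark.examples.SparkPi \
MaxCompute-Spark-master/spark-2.x/target/spark-examples_2.11-1.0.0-SNAPSHOT-shaded.jar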

```
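
The JAR file name in the command above depends on the version declared in the project's pom, so it can help to look the file up rather than hard-code it. The following is a minimal sketch, assuming the build output lands in `target/` with a `-shaded.jar` suffix as in the command above. `find_shaded_jar` and `print_submit_cmd` are hypothetical helper names, and the second function only prints the command instead of running it.

```shell
# Hypothetical helper: print the first *-shaded.jar under <project>/target.
find_shaded_jar() {
  local project_dir jar
  project_dir="$1"
  for jar in "$project_dir"/target/*-shaded.jar; do
    # If nothing matches, the glob stays literal and this test fails.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no shaded jar under $project_dir/target; run 'mvn clean package' first" >&2
  return 1
}

# Hypothetical helper: print (not run) the spark-submit command for a class.
print_submit_cmd() {
  local spark_home project_dir main_class jar
  spark_home="$1"    # directory of the extracted Spark client
  project_dir="$2"   # Maven module that was packaged
  main_class="$3"    # fully qualified main class
  jar=$(find_shaded_jar "$project_dir") || return 1
  echo "$spark_home/bin/spark-submit --master yarn-cluster --class $main_class $jar"
}
```

For example, `print_submit_cmd ./spark-2.3.0-odps0.30.0 MaxCompute-Spark-master/spark-2.x com.aliyun.odps.spark.examples.SparkPi` prints the full command for review before you run it.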

III. Configuring and Using Spark in DataWorks
=============================================

**3.1 Open the DataWorks console and click Business Flow**

![image](https://yqfile.alicdn.com/66b37e79f22d5bd0595f7d350d0a6ee3294fd1d7.png)

**3.2 Open the business flow and create an ODPS Spark node**

![image](https://yqfile.alicdn.com/c837f4832d9450d9655e43b2ee59d8a9111a0170.png)

**3.3 Upload the JAR resource: select the corresponding JAR, upload it, and commit**

![image](https://yqfile.alicdn.com/8761c7e4e0b5721560468818b179df6fd2d5ecba.png)

![image](https://yqfile.alicdn.com/32cea69a7af6a17ede781cbf0059254cf88ba0cc.png)

![image](https://yqfile.alicdn.com/355955189400fb15244ae4636db6e9f433fd7304.png)

**3.4 Configure the ODPS Spark node, save and commit it, then click Run and check the run status**

![image](https://yqfile.alicdn.com/1c21dbab3fe01463ccb50b9d44ca80341ccfd096.png)

IV. Using Spark in a Local IDEA Test Environment
================================================

**4.1 Download and extract the client and the template code**

Client:
[http://odps-repo.oss-cn-hangzhou.aliyuncs.com/spark/2.3.0-odps0.30.0/spark-2.3.0-odps0.30.0.tar.gz?spm=a2c4g.11186623.2.12.666a4b69yO8Qur&file=spark-2.3.0-odps0.30.0.tar.gz](https://yq.aliyun.com/go/articleRenderRedirect?url=http%3A%2F%2Fodps-repo.oss-cn-hangzhou.aliyuncs.com%2Fspark%2F2.3.0-odps0.30.0%2Fspark-2.3.0-odps0.30.0.tar.gz%3Fspm%3Da2c4g.11186623.2.12.666a4b69yO8Qur%26amp%3Bfile%3Dspark-2.3.0-odps0.30.0.tar.gz)

![image](https://yqfile.alicdn.com/23358ff3b14cdfd918a32e8e977293f71c37bc79.png)

Template code:

[https://github.com/aliyun/MaxCompute-Spark](https://yq.aliyun.com/go/articleRenderRedirect?url=https%3A%2F%2Fgithub.com%2Faliyun%2FMaxCompute-Spark)

**4.2 Open IDEA, click Open, and select the template code**

![image](https://yqfile.alicdn.com/0c9285cfe7c28416dc4ca136df4be1589947de68.png)

![image](https://yqfile.alicdn.com/016e728559b509302c319650f5b0bf51f231acd2.png)

**4.3 Install the Scala plugin**

![image](https://yqfile.alicdn.com/3000a4ca1acaddadf0cc48d8636441325190c0cb.png)

![image](https://yqfile.alicdn.com/8ce7b393d37dfd4dd5d286ebf5c244ae7b748bd9.png)

**4.4 Configure Maven**

![image](https://yqfile.alicdn.com/f2ee8b48b5bbbd73145353ecdea99b2f2224eedc.png)

**4.5 Configure the JDK and related dependencies**

![image](https://yqfile.alicdn.com/8fc464aea6528cd484a9c21f2ff919e3937c8b0c.png)

![image](https://yqfile.alicdn.com/cfd709d6a34da2061be6a8c39a30bfefc08d44bb.png)