spark-2.4.0-hadoop2.7 - Basic Operations
1. Overview
This article builds on: spark-2.4.0-hadoop2.7 - High Availability (HA) Installation and Deployment.
2. Starting the Spark Shell
Run the following on any machine that has Spark installed:
# --master spark://mini02:7077     connect to the Spark master; this master must be in the ALIVE state, not STANDBY
# --total-executor-cores 2         use 2 CPU cores in total for the application
# --executor-memory 512m           give each executor 512m of memory
[yun@mini03 ~]$ spark-shell --master spark://mini02:7077 --total-executor-cores 2 --executor-memory 512m
2018-11-25 12:07:39 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://mini03:4040
Spark context available as 'sc' (master = spark://mini02:7077, app id = app-20181125120746-0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@77e1b84c
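Since the masters here run in HA mode, you can also pass the entire master list, and the shell will connect to whichever master is currently ALIVE. A hedged sketch; mini01 is only assumed here as the standby master host, so substitute the actual standby from your deployment:

[yun@mini03 ~]$ spark-shell --master spark://mini02:7077,mini01:7077 --total-executor-cores 2 --executor-memory 512m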
Note:
If you start the Spark shell without specifying a master address, the shell still starts and can run programs normally. In that case Spark is actually running in local mode: it starts a single process on the local machine and never contacts the cluster.
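For comparison, a minimal local-mode launch: local[2] runs Spark with two worker threads inside a single local JVM, and sc.master confirms that no cluster master is involved.

[yun@mini03 ~]$ spark-shell --master local[2]
scala> sc.master
res0: String = local[2]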
3. Running Your First Spark Program
This example uses the Monte Carlo method to estimate Pi.
[yun@mini03 ~]$ spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://mini02:7077 \
--total-executor-cores 2 \
--executor-memory 512m \
/app/spark/examples/jars/spark-examples_2.11-2.4.0.jar 100
# The printed output looks like this:
2018-11-25 12:25:42 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-11-25 12:25:43 INFO  SparkContext:54 - Running Spark version 2.4.0
...
2018-11-25 12:25:49 INFO  TaskSetManager:54 - Finished task 97.0 in stage 0.0 (TID 97) in 20 ms on 172.16.1.14 (executor 0) (98/100)
2018-11-25 12:25:49 INFO  TaskSetManager:54 - Finished task 98.0 in stage 0.0 (TID 98) in 26 ms on 172.16.1.13 (executor 1) (99/100)
2018-11-25 12:25:49 INFO  TaskSetManager:54 - Finished task 99.0 in stage 0.0 (TID 99) in 25 ms on 172.16.1.14 (executor 0) (100/100)
2018-11-25 12:25:49 INFO  TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-11-25 12:25:49 INFO  DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.881 s
2018-11-25 12:25:49 INFO  DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 4.042591 s
Pi is roughly 3.1412699141269913
...
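The heart of the Monte Carlo estimate is only a few lines: throw random points at the square [-1, 1) x [-1, 1) and count how many land inside the unit circle; that fraction approaches Pi/4. Below is a minimal sketch of the idea that can be pasted into the spark shell. It mirrors, but is not a verbatim copy of, the bundled SparkPi example; slices = 100 corresponds to the "100" argument above.

val slices = 100                                       // corresponds to the "100" argument passed to spark-submit
val n = math.min(100000L * slices, Int.MaxValue).toInt // total number of random samples
val count = sc.parallelize(1 until n, slices).map { _ =>
  val x = math.random * 2 - 1                          // random point in the square [-1, 1) x [-1, 1)
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0                     // 1 if the point falls inside the unit circle
}.reduce(_ + _)                                        // count the hits across all partitions
println(s"Pi is roughly ${4.0 * count / (n - 1)}")     // circle/square area ratio is Pi/4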
4. Word Count in the Spark Shell (with Hadoop)
1) Start Hadoop.
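A minimal sketch of this step, assuming the standard Hadoop sbin scripts are on the PATH and only HDFS is needed for this example:

[yun@mini05 ~]$ start-dfs.sh              # starts the NameNode and DataNodes (plus JournalNodes/ZKFC in an HA setup)
[yun@mini05 ~]$ hdfs dfsadmin -report     # sanity check that the DataNodes have registered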
2) Put the test file into Hadoop (HDFS).
[yun@mini05 sparkwordcount]$ cat wc.info
zhang linux
linux tom
zhan kitty
tom linux
[yun@mini05 sparkwordcount]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - yun supergroup          0 2018-11-16 11:36 /hbase
drwx------   - yun supergroup          0 2018-11-14 23:42 /tmp
drwxr-xr-x   - yun supergroup          0 2018-11-14 23:42 /wordcount
-rw-r--r--   3 yun supergroup   16402010 2018-11-14 23:39 /zookeeper-3.4.5.tar.gz
[yun@mini05 sparkwordcount]$ hdfs dfs -mkdir -p /sparkwordcount/input
[yun@mini05 sparkwordcount]$ hdfs dfs -put wc.info /sparkwordcount/input/1.info
[yun@mini05 sparkwordcount]$ hdfs dfs -put wc.info /sparkwordcount/input/2.info
[yun@mini05 sparkwordcount]$ hdfs dfs -put wc.info /sparkwordcount/input/3.info
[yun@mini05 sparkwordcount]$ hdfs dfs -put wc.info /sparkwordcount/input/4.info
[yun@mini05 sparkwordcount]$ hdfs dfs -ls /sparkwordcount/input
Found 4 items
-rw-r--r--   3 yun supergroup         45 2018-11-25 14:41 /sparkwordcount/input/1.info
-rw-r--r--   3 yun supergroup         45 2018-11-25 14:41 /sparkwordcount/input/2.info
-rw-r--r--   3 yun supergroup         45 2018-11-25 14:41 /sparkwordcount/input/3.info
-rw-r--r--   3 yun supergroup         45 2018-11-25 14:41 /sparkwordcount/input/4.info
3) Enter the spark shell command line and run the computation.
[yun@mini03 ~]$ spark-shell --master spark://mini02:7077 --total-executor-cores 2 --executor-memory 512m
# Print the result on the command line once the computation finishes
scala> sc.textFile("hdfs://mini01:9000/sparkwordcount/input").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).sortBy(_._2, false).collect
res6: Array[(String, Int)] = Array((linux,12), (tom,8), (kitty,4), (zhan,4), ("",4), (zhang,4))
# Save the result to HDFS; the input consists of several files, so there are several reduce tasks and therefore several output files
scala> sc.textFile("hdfs://mini01:9000/sparkwordcount/input").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).sortBy(_._2, false).saveAsTextFile("hdfs://mini01:9000/sparkwordcount/output")
# Save the result to HDFS with the number of reduce tasks set to 1, so there is only one output file
scala> sc.textFile("hdfs://mini01:9000/sparkwordcount/input").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_, 1).sortBy(_._2, false).saveAsTextFile("hdfs://mini01:9000/sparkwordcount/output1")
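The one-liner is easier to follow written vertically; the comments below annotate what each transformation produces (same pipeline, nothing new):

val wordCounts = sc.textFile("hdfs://mini01:9000/sparkwordcount/input") // RDD[String]: one element per input line
  .flatMap(_.split(" "))   // RDD[String]: one element per word; extra spaces yield empty tokens, hence the ("",4) entry
  .map((_, 1))             // RDD[(String, Int)]: pair every word with a count of 1
  .reduceByKey(_ + _)      // RDD[(String, Int)]: sum the counts per word
  .sortBy(_._2, false)     // sort by count, descending
wordCounts.collect         // action: runs the job and returns the result to the driver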
4) View the computation results in HDFS.
[yun@mini05 sparkwordcount]$ hdfs dfs -ls /sparkwordcount/
Found 3 items
drwxr-xr-x   - yun supergroup          0 2018-11-25 15:03 /sparkwordcount/input
drwxr-xr-x   - yun supergroup          0 2018-11-25 15:05 /sparkwordcount/output
drwxr-xr-x   - yun supergroup          0 2018-11-25 15:07 /sparkwordcount/output1
[yun@mini05 sparkwordcount]$ hdfs dfs -ls /sparkwordcount/output
Found 5 items
-rw-r--r--   3 yun supergroup          0 2018-11-25 15:05 /sparkwordcount/output/_SUCCESS
-rw-r--r--   3 yun supergroup          0 2018-11-25 15:05 /sparkwordcount/output/part-00000
-rw-r--r--   3 yun supergroup         11 2018-11-25 15:05 /sparkwordcount/output/part-00001
-rw-r--r--   3 yun supergroup          8 2018-11-25 15:05 /sparkwordcount/output/part-00002
-rw-r--r--   3 yun supergroup         34 2018-11-25 15:05 /sparkwordcount/output/part-00003
[yun@mini05 sparkwordcount]$
[yun@mini05 sparkwordcount]$ hdfs dfs -cat /sparkwordcount/output/part*
(linux,12)
(tom,8)
(,4)
(zhang,4)
(kitty,4)
(zhan,4)
###############################################
[yun@mini05 sparkwordcount]$ hdfs dfs -ls /sparkwordcount/output1
Found 2 items
-rw-r--r--   3 yun supergroup          0 2018-11-25 15:07 /sparkwordcount/output1/_SUCCESS
-rw-r--r--   3 yun supergroup         53 2018-11-25 15:07 /sparkwordcount/output1/part-00000
[yun@mini05 sparkwordcount]$ hdfs dfs -cat /sparkwordcount/output1/part-00000
(linux,12)
(tom,8)
(,4)
(zhang,4)
(kitty,4)
(zhan,4)
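Each part-NNNNN file corresponds to one partition of the final RDD. Besides passing a partition count to reduceByKey as in step 3, another common way to end up with a single file is to coalesce before saving. A minimal sketch; the output2 path is only for illustration:

sc.textFile("hdfs://mini01:9000/sparkwordcount/input")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2, false)
  .coalesce(1)             // merge everything into one partition -> a single part-00000 file
  .saveAsTextFile("hdfs://mini01:9000/sparkwordcount/output2")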