Automating Spark job submission on a Hadoop YARN cluster with a shell script

The shell script below wraps spark-submit so that a sample WordCount job can be launched on the YARN cluster with a single command; the console output from a full run follows.
spark_submit.sh
#!/bin/sh
# spark_submit.sh
# Automation script for submitting a Spark job to the distributed YARN cluster
export HADOOP_HOME=/home/elon/hadoop/hadoop-2.7.5

spark-submit --master yarn --deploy-mode client \
    --class org.training.examples.WordCount \
    /home/elon/jars/examples-1.0-SNAPSHOT.jar \
    yarn file:///home/elon/spark-2.2.1/README.md
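The script hard-codes the main class, the jar path, and the application arguments. For reuse across different jobs, those can be lifted into parameters. The variant below is only a sketch of that idea, not part of the original article (the name spark_submit_generic.sh and its argument handling are illustrative); it assumes spark-submit is on the PATH and that HADOOP_CONF_DIR or YARN_CONF_DIR is already configured, which spark-submit requires when --master yarn is used.

#!/bin/sh
# spark_submit_generic.sh -- hypothetical parameterized variant of spark_submit.sh
# Usage: spark_submit_generic.sh <main-class> <app-jar> [app args...]
export HADOOP_HOME=/home/elon/hadoop/hadoop-2.7.5

# Fail with a usage message if the class or jar is missing
MAIN_CLASS=${1:?"usage: $0 <main-class> <app-jar> [app args...]"}
APP_JAR=${2:?"usage: $0 <main-class> <app-jar> [app args...]"}
shift 2   # remaining arguments are passed through to the application

if spark-submit --master yarn --deploy-mode client \
        --class "$MAIN_CLASS" "$APP_JAR" "$@"
then
    echo "Spark job finished successfully"
else
    # $? here still holds the exit status of spark-submit
    echo "spark-submit failed with exit status $?" >&2
    exit 1
fi

Invoked as spark_submit_generic.sh org.training.examples.WordCount /home/elon/jars/examples-1.0-SNAPSHOT.jar yarn file:///home/elon/spark-2.2.1/README.md, it reproduces the submission shown above.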
Console output
[elon@hadoop1 shell]$ spark_submit.sh
masterUrl:yarn, inputPath: file:///home/elon/spark-2.2.1/README.md
18/02/11 12:07:17 INFO spark.SparkContext: Running Spark version 2.2.1
18/02/11 12:07:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/11 12:07:19 INFO spark.SparkContext: Submitted application: WordCount
18/02/11 12:07:19 INFO spark.SecurityManager: Changing view acls to: elon
18/02/11 12:07:19 INFO spark.SecurityManager: Changing modify acls to: elon
18/02/11 12:07:19 INFO spark.SecurityManager: Changing view acls groups to:
18/02/11 12:07:19 INFO spark.SecurityManager: Changing modify acls groups to:
18/02/11 12:07:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(elon); groups with view permissions: Set(); users with modify permissions: Set(elon); groups with modify permissions: Set()
18/02/11 12:07:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 45560.
18/02/11 12:07:20 INFO spark.SparkEnv: Registering MapOutputTracker
18/02/11 12:07:20 INFO spark.SparkEnv: Registering BlockManagerMaster
18/02/11 12:07:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/02/11 12:07:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/02/11 12:07:20 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-06b84a42-039f-41c5-a4d1-b70ab6009c8c
18/02/11 12:07:20 INFO memory.MemoryStore: MemoryStore started with capacity 117.0 MB
18/02/11 12:07:21 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/02/11 12:07:21 INFO util.log: Logging initialized @6857ms
18/02/11 12:07:22 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/02/11 12:07:22 INFO server.Server: Started @7250ms
18/02/11 12:07:22 INFO server.AbstractConnector: Started ServerConnector@6f884ddb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/02/11 12:07:22 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@324dcd31{/jobs,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1804f60d{/jobs/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@547e29a4{/jobs/job,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b39fd82{/jobs/job/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21680803{/stages,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c8b96ec{/stages/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d8f2f3a{/stages/stage,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58a55449{/stages/stage/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e0ff644{/stages/pool,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a2bb0eb{/stages/pool/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d0566ba{/storage,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7728643a{/storage/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5167268{/storage/rdd,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28c0b664{/storage/rdd/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1af7f54a{/environment,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@436390f4{/environment/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68ed96ca{/executors,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3228d990{/executors/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50b8ae8d{/executors/threadDump,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51c929ae{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29d2d081{/static,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28a2a3e7{/,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@10b3df93{/api,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c321bdb{/jobs/job/kill,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3abd581e{/stages/stage/kill,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.111:4040
18/02/11 12:07:22 INFO spark.SparkContext: Added JAR file:/home/elon/jars/examples-1.0-SNAPSHOT.jar at spark://192.168.1.111:45560/jars/examples-1.0-SNAPSHOT.jar with timestamp 1518322042715
18/02/11 12:07:25 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.1.111:8032
18/02/11 12:07:26 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
18/02/11 12:07:26 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
18/02/11 12:07:26 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/02/11 12:07:26 INFO yarn.Client: Setting up container launch context for our AM
18/02/11 12:07:26 INFO yarn.Client: Setting up the launch environment for our AM container
18/02/11 12:07:26 INFO yarn.Client: Preparing resources for our AM container
18/02/11 12:07:30 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/home/elon/spark/spark-libs.jar
18/02/11 12:07:31 INFO yarn.Client: Uploading resource file:/tmp/spark-3b26c620-946b-4efe-a60b-d101e32ec42a/__spark_conf__7401771411523449275.zip -> hdfs://hadoop1:8020/user/elon/.sparkStaging/application_1518316627470_0003/__spark_conf__.zip
18/02/11 12:07:32 INFO spark.SecurityManager: Changing view acls to: elon
18/02/11 12:07:32 INFO spark.SecurityManager: Changing modify acls to: elon
18/02/11 12:07:32 INFO spark.SecurityManager: Changing view acls groups to:
18/02/11 12:07:32 INFO spark.SecurityManager: Changing modify acls groups to:
18/02/11 12:07:32 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(elon); groups with view permissions: Set(); users with modify permissions: Set(elon); groups with modify permissions: Set()
18/02/11 12:07:32 INFO yarn.Client: Submitting application application_1518316627470_0003 to ResourceManager
18/02/11 12:07:32 INFO impl.YarnClientImpl: Submitted application application_1518316627470_0003
18/02/11 12:07:32 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1518316627470_0003 and attemptId None
18/02/11 12:07:33 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:33 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1518322052230
     final status: UNDEFINED
     tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
     user: elon
18/02/11 12:07:34 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:35 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:36 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:37 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:38 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:39 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:40 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:41 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:42 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:43 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/02/11 12:07:43 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop1, PROXY_URI_BASES -> http://hadoop1:8088/proxy/application_1518316627470_0003), /proxy/application_1518316627470_0003
18/02/11 12:07:43 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/02/11 12:07:43 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:44 INFO yarn.Client: Application report for application_1518316627470_0003 (state: RUNNING)
18/02/11 12:07:44 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.1.113
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1518322052230
     final status: UNDEFINED
     tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
     user: elon
18/02/11 12:07:44 INFO cluster.YarnClientSchedulerBackend: Application application_1518316627470_0003 has started running.
18/02/11 12:07:44 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38368.
18/02/11 12:07:44 INFO netty.NettyBlockTransferService: Server created on 192.168.1.111:38368
18/02/11 12:07:44 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.111:38368 with 117.0 MB RAM, BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:45 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5db3d57c{/metrics/json,null,AVAILABLE,@Spark}
18/02/11 12:07:46 INFO scheduler.EventLoggingListener: Logging events to file:/tmp/spark-events/application_1518316627470_0003
18/02/11 12:07:53 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 290.1 KB, free 116.7 MB)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.7 KB, free 116.7 MB)
18/02/11 12:07:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.111:38368 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:07:54 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCount.scala:22
18/02/11 12:07:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.113:35724) with ID 1
18/02/11 12:07:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop3:33799 with 117.0 MB RAM, BlockManagerId(1, hadoop3, 33799, None)
18/02/11 12:07:55 INFO mapred.FileInputFormat: Total input paths to process : 1
18/02/11 12:07:58 INFO spark.SparkContext: Starting job: take at WordCount.scala:26
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:24)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Got job 0 (take at WordCount.scala:26) with 1 output partitions
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (take at WordCount.scala:26)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 116.7 MB)
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 116.6 MB)
18/02/11 12:08:03 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.111:38368 (size: 2.8 KB, free: 116.9 MB)
18/02/11 12:08:03 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:04 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
18/02/11 12:08:04 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
18/02/11 12:08:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop3, executor 1, partition 0, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:07 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop3:33799 (size: 2.8 KB, free: 117.0 MB)
18/02/11 12:08:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop3:33799 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop3, executor 1, partition 1, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7027 ms on hadoop3 (executor 1) (1/2)
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 446 ms on hadoop3 (executor 1) (2/2)
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:24) finished in 7.435 s
18/02/11 12:08:12 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/02/11 12:08:12 INFO scheduler.DAGScheduler: running: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: failed: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24), which has no missing parents
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 116.6 MB)
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2013.0 B, free 116.6 MB)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.111:38368 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24) (first 15 tasks are for partitions Vector(0))
18/02/11 12:08:12 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, hadoop3, executor 1, partition 0, NODE_LOCAL, 4632 bytes)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop3:33799 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.1.113:35724
18/02/11 12:08:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 150 bytes
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 425 ms on hadoop3 (executor 1) (1/1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ResultStage 1 (take at WordCount.scala:26) finished in 0.427 s
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Job 0 finished: take at WordCount.scala:26, took 14.817908 s
(package,1)
(this,1)
(Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1)
(Because,1)
(Python,2)
(page](http://spark.apache.org/documentation.html).,1)
(cluster.,1)
(its,1)
([run,1)
(general,3)
(have,1)
(pre-built,1)
(YARN,,1)
(locally,2)
(changed,1)
(locally.,1)
(sc.parallelize(1,1)
(only,1)
(several,1)
(This,2)
18/02/11 12:08:12 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/02/11 12:08:13 INFO server.AbstractConnector: Stopped Spark@6f884ddb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/02/11 12:08:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.111:4040
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on hadoop3:33799 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.1.111:38368 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
18/02/11 12:08:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/02/11 12:08:13 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Stopped
18/02/11 12:08:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/11 12:08:13 INFO memory.MemoryStore: MemoryStore cleared
18/02/11 12:08:13 INFO storage.BlockManager: BlockManager stopped
18/02/11 12:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/02/11 12:08:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/11 12:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
18/02/11 12:08:13 INFO util.ShutdownHookManager: Shutdown hook called
18/02/11 12:08:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3b26c620-946b-4efe-a60b-d101e32ec42a
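Because the job runs in client deploy mode, spark-submit's exit status already tells the calling script whether the job succeeded. The application can also be tracked on the YARN side using the application ID that appears in the log (application_1518316627470_0003). The commands below are standard Hadoop YARN CLI calls, not part of the original article; scraping the ID from a saved submission log (the file name spark_submit.log is an assumed example) is one common automation pattern.

# Check the state of the application from the run above
yarn application -status application_1518316627470_0003

# Fetch aggregated container logs after the application finishes
# (requires YARN log aggregation to be enabled)
yarn logs -applicationId application_1518316627470_0003

# Kill a stuck application
yarn application -kill application_1518316627470_0003

# In a wrapper script, capture the application ID from saved
# spark-submit output (spark_submit.log is a hypothetical file name)
APP_ID=$(grep -oE 'application_[0-9]+_[0-9]+' spark_submit.log | head -n 1)
yarn application -status "$APP_ID"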