Submitting my first Spark program, and the pitfalls along the way
First, start the ZooKeeper cluster (I slacked off for the past four days and forgot to start ZooKeeper at first = =!)
./zkmanage.sh start|stop
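zkmanage.sh is my own helper script, not something that ships with ZooKeeper. A minimal sketch of what it does, assuming the ensemble runs on hdp01/hdp02/hdp03 and zkServer.sh is on each node's PATH:

#!/bin/bash
# Hypothetical zkmanage.sh: run zkServer.sh with the given action on every ZK node.
# Host names and the profile sourcing are assumptions about this cluster.
ACTION=$1   # start | stop | status
for host in hdp01 hdp02 hdp03; do
  echo "---- $host: zkServer.sh $ACTION ----"
  ssh "$host" "source /etc/profile; zkServer.sh $ACTION"
done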
Start the Spark cluster
./sbin/start-all.sh
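To confirm the daemons actually came up before submitting anything, a quick jps check on each node helps (the process names below are what Spark standalone normally registers):

jps
# on hdp01 expect a Master process; on each worker node expect a Worker process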
On hdp02, start the (standby) master
./sbin/start-master.sh
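For the hdp02 master to act as a standby that ZooKeeper can fail over to, conf/spark-env.sh on the master nodes needs ZooKeeper recovery mode enabled. A sketch, assuming the ZK ensemble is hdp01:2181,hdp02:2181,hdp03:2181:

# conf/spark-env.sh (the ZK URL and dir are assumptions about this cluster)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
 -Dspark.deploy.zookeeper.url=hdp01:2181,hdp02:2181,hdp03:2181 \
 -Dspark.deploy.zookeeper.dir=/spark"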
Run the submit command
bin/spark-submit --master spark://hdp01:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.3.3.jar 100
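A quick breakdown of that command (same command as above, just annotated and spread over lines):

# --master spark://hdp01:7077                  -> standalone master URL (default port 7077)
# --class org.apache.spark.examples.SparkPi   -> main class inside the example jar
# examples/jars/spark-examples_2.11-2.3.3.jar -> app jar, built for Scala 2.11 / Spark 2.3.3
# trailing 100                                -> SparkPi's argument: number of slices to sample
bin/spark-submit \
  --master spark://hdp01:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.3.3.jar 100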
Error case 1: the job stays stuck in RUNNING, and the console keeps printing
TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
In the web UI, the job shows 0 cores. My guess: the command granted no CPU cores by default. The instructor's command worked without any resource arguments because the system filled in defaults, but the newer version I'm running apparently doesn't support that usage, which left the CPU count at 0.
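The warning itself says to check the cluster UI, which is worth doing before guessing further. With default ports, the standalone master UI is on 8080 and also exposes a JSON view (the URL below assumes the default port on hdp01):

# browser: http://hdp01:8080 -> check the Workers table and each worker's free cores/memory
curl http://hdp01:8080/json  # same status as JSON: registered workers, cores, memory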
After modifying the submit command:
bin/spark-submit --master spark://hdp01:7077 --class org.apache.spark.examples.SparkPi --executor-memory 512mb --total-executor-cores 5 examples/jars/spark-examples_2.11-2.3.3.jar 20000
This time I specified the CPU and memory resources explicitly, but the run hit a new exception:
ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
        at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:137)
        at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:647)
        at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessa
After searching Baidu, one suggestion was to add --conf spark.dynamicAllocation.enabled=false to turn off dynamic resource allocation.
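I didn't keep that setting in the end (dynamic allocation is mainly a YARN-side feature, which matches where the Baidu example was running), but for the record, the suggested form applied to my failing command would have looked like this:

bin/spark-submit --master spark://hdp01:7077 \
  --conf spark.dynamicAllocation.enabled=false \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 512mb --total-executor-cores 5 \
  examples/jars/spark-examples_2.11-2.3.3.jar 20000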
P.S.: the Baidu example had Spark running on YARN. In an earlier lesson the instructor mentioned that Spark computes in memory first, spills to disk when memory runs short, and finally aggregates the results. Since I'm only running a single-machine example, my personal guess is that my host simply doesn't have enough memory!!!! (And I even bought 8 GB of new RAM last month to upgrade… good grief.)
The fix: drop the number of iterations for the pi calculation from 20000 down to 200, and it ran perfectly. Problem solved, yay~
bin/spark-submit --master spark://hdp01:7077 --class org.apache.spark.examples.SparkPi --executor-memory 512mb --total-executor-cores 5 examples/jars/spark-examples_2.11-2.3.3.jar 200
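On success, the driver log ends with SparkPi's result line, something like this (the exact digits vary from run to run):

Pi is roughly 3.14159...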