A Rookie's Journey Through the Spark Source Code - 4: DAGScheduler - Part 3

In the previous part (DAGScheduler - part 2) we looked at how DAGScheduler submits and manages jobs. Next we turn to one of its key components: DAGSchedulerEventProcessLoop, the component that processes its messages. As we saw earlier, submitting jobs, tasks, and so on mostly amounts to calling this component's post method to send an event.

Let's first look at how it is defined and initialized inside DAGScheduler:

private[scheduler] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
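
For example, submitJob, which we covered in the previous part, hands work to the loop by posting a JobSubmitted event. Abridged from DAGScheduler.submitJob in the Spark 2.x source (func2 is the user function after a cast, and waiter is the JobWaiter created for this job):

eventProcessLoop.post(JobSubmitted(
  jobId, rdd, func2, partitions.toArray, callSite, waiter,
  SerializationUtils.clone(properties)))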

Now look at the definition of DAGSchedulerEventProcessLoop. It lives in the same file as DAGScheduler and takes the DAGScheduler itself as a constructor parameter.

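Its signature, roughly as it appears in the Spark 2.x source (minor details vary across versions):

private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
  extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {

  // Times how long each event takes to process.
  private[this] val timer = dagScheduler.metricsSource.messageProcessingTimer

  ...
}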

The structure is quite compact, consisting mainly of the following methods:

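In outline (bodies elided; descriptions based on the Spark 2.x source):

override def onReceive(event: DAGSchedulerEvent): Unit   // entry point: times the call and delegates to doOnReceive
private def doOnReceive(event: DAGSchedulerEvent): Unit  // pattern-matches the event and dispatches it
override def onError(e: Throwable): Unit                 // cancels all jobs and stops the SparkContext
override def onStop(): Unit                              // cleans up the scheduler's state on shutdown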

Looking at EventLoop, the base class it extends, the functionality becomes clear at a glance:


It receives events posted by callers and processes them all on a single, dedicated event thread that it starts itself.

The event queue can in theory grow without bound, so subclasses must make sure events are processed promptly, to avoid a potential OutOfMemoryError.
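Its essential shape, abridged from org.apache.spark.util.EventLoop in Spark 2.x:

private[spark] abstract class EventLoop[E](name: String) extends Logging {

  // Unbounded FIFO queue of pending events.
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  private val stopped = new AtomicBoolean(false)

  // Daemon thread that drains the queue (shown further below).
  private val eventThread = new Thread(name) { /* ... */ }

  def start(): Unit = { /* onStart(), then eventThread.start() */ }
  def stop(): Unit = { /* stop the thread, then onStop() */ }

  // Callers enqueue events here; this is what DAGScheduler invokes.
  def post(event: E): Unit = {
    eventQueue.put(event)
  }

  protected def onReceive(event: E): Unit   // implemented by subclasses
  protected def onError(e: Throwable): Unit // implemented by subclasses
  protected def onStart(): Unit = {}
  protected def onStop(): Unit = {}
}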

Let's look at the entry method for event handling, which receives each event:

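In Spark 2.x, onReceive simply times the processing and delegates:

override def onReceive(event: DAGSchedulerEvent): Unit = {
  val timerContext = timer.time()
  try {
    doOnReceive(event)
  } finally {
    timerContext.stop()
  }
}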

That is, it delegates to doOnReceive():

private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)

  case MapStageSubmitted(jobId, dependency, callSite, listener, properties) =>
    dagScheduler.handleMapStageSubmitted(jobId, dependency, callSite, listener, properties)

  case StageCancelled(stageId, reason) =>
    dagScheduler.handleStageCancellation(stageId, reason)

  case JobCancelled(jobId, reason) =>
    dagScheduler.handleJobCancellation(jobId, reason)

  case JobGroupCancelled(groupId) =>
    dagScheduler.handleJobGroupCancelled(groupId)

  case AllJobsCancelled =>
    dagScheduler.doCancelAllJobs()

  case ExecutorAdded(execId, host) =>
    dagScheduler.handleExecutorAdded(execId, host)

  case ExecutorLost(execId, reason) =>
    val workerLost = reason match {
      case SlaveLost(_, true) => true
      case _ => false
    }
    dagScheduler.handleExecutorLost(execId, workerLost)

  case WorkerRemoved(workerId, host, message) =>
    dagScheduler.handleWorkerRemoved(workerId, host, message)

  case BeginEvent(task, taskInfo) =>
    dagScheduler.handleBeginEvent(task, taskInfo)

  case SpeculativeTaskSubmitted(task) =>
    dagScheduler.handleSpeculativeTaskSubmitted(task)

  case GettingResultEvent(taskInfo) =>
    dagScheduler.handleGetTaskResult(taskInfo)

  case completion: CompletionEvent =>
    dagScheduler.handleTaskCompletion(completion)

  case TaskSetFailed(taskSet, reason, exception) =>
    dagScheduler.handleTaskSetFailed(taskSet, reason, exception)

  case ResubmitFailedStages =>
    dagScheduler.resubmitFailedStages()
}

This method handles every kind of event: it pattern-matches on the event type and dispatches each one to the corresponding handler. Next, let's see how the loop gets started. EventLoop.start() looks like this:

def start(): Unit = {
  if (stopped.get) {
    throw new IllegalStateException(name + " has already been stopped")
  }
  // Call onStart before starting the event thread to make sure it happens before onReceive
  onStart()
  eventThread.start()
}

DAGScheduler calls this start method at the very end of its initialization to kick off event processing:

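In Spark 2.x it is the last statement in the DAGScheduler class body:

eventProcessLoop.start()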

start() first calls onStart(), a hook that subclasses may override and that is guaranteed to run before any event is processed, and then launches the dedicated thread that handles events.

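The thread itself, abridged from EventLoop in Spark 2.x:

private val eventThread = new Thread(name) {
  setDaemon(true)

  override def run(): Unit = {
    try {
      while (!stopped.get) {
        val event = eventQueue.take() // blocks until an event is available
        try {
          onReceive(event)
        } catch {
          case NonFatal(e) =>
            try {
              onError(e)
            } catch {
              case NonFatal(e) => logError("Unexpected error in " + name, e)
            }
        }
      }
    } catch {
      case ie: InterruptedException => // exit even if eventQueue is not empty
      case NonFatal(e) => logError("Unexpected error in " + name, e)
    }
  }
}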

Now let's return to doOnReceive() and see how individual events are handled, taking job submission as the example:

case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
  dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)

Here dagScheduler.handleJobSubmitted is invoked to handle the job-submission event:

private[scheduler] def handleJobSubmitted(jobId: Int,
    finalRDD: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    callSite: CallSite,
    listener: JobListener,
    properties: Properties) {
  // Build the final ResultStage
  var finalStage: ResultStage = null
  try {
    // New stage creation may throw an exception if, for example, jobs are run on a
    // HadoopRDD whose underlying HDFS files have been deleted.
    finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
  } catch {
    case e: Exception =>
      logWarning("Creating new stage failed due to exception - job: " + jobId, e)
      listener.jobFailed(e)
      return
  }

  val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
  clearCacheLocs()
  logInfo("Got job %s (%s) with %d output partitions".format(
    job.jobId, callSite.shortForm, partitions.length))
  logInfo("Final stage: " + finalStage + " (" + finalStage.name + ")")
  logInfo("Parents of final stage: " + finalStage.parents)
  logInfo("Missing parents: " + getMissingParentStages(finalStage))

  val jobSubmissionTime = clock.getTimeMillis()
  jobIdToActiveJob(jobId) = job
  activeJobs += job
  finalStage.setActiveJob(job)
  val stageIds = jobIdToStageIds(jobId).toArray
  val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
  listenerBus.post(
    SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
  submitStage(finalStage)
}

Here, stages are constructed backwards: starting from the final stage, its parent stages are built recursively, mainly via the createResultStage method:

/**
 * Create a ResultStage associated with the provided jobId.
 */
private def createResultStage(
    rdd: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    jobId: Int,
    callSite: CallSite): ResultStage = {
  // Get or create the parent stages of this RDD
  val parents = getOrCreateParentStages(rdd, jobId)
  val id = nextStageId.getAndIncrement()
  val stage = new ResultStage(id, rdd, func, partitions, parents, jobId, callSite)
  // Register the new stage in the bookkeeping structures
  stageIdToStage(id) = stage
  updateJobIdStageIdMaps(jobId, stage)
  stage
}

Here is getOrCreateParentStages:

/**
 * Get or create the list of parent stages for a given RDD.  The new Stages will be created with
 * the provided firstJobId.
 */
private def getOrCreateParentStages(rdd: RDD[_], firstJobId: Int): List[Stage] = {
  getShuffleDependencies(rdd).map { shuffleDep =>
    getOrCreateShuffleMapStage(shuffleDep, firstJobId)
  }.toList
}

This is where shuffle dependencies are resolved, which is exactly how stage boundaries are drawn.

/**
 * Returns shuffle dependencies that are immediate parents of the given RDD.
 * (Note: only the directly connected parent dependencies are returned.)
 * This function will not return more distant ancestors.  For example, if C has a shuffle
 * dependency on B which has a shuffle dependency on A:
 *
 * A <-- B <-- C
 *
 * calling this function with rdd C will only return the B <-- C dependency.
 *
 * This function is scheduler-visible for the purpose of unit testing.
 */
private[scheduler] def getShuffleDependencies(
    rdd: RDD[_]): HashSet[ShuffleDependency[_, _, _]] = {
  val parents = new HashSet[ShuffleDependency[_, _, _]]
  val visited = new HashSet[RDD[_]]
  val waitingForVisit = new ArrayStack[RDD[_]]
  waitingForVisit.push(rdd)
  while (waitingForVisit.nonEmpty) {
    val toVisit = waitingForVisit.pop()
    if (!visited(toVisit)) {
      visited += toVisit
      toVisit.dependencies.foreach {
        case shuffleDep: ShuffleDependency[_, _, _] =>
          parents += shuffleDep
        case dependency =>
          waitingForVisit.push(dependency.rdd)
      }
    }
  }
  parents
}
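
To make the "immediate parents only" behavior concrete, consider a hypothetical lineage (variable names are illustrative):

val rddA = sc.parallelize(1 to 100).map(i => (i % 10, i))
val rddB = rddA.reduceByKey(_ + _)                // shuffle 1: A <-- B
val rddC = rddB.map { case (k, v) => (v % 3, k) } // narrow dependency on B
val rddD = rddC.reduceByKey(_ + _)                // shuffle 2: C <-- D

// getShuffleDependencies(rddD) returns only shuffle 2: the traversal stops at
// the first ShuffleDependency on each path. Shuffle 1 is discovered later by
// getMissingAncestorShuffleDependencies, which walks up from shuffle 2's RDD.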

For each parent shuffle dependency, a stage is obtained or created:

/**
 * Gets a shuffle map stage if one exists in shuffleIdToMapStage. Otherwise, if the
 * shuffle map stage doesn't already exist, this method will create the shuffle map stage in
 * addition to any missing ancestor shuffle map stages.
 * In short: if a stage is already registered in shuffleIdToMapStage it is returned; otherwise one is created.
 */
private def getOrCreateShuffleMapStage(
    shuffleDep: ShuffleDependency[_, _, _],
    firstJobId: Int): ShuffleMapStage = {
  shuffleIdToMapStage.get(shuffleDep.shuffleId) match {
    case Some(stage) =>
      stage

    case None =>
      // Create stages for all missing ancestor shuffle dependencies.
      getMissingAncestorShuffleDependencies(shuffleDep.rdd).foreach { dep =>
        // Even though getMissingAncestorShuffleDependencies only returns shuffle dependencies
        // that were not already in shuffleIdToMapStage, it's possible that by the time we
        // get to a particular dependency in the foreach loop, it's been added to
        // shuffleIdToMapStage by the stage creation process for an earlier dependency. See
        // SPARK-13902 for more information.
        if (!shuffleIdToMapStage.contains(dep.shuffleId)) {
          createShuffleMapStage(dep, firstJobId)
        }
      }
      // Finally, create a stage for the given shuffle dependency.
      createShuffleMapStage(shuffleDep, firstJobId)
  }
}

It first looks for ancestor shuffle dependencies that have not been registered yet:

/** Find ancestor shuffle dependencies that are not registered in shuffleToMapStage yet */
private def getMissingAncestorShuffleDependencies(
    rdd: RDD[_]): ArrayStack[ShuffleDependency[_, _, _]] = {
  val ancestors = new ArrayStack[ShuffleDependency[_, _, _]]
  val visited = new HashSet[RDD[_]]
  // We are manually maintaining a stack here to prevent StackOverflowError
  // caused by recursively visiting
  val waitingForVisit = new ArrayStack[RDD[_]]
  waitingForVisit.push(rdd)
  while (waitingForVisit.nonEmpty) {
    val toVisit = waitingForVisit.pop()
    if (!visited(toVisit)) {
      visited += toVisit
      // This calls back into dependency resolution, effectively a loop of repeated calls
      getShuffleDependencies(toVisit).foreach { shuffleDep =>
        if (!shuffleIdToMapStage.contains(shuffleDep.shuffleId)) {
          ancestors.push(shuffleDep)
          waitingForVisit.push(shuffleDep.rdd)
        } // Otherwise, the dependency and its ancestors have already been registered.
      }
    }
  }
  ancestors
}

Creating a ShuffleMapStage:

/**
 * Creates a ShuffleMapStage that generates the given shuffle dependency's partitions. If a
 * previously run stage generated the same shuffle data, this function will copy the output
 * locations that are still available from the previous shuffle to avoid unnecessarily
 * regenerating data.
 */
def createShuffleMapStage(shuffleDep: ShuffleDependency[_, _, _], jobId: Int): ShuffleMapStage = {
  val rdd = shuffleDep.rdd
  val numTasks = rdd.partitions.length
  // This kicks off parent stage construction again, recursively
  val parents = getOrCreateParentStages(rdd, jobId)
  val id = nextStageId.getAndIncrement()
  val stage = new ShuffleMapStage(
    id, rdd, numTasks, parents, jobId, rdd.creationSite, shuffleDep, mapOutputTracker)

  stageIdToStage(id) = stage
  shuffleIdToMapStage(shuffleDep.shuffleId) = stage
  updateJobIdStageIdMaps(jobId, stage)

  if (!mapOutputTracker.containsShuffle(shuffleDep.shuffleId)) {
    // Kind of ugly: need to register RDDs with the cache and map output tracker here
    // since we can't do it in the RDD constructor because # of partitions is unknown
    logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")")
    mapOutputTracker.registerShuffle(shuffleDep.shuffleId, rdd.partitions.length)
  }
  stage
}

Along the way, the shuffle is registered with the mapOutputTracker and various bookkeeping state is updated.

At this point, the whole flow of job submission and stage resolution and creation should be fairly clear.

Let's recap:

Internally, DAGScheduler triggers all of its operations by passing events.

Upon creation, DAGScheduler starts its event loop, and a dedicated event thread handles the events.

Submitting a job (or similar operations) sends an event to the event loop; in essence, eventLoop.post appends the event to the event queue. The event thread takes events off the queue and calls onReceive() on each one; doOnReceive() then pattern-matches the event and dispatches it to the appropriate handler, which varies by event type.

This wraps up our exploration of DAGScheduler's message handling, and with it our look at DAGScheduler's internal structure and mechanics for now. Next time we will shift our attention to the execution side: the Executor.