2020 6.824 的 Raft Lab 2C

程序员文章站 2022-03-15 17:02:02

...

前言

做2020的MIT6.824，完成了实验Raft Lab2C，通过了测试，对于之前的两个实验请参考2020 6.824 的 Raft Lab 2A 以及2020 6.824 的 Raft Lab 2B

这个实验的重点在于实现persist()以及做优化。总的来说代码量比Lab2B要少，又沿用Lab2A已经搭好的框架，所以整体会比前两个实验简单一些

这个实验的坑是TestFigure8Unreliable2C，这里有相关的讨论可以根据自己的情况参考一下，总结有两点如下

AppendEntries实现有误
Time interval 的问题

下面有这个链接对我的实验测试很有帮助，主要是为了多测测试，保证没有因为概率通过而miss掉的一些测试用例
3. 并行运行测试的shellscript

##每20个test并行运行，运行100次2B的test
bash test_many.sh 100 20 2C

一、Raft2C

实验可以分为两个部分，首先是实现persist()以及做优化

二、Persist()

2.1 实现

Lab已经有部分参考的实验代码了，结合paper的Figure 2就可以写出来了

2020 6.824 的 Raft Lab 2C
最终实现如下

func (rf *Raft) persist() {
	// Your code here (2C).
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.log)
	data := w.Bytes()
	rf.persister.SaveRaftState(data)
}

func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) < 1 { // bootstrap without any state?
		return
	}
	// Your code here (2C).
	r := bytes.NewBuffer(data)
	d := labgob.NewDecoder(r)
	rf.mu.Lock()
	defer rf.mu.Unlock()
	d.Decode(&rf.currentTerm)
	d.Decode(&rf.votedFor)
	d.Decode(&rf.log)
}

2.2 调用

接下来就要分析什么时候需要调用persist函数了，其实就是一旦修改了上面持久化的三个值currentTerm, votedFor, log 就需要调用一次persist函数

那么对于我的implementation来说就是在convertToFollower，convertToCandidate以及convertToLeader的时候

func (rf *Raft) convertToFollower(newTerm int) {
	defer rf.persist()
	...
}

func (rf *Raft) convertToCandidate() {
	defer rf.persist()
	...
}

func (rf *Raft) convertToLeader() {
	defer rf.persist()
	...
}

在完成保存日志的时候

// 保存日志
	for i, logEntry := range args.Entries {
		...
	}
	rf.persist()

在调用RequestVote的修改votedFor时候

if (rf.votedFor == -1 || rf.votedFor == args.CandidateId) && rf.isLogMoreUpToDate(args.LastLogIndex, args.LastLogTerm) {
	rf.votedFor = args.CandidateId
	reply.VoteGranted = true
	rf.persist()
}

三、优化

一共有三点 对应实现就ok了

If a follower does not have prevLogIndex in its log, it should return with conflictIndex = len(log) and conflictTerm = None.

这一点Lab2B已经实现了

if len(rf.log)-1 < args.PrevLogIndex {
	conflictIndex = len(rf.log)
	goto label1
}

If a follower does have prevLogIndex in its log, but the term does not match, it should return conflictTerm = log[prevLogIndex].Term, and then search its log for the first index whose entry has term equal to conflictTerm.

翻译一下就是如果有index一直但是term不一致的情况，让conflictTerm等于follower对应prevLogIndex的term，然后找出conflictTerm第一次出现的位置（看不懂的话可以看看下面我举的例子）

if args.PrevLogIndex > 0 && rf.log[args.PrevLogIndex].Term != args.PrevLogTerm {
	conflictTerm = rf.log[args.PrevLogIndex].Term
	for i := 1; i <= args.PrevLogIndex; i++ {
		if rf.log[i].Term == conflictTerm {
			conflictIndex = i
			break
		}
	}
	goto label1
}

Upon receiving a conflict response, the leader should first search its log for conflictTerm. If it finds an entry in its log with that term, it should set nextIndex to be the one beyond the index of the last entry in that term in its log. If it does not find an entry with that term, it should set nextIndex = conflictIndex.

翻译一下就是当Leader收到Reply=false的时候

Leader找出conflictTerm对应最后出现的位置+1
如果找不到，把nextIndex设为conflictIndex
（看不懂的话可以看看下面我举的例子）

Example
2020 6.824 的 Raft Lab 2C
Follower e：Leader跟e在AppendEntries通讯之后，f会把Reply设为false，ConflictTerm设为4，ConflictIndex设为4(conflictTerm第一次出现的位置)。那么对于Lerder收到Reply=false之后，会找Term为4的最后一个index，也就是5，对应1中conflictTerm对应最后出现的位置+1，于是把nextIndex[e]设为5+1=6，再发送AppendEntries

Follower f：Leader跟f在AppendEntries通讯之后，f会把Reply设为false，ConflictTerm设为3，ConflictIndex设为7(conflictTerm第一次出现的位置)。那么对于Lerder收到Reply=false之后，会找Term为3的最后一个Index，没有找到于是把nextIndex[f]设为7，再发送AppendEntries

实现如下

if reply.Success == true {
	...
} else {
	if reply.ConflictTerm < 0 {
		rf.nextIndex[p] = reply.ConflictIndex
	} else {
		//Upon receiving a conflict response, the leader should first search its log for conflictTerm.
		lastEntryOfConflictTerm := -1
		for entryIndex := args.PrevLogIndex; entryIndex > 0; entryIndex-- {
			if rf.log[entryIndex].Term == reply.ConflictTerm {
				lastEntryOfConflictTerm = entryIndex
				break
			}
		}
		//If it does not find an entry with that term, it should set nextIndex = conflictIndex
		if lastEntryOfConflictTerm == -1 {
			rf.nextIndex[p] = reply.ConflictIndex
		} else {
			//If it finds an entry in its log with that term, it should set nextIndex to be the one beyond the index of the last entry in that term in its log.
			rf.nextIndex[p] = lastEntryOfConflictTerm + 1
		}
	}
}

总结

至此，Raft的Lab2就全部实现完了，回过头来思考就是要紧跟Paper实现，多使用DPrintf进行调试，AppendEntries有很多需要调试的地方，包括优化的部分，包括log保存的部分。实现一次还是很enjoy的，have fun

2020 6.824 的 Raft Lab 2C

目录

前言

一、Raft2C

二、Persist()

2.1 实现

2.2 调用

三、优化

总结

MIT 6.824 2020 分布式系统 Lab1 Mapreduce 实验思路和代码分析

6.824(2020年) Lab1 MapReduce

MIT 6.824 Lab2 Raft实现总结

mit6.824 lab2 RAFT node persist 要点记录

6.824 Lab 2: Raft

2020 6.824 的 Raft Lab 4A

MIT 6.824 lab2 Raft

6.824（2020春） Lab1：MapReduce

MIT 6.824 Lab1 MapReduce实现 2020 Spring

2020 6.824 的 Raft Lab 2C