In December, I took a course in which we attempted to implement the Raft algorithm from this paper. Parts 1-5 of this series share insights from the course. From now on, I’m guiding you through my continued work implementing Raft “for fun” (I know. I don’t understand me, either).
In the previous post, we calmed down and started structuring our code.
Now, It’s time to get log replication to work in a more sophisticated fashion.
Because right now, there’s a glaring hole in it.
Suppose I start up two servers and I start a client to tell one of them to
set a 1,
set b 2. Then I start a different client to tell the other one
get b…just to make sure that those values aren’t set on the second server before I attempt to replicate logs from the first to the second server. Here’s what I’m going to get:
The first server thinks the second server is up to date because these two logs have the same length:
We can fix this particular issue by only persisting writes, not reads (this commit):
That will save us for now, provided that we never have to deal with a network partition.
Suppose, though, that we did have to deal with a network partition, such that two servers could each receive different
delete commands from clients. When the partition is eliminated, how would these servers adjudicate whose logs to keep?
It’s time to start talking about leaders, followers, and terms. Which means, for perhaps the first time in this series, looking at what the Raft paper actually says about this.
The Job of the Lead Server
Think of Raft as a sledding rig and the servers as sled dogs.
One dog acts as the leader. This dog is responsible for taking commands from the driver and guiding the other dogs to navigate the terrain. In raft, one server takes charge of receiving requests from clients and sending updated log information to the other servers. If a client tries to contact a server that is not the leader, that server responds by telling the client who is the leader.
Any of the servers could, theoretically, lead the pack. The one that is currently leading the pack needs a way to assert that it is still up and leading. It does that by sending periodic messages (referred to in the Raft paper as the “heartbeat”).
If another server waits a while and doesn’t hear from the leader, it sends out a message to all servers to call a leader election. This election will result in another server becoming the leader. It will also result in each online server incrementing its term, an integer that keeps track of how many times there has been a change in leadership.
Instead of determining which of two logs is more up-to-date by comparing their length, Raft prescribes comparing these term numbers. It says:
Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.
If the last term of one log is
4 set a 1 and the last term of another log is
5 set a 2, the one that starts with 5 is more up-to-date regardless of the lengths of the two logs.
In fact, we have already included a placeholder for the term in our log: the
0 recorded at the beginning of each command. At the moment, the term always remains 0. After all, our servers don’t ever undergo a change of leader: we start them all up, tell one of them that it’s the leader, and let it scramble to have a one-time conversation with each of the other servers to update their logs to match its own log.
Now, we need to get the lead server to send out a heartbeat to the other servers. (We’ll come back to elections later).
What is the heartbeat?
Until now in this series, I implemented servers to have whole conversations, like this:
Now that I’ve done all that work, it turns out that Raft prescribes its servers a much smaller set of social skills 😂.
Instead, it says servers should have a message called
AppendEntries. It doesn’t ask the other servers where their logs are; instead, it sends
AppendEntries with any new entries, and the responsibility falls upon each server to determine whether this is the only entry they’re missing or not.
In addition, instead of having a separate heartbeat message, leaders repurpose the
AppendEntries message, which they can send without any entries to append. It’s time for us to remove a lot of extraneous conversation skills from our server, in this commit.
Differentiating the Leader
Until now, I have launched five of the same server, then started up a client to tell one of the servers that it is the leader. That server would respond, issue a series of commands to update the logs of all the other servers, and then…that’s it. The system had no state. No server maintained ongoing leadership beyond doing one round of leader behavior in response to a client command.
Now, we need to introduce a new piece of state to each server: its role as a leader or a follower (in this commit):
Now, we need that leader to send out the heartbeat to tell the other servers that it is still up and leading.
So we create a method to send out the heartbeat.
As you can see, the method broadcasts the
append_entries message to all of the other servers with an empty collection of entries.
We start a five second timer when we listen for new connections, and we run our
prove_aliveness method every five seconds.
As of now, the server does not send
append_entries with entries in it anymore. For our next step, we’ll reimplement log replication to better assess where two servers’ logs diverge.
If you liked this piece, you might also like:
This post on my entree into live coding (plus, how to get a transcript of a YouTube video)
The series on reducing job interview anxiety—no relation to this post, but people seem to like this series.
This talk about the technology and psychology of refactoring—in which you’ll hear me explain some of what you see me do when I code.