Raft 8: Log Replication, Part 2

Reading Time: 10 minutes

In December, I took a course in which we attempted to implement the Raft algorithm from this paper. Parts 1-5 of this series share insights from the course. From now on, I’m guiding you through my continued work implementing Raft “for fun” (I know. I don’t understand me, either).

Here’s where you can see all the posts so far.

two_rafts

In the previous post, we reinstated a rudimentary form of log replication after mercilessly breaking it while adding the heartbeat call (not because the code was fragile, but out of necessity. How dare you come for my baby like that). However, followers still aren’t checking to make sure their existing logs are up to date before adding whatever the leader sent. That has to happen if we want to have confidence that our system produces identical logs even if two servers get disconnected for a while. So let’s implement that.

I want the implementation to be able to pass, at a minimum, the following set of manual tests:

  1. Fire up a leader and a follower server, with the leader having a log with multiple commands and the follower having an empty log.
  2. Check that the leader successfully replicates its logs on the follower.
  3. Without shutting down either server, issue some write commands to the leader.
  4. Check that the new write commands are accurately replicated on the follower log.
  5. Shut down both servers.
  6. Delete some number of commands from the tail of the follower’s log. Replace them with commands that are different from the leader logs, or at least have different index-term pairs at the front.
  7. Fire up the leader again.
  8. Check that the leader successfully replaces the bogus logs on the follower with the correct logs.
  9. Without shutting down either server, issue some write commands to the leader.
  10. Check that the new write commands are accurately replicated on the follower log.

So how am I going to do this?

The Raft paper says:

The leader keeps track of the highest index it knows to be committed, and it includes that index in future AppendEntries RPCs (including heartbeats) so that the other servers eventually find out. 

In a previous post, I suggested that I thought we might be able to avoid adding this index to the log by just assuming that whichever line a command appeared on in the context of the entire log as an ordered list would count as the index. The thing I wasn’t thinking about at the time was that:

  1. I store the state as a dictionary. I do this so clients can ask for values by their keys and the server can look them up lightning-fast. To store them in a list would mean doing something with string parsing and filtering, which would be a) slow and b) no fun to write. BUT…
  2. Dictionaries are unordered collections. You can get a list of the keys and see which one is “first” in the list, but that isn’t guaranteed to correlate with which one you put in the dictionary first. FURTHERMORE…
  3. Dictionaries don’t tell you about change in their contents over time. If one log statement sets “a” to “apple” and a future log statement deletes a, then a third log statement sets “a” to “banana”, at the point immediately after any of those calls, the dictionary contains no evidence that any of the previous calls existed. Indexes in Raft exist to help us coordinate the state of the system over timeSo they can’t be a computed property based on the state of any given server right now. They have to live in the logs.

So I went ahead and added the index to the front of each log statement in this commit. I initialize the key_value_store​ with an instance attribute of highest_index, set to zero (though this changes in a later commit). Then:

Screen Shot 2020-08-17 at 1.49.42 PM

I add it to the front of incoming commands that haven’t been “marked” yet with their term, and each time the server writes to its logs it also increments that highest_index attribute.

Bless Python collection assignment that I can add a new element to the beginning of the string and a new index to the end with the confidence that everything referencing a named index will still work correctly.

 

Next, I need to include the index and term of the previous log entry in the AppendEntries call itself. A running theme in implementing raft is “The system needs to be able to heal itself if a server goes down and then comes back up,” so I cannot rely on any of a server’s state variables as a source of truth on the state of the system. All of it must be stored in the logs and communicated in the servers’ calls to each other.

So here’s the commit where I add the index to the AppendEntries call. I do another thing in this commit, which is to extract that call into its own object, with a method that gets it into string form and out of string form. This way, if any more changes happen to this call, I can change them in one spot. I have repeatedly been changing this call in two spots prior to now.

The new class:

Screen Shot 2020-08-17 at 1.56.08 PM

How it’s used:

This looks like pretty standard practice, right? WELL. I’ll tell you, in January when I put down the Raft implementation, I extracted the request-parsing and repsonse-generation into their own method in a different file from the server. When I came back to it this summer, one of my first commits undid that refactor. Why? “I can’t figure out what’s going on with all this indirection.” So we’ll see if this AppendEntries object survives first brush with someone who is trying to learn the code base. I’m starting to think many of the lessons we learn about refactoring aren’t as universal as they’re billed to be.

Then there’s the final piece: getting a follower’s state right when it’s catching up from its existing logs. So in this commit, we get the KeyValueStore to catch up its latest_term and highest_index attributes from the logs whenever a server starts up—otherwise it restarts from zero every time. I didn’t anticipate this, but it made itself known during manual testing 😂. Ze fix:

Screen Shot 2020-08-17 at 2.40.30 PM

Now that we have our index in place, things get interesting.

Let’s turn back to the Raft paper to see how followers are supposed to use this index to figure out if their logs are up to date or not.

When sending an AppendEntries RPC, the leader includes the index and term of the entry in its log that immediately precedes the new entries. If the follower does not find an entry in its log with the same index and term, then it refuses the new entries. The consistency check acts as an induction step: the initial empty state of the logs satisfies the Log Matching Property, and the consistency check preserves the Log Matching Property whenever logs are extended. As a result, whenever AppendEntries returns successfully, the leader knows that the follower’s log is identical to its own log up through the new entries.

Screen Shot 2020-07-21 at 11.33.46 AM

During normal operation, the logs of the leader and followers stay consistent, so the AppendEntries consistency check never fails. However, leader crashes can leave the logs inconsistent (the old leader may not have fully replicated all of the entries in its log). These inconsistencies can compound over a series of leader and follower crashes. Figure 7 illustrates the ways in which followers’ logs may differ from that of a new leader. A follower may be missing entries that are present on the leader, it may have extra entries that are not present on the leader, or both. Missing and extraneous entries in a log may span multiple terms.

So now our followers need to check, when they receive any AppendEntries call, that they possess a line in their logs whose index and term together match the one that any lines in the AppendEntries call would be appended after.

How the heck am I gonna make that a quick lookup?

Whelp, here’s what’s convenient: any given combination of index and term in the log should be unique (only one of them) and universal (all servers should have the same set of index-term combinations). So it should be safe to store these combinations as keys, and the subsequent commands as values, in a dictionary.

That happens in this commit. Here’s the juicy-juice:

Screen Shot 2020-08-17 at 2.38.16 PM

That code would represent this log:

1 0 set h 8
2 0 set i 9
3 0 set j 10

like so:

{ '1 0' : 'set h 8', '2 0': 'set i 9', '3 0': 'set j 10' }

We can then use that to check, when a follower server receives an AppendEntries call, whether the follower server has the last line of the log that the server has. If it does, we can go ahead and append whatever the leader has sent. If not, we need to inform the leader that we are not caught up.

Screen Shot 2020-08-17 at 2.43.30 PM

So now our follower server can successfully differentiate when it’s ready to accept an AppendEntries command and when it isn’t. Next, of course, comes the part where we catch it back up so that it becomes ready to accept those commands.

That’ll be the topic of our next Raft post.

 

And if you found this piece helpful…

You can help me keep writing by tossing a coin at this Patreon 🙂

If you liked this piece, you might also like:

This post on my entree into live coding (plus, how to get a transcript of a YouTube video)

The ongoing series about advanced debugging in XCode—for debugging enthusiasts, of course

The listening series—Look, I’m supposed to include three links in this “you might also like” section, so if I put one more link in this one I can be done, and the listening series is popular, so 🙂

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.