Raft 14: The Grand Finale About the Barking Spider Circus

Reading Time: 17 minutes

You have watched me, over the course of 13 posts, implement the Raft distributed consensus algorithm from this paper. (By the way, here’s where you can read those 13 posts.)

I promised you a grand finale. Then I warned you to bring an espresso. Go ahead and pull those double-shots:

It is time.

Our story begins with a tweet:

I wanted to get to the bottom of what might be happening. I jotted down specific examples as I came across them in my implementation. I started to connect those examples to broader concepts.

I refined those notes into “A Further Note On Code Design” at the end of this piece. There, we encounter the role of context in software decisions.

What you need to know about Raft’s design:

The decisions that impacted my Raft implementation start long before I open an IDE, with Ongaro and Ousterhout’s design goals:

Our approach was unusual in that our primary goal was understandability: could we define a consensus algorithm for practical systems and describe it in a way that is significantly easier to learn than Paxos?

In Search of an Understandable Consensus Algorithm (Extended Version)
Diego Ongaro and John Ousterhout
Stanford University

The authors of the Raft paper make a big deal about Raft’s understandability relative to other distributed consensus algorithms (in particular, Paxos). Let’s explore how they did that and what we can learn about designing our own systems.

Before we do that, I have to address something.

I don’t endorse Raft’s claim about understandability.

The quote above reads like a claim that Raft is simple. That’s not actually the claim, but other authors might read that quote as license to make such a claim.

I don’t appreciate technologists engaging in onanistic monologues about how “simple” they find something to be. Calling something “simple” doesn’t make it easier to understand; it just signals to people who don’t understand it the first time that they should feel shame about that. I do not endorse running out and writing a paper that calls your own solution “understandable”. That’s not your call to make.

Folks who have read the Raft paper will point out (correctly) that the authors didn’t just claim this—they backed it up with a study. So lemme address that: the study lacks rigor. They showed 43 students two videos: one explaining Raft, the other explaining Paxos. After each video, they issued students a quiz, and the scores proxy the understandability of the algorithms. Some issues:

  1. Both the Raft and the Paxos video were made by the Raft team, who admit in the paper that they couldn’t figure out exactly how Paxos worked, whereas they literally wrote Raft. So the instructors’ preparedness to explain the two topics differed between the videos.
  2. They say Paxos had an “advantage” because the video was “14% longer.” I tell you what, “longer video” is one hell of a bad proxy for “better instructed.” Such a proxy accepts as given the effectiveness of an authoritarian teaching model that much of the past decade of education research has debunked.
  3. “Score on a quiz immediately after watching a video” is the proxy here for “understands the algorithm,” and in my view as a software engineer and an instructor of computer science graduate students, a bad one. It accepts as given the effectiveness of multiple choice tests as a demonstration of understanding: that’s another thing much of the past decade of education research has debunked.
  4. Clearly I’m not a fan of using the scores as proxies, but if we’re gonna do that, it needs to be explicitly acknowledged that the scores on these quizzes, in absolute terms, were not good for either Raft or Paxos (42% and 34%, respectively). This is what I mean about suggesting that something is simple. Would you say, on an absolute scale, that an algorithm is simple if computer science graduate degree candidates get a 42% on a quiz about it? I would not.
  5. It’s really hard to say that two quizzes on two different subjects were of exactly equal difficulty: harder still if the authors of both quizzes admit to not fully understanding one subject while having authored the other. A more believable (though considerably more work for the subjects) demonstration might be a single integration test suite or feature checklist for a distributed system to which participants’ implementations are subjected, using either Raft or Paxos.
  6. The other metric used besides quiz scores is the students’ opinions of how understandable the two algorithms are. I don’t think that someone’s stated opinion immediately after watching a single video and without ever having tried to implement the thing is particularly instructive.
  7. The sample size is 43. I re-ran the numbers for this study; it’s true that the confidence intervals of the Raft and Paxos quiz scores do not overlap, but my aforementioned grievances about the proxies undercut that validity check. 43 is a suspiciously small number for measuring such a heavily proxied effect.

I just implemented Raft and, while it’s not the hardest thing I’ve ever done, it certainly was not the easiest, either. To their credit, Ongaro and Ousterhout are not saying that Raft is a walk in the park. Rather, they’re saying that Raft is simple relative to Paxos, specifically.

And that, I believe they are right about. I read the Paxos paper and talked to some folks who implemented (or tried to implement) both Paxos and Raft. It was this effort, not the 43-person video comprehension quiz, that convinced me that I’d have cussed a lot more trying to get Paxos working than I did trying to get Raft working.

Chels, what’s with the “Barking Spider Circus” comment?

Many a Twitter reply guy felt that I had forgotten my place by suggesting that orthodox refactoring sensibilities weren’t working for me on my Raft implementation. On Twitter itself, I ignored those people. But I’ll address the sentiment here: if I thought orthodox refactoring techniques were useless, I wouldn’t have tried them on Raft.

Here’s what happened: when I tried the orthodox refactoring techniques at the points that they are coached in gang-of-four-era literature, they made the Raft implementation harder to follow. Let’s talk about some examples.

DRYing code up works when you have two conceptually identical things happening: they can’t just be similar, or else you’re subclassing with no-ops and doing other gymnastics to get around their differences in ways that would be unnecessary if you had just kept them separate. We talk about this in greater detail over here. I ran into this with respect to responding to similar, but not identical, requests in Raft: in particular, a client’s get or show request versus a set or delete request. I made this mistake once and remembered it when I extracted the AppendEntries and RequestVote logic. Could I employ polymorphism here? I could. But in any case, they’re definitely not the same object.
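To make that concrete, here’s a minimal Python sketch (the names are mine for illustration, not lifted from my implementation): a read request and a write request look similar enough to tempt a merge, but they touch different state.

```python
# Hypothetical sketch: a read and a write look superficially similar,
# but a merged handler would branch on their differences anyway.

def handle_get(state_machine: dict, key: str) -> str:
    # Reads never touch the log; they just consult the state machine.
    return str(state_machine.get(key, ""))

def handle_set(state_machine: dict, log: list, key: str, value: str) -> str:
    # Writes must be appended to the log before they apply.
    log.append(f"set {key} {value}")
    state_machine[key] = value
    return "OK"
```

A merged `handle_request` would need a no-op “append to the log” branch for reads; keeping the two paths separate keeps each one legible.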

Similarly, extracting lots of little objects works when each of the little objects handles a relatively separate set of concerns—that is, separate enough that the probability drops below some threshold that a developer needs to hold both of those concerns in their head at once to sufficiently understand the system for maintenance or enhancement. When you extract objects while that probability remains above the threshold, the code gets harder to read. And if you’re super familiar with the code such that you already have both concerns in your head, it can be tough to guess whether understanding just one or the other would be enough to work in that particular section of code. I ran into this when I returned to my Raft implementation after a several-month hiatus and found myself constantly switching between the server class and the method I had extracted from the server class for handling requests and generating responses. I needed to move that logic back into the Server class—which made the Server class bigger. Ugh.
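Here’s a hypothetical sketch of what that kind of premature extraction can look like (illustrative names, not my actual code): the extracted handler needs so much of the server’s state that it ends up taking the whole server as an argument, which is a hint that the concern was never really separate.

```python
# Hypothetical sketch of an extraction that doesn't pay off.

class Server:
    def __init__(self):
        self.current_term = 0
        self.log = []
        self.state_machine = {}

class RequestHandler:
    def respond(self, server: Server, command: str) -> str:
        # To generate a response, we reach back into nearly every
        # attribute of Server, so a reader must hold both classes
        # in their head at once anyway.
        server.log.append(command)
        return f"term {server.current_term}: appended {command}"
```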

Now that we have some specific examples, I want to introduce a terminology shift to clarify our upcoming discussion. The changes I have described thus far have this name, “refactoring,” with “re” implying “again”—that is, that this code has already been “factored” at least once when we wrote it to begin with. They’re design changes introduced after implementation to improve the code’s flexibility and—you guessed it—understandability. But we also use the term “refactoring” to describe other things that aren’t this specific thing, so moving forward in this piece, I’ll call such changes postfactoring.

We already use an equivalent term, prefactoring, to describe striving for a level of flexibility in the code design that the system doesn’t need or doesn’t yet need, often at the cost of understandability.

The term “refactoring” tends to cover both prefactoring and postfactoring; both happen relative to the system’s implementation. Refactoring focuses on finding flexible and understandable ways to handle the details of the system’s intended behavior.

Refactoring therefore happens after defining the system’s intended behavior.

It’s at this preliminary step, defining the system’s intended behavior, that the authors of the Raft paper focus their efforts to make Raft’s design understandable. Are implementation details described in the paper? Yes: there’s a table detailing which objects to make and which attributes to put on those objects. Are these details the heart of the algorithm’s success in achieving understandability? I don’t think so. Why not? At the beginning of this implementation, I set out to read the “what to do” portion of the paper and come up with the “how to do it” portion myself. I wanted to develop my intuition by, hopefully, slamming into the same barriers the original Raft team did while refining their implementation, so that I had firsthand experience with why we do things the way we do them. I did not look at the paper’s implementation table until my implementation was almost finished. When I did look at it, the implementation I had come up with differed a little, but not a ton, from the table. The system’s design, rather, strongly implied a specific approach to implementation. When I tried to editorialize on that approach, I made things worse for myself.

So, more generally: if you think of a system’s implementation as a motorcycle, the different kinds of refactoring (prefactoring and postfactoring) are equivalent to improving the shocks on the motorcycle to compensate for uneven territory and provide a smooth ride for the driver. You can think of the system’s design, by contrast, as the road that the motorcycle follows. Raft’s design focuses on techniques for smoothing out the road, such that fancy shocks aren’t needed—and, when over-applied or inappropriately installed, make the ride harder than it has to be.

Waterlooville: WWII-era motorcycle painted allied colors. Photograph by Colin Moore

How does a system’s design resist a convoluted solution?

I don’t think of length as a proxy for complexity, but it’s a clue. I wrote my Raft implementation in Python and it clocks in at about 700 lines, not counting logs and tests, but counting all the places where I decided to get cute and give the servers some responses with personality (this project was for fun, after all). I believe the implementation mentioned in the Raft paper itself was in Java and came in at around 2000 lines, being probably somewhat more robust (though indubitably less cute) than mine and in a language that trends more verbose than Python. That’s objectively not 60,000 lines. We’re not talking about a massive body of code.

Here’s how the Raft paper describes the team’s approach to designing the algorithm:

There were numerous points in the design of Raft where we had to choose among alternative approaches. In these situations we evaluated the alternatives based on understandability: how hard is it to explain each alternative (for example, how complex is its state space, and does it have subtle implications?), and how easy will it be for a reader to completely understand the approach and its implications?

We recognize that there is a high degree of subjectivity in such analysis; nonetheless, we used two techniques that are generally applicable. The first technique is the well-known approach of problem decomposition: wherever possible, we divided problems into separate pieces that could be solved, explained, and understood relatively independently. For example, in Raft we separated leader election, log replication, safety, and membership changes.

Our second approach was to simplify the state space by reducing the number of states to consider, making the system more coherent and eliminating nondeterminism where possible. Specifically, logs are not allowed to have holes, and Raft limits the ways in which logs can become inconsistent with each other. Although in most cases we tried to eliminate nondeterminism, there are some situations where nondeterminism actually improves understandability. In particular, randomized approaches introduce nondeterminism, but they tend to reduce the state space by handling all possible choices in a similar fashion (“choose any; it doesn’t matter”). We used randomization to simplify the Raft leader election algorithm.

In Search of an Understandable Consensus Algorithm (Extended Version)
Diego Ongaro and John Ousterhout
Stanford University
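That last technique, randomized leader election timeouts, fits in a couple of lines of Python (the 150–300 millisecond range is the example the paper uses; the function name is mine):

```python
import random

# Each server draws its own election timeout from a range. "Choose any;
# it doesn't matter": whichever server's timer fires first starts the
# election, and the others usually hear from it before their own fire.

def election_timeout_ms(low: int = 150, high: int = 300) -> int:
    # randint is inclusive on both ends.
    return random.randint(low, high)
```

Randomization here is nondeterminism that reduces the state space: we don’t have to reason about which server wins the draw, because every outcome gets handled the same way.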

The techniques that Ongaro and Ousterhout describe here are an altogether different set from the ones we find in Design Patterns or Refactoring to Patterns or Effective X. Instead, they look more like the set of techniques that Michael Feathers describes in Unconditional Programming, the book I have not-so-secretly spent three years using this blog to badger him into writing, ever since I saw him give this talk about it. My badgering has earned me a peek at the table of contents, which we’ll get to soon.

In fact, we’ve discussed a similar set of techniques on this blog, specifically with regard to what software engineers can learn from rocket launchpad design (I try to keep it lively 😉). In that piece, I said:

The idea is to identify your edge cases—your what-ifs, and exceptions, and your it-shouldn’t-do-thats—and rather than handling them, make them go away. Doing so produces a more robust piece of software.

Doing this also requires software engineers (that’s us) to identify the moments when we are negotiating with an edge and determine whether that edge deserves negotiation. We have to identify the times when we are answering the question “How should I account for X?” and consider “Can I make X go away?”

Lessons from Space: Edge-Free Programming
Chelsea Troy (that’s me!)

The Two Techniques

First, Ongaro and Ousterhout mention problem decomposition: they separate leader election, log replication, safety, and membership changes in Raft to operate independently of each other. These four subjects offer us seams along which we can divide our implementation into separate modules: they don’t interfere with one another, so we don’t get trapped in our own net with the requirement to understand them all at once. This design choice eliminates whole classes of cross-functional interactions: collections of potential pitfalls for which we’d struggle to install a suitable set of shocks.

Feathers might characterize this effect as removing conditions. We have three separate states for each server: whether it’s the leader, what its log looks like, and whether it’s in the cluster or not (we get safety as an emergent property of the system). Mostly, we can handle these three axes of status independently of one another, rather than have to check them all before making any changes. Avdi Grimm talks about separating axes of status specifically in his talk “Build Graceful Processes.” If you’re interested, I have a series on this with some code examples.
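Here’s a deliberately simplified Python sketch of those three independent axes (the names are mine, and real Raft elections do consult the log; the point is that most transitions on one axis don’t require conditionals on the others):

```python
# Simplified sketch: three axes of server status that mostly change
# independently, so there's no combinatorial explosion of states to check.

class ServerStatus:
    def __init__(self):
        self.role = "follower"     # leader election axis
        self.log = []              # log replication axis
        self.in_cluster = True     # membership axis

    def become_leader(self):
        # In this sketch, a role change doesn't inspect the other axes.
        self.role = "leader"

    def append(self, entry: str):
        # Appending to the log doesn't depend on a role transition here.
        self.log.append(entry)
```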

Second, the Raft authors reduce the number of states to consider, and introduce nondeterminism only when any of the nondeterministic outcomes would be handled in pretty much the same way. These design choices, in Feathers’ parlance, eliminate edges from the system where some negotiation would otherwise happen. A lot of this comes from Raft’s strong leadership model: the leader server is in charge. That server only appends entries to its log: it never removes or changes entries. If it encounters inconsistencies with a follower’s log, it overwrites the follower’s log. It issues all append_entries commands. It accepts all client requests. It treats its own state machine as the source of truth in responding to those requests. In this respect, it implements another of Feathers’ techniques: tell, don’t ask. There are few conditionals for how a follower responds to a command: the leader isn’t accepting consultation on the state of the system as a whole. The sole exception happens in log replication, where a follower notes that its log appears not to match the leader’s, in which case the follower receives lines from the leader to catch up its log and, if needed, deletes entries that don’t agree with the leader’s.
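Here’s a simplified Python sketch of that sole exception, the log consistency check (an illustration, not my actual implementation): the follower rejects an append whose claimed previous entry doesn’t match its log, and otherwise truncates any conflicting suffix and catches up from the leader’s entries.

```python
# Simplified sketch of a follower's AppendEntries consistency check.
# log is a list of (term, command) tuples; prev_index is 1-based,
# with 0 meaning "the empty log prefix".

def append_entries(log, prev_index, prev_term, entries):
    if prev_index > len(log):
        return False  # follower is missing entries; leader backs up and retries
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # terms disagree at prev_index; leader backs up and retries
    # Agreement up through prev_index: drop any conflicting suffix,
    # then append the leader's entries.
    del log[prev_index:]
    log.extend(entries)
    return True
```

Notice how few branches the follower needs: it never negotiates about *content*, only about where the leader’s entries should attach.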


We have sets of patterns for handling complexity in our code, as well as guidelines for when to introduce them. What about a set of practices we can use to eliminate complexity before we write the code?

That can be tough when we’re using an iterative approach, or don’t know all the requirements of our system up front. Designing Raft involved trying many different implementations and discovering where each of them broke down, or where edge cases popped up, before alighting on the final design. We might not be able to afford dozens of experimental tries in a production setting.

This is where something like formal verification might serve us well. We can use a language-agnostic formal verification tool like TLA+ or Alloy to describe the states and interactions in our system and identify potential edges and inconsistencies before implementation.

Even for systems where the formal verification approach feels heavy-handed, we can employ a pencil-and-paper risk analysis technique to assess what might go wrong with our current approach and “how bad” the problem might be. I describe such a framework right here, and my students complete a risk profiling workshop with a parrot emergency room app as their example (see this piece for an example diagram). I give the workshops at companies on request, by the way 🙂.

But at a minimum, even in an iterative shop, we can evaluate potholes in our pavement before jumping to install shocks. We look at some specific examples of such evaluations in this case study that looks at a mental health management app written for Android.

In any case, we’re used to striving for a flexible and understandable implementation of a model. We can perhaps benefit from drawing some of that effort forward into streamlining the model itself so that our implementation boasts fewer concerns to separate and fewer duplicated concepts to DRY up.


Have you enjoyed this series?

You can help me keep writing by tossing a coin at this Patreon 🙂

If you liked this piece, you might also like:

How to Jump-Start a New Programming Language, or maybe, even, gain a more concrete mental model of the one you already use!

Lessons from Space: Edge-Free Programming, which also explores an Android app’s design and the engineering choices behind it (plus, cool pictures of rockets!)

How does git detect renames?—This piece is about how git detects renames, but it’s also about how to approach questions and novel problems in programming in general
