Raft 12: Maintenance Repairs

Over the preceding 11 parts of this series that I sometimes thought would never end, we (well, you, watching me) implemented the Raft distributed consensus algorithm from this paper with socket servers in Python.

Here’s where you can read all the posts so far.

We have one more thing to implement, but before we do, I wanted to go in and clean up some personal annoyances. We’ve talked about one of these before and I’ve silently fumed at the other two, so now we’ll fix all three.

Fix #1: Cache a Computationally Intensive Operation

Back in part 9, I showed you this monstrosity and promised you I’d come back to it:

Screen Shot 2020-08-17 at 3.16.30 PM

Yes, I am aware that this block of code would be very slow if I had millions of log entries. For one thing it calls log_access_object(), which we saw has a ton of logic, three separate times, when it could easily cache the result after the first call. I’m sure we’ll have more to do in this block, which will give us an opportunity to revisit optimizing it later.

I want to make good on that promise. I had a couple of options:

  1. Store the result of log_access_object at the call site and access that variable thereafter rather than calling the method multiple times
  2. Cache the result of the method in the KeyValueStore so that multiple method calls would not redo the computation as long as the logs had not changed between the two calls.

I decided on the second of these two approaches: that way, a client operation can use this method multiple times without having to choose between learning KeyValueStore’s implementation details and ending up with something super slow.

It’ll still have to redo the calculation with every change to the logs, but for now, I decided, this commit was enough to satisfy me.

It instantiates an attribute on KeyValueStore called log_recently_changed,

Screen Shot 2020-09-01 at 5.48.39 PM

Stores the result of the first run of log_access_object in an instance variable,

Screen Shot 2020-09-01 at 5.49.42 PM

Keys off whether the logs recently changed to determine whether to send the current cached instance variable or replace that variable with the computation,

Screen Shot 2020-09-01 at 5.48.31 PM

And sets log_recently_changed to True in every method that updates the logs, and to False in the log_access_object method itself.

Screen Shot 2020-09-01 at 5.51.58 PM

This would still be slow in the event of a huge log and would run every time the log got changed. I’d worry about that if and when this implementation ran into that problem.
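Since the screenshots don't carry over everywhere, here is a minimal sketch of the caching idea described above. The names KeyValueStore, log_access_object, and log_recently_changed come from this post; the class name KeyValueStoreSketch, the append_to_log method, the cached_log_access_object attribute, the computations counter, and the shape of the cached value are all assumptions made for illustration, not the code from the commit.

```python
class KeyValueStoreSketch:
    """Illustrative stand-in for KeyValueStore; only the caching idea is shown."""

    def __init__(self):
        self.log = []                         # hypothetical in-memory log of command strings
        self.log_recently_changed = True      # start "dirty" so the first call computes
        self.cached_log_access_object = None  # hypothetical name for the cached result
        self.computations = 0                 # counter added only to make the caching visible

    def append_to_log(self, command):
        # Every method that changes the log marks the cache as stale.
        self.log.append(command)
        self.log_recently_changed = True

    def log_access_object(self):
        if self.log_recently_changed:
            self.computations += 1
            # Placeholder for the real, expensive computation over the log.
            self.cached_log_access_object = {
                "entries": list(self.log),
                "length": len(self.log),
            }
            self.log_recently_changed = False
        return self.cached_log_access_object
```

With a store like this, calling log_access_object() three times in a row after one write should bump computations only once; the next write flips the flag and the following call recomputes.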

Fix #2: Stop Closing the Connection when Client Requests a Write

I like a communicative server. But when I started up a client to talk to a server, I was seeing some inconsistent behavior:

Screen Shot 2020-09-01 at 5.54.44 PM

What’s happening here:

  1. I start up the client and send a request to the leader (the server at port 10000) to send back the value associated with the key ‘a.’
  2. The leader (which is named “Every” as in “Every Tom, Dick, and Harry”) sends back the value ‘3’. The call ends, and the client prompt “Type your message: ” appears, encouraging the user (me) to type in another request.
  3. I send the leader a request to associate the key ‘z’ to the value ‘26.’
  4. The socket closes and my client is shut down.

The ‘set z 26’ call works on the server side, but I wanted it to finish and give me the “Type your message: ” prompt like the other calls do. So I fixed that in this commit.

Commit message:

Because we break in the event of an empty string response (server.py lines 160 & 161), the server cuts off connection with the client if we do not include a response to the client.

This isn’t the worst thing in the world—we can easily restart the client. But I wanted continuity in the behavior of the server at the end of responding to any client request. So I added a cute reply from the server to the client when the client sends a write command, and now the connection remains open after the command succeeds.

The fix:

Screen Shot 2020-09-01 at 6.04.42 PM
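For anyone who can't see the screenshot, here is a rough sketch of the behavior the commit message describes: a serving loop that breaks on an empty response, so a write branch that returns nothing ends the connection. The function names respond and serve_connection, the stand-in value "3", and the reply text "Write received." are assumptions for illustration; the actual reply in the commit is different and is not reproduced here.

```python
import socket


def respond(request):
    """Hypothetical request handler: returns the text to send back to the client."""
    parts = request.split()
    if parts and parts[0] == "get":
        return "3"  # stand-in value; a real store would look the key up
    if parts and parts[0] == "set":
        # Before the fix, a branch like this returned "" and the loop below broke.
        return "Write received."  # hypothetical reply text
    return ""


def serve_connection(connection):
    """Answer requests until the peer disconnects or a response comes back empty."""
    handled = 0
    while True:
        data = connection.recv(1024)
        if not data:
            break  # the client closed its end
        response = respond(data.decode("utf-8"))
        if response == "":
            break  # an empty response ends the conversation
        connection.sendall(response.encode("utf-8"))
        handled += 1
    connection.close()
    return handled
```

The design choice here is the smallest one available: rather than changing the loop's exit condition, the write path simply returns a non-empty string, so reads and writes end the same way from the client's point of view.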

Fix #3: Tell the Client Who is (Probably) the Leader

One of Raft’s keystone characteristics is its strong leadership model: Only the leader tells other servers what to add to their logs, and only the leader accepts client requests. Any follower server, if contacted by a client, should refuse to fulfill the request.

Until now, our servers have responded with an unhelpful and rather curmudgeonly message. This is, first of all, rude.

Secondly, it doesn’t provide the client with any information about which server to contact instead, which forces the client to keep guessing. Though no follower in Raft gets to be a source of truth, each follower does know who is probably the leader—it’s whichever server last reached out to it in a leaderly fashion. The followers can (and according to the Raft spec, do) pass that information to the client.

I implement that in this commit, creating an instance variable to store the name of the last server that sent an append_entries call:

And then sending that name to any client that contacts this follower server:

We set a sensible default value that will apply in the most common situation where a follower hasn’t heard from a leader yet: on startup of the system.

It’s theoretically possible that a client would contact a new follower in the brief period between when it starts up and when the leader sends it an append_entries call, in which case this default value isn’t totally accurate. I picked this value because it seems like the more likely case, and also because it was more fun to come up with than something like “unknown to me.”
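In rough outline, the idea looks something like the sketch below. Only append_entries comes from this post; the class name FollowerSketch, the attribute current_leader, both method names, the wording of the reply, and the DEFAULT_LEADER placeholder are assumptions for illustration. In particular, the placeholder is not the default value I actually picked.

```python
class FollowerSketch:
    """Illustrative follower that remembers who last sent it append_entries."""

    # Hypothetical placeholder; the post's actual default wording is not reproduced here.
    DEFAULT_LEADER = "<default leader placeholder>"

    def __init__(self, name):
        self.name = name
        self.current_leader = self.DEFAULT_LEADER

    def receive_append_entries(self, sender_name):
        # Whoever sends append_entries is acting as leader, so remember them.
        self.current_leader = sender_name

    def respond_to_client(self, request):
        # Followers refuse client requests but point to the probable leader.
        return (
            "I am not the leader. "
            f"The last leader I heard from is {self.current_leader}."
        )
```

This sketch skips the term comparison that Raft uses to decide whether an append_entries call comes from a legitimate leader, so treat the stored name as a hint rather than a guarantee, which is all the client needs in order to stop guessing.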

And there you have it.

I can open 5 terminal windows like this and start a follower server in each of them (the ‘False’ argument at the end of each command line determines whether the server starts as a leader):

Screen Shot 2020-09-01 at 5.39.26 PM

Once one of these servers’ election timeouts elapses, it will start (and probably win) an election and start pinging the other servers. The servers have rafted! I can happily open a client against the leader server and get, set, and delete key-value pairs to my little heart’s content.

Typically, this is where I might refactor.

I’m going to hold off in this case. I foreshadowed this in post 7, during log replication:

My server implementation relies on a conditional statement to parse requests and issue responses. Before my hiatus, I moved this conditional statement into its own file called parsing.py. As I got back into the code after six months away from it, I found that one of the first things I wanted to do was move the conditional back into the server class. You see that change reflected in this commit.

Now that I’ve done some more work and reacquainted myself with the code, I find myself wanting to pull it back out again, largely to make it easier to unit test. I feel weird about this because I pride myself on prioritizing ‘legibility’ in the code I write—but it seems that what makes my code ‘legible’ to me changes depending on my immediate familiarity with it. Maybe if the conditional statement were under unit tests, I’d like it in its own file even after coming back from hiatus. Hmm. Maybe I’ll save this for later and put it in a different blog post series.

And I came back to it in post 9 as well:

To be frank, my experience with this code base has contradicted most conventional wisdom about refactoring. First of all, as I mentioned in a prior post, I un-refactored something to make it legible when I returned to this code base after a hiatus. Then there’s this:

So maybe in the next post we’ll go straight into implementing elections and see where that takes us.

Do I think refactoring is universally bad? Don’t be ridiculous. Would I write this much about refactoring if I thought it were universally bad?

No—I think there’s something more interesting going on here. And rather than ignore it, I’d like to dig deeper. That, my friends, is the big treat that I have planned for you in the Raft season finale.

But of course, this post just crossed 1000 words. So, I’ll be leaving that for the next one. Get excited!

If you liked this piece, you might also like:

How to Jump-Start a New Programming Language, or maybe, even, gain a more concrete mental model of the one you already use!

Lessons from Space: Edge-Free Programming, which also explores an Android app’s design and the engineering choices behind it (plus, cool pictures of rockets!)

How does git detect renames?—This piece is about how git detects renames, but it’s also about how to approach questions and novel problems in programming in general
