I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.
In the last post covering Chapter 8, we talked about managing state. Now, in Chapter 9, we add the logic that makes Lox Turing-complete: conditionals and loops.
Bob starts this chapter with a helpful summary of the history of Turing machines so we know what we have left to do. It culminates here:
> Turing machines are pretty dang simple, so it doesn’t take much power…You basically need arithmetic, a little control flow, and the ability to allocate and use (theoretically) arbitrary amounts of memory. We’ve got the first. By the end of this chapter, we’ll have the second.
>
> – Bob Nystrom in Crafting Interpreters
A Consistent Pattern
As we add each expression, then our first statements, and now branching logic to Lox, the steps for each addition form a consistent pattern (exemplified here with the pattern for conditional logic):
1. Add a line to `GenerateAst.java` that defines the new grammar. When we run this file, it conveniently transforms that line into a class representing the new grammar with all the requisite fields.
2. In `Parser.java`, use our existing `match` function to identify the new grammar from its leading token. We already defined all the tokens in `Scanner.java` during an early chapter, so we haven’t had to add new ones so far.
3. Still in `Parser.java`, use our existing `consume` function to parse the subsequent tokens into the new class representing the new grammar.
4. In `Interpreter.java`, translate the class representing the new grammar into execution in Java.
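To make step 1 concrete, here is a toy, self-contained sketch of the generator idea: one line of grammar description becomes a Java class with all the requisite fields. This is a hypothetical simplification for illustration, not the book’s actual `GenerateAst.java`.

```java
// Toy sketch of the "class generator" idea from step 1: a one-line
// description like "If : Expr condition, Stmt thenBranch" is turned
// into Java source for a class with those fields and a constructor.
// (Hypothetical simplification; not the book's actual generator.)
public class AstSketch {
    static String defineType(String description) {
        String[] parts = description.split(":");
        String className = parts[0].trim();
        String[] fields = parts[1].trim().split(", ");

        StringBuilder sb = new StringBuilder();
        sb.append("static class ").append(className).append(" {\n");
        for (String field : fields) {
            sb.append("    final ").append(field).append(";\n");
        }
        // Constructor that assigns each field from its parameter.
        sb.append("    ").append(className).append("(")
          .append(String.join(", ", fields)).append(") {\n");
        for (String field : fields) {
            String name = field.split(" ")[1];
            sb.append("        this.").append(name)
              .append(" = ").append(name).append(";\n");
        }
        sb.append("    }\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(defineType(
            "If : Expr condition, Stmt thenBranch, Stmt elseBranch"));
    }
}
```

Running it emits a `static class If` with `condition`, `thenBranch`, and `elseBranch` fields, which is roughly the shape of what the book’s generator produces for each grammar line.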
The process is beautiful, simple, and clever. We repeatedly harvest the fruits of early design decisions in a manner totally unlike any real-life software development process I have ever seen.
Our workflow for writing this interpreter is, in a word, rehearsed.
I can tell that Bob wrote this entire interpreter first and then wrote this book as an incremental buildout of the finished version. There are no kludges and few workarounds. We don’t have to stop and revisit decisions that we made earlier when we had less information. The grammar class generator is a dead giveaway: if I were writing an interpreter, even if I had written interpreters before, I struggle to imagine myself starting with a class generator rather than writing the classes by hand.
Now, to be fair, the other “dead giveaway” is that Bob admits that he started with the code. I would, too. I also think the choice paid off handsomely. This book endeavors to introduce readers to a representative sample of the judgment calls inherent to programming language design, and the generator is a clever way to avert the risk of confounding the book’s subtle points with 350 lines of boilerplate.
But I want to draw attention to it because there are two possible denotations for the phrase “Understand how an interpreter is written”:
- Understand the process by which an interpreter is written
- Understand, once the interpreter is written, what the final choices were
This book is doing the latter, in my opinion, uncommonly well. It does not do the former. Most code written for educational purposes doesn’t. Bob does an unusually good job of calling this out, too.
Let’s look at some more examples.
With the `if` conditional operator behind us, Bob turns to the logical operators `and` and `or`. He explains:
> We could reuse the existing Expr.Binary class for these two new expressions since they have the same fields. But then visitBinaryExpr() would have to check to see if the operator is one of the logical operators and use a different code path to handle the short circuiting. I think it’s cleaner to define a new class for these operators so that they get their own visit method.
>
> – Bob Nystrom in Crafting Interpreters
Bob elects to give the logical operators their own expression representation despite the fact that they have the same fields as other binary operators. I love this choice in part because I’m a big believer in having three distinct examples of a pattern, which I can confirm are all getting used the same way, before I refactor them toward a consolidated version. Plus, in this case, the interpreter immediately does different stuff for the logical operators than the other binary ones. So this decision, I think, matches what might happen in a real-life incremental interpreter buildout.
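The “different stuff” is short-circuit evaluation: the right operand must not be evaluated at all when the left operand decides the result. Here is a self-contained sketch of that behavior, using Lox’s truthiness rules (only `nil` and `false` are falsey); the names and the `Supplier` stand-in for deferred evaluation are mine, not the book’s.

```java
import java.util.function.Supplier;

// Sketch of the short-circuit behavior that justifies a separate
// visitLogicalExpr(): the right operand is a deferred computation
// that only runs when the left operand doesn't decide the answer.
// (Illustrative names; not the book's actual interpreter code.)
public class LogicalSketch {
    enum Op { AND, OR }

    // Lox truthiness: nil and false are falsey, everything else truthy.
    static boolean isTruthy(Object value) {
        if (value == null) return false;
        if (value instanceof Boolean) return (Boolean) value;
        return true;
    }

    static Object evaluateLogical(Object left, Op op, Supplier<Object> right) {
        if (op == Op.OR) {
            if (isTruthy(left)) return left;   // "a or b": truthy a wins
        } else {
            if (!isTruthy(left)) return left;  // "a and b": falsey a wins
        }
        return right.get();                    // only now evaluate the right
    }

    public static void main(String[] args) {
        // The right side would blow up if evaluated; short circuit saves us.
        Object result = evaluateLogical("hi", Op.OR,
            () -> { throw new RuntimeException("right side was evaluated"); });
        System.out.println(result);
    }
}
```

Note also that, like Lox, this returns the operand itself rather than coercing to a boolean, which is why `"hi" or anything` yields `"hi"`.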
Then we do the 4 steps listed above to build out a `while` loop and immediately implement the `for` loop by…

…uh, by bolting it onto the existing interpretation for the `while` loop.
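For reference, the execution half of the `while` buildout (step 4) is tiny. This sketch models the interpreter’s evaluate/execute calls with `Supplier` and `Runnable` stand-ins; the names are mine, not jlox’s.

```java
import java.util.function.Supplier;

// Sketch of interpreting a while statement: re-evaluate the condition
// before each iteration and execute the body while it stays truthy.
// Supplier/Runnable stand in for the book's evaluate()/execute().
public class WhileSketch {
    static boolean isTruthy(Object v) {
        if (v == null) return false;          // nil is falsey
        if (v instanceof Boolean) return (Boolean) v;
        return true;                          // everything else is truthy
    }

    static void executeWhile(Supplier<Object> condition, Runnable body) {
        while (isTruthy(condition.get())) {
            body.run();
        }
    }

    public static void main(String[] args) {
        int[] i = {0};
        executeWhile(() -> i[0] < 3, () -> i[0]++);
        System.out.println(i[0]);
    }
}
```

The key point is that the host language’s `while` does the looping for us, which is exactly what makes the upcoming `for` desugaring so cheap.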
And this is where Bob’s explicitness comes in handy. In the chapter, he explains:
> We’re going to desugar for loops to the while loops and other statements the interpreter already handles. In our simple interpreter, desugaring really doesn’t save us much work, but it does give me an excuse to introduce you to the technique. So, unlike the previous statements, we won’t add a new syntax tree node. Instead, we go straight to parsing.
>
> – Bob Nystrom in Crafting Interpreters
He’s taking the opportunity here to introduce desugaring: taking a construct that programmers can use on the frontend (called syntactic sugar) to make their development experience more comfortable, and deconstructing it during parsing into the same execution path as the graceless, non-sugary version. The minimal example of this might be a language that allows `i++;` and parses it just like `i = i + 1;` on the backend.
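The `for` case follows the same shape: `for (init; cond; incr) body` is rewritten as `{ init; while (cond) { body; incr; } }`. The book does this transformation on syntax tree nodes inside the parser; the sketch below shows the same rewrite at the source-string level, just to make the shape visible (the helper name and string-based approach are mine).

```java
// Source-to-source sketch of the chapter's for -> while desugaring.
// The book rewrites AST nodes in the parser; this toy version rewrites
// text, purely to show the shape of the transformation.
public class DesugarSketch {
    static String desugarFor(String initializer, String condition,
                             String increment, String body) {
        // for (init; cond; incr) body
        //   becomes
        // { init; while (cond) { body; incr; } }
        return "{ " + initializer
             + " while (" + condition + ") { "
             + body + " " + increment + " } }";
    }

    public static void main(String[] args) {
        System.out.println(desugarFor(
            "var i = 0;", "i < 10", "i = i + 1;", "print i;"));
    }
}
```

Note how the increment lands at the end of the loop body and the initializer gets its own enclosing block, so a `var` declared in the `for` clause stays scoped to the loop.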
The choice here is to prioritize the opportunity to demonstrate a concept over the way a programming language developer might actually do this. For me, this feature would represent a second instance of what we did with the `while` loop and would therefore fail my rule of three. So I’d implement the logic completely separately from the `while` loop and only consider consolidating if I found myself doing the exact same thing a third time.
Now, Bob is an experienced programming language designer and I’m just some dev with a blog, so take my opinion with a grain of salt, but I don’t think that that obscures the larger point:
Instructional code optimizes for different things than deployed code.
Let’s look at just one more example. When Bob shares the code for interpreting the `for` loop, the first change to `Parser.java` is the addition of the `java.util.Arrays` import. Why? Because, for simplicity, the code changes are presented from the top of the file to the bottom, as if we were reading a finished commit. This isn’t the order in which that line would be added in a real development process. Instead, in a modern IDE, a developer would start using `Arrays`, wait for the red underline underneath it, and then smack the hotkey that adds the import.
Once again: this is fine. Bob’s decisions optimize for demonstrating judgment calls in programming language design—not for writing this specific interpreter.
This is common in instructional code. In fact, in my mobile software development class, I show my students a code example of how to set an empty list view in iOS. Then I explicitly ask them: “How do you think the author’s goals with this piece of code might be different from the goals of someone reading or referencing this code?” The author of that code wanted readers to be able to copy and paste the example into a brand new iOS app and get something running. Presumably, the readers would then transfer that code to an existing app and integrate it with that app’s other features (and probably also not put three separate extensions in one file).
It’s valuable to keep this divergent purpose in mind.
Instructional code is there to inform and enrich our perspective on a problem, but it’s not there to solve our problem for us. In the iOS code linked above, an iOS developer might start from Taha Sonmez’s example, but would then modify it to produce a different end result. In the case of Crafting Interpreters, a programming language designer might arrive at Bob’s code in the end, but would probably take a more circuitous route.
And this is okay. It is, in fact, why I take umbrage at books that bill themselves as The Definitive Guide to Whatever. Unless you’re talking about an extremely narrow topic (like “How to use this one method with one signature option in this one version of this one framework”), there is no “definitive guide.” In fact, for most topics, there’s not even a singular definitive understanding to put in a guide. Instead, folks have to triangulate the landscape from several different authors and teachers with different perspectives on the topic. Only then can they contribute to the discussion in a meaningful way.
If you liked this post, you might also like…
- The rest of the Crafting Interpreters series
- This post about structural verification (I’m just figuring you’re into objectcraft, so)
- This post about why use, or not use, an interface (specifically in Python because of the way Python does, or rather doesn’t exactly do, interfaces)