I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.
Lately I’ve been doing some of the book’s exercises. I wrote here about its exercise to bring an object-based pattern to a functional language. But most of the exercises focus on making changes to the Lox compiler. You can check out my solutions to those in pull requests against my fork of the repo. As of this writing you can see my solutions for adding block comment syntax, ternary syntax, and comma operator syntax to Lox.
Though the exercises are meant to be done on the partially complete compiler that the reader has at each point in the book, I have so far exhibited the gall (or maybe the foolishness) to implement them on the finished compiler, so you can see how they’d work in a complete programming language! In case you’d like to try any of these exercises yourself, I’ve attempted to make each feature’s automated tests a separate branch from the solution, so you can check out the tests branch and code against that :).
In one case so far I have failed to separate the tests and implementation. That case is the implementation of error productions.
What is an error production?
Sometimes a compiler’s conversion of frontend code to an abstract syntax tree works perfectly. More often it doesn’t, because the frontend code contains syntax errors or unfinished code. The ideal parse error should happen, among other things:
- soon enough that it’s easy to spot and fix
- not so soon (read: constantly) as to annoy the programmer
- with enough information to fix the problem
- without rendering all the downstream code un-parseable such that the editor window becomes a living hell of red highlighting (looking at you, Swift/XCode).
That last one is a real drag. So far we’ve avoided such an event in Lox with a good ol’ exception: we raise the first parse error we run into, which effectively stops the parsing process, so the parseability of the remaining code isn’t even checked.
Another option, which I’ll show you today, is to use an error production: that is, account for common syntax errors as part of the parse tree such that the parser doesn’t choke on them. The nice thing about this is that, since the parser didn’t choke on the first one, we can present (ideally) all of the issues to the programmer at once rather than forcing them to fix them one by one and wait in anticipation after each fix to figure out what else they did wrong.
The error production challenge in Crafting Interpreters reads:
3. Add error productions to handle each binary operator appearing without a left-hand operand. In other words, detect a binary operator appearing at the beginning of an expression. Report that as an error, but also parse and discard a right-hand operand with the appropriate precedence.– Chapter 6, Exercise 3, Crafting Interpreters
As I am wont to do from several years of Labs training, I began my endeavors with a test:
The syntax is a bit cryptic, but what happens is that the test runner runs the Lox program and then checks for the error described in that line above the code that looks like a comment.
To make error productions, I added two constructs.
One is in the Lox grammar:
Expr.Nothing represents the absence of an expression where there is supposed to be one.
We can use this to parse lines of code like
* 4;, which would multiply the left operand by four except that there is no left operand. It takes in a string, but that’s not because it needs one; it’s because the
GenerateAST class in the Lox compiler doesn’t like generating a node that takes no arguments, and I wanted to stay focused on these error productions rather than go mess with the AST generator. So when I initialize that expression…well…
Once I had replaced the exception thrown in
primary() with my shiny new
Expr.Nothing expression at the point where it would catch a missing operand in my binary expressions, I hopped over to a spot in the code where the operands of a binary expression get parsed—
term()—and checked the type of the return value for the left operand.
I extracted this helper function after the fact: for a while, I just had the behavior right on this line. Here’s what
checkForMissingExpression looks like:
So when the parsing step has hands me an
Expr.Nothing, I create my other new construct:
HandledParseError. This complements the existing
ParseError, which I renamed to
HandledParseError takes a reference to the token where the error happened (to later report where it was) and a message for the programmer (so they know what’s wrong). I add that shiny new instance to a list of
handledParseErrors that lives on the parser object itself.
Once parsing is complete, the Parser reports an error for each item in this list:
It looks beautiful, right?
So at this point I run my tests. They pass!
A whoooole bunch of other tests fail.
Why? Because it turns out that the Parser was leaning pretty heavily on that “Expect expression” exception.
So we have two options: one option, perhaps the sensible option, is to stop here and call this a proof of concept.
I didn’t choose that option.
Bring on the errors!!
To be candid, the errors that a lot of these failing tests were highlighting turned out to be not all that specific to begin with. For example, this lox code also surfaced “Error: Expect expression”:
It’s telling us that
.123 is not an expression. Specifically, you can’t start an expression with a period like this in Lox. But if you don’t know that and you think you’ve written a perfectly valid decimal expression here, this error message is a little…uh…
What if it were this instead?
So let’s add an error production for that. The first step is to identify where this error comes from, and it turns out that this expression gets parsed in
primary() because it’s a number. So we add a check for a dot at the beginning of the expression and surface a helpful message about it.
This is another little helper function that I pulled out. Like the other one, I didn’t do this until later on in the effort, and for a while the implementation I’m about to show you was just lying here in
primary() in all its verbose glory.
This function, instead of checking for
Expr.Nothing, checks whether a token you pass in matches a specific type that you also pass in, and generates an error production at that token with a passed-in message if it does.
I ended up using it in a few other places throughout the Parser to surface more helpful error messages for other parse errors that failed tests once I started messing with the binary operators. For example, these error messages needed a face lift:
You’ll also notice that the updated tests check for additional error messages that the original ones don’t check for. This is because error productions allow parsing to continue past a parse error, such that if the code has additional parse errors, the parser will catch those, too.
Et voilà, helper function to the rescue:
So far, we are missing some low-hanging fruits.
Specifically, we implemented an error production to capture the absence of a left-hand operand for addition and subtraction operations, but not for multiplication, division, or any of the comparison operators like
==, or the logical operators
But we’ve done the hard part, so now we can step through the parser and add a line to the function that parses each of those types of expressions. I won’t put them all here as that would get repetitive, but here’s the one for inequality comparators:
And here’s one for a logical operator:
Our hard work from earlier is paying off. Have an apple, as a treat!
I added one more fun error production.
Lox has a
For the pedantic among you: yes, this is technically a statement and not an expression. Lovingly, shut up. We’re about to change this error message anyway.
Is that better? You prefer that? Good.
Implementing for this test offers us a second opportunity to use our helper function that checks for missing values:
I’ve done my level best to make this code approachable for you to try out some error productions of your own!
Feel free to use my examples to write your own tests, and feel free, of course, to use my helper functions. What common syntactic errors would you like to see Lox surface, and what would you like the error message to include?
Here is the branch to get you started. Here are also some ideas for getting your feet wet, in increasing order of difficulty based on what’s on the branch today:
- Add an error production to handle mathematical, comparison, or logical operators appearing without a right-hand operand.
- Add an error production to explain to a programmer that their variable name cannot be just numbers, or cannot start with a number.
- Add an error production to handle a typo in a keyword like
flase. To do this, you’ll have to add a token type for your typo of choice!
For all of these, I recommend starting by writing a test and running it (
make test_jlox, from the
craftinginterpreters directory) to help you understand how the compiler handles these situations today.
That’s all I’ve got for you today. Hopefully you enjoyed!
If you liked this post, you might also like…
This post about what pride means to me in practice (if you’re prepared to watch me talk about error productions for 2,000 words, you can survive 16 minutes of hearing why any of this matters to me)
This post about why use, or not use, an interface (specifically in Python because of the way Python does, or rather doesn’t exactly do, interfaces)