Implementing Function Declaration and Invocation in a Programming Language

Reading Time: 7 minutes

I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.

In the last post covering Chapter 9, we talked about conditionals and loops. Now in Chapter 10, we add programmer-defined functions to Lox.

A deceptively simple proposition

At this point in our journey of writing an interpreter for the Lox programming language, we have already added the syntax for someone to declare a variable like var greeting = "Hello!".

That’s binding a a programmer-defined name to an expression. Adding a programmer-defined function sounds like a similar operation, except that instead of binding a name to an expression, we will now bind that name to a collection of statements—the body of the function (hereafter I’ll call it a “block”). And in fact, that is where we’ll ultimately end up:

But the curious thing about programming is that sometimes, the things that we think are going to be hard are not the things that are actually hard. Or, more accurately, the things we thought would be easy turn out to be trickier than we expected.

Original tweet here: https://twitter.com/Pinboard/status/761656824202276864

As we implement an interpreter to recognize and call functions properly, we run into three such vessels of trickiness.

Tricky Thing #1: Parsing Arguments to a Function

Getting the parser to recognize the declaration of a function isn’t so bad. Just as we use var to signal the start of a name-expression binding, we use fun to signal the start of a name-block binding. But that block might use values that the programmer passes in, and the interpreter needs to understand the name-expression binding for each of those values. But parsing something like (arg_one, arg_two, arg_three) into a list of variables with one token of lookahead means we can’t really say “take everything between the parentheses and split on the commas.” By the time we get to that last parenthesis, the first has been consumed. So we have to figure out an incremental way to do this.

The way that Bob demonstrates in the book looks something like this:

“After the opening parenthesis, look for an expression. After that, look for a list of expressions, each preceded by a comma. It’s also okay if none of that is there.”

The “none of that” option covers a function with no arguments.

I don’t know that this production would have occurred to me, but I guess it occurred to someone, and now it’s pretty common. Bob explains:

I admit, this seems more grammatically awkward than you’d expect for the incredibly common “zero or more comma-separated things” pattern. There are some sophisticated metasyntaxes that handle this better, but in our BNF and in many language specs I’ve seen, it is this cumbersome.

-Bob Nystrom, Crafting Interpreters

Tricky Thing #2: Returning a Value

Once we have parsed arguments into a list of expressions, we need to wrap those up with some kind of mechanism for calling the function. For this, Bob introduces the LoxCallable interface. It has one attribute called arity for expressing the number of arguments the function should take and one function of its own: call.

package com.craftinginterpreters.lox;

import java.util.List;

interface LoxCallable {
  int arity();
  Object call(Interpreter interpreter, List<Object> arguments);
}

The class LoxFunction implements that interface (eventually we will also have classes implement the interface):

package com.craftinginterpreters.lox;

import java.util.List;

class LoxFunction implements LoxCallable {
  private final Stmt.Function declaration;

  LoxFunction(Stmt.Function declaration) {
    this.declaration = declaration;
  }

  @Override
  public Object call(Interpreter interpreter,
                     List<Object> arguments) {
    Environment environment = new Environment(interpreter.globals);
    for (int i = 0; i < declaration.params.size(); i++) {
      environment.define(declaration.params.get(i).lexeme,
          arguments.get(i));
    }

    interpreter.executeBlock(declaration.body, environment);
    return null;
  }
}

So, when a programmer-defined function is called, the interpreter:

  1. Allocates a new environment for it, descended from the global environment
  2. Goes through the arg list one by one and defines each argument in the new environment
  3. Executes the list of statements in the function body one by one
  4. Returns…null.

But functions need to be able to return values. Simple, right? Execute the expression that appears right after the keyword return, or look up the variable name in the environment if that’s the expression, and then replace return null with whatever that is.

That version of the plan works as long as the return statement is the last thing in the function, but that’s not always the case. So the hard part is figuring out how to unwind the call stack back to the original call of the function when the interpreter encounters a return statement. How do we know where to unwind to?

Here’s Bob’s dirty trick: when the interpreter encounters a return statement, it throws it.

 @Override
  public Void visitReturnStmt(Stmt.Return stmt) {
    Object value = null;
    if (stmt.value != null) value = evaluate(stmt.value);

    throw new Return(value);
  }

That return object has a value attribute and inherits from RuntimeException in Java. When thrown, it turns off all of Java’s normal alarms and messaging (like the stack trace) and saves off the value:

package com.craftinginterpreters.lox;

class Return extends RuntimeException {
  final Object value;

  Return(Object value) {
    super(null, null, false, false);
    this.value = value;
  }
}

Then, when the call to a method catches this type of exception, it can unwrap that value and use it as the result of the function call expression:

// In LoxFunction

  @Override
  public Object call(Interpreter interpreter,
                     List<Object> arguments) {
    Environment environment = new Environment(interpreter.globals);
    for (int i = 0; i < declaration.params.size(); i++) {
      environment.define(declaration.params.get(i).lexeme,
          arguments.get(i));
    }

    interpreter.executeBlock(declaration.body, environment);    
    try {
      interpreter.executeBlock(declaration.body, environment);
    } catch (Return returnValue) {
      return returnValue.value;
    }
}

I think this is pretty ingenious, though Bob seems compelled to further justify the choice:

For the record, I’m not generally a fan of using exceptions for control flow. But inside a heavily recursive tree-walk interpreter, it’s the way to go. Since our own syntax tree evaluation is so heavily tied to the Java call stack, we’re pressed to do some heavyweight call stack manipulation occasionally, and exceptions are a handy tool for that.

-Bob Nystrom, Crafting Interpreters

Tricky Thing #3: Enclosing Scope Properly

As we saw earlier, when a function is called, the interpreter creates an environment for it that inherits directly from the globals. This would work in cases where all of the functions are declared at the same level: one below the globals. But it doesn’t work if we try to nest functions because an inner function needs an environment that inherits from the outer function’s environment, which itself needs to inherit from the globals.

To resolve the problem, we add support for functions being closures: that is, “closing over” (encapsulating and keeping track of) their own internal state.

This isn’t especially tricky to do—we add a closure attribute to the LoxFunction object and have its call function instantiate an environment that inherits from closure instead of from the globals—BUT:

  1. It can be tricky to catch, because we don’t catch the fact that functions drop their encompassing state upon returning until we declare functions inside other functions
  2. The solution described above actually does leak something. Evidently, we find out more about it in the next chapter 🙂

If you liked this piece, you might also like:

The rest of the Crafting Interpreters series

This post about structural verification (I’m just figuring you’re into objectcraft, so)

This post about why use, or not use, an interface (specifically in Python because of the way Python does, or rather doesn’t exactly do, interfaces)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.