The Hard Thing About Nested Scopes: Tracking State in an Interpreter

Reading Time: 9 minutes

I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.

In the last post covering Chapter 10, we talked about adding programmer-defined functions to Lox. We left off here:

…when a function is called, the interpreter creates an environment for it that inherits directly from the globals. This would work in cases where all of the functions are declared at the same level: one below the globals. But it doesn’t work if we try to nest functions because an inner function needs an environment that inherits from the outer function’s environment, which itself needs to inherit from the globals.

To resolve the problem, we add support for functions being closures: that is, “closing over” (encapsulating and keeping track of) their own internal state.

This isn’t especially tricky to do—we add a closure attribute to the LoxFunction object and have its call function instantiate an environment that inherits from closure instead of from the globals—BUT:

– It can be tricky to catch, because we don’t catch the fact that functions drop their encompassing state upon returning until we declare functions inside other functions

– The solution described above actually does leak something. Evidently, we find out more about it in the next chapter.

– The end of the previous post

So what does that solution leak??

It leaks the state in cases where a nested scope names a variable with the same name as a variable in the outer scope. Bob includes this brief example at the beginning of the chapter and explains that we should expect this code to print “global” but instead it prints “block”:

var a = "global";
{
  fun showA() {
    print a;
  }

  showA();
  var a = "block";
  showA();
}

NOTA BENE: If the above example confuses you and you think it should, in fact, print “global,” note that the second declaration of a has a “var” in front of it. It’s explicitly supposed to be a distinct variable in a distinct scope. For some reason my eyeballs glossed over that second “var” the first time I read this code and I was like “wait, it SHOULD print block though!”

Gah! The catch of nested scopes.

I thought a more concrete example here might help me better understand and motivate our quest to fix the issue. So I wrote myself a concrete example, which I’ll share with you below. One of the tough things about writing examples is that you often have to evaluate tradeoffs in how you do it. Bob went with a relatively abstract example that makes the code kinda meaningless, but maximally terse and simple. To make it concrete, we have to add a domain. And when you add a domain, you add all the details of that domain that force the code to be more complicated. People with programming language design experience will probably prefer Bob’s example and parse it correctly the first time through. But most of my programming experience comes from using programming languages for concrete applications, so an example in that arena helps me out a lot.

Suppose we have a program to calculate how much memory we need to display a webpage:

var totalMemoryRequirement = 0;

fun calculateForWebPageElement() {

   ...
}

We want a way to log the calculated total along the way while we’re developing, so we add an inner function to calculateForWebPageElement called logCommittedExpenditureSoFar. Then we start start doing the work:

# For this example, we're pretending Lox has classes and we can represent the DOM with one. 
# The point is to explain where our environment leak is happening, so we can pretend we implemented classes
# before we fixed this leak.

var totalMemoryRequirement = 0;

fun calculateForWebPageElement(documentObjectModel) {

   fun logCommittedExpenditureSoFar() {
       print totalMemoryRequirement;
   }

   logCommittedExpenditureSoFar();

   var totalForWebPage = 0;

   totalForWebPage = totalForWebPage + documentObjectModel.memoryOverhead;

   for (var i = 1; i < documentObjectModel.childObjects.length; i = i + 1) {
       totalForWebPage = totalForWebPage + calculateForWebPageElement(documentObjectModel.childAt(i));
       logCommittedExpenditureSoFar();
   }

}   

So, this code will work as expected and print 0 in both cases, because at both points we haven’t modified the totalMemoryRequirement number yet. Suppose, now, that a programmer comes along ant wants to change the name of totalForWebBrowser. In a properly working program, changing the name of a variable should affect the program in one of two ways:

  1. It shouldn’t affect the program. Well, except in a case where…
  2. It’s trying to use the same name as another variable in the same scope. That’s ambiguous and should result in a compilation error.

Since totalForWebPage is the only variable declared in the scope of calculateForWebPageElement besides documentObjectModel, it should be renameable to anything besides documentObjectModel.

Now, suppose that a programmer wants to rename totalForWebPage to totalMemoryRequirement. In the current implementation of Lox, that will change the behavior of the program: the second logCommittedExpenditureSoFar() will start to print whatever numbers have been added for the documentObjectModel.

Why?

Well, Lox represents its runtime environment as a stack of HashMaps. So at the point of the second print statement, the environment looks something like this:

{ totalForWebBrowser = 3}        # The environment that is local to this function
{ totalComputeExpenditure = 0 }  # The global environment

When a variable is referenced (as it is in the body of logCommittedExpenditureSoFar()) the interpreter starts at the top of the stack, looking for a key in the HashMap. If it can’t find it, it goes down to the next environment. When a closure (such as a function) finishes executing, its environment HashMap is popped off the stack. The thing is, in our example, logCommittedExpenditureSoFar gets called after calculateForWebPageElement gets called (in the loop) and before the execution is finished, so it picks up the function’s local environment before the global one. As a result, variables inside the closure whose names conflict with variables outside the closure will override the global ones and make their values inaccessible to the program.

Let’s compare this, for a second, to a programming question that initially seems unrelated: why is it that an array/list can have duplicate values in it, but a set can’t? Or, for that matter, why a dictionary/hashmap can’t have duplicate keys?

Well, a list can have duplicates because a list has an order: A 2 in position 0 is not the same as a 2 in position 3. But sets and maps (by default) do not have order, so there’s no way to distinguish two of the same value.

That’s not exactly what’s happening with our environments, but our problem is stemming from the fact that our current variable resolution strategy doesn’t distinguish variables according to which scope they’re in—it just does it by name. We need to distinguish variables by their scopes.

How do we do that?

Bob does it by adding a variable resolution pass to the interpreter after the parsing step. The decision about when and in what order to perform various compilation steps is a somewhat subjective one, as we talked about over here in my video walkthrough of the Roc compiler. On the one hand, keeping each step in a separate pass allows them to each have their own file and places guardrails against them growing weird dependencies on each other. On the other hand, every pass takes time, and as we discussed earlier in this series, time is of the essence. For a language to get used, compilation has to happen fast.

In this case Bob makes variable resolution a separate step for the sake of illustrating how a separate pass might be done. The resolver, like the AST printer, parser, and interpreter, implements the visitor pattern to account for each node and resolve any variables there.

It declares the variable within the scope that the program is currently in, in a map that indicates whether it’s ready for use (which it isn’t yet):

  private void declare(Token name) {
    if (scopes.isEmpty()) return;

    Map<String, Boolean> scope = scopes.peek();

    if (scope.containsKey(name.lexeme)) {
      Lox.error(name,
          "Already a variable with this name in this scope.");
    }

    scope.put(name.lexeme, false);
  }

It then defines the variable with requisite value and sets that “ready for use” flag to true:

  private void define(Token name) {
    if (scopes.isEmpty()) return;
    scopes.peek().put(name.lexeme, true);
  }

The interpreter’s resolve method will take in the expression to set to a variable’s value as well as the relative depth of its scope—that is, how many “hops” down the HashMap stack into enclosing environments the interpreter should go to retrieve this variable:

  private void resolveLocal(Expr expr, Token name) {
    for (int i = scopes.size() - 1; i >= 0; i--) {
      if (scopes.get(i).containsKey(name.lexeme)) {
        interpreter.resolve(expr, scopes.size() - 1 - i);
        return;
      }
    }
  }

That resolve method on the Interpreter looks like this:

  void resolve(Expr expr, int depth) {
    locals.put(expr, depth);
  }

…and the Environment can both get a variable from a specific scope…

  Object getAt(int distance, String name) {
    return ancestor(distance).values.get(name);
  }

…and set a varable in a specific scope:

 void assignAt(int distance, Token name, Object value) {
    ancestor(distance).values.put(name.lexeme, value);
 }

Bob adds a call to the Resolver in Lox.java after the parsing step:

    Resolver resolver = new Resolver(interpreter);
    resolver.resolve(statements);

So separate scopes can now define and use new variables with names that match the names of enclosing scopes without mixing them up!

It occurs to me that another way to do this without storing the “hops” would be to repurpose the concatenated key trick from database development: assign each scope a unique identifier, like an integer or even a hash value, and tack that onto the front of the variable name before setting it to a value in a single environment HashMap. This way, instead of storing the “hops,” the compiler essentially changes the variable names it uses to include scope information. I don’t know that I’m partial to this strategy, but were I doing this again, I might give it a try.

This concludes the chapters on the implementation details for a scripting version of jlox. Next, we get to implement classes! Joy.

If you liked this post, you might also like…

The rest of the Crafting Interpreters series

This post about structural verification (I’m just figuring you’re into objectcraft, so)

This post about why use, or not use, an interface (specifically in Python because of the way Python does, or rather doesn’t exactly do, interfaces)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.