SICP 2: State and Modularity

Reading Time: 7 minutes

Screen Shot 2019-11-05 at 10.35.08 AM

I recently took Dave Beazley’s week-long course on The Structure and Interpretation of Computer ProgramsIn this series, I’ll share memorable insights from the course (here’s where you can see all the posts so far).

In this post, we’ll look at some of the concepts from the first chapter of the book.

Briefly: Scheme

The examples and exercises in the book use the Scheme programming language. I want to get to the concepts, so I’ll show you some Scheme here as a warmup and link you to this tutorial in case you want to go deeper on Scheme.

Here’s what Scheme looks like:


(define (find-smallest a b c)
(cond ((and (<= a b) (<= a c) a))
((and (<= b c) (<= b c) b))
((and (<= c a) (<= c b) c))
)
)
(define (largest-squared-hypotenuse a b c) (
cond ((= (find-smallest a b c) a) (+ (* b b) (* c c)))
((= (find-smallest a b c) b) (+ (* a a) (* c c)))
((= (find-smallest a b c) c) (+ (* b b) (* a a)))
)
)
(largest-squared-hypotenuse 1 2 3) ;–> 13 (3 squared plus 2 squared)

Things to notice:

  • The operator comes first.
  • A set of parentheses explicitly demarcate the scope of each piece of code.

We’ll return to the significance of these characteristics in a future post.

Scheme uses applicative (eager) evaluation. So the following code will never finish:


(define (p) (p))
(define (test x y)
(if (= x 0)
0
y))
(test 0 (p))

In this code, we have a method called test that takes two arguments. If the first argument is 0, it returns 0. If the first argument is not 0, it returns whatever the second argument is.

The reason this code doesn’t finish is that it’ll try to evaluate (p) forever at the point where it’s provided as an argument to test. Were scheme lazy evaluated, the fact that (p) recurses forever wouldn’t hang up the call to test on line 8 because, since the first argument passed to the test  method is 0, the return value of test would be 0 and (p)​ would never be used.

New variables and functions in Scheme can live inside the scope of other functions, as you see in the subcomponents of sqrt here. Note that the variable x remains accessible throughout the scope of sqrt, including inside the encapsulated functions.


(define (square x) (* x x))
(define (average x y)
(/ (+ x y) 2))
(define (sqrt x)
(define (improve guess)
(average guess (/ x guess)))
(define (good-enough? guess)
(< (abs ( (square guess) x)) (* 0.000001 x)))
(define (sqrt-iter guess)
(if (good-enough? guess)
guess
(sqrt-iter (improve guess)
)
))
(sqrt-iter 1.0)
)
(square (sqrt 9)) ; 3.000073824
(square (sqrt 0.0001)) ; 0.000100213
(square (sqrt 845000000000)) ; 845000004201

One thing you might notice about the above examples: you don’t see any for loops or while loops. You do see one function, though, that appears to do something repeatedly: sqrt-iter checks a condition called good-enough? and, if the result is false, it calls itself.

A Loopless Language

Scheme doesn’t have loops: to perform an operation on each item in a collection, we instead use recursion.

Let’s stop for a moment and focus on our experiences with programming languages like Swift, Kotlin, Java, JavaScript, .Net, Ruby, or Python. How often do you loop over something in these languages—counting all your maps, filters, and applys? Constantly.

How often do you use recursion in these languages? I have used it three times in a professional capacity. All of those occasions had to do with database queries that traversed a tree whose depth was unknown when the function was called.

For this reason, I conceptualized looping as a subset of collections operations that met two conditions:

a) a one-dimensional collection
b) whose length we knew when the function got called.

For collections where one or both of these things were not true, we’d need a more general-purpose collection operation. That’s what we get with recursion. We just don’t run into it often because the subset of collections operations that can be handled with looping are the majority of cases that we run into professionally.

SICP changed my perspective on this, because it demonstrates how either recursion or iteration might be used to handle the same collection operation.

Here is how we might sum a collection of numbers recursively in Scheme:


(define (sum-n n)
(if (= n 0)
0
(+ n (sum-n ( n 1)))
)
)
(sum-n 5) ; 15

view raw

recursive.scm

hosted with ❤ by GitHub

This would work, but as n increases, it takes longer—and, more troublingly, uses more memory. It constructs a list from beginning to end of all the addends and, when that list is complete, sums them together.

Compare that to this version. Instead of the function calling itself directly, the function contains an inner, iterative function that receives the number yet to be summed and the sum of the numbers processed so far:


(define (sum-i n)
(define (iter n sofar)
(if (= n 0)
sofar
(+ (iter ( n 1) (+ n sofar)))))
(iter n 0)
)
(sum-i 5) ; 15

view raw

iterative.scm

hosted with ❤ by GitHub

This iterative implementation passes the state of the progressing operation to each successive step. The call therefore doesn’t need to keep the individual items that have already been processed, reducing memory load.

For any collection operation in Scheme, you can convert between an iterative and a recursive approach. In fact, SICP dedicates a legion of exercises to doing exactly this.

This feels surprising because most programming languages I’ve used—including those with functional capacity, that allowed me to perform collections operations on streams—treat iteration and recursion as fundamentally different things, one of which we use all the time and the other of which we use almost never.

This isn’t because they’re fundamentally different operations. It’s those languages have introduced syntactic affordances for collection operations, and those affordances have implicitly conveyed a perspective about the cases in which iteration should be used, and recursion should be used.

There’s also something to be explored in this passage of state. We’ll come back to this in a moment.

Screen Shot 2019-06-09 at 7.53.15 AM

Higher Order Procedures

Scheme allows you to pass functions around as arguments to other functions. This allows us to abstract away common behavior, like the accumulation of the results of an operation on a series of numbers:


(define (sum term a next b)
(accumulate + 0 term a next b))
(define (product term a next b)
(accumulate * 1 term a next b))
(define (factorial n)
(product identity 1 inc n))
(define (accumulate combiner base-case term a next b)
(define (iter a result)
(if (> a b)
result
(iter (next a) (combiner (term a) result))))
(iter a base-case))
(define (identity x) x)
(define (inc x) (+ 1 x))
(accumulate * 1 identity 1 inc 5) ; 120
(factorial 5) ; 120

This allows us to incrementally build functions with other functions. We see this ability represented with lambdas and higher order procedures in Scheme. We sometimes see this represented with closures and decorators in other languages.


; 1.41
(define (double f)
(lambda (x) (f(f x))))
(define (add-2-to x) (+ 2 x))
(define (inc x) (+ 1 x))
(define (square x) (* x x))
((double add-2-to) 3)
(((double (double double)) inc) 5) ;—> 21 WHY IS THIS 16 TIMES AND NOT 8 TIMES?
; 1.42
(define (compose f g)
(lambda (x) (f (g x))))
((compose add-2-to inc) 4) ;———–> 7

SICP drives this point by providing exercises in which we repeatedly compose more and more functions together. This allows each function to retain a name and a scope, and it allows programmers to assemble procedures that share patterns with the same parts. We see that in this example, where we pass an improvement function to an iterative improver that allows us to progressively guess at both square roots and cube roots.


(define (square x) (* x x))
(define (qb x) (* x x x))
(define (average x y)
(/ (+ x y) 2))
(define (sq-improve guess for-value)
(average guess (/ for-value guess)))
(define (qb-improve guess for-value)
(/ (+(/ for-value (square guess)) (* 2 guess)) 3))
(define (root x improve-function)
(define (good-enough? guess previous-guess)
(< (abs (/ ( guess previous-guess) guess)) 0.000001))
(define (root-iter previous-guess)
(let ((guess (improve-function previous-guess x)))
(if (good-enough? guess previous-guess)
guess
(root-iter guess)
)
))
(root-iter 1.0)
)
(define (qbrt x)
(root x qb-improve)
)
(define (sqrt x)
(root x sq-improve)
)
(qb (qbrt 27))
(square (sqrt 9))

By passing around functions, we can also repeat the same function on the same data a number of times, as we do in this case for smoothing out the results of a square:


; 1.43
(define (repeated operation n)
(define (operation-iter iterator composed-function)
(if (= iterator 1)
(lambda(x) (composed-function x))
(operation-iter ( iterator 1) (compose operation composed-function))
))
(operation-iter n operation)
)
((repeated square 2) 5) ;———–> 625
; 1.44
(define (smooth f)
(lambda (x dx) (/ (+ (f ( x dx)) (f x) (f (+ x dx))) 3))
)
((smooth square) 0 4) ;———–> 10 & 3/4
(repeated ((smooth square) 0 4) 5)

State and Modularity as Tools for the Engineering Process

The passage of functions and the passage of state together allow us to perform an operation, perform a test on the result, and keep performing that same operation (or a different operation) if the test fails. We can also save off these states, if we like, to log the progress of a procedure, or to pause a procedure, or back it up.

Or, if it’s a long running process with the potential to fail, we could save off the progress at the point of failure, debug the failure, and restart the process from the failure point rather than starting the whole thing over. These types of operations frequently come up on the wish lists of machine learning engineers who use the scikit-learn library. They’ll start to train a model, run out of memory, and lose all the work done so far. This especially sucks if it was some kind of deep learning model that takes four days to run and failed three days in.

How do these models converge on their answers? The process is called gradient descent. Would it be possible to implement this in such a way that state is passed from one step to the next and saved off in the event of a failure?

More broadly, how do the choices that we make in programming language design affect the way its programmers are encouraged, or limited, to manage operations and state?

Now that an exploration of the first chapter of SICP has raised this question, I’d like to think more about it.

In the meantime, we’ll move on to the second chapter of the book, which deals with the representation of data structures.

If you liked this post, you might also like:

This ongoing series tracking my progress through Crafting Interpreters by Bob Nystrom

This piece that dives into the scipy CSR sparse matrix

This applied data science case study

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.