I recently took Dave Beazley’s week-long course on *The Structure and Interpretation of Computer Programs. *In this series, I’ll share memorable insights from the course (here’s where you can see all the posts so far).

In this post, we’ll look at some of the concepts from the second chapter of the book.

### Representing Data Structures

In the last post about SICP, where we talked about iteration and recursion, all of the collections in our examples comprised a series of integers. Scheme allows us to group things together into collections as pairs, like this:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

> (define pair (cons 2 3)) | |

> pair | |

'(2 . 3) | |

> (car pair) | |

2 | |

> (cdr pair) | |

3 |

Lists are represented as pairs of pairs, with `car`

pointing to an item in the list and `cdr`

pointing to another pair, whose `car`

points to another item and whose `cdr`

points to yet another pair, and so on. We can make such lists like so:

`(list 1 2 3 4)`

or even:

`'(1 2 3 4)`

We can represent any arbitrary data in scheme as a list as long as it fits this format.

`'(Hello world!)`

Here’s how we might represent points and segments in an (x, y) plane using Scheme’s pairs. If you’ve done any programming involving data with spatial components, the below implementation is unlikely to surprise you.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.2 Representing data with procedures | |

(define (make-point x y)(cons x y)) | |

(define (x-point point) (car point)) | |

(define (y-point point) (cdr point)) | |

(define a-point (make-point 0 0)) | |

(define b-point (make-point 2 2)) | |

(x-point a-point) | |

(y-point a-point) | |

(define (make-segment start-point end-point)(cons start-point end-point)) | |

(define (start-point segment) (car segment)) | |

(define (end-point segment) (cdr segment)) | |

(define segment (make-segment a-point b-point)) | |

(define (mid-point segment) | |

(make-point (/ (+ (x-point (start-point segment)) (x-point (end-point segment))) 2) | |

(/ (+ (y-point (start-point segment)) (y-point (end-point segment))) 2))) | |

(mid-point segment) ;–> (1 . 1) |

Here’s where things get interesting. Though this implementation with pairs is the *conventional*^{1} way to do things, It’s not the *only* way.

In exercise 2.4, we re-implement `car`

, `cdr`

, and `cons`

to manage the data with lambdas.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.4: Representing data AS procedures | |

(define (2.4-cons x y) | |

(lambda (m) (m x y))) | |

(define (2.4-car z) | |

(z (lambda (p q) p))) | |

(define (2.4-cdr z) | |

(z (lambda (p q) q))) | |

(2.4-car (2.4-cons 2 3)) ;–> 2 | |

(2.4-cdr (2.4-cons 2 3)) ;–> 3 |

`2.4-cons`

returns a procedure that accepts the two arguments to the function. When we pass that procedure into `2.4-car`

or `2.4-cdr`

, each of those functions calls the procedure passed to it on *another* procedure that takes the two arguments and does something with them: `2.4-car`

returns the first one, and `2.4-cdr`

returns the second. Nary a pair present here.

What if we tried storing the data as an integer rather than as a pair? In exercise 2.5, we re-implement the pairs functions again, this time persisting `(a, b)`

as the result of 2^{a}3^{b}.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.5 | |

(define expt (** a b)) | |

(define (2.5-cons a b) | |

(* (expt 2 a) (expt 3 b)) | |

) | |

(define (2.5-car product) | |

(if (= 0 (remainder product 2)) | |

(+ 1 (2.5-car (/ product 2))) | |

0) | |

) | |

(define (2.5-cdr product) | |

(if (= 0 (remainder product 3)) | |

(+ 1 (2.5-cdr (/ product 3))) | |

0) | |

) | |

(define example (2.5-cons 5 7)) ; 69984 | |

(2.5-car example) ; 5 | |

(2.5-cdr example) ; 7 |

`2.5-cons`

accepts two arguments a and b, and it returns 2^{a}3^{b}. `2.5-car`

takes in that result and recursively divides it by 2 until the remainder of that operation is no longer zero, and it returns the number of times it did the division. `2.5-car`

does that operation with 3 rather than 2.

Does that seem like it should work? Can you come up with a number that can break it?

This may seem like a strange way to store data, but we do sometimes smash together disparate pieces of information for storage. Example: concatenated keys in relational databases.

### So what *is* data?

The above exercises demonstrate how data gets defined by its behavior, not its content or storage or format.

Maybe that’s true for fancy compound data types, but it can’t be the case for primitives…right?

I dunno…

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.6: Representing a primitive (numbers) as procedures | |

(define zero (lambda (f) (lambda (x) x))) | |

(define (add-1 n) | |

(lambda (f) (lambda (x) (f ((n f) x))))) | |

(define (inc x) (+ 1 x)) | |

(define one (lambda (f x) (f x))) | |

(define two (lambda (f x) (f (f x)))) | |

(define three (lambda (f x) (f (f (f x))))) |

Here, we generate a function that we can call recursively to implement individual integers. Would a programming language do this? Probably not. That’s not really the point. The point is to reveal the role of *choice* in designing a programming language.

The implementation of your favorite programming language implicitly conveys a *perspective* on how a computer should get a certain job done. In designing anything, it is critical that we recognize the roles of *choice* and *perspective* in determining where the designed things end up.

### Iterating Over Collections

Now that we have a compound data structure, we can revisit iteration and recursion from the post on Chapter 1.

Take a look at the following code that reverses a list in Scheme, and note what we’re doing:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.18 | |

(define (reverse list) | |

(define (iter-list reversed-list unprocessed-remaining) | |

(if (null? unprocessed-remaining) | |

reversed-list | |

(iter-list (cons(car unprocessed-remaining) reversed-list) (cdr unprocessed-remaining)) | |

) | |

) | |

(iter-list null list)) | |

(reverse (list 3 4 5 6)) ;—> 6 4 5 3 |

We iterate over it, adding each element in the list to a new empty one. The procedure effectively turns our list inside out. We can apply it recursively to deep-reverse lists of lists:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.27 | |

(define (deep-reverse list) | |

(define (iter-list reversed-list unprocessed-remaining) | |

(if (null? unprocessed-remaining) | |

reversed-list | |

(iter-list (cons (if (list? (car unprocessed-remaining)) | |

(deep-reverse (car unprocessed-remaining)) | |

(car unprocessed-remaining)) | |

reversed-list) (cdr unprocessed-remaining)) | |

)) | |

(iter-list null list) | |

) | |

(deep-reverse (list 3 4 (list 5 6))) ;—>((6 5) 4 3) | |

(deep-reverse (list 3 4 (list 5 6 (list 7 8 9)))) ;—>(((9 8 7) 6 5) 4 3) | |

(deep-reverse (list 3 4 (list 5 (list 6 (list 7))))) ;—>((((7) 6) 5) 4 3) |

You can think of this one as iterating in as many dimensions as you have represented with nested lists.

So, when it matters (or might matter) to keep our list in a specified order, we have a reason to choose a recursive approach rather than an iterative one:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; Collection Operations | |

(define (map proc items) | |

(if (null? items) | |

null | |

(cons (proc (car items)) | |

(map proc (cdr items))))) | |

(define (filter condition items) | |

(if (null? items) | |

null | |

(if (proc (car items)) | |

(cons (car items) (filter condition (cdr items))) | |

(filter condition (cdr items))))) | |

(define (append seq1 seq2) | |

(if (null? seq1) | |

seq2 | |

(cons (car seq1) (append (cdr seq1) seq2)))) |

This puts us back at the potential memory issue we discussed in the last post, and we’re going to return to this in a couple of chapters.

That said, maintaining a few general-use list operations like these allows us to compose them together to manage more complex list manipulations. On this topic, section 2.2.3 in SICP is worth a read. It shows two functions that perform a similar set of steps, but do so in different orders. It contrasts these approaches with the one that a signals programmer might take—separate the steps for iterating over a list, filtering it, applying a modifier to the elements—then perform those steps in order on each unique list operation.

Here, we extract a method `accumulate`

to further unify common behavior from our list operations.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

; 2.33 | |

(define (accumulate op initial items) ; accumulate also referred to as "reduce" | |

(if (null? items) | |

initial | |

(op (car items) (accumulate op initial (cdr items))))) | |

; Defining collection operations in terms of accumulate | |

(define (append seq1 seq2) | |

(accumulate cons seq2 seq1)) | |

(define (map proc items) | |

(accumulate (lambda (x y) (cons (proc x) y)) '() items)) | |

(define (filter proc items) | |

(accumulate (lambda (x y) (if (proc x) (cons x y) y)) '() items)) |

We’ll talk further about managing common behavior across different implementations in the next post, which will continue with some more examples from Chapter 2 of SICP.

^{1}By the way, when programmers say ‘obvious,’ a more accurate term for what they’re describing is usually ‘conventional.’ It’s the way we’ve done things before, and the way we currently assume they should be done. Conventions aren’t intuitive, as evidenced by the fact that the conventional approach to do things changes over time—which wouldn’t happen if they were intuitive solutions like sleeping when you’re tired, eating when you’re hungry, or f**ing when you’re horny. Referring to something ‘conventional’ as ‘obvious’ conflates someone having heard of something before with that person possessing basic intuition, which is why it’s possible to deflate (or insult) other programmers by calling things ‘obvious.’ This is worth thinking about when you find yourself tempted to use the word ‘obvious.’

### If you liked this post, you might also like:

This series on *Crafting Interpreters* (extremely related content)

This series on hosting a conference (unrelated, but a recent series nonetheless)

Live Coding! (with show notes, so you can skip to the good parts instead of watch me type for 3 hours)