SICP 3: Representing Data Structures

Reading Time: 6 minutes

Screen Shot 2019-11-05 at 10.35.17 AM

I recently took Dave Beazley’s week-long course on The Structure and Interpretation of Computer ProgramsIn this series, I’ll share memorable insights from the course (here’s where you can see all the posts so far).

In this post, we’ll look at some of the concepts from the second chapter of the book.

Representing Data Structures

In the last post about SICP, where we talked about iteration and recursion, all of the collections in our examples comprised a series of integers. Scheme allows us to group things together into collections as pairs, like this:


> (define pair (cons 2 3))
> pair
'(2 . 3)
> (car pair)
2
> (cdr pair)
3

view raw

data.scm

hosted with ❤ by GitHub

Lists are represented as pairs of pairs, with car pointing to an item in the list and cdr pointing to another pair, whose car points to another item and whose cdr points to yet another pair, and so on. We can make such lists like so:

(list 1 2 3 4)

or even:

'(1 2 3 4)

We can represent any arbitrary data in scheme as a list as long as it fits this format.

'(Hello world!)

Here’s how we might represent points and segments in an (x, y) plane using Scheme’s pairs. If you’ve done any programming involving data with spatial components, the below implementation is unlikely to surprise you.


; 2.2 Representing data with procedures
(define (make-point x y)(cons x y))
(define (x-point point) (car point))
(define (y-point point) (cdr point))
(define a-point (make-point 0 0))
(define b-point (make-point 2 2))
(x-point a-point)
(y-point a-point)
(define (make-segment start-point end-point)(cons start-point end-point))
(define (start-point segment) (car segment))
(define (end-point segment) (cdr segment))
(define segment (make-segment a-point b-point))
(define (mid-point segment)
(make-point (/ (+ (x-point (start-point segment)) (x-point (end-point segment))) 2)
(/ (+ (y-point (start-point segment)) (y-point (end-point segment))) 2)))
(mid-point segment) ;–> (1 . 1)

Here’s where things get interesting. Though this implementation with pairs is the conventional1 way to do things, It’s not the only way.

In exercise 2.4, we re-implement car, cdr, and cons to manage the data with lambdas.


; 2.4: Representing data AS procedures
(define (2.4-cons x y)
(lambda (m) (m x y)))
(define (2.4-car z)
(z (lambda (p q) p)))
(define (2.4-cdr z)
(z (lambda (p q) q)))
(2.4-car (2.4-cons 2 3)) ;–> 2
(2.4-cdr (2.4-cons 2 3)) ;–> 3

2.4-cons returns a procedure that accepts the two arguments to the function. When we pass that procedure into 2.4-car or 2.4-cdr, each of those functions calls the procedure passed to it on another procedure that takes the two arguments and does something with them:  2.4-car returns the first one, and 2.4-cdr returns the second. Nary a pair present here.

What if we tried storing the data as an integer rather than as a pair? In exercise 2.5, we re-implement the pairs functions again, this time persisting (a, b) as the result of 2a3b.


; 2.5
(define expt (** a b))
(define (2.5-cons a b)
(* (expt 2 a) (expt 3 b))
)
(define (2.5-car product)
(if (= 0 (remainder product 2))
(+ 1 (2.5-car (/ product 2)))
0)
)
(define (2.5-cdr product)
(if (= 0 (remainder product 3))
(+ 1 (2.5-cdr (/ product 3)))
0)
)
(define example (2.5-cons 5 7)) ; 69984
(2.5-car example) ; 5
(2.5-cdr example) ; 7

2.5-cons accepts two arguments a and b, and it returns 2a3b.  2.5-car takes in that result and recursively divides it by 2 until the remainder of that operation is no longer zero, and it returns the number of times it did the division. 2.5-car does that operation with 3 rather than 2.

Does that seem like it should work? Can you come up with a number that can break it?

This may seem like a strange way to store data, but we do sometimes smash together disparate pieces of information for storage. Example: concatenated keys in relational databases.

So what is data?

The above exercises demonstrate how data gets defined by its behavior, not its content or storage or format.

Maybe that’s true for fancy compound data types, but it can’t be the case for primitives…right?

I dunno…


; 2.6: Representing a primitive (numbers) as procedures
(define zero (lambda (f) (lambda (x) x)))
(define (add-1 n)
(lambda (f) (lambda (x) (f ((n f) x)))))
(define (inc x) (+ 1 x))
(define one (lambda (f x) (f x)))
(define two (lambda (f x) (f (f x))))
(define three (lambda (f x) (f (f (f x)))))

Here, we generate a function that we can call recursively to implement individual integers. Would a programming language do this? Probably not. That’s not really the point. The point is to reveal the role of choice in designing a programming language.

The implementation of your favorite programming language implicitly conveys a perspective on how a computer should get a certain job done. In designing anything, it is critical that we recognize the roles of choice and perspective in determining where the designed things end up.

perspective by danielle margutti

Iterating Over Collections

Now that we have a compound data structure, we can revisit iteration and recursion from the post on Chapter 1.

Take a look at the following code that reverses a list in Scheme, and note what we’re doing:


; 2.18
(define (reverse list)
(define (iter-list reversed-list unprocessed-remaining)
(if (null? unprocessed-remaining)
reversed-list
(iter-list (cons(car unprocessed-remaining) reversed-list) (cdr unprocessed-remaining))
)
)
(iter-list null list))
(reverse (list 3 4 5 6)) ;—> 6 4 5 3

We iterate over it, adding each element in the list to a new empty one. The procedure effectively turns our list inside out. We can apply it recursively to deep-reverse lists of lists:


; 2.27
(define (deep-reverse list)
(define (iter-list reversed-list unprocessed-remaining)
(if (null? unprocessed-remaining)
reversed-list
(iter-list (cons (if (list? (car unprocessed-remaining))
(deep-reverse (car unprocessed-remaining))
(car unprocessed-remaining))
reversed-list) (cdr unprocessed-remaining))
))
(iter-list null list)
)
(deep-reverse (list 3 4 (list 5 6))) ;—>((6 5) 4 3)
(deep-reverse (list 3 4 (list 5 6 (list 7 8 9)))) ;—>(((9 8 7) 6 5) 4 3)
(deep-reverse (list 3 4 (list 5 (list 6 (list 7))))) ;—>((((7) 6) 5) 4 3)

You can think of this one as iterating in as many dimensions as you have represented with nested lists.

So, when it matters (or might matter) to keep our list in a specified order, we have a reason to choose a recursive approach rather than an iterative one:


; Collection Operations
(define (map proc items)
(if (null? items)
null
(cons (proc (car items))
(map proc (cdr items)))))
(define (filter condition items)
(if (null? items)
null
(if (proc (car items))
(cons (car items) (filter condition (cdr items)))
(filter condition (cdr items)))))
(define (append seq1 seq2)
(if (null? seq1)
seq2
(cons (car seq1) (append (cdr seq1) seq2))))

This puts us back at the potential memory issue we discussed in the last post, and we’re going to return to this in a couple of chapters.

That said, maintaining a few general-use list operations like these allows us to compose them together to manage more complex list manipulations. On this topic, section 2.2.3 in SICP is worth a read. It shows two functions that perform a similar set of steps, but do so in different orders. It contrasts these approaches with the one that a signals programmer might take—separate the steps for iterating over a list, filtering it, applying a modifier to the elements—then perform those steps in order on each unique list operation.

Here, we extract a method accumulate to further unify common behavior from our list operations.


; 2.33
(define (accumulate op initial items) ; accumulate also referred to as "reduce"
(if (null? items)
initial
(op (car items) (accumulate op initial (cdr items)))))
; Defining collection operations in terms of accumulate
(define (append seq1 seq2)
(accumulate cons seq2 seq1))
(define (map proc items)
(accumulate (lambda (x y) (cons (proc x) y)) '() items))
(define (filter proc items)
(accumulate (lambda (x y) (if (proc x) (cons x y) y)) '() items))

We’ll talk further about managing common behavior across different implementations in the next post, which will continue with some more examples from Chapter 2 of SICP.

1By the way, when programmers say ‘obvious,’ a more accurate term for what they’re describing is usually ‘conventional.’ It’s the way we’ve done things before, and the way we currently assume they should be done. Conventions aren’t intuitive, as evidenced by the fact that the conventional approach to do things changes over time—which wouldn’t happen if they were intuitive solutions like sleeping when you’re tired, eating when you’re hungry, or f**ing when you’re horny. Referring to something ‘conventional’ as ‘obvious’ conflates someone having heard of something before with that person possessing basic intuition, which is why it’s possible to deflate (or insult) other programmers by calling things ‘obvious.’ This is worth thinking about when you find yourself tempted to use the word ‘obvious.’

If you liked this post, you might also like:

This series on Crafting Interpreters (extremely related content)

This series on hosting a conference (unrelated, but a recent series nonetheless)

Live Coding! (with show notes, so you can skip to the good parts instead of watch me type for 3 hours)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.