I recently took Dave Beazley’s week-long course on The Structure and Interpretation of Computer Programs. In this series, I’ll share memorable insights from the course (here’s where you can see all the posts so far).
In this post, we’ll look at some of the concepts from the second chapter of the book.
Representing Data Structures
In the last post about SICP, where we talked about iteration and recursion, all of the collections in our examples comprised a series of integers. Scheme allows us to group things together into collections as pairs, like this:
Lists are represented as pairs of pairs, with
car pointing to an item in the list and
cdr pointing to another pair, whose
car points to another item and whose
cdr points to yet another pair, and so on. We can make such lists like so:
(list 1 2 3 4)
'(1 2 3 4)
We can represent any arbitrary data in scheme as a list as long as it fits this format.
Here’s how we might represent points and segments in an (x, y) plane using Scheme’s pairs. If you’ve done any programming involving data with spatial components, the below implementation is unlikely to surprise you.
Here’s where things get interesting. Though this implementation with pairs is the conventional1 way to do things, It’s not the only way.
In exercise 2.4, we re-implement
cons to manage the data with lambdas.
2.4-cons returns a procedure that accepts the two arguments to the function. When we pass that procedure into
2.4-cdr, each of those functions calls the procedure passed to it on another procedure that takes the two arguments and does something with them:
2.4-car returns the first one, and
2.4-cdr returns the second. Nary a pair present here.
What if we tried storing the data as an integer rather than as a pair? In exercise 2.5, we re-implement the pairs functions again, this time persisting
(a, b) as the result of 2a3b.
2.5-cons accepts two arguments a and b, and it returns 2a3b.
2.5-car takes in that result and recursively divides it by 2 until the remainder of that operation is no longer zero, and it returns the number of times it did the division.
2.5-car does that operation with 3 rather than 2.
Does that seem like it should work? Can you come up with a number that can break it?
This may seem like a strange way to store data, but we do sometimes smash together disparate pieces of information for storage. Example: concatenated keys in relational databases.
So what is data?
The above exercises demonstrate how data gets defined by its behavior, not its content or storage or format.
Maybe that’s true for fancy compound data types, but it can’t be the case for primitives…right?
Here, we generate a function that we can call recursively to implement individual integers. Would a programming language do this? Probably not. That’s not really the point. The point is to reveal the role of choice in designing a programming language.
The implementation of your favorite programming language implicitly conveys a perspective on how a computer should get a certain job done. In designing anything, it is critical that we recognize the roles of choice and perspective in determining where the designed things end up.
Iterating Over Collections
Now that we have a compound data structure, we can revisit iteration and recursion from the post on Chapter 1.
Take a look at the following code that reverses a list in Scheme, and note what we’re doing:
We iterate over it, adding each element in the list to a new empty one. The procedure effectively turns our list inside out. We can apply it recursively to deep-reverse lists of lists:
You can think of this one as iterating in as many dimensions as you have represented with nested lists.
So, when it matters (or might matter) to keep our list in a specified order, we have a reason to choose a recursive approach rather than an iterative one:
This puts us back at the potential memory issue we discussed in the last post, and we’re going to return to this in a couple of chapters.
That said, maintaining a few general-use list operations like these allows us to compose them together to manage more complex list manipulations. On this topic, section 2.2.3 in SICP is worth a read. It shows two functions that perform a similar set of steps, but do so in different orders. It contrasts these approaches with the one that a signals programmer might take—separate the steps for iterating over a list, filtering it, applying a modifier to the elements—then perform those steps in order on each unique list operation.
Here, we extract a method
accumulate to further unify common behavior from our list operations.
We’ll talk further about managing common behavior across different implementations in the next post, which will continue with some more examples from Chapter 2 of SICP.
1By the way, when programmers say ‘obvious,’ a more accurate term for what they’re describing is usually ‘conventional.’ It’s the way we’ve done things before, and the way we currently assume they should be done. Conventions aren’t intuitive, as evidenced by the fact that the conventional approach to do things changes over time—which wouldn’t happen if they were intuitive solutions like sleeping when you’re tired, eating when you’re hungry, or f**ing when you’re horny. Referring to something ‘conventional’ as ‘obvious’ conflates someone having heard of something before with that person possessing basic intuition, which is why it’s possible to deflate (or insult) other programmers by calling things ‘obvious.’ This is worth thinking about when you find yourself tempted to use the word ‘obvious.’
If you liked this post, you might also like:
This series on Crafting Interpreters (extremely related content)
This series on hosting a conference (unrelated, but a recent series nonetheless)
Live Coding! (with show notes, so you can skip to the good parts instead of watch me type for 3 hours)