PoSD 2: What causes insidious bugs?

Reading Time: 7 minutes

I recently read John Ousterhout's book, Philosophy of Software Design (PoSD). This blog post includes my commentary on some parts that stuck with me.

In the last post we talked about eschewing complexity, handling special cases, and example choices in software literature.

In this post, we do a little more detective work. The book never directly discusses debugging. I see this pattern in most fundamental software development literature, and I think I understand why. We programmers think of bugs as exceptional, unique detours from some normal workflow in which we’re shipping functional code.


And in so doing, we fundamentally misunderstand the bug.

Debugging is not the exception to my normal day—nor, I suspect, yours. Rather, I spend most of my time reading error messages or combing over code to determine why something isn’t working. Last week, I spent three hours with three other programmers doing only this.

We take this practice that occupies the majority of our work time, and we make it harder by treating it as an exception. We don't have a unified praxis or pedagogy for debugging, and we don't think we need one.


Instead, we each build our own anecdotal libraries out of the bugs we have individually seen. And I think that’s such a waste.

So I'm going to start writing more about debugging. I'd like to pay homage to the late Grace Hopper, who popularized the term 'bug' in computing, so I'll make all the posts on this topic available under the category 'debugging' and also under the tag 'entomology,' which is how biologists refer to the study of bugs.[1]

I want to explore what Philosophy of Software Design can teach us about building a framework for understanding, preventing, and solving bugs.

Let’s start with Chapter 14 of PoSD: Choosing names.

Programmers tend to agree that how we name things is important, and also difficult. But why? Why are good names both so critical and so damn hard?

Check out this excerpt from the book. Ousterhout writes:

The most challenging bug I (John Ousterhout) ever fixed came about because of a poor name choice.

The file system code used the variable name block for two different purposes. In some situations, block referred to a physical block number on disk; in other situations, block referred to a logical block number within a file. Unfortunately, at one point in the code there was a block variable containing a logical block number, but it was accidentally used in a context where a physical block number was needed; as a result, an unrelated block on disk got overwritten with zeroes.

While tracking down the bug, several people, including myself, read over the faulty code, but we never noticed the problem. When we saw the variable block used as a physical block number, we reflexively assumed (emphasis by me, Chelsea Troy) that it really held a physical block number.
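To make the shape of that bug concrete, here is a minimal sketch, in Python, of how one name can quietly carry two meanings. This is my own hypothetical reconstruction, not Ousterhout's actual file system code; inode, disk, block_map, and BLOCK_SIZE are all invented for illustration.

    BLOCK_SIZE = 4096  # hypothetical block size, in bytes

    def logical_to_physical(inode, block):
        # Maps a LOGICAL block number (an offset within a file) to the
        # PHYSICAL block number (a location on disk) that backs it.
        return inode.block_map[block]

    def zero_block(disk, block):
        # Expects a PHYSICAL block number.
        disk.write(block, b"\x00" * BLOCK_SIZE)

    def truncate_last_block(disk, inode):
        block = (inode.size - 1) // BLOCK_SIZE  # this is a LOGICAL block number...
        zero_block(disk, block)                 # ...used where a PHYSICAL one is needed
        # The fix is a one-liner:
        # zero_block(disk, logical_to_physical(inode, block))

If the variables were named logical_block and physical_block instead of plain block, the mismatch would be visible right at the call site, instead of hiding behind the reflexive assumption Ousterhout describes.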

On the Origin of Bugs

I've tracked down a bug or two. I haven't recorded them all rigorously enough to make a scientific case, but I have noticed something: the longer I spend chasing a bug, the more likely it becomes that the fix is a one-liner.

Chart: Time Spent on Bugs vs. Probability of One Liner

By the time I’ve sunk about four hours, it’s almost guaranteed to be a one-liner.

I used to think this was some kind of psychological bias, the way it only ever seems to rain when you didn’t pack an umbrella. But now I see why this happens.

The reason: insidious bugs come from inaccurate assumptions. This is why I bolded the text “reflexively assumed” in the example above.

But it’s not just that insidious bugs come from inaccurate assumptions. It’s deeper than that: insidiousness as a characteristic of bugs comes from inaccurate assumptions. We’re looking in the code when the problem is rooted in our understanding. It takes an awfully long time to find something when we’re looking in the wrong place.

When we name things, we’re not just encoding our understanding; we’re creating our understanding. This is true to a freaky degree. I talked about that more in this other post. How we name things shapes how we understand them. And also, how we name things is influenced by how we understand them.

It’s hard for us to detect when our assumptions about a system are wrong because it’s hard for us to detect when we’re making assumptions at all. Assumptions, by definition, describe things we’re taking for granted. They include all the details into which we are not putting thought. I talked more about assumption detection in this piece on refactoring. I believe that improving our ability to detect and question our assumptions plays a critical role in solving existing insidious bugs and preventing future ones.

Encoding assumptions doesn’t just happen in code, either.

Chapters 12 and 13 of PoSD discuss code comments. Programmers sometimes advise against writing comments in favor of making the code itself legible. In my experience, this works right up until we relinquish full control over the APIs we use in our project—which happens, by the way, the moment we include even a single library dependency. Why? Because other programmers, with perspectives different from ours, frequently make different assumptions than we do.

If I could go back in time and inject the lesson of this paragraph into every code base I’ve ever seen, I’d have 30% fewer gray hairs:

Precision is most useful when commenting variable declarations such as class instance variables, method arguments, and return values. The name and type in a variable declaration are typically not very precise. Comments can fill in missing details such as:

  • What are the units for this variable?
  • Are the boundary conditions inclusive or exclusive?
  • If a null value is permitted, what does it imply?
  • If a variable refers to a resource that must eventually be freed or closed, who is responsible for freeing it or closing it?
  • Are there certain properties that are always true for the variable (invariants), such as "this list always contains at least one entry"?

These information points are all examples of high-risk assumptions: things we know to be one way that some other programmer is going to think they know to be another way, or isn't going to think about at all. These high-risk assumptions make great hiding places for bugs.
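As a sketch of what that advice looks like in practice, here is a commented declaration that pins down each of those details. The class, names, and numbers are mine, invented for illustration; they're not from the book.

    class ConnectionPool:
        def __init__(self, connections, idle_timeout, region=None):
            # idle_timeout: units are seconds, not milliseconds. The boundary is
            # exclusive: a connection idle for exactly idle_timeout seconds is
            # still considered live.
            self.idle_timeout = idle_timeout

            # region: None means "use the account's default region," not
            # "all regions."
            self.region = region

            # connections: invariant -- this list always contains at least one
            # entry. The pool takes ownership and is responsible for closing
            # each connection; callers must not close them directly.
            self.connections = connections

Each comment writes down an assumption that would otherwise live only in the original programmer's head, which is exactly where the next reader can't see it.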

So how, exactly, do we find and prevent these insidious little bugs? Philosophy of Software Design doesn’t directly say. However, it provides a robust framework for solving a different kind of problem that we can repurpose for finding insidious bugs. We’ll talk about that in the next post (I don’t mean to be a tease, but I try to keep these posts under 1200 words to respect your time and energy, and we’re at 1080 right now :).

[1] Anecdotally, my mother was an entomologist (the normal biology kind) and my father an engineer, so this series seems like one I'm fated by heritage to write.

If you liked this piece, you might also like:

The Leveling Up Series (a perpetual favorite for gearheads like yourself)

The Books Category, to which I’m slowly adding all the blog posts where I reflected on a book

The History of API Design Series (prepare to have your sacred cows threatened)
