I’m writing this blog series about what software engineers can learn from spaceflight. You can check out all the posts in the space series (as well as some other space-related code posts) right here.
We talked about the roadmap for plants in space and compared it to software engineering roadmaps.
Today we will zoom in on the science happening in this red rectangle…
…and we’ll talk about what it can teach us about debugging.
As usual, I’ll start this piece off with a question that seems, at first, completely unrelated to debugging.
How do we know that plants are in distress?
If a plant looks wilted and sad, we assume it needs water. If water doesn’t fix it, we try moving it to the window. If that doesn’t work, and we’re feeling brave, we repot it. Not feeling brave? We panic and ask Twitter/Facebook what’s wrong.
Does that flow remind you of anything we’ve discussed before?
The last time you saw that image it was in this piece about debugging strategy. I call this The Standard Strategy, and here’s how it breaks:
As it turns out, some plants get wilted and brown because they need more water. Also, some plants get wilted and brown when they have too much water.
The less we understand what’s going on with our plant, the less effective The Standard Strategy is for nursing it back to health. That’s because this strategy relies on our guesses as to what might be wrong, and it doesn’t work if our guesses are inaccurate.
Most plant owners have accidentally overwatered a plant to death at some point. The blood of those plants is on The Standard Strategy’s hands. (Or it would be, if plants had blood, or a figurative concept had hands).
Amateur plant nurturing strategies aren’t going to cut it 250 miles above Earth’s surface.
First of all, plants evolved for Earth’s environment with gravity, greenhouse effects, soil, and other things space doesn’t have. So it’s harder for them to survive there than here.
Second of all, our goal as plant parents on Earth is pretty simple: the plant should have as little stress as possible. Not so in space—at least not with fruiting plants. Since capacity is so limited on the space station or a spacecraft, each plant needs to earn its keep by producing as much edible matter as possible. Unstressed plants get leafy, and produce some fruit, but not a lot. As plants get more stressed, they direct their energy away from leaves and toward fruit, which is what astronauts need. Space botanists don’t want to minimize stress. They want to hit a specific amount of stress, which is a lot harder to do.
To accomplish that, we have to be really good at understanding a plant’s needs. So the VEGGIE team is conducting experiments to look for leading indicators that a plant needs something.
Here’s how it works:
- Grow some plants
- Deprive them of something, like water (introduce a confirmed need)
- Take pictures of them and observe how they look in different ranges of the electromagnetic spectrum.
- Look for correlations between the confirmed need and the wavelengths they’re absorbing or reflecting.
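To make that last step concrete, here is a minimal sketch of what "look for correlations" could mean in code. This is my own illustration, not the VEGGIE team's analysis: the band names, the `rank_bands` helper, and every number in it are invented, and real plants may not behave this way. It scores each band by how strongly its reflectance tracks the confirmed need, then sorts the bands so the strongest candidate indicator comes first.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_bands(need, reflectance_by_band):
    """Return (band, r) pairs, strongest correlation with the need first."""
    scores = {
        band: pearson(need, values)
        for band, values in reflectance_by_band.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up illustration data, NOT real measurements.
# 1 = plant was deprived of water (the confirmed need), 0 = control plant.
deprived = [0, 0, 0, 1, 1, 1]

# Hypothetical mean reflectance per plant in each band (0.0 to 1.0).
reflectance_by_band = {
    "green": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50],
    "near_infrared": [0.80, 0.78, 0.82, 0.60, 0.58, 0.62],
}

ranked = rank_bands(deprived, reflectance_by_band)
for band, r in ranked:
    print(f"{band}: r = {r:+.2f}")
```

In this toy data the invented infrared band moves with the water deprivation while the green band barely changes, which is the kind of early signal our eyeballs would miss. Six plants is nowhere near enough to trust a correlation, of course; the sketch only shows the shape of the analysis.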
We do an ersatz version of this in our kitchen windows. Does the plant look green? It’s reflecting lots of waves at the frequency that our eyeballs perceive as “green.” This usually means it’s healthy. If it starts to reflect more waves at the frequency that our eyeballs perceive as “yellow,” we get worried.
There are lots of wavelengths besides green and yellow, though. In addition to the colors we can see, there are wavelengths on either side that we cannot see. But they’re still there, and objects still absorb or reflect them:
We can use cameras to pick those up, and then we can convert the data about those wavelengths into images that use colors we can see.
We already do this for plants in production contexts.
For example, I have shown you the NASA Landsat data processing pipeline. NASA Landsat satellites orbit the Earth and collect data about its surface in various ranges of electromagnetic wavelengths (called “bands”). The app allows researchers to request that data and, eventually, process it in all kinds of ways, including converting specific bands to colors that humans can see. The Floating Forest project takes the infrared band, which is highly reflected by kelp on the Earth’s surface but is outside the visible spectrum, and makes it bright green so we can see where the kelp is!
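Here is a rough sketch of that band-to-color idea. It is not the Landsat pipeline's or Floating Forest's actual code; `nir_to_green` is a hypothetical helper, and the grid of readings is made up. It takes a grid of infrared intensities and paints each pixel green in proportion to its reading, so whatever reflects the most infrared shows up brightest.

```python
def nir_to_green(nir_band, max_value=255):
    """Map a 2D grid of infrared intensities to (red, green, blue) pixels.

    Each pixel becomes (0, g, 0), where g scales with the infrared reading
    relative to the brightest reading in the grid.
    """
    peak = max(max(row) for row in nir_band)
    if peak == 0:
        # Nothing reflected any infrared, so the whole image is black.
        return [[(0, 0, 0) for _ in row] for row in nir_band]
    return [
        [(0, round(value / peak * max_value), 0) for value in row]
        for row in nir_band
    ]


# Made-up 2x3 grid of infrared readings, NOT real satellite data.
nir = [
    [0.0, 0.1, 0.8],
    [0.0, 0.4, 0.2],
]

image = nir_to_green(nir)
for row in image:
    print(row)
```

The strongest reading lands at full green and the zero readings stay black. A real pipeline would work on whole image arrays and deal with calibration, but the core move is the same: pick a band we cannot see and assign it to a color we can.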
VEGGIE is determining whether some of these non-visible wavelengths contain information about how a plant is doing that we don’t know about yet. The experiments use multispectral imaging, which captures 3 to 15 bands with equipment that costs about $200, and an amped-up version called hyperspectral imaging that measures hundreds of different bands with a camera that runs a cool hundred grand.
They’re looking for early and accurate signs that a plant is under stress. They’re taking the “eyeballing green and yellow” trick and turning it up to 11.
Chelsea, what does any of this have to do with debugging?
Glad you asked. We started this post by asking a question: How do we know that plants are in distress?
We don’t; we’re software engineers, not botanists. So we stick to aloe vera since we heard it’s hard to kill, hope for the best, and focus on code. Ah, yes, code. That’s a thing we know about.
How do we know that code is in distress?
What makes code maintainable, or illegible, or buggy? How do we know whether it’s going to work or not, or be resilient to change?
We think we know. We have ideas about what constitutes “good code” and “bad code.” We call the evidence of bad code a “code smell.” We act like this when we find it in a junior programmer’s code:
This systematic literature review collates the findings of sixteen studies that attempt to measure the correlation between code smells and bugs. The studies confirmed, supposedly, a correlation between about 80% of the smells they studied and bugs in the code. Well, I dug into those studies a little bit, and unsurprisingly, almost none of the studies possess the number of samples they would need to have the requisite statistical power to back up the claims they make. That’s pretty normal in software studies, to be honest. But also, correlation does not equal causation.
Empirical software engineer Hillel Wayne did an interesting talk in 2018 about studying the impacts of software engineering practices. The comments on that video hint at why statistical rigor isn’t prioritized in software studies: software engineers only register the results of studies if they conclude “This one weird trick will make you a total h@XX0r!” But among the findings that don’t make it into the commenters’ TL;DRs, Hillel mentions this one:
Code smells correlate with bugs, but fixing one does not fix the other.
Maybe engineers and teams who take steps to minimize code smells also take some other unstudied, poorly understood steps that actually minimize code defects. But it might not be the code smells themselves that cause the defects.
We have seen, and discussed before, this tendency in the programming community to pass around ideas about what is true without confirming them. This is the last paragraph from the post where we discovered that numpy vectorization doesn’t do what every dingdang data scientist thinks it does:
So what’s the lesson here? The lesson is much like the lesson we learned from studying the history of API design. The lesson is that just because lots of people think something, doesn’t mean it’s true. And if we investigate what we’re learning rather than take it at face value, sometimes the underlying truth can be just as interesting.
Maybe we would gain a lot from experimenting with the causes of bugs. Unlike plants, code is made by humans, so we’re not observing natural phenomena. Additionally, our studies need to trend observational, because we don’t introduce defects in our code on purpose. So our study design might look something like this:
- Make some code
- Identify a confirmed defect
- Upon identifying the cause, label the cause with relevant identifiers like “insufficient scope,” “excessive scope,” “two variables named the same thing,” “typo,” “dependency conflict,” et cetera.
- Figure out some kind of way to also look for these things in software that is working, and compare their prevalence
- Look for large differences in prevalence between the two groups
It’s worth noting that this study design has some problems. First of all, I have no idea how we could reliably do #4 (yet). Second of all, it makes multiple comparisons (one for each label), so we would need a lot of examples to give such a study statistical power.
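For the curious, here is a minimal sketch of what the comparison step could look like once we somehow had the labels. Everything in it is an assumption on my part: the counts are made up, `compare_labels` is a hypothetical helper, and I picked Fisher's exact test with a Bonferroni correction as one reasonable way to handle the multiple-comparisons problem, not the only one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: defect examples with the label,  b: defect examples without it
    c: working examples with the label, d: working examples without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x labeled examples in the defect row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is at least as unlikely as the observed one.
    return sum(
        p
        for p in (table_probability(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )


def compare_labels(counts, n_defect, n_working, alpha=0.05):
    """counts maps label -> (times seen in defect code, times seen in working code)."""
    threshold = alpha / len(counts)  # Bonferroni: one comparison per label
    results = {}
    for label, (in_defect, in_working) in counts.items():
        p = fisher_exact_two_sided(
            in_defect, n_defect - in_defect,
            in_working, n_working - in_working,
        )
        results[label] = (p, p < threshold)
    return results


# Made-up counts out of 50 defect examples and 50 working examples.
counts = {
    "typo": (12, 11),
    "excessive scope": (30, 8),
    "dependency conflict": (5, 4),
}

results = compare_labels(counts, n_defect=50, n_working=50)
for label, (p, flagged) in results.items():
    print(f"{label}: p = {p:.4g}, flagged = {flagged}")
```

Notice how the threshold shrinks as the number of labels grows. That is the statistical power problem in miniature: every extra label we want to test raises the bar for all of them, which is why a study like this needs so many examples.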
I have some personal ideas about how we might run such an experiment, though. Like I said, I’ll need a lot of examples. So please fill out this form if you’d be interested in contributing your bugs to my example bank (bugalog?).
In the meantime, though, it’s worth considering how we might apply experimental approaches to learning about the things we grow in our profession: code bases.
If you liked this piece, you might also like:
This piece about debugging strategy, complete with alternatives to The Standard Strategy
This Technical Briefing (by an engineer in a bikini top!) of the NASA Landsat data processing pipeline
The time we went deep and determined for sure that numpy vectorization doesn’t do what every dingdang data scientist thinks it does