Last month I perused Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart.
It’s all about how to screw up your experimental design and data analysis. It draws from copious examples, the majority of which come from published studies in reputable journals 😲.
This post will draw some interesting pieces from my notes on the second half of the book.
The first part of the book, in my view, introduced some of the mathematical mishaps associated with less-than-rigorous data analysis. As the book continued, Reinhart began to examine potential pitfalls in data representation and study design (with multiple comparisons, featured in our previous blog post on this book, falling into the study design category). He also talks more about how academia’s incentive structure favors techniques that produce interesting results over techniques whose results represent the actual state of the world.
Continuity errors happen when someone takes continuous data that does not fall into natural discrete categories (like body mass index) and either represents it as discrete categories or interprets it in some way that isn’t true to the data. For example, with body mass index, we frequently see two categories: ‘normal weight’ and ‘overweight.’

Now, body mass index has its own problems as a metric: it has been demonstrated to be a poor measure of health and fitness, and it was devised from a subject population comprising exclusively white people, so it misrepresents other body types (in particular, it pathologizes black women’s bodies and poorly predicts health markers for several communities of color). So, out of the gate, we have some issues. But let’s stick to continuity errors specifically.

Frequently, body mass index data gets categorized as ‘normal weight’ (24.9 or lower) or ‘overweight’ (above 24.9). Where is underweight? And what is the difference between a 24.8 and a 25.1? When these middle values get averaged together with extremes on either end, it looks like that 0.3 difference in the middle is life or death. It’s not. We’re just representing a wide range with a tiny number of categories. It’s worth examining whether and why we need to categorize continuous variables before we do it. There are good reasons (to get evenly sized buckets of points for comparing means, to draw meaningful visualizations, et cetera), but it shouldn’t be the default.
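To make the point concrete, here’s a minimal sketch in plain Python (the BMI values and the `quantile_buckets` helper are mine, for illustration) contrasting a hard two-category cutoff with evenly sized buckets:

```python
def categorize(bmi, cutoff=24.9):
    """The two-category scheme discussed above: 'normal weight' vs 'overweight'."""
    return "normal weight" if bmi <= cutoff else "overweight"

def quantile_buckets(values, n_buckets):
    """Evenly sized buckets: sort the values, then slice into n roughly equal groups."""
    ordered = sorted(values)
    size, rem = divmod(len(ordered), n_buckets)
    buckets, start = [], 0
    for i in range(n_buckets):
        end = start + size + (1 if i < rem else 0)  # spread any remainder
        buckets.append(ordered[start:end])
        start = end
    return buckets

bmis = [17.2, 21.5, 24.8, 25.1, 29.0, 33.4]
print(categorize(24.8))  # normal weight
print(categorize(25.1))  # overweight -- a 0.3 difference flips the label
print([len(b) for b in quantile_buckets(bmis, 3)])  # [2, 2, 2]
```

Note that the quantile approach still discards information (24.8 and 25.1 may land in different buckets here too), but at least the boundaries come from the data rather than from an arbitrary cutoff.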
Then there’s mistaking correlation for causation: the fact that two variables track together does not mean one causes the other. The other could cause the one, or both could be caused by some third thing. Then there’s overfitting: fitting a linear model with more features than data points so that it matches the training data perfectly and eats concrete on the test set. LASSO can help us with this.
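A quick way to see overfitting in action: fit a curve with as many parameters as training points and it will reproduce the training data exactly while failing away from it. The sketch below (plain Python, made-up data) uses Lagrange interpolation to fit five noisy points drawn from a roughly linear trend, then evaluates at a held-out point:

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Five noisy observations of a roughly linear trend (truth is about y = x).
xs = [0, 1, 2, 3, 4]
ys = [0.0, 1.2, 1.8, 3.1, 3.9]

# Training error is exactly zero: the polynomial passes through every point...
print(lagrange_predict(xs, ys, 3))  # 3.1
# ...but the held-out point x = 5 (true value near 5) is badly missed.
print(lagrange_predict(xs, ys, 5))  # far from 5
```

This sketch only shows the disease; the remedy the book points at is penalized regression. An L1-penalized fit such as scikit-learn’s `sklearn.linear_model.Lasso` shrinks unneeded coefficients toward zero, trading a little training error for much better behavior on unseen data.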
This one gets only a small section in the book, but I’m mentioning it because Reinhart’s explanation impressed me. In my experience, Simpson’s Paradox typically gets taught something like this: “You see a trend in one direction in the subgroups, but when you aggregate them all together the trend goes the other way. It’s a weird paradox!” Eh, not really. Reinhart articulates it well: Simpson’s Paradox usually comes from a confounding variable that differs across the data groups. (Another fun lesson from the book: introductory lessons on Simpson’s Paradox often cite a lawsuit purportedly filed against UC Berkeley over admissions discrimination. That lawsuit never happened.)
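To see how a confounder produces the reversal, here’s a toy calculation in plain Python (the counts are hypothetical, patterned after the classic kidney-stone example): treatment A wins within each severity subgroup, yet B wins in the aggregate, because severity influences which treatment gets used.

```python
# (successes, total) for each treatment within each severity subgroup.
# Severity is the confounder: the harder cases mostly get treatment A.
groups = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Within each subgroup, A has the higher success rate...
for name, g in groups.items():
    print(name, {t: round(rate(*g[t]), 3) for t in ("A", "B")})

# ...but aggregating the counts makes B look better overall.
totals = {t: tuple(sum(g[t][k] for g in groups.values()) for k in (0, 1))
          for t in ("A", "B")}
print("combined", {t: round(rate(*totals[t]), 3) for t in ("A", "B")})
```

The paradox dissolves once you notice that treatment A handled a much larger share of the difficult cases; the confounder, not the treatment, drives the aggregate numbers.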
Academics have a saying: ‘Publish or perish.’ Statistical rigor is important, but what’s an academic to do when a rigorous study stands a poorer chance of getting published? Journals want to see interesting results and big differences, so there’s little incentive for academics to interrogate their findings for truth inflation.
There’s also a disincentive for academics to publish their data: someone else might find an insight in it and beat them to publication on their own data. It’s also rare for editors to hold researchers accountable for disclosing differences between their original experiment plan and what they ended up doing, or for explaining why they chose one analysis method over another and what the results might have been had an alternative been used.
There’s room here for improvement, but change is slow. Luckily, online journals have managed to shed one issue that faced the paper journal world: page space limits. Without a limit on paper length, researchers can publish their negative results and full study explanations rather than cutting them to squeeze under the limit.
For now, unfortunately, online journals get the side-eye in the academic community as a ‘safety’ for papers that could not get published in print journals. Reinhart mentions Randy Schekman, a Nobel Laureate cell biologist who has published exclusively in open-access online journals for the past five years. Reinhart suggests that researchers who are not “shielded by [their] Nobel prize” couldn’t pull this off and still retain credibility, but that perhaps Schekman’s move will help shift the industry (sidenote: what Schekman is doing here is an instructive example of how to use privilege. Excellent work).
Black Star Opportunities
In these notes I put a black star wherever I identified an opportunity to introduce rigor into the way I design research papers, should I ever end up doing that. Namely, there is relatively little work on appropriate study designs for behavioral studies compared with clinical ones, so there is value in doing work in that area. Additionally, as mentioned before, papers rarely explain why a given study design or analytic method was chosen and how it compares to the alternatives. I’d like to believe I would try to do this, if I ended up publishing serious research.
I took a few notes on some of the additional sources and organizations that Reinhart recommended—these are demarcated in the notes above by a pair of eyeballs.
These include a few sources about how to write and handle code in research projects. As a software engineer, I found that these sources offered a valuable shift in perspective toward the researcher’s mindset. For example, the lists of desirable programming practices for researchers recorded in these notes include things like “write code for people,” “create reusable pieces of code,” and “use version control.” These things come as second nature to folks who write code for a living; without additional context, developers would scoff at these lists.
That having been said, a developer writes code as their job. An academic researcher is expected to be able to write code, write coherent papers, conduct analysis, design studies, gather data, fundraise (to get the money to gather and analyze the data), teach students, grade things, coach teaching assistants, review other researchers’ work, and on and on. Yeah, some of their code is thrown together. And given the laundry list of responsibilities they have, I understand why.
I still haven’t gotten around to the 560-page book on statistical power analysis. It’ll happen.
Again, I’ve glossed over parts of the notes and alighted, in my summary, on items that were particularly shiny to me. I found the examination of arbitrarily categorizing continuous variables helpful in the context of a project I was working on where I wanted to divide examples into equal-sized buckets based on a continuous variable. I also found Reinhart’s explanation of Simpson’s Paradox pithy and memorable.
Rather than simply chastise researchers for lack of rigor, Reinhart also examines the context in which that rigor disappears. Of particular concern: journal policies that disincentivize scientific rigor. There is value in judging the quality of someone’s work by its contents (and its omissions) rather than by the name of the journal that published it. Unfortunately, academia has a tough habit of using journal names as a proxy for researcher credibility. It’s a growth area for our scientific fields.
Finally, the book highlighted some shortcomings of existing research that I’ll want to remember in the event that I end up conducting or publishing research. My hope, in such a circumstance, would be to contribute to the field something that it currently needs more of.
Reinhart also listed many additional sources and organizations to check out, and I took a look at some of them. They gave me additional perspective on the role of the researcher, and they demonstrated that a programmer’s eye can be very useful in a research field.