Everybody Lies has some takeaways for you to use in your own data science initiatives. Please consider the example applications critically, though: the statistical rigor of Stephens-Davidowitz’s studies ranges from questionable to unequivocally poor (I’ll share some examples).
I have separated my discussion of the book into two parts:
- Part I: Distillation of Useful Information for Data Scientists
- Part II: Dangerous Methodology and Why It Matters (this post)
Part II: Dangerous Methodology in this Book
As a data scientist, I struggled throughout the book to humor the author as he continued to discuss studies based on fundamentally flawed assumptions and a remarkable dearth of statistical rigor. In fact, the author’s study designs fall victim to several of the exact limitations that he explicitly mentions at the end of the book (summarized in Part 1 of this review). It sends a confusing message. How did this book end up explicitly discussing statistical rigor in one section and then completely ignoring it in the example studies?
For example, in one study, Stephens-Davidowitz sets out to find what percentage of the population is gay. It’s evident that zero queer scholars, and possibly zero gay people at all, were consulted about the design of this study.
To begin with, sexuality is a multidimensional continuous variable. To pretend it is a binary, categorical variable fundamentally misrepresents the data. If you’re misrepresenting how your data works, your results won’t just be false: they won’t mean anything.
Then there’s the second issue: bad proxies. Stephens-Davidowitz uses porn searches as a proxy for gayness. I don’t know who told this guy that a person’s porn preferences line up with their sexuality closely enough to proxy one with the other, but it’s patently false. Other metrics used: acceptance of gayness, by state, as indicated on surveys. We know that what people say on surveys is different from what they think; in fact, that gap is part of the base premise of why we should read this book. The author then estimates the gay population from Facebook profiles, also a poor proxy, for a million reasons, but to start with it skews toward young people who are public about their personal lives.
From the poor proxies, we go straight to statistically unsound extrapolations. Stephens-Davidowitz finds a state-by-state relationship between acceptance of gayness, as indicated on surveys, and outness on Facebook profiles. He then scales this relationship up to a theoretical 100% acceptance to figure out how many people are ‘really gay.’ Does he at least acknowledge the massive error bars that belong around an extrapolation like this? No, he doesn’t. But he notes that the number he comes up with looks congruent with his porn-search estimate. This is pure coincidence, and it sounds like the author knows it. For some reason, he draws the connection anyway.
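To see why those error bars matter, here is a toy sketch of the problem. The numbers are synthetic (not the book’s data), and the linear model is deliberately simplified; the point is only that the uncertainty of an ordinary least-squares prediction balloons once you leave the observed range of the predictor.

```python
import numpy as np

# Toy illustration (synthetic numbers, not the book's data): regress
# Facebook "outness" on survey-measured acceptance across 50 states,
# then ask about the prediction at 100% acceptance, well outside
# the observed 30-70% range.
rng = np.random.default_rng(0)
acceptance = rng.uniform(0.3, 0.7, size=50)   # observed range: 30-70%
outness = 0.04 * acceptance + rng.normal(0.0, 0.005, size=50)

# Ordinary least-squares fit with an intercept
X = np.column_stack([np.ones_like(acceptance), acceptance])
beta, *_ = np.linalg.lstsq(X, outness, rcond=None)
n, p = X.shape
sigma2 = np.sum((outness - X @ beta) ** 2) / (n - p)
cov = sigma2 * np.linalg.inv(X.T @ X)

def prediction_se(x):
    """Standard error of the fitted mean at acceptance level x."""
    v = np.array([1.0, x])
    return float(np.sqrt(v @ cov @ v))

# The standard error at 100% acceptance is far larger than anywhere
# inside the observed range: those are the missing error bars.
print(prediction_se(0.5), prediction_se(1.0))
```

Because prediction variance grows with the squared distance from the mean of the observed predictor, extrapolating to 100% acceptance inflates the uncertainty no matter how clean the in-sample fit looks.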
Then, the author refers to the difference between his massively extrapolated gay number and self-reported homosexuality as ‘lying.’ As if everyone knows one way or the other, and as if there is no coming-out process, and as if sexuality isn’t fluid. But all those things are false, too.
This is exactly why diversity is important in science and technology. In this instance, the researcher’s complete failure to understand the subject matter made the entire study both laughably useless and not at all funny. Just because there are some numbers floating around doesn’t make a study valid. This is also exactly why numbers conjured by data science ‘generalists,’ with no input from subject matter experts, are patently problematic.
We’re talking about just one subject here, but the sheer number of wrong assumptions in this case makes me question the assumptions from which all of this researcher’s other work starts. This sort of perseverance in spite of ignorance is how we accidentally get facial recognition systems that don’t see black people, market segmentation algorithms that allow customers to target ‘jew haters,’ and computer programs that direct police to flood a city’s most vulnerable neighborhoods.
This wildly inaccurate study isn’t funny, because using results like these for any serious practical application is dangerous.
And lest you think this is a gay rant…
Let’s talk about another example of sloppy data science in this book: the world leaders assassination experiment.
So in this case, the author declares that new leaders dramatically change the course of countries. This research, he claims, undermines decades of institutional knowledge, chiefly the idea that leaders are largely impotent figureheads affected by their surroundings.
To support this, he discusses a study with two natural experimental groups: countries whose leaders were assassinated (causing a leadership change) and countries where an assassination attempt failed and no leadership change followed.
So here we have three variables: assassination attempt, assassination success, and leadership change. They are not tested separately. The study does not discuss (or the book glosses over) leadership change without an assassination attempt, or the continuation of a dynasty despite a successful assassination. There is no mention of cases where an assassination attempt took place and the leader survived but stepped down soon after. No discussion of the sudden death of a leader, followed by a leadership change, without any assassination attempt. It’s a wad of variables tested together.
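The tangle can be made concrete. Treating each of the three factors as a binary (my own framing, not the study’s notation) gives eight possible cells, and the comparison as the book describes it uses only two of them:

```python
from itertools import product

# Sketch (my own framing, not the study's notation): three binary
# variables the study never separates, giving eight possible cells.
# Each cell is (attempt_made, attempt_succeeded, leadership_changed).
cells = set(product([False, True], repeat=3))

# As described in the book, the comparison covers only two cells:
studied = {
    (True, True, True),    # successful assassination, leadership changed
    (True, False, False),  # failed attempt, no leadership change
}
unstudied = cells - studied
print(f"{len(unstudied)} of {len(cells)} cells go unexamined")
```

Every cell left out, such as a successful assassination followed by dynastic continuity, or a leadership change with no attempt at all, is exactly the kind of case you would need to separate the three variables.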
Therefore, we cannot know from this study that leaders aren’t reacting chiefly to their surroundings. What if that is indeed the case, and an assassination merely changes the national dialogue to which the leader must respond? Wars and terrorist attacks change national dialogues without a change in leadership. Traumatic events change the way people think about things. Events that traumatize a lot of people may result in big collective changes. If a new leader reacts to those, is the leader driving the dialogue, or is the dialogue driving the leader?
To maintain rigor in natural experiments, we need to isolate one variable between the two most similar groups we can find. Oddly, Stephens-Davidowitz explicitly mentions this at one point in the book, but then fails to heed it in the research he cites. Again.
I’ll leave it at these two examples.
The data science cited in the book concerns me greatly. The future of data science demands far more statistical rigor than this, first so that it does not harm society, let alone so that it can help it.
I can only hope that, in an effort to appeal to a lay audience, the author of this book left out or skewed details about how these studies were conducted, and that their actual implementations were far more rigorous than the book suggests.