Today I’m taking a break from talking about the Raft distributed consensus algorithm, compiler design, or the inclusion rubric. Instead, I want to talk about what pride means to me, a queer/gay/lesbian staff software engineer at Pocket.
I have written in the past about my coming out story, about what I think is the role of heterosexuals at Pride, and about what I think is the role of corporations at Pride. This is not any of those posts. This post is not even about pride month. Instead, it exemplifies my day-to-day experience of queerness and how it affects my work all the time (including, ahem, when it isn’t June).
This piece also touches on the importance of centering the perspectives of your most marginalized constituents when you make product decisions. I’ve written more about that specific topic here.
Without further ado, here’s the piece you came here for.
I never really intended to work at Pocket—or anywhere ever again.
This surprises people. Anytime I mention that I work here, I get one of two responses:
- What is that?
- OH MY GOD I LOVE POCKET!
For those in the first camp: Pocket is an app that allows you to save, organize, and annotate stuff you find on the internet. It tries to parse articles for you and, if you’re so inclined, it can try to read them aloud to you. Even our free version is pretty great. We’re currently owned by Mozilla, who uses our article recommendations on the Firefox new tab.
I’ve been working here for about six months. Before that I freelanced after quitting what I thought would be my last full time engineering job. I had plenty of clients, I got to choose challenging projects, and I made my own schedule. I had figured out invoices, health insurance, and even taxes. I couldn’t imagine working full-time for any singular employer again unless the position were truly perfect for me.
That changed the day Amy Coney Barrett got confirmed.
For those not from the USA: the “final say” on legal decisions in the United States belongs to the Supreme Court (SCOTUS), which decides cases by majority vote. Amy Coney Barrett’s addition to that court created a conservative supermajority—clearing a path for explicitly anti-LGBTQ legislation at the federal level. The shift has emboldened state legislatures, which have already introduced more anti-trans bills this year—the count hit eighty in March—than in any prior year. Immediately after ACB’s confirmation, SCOTUS Justices Clarence Thomas and Samuel Alito signaled that they would welcome a case that would let them overturn Obergefell v. Hodges—the 2015 decision that recognized same-sex marriage at the federal level.
Now, the SCOTUS is explicitly supposed to mitigate the capriciousness of policy decisions as party power switches. So overturning a legal decision that is only five years old would defeat the court’s purpose. Current Chief Justice John Roberts recognized this and put the kibosh on the idea, but the highest court in the land has made it clear that they’d love to throw the book at people like me.
They could get their opportunity to do that if they see a case about healthcare. Right now, we have the Affordable Care Act, which makes healthcare—well, not exactly affordable, but at least available, regardless of preexisting conditions. Before we had that, health insurance companies could refuse to insure people based on preexisting conditions. In those cases, usually, an employer could get you covered. But if you worked for yourself, you were literally on your own. I could see a future where a conservative supermajority engineered conditions under which gayness made someone ineligible to purchase health insurance on their own. It wasn’t that long ago that the U.S. government deliberately ignored the AIDS epidemic, and today state legislatures are perfectly happy to deny care to trans kids. So when hetero people ask me, “Don’t you think it’s a bit reactionary to assume that they’d come for LGBT people?” my answer is: uh, no.
So I started looking for an employer who would insure me.
And then I got extremely lucky: I found a position that might have cleared that “perfect for me” bar even before the ACB confirmation. Pocket wanted an engineer to guide and enable their efforts in the machine learning arena. It’s tough to find a role at a place that’s applying machine learning to a problem it’s actually ethical to apply machine learning to, but this one seemed like it might be that.
Believe it or not, Firefox fans, Pocket does not currently personalize your article recommendations on the new tab. A lot of people think that we do! Here’s why: our editorial team understands what people want to read. So, for any six things they stick on that tab (that’s how many fit above the jump on most browser windows), we’re almost certain you’ll want to read at least two. We’re fairly sure that you’ll feel viscerally drawn to at least one. It turns out that human expertise goes a long way, even on problems that we try to hand over to computers.
That said, our editorial team cannot personally recommend pieces for every individual Pocket reader. There are just too many of y’all!
So we’re building tools to personalize article recommendations at scale. For example, at the moment, when you save an article to Pocket in your browser, you’ll often get recommendations for three other articles that you might want to read. The plan, of course, is to raise the sophistication of the recommendations over time.
We want to be thoughtful about how we recommend stuff.
Here’s the tricky thing about content recommendations: they can go sideways to catastrophic effect. Let’s look at an example: YouTube built a content recommendation algorithm that optimized for the percentage of the video that people watched and how much they commented. That sounds like it would work, right? Recommend stuff people want to watch and talk about! Except that the videos that get watched all the way through trend shorter, and the videos that generate the most commentary trend edgier. Before you know it—bam. You’ve got a political radicalization algorithm on your hands. The company has deployed a few technical workarounds to try to mitigate this effect, but the nature of the optimizing metric limits what can be done. Facebook ran into similar issues with divisive content recommendation strategies, but they chose not to fix it because, after all, engagement is their profit model.
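To make the metric pitfall concrete, here’s a toy sketch of why ranking by “percentage of the video watched” structurally favors the shortest, punchiest content. All titles and numbers are invented for illustration; this is not YouTube’s actual code or data.

```python
# Toy sketch: ranking by completion rate rewards brevity, not quality.
# All titles and numbers are invented for this example.
videos = [
    {"title": "3-hour lecture", "length_s": 10_800, "avg_watch_s": 3_240},
    {"title": "10-min explainer", "length_s": 600, "avg_watch_s": 420},
    {"title": "30-sec outrage clip", "length_s": 30, "avg_watch_s": 29},
]

def completion_rate(video: dict) -> float:
    """The naive optimizing metric: fraction of the video watched."""
    return video["avg_watch_s"] / video["length_s"]

# The outrage clip ranks first—not because it's good, but because it's short.
for video in sorted(videos, key=completion_rate, reverse=True):
    print(f'{video["title"]}: {completion_rate(video):.0%}')
```

Weighting by total watch time or surveyed satisfaction can mitigate this, but as noted above, the choice of optimizing metric constrains everything built on top of it.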
We hope, with Pocket recommendations, to help people find reading material so they can learn things and stay informed, without burning them out or radicalizing them.
We’d also like to avoid harvesting their personal data.
Data privacy in machine learning matters.
As I mentioned before, we’re owned by Mozilla—a company that has built its brand on giving its constituents some privacy on the internet. That privacy has become scarce as advertising has moved online. This excellent Twitter thread delineates the extent of it better than I could, but the upshot is this: whenever we search for something, click on something, or buy something, or even when our neighbors do it, companies collect that info to determine what ads to show us. There’s a conversation to be had around the role of ads on the web because the truth is, ads are the reason the web is free. I don’t want the web to become a luxury good. That doesn’t change the fact that most consumers don’t even realize just how much information their browsers harvest from them.
And that’s even before we’re logged in anywhere. Once we make an account, we’re often more willing to give that site some personal information because, after all, it’s in a private account, right? Except that it would be tough for me to exaggerate how cavalier companies can be with that data. I mean—Equifax, the company that has all our social security numbers, could not be arsed to update its Struts dependency for months after a patched version fixed a known security flaw. The next thing you know, 140 million people have to lock down their credit histories to avoid identity theft. In my professional opinion, the way to safeguard user data is to prepare for when the database gets hacked—not “if.” I look at Signal as a fantastic example of this: they encrypt your messages and store no plaintext. So even if they get subpoenaed for your message data, they have scarcely anything useful to turn over.
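Signal’s actual protocol is far more involved, but the underlying “breach-ready” habit can be sketched with a standard-library example: store a salted, deliberately slow hash of a secret rather than the secret itself, so a dumped database contains nothing directly usable. The function names and parameters below are my own illustration, not anyone’s production code.

```python
import hashlib
import hmac
import os

# Sketch of breach-ready storage: persist only a salted, slow hash.
# The iteration count (200,000) is illustrative, not a recommendation.
def store(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest  # persist these; the password itself is never stored

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, digest)

salt, digest = store("correct horse battery staple")
print(verify("correct horse battery staple", salt, digest))  # True
print(verify("guess", salt, digest))  # False
```

If this database leaks, an attacker gets salts and digests, not credentials—the “when, not if” posture in miniature.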
And security isn’t individual—it’s collective. Your data is only as secure as your most trusting friend, or even a stranger who captures a video or image of you and posts it to a public Instagram. Take a repeating event like a fitness class: if someone in that class has a stalker, a single post could reveal exactly what time their target is at exactly that gym’s address. Is Instagram checking for consent on videos with people in them? I doubt that’s even on the roadmap. But it became an issue last year when folks showed up in droves to racial justice protests and took selfies in front of the crowds without blurring faces. People’s pandemic masks saved them from getting identified and targeted based on Instagram posts by people they don’t even know. Our data security matters, and we have no control over it.
What does any of this have to do with Pocket—or with gays?
Well, this is the part where I need to clarify that I am now speaking for myself—an engineer who works at Pocket—and not on behalf of Pocket itself. But here’s how I see it.
We could easily ask you to let us grab your save history, or maybe even your search history, and make lots of recommendations based on that. And it’s not medical data or whatever—it’s just articles you saved—so that’s not so private, right?
Well, of all the links I’ve put in this piece, I think the thing that will end up making my point the best is the piece of data that I don’t have.
I have no idea how many closeted queer and trans people asked their first question about queer resources to a librarian.
And that’s not a bug; it’s a feature of the job. I’m not supposed to know the answer to that question, for the sake of those people’s privacy.
Here’s what I do know: most librarians and bookstore attendants with a couple of years’ experience have at least one story about an embarrassed kid walking in the door and asking, under their breath, about queer literature.
When you’re a kid growing up in a homophobic town, with hostile friends who compare gayness to pedophilia and view “lesbianism” as predatory, and you go to a school whose principal will refer to queerness as a “compulsion,” and you cannot risk finding out that your parents’ love is not unconditional, there’s no one in your circle that you can turn to for help with understanding yourself or coming out. It’s dangerous.
So you can only go to someone who will keep your secret safe, and that’s probably not friends or family. But a librarian can point you to all the right things without asking a bunch of invasive questions or leaking your request to anyone—indeed, without even knowing your name.
For queer folks in the past twenty years, the internet has provided an unprecedented level of community. And once upon a time, it seemed to offer the safety of anonymity, too. It can’t offer that now. As it turns out, it never really could. There’s too much data. Jenny Zhang points out an especially terrifying example:
> Imagine…that you’re a queer kid living in a small town in 1999, and you sign up for Livejournal and use it to find a supportive and loving queer community online. Then in 2007 Livejournal gets sold to a company based in Russia, which in 2013 criminalizes the distribution of pro-LGBTQ content to minors, and in 2017 Livejournal loses the account info of 26 million users. Was it your responsibility to monitor the shifting geopolitical context of your childhood diary for the next two decades?
>
> — Jenny Zhang, *left alone, together*, 3 May 2021
People turn to the web as an initial source of information, much in the way that folks have turned to libraries for decades. People show up to the library—and the web—sometimes clueless, and sometimes insecure about what they don’t know. The search engine plays the role of a card catalog—well, except that it also keeps a running record of everything they looked up. In online communities, folks can get personalized recommendations from others, but those recommendations could be misinformed or they could have an agenda. The requests are also a matter of community record and become vectors for getting outed, harassed, or doxxed.
So what would the internet’s librarian look like?
Well, an automated component would help with scaling. But the data strategy du jour doesn’t fit the spirit of the librarian. Instead, that strategy is to collect as much data, and as rich data, as humanly possible—far, far more than we can secure, far more than we can even use—on the off chance that something in there ekes out another percentage point of clicks.
As far as I’m concerned, we’ve seen what click percentage can do to an algorithm. We’ve also seen what that algorithm can then do to the people it reaches. We tried this, folks. It didn’t f**king work.
Instead, I like to turn this strategy on its head by asking this question:
How much can we help a constituent with as little information as possible from them?
Here is an example of how I implement that question in an actual software project—in this case a data import. I think it informs the article recommendation problem, too.
To make a good recommendation, a librarian needs very little information about what an individual is looking for. The thing they need to know is how to evaluate the available materials and choose a small number of them that clear the individual’s satisficing metric. That’s easier said than done because a librarian cannot read everything themselves. So they rely on summary statistics, reviews from other librarians, and even the opinions of past patrons to evaluate the inventory. They can speak to each book’s topic, its reading level, and the difficulty of the material inside. They can even calibrate their recommendations to offer a reader multiple different perspectives on an issue. If a recommender could capture that, it might be able to provide accurate, bespoke recommendations based on a single search or a single article save.
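Here’s a deliberately tiny sketch of that idea: recommending from a catalog using only the tags of the single article a reader just saved—no account, no profile, no history. The catalog, the tags, and the choice of Jaccard similarity are all my own invented illustration, not Pocket’s implementation.

```python
# Minimal-data recommendation sketch: the only input is one saved
# article's tags. Catalog entries and tags are invented for illustration.
catalog = {
    "Intro to Queer History": {"queer", "history", "community"},
    "Baking Sourdough": {"baking", "bread", "food"},
    "Coming Out Resources": {"queer", "coming-out", "community", "support"},
    "Bread Chemistry": {"bread", "chemistry", "food"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def recommend(saved_tags: set, k: int = 2) -> list:
    """Return the k catalog titles most similar to the one saved article."""
    scored = sorted(catalog.items(),
                    key=lambda item: jaccard(saved_tags, item[1]),
                    reverse=True)
    return [title for title, _ in scored[:k]]

# One save carries enough signal for a useful first recommendation:
print(recommend({"queer", "support"}))
# → ['Coming Out Resources', 'Intro to Queer History']
```

A real system would use richer representations than hand-assigned tags, but the principle holds: the recommender never needs to remember who asked.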
So that’s the “what.” To get to the “why,” I turn to my other favorite question for informing software design:
Under what circumstances could this software save, or ruin, someone’s life?
I grew up gay and closeted, made it to adulthood, and ultimately came out. I know that not every queer kid makes it that far. I don’t have to think hard to imagine a circumstance in which access to the right resources could save someone’s life.
I also don’t have to think hard to imagine a circumstance in which poor execution could ruin someone’s life. First, online radicalization has actual, human casualties. And second, to live with a smartphone as a marginalized person is to have an honorary doctorate in identifying all the bad things that could happen to you because a tech company didn’t think about you. It’s a pretty constant, steady stream, unfortunately.
So when I think about the problem we’re trying to solve, I’m not picturing a specific metric.
I’m picturing an embarrassed kid whispering to the librarian that they’re looking for queer lit.
I’m picturing a young adult who left their homophobic church and wants queer-informed resources to reconnect with their faith.
I’m thinking about the twenty-year careerist searching for beginner-level books on a subject that people expect them to know already.
I’m thinking about the person who got suckered into white nationalism, who is looking for their first resource about anti-racism.
I’m thinking about the way that society uses shame to discourage people from seeking knowledge.
I’m thinking about the immense courage that it can take for people to seek that knowledge anyway.
I’m thinking about how much it means to those people to know that their search is safe with the librarian.
And I’m thinking about how to give them that safety online.
To me, “pride,” on a day-to-day basis, is about thoughtfulness.
I work hard to stay abreast of inclusion issues. I try to create spaces within this high-leverage field that consider folks on the margins. I consider that a pillar of engineering skill rather than a separate skill competing with engineering for my time. And the longer I do this, the more I realize how some of the tenured “greats” have hobbled their own technical growth by refusing to listen to marginalized voices. I take pride in having made that commitment, and I take pride in the engineering successes it has afforded me.
Perhaps that’s still the little gay girl inside me, who busted her ass to be smart and well-read and athletic and accomplished and attractive and, and, and, and, and, desperately hoping to make herself worthy of acceptance and love in spite of her repulsive sexuality. Just to be completely clear, I was a child when I started doing this.
Kids, children—they deserve a better world than that, you know?
I’m proud of each opportunity I have taken to build that world for them. And I feel an immense sense of responsibility to keep building it, one algorithm at a time.
If you liked this piece, you might also like:
The listening series—I don’t often get very personal on this blog, but I did a little bit in the piece you just read, and I do also in the listening series. My hope is that I’m choosing to do it in places that allow me to save you a gray hair or two 🙂
The series about writing an interpreter for a programming language—this is the obligatory “you just read a sappy piece, but don’t get it twisted, I’m actually very technical” portion of the program
Improving Your API Design Skills with Open Source Examples—this piece is just sort of fun 🙂 Examples in an object-oriented Python library, but the concepts apply to any programming language or paradigm.