This post is part of a blog series called Leveling Up: A Guide for Programmers. The series covers skills you can use to learn faster, more easily, and more strategically as a programmer.
In a recent post about warmup reading, I mentioned that I distinguish warmup reading from active reading, which is much less formulaic and involves a lot more decision-making. I promised a deep dive on active reading later. This is that deep dive.
Active reading is what happens when I’m looking to execute on a project. It starts, perhaps, as the upside-down version of warmup reading.
That is, when I do warmup reading, I start with a specific source of information. I go through that source, section by section, and take notes. I finish by writing down assumptions I found, questions about the material, and gaps in the field.
For active reading, I start with the assumptions, questions, and gaps. Then, much as I do with academic reading, I go fishing for the information I need to confirm the assumptions, answer the questions, and fill the gaps.
Step 1: Identify some research questions.
Let’s walk through my active reading steps with an example project: my recent three-part series about the history of HTTP API protocols. It helped that I have designed several APIs myself, but I also wanted to more deeply research the subject matter before I explained it to others.
My experience in API design has exposed me to a number of assumptions about APIs that pervade the programming populace. I wanted to confirm or debunk (probably debunk, was my guess) those assumptions. So, to begin outlining the piece, I chose some starting assumptions and drew some research questions out of them.
Assumption: SOAP must be the oldest HTTP API protocol, since it is the oldest one I have heard about.
Suspicion: There is a 30-year gap between the birth of the internet and the arrival of SOAP. Something else must have existed in that gap.
Questions: How did apps talk to each other over the web before SOAP? What did SOAP inherit from these predecessors, and what did it leave behind? Are any of those predecessors still used? If so, where?
Assumption: REST is a cure-all HTTP protocol, and we should write all APIs in a RESTful way.
Suspicion: That is not consistent with my experience.
Questions: Have others found SOAP (or something else besides REST) to better fit certain use cases? What are the patterns here?
Assumption: GraphQL is a new paradigm, and we should write all APIs in GraphQL.
Suspicion: That is not consistent with my experience. GraphQL looks like SOAP to me, with associated strengths and limitations.
Questions: What differences have others identified between GraphQL and SOAP?
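To make that suspicion concrete, here is a hypothetical sketch of the same remote call expressed both ways. The `getUser` operation, the field names, and the namespace are all invented for illustration:

```python
# Hypothetical "fetch a user by id" call, expressed two ways.
# The operation name, fields, and namespace are invented for illustration.

soap_request = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUser xmlns="http://example.com/users">
      <id>42</id>
    </getUser>
  </soap:Body>
</soap:Envelope>"""

graphql_request = """{
  user(id: 42) {
    name
    email
  }
}"""

# Both describe an operation and its arguments in the request body, and
# both let a server expose many operations behind a single endpoint --
# the structural similarity behind "GraphQL looks like SOAP to me."
print(soap_request)
print(graphql_request)
```

The resemblance is in the shape of the interaction, not the syntax: each protocol names an operation and nests its arguments in the payload, rather than encoding them in the URL the way a RESTful API would.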
Step 2: Go looking for answers.
This is where some Googling skill comes in handy.
For the first question about predecessors to SOAP, I started with very broad searches like these:
"history of API protocols"
"APIs before SOAP"
"HTTP protocol history"
As we talk about Google searches, we’re also going to talk about two types of sources: primary sources and secondary sources. Let’s go over those terms now:
primary source – a source that directly exemplifies the subject of research. For programming, this might be the actual code written to solve a problem. For computer history, this might be the actual communiques between computer labs. For academic research, this is the actual paper describing an approach, or an interview with the academics who did it.
I might use a Google search including the words example, study, or interview if I am looking for primary sources.
secondary source – a source that talks about, summarizes, explains, or opines on a primary source or a collection of primary sources. These sources are useful for drawing connections across disciplines, applying theoretical solutions to practical problems, and making information more accessible. My blog series on APIs is a secondary source. It draws from several primary and secondary sources.
I might use a Google search including the words explained, survey, or history if I am looking for secondary sources.
As I am starting my research on a topic, I tend to look for secondary sources. Those sources help me understand long-running timelines like this HTTP API timeline. They can also help me see how major concepts fit together, so I can picture my location in a mind map of my field of study as I’m digging up more details later.
My initial searches led me to some historical overviews—secondary sources—which introduced me to SOAP’s predecessor: CORBA. From here, I could develop more specific questions to ask the search engine:
"CORBA example call"
"CORBA at IBM"
"Greg Turnquist CORBA"
At this point, I wanted more primary sources: examples of CORBA requests and statements from individuals working at companies that still use CORBA. Now that I had a general lay of the land, I wanted to gather firsthand information about the experience of using CORBA and the technical implementation of the protocol. Firsthand information on technical procedures can also come from academic papers. In this case, I didn’t find much academic literature on CORBA, but REST famously came from a graduate school dissertation, so I did a little poking around in the literature for the API piece.
As I worked my way through the bloggerature about CORBA, I found lots of writers and commenters bashing CORBA and calling it names—which was expected. When lots of people echo the exact same perspective like this, that smells funny to me. I can tell that part of the truth has been buried here. Time to start digging.
Step 3: Look for the perspectives that have been pushed underground.
I specifically went looking for folks who defended the utility of CORBA and I read what they had to say.
An important part of active reading (or any kind of information gathering) is to ensure that I am separating biases and opinions from fact. Information sources are not, in and of themselves, the truth. Instead, they are models of the truth. And as George E. P. Box is often quoted as saying: “all models are wrong, but some are useful.”
There is bias and opinion laced into which information a source presents and which information it leaves out. It’s incumbent on me as a researcher to go find the information that got left out.
Case in point: to read most of the literature about computer science, you would think women just weren’t a part of the field. That’s false. We’ve always been here, and we’ve done things that other people got the credit for in the literature. But I have to go looking for information about where technology came from, because it’s often not the person that Wikipedia says it came from. (This is why, in my writing, you won’t see “so-and-so coined” or “so-and-so invented” or “so-and-so discovered.” Instead, you’ll see “credited with coining/inventing/discovering” unless I have really, really looked into it).
I saw a clear convergence of perspectives about CORBA: lots of bashing. Programmers love to bash tech, especially old tech. I wanted to find the opinions that don’t get as much coverage, or as loud a voice. I want a complete, balanced view of my research subject.
For this, I used searches like:
"advantages of CORBA"
"in defense of CORBA"
"thank you CORBA"
These searches helped me build a better picture of CORBA’s strengths.
Sometimes, though, reading alone doesn’t produce satisfactory answers to my questions. In those cases, I go a step further.
Step 4: Verify information with experiments.
Once I have researched all the perspectives on a technology to my satisfaction, I’ll try to develop my own. So in this case, I wrote a CORBA request…just kidding. I got lazy and didn’t write my own CORBA request. But I did watch this person do it on YouTube, along with a few other CORBA YouTube tutorials. The gold standard would have been to try it myself.
Later in the development of the blog series, I did confirm something else with my own experiments. I was researching the third question—What differences have others identified between GraphQL and SOAP? I confirmed what I knew, that programmers generally implement GraphQL in JSON whereas the original SOAP used XML. Everyone complains that XML is much slower and more complicated than JSON. This one I did, in fact, test myself. I tried out object mappers for XML and JSON in C++, Java, and Ruby. As it turned out, the C++ XML mapper took a little longer than the JSON one. The Java XML mapper took about twice as long as the JSON one, and the Ruby XML mapper took about 8 times as long as the JSON one.
I did this work to find out whether the rumor that XML evaluates more slowly was true. The answer: yes-ish, but the degree to which it’s true depends on the mapper.
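A minimal version of that experiment can be sketched in Python. This is my own illustrative stand-in, not the original test: the series used object mappers in C++, Java, and Ruby, while this sketch uses only the Python standard library, and the sample record is invented.

```python
# Rough sketch of the XML-vs-JSON mapping comparison, using only the
# Python standard library. (The original experiment used C++, Java, and
# Ruby object mappers; this is an illustrative stand-in, not a rigorous
# benchmark.)
import json
import timeit
import xml.etree.ElementTree as ET

# The same record, serialized both ways.
json_doc = '{"user": {"id": 42, "name": "Ada", "email": "ada@example.com"}}'
xml_doc = ('<user><id>42</id><name>Ada</name>'
           '<email>ada@example.com</email></user>')

# Time repeated parsing of each document into an in-memory structure.
json_time = timeit.timeit(lambda: json.loads(json_doc), number=10_000)
xml_time = timeit.timeit(lambda: ET.fromstring(xml_doc), number=10_000)

print(f"JSON parse: {json_time:.4f}s  XML parse: {xml_time:.4f}s")
print(f"XML/JSON ratio: {xml_time / json_time:.1f}x")
```

As with the mappers in the series, the ratio you see depends heavily on the parser, the document size, and the runtime, which is exactly the "yes-ish" in the answer above.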
Step 5: Rinse and repeat.
Often, as I perform this process on one research question, I’ll dig up other questions that I want to answer. This happened in the case we just discussed, as What differences have others identified between GraphQL and SOAP? became Who cares whether a protocol uses XML or JSON? and finally Does XML mapping in fact take longer than JSON mapping?
For each additional question, I find myself doing some part of my active reading cycle to answer it. As I read, searched, and tested my way through my questions, my draft outline for the API series filled up with quotes, links to sources, and example code. When I finished my first pass at research, I had enough information to begin framing and connecting the information I had found.
As I wrote, I did additional mini-cycles of active reading to address questions that came up along the way. The process doesn’t run in a straight line from start to finish: instead, it has several short feedback loops. I find myself looping back again and again to get a little more information, confirm this, challenge that. So as I go deeper into a project, my research activities focus on smaller and smaller details.
The Big Picture
For me, active reading feels like a misnomer for the process I have described. This process certainly can focus on reading. In practice, though, my information gathering strategy stretches past reading materials alone. It can include all kinds of media and experiments of my own as well. So I might more accurately describe the process I use as ‘active research.’
I use this process for a variety of projects: blog series, like the example we have discussed, but also data analysis projects (like the ESG stock project) and learning projects, like my current quest to compare various pre-processing and modeling techniques for text analysis in machine learning.
Instead of going into my active research process with sources in mind, I start with assumptions, questions, and gaps in the topic I want to research. From there, I use judicious Google searches to unearth secondary sources that give me an overview. Then I may dive into primary sources to get a firmer understanding. As I do, I start to get a view of the prevailing perspectives on the topic at hand, and I specifically go in search of perspectives besides those prevailing ones: I want to include minority voices in the information I’m considering. Occasionally, primary sources don’t bring me close enough to the material for my satisfaction. In those cases, I test out procedures myself. Maybe that’s writing my own API call of the type I want to describe and evaluate.
As I go through this process, I’ll often end up with new questions that I did not have at the beginning. So I answer those as well in a series of ever-tightening research loops. The learning, of course, is never completely done, but I chalk that up to the nature of learning rather than a weakness in my method :).
Overall, I find that my active research strategy keeps me engaged and excited about what I’m learning. I get to decide what I’m going to find out next, and I get to choose how deeply I want to understand it. I get to go digging for facts that the cursory overviews miss. The process feels something like solving a mystery, so I can picture myself as the protagonist from one of the detective novels or spy novels I read in my youth. And as a result, I often come away with a more complete, more concrete grasp on the topic than I would get from warmup reading alone.