Tonight at OpenGov Hack Night at 1871, Dan Platt and Craig Booth of Narrative Science came in to give us a tour of ‘Chicago Crime Stories,’ which, as described on the Eventbrite page, is:
an application utilizing Chicago open data to produce narrative on crime in any Chicago neighborhood.
The app takes mountains of data about Chicago crime and presents a story based on a specific neighborhood, which you can specify on the homepage.
The story looks something like this:
And Craig provided a few flowcharts that explain how the whole thing works under the hood:
The app relies on three data sources in addition to map API data:
- Community boundaries for each ward in Chicago
- Census information about average incomes, changes in demographics, et cetera
- The CSV of crime data
Also, some data processing happens to generate the report:
It’s a lot of data processing, really. We want to know where things happened and then, using information about those places, identify how the aggregate of what happened in one place differs from other places nearby (in this case, within Chicago).
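To make the "aggregate by place, then compare" idea concrete, here is a minimal sketch in Python. It is not Narrative Science's pipeline: the rows, the column names (`neighborhood`, `type`), and both helper functions are made up for illustration, and the real crime CSV has far more columns.

```python
from collections import Counter
from statistics import mean

# Made-up rows standing in for the crime CSV (column names are assumptions).
crimes = [
    {"neighborhood": "Loop", "type": "THEFT"},
    {"neighborhood": "Loop", "type": "THEFT"},
    {"neighborhood": "Loop", "type": "BATTERY"},
    {"neighborhood": "Loop", "type": "ROBBERY"},
    {"neighborhood": "Uptown", "type": "THEFT"},
    {"neighborhood": "Uptown", "type": "BATTERY"},
    {"neighborhood": "Pilsen", "type": "THEFT"},
    {"neighborhood": "Pilsen", "type": "ROBBERY"},
]


def counts_by_place(rows):
    """Aggregate: count how many incidents were recorded in each place."""
    return Counter(row["neighborhood"] for row in rows)


def compare_to_others(counts, place):
    """Compare one place's count with the average count of all other places."""
    others = [n for name, n in counts.items() if name != place]
    baseline = mean(others)
    return {
        "place": place,
        "count": counts[place],
        "baseline": baseline,
        "ratio": counts[place] / baseline,
    }


counts = counts_by_place(crimes)
summary = compare_to_others(counts, "Loop")
print(summary)
```

In this toy data the Loop has twice as many incidents as the average of the other two neighborhoods. A real comparison would presumably also adjust for things like population, which is where the census data would come in.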
Next step? Editorial configuration. This is where we determine what sort of information we’re looking for by asking a few questions:
Those questions circumscribe the blob of data that we need from the data processing suite. That blob gets sent to Quill, the AI that Narrative Science developed for natural language generation. That’s how a massive CSV can become a readable report without a journalist’s direct intervention. (Want to test drive it? Try QuillConnect, which takes your Twitter profile and generates a report about how you behave on Twitter. It’s not out yet; I’ll link to it when it is.)
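For a feel of what "blob in, sentence out" means, here is a toy template-based sketch in Python. To be clear, this is not how Quill works (its internals weren't covered in the talk); the blob's keys, the `describe` function, and the 1.25 and 0.75 thresholds are all assumptions made up for this example.

```python
def describe(blob):
    """Turn a small dict of numbers into one readable sentence.

    The thresholds below are arbitrary choices for illustration.
    """
    ratio = blob["count"] / blob["baseline"]
    if ratio >= 1.25:
        comparison = "well above"
    elif ratio <= 0.75:
        comparison = "well below"
    else:
        comparison = "roughly in line with"
    return (
        f"{blob['place']} recorded {blob['count']} incidents, "
        f"{comparison} the average of {blob['baseline']:g} "
        "in the other neighborhoods."
    )


# Hypothetical blob, shaped the way a data processing step might emit it.
blob = {"place": "Loop", "count": 4, "baseline": 2}
sentence = describe(blob)
print(sentence)
```

The point of the sketch is only the shape of the step: the editorial questions decide which numbers go into the blob, and the language layer decides how to phrase them.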
So, what’s different about this app from other data processing apps? Well, it’s creating a story—something that you can read, rather than interpret from, say, a picture.
That’s not to say that infographics don’t have their place. They do. In fact, I asked Dan and Craig about this after the presentation. Here’s what they said:
From Craig: “On a general level, a visualization potentially tells you many many thousands of things. It’s up to the viewer to interpret what the image is saying [which, in some cases, is what we want]. Stories, by contrast, shine when there’s something you know you want to get out of this data. A story puts it right there in front of you.”
Another question worth asking: how does technology like this change the role of the journalist in storymaking? Are these positive changes? I’d need a journalist to answer this. Personally, I’d be optimistic about how this changes the role of the journalist: we all have that horror story about butchered data in a news story, not necessarily because the journalist was stupid, but because he or she had limited time and limited statistical knowledge orrrrrr, not uncommonly, because he or she was incentivized to make something sound sensational (which, I will also point out, is not entirely the journalist’s fault). Narrative Science’s technology lends integrity to data-driven reports. It leaves the heartstring-pulling case studies, individual stories, and critical thinking to human beings…for now.
Other fun questions:
Q: Can you add spin to the stories, say, if a client wants that?
A: Yes. Angelic example: little league games. Narrative Science can generate two different versions of the story: one for winning team parents and one for losing team parents. The story sent to the losing team parents wouldn’t focus only on the winning team. Instead, it might point out a good performance from someone on the losing team.
Q: Does the software take sample size into account?
A: No, but there’s an aspect of human intervention to correct for that.
Q: Will this thing one day write books? Fiction? Non-Fiction?
A: If someone wanted to build a 5-volume opus about something for which they had a lot of data? Sure. Fiction? That’s harder.