This is the second case study in a series called Adventures In Process Design. We started this series with this discussion of Avdi Grimm’s Southeast Ruby keynote. Our goal here is to identify transactions in existing software and redesign them to better model the processes they enable.
Decades of transaction-focused software have trained users not to expect such sophisticated designs. So by building our skills at looking for processes, we’re preparing ourselves to write software that centers our clients even more than clients center themselves!
Are you new to this series? Welcome! Check out all the series posts right here!
Case Study: Shazam for Birds
Suppose we’re writing an app for birdwatchers to upload, view, and share photos of the birds they see.
We want to label the birds so viewers can filter for images of specific kinds of birds.
We would also like amateur bird watchers to be able to upload photos of a bird they don’t recognize, so they can see what kind of bird it is. For this, we’ll need an automated strategy for labeling the birds.
So we decide to train a neural network on a whole bunch of bird pictures and then use that model to label each bird photo uploaded to our app.
Here’s how lots of modern products incorporate machine learning into the software:
In such an architecture, our Bird model inside the app might look something like this (example in Django):
Our data might be stored in a database like so:
We would then create a model that attempts to predict species based on a vectorization of each image at upload time.
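To make that transactional design concrete, here is a minimal sketch in plain Python rather than Django. Only the names Bird, image, and species come from the text above; `vectorize`, `handle_upload`, and `toy_classifier` are hypothetical stand-ins for a real vectorizer and a trained network, so treat this as an illustration of the shape of the design and not as the app's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class Bird:
    image: bytes                    # raw uploaded photo
    species: Optional[str] = None   # written once, at upload time


def vectorize(image: bytes) -> Vector:
    """Toy vectorization (assumption): scale each byte into [0, 1]."""
    return [b / 255 for b in image]


def toy_classifier(vector: Vector) -> str:
    """Placeholder for a trained network; the rule here is arbitrary."""
    mean = sum(vector) / len(vector) if vector else 0.0
    return "cardinal" if mean > 0.5 else "blue jay"


def handle_upload(image: bytes, classify: Callable[[Vector], str]) -> Bird:
    """Label the photo during upload and store the prediction as fact."""
    bird = Bird(image=image)
    bird.species = classify(vectorize(image))
    return bird


bird = handle_upload(bytes([250, 240, 230]), toy_classifier)
print(bird.species)
```

Notice that `species` holds whatever the classifier returned, with no record that it was a prediction and no path for anyone to dispute it.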
Here is the problem: an app design like this assumes that model training and prediction consist of two transactions: the tech team trains the model, then the model labels birds with 100% accuracy. That rarely happens. Models get things wrong.
What happens if we start relying on the model as the source of objective truth when it gets things wrong? We have talked about ethics-flavored cases of this in the past, with their elevated consequences. But even with no ethical implications, relying on a wrong model for information makes our understanding of the topic (in this case, birds) fundamentally incorrect.
In practice, teams sometimes retrofit their development process to do some error analysis. The retrofitted solution is often much less sophisticated than the original one: data scientists manually go into app consoles and export all the data, look at it in giant spreadsheets, send it off to subject matter experts for their opinions, and only then start the error analysis. Correcting the already-mislabeled records is often also manual, done with hand-written database queries. This much human involvement introduces more opportunities for mistakes, like entering invalid data.
Developing a model, like developing a software application, is a process—one with several steps, most of which involve interaction with one of several groups of people. Even a simplified version of that process might look something like this:
What’s happening here? We use some training data about birds to build a model. We then deploy the model in our application, and it predicts the label for the incoming bird pictures.
As the bird watchers (users) observe and upload photos, the subject matter experts among them may identify when the model mislabels a bird. Those experts need a way to tell us that our model got something wrong—both so we can re-label that bird, and so we can figure out how to make the model more accurate.
For that, we need the app developers to put some kind of button or interface in the app to tell us that a bird got mislabeled and to tell us how the bird should be labeled. Bird watchers can then give us feedback on that interface. Is it easy to understand and use? Does it allow them to share all the relevant information about the mistake?
Meanwhile, through that interface, bird watchers are also building up a list of mislabelings. Our data science team can do two things with those mislabelings: first, they can update the data itself so that the image has the right bird label. Second, they can analyze all the errors that the model has made and try to identify patterns of mistakes. Does it mislabel the blurry birds? Does it mislabel birds in flight? In these cases, they can improve the model by adding representative images to the training data or changing the model structure to better accommodate these cases.
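The two uses of those mislabelings can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the `MislabelReport` record, its `tags` field (for notes like "blurry" or "in flight"), the `KNOWN_SPECIES` set, and both helper functions. Checking corrections against a known list is one possible guard against the invalid-data problem that manual fixes invite.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical list of labels the app accepts.
KNOWN_SPECIES: Set[str] = {"cardinal", "blue jay", "house finch"}


@dataclass
class MislabelReport:
    bird_id: int
    predicted: str                                  # what the model said
    corrected: str                                  # what the expert says
    tags: List[str] = field(default_factory=list)   # e.g. "blurry"


def apply_corrections(labels: Dict[int, str],
                      reports: List[MislabelReport]) -> Dict[int, str]:
    """Return a copy of the labels with each expert correction applied."""
    updated = dict(labels)
    for report in reports:
        if report.corrected not in KNOWN_SPECIES:
            raise ValueError(f"unknown species: {report.corrected!r}")
        updated[report.bird_id] = report.corrected
    return updated


def error_patterns(reports: List[MislabelReport]) -> Tuple[Counter, Counter]:
    """Count (predicted, corrected) pairs and the tags attached to errors."""
    confusions = Counter((r.predicted, r.corrected) for r in reports)
    tags = Counter(tag for r in reports for tag in r.tags)
    return confusions, tags


labels = {1: "cardinal", 2: "cardinal", 3: "blue jay"}
reports = [
    MislabelReport(1, "cardinal", "house finch", ["blurry"]),
    MislabelReport(2, "cardinal", "house finch", ["blurry", "in flight"]),
]
fixed = apply_corrections(labels, reports)
confusions, tags = error_patterns(reports)
print(fixed)
print(confusions.most_common(1), tags.most_common())
```

In this toy data, both errors confuse the same pair of species and both photos are tagged blurry, which is the kind of pattern that would point the team toward adding blurry images to the training data.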
How can we accommodate a process like this?
Step 1. Finding the Process
In our last process design post on importing data, we started with some questions. Let’s see how those questions apply to our current application.
1. Who is our client? The last time we did this, we had one type of client: an athlete. Now, we have a few types of clients: beginner birdwatchers, expert birdwatchers, and data scientists.
2. What does our client want/need?
Beginning Birdwatchers – Want the ability to upload a photo and see a type of bird for that photo.
Expert Birdwatchers – Want the ability to make predictions more accurate.
Data Scientists – Want more, and more accurate, data.
3. What do we need from our client to give them what they need?
Beginning Birdwatchers – We need their photos.
Expert Birdwatchers – We need their knowledge. That is, we need a way for them to tell us when predictions are correct or incorrect.
Data Scientists – We don’t specifically need something from them right away. After the model is live, we need a way for them to update the model.
4. Why do we need that?
Beginning Birdwatchers – We need their photos because we need to see the bird they saw in order to help identify it. A possible alternative is to have them describe the bird they saw. That approach has many failure points (two big ones: the inaccuracy of eyewitness testimony and differences in how people describe things), but in the future we could allow a description to substitute for an image and build a model that predicts from it. For now, we’re not going to do that: it’s a way around the needed information, but it belongs to a future iteration of this software.
Expert Birdwatchers – We need their knowledge because we need help identifying the birds. We do not have a way to correctly label the birds unless someone who knows birds looks at the pictures. We’re trying to make one with the model, but for now that model depends on labeled data. Right now, we’re struggling to make a better model because we are missing this piece in the flow of our software.
Data Scientists – We don’t technically need anything from them on the first pass. In fact, supervised learning model training begins with a labeled dataset, and starting with more data gives models a better crack at higher accuracy. That said, in order for data scientists to perfect an existing model, it is helpful to have fodder for error analysis—that is, information about what the model is getting wrong, and how. Once the team can tune the model to offer better labels, we need a way for the data scientists to replace the model in production with the new version.
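As a rough illustration of what "a way to replace the model in production" could mean, here is a minimal Python sketch of a versioned model registry. The `ModelRegistry` class and its `register`, `promote`, and `predict` methods are hypothetical names invented for this example; they do not come from any particular library or from the app described here.

```python
from typing import Callable, Dict, Optional, Tuple

Classifier = Callable[[bytes], str]


class ModelRegistry:
    """Holds every trained version and tracks which one is live."""

    def __init__(self) -> None:
        self._versions: Dict[str, Classifier] = {}
        self._live: Optional[str] = None

    def register(self, version: str, model: Classifier) -> None:
        """Store a new version without putting it into production."""
        if version in self._versions:
            raise ValueError(f"version {version!r} already registered")
        self._versions[version] = model

    def promote(self, version: str) -> None:
        """Make a registered version the one that labels new uploads."""
        if version not in self._versions:
            raise KeyError(version)
        self._live = version

    @property
    def live_version(self) -> Optional[str]:
        return self._live

    def predict(self, image: bytes) -> Tuple[str, str]:
        """Return the label plus the version that produced it."""
        if self._live is None:
            raise RuntimeError("no model has been promoted yet")
        return self._versions[self._live](image), self._live


registry = ModelRegistry()
registry.register("v1", lambda image: "cardinal")     # stand-in models
registry.register("v2", lambda image: "house finch")
registry.promote("v1")
first = registry.predict(b"photo")
registry.promote("v2")
second = registry.predict(b"photo")
print(first, second)
```

Returning the version alongside each label is a design choice of this sketch: it would let later error analysis tell which model made which mistake.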
The corollary to this question: is there any way we can move forward without the thing from our client? If there is, we should find a way to continue the relationship without it until we really need it. To use an example from signing up for a paid SaaS product: I’d prefer to provide credit card info only after the free trial, not before.
For beginning birdwatchers, we’ve made a scoping decision for now not to move forward without an image.
For expert birdwatchers, we’ve also decided that for now, we need their expertise.
For data scientists, we want to help them with error analysis and model deployment, and those are things we can address in our redesign. We’ll model them in our process.
So how will we model the process?
That’ll be the subject of Part 2 in this case study. In the meantime, here’s what I recommend as an exercise in building a graceful process for your software:
- Choose a problem that you are currently trying to solve with software. If you don’t have one, you can choose a problem you know a lot about that somebody is currently trying to solve with software.
- Think about the problem in terms of the 4 questions:
  - Who cares?
  - What do they want/need?
  - What do we need from them to give them what they want/need?
  - Why do we need that? (And do we really need that?)
In the next post, we’ll talk about using this information to model a graceful process for the people who care.
If you liked this piece, you might also like:
This post on my entree into live coding (in case you’re interested in real-time programming demonstrations)
This series about Structure and Interpretation of Computer Programs—in which I share what I learned in a week-long course on the classic book
The listening series—Unrelated to a specific programming problem, but hopefully useful 🙂