Structural Verification: Catching Data Model Clusterfumbles Before They Happen

Reading Time: 6 minutes

I recently finished up a series here showing you the Raft distributed consensus algorithm, implemented in Python. In the final post of that series, we talk about code design—specifically, how a system’s design affects the tools that a developer needs in order to implement it. Raft was designed with the explicit goal of understandability, and the resulting specification made it clear to me (or clearer to me than most code bases) what functionality would probably end up belonging where.

How do we streamline a software system’s design? We have talked about this a little bit in this prior post about what we can learn from rocket launchpad design, and also this talk review. As I find myself in charge of more complex, more impactful, and more critical software systems, I think its incumbent upon me to learn (and introduce to you!) additional strategies to build systems that work, and can be maintained by others.

Today, I want to introduce you to another idea: structural verification.

What is structural verification and why should you care?

Suppose you’re building a relatively complex app with a relatively complex data model. You had lots of conversations with your team and development is going smoothly until—what’s this?—something is wrong. Is it the implementation? You stare at the code. No, it seems like that code does exactly what you want it to do. The tests all pass. Every debugging pass catches nothing. You roughly simulate what you think the data model looked like when the problem occurred. This time it works fine.

So you dig deeper. You pull your production database to recreate the exact situation that caused the problem. Finally, you can reproduce. D’oh! It turns out, the data models are in a configuration that you never intended them to get into! Model A and Model B both point to Model C. You only expected one model to ever point to Model C.

This configuration never occurred to you because you cannot picture why the heck somebody would want to do this. And maybe nobody did—rather, someone just got confused while using the UI, and this is what they made, and now the system doesn’t work.

Hopefully, it’s quick fix. Often, with data models. it isn’t. A database schema might need to change. You might need migrations.

Even if you can fix it in code, what you’re probably doing is adding a conditional. An edge. Another thing to find a place for, and to maintain.

This example is relatively simple, but it’s possible with a complex system to end up with weird states involving 17 models and 13 classes—clusterfumbles that are realistically too complex to have expected the team to anticipate.

Structural verification allows us to check for these kinds of things before we implement the system.

Introducing: Alloy

Alloy bills itself as a programming language-agnostic structural verification tool. I don’t think of it that way: I think of it as a domain-specific programming language in and of itself, one that exists for the express purpose of modeling and checking constraints on our system. (Tune in for an upcoming post where I get on a soapbox and talk about programming languages). In any case, Alloy allows us to describe our data model, then helps us check for configurations that shouldn’t happen.

Take, as an example, the NASA Landsat data processing pipeline I introduced over here. Basic idea: this app gets data from Landsat satellites and performs a series of tasks on it in a specific order, as delineated by a research team, to get it ready for observation by researchers for specific purposes like detecting kelp or measuring ocean temperatures. Here’s a picture I’ve drawn of the data model:

And here I have written out some of the requirements that any instance of this data model must satisfy in order for it to work properly:

This is what we’ll model with Alloy.

In the following video, I walk you through this diagram and requirement list, introduce you to Alloy and its development environment, and incrementally build up a model using progressively more Alloy syntax until all of our requirements in the list are satisfied.

Here’s the description and show notes from the live stream:

You have seen me stream about theia, the colloquial name of the NASA Landsat data processing pipeline that I work on.

Here is a blog post describing that app.

Here is a video tour of that app.

In this stream, we take a step back and talk about risk mitigation techniques that I can use to make sure this app works. I use automated unit tests to make sure methods do what I expect and integration tests to make sure the pieces fit together as I expect. But that kind of testing helps me catch regressions and issues that I anticipated. It’s less useful for helping me catch issues that I did not anticipate.

Enter structural verification. In this stream, I show you a declarative description of the structure of theia’s data model, written in a domain-specific language called Alloy. You’ll see a little bit of Alloy syntax, a number of class diagrams, and a few slip-ups (sorry, kids—that’s live coding for ya :).

I am a relative beginner with structural verification, so we will follow up this stream with a pairing stream where I bring in an expert, Hillel Wayne, to yell at me for all the things I got wrong! Then we’ll fix them. Here is where you can see Hillel’s work.

I took Hillel’s “Software Modeling with Alloy Workshop,” which helped me get started with Alloy. Here is the Eventbrite where Hillel announces his workshops (tabbed to “past events” so you can see the specific workshop I mean, but look at “upcoming workshops” for chances to attend yourself!)

Also in this stream, I mention Worrydream’s work on domain specific languages. This is the piece I recommend for understanding what I’m talking about and why it’s important:

This gives us the basics of our model. In the next post on this, I’ll make some changes to the model I have written here under the guidance of Hillel Wayne, who sits on the Alloy board and teaches structural verification professionally.

That live stream will happen at 3:15 PM Central Time on Thursday, October 8th, streamed to YouTube and Twitch!

If you liked this piece, you might also like:

The Raft series, of course! Learn to implement a distributed system from the comfort of your home, complete with gifs and smack talk!

Lessons from Space: Edge Free Programming—about what launchpads have in common with software, and what that means for you and your work!

A Watcher’s Guide to Edge-Free Programming, a talk by Michael Feathers—highly recommend this talk.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.