When Jessica Kerr stepped onto the stage at Windy City Rails, she asked us to rethink our sometimes not-so-positive visceral reaction to testing our code. Maybe what we need isn’t to break up with TDD. Maybe we need to go deeper in and work on the relationship. (Related article from Jess here).
Jess’s suggestion for this is to take some wisdom from testing in the functional programming community: to consider including generative testing or property-based testing in our test suite, rather than reaching exclusively for the example-based tests we’ve come to know.
Generators are classes that produce random input data, and in the context of tests we can use them to generate, say, 100 random examples, and run all 100 examples through our tests.
Our tests will then evaluate those examples on properties. WARNING, Object-Oriented programmers: in the context of functional programming from whence these tests come, properties are not fields on a class. Instead, they are assertions: the kinds of things you want to make sure are true in a test. In property-based testing, we test for all these properties for each of our randomly generated examples.
This is different from our current example-based testing, in which we take one explicitly specified example and check that a hard-coded expected value equals the actual value.
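To make the contrast concrete, here’s a minimal hand-rolled sketch, no library at all. `apply_discount` is a hypothetical method under test, and the properties are the assertions inside the loop:

```ruby
# Hypothetical method under test.
def apply_discount(price_cents, percent)
  price_cents - (price_cents * percent / 100)
end

# Example-based: one hardcoded input, one hardcoded expected value.
raise "example failed" unless apply_discount(1000, 10) == 900

# Property-based: 100 randomly generated inputs, each checked against
# properties (assertions) that must hold for all of them.
100.times do
  price   = rand(1..100_000) # a tiny inline "generator"
  percent = rand(0..100)
  result  = apply_discount(price, percent)
  raise "discount went negative" unless result >= 0
  raise "discount raised the price" unless result <= price
end
```

Notice that the property version never states what the answer is, only what must be true of any answer.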
But wouldn’t it be better, Jessica argued, if, instead of hard-coding a specific solution, we could specify for our program the bounds within which an acceptable solution should fall—and ensure that a slew of randomly-generated inputs would behave as expected based on the specified bounds? It sounds wonderful. It sounds robust. It sounds like an excellent way to ensure that we are including the “hard cases” and “sad paths” inside the scope of our pre-deployment evaluations.
But the downside that you’re imagining is true: these tests are wayyyy harder to write.
So what makes these harder-to-write tests worth writing?
Well, if your application has lots of crazy, asynchronous interactions and purposely incorporated randomness, that’s what it’s going to take. That’s how Jessica got into this kind of testing in the first place. When these tests passed, they made her feel confident that the system worked, despite its unpredictability on a sub-macro scale.
Property-based testing offers a particular advantage, though, one that resurfaced throughout Jessica’s talk: because these tests are so demanding, they push us toward better problem-solving and better code design. They let us break parts of our system into smaller pieces and change functionality while relying on bounded tests for reassurance, instead of watching tests break the way example-based tests sometimes do.
In that way, property-based tests move us toward an ideal that we can all agree on: tests that enable change, rather than preventing change by breaking when we want to refactor or change functionality.
Jessica imparted some lessons on property-based testing:
1. test the API
That can be hard to do with example-based tests, where the full input is hardcoded along with the full output: every test has to change when one thing in the API changes. With properties, we can instead put bounds on the API and make sure it behaves correctly within them, all the time.
Want to try it? You aren’t stuck doing it on your own. The rantly gem helps you write property-based tests in Ruby: an initial stab at a Rubyish version of ScalaCheck, the inspiring testing framework (yep, those three words just got written together).
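Here’s a first taste, as a sketch: `property_of`, `check`, `range`, and `array` below are rantly’s documented helpers, while `checkout_total` is a hypothetical method under test.

```ruby
# Gemfile: gem "rantly"
require "rantly/rspec_extensions"

# Hypothetical method under test: sums line-item prices in cents.
def checkout_total(prices)
  prices.sum
end

RSpec.describe "checkout_total" do
  it "stays within sane bounds for any basket" do
    # property_of runs this generator block (100 times by default)...
    property_of {
      len = range(1, 20)             # 1 to 20 line items
      array(len) { range(0, 9_999) } # each priced 0..9999 cents
    }.check { |prices|
      # ...and these properties must hold for every generated basket.
      total = checkout_total(prices)
      expect(total).to be >= 0
      expect(total).to be >= prices.max
    }
  end
end
```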
In doing this, Jessica learned another trick:
2. write the tests backwards.
She starts with what she wants to assert and goes backwards until it becomes executable.
For example, if she wants to assert that a purchase was made correctly, she’ll first need a purchase and an order. She’ll need, before that, a product that gets ordered, for which she will need a generator. That generator is the root level. Some options to help you make generators: the factory_girl we know, or even generatron, a gem by Jessica.
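Before we get to why she wrote her own gem, here’s a sketch of that backwards chain in rantly terms. The `Product` and `Order` structs and the `make_purchase` method are hypothetical stand-ins; the test starts from the assertion and the generators compose down to the root-level product generator:

```ruby
require "rantly/rspec_extensions"

# Hypothetical domain objects for illustration.
Product = Struct.new(:name, :price_cents)
Order   = Struct.new(:products)

# Hypothetical method under test.
def make_purchase(order)
  { order: order, total_cents: order.products.sum(&:price_cents) }
end

# Root-level generator: one random product.
def gen_product(r)
  Product.new(r.sized(8) { r.string(:alpha) }, r.range(100, 10_000))
end

# Built on top of it: an order of one to five products.
def gen_order(r)
  Order.new(r.array(r.range(1, 5)) { gen_product(r) })
end

RSpec.describe "making a purchase" do
  it "charges exactly the sum of the product prices" do
    # Inside the block, self is the rantly generator instance.
    property_of { gen_order(self) }.check { |order|
      purchase = make_purchase(order)
      expect(purchase[:total_cents]).to eq(order.products.sum(&:price_cents))
    }
  end
end
```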
Why her own gem? Because she wanted to handle her corner cases individually, with generators customized for those edge cases. We’ll get more into why later. As for the next lesson:
3. generators compose from small to large
Compose is another one of those words that means something different to object-oriented and functional programmers. In OO, “composing” means taking two things, adding them together, and making a third, different thing. In functional programming, it’s taking two of the same kind of thing (like two generators) and creating one of the same kind (a bigger generator).
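In rantly terms, that kind of composition can look like the sketch below. The lambdas are an illustrative convention, not rantly API; only `range`, `sized`, `string`, and the one-off `Rantly { ... }` helper come from the gem:

```ruby
require "rantly"

# Small generators: each takes a rantly instance, returns a random value.
email = ->(r) { "#{r.sized(8) { r.string(:alnum) }}@example.com" }
age   = ->(r) { r.range(18, 99) }

# Two generators in, one (bigger) generator of the same kind out.
user = ->(r) { { email: email.(r), age: age.(r) } }

# Reusable anywhere, including straight from irb:
Rantly { user.(self) }
# => {:email=>"k3WqX9ab@example.com", :age=>47} (for example)
```

Because each piece is itself just a generator, the same `user` generator can serve property tests, seed data, and quick irb experiments.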
So why are we going through all this? Well…
4. generators are worth the time (and pain) to build.
They’re extremely reusable for different kinds of tests and even in irb. And they become even more useful when you employ a few tricks to supercharge your property-driven tests:
5. specify valid input—completely.
When you test a total purchase along with the events that influenced it, your tests can verify a lot, maybe even completely specify the method in question. It’s more work than just using the data and dumping it on the floor: the method returns everything you need to determine that its output is correct (sketched below, after the list of reasons).
That’s useful for a few reasons:
- business reasons: you have context around the data. So your software is not a black box to your business people anymore, and they can make more intelligent feature requests.
- personal reasons: you have more visibility into whether your code is working, not only in tests but also in production.
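Here’s a sketch of that complete-specification idea; `total_purchases` is a hypothetical method that returns the contributing events alongside the total, so a rantly property can verify the total entirely from the method’s own output:

```ruby
require "rantly/rspec_extensions"

# Hypothetical method under test: rather than returning a bare number, it
# returns everything needed to check that number (the contributing events).
def total_purchases(events)
  { total_cents: events.sum { |e| e[:amount_cents] }, events: events }
end

RSpec.describe "total_purchases" do
  it "can be verified completely from what it returns" do
    property_of {
      array(range(1, 10)) do
        { amount_cents: range(1, 50_000) }
      end
    }.check { |events|
      result = total_purchases(events)
      # The output carries its own context: the test (or a curious business
      # person) can recompute the total instead of trusting a black box.
      expect(result[:total_cents]).to eq(result[:events].sum { |e| e[:amount_cents] })
    }
  end
end
```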
What if you don’t know all of your input values? Well, you can establish relative properties: properties that compare two outputs and check that they move correctly relative to one another. It’s an incomplete property, but it creates bounds you can use to check functionality with uncertain data. For example, if you know relevance should be lower for items with fewer interactions on an e-commerce site, you can test for that relationship.
BONUS: if marketing ever changes and the above assertion is no longer true, your marketing team will want to know that!
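Here is one way such a relative property might look; `relevance_score` is a hypothetical stand-in, and the generator builds the pair so that one side always has fewer interactions:

```ruby
require "rantly/rspec_extensions"

# Hypothetical scoring method under test.
def relevance_score(interactions)
  Math.log(interactions + 1)
end

RSpec.describe "relevance_score" do
  it "ranks items with fewer interactions lower" do
    property_of {
      fewer = range(0, 10_000)
      more  = fewer + range(1, 10_000) # constructed so `more` is always larger
      [fewer, more]
    }.check { |(fewer, more)|
      # We can't say what either score should be, only how they must relate.
      expect(relevance_score(fewer)).to be < relevance_score(more)
    }
  end
end
```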
A few things to know as you embark on this kind of test:
A) You want to expect failure—by creating failure in your tests.
Conveniently(?), generative tests do not let you leave your method half-implemented.
B) Ask the hard questions about what to do in the cases of failure.
Generative testing will drive you to do this because, if there’s a way to have a failure case, these tests will help you find it. Don’t worry—there’s a way to go about this:
C) Fix the simplest case that fails.
In fact, your gems can help you: rantly starts with the first failing input and then tries simpler and simpler inputs until it finds the minimal one that still makes the test fail. As soon as you fix that, it will give you the next simplest thing that fails. This is called shrinking. The problem is, it doesn’t work out of the box for custom types. So for your custom types, you have to define what “simpler” means. This is where custom generators come in handy.
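Answering “what is simpler?” for a custom type might look like the sketch below. `Cart` is hypothetical; the `#shrinkable?`/`#shrink` pair follows the protocol rantly’s shrinking support expects of the values it minimizes (see the gem’s README for the exact details):

```ruby
require "rantly"
require "rantly/shrinks"

# Hypothetical custom type that knows how to get simpler.
class Cart
  attr_reader :items

  def initialize(items)
    @items = items
  end

  # "Can this value get simpler?" For a cart: yes, while it still has items.
  def shrinkable?
    items.any?
  end

  # "What is simpler?" Our answer: a cart with one fewer item. On a failure,
  # shrinking keeps going until the test stops failing, then reports the
  # smallest cart that still fails.
  def shrink
    Cart.new(items[0...-1])
  end
end
```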
This is all fine for starting a test suite from scratch. But how do you retrofit existing example-based testing?
Don’t overhaul the whole thing. Instead, as soon as a test bothers you, start upgrading it. When a test gets in your way, don’t hardcode something else. Make it focused. Later, you can add functionality if you want, as described above (returning all the data you need to provide context and such).
In this way, the test documents for you what doesn’t matter—valuable to know as you identify the things that do.
This sometimes creates a need for two ways to calculate the same thing. In these cases, one option is to generate an output, use it to generate an input, and use that input to regenerate the output, as sketched below.
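Here is one sketch of that output-first round trip, using JSON encoding as a stand-in for the code under test:

```ruby
require "rantly/rspec_extensions"
require "json"

RSpec.describe "a round-trip property" do
  it "regenerates the output it started from" do
    property_of {
      # Generate the expected OUTPUT first: a small hash of string keys
      # to integers.
      dict(range(1, 5)) { [sized(8) { string(:alpha) }, integer] }
    }.check { |expected_output|
      input  = JSON.generate(expected_output) # output -> input
      actual = JSON.parse(input)              # input -> regenerated output
      expect(actual).to eq(expected_output)
    }
  end
end
```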
So, for an overall generative TDD workflow:
- start with expected actuals
- narrow assertions
- check assertions that don’t matter
- check assertions that do
When an edge case comes up, more often than not your test is just wrong. Fixing it is not a waste of time, nor should it make you feel like you misplaced your trust in this type of testing. That test broke because you wrote it wrong, and you wrote it wrong because you were thinking wrong. Fixing the test helps you fix your mental model of the problem.
After all, the point of testing is to stop us from coding a solution to a problem we don’t fully understand. And by this method, we are forced to understand it.
“Care plus randomness turns trial and error into something that we can use as the basis for deductive logic.” - Jessica Kerr
So is there a place for example-based tests, then?
Yes. Indubitably. Example tests behave like anecdotes that humans can easily understand in order to glean the expected functionality. They function as excellent documentation. And, I would add, they allow you to check first for the specific, common cases most likely to arise when you want to deploy but still have a corner case to figure out.
But as much as we need anecdotes to understand, we need evidence to verify. And for that, generative tests make an ideal tool.
So rather than use one type of test or the other, we can use both in alignment with their respective strengths.
If you want to see more from the expert herself, Jessica has discussed these ideas for TDD extensively over at her blog.