“Red-Green-Refactor” is a familiar slogan from test-driven development (TDD), describing a well-known approach to writing software. It’s been both popular and controversial since the 2000s (see the recent heated discussions between David Heinemeier Hansson, Bob Martin, and others). I find it useful but limiting. Here I’ll describe some interesting exceptions to the rule, which have expanded the way I think about tests.
The standard three-step cycle goes like this. After choosing a small improvement, which can be either a feature or a bug fix, you add a failing test which shows that the improvement is missing (“Red”); add production code to make the test pass (“Green”); and clean up the production code while making sure the tests still pass (“Refactor”). It’s a tight loop with minimal changes at each step, so you’re never far from code that runs and has good test coverage.
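The cycle is easiest to see on a tiny example. Here’s a sketch in Python; the `slugify` helper and its behavior are invented for illustration, with the three steps narrated in comments:

```python
import re

# Red: this test was written first, and failed (NameError) because
# slugify didn't exist yet.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# Green, then Refactor: a first clumsy version made the test pass;
# this is the cleaned-up replacement, kept honest by the same test.
def slugify(title):
    # Lowercase, collapse runs of non-alphanumerics to "-", trim edges.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

test_slugify()  # green
```

The point is the rhythm, not the code: each step is small enough that a failure points directly at what just changed.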
By the way, to simplify things, I’ll just say “tests” and be vague about whether they’re technically “unit tests”, “specs,” “integration tests,” or “functional tests”; the main thing is that they’re written in code and they run automatically.
Red-Green-Refactor is a very satisfying rhythm when it works. Starting from the test keeps the focus on adding value, and writing a test forces you to clarify where you want to go. Many people say it promotes clean design: it’s just easier to write tests when you have well-separated modules with reasonable interfaces between them. My personal favorite part, though, is not the Red but the Refactor: the support from tests allows you to clean things up with confidence, and worry less about regressions.
Now for the exceptions.
Sometimes I see code that just needs refactoring. Maybe I’m preparing to make some changes, but it’s hard to see what’s going on because the old code is painfully cluttered. So I start by cleaning it up. I do run the tests periodically to keep them green, but there is no Red step here.
This includes refactoring of test code as well as production code. We forget sometimes that tests are code and they can rot. In fact they may be more liable to rot: they tend to be longer than the corresponding production code, stuffed with a bunch of very similar cases, and not closely reviewed. So sometimes it’s worth pausing to clean up a test which is redundant or confusing.
A similar case is when I want to refactor a method and then find out that it has no test coverage. In this case I’ll try to add a test, even if it’s just a crude one. Rough tests are better than no tests.
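A crude test here usually means a characterization test: pin down whatever the code currently does, recorded from observation rather than from a spec, so the refactoring has something to push against. A sketch, with the legacy function invented for illustration:

```python
# Imagine this is cluttered old code with no coverage.
def legacy_format(name, amount):
    return name.upper() + ": $" + str(round(amount, 2))

def test_characterize_legacy_format():
    # Expected value recorded from the current behavior, not a spec.
    assert legacy_format("rent", 1234.567) == "RENT: $1234.57"

test_characterize_legacy_format()
```

If the recorded behavior later turns out to be a bug, the test is still useful: it fails loudly when the behavior changes, and you decide on purpose whether that change is wanted.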
Sometimes I never write the test. Say the task is to move a button to the right by 20 pixels in a web page. That’s a change in production code, perhaps even an important change. So how would I capture that change in a test? Maybe the test could interpret the stylesheet well enough to check the button’s horizontal position. Or maybe a browser simulation tool, driven by a script, could load the page and check the position, or compare the rendered image with a screenshot from the spec.
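For the stylesheet-interpreting option, a crude version might look like this sketch, where the selector name and the regex-based “CSS parsing” are both invented for illustration:

```python
import re

def button_left_px(css_text):
    """Crudely find the hypothetical .submit-button rule and return
    its 'left' value in pixels (None if absent)."""
    block = re.search(r"\.submit-button\s*\{([^}]*)\}", css_text)
    if not block:
        return None
    prop = re.search(r"left\s*:\s*(\d+)px", block.group(1))
    return int(prop.group(1)) if prop else None

css = ".submit-button { position: absolute; left: 120px; }"
assert button_left_px(css) == 120
```

Note how brittle this is: it breaks on `calc()`, shorthand properties, inherited styles, or a renamed class. That brittleness is part of the trade-off discussed next.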
But there’s a trade-off here. The risk of subtle side effects for this change is low, and the change is easy to test by loading the page manually and eyeballing it. In this case, writing a test is not worth the effort. So again there are no Red and Green steps here. I would still run the tests when I’m done, just to verify that the change has no impact on them.
On the other hand, if I’m fixing a bug that made it into production, that’s a good indication that we need the test and I should not skip it.
Sometimes the tests are really slow. This is not a disaster; tests that run only once a day still provide useful information. But I can’t run a 10-minute test suite every time I change the code. Some people really want their tests to run in under a second; I find I can keep a reasonable flow going even if the tests take 30 seconds, but not much more than that. So I make an educated guess about which tests are relevant, and I run only those. That way, at least I know that the tests are “greenish.” I try to run the full suite pretty often, say for every major commit.
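A sketch of what running only the relevant tests can look like with Python’s standard `unittest` — the test classes here are invented placeholders:

```python
import unittest

class FastTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

class SlowTests(unittest.TestCase):
    def test_everything(self):
        pass  # imagine a 10-minute end-to-end run here

# Load and run only the suite relevant to the current change.
suite = unittest.TestLoader().loadTestsFromTestCase(FastTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()  # "greenish": the fast subset is green
```

Test runners generally support this kind of selection directly (name patterns, tags, markers), so in practice it’s usually a command-line flag rather than code.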
Sometimes I add a failing test, but it fails for the wrong reason. So even after I implement the change correctly, the test is still failing…because the test was broken. Then there’s an extra step for fixing the test. This is generally easy, though, because all the changes were small.
Sometimes the tests are broken, but in a subtle way. Maybe they passed yesterday but they don’t pass today. Or they pass on your local machine but not on the continuous-integration server.
Once I mocked up some behavior in a Rails model class by modifying the class itself, forgetting that this was a permanent change that would leak into subsequent tests that used the same class. So the tests would fail depending on the order in which they ran, which might be different in different environments.
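The same mistake translates out of Rails into a plain-Python sketch (class names invented): assigning to the class is a permanent change, while a scoped patch undoes itself.

```python
from unittest import mock

class Article:
    def summary(self):
        return "real summary"

# Leaky stub: modifying the class itself is a permanent change...
Article.summary = lambda self: "stubbed summary"
# ...so every later test that touches Article sees the stub:
assert Article().summary() == "stubbed summary"

# Safer: patch.object restores the original when the block exits.
class Report:
    def summary(self):
        return "real summary"

with mock.patch.object(Report, "summary", return_value="stubbed"):
    assert Report().summary() == "stubbed"
assert Report().summary() == "real summary"  # no leak
```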
Another time, I found a test was failing because it depended on the current system time, rather than mocking up a test clock. A test like this might fail only on Mondays.
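The usual fix is to make the clock an input instead of an ambient dependency. A minimal sketch, with the function invented for illustration:

```python
import datetime

# Fragile: the result depends on when the test happens to run.
def is_weekend_fragile():
    return datetime.date.today().weekday() >= 5

# Testable: the clock is a parameter, so a test can pin it down.
def is_weekend(today=None):
    today = today or datetime.date.today()
    return today.weekday() >= 5

assert is_weekend(datetime.date(2024, 1, 6))      # a Saturday
assert not is_weekend(datetime.date(2024, 1, 8))  # a Monday
```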
One more example: there was an ordering conflict between two items in a hash. Both items had the same key, but one value was good and the other was bad. The insertion order depended on the hash table algorithm, and each would win about half the time, so the test was a random coin toss.
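That story came from another language, but the failure mode translates into a Python sketch: when duplicate keys arrive in an order derived from an unordered structure, the “winner” can change from run to run.

```python
# With duplicate keys, the last insertion wins:
pairs = [("config", "good"), ("config", "bad")]
assert dict(pairs) == {"config": "bad"}

# If the pairs instead come from an unordered collection, the
# insertion order (and so the winner) can differ per run, because
# string hashing is randomized across Python processes:
items = {("config", "good"), ("config", "bad")}  # a set: no defined order

# The fix is to make the order explicit before building the dict.
deterministic = dict(sorted(items))
assert deterministic == {"config": "good"}
```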
Obviously at a time like this, I’d get back to a stable test suite before adding any more features.
Clearly the relationship between tests and code is more complicated than the basic Red-Green-Refactor suggests. This doesn’t mean TDD is broken, but it does mean we need to add some nuance.
- A tight loop is still important. You should have your hands on both the tests and the production code often, and run them, even if you’re not using the classic cycle. If you’re spending all day without touching one or the other, that’s a red flag.
- Tests are code too. Tests and production code are not completely different beasts. Tests need debugging and refactoring just like regular code does. In Rails Test Prescriptions, Noel Rappin says, “Your code is verified by your tests, but your tests are verified by nothing.” But when you’re running the tests to verify the production code, the production code is also verifying the tests. You run them together and they verify each other. Without getting too philosophical, we can take this a bit further…
- Tests and production code are full partners. When you’re maintaining both tests and production code, it’s tempting to think of one of them as the core of the product and the other one as support.
You might say that the production code is really the core, because there’s no product without it. It’s nice to add the tests if you have time, but they don’t bring any real value to the product, because the user will never run them.
Or you could look at it the other way around, taking the name “test-driven” seriously, and say that the tests are the core. Although production code is important, the tests come first because we need them to be confident that the product is correct and that we can maintain it in the future. (Some say that code is mostly to communicate with other programmers, and the fact that it runs on the machine is just a happy side effect — this is a similar idea if we think of tests as executable documentation.)
I think neither one is the core. There is no single main artifact. Tests and production code are two different perspectives, two complementary ways to describe the product. Another perspective is the documentation; another is the product knowledge inside your team members’ heads. All these perspectives are important, and for the same reason: they have value because they contribute to the product quality and make the user’s life better.
As long as the code and tests are growing together, it doesn’t matter how they grow.
Photo via Wikimedia Commons.