The Case for Test-Driven Development

An interview with O'Reilly author Harry Percival

Harry Percival, author of Test-Driven Web Development with Python, discusses how he got into TDD and why you should too, and shares some tips. In the podcast above, listen to Harry talk candidly about the types of tests that make sense, what and what not to test, and at what point a program becomes complex enough to warrant testing. Below is a mostly matching text version of the same interview. Let us know your thoughts on TDD in the comments: your own war stories and what convinced you (or didn’t!).

Why write tests? How do you know it’s not a waste of time?

The theory is that it’s an investment: the time you spend writing tests will get paid back in time you don’t have to spend debugging. The theory also goes that tests should help you write code that’s easier to work with, as well as code with fewer defects. Because having tests encourages you to refactor and to think about design, your code should end up cleaner and better architected, so it should be easier to work with, and your investment pays off because you’re more productive in future as well.

So that’s the theory. But the problem is that there’s delayed gratification—it’s hard to really believe this when the reward is so far off and the time required is now. So in practice, what was it that convinced me?

I first learned about testing from a book called “Dive Into Python”. It’s a popular book, so maybe some of the people listening will have read it too? They may remember that Mark Pilgrim introduces testing, in fact he introduces TDD, in chapter 10. He uses the classic TDD example, which is a Roman numeral calculator, and he shows how, by writing the tests before we even start writing the code, we can really get some help in how we implement our calculator. So he writes his tests, “I” should be 1 and “II” should be 2 and “IV” should be 4, and so on, and he shows how it helps us to build a really neat implementation of a Roman numeral calculator.
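
As a rough illustration of that test-first style (a sketch only, not Mark Pilgrim's actual code), the three checks mentioned above could be written with Python's built-in unittest module. The function name from_roman and its implementation are assumptions made for this example.

```python
import unittest

# Value of each individual Roman numeral symbol.
SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def from_roman(numeral):
    """Convert a Roman numeral string to an integer (hypothetical helper)."""
    total = 0
    for index, symbol in enumerate(numeral):
        value = SYMBOL_VALUES[symbol]
        next_is_larger = (
            index + 1 < len(numeral) and value < SYMBOL_VALUES[numeral[index + 1]]
        )
        # A smaller symbol placed before a larger one is subtracted (IV = 4).
        total += -value if next_is_larger else value
    return total


class FromRomanTest(unittest.TestCase):
    # In TDD these tests would be written first and seen to fail
    # before from_roman was implemented.
    def test_single_symbol(self):
        self.assertEqual(from_roman("I"), 1)

    def test_repeated_symbol(self):
        self.assertEqual(from_roman("II"), 2)

    def test_subtractive_pair(self):
        self.assertEqual(from_roman("IV"), 4)
```

Saved in a file, the tests can be run with `python -m unittest`. This sketch does no validation of malformed numerals.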

So I read that, and I thought “that sounds great!”. It sounds like a thing you really should do, like flossing your teeth or something. And what happened next was that I got my first real project, with my first real client, and there were deadlines, and all my good intentions went straight out of the window. And besides, from the full height of my 3 weeks of Python programming experience, I thought I was pretty hot stuff. “I can handle it without tests”, I thought, “I’ll be fine”.

And you know what? I was fine. At first.

At first I knew I didn’t really need TDD because it was a small website, and I could easily test whether things worked by just manually checking it out. Click this link ‘here’, choose that drop-down item ‘there’, and ‘this’ should happen. Easy.

But the project grew. Parts of the system started to depend on other parts. I did my best to follow good principles like DRY, but that just led to some pretty dangerous territory. Soon I was playing with multiple inheritance. Class hierarchies 8 levels deep. `eval` statements.

I became scared of making changes to my code. I was no longer sure what depended on what, and what might happen if I changed this code over here, oh gosh, I think that bit over there inherits from it… no, it doesn’t, it’s overridden. Oh but it depends on that class variable. Right, well, as long as I override the override it should be fine. I’ll just check… but checking was getting much harder. There were lots of sections to the site now, and clicking through them all manually was starting to get impractical. Better to leave well enough alone, forget refactoring, just make do.

Soon I had a hideous, ugly mess of code. New development became painful.

Not too long after this, I was lucky enough to get a job with a company called Resolver Systems (now PythonAnywhere), where they do Extreme Programming, and really rigorous TDD.

Although my previous experience had certainly opened my mind to the possible benefits of automated testing, I still dragged my feet at every stage. I mean, testing in general might be a good idea, but really? All these tests? Some of them seem like a total waste of time… What? Functional tests as well as unit tests? Come on, that’s overdoing it! And this TDD test / minimal code change / test cycle? This is just silly! We don’t need all these baby steps! Come on, we can see what the right answer is, why don’t we just skip to the end?

So I second-guessed every rule, I demanded justifications for everything, but my colleagues were very patient with me, and as the months passed I started to see that delayed payoff—I started to see the quality of the code we were able to write, the way whenever we wanted to refactor we would just dive in, safe in the knowledge that the tests would pick up on any mistakes we made.

So that’s what convinced me, and that’s why I’m trying to share that experience with the world with this book. I’m no expert, at best I’m an enthusiastic intermediate-level tester, but I hopefully still remember enough of the beginner’s perspective to share it with everyone.

What should you test? And what should you not test?

So the zeroth rule, the simplest rule, is “test everything”. And the subsequent rules are about exceptions to that, I guess. So one of the first rules I learned about was “don’t test constants”. In other words, if you have some code that says “wibble = 3”, there’s no point in writing a test that says “assert wibble == 3”. Tests that exactly duplicate your code, line for line, aren’t much use.

The second exception is: “don’t test presentation”. If we’re doing web development, you don’t want to write tests that check on, say, the exact font size you use for the headings, or the precise color of your text boxes. Those tests aren’t going to be very useful, and they’re going to be annoying when you want to make changes to your design.
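
To make the “don’t test constants” rule concrete, here is a small sketch contrasting a test that merely restates an assignment with one that checks behaviour depending on it. The constant name wibble comes from the interview; the shout function is a hypothetical example invented for illustration.

```python
wibble = 3


def test_wibble_is_three():
    # Not much use: this restates the assignment line for line,
    # so it can only fail when someone edits both places differently.
    assert wibble == 3


def shout(word):
    """Hypothetical function whose behaviour depends on the constant."""
    return word.upper() + "!" * wibble


def test_shout_adds_exclamation_marks():
    # More useful: this checks what the code does with the constant.
    assert shout("tests") == "TESTS!!!"
```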

With that said, there are some elements of presentation you might want to test in some way—those elements that are important to the functionality of the site. So, if it’s important, say, that your text boxes resize correctly when you resize the page, you might test that. Or if it’s important that hyperlinks should stand out and be different from regular text, you might have a test for that – testing that the formatting is different, but without saying it has to be a specific font size or color.

So that’s one way of thinking about what to test and what not to. The other debate is: are there any things that are just so simple that they don’t need tests? What I advocate in the book is that there are benefits to having a small, placeholder test for every single function and class, no matter how simple. There are two reasons for this.

The first is: if it really is such a simple function, or such a minimal class, then its test will be very simple too, so it shouldn’t take you that long to write.

The second is a more psychological argument. The thing is that, once you’ve decided to only test functions of a certain level of complexity, then you’re dragged into a game of deciding where the line should be. And you’re in danger of putting off the moment until it’s too late. A simple function that’s just one line then grows an if… But because it doesn’t have any tests, and it doesn’t seem that much more complex, you don’t bother to add a test yet. Later it might get more complex, and at each stage you only have small incremental increases in complexity, but the burden of doing “proper” tests becomes greater… You’re in danger of becoming like the frog in the pot of water slowly being brought to the boil. Instead, if you’ve already got a placeholder test, then there’s a much lower psychological barrier to adding a new test when you add a new bit of complexity to the function.
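
The placeholder-test idea might look something like the sketch below. The display_name function and its test data are hypothetical; the point is only that once the first trivial test exists, adding a second one when the function “grows an if” costs very little.

```python
import unittest


def display_name(user):
    # This started life as a one-liner: return user["name"].
    # The nickname branch is the "if" that the function later grew.
    if user.get("nickname"):
        return user["nickname"]
    return user["name"]


class DisplayNameTest(unittest.TestCase):
    def test_returns_name(self):
        # The original placeholder test, written when the function was trivial.
        self.assertEqual(display_name({"name": "Harry"}), "Harry")

    def test_prefers_nickname_when_present(self):
        # Added alongside the new branch; the test class already existed,
        # so there was little barrier to writing it.
        user = {"name": "Harry", "nickname": "hjwp"}
        self.assertEqual(display_name(user), "hjwp")
```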

How do you deal with dependencies in your tests? What about the database, for example?

The classic answer to this question is: you use mocks, also known as test doubles. You use these to isolate your tests from external dependencies, and from each other. But, like many other people, I’ve come to see some of the downsides of mocks: you often end up with tests that are tightly coupled to the implementation, that just duplicate your production code. “Assert that this function calls that method on this mock and then passes the result to another mock and calls another method on it…”. There’s also a danger that you end up in a world that’s all mocks, and you don’t check that the individual units of your code actually work with each other as well as they work with the mocks in your tests. You have to start to think carefully about what additional integration tests you need, on top of your mocky unit tests.
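
Here is a minimal sketch of the kind of mock-based test described above, using unittest.mock from the standard library. The register_user function and the repository and mailer collaborators are hypothetical names chosen for the example.

```python
from unittest.mock import Mock


def register_user(email, repository, mailer):
    """Hypothetical production code with two external dependencies."""
    user_id = repository.save(email)
    mailer.send_welcome(email, user_id)
    return user_id


def test_register_user_with_mocks():
    # Both collaborators are replaced with mocks, so no real
    # database or mail server is touched.
    repository = Mock()
    repository.save.return_value = 42
    mailer = Mock()

    result = register_user("harry@example.com", repository, mailer)

    assert result == 42
    # These assertions mirror the body of register_user almost line
    # for line, which is the coupling to implementation described above.
    repository.save.assert_called_once_with("harry@example.com")
    mailer.send_welcome.assert_called_once_with("harry@example.com", 42)
```

The test passes without proving that a real repository and mailer would accept these calls, which is why additional integration tests are still needed.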

So my reaction to that, in the past, has been “try not to use mocks”. If that means unit tests that are less isolated from one another, then so be it. If that means unit tests that end up actually talking to the filesystem, or talking to an actual database, then fine! In the Django world, you can use an in-memory SQLite database to run your tests against, and it’s very fast. But people out there (and I should say these people are probably wiser than I am) will tell you that they’re not “real” unit tests any more. They worry that your test suite will grow to be slower and slower, and that you’ll stop running it, but I’ve never found that to be a problem. In my world, the full test suite with all the functional tests as well as the unit tests takes about 10 hours to run, so we have to rely on a CI system to run all the tests anyway. Day to day, you can get away with running a subset.
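
The in-memory database approach can be sketched without Django, using only the standard library's sqlite3 module. This is an assumption made to keep the example self-contained (Django's test runner handles the database setup itself); the items table and the add_item and list_items helpers are hypothetical.

```python
import sqlite3
import unittest


def add_item(conn, text):
    """Hypothetical storage function that talks to a real database."""
    conn.execute("INSERT INTO items (text) VALUES (?)", (text,))
    conn.commit()


def list_items(conn):
    rows = conn.execute("SELECT text FROM items ORDER BY id").fetchall()
    return [row[0] for row in rows]


class ItemStorageTest(unittest.TestCase):
    def setUp(self):
        # ":memory:" gives each test a fresh database that lives only in RAM.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )

    def tearDown(self):
        self.conn.close()

    def test_saved_items_can_be_listed(self):
        add_item(self.conn, "first item")
        add_item(self.conn, "second item")
        self.assertEqual(list_items(self.conn), ["first item", "second item"])
```

No mocks are involved: the test exercises real SQL against a real, if temporary, database.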

I want to stress that that is just my experience, however. The idealized, perfect, accepted answer to this question is a design pattern called “Functional Core, Imperative Shell”. I heartily recommend a couple of talks by Gary Bernhardt on the subject, entitled “Boundaries” and “Fast Test, Slow Test”, from PyCon 2013. The theory is that you try and separate all your business logic out from “boundaries”, i.e. dependencies on the database or the filesystem, and then you make that core business logic follow functional programming patterns, and that will let you write highly isolated, perfect unit tests that won’t need a lot of mocks.
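
One way to picture the pattern is the sketch below, which is a simplified reading of the idea rather than code from Gary Bernhardt's talks. The discount rule, the JSON file format and all function names are assumptions invented for the example.

```python
import json
from pathlib import Path


def apply_discount(orders, threshold_pence, percent):
    """Functional core: pure logic, no I/O, returns new data."""
    return [
        {**order, "total_pence": order["total_pence"] * (100 - percent) // 100}
        if order["total_pence"] >= threshold_pence
        else order
        for order in orders
    ]


def discount_orders_file(path, threshold_pence, percent):
    """Imperative shell: touches the filesystem, delegates every decision."""
    file_path = Path(path)
    orders = json.loads(file_path.read_text())
    discounted = apply_discount(orders, threshold_pence, percent)
    file_path.write_text(json.dumps(discounted))


def test_apply_discount_only_touches_large_orders():
    # The core can be tested with plain data: no mocks, no files.
    orders = [{"id": 1, "total_pence": 10000}, {"id": 2, "total_pence": 1000}]
    assert apply_discount(orders, threshold_pence=5000, percent=10) == [
        {"id": 1, "total_pence": 9000},
        {"id": 2, "total_pence": 1000},
    ]
```

The shell is deliberately thin, so it needs only a handful of integration tests, while the business rules live in a function that is cheap to test in isolation.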

My problem is that, in the applications I’ve run across, following that pattern has never seemed worth the effort. Either the business logic has been minimal (web apps can often be very simple CRUD wrappers around a database or filesystem), or it’s something like PythonAnywhere, where we basically deal with boundaries all the time: every single function seems to involve writing to a filesystem or starting a process or calling an external API or something. Either way, the whole “Functional Core, Imperative Shell” pattern has never seemed practical. But the use cases I’ve come across may be highly unusual.

So that’s something I can only say people should try out for themselves, and see what works for them, in their own application.