Have you ever worked with a suite of unit tests that took an hour or so to run? I have. And I have not. Because they were only called “unit tests” but most were integration and system tests. So what was the problem?
The Situation
The whole suite we called our “unit test” suite had a little less than 3000 test cases. Some were sufficiently fast, a few milliseconds each, and some took several minutes. A few took an extremely long time to run, up to half an hour.
There was another problem: Some of the tests were dependent on others. There would be times when a test would fail, simply because another test had or had not been executed before. This is clearly not what you would expect from something called a unit test.
Problem Sources
There was at least one big problem with most of those tests. They were testing parts of a program that was not modularized. Those parts were extremely tightly coupled to other parts of the program, so it was next to impossible to test them in isolation. Many of those tests were designed to be unit tests and even looked like good unit tests, testing only a single class. The problem was hidden dependencies.
Most of the business logic was implemented in classes that would access a whole army of singletons which in turn depended on other singletons. There was a huge pile of global state that got modified during the tests. That was the cause for the dependency between different tests.
The reason for some of the long test run times was the same: hidden dependencies. Some of the singletons were tightly coupled to a system configuration database. And because database access is slow, the tests were slow.
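To illustrate how such a hidden dependency couples tests to each other, here is a minimal sketch. The `Settings` singleton, `formatPrice`, and the two test functions are hypothetical names made up for this example, not code from the project described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical singleton holding global state, standing in for the
// "army of singletons" described above.
class Settings {
public:
    static Settings& instance() {
        static Settings s;
        return s;
    }
    std::string currency = "EUR";

private:
    Settings() = default;
};

// The signature mentions only an amount. The dependency on Settings
// is hidden inside the function body.
std::string formatPrice(int amount) {
    return std::to_string(amount) + " " + Settings::instance().currency;
}

// This test modifies the global state and does not restore it.
bool testDollarFormatting() {
    Settings::instance().currency = "USD";
    return formatPrice(5) == "5 USD";
}

// This test silently assumes the default currency, so it only passes
// as long as testDollarFormatting has not run before it.
bool testDefaultFormatting() {
    return formatPrice(5) == "5 EUR";
}

// Runs the same test before and after the state-changing one.
void demonstrateOrderDependency() {
    assert(testDefaultFormatting());   // passes on fresh state
    assert(testDollarFormatting());    // passes, but leaves "USD" behind
    assert(!testDefaultFormatting());  // same test, now failing
}
```

Neither test looks wrong on its own, which is what makes this kind of failure so hard to track down in a large suite.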
However, the longest running tests were not unit tests and had never been planned to be unit tests. They were simply consistency tests for the configuration database and had nothing to do with unit tests in the first place. It was probably just convenient to put them into one of the test suites, because if your tests run forty minutes anyway, it does not matter if you add five more minutes, right?
Consequences
When you have long running unit test suites like those described above, the consequences are easy to imagine. You won’t run the test suites as often. Maybe you run them once a day. Maybe you pick a part of the suite that you think is enough to cover most of the classes you touched during the day. But mostly you will wait until the next morning to see whether the unit tests run during the nightly build have found something.
The next consequence is that you don’t write unit tests any more. Because why should you? They can’t help you much, because the unit test suite is not really usable. The consequence is that bugs get introduced needlessly, because de facto you don’t have any real unit tests.
What is a Unit Test?
So if all those tests are not unit tests, what are they? And what is a unit test? If you write tests to a unit test framework you can call them unit tests, right? Well, you can call them “unit tests”, but that does not make them good unit tests. And if you tell people you do unit testing with a suite like the one described above, prepare to see them very disappointed once they see it.
Tests that take a long time to run, depend on each other and test large parts of the program and how they work together are called system tests and integration tests. They are as important as unit tests, but they are used differently and therefore should be separated into different test suites.
There are many conceptions out there about what a unit test is and what it is not. There is no single definition, so I will give you a few points about my idea of unit tests, derived from many different sources on the matter.
A unit test’s purpose is to frequently prove that you haven’t broken anything with your code changes.
That means, they don’t test your configuration files or how your libraries play together, they test the code you write, and they do so as often as possible. That has a few implications:
- Unit tests have to be fast. Some sources talk about running your unit tests every two or three minutes or so. If your tests take up a majority of those two minutes, you are not going to be very productive. Usually unit tests should complete in a matter of milliseconds.
- Unit tests should cover everything that is likely to break, but not more. Adding a test for a setter method that can’t be broken by a sane programmer is just a time burden, so don’t do it.
- Each unit test should test only one thing. That may be a single function or class, or a small group of classes that work together. In some cases, it might even be a whole library, if that whole library is necessary to do a single thing (but remember the time constraints). This is not only about the Single Responsibility Principle, but also about knowing what is broken without having to enter a lengthy debugging session if the test fails.
There are two keys to achieving these goals: clean and simple setup and teardown functions for your test cases, and mock classes that replace the parts of the program you are not interested in.
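As a rough sketch of the setup and teardown part, assuming a plain C++ test without any particular framework: a small fixture object can do the setup in its constructor and the teardown in its destructor, so every test leaves the program state as it found it. The global `g_logLevel` and the `LogLevelFixture` class are hypothetical names chosen for this illustration.

```cpp
#include <cassert>

// Hypothetical piece of global state that a test needs to change.
int g_logLevel = 0;

// Setup happens in the constructor, teardown in the destructor.
class LogLevelFixture {
public:
    explicit LogLevelFixture(int level) : saved_(g_logLevel) {
        g_logLevel = level;
    }
    ~LogLevelFixture() { g_logLevel = saved_; }

    LogLevelFixture(const LogLevelFixture&) = delete;
    LogLevelFixture& operator=(const LogLevelFixture&) = delete;

private:
    int saved_;  // value to restore during teardown
};

void testVerboseMode() {
    LogLevelFixture fixture{3};  // setup
    assert(g_logLevel == 3);
}                                // teardown runs here, even on early return

void testDefaultMode() {
    // Holds regardless of whether testVerboseMode ran first.
    assert(g_logLevel == 0);
}
```

Most unit test frameworks offer their own fixture mechanism for this purpose. The point is only that the cleanup is automatic and cannot be forgotten in a single test.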
With mock classes you can control the inputs a class or function gets from its dependencies, and you can do so reliably. They decouple the class under test from the rest of the program. No waiting on slow I/O, no dependencies on other classes that might be the source of a test failure, no debugging through layer after layer of code in search of the problem.
To actually be able to replace parts of your program with mock classes, the program has to be well modularized. In addition, your individual classes have to be well defined, so that you can test a single piece of functionality and mock the rest away.
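A minimal sketch of that idea, using a hand-written mock instead of a mocking framework: the class under test receives its dependency through an interface, so a test can hand it a fast in-memory replacement. All names here (`ConfigSource`, `RetryPolicy`, `MockConfigSource`) are hypothetical and only serve the illustration.

```cpp
#include <cassert>

// Hypothetical interface for a slow dependency, e.g. a configuration database.
class ConfigSource {
public:
    virtual ~ConfigSource() = default;
    virtual int maxRetries() const = 0;
};

// Class under test. The dependency is passed in explicitly
// instead of being fetched from a singleton.
class RetryPolicy {
public:
    explicit RetryPolicy(const ConfigSource& config) : config_(config) {}

    bool shouldRetry(int attempt) const {
        return attempt < config_.maxRetries();
    }

private:
    const ConfigSource& config_;
};

// Hand-written mock: returns a value chosen by the test and counts the calls.
class MockConfigSource : public ConfigSource {
public:
    explicit MockConfigSource(int maxRetries) : maxRetries_(maxRetries) {}

    int maxRetries() const override {
        ++calls_;
        return maxRetries_;
    }
    int calls() const { return calls_; }

private:
    int maxRetries_;
    mutable int calls_ = 0;
};

void testRetryPolicyStopsAtConfiguredLimit() {
    MockConfigSource config{3};  // no database, no I/O
    RetryPolicy policy{config};

    assert(policy.shouldRetry(2));
    assert(!policy.shouldRetry(3));
    assert(config.calls() == 2);  // queried once per decision
}
```

The test controls the only input `RetryPolicy` gets from its dependency, so a failure can only point to `RetryPolicy` itself.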
A good modularization and clean design are crucial not only for maintainability, but also for testability and therefore for stability.
So, will you write some articles about how to write good unit tests in C++?
Yes, that’s one of the topics I’d like to cover, since good unit tests are part of the clean code philosophy 🙂
I would just add one more point to the UT characteristics:
4. They should be written in such a way that the unit tests themselves can be printed out and act as documentation for your class. And there should be as few comments in the UT code as possible (preferably 0). That implies well written and easy to read tests that actually say what they test and what the expectations are.
Hi Kris, thanks for the addition. I totally agree with your point, but in this post I was only going for plain usability of unit tests.
I want to write another post about the documentation aspect, because that has several implications for how you structure tests and which test frameworks you will want to use. That’s where your point will come in.