In the last two weeks I have already written about legacy applications, about how it has to be a team effort and how to plan the refactoring. This week’s post is about the tests we need, about modularizing our code base and rewriting instead of refactoring parts of it.
In order to make refactoring a safe thing and not risk breaking our software everywhere, we need tests. The better our test coverage, the safer we are applying all kinds of changes to our code.
The problem is that to write good unit tests that deal with small parts of our application, we need good modularization. Without it, testing a single class can easily require us to set up tons of dependencies, including database connections and other infrastructure that makes our tests not only slow but also a pain to write.
Modularization is typically one of the first things that goes overboard when a code base grows unchecked for several years. We may still have modules or packages by name, but they all depend on each other, and the reality is that we have a Big Ball of Mud architecture.
So what we need to get good modularization is to refactor our architecture.
What we need for refactoring are tests.
What we need for tests… You get the dilemma.
Start big, start small
So obviously we cannot bring small units under test. What we can do is bring larger units under test. If there is nothing smaller, start with the whole application. Testing the big picture is better than testing nothing at all.
We also should not start with the big and risky refactorings, because those are more likely to break something. We can, however, apply small, provably correct steps: invert the condition of an if statement and swap the then/else blocks, that kind of thing. These steps won’t get you very far, and you had better have them double-checked by a reviewer, but they are better than nothing.
We can use these small steps to break the huge pieces into big pieces, the big pieces into smaller modules, the modules into units. Whenever we have separated a piece of code from the rest, we can write faster and more detailed tests for it.
This can seem like a slow and tedious process – and it sure can be. The problem is that not doing it will eventually make the actual maintenance of our software an even slower and more tedious process.
Rewriting instead of refactoring
I know, in the last post I wrote that rewriting the whole application is usually not an option. The risks involved and the investment needed are both too high. But what about rewriting a single component, a handful of classes?
If the component is not too big, the investment may be manageable. If it is exceptionally messy or badly designed, rewriting may incur a lower cost than refactoring and give a better result. In such cases rewriting can be worth the higher risks.
In some cases, rewriting certain parts may eventually be unavoidable. This can be the case if we have to get rid of the dependency on a specific framework that is used throughout the component, or if the component is essentially just a wrapper around that framework.
Cons of rewriting
The higher risk of rewriting lies especially in the errors we can make. The old code in legacy applications is usually time-tested. It may be badly designed, messy code, but it mostly works. Writing something equivalent from scratch gives us plenty of opportunity to mess it up somewhere.
To make sure that the old and new code do the same thing, we can define a common API for both versions and test the old and new component through that API. To do so, the old component has to be decoupled from the rest of the application first. This is necessary anyway – we have to decouple it to a certain degree if we want to get rid of it.
Another thing to consider is that writing the new version of the component takes time. During that time we have to maintain the old component, i.e. we need to fix bugs and add new features in both versions until the new component can take over. This double maintenance effort has to be taken into account when planning the rewrite.
Pros of rewriting
Starting from scratch can be motivating after having crawled through the swamps of legacy code for years. We can use modern techniques and clean code practices from the start. The only thing we are bound to is the interface the new version shares with the old version of the component.
One of the most rewarding experiences I had in recent years was such a rewrite: I designed a DSL framework (with an embedded and an external version) to replace a lot of historically grown spaghetti code. Since I started from scratch, I could develop the framework using TDD.
A switch in front of the old code enabled us to route calls to the functionality to the new code gradually. In the end we could replace thousands of lines of code with DSL code about 10-15% of the size. With the external DSL we even had the possibility to change program behavior without having to recompile the application.
Both securing our refactoring efforts with tests and rewriting whole components have the modularization of our code base as a prerequisite. Identifying possible modules and properly separating them from the rest of the application is crucial in legacy code bases. We simply cannot deal with a multi-million-line code base all at once.
Next week I’ll get back to something more C++ specific. The last installment of this series will be about the tooling available to us to deal with large legacy applications.
I would add: it’s totally worth equipping yourself with coverage data and writing tests against that coverage data.
I feel like people don’t measure enough which code paths their new tests are really exercising. This way you discover tests that barely scratch the surface, or opportunities to remove redundant tests and improve compile/run times.
BTW, muted on Twitter?
thanks for the comment. It is indeed good to have some coverage data instead of just having a “good feeling” about the test coverage. And no, I haven’t muted you on Twitter 🙂
I wrote an article on the test suite property you are describing. https://www.madetech.com/blog/semantically-stable-test-suites
The key thing here is that it is hard to make test suites semantically stable using test coverage alone, you need something more.