
This is part 2 of my small series about cleaning up large code bases. Once we have our team together and everyone is determined to clean up that mess, where do we start? Time for some planned refactoring!

There are usually two possible courses of action if we want to clean up our application code: rewriting and refactoring. If the code base is large enough, rewriting is not an option. We may be wiser for having written our application once, but writing it a second time will still take a lot of time.

Remember that we’re talking about millions of lines of code here. Besides taking time, writing from scratch will also introduce new bugs. Users tend not to tolerate bugs popping up in parts of an application that have been stable for years.

So there is basically only one option left: Refactor the code.

Refactoring large applications: Where and how to start

Let me quote Wikipedia about refactoring:

Code refactoring is the process of restructuring existing computer code […] without changing its external behavior.

“Great,” some might say, “so you want to work without getting anything visible done? Who’s going to pay for that?” They have a point. While refactoring, we are not adding functionality. To justify our work (and to get paid), we need to deliver value in some other way.

Planned refactoring

Usually the argument for refactoring is that making the code more readable and maintainable pays off in the future. We still have to make sure that the time we invest in our code actually does pay off. If we don’t do the right refactoring in the right place, we can burn days, weeks, even months without a measurable impact on the overall maintainability of our application. Therefore it is crucial that we don’t just open our IDE and hack away at the first file we find. We have to plan where to start and what to focus on.

Obviously, if code is good enough, we should not start to refactor it. “Good enough” can be a rather low bar. We need to find the areas that are written really badly. But even if code is an unmaintainable, disgusting mess, we do not necessarily need to improve it. If it works and never needs to be changed, just leave it alone. Maintainability is only worth investing time in if someone actually has to maintain that specific piece of code.

Since we’re at the start of our mission to clean up the code base, we might still have to convince management that they get something for the time we invest. Go for the low-hanging fruit. If there is bad code and you can’t figure out how it works, put it aside. If it’s impossible to test properly, leave it for now. In a large code base with years of legacy code rot, you will likely find easier targets to improve.

Find the hot spots

To get the most value out of your refactoring, invest the time in the areas that benefit most from more maintainability. Code tends to have hot spots of activity, where most of the work happens during your usual development.

Finding those hot spots can be done simply by asking your coworkers. Everyone knows a few places in the code that they frequently have to touch and that are a pure mess. Often the files and classes that have to be altered frequently are also the ones that really need improvement. A file that has endured hundreds or thousands of edits has had many more opportunities to get ugly than a file with only five commits.

If you get too many suggestions about where to start, or want a more scientific approach to finding the hot spots, there’s good news: Adam Tornhill has written a book on exactly that topic, “Your Code as a Crime Scene”. If you search for that title, you’ll also find his conference talks on YouTube. Use the data in your version control system to find out where the most, and the largest, commits have happened in the past.

Focus on a goal

Even if you concentrate on one or two hot spots, there is probably still more than enough to do. Determine the main pain points in the code you are about to refactor. Then pick a single goal you want to achieve, and stick to it.

Don’t get sidetracked. Let’s pick an example: we are refactoring a god class to break it apart into several smaller classes. While we are at it, it’s tempting to also fix all those const-correctness errors and naming issues, and to switch all those old-school pointer parameters to references. Don’t go down that road. It’s too easy to get lost.

There are lots of possible goals. Improving class design is only one of them. Bringing your code under test is another, and an important one: we can’t reliably refactor code that is not covered by tests. Another important goal might be a shorter compile time. Often a lot of code depends on the hot spot classes. In that case, even small refactorings can trigger the recompilation of large parts of the application, and a single refactor-compile-test-commit cycle can take a very long time.
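Bringing legacy code under test before touching it often starts with a *characterization test*: a test that records what the code currently does, quirks included, rather than what it was supposed to do. The sketch below assumes a hypothetical legacy function `legacyShippingCents`; in practice its expected values would be obtained by running the existing code. Plain `assert` keeps the example self-contained; a real project would more likely use a test framework such as GoogleTest or Catch2.

```cpp
#include <cassert>

// Hypothetical legacy function we want to refactor. Nobody remembers the
// exact rules, so the current code is the only specification we have.
int legacyShippingCents(int weightGrams, bool express) {
    int cents = 499;
    if (weightGrams > 2000) {
        // Integer division: partial kilograms above 2 kg are not charged.
        // Possibly a bug, but customers may already rely on it.
        cents += (weightGrams - 2000) / 1000 * 150;
    }
    if (express) {
        cents = cents * 2 + 100;
    }
    return cents;
}

// Characterization test: pins down what the code does *today*. The expected
// values come from running the unmodified function, so any refactored
// version has to reproduce them exactly, quirks included.
void characterizeLegacyShippingCents() {
    assert(legacyShippingCents(1500, false) == 499);
    assert(legacyShippingCents(4500, false) == 799);
    assert(legacyShippingCents(2999, false) == 499);  // the integer-division quirk
    assert(legacyShippingCents(1500, true) == 1098);
}
```

Once these checks pass against the untouched code, the function can be restructured. If a refactoring changes any of these results, the test fails and signals that external behavior has changed, which by definition is no longer a refactoring.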

Refactoring time management

The goals we pick for our refactorings should be manageable. If a goal seems too big, try to split it into smaller goals. For example, instead of splitting that large class into all its responsibilities at once, factor out one responsibility at a time. That way, a single refactoring task can be done in less time, fits into a sprint, and leaves room for others who have to work on the class.
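Factoring out one responsibility at a time could look like the following sketch. The god class `OrderManager`, the extracted `PriceCalculator`, and the discount rule are all hypothetical. In this step only the pricing logic moves into its own class; `OrderManager` keeps its public interface and delegates, so callers compile unchanged, while the confirmation-text responsibility is deliberately left for a later, separate step.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Item {
    std::string name;
    int priceCents;
    int quantity;
};

// Extracted in this step: the pricing responsibility, and nothing else.
class PriceCalculator {
public:
    int subtotal(const std::vector<Item>& items) const {
        int sum = 0;
        for (const auto& item : items) {
            sum += item.priceCents * item.quantity;
        }
        return sum;
    }

    int total(const std::vector<Item>& items) const {
        const int sub = subtotal(items);
        return sub >= 10000 ? sub * 95 / 100 : sub;  // 5% off orders of 100.00 or more
    }
};

// The former god class keeps its public interface, so callers do not change.
// It now delegates pricing instead of implementing it inline.
class OrderManager {
public:
    void addItem(Item item) { items_.push_back(std::move(item)); }

    int total() const { return pricing_.total(items_); }

    // Still here on purpose: to be extracted in a later, separate step.
    std::string confirmationText() const {
        return "Order with " + std::to_string(items_.size()) +
               " item(s), total " + std::to_string(total()) + " cents";
    }

private:
    std::vector<Item> items_;
    PriceCalculator pricing_;
};
```

Because the public interface stays the same, the existing tests for `OrderManager` can serve as the safety net for this step, and the new `PriceCalculator` can get its own, more focused tests.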

It is very likely that others will be affected by the refactorings. After all, it’s a hot spot you’re working on. The daily business of fixing bugs and adding new functionality will very likely touch the very same spots you want to refactor. It is one thing to work with a class that looks very different because someone refactored it last week. It is quite another to implement new features in a class that keeps changing while you are working on it, or to merge your changes into a class that has been changed in the meantime.

For that reason it is good practice to separate maintenance and refactoring as much as possible. If you can, plan dedicated refactoring sprints. If not, make clear to the team that it is not a good idea to add functionality to class X from Monday through Wednesday because you are taking it apart.

Conclusion

Refactoring a large application is a huge task. Like any huge task, it should be done with a plan: what to do, where to do it and when to do it. And remember that this is a team game. The planned refactoring should be done as a team, or the team should at least be aware of the details of your refactoring plans.

3 Comments

Krzysztof Kirsz
8 years ago

How did this planned refactoring work for you? I have some mixed feelings. In my experience, those planned refactorings have never been successful. Several things can fail.
The most spectacular was when, after a lot of negotiations with stakeholders, we got the green light to do the big refactoring, and then, in the middle of the work, the managers changed their minds and made us stop and go back to implementing new features. As a result, the code was arguably left in an even worse state than before the half-finished refactoring. But I guess that was also the effect of bad planning.
Another thing I see is that developers tend to neglect the effort to write clean code, knowing that in a month or two there will be some big refactoring project, so they can fix everything then.

I feel like the only way to have reasonably clean code is by convincing the team to apply the boy-scout rule on a daily basis. This means: “while you make a new feature, try fixing something else a bit”. It doesn’t force anyone to do a lot; just one improved variable name at a time makes a difference in the long run.

Btw, I work on a 4M+ LOC project, so I totally get the problems you describe here.

1. Arne Mertz
  8 years ago
  
  The crucial part really is to break those refactoring packages down into achievable chunks. Not doing so means committing to spend too much time on a single step, with a larger margin of error in the estimate, larger risk and so on.
  It also helps with communicating progress to management: getting a tiny bit of work done every day or two is perceived very differently than working on the same huge task for weeks, even if the actual work is the same. Management is more likely to pull out of a situation where they don’t see progress, especially if the concept of cleaning up a code base is new to them.
  
  I actually had some unexpected success in a situation where management had me stop in the midst of a larger refactoring: many in the team did not practice or even know the concept of the Boy Scout Rule. After the interrupted refactoring, I wrote a mail explaining which parts I had completed and which I had not gotten around to, and that those latter parts had left an ugly mess in some places. I asked them to look out for those places in the future whenever they were working on affected code, and to fix them (I explained how). They actually started to look out not only for these specific issues but also for other problems, and fixed them as they went.


Large C++ Legacy Applications: Planned Refactoring