Making Yourself at Home in Legacy Code, Part 1: Options

Have you ever come across a legacy code base with dark and dusty corners that have been untouched for years? Many of us have, because green field development of shiny new applications is much rarer than one would think.

So what can you do if you get dropped into a pile of old code that has been there since the beginning of times (at least as far as the version control system can tell)? Here is what options you have in my experience. 

The First Reaction

It’s your first day on the job. At least the first serious day, which means you have installed most of your development environment, got a little introduction in how the architecture of the project is (read: how it once was meant to be) and changed your desktop background. Now it’s time to get your hands on the code and fix a little bug.

You open the source file that your colleague told you is likely to contain the bug and are instantly paralyzed. The code looks so… ancient. The style is “C with classes” form the 90s. Plain pointers everywhere. Handwritten linked lists and other absurdities. Nothing looks like the “modern C++” you know from books. A little sob escapes your mouth.

I Can Fix This!

The little bug is all but forgotten. With a mess like that, it’s a wonder there are no bigger problems, right? If you clean up the code, everything will be better, and probably the bug will just go away. So you start a huge refactoring session and clean up the complete class, and a the functions that use it.

You find a small unit test suite that looks like it belongs to the library you are working in, run it and it passes all tests. You start the complete test suite for the project, just to be sure, and since it runs for over an hour you go home, it’s been a long day.

The next day you come back to work and the there are lots of failed tests in the suite. You try to launch the program but it won’t start. It’s completely broken. You ask the colleague that sent you to that class in the first place, but he can’t help you. The code does not look familiar to him any more. So you roll it all back and start again.

Take it Slow

Rushing in and ripping it all apart usually does not help. It rather breaks things. If code has not been touched for a decade then there are reasons for that. It might either be that the code is stable and just did and does not need to change, or it is fragile and touching it will probably break it.

In the latter case the code might be a known source for trouble. Everybody knows it’s there, and that it’s not a beautiful piece of code, but nobody has ever had the time to fix it. Whenever someone passes it during a debugging session he speeds up a little just to get past its ugliness.

So before you start to refactor something, you should get to know it as it is. You can’t fix it if you don’t know how broken it really is. You can’t assert it still does the right thing the right way, if you don’t truly know what the right thing and the right way are. So at first, just fix that little bug. By doing so, you get to know the code a little better.

Approaching the Code Again

So after you have taken some time to get to know the code, after you know what it does and how it does it, what are your options?

Giving in

This is the easiest option. Once you know the code and how it works, there seems no need to change it. You know it’s a source for trouble, You know it’s there, and that it’s not a beautiful piece of code, but there is so much else to do and no time to fix it. Whenever you pass it during a debugging session you just speed up a little…

Sounds familiar? Right. You just have become part of the problem. Giving in is just surrender. You loose the drive to change anything. In fact you are likely to port that poor style to new code you write in the vincinity of the old. I often have seen bad, seemingly old code that was in fact very new, but copied and pasted from other older places without much care.

Don’t go that way. You as the new team member are the most likely candidate to turn things to the better. If the code looks so old and there is no plan to change it, your teammates have probably already given up. If you do, too, the next new member will have an even harder time to change things, because now there is one person more to give a kick in the butt.

Little Changes

This option is also easy at first, but it needs some patience. You don’t do the big changes, but whenever you come across a little imperfection, you fix it. Not all day long, but for each thirty minutes of work you spend five minutes in doing some small, straight forward fixes.

Such fixes can be of many different kinds. Factor out a function if someone spent ten lines of code to produce a string value in a twenty line function. Rename a variable of function. Convert a plain pointer to a smart pointer or make it a stack variable. Change a function parameter from pointer to reference.

Such things don’t seem to have much impact in the big picture. But small things matter. The cleaner code is easier to understand and when the day comes to make the big changes, each bit of cleanliness and understandability counts.

The other positive thing about small changes is that they are good for your own wellbeing. You just feel a little better if you cleaned up a bit of the big mess instead of just passing by, looking away. And if you come back, you see small signs of “I have been here” in the code, sings that not all is lost. Signs that keep you from giving up.

If you repeatedly make such small changes to the parts of the code you most often have to touch or debug through, it really feels like make yourself at home in the foreign code, like decorating your new flat. Just keep in mind, you don’t live here alone, your colleagues live here as well, so don’t go overboard.

If you don’t get sloppy and don’t get carried away by making the changes too big, you can get by without unit testing the small steps. Of course, if there is a unit test suite covering the code you change, it is mandatory to run it. And if there is a unit test suite and you can get your changes under test easily, you should do so.

Bigger Refactorings

Sometimes small steps just don’t cut it. Sometimes a whole class or set of classes is just designed badly in the first place or the needs have changed and the original design is not appropriate anymore. In such a case a bigger refactoring is needed.

Bigger refactorings are a topic on its own that I will cover in the next post. Don’t get started just yet. Get to know the code and your colleagues first, you will need their support.

Previous Post
Next Post

9 Comments



  1. Good post. I felt like reading my own diary 😉

    @Gregory: Refactoring is not something you do for your own wellbeing. And it is even more important in the code written ( and read ) by many people. If fact the more people see the code, the bigger the cost of bad code gets. Imagine big project with one 100-line function full of void pointers, casting and what not, that 100 developers debug once a week + 1 of them introduces a fix/new feature from time to time. That will cost each of them some time that can sum up to some big money lost in the long run ( not mentioning the number of bugs the bad code generates ). Technical debt is a serious issue.

    Reply


  2. I’ve witnessed “cosmetic” refactorings, fixing non broken things, that introduced e.g. temporary copies yes.

    So somehow, replacing “legacy cruft” with “newest C++isms” isn’t always and immediately synonymous for “more quality, more maintainable code and less bugs”.

    I’m not saying you’re entirely wrong but:

    “The other positive thing about small changes is that they are good for your own wellbeing. ” <– That's where I start to disagree. That may be true in a personal code base where after all it's ok if code is an end and not a mean. Otherwise, maybe you're just personalizing (erm "improving") some code that really didn't need it.

    Reply

    1. I am not advocating messing with code that does not need it. But since this blog is about writing simpler to read, maintainable and clean code, it considers unnecessarily hard to read or easily breakable old code less valuable than “perzonalized” code with “new C++isms” if that is simpler. So hard-to read code does need a touch up, no matter whether it is old or new.

      And I do so “in the real world”, too. Even if I introduce some temporary copies. At least I do so 90% of the time, because it is much more valuable to have a clear to read class than quenching a few CPU cycles out of a function that gets executed only a few times per hour.

      C++ code can be very performant, and it often can be made even more performant. But that does not mean it has to be as performant as possible in parts of a program where performance simply does not really matter.

      Reply

        1. I am sorry, I did not want to give the intention I was calling you a perf zealot. I was pointing at those people out there who would sacrifice anything to get some nanoseconds out of unimportant code.

          My approach is the exact opposite: I would luckily sacrifice a few milliseconds for ease of maintenance as long as it does not render the program unresponsive or hurt overall performance of critical parts too much.

          Reply

  3. To sum up, after few weeks, you fixed a little bug (the one you were assigned to) and you refactored everything because?

    Congratulations, now the code looks nice (the most subjective metric). But it doesn’t necessarily do anything more or better. I assume you took your profiler of choice and made sure the new shiny code performs equally well right? Right ? 😉

    Reply

    1. Erm, no. After a few days on my first job I tried to refactor a class instead of just fixing the bug, and it did not go well. I learned the lesson and rolled back the changes like I described in the story above.

      Often refactoring is not about “looking nice” but about maintainability. About fixing things that are not a observable bug in program behavior but weaknesses in the code, about making the code easier to understand and harder to break.

      The small steps mentioned in this post are usually not affecting performance in any measurable way. Mostly because they are not changing anything in terms of algorithms and data structures, but also because most of the old crufty legacy code tends to be not in the performancne sensitive parts of a program.

      So no, I don’t start a profiler whenever I change a variable name or factor out a little function in performance outback. Do you?

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *