I recently got an email asking me how I approach an unknown code base. Here’s my answer.
It’s a common situation we face when we join a new project. We rarely do greenfield development and start from scratch. Instead, we are presented with an existing code base and need to find our way through it. For large code bases, this can be quite a big task. Where should we begin to look? Which parts of the code base are the most important?
Find someone to answer your questions
No matter how clear the code and how good the documentation, it is always beneficial to have someone who has worked on that code base. They may or may not still be on the team or in the organization, but it’s always good to know someone to ask if everything else fails.
That does not mean that asking them should always be your last resort. If they’re still on the team, it’s usually a good idea to work closely with them for some time. Do some pair programming, or discuss with them how they approach typical tasks.
In some cases, only one person knows about the code you are going to work on, and that person may even have left the company. It may be hard to get hold of them, and they may not have much time to help you. Still, if what they left behind is an unintelligible mess of code with no documentation or other clues, it’s their fault, and they have an obligation (moral, professional, sometimes even legal) to make up for it.
Get to know the problem domain
This one seems to be overlooked fairly often. In order to understand unknown code, you have to understand what the software does – or is supposed to do. In order to do that, you have to understand the problem the software solves.
You obviously don’t need to become a complete domain expert up front, but some basic knowledge will go a long way toward understanding what is happening in your code base. Playing around a bit with the software can give us a feeling for what it’s about.
Study the documentation
True, the cliché says developers don’t like to write documentation. Still, someone usually did the work and wrote something down.
User documentation may give you additional information about the problem domain, but there should also be some architecture documentation giving you the big picture of how the software is structured. Ideally, someone has written a specific onboarding document for new team members.
Besides the obvious documents, there are other artifacts that can give us more insight into the software. For example, issue trackers can be a great source of information, especially if we can filter for features that were implemented in the past.
A good suite of automated tests will show how parts of the code, and the program as a whole, are supposed to be used. System tests should cover typical use cases, integration tests can show which components exist and how to use them, and unit tests should document how individual classes and functions work.
Writing tests can also be a good way to get to know the code base. If you think you know how some piece of code behaves but don’t see it documented anywhere, consider writing an automated test that verifies your assumption. That way you not only gain insight into the behavior, you also learn how to call the functionality. As a bonus, you have just contributed to the stability of the software.
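As an illustration, here is a minimal sketch of such a test in Python, using the standard unittest module. The function normalize_customer_id is purely hypothetical: it stands in for some piece of existing code whose behavior you believe you understand. In a real project you would import it from the code base instead of defining it next to the test.

```python
import unittest


# Hypothetical stand-in for an existing function in the unknown code base.
# In a real project, import it instead of defining it here.
def normalize_customer_id(raw: str) -> str:
    return raw.strip().upper().zfill(8)


class NormalizeCustomerIdBehavior(unittest.TestCase):
    """Each test pins down one assumption about what the existing code does."""

    def test_pads_short_ids_with_leading_zeros(self):
        self.assertEqual(normalize_customer_id("42"), "00000042")

    def test_strips_whitespace_and_uppercases(self):
        self.assertEqual(normalize_customer_id("  ab12 "), "0000AB12")

    def test_leaves_long_ids_untouched(self):
        self.assertEqual(normalize_customer_id("ABCDEFGHIJ"), "ABCDEFGHIJ")
```

The test names state the assumptions. If a test fails, you have learned something about the code; if it passes, the assumption is now documented and checked on every run. Run it with a test runner such as `python -m unittest`.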
The unknown code base itself
With what we know from the documentation, our domain knowledge, and the larger-scale tests, we can have a look at the directories and source files. The directory structure should reflect the software architecture, while some of the files should have class names related to domain concepts. That way, we should be able to spot the most important source files and packages.
Debugging can also help: Debug a manual run of the application or one of the system tests. Function names can give you a sense of their importance, so you know which calls to skip and where to dig deeper. Debugging a typical use case will usually bring you to the source files that are most important. Take your time to look around to get to know what happens where.
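If stepping through a debugger feels too slow, a complementary trick is to record which functions a use case calls. The sketch below assumes Python and uses the standard sys.setprofile hook; checkout, load_order, and total_price are hypothetical stand-ins for code from the unknown code base. Other languages need their own tooling, so treat this as an illustration of the idea rather than a general tool.

```python
import sys
from collections import Counter


def trace_calls(entry_point, *args, **kwargs):
    """Run entry_point(*args, **kwargs) and count the Python functions it calls.

    Returns (result, calls), where calls maps "module.function" to a call count.
    Calls into C code (built-ins such as sum) are not recorded.
    """
    calls = Counter()

    def profiler(frame, event, arg):
        # The "call" event fires whenever a Python-level function is entered.
        if event == "call":
            module = frame.f_globals.get("__name__", "?")
            calls[f"{module}.{frame.f_code.co_name}"] += 1

    sys.setprofile(profiler)
    try:
        result = entry_point(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always uninstall the hook, even on exceptions
    return result, calls


# Hypothetical use case standing in for code from the unknown code base.
def load_order(order_id):
    return {"id": order_id, "items": [3, 4]}


def total_price(order):
    return sum(order["items"])


def checkout(order_id):
    return total_price(load_order(order_id))


result, calls = trace_calls(checkout, 17)
for name, count in calls.most_common():
    print(f"{count:5d}  {name}")
```

On a real use case the list gets long quickly; sorting by call count and skimming the names is a cheap way to decide where the debugger session should start.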
We can use statistical profilers to get a feeling for which parts of the code base are most important, e.g. via flame graphs. Profile a few typical use cases from the set of system tests to get a representative sample.
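Sampling profilers periodically record the current call stack. Flame graph tools such as Brendan Gregg's FlameGraph scripts consume these samples in a "folded" text format: one line per distinct stack, frames separated by semicolons, followed by how often that stack was seen. The following Python sketch shows that aggregation, plus the share of samples in which each function appears. The sample stacks are made up for illustration; real ones would come from your profiler.

```python
from collections import Counter
from typing import Iterable, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> list[str]:
    """Aggregate stack samples (outermost frame first) into folded lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def inclusive_share(samples: Sequence[Sequence[str]]) -> dict[str, float]:
    """Fraction of samples in which each function appears anywhere on the stack."""
    seen = Counter()
    for stack in samples:
        for frame in set(stack):  # count recursive frames once per sample
            seen[frame] += 1
    return {frame: n / len(samples) for frame, n in seen.most_common()}


# Made-up samples; a real profiler would produce thousands of these.
samples = [
    ("main", "handle_request", "parse_input"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "query_db"),
    ("main", "render"),
]
for line in fold_stacks(samples):
    print(line)
print(inclusive_share(samples))
```

Functions with a high inclusive share sit on many sampled stacks, which is exactly what the wide bars of a flame graph show.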
In addition, looking at the source code history can show us where most of the work has been done. Those parts are usually the most interesting ones for new developers, since we are likely to have to change them frequently as well. We can find those hot spots by treating our code like a crime scene.
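One simple form of this analysis, popularized by Adam Tornhill's book "Your Code as a Crime Scene", is counting how often each file has changed. Here is a minimal Python sketch, assuming the project uses Git: it runs `git log --format=format: --name-only`, which prints the files touched by each commit with blank lines in between, and counts occurrences per file. The function names are our own.

```python
import subprocess
from collections import Counter


def change_frequencies(log_output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the output of `git log --format=format: --name-only`:
    one file path per line, with blank lines between commits.
    """
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def hot_spots(repo_path: str = ".", top: int = 10) -> list[tuple[str, int]]:
    """Run git in repo_path and return the most frequently changed files."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=format:", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return change_frequencies(log).most_common(top)


# Parsing a small, made-up log; on a real repository call hot_spots("path/to/repo").
sample_log = "src/order.cpp\nsrc/price.cpp\n\nsrc/order.cpp\n"
print(change_frequencies(sample_log).most_common(1))  # [('src/order.cpp', 2)]
```

Change frequency alone is a rough measure. A common refinement is to weigh it by file size or complexity, so that large, frequently changed files rise to the top.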
Dealing with unknown code can be difficult, and it can be hard to know where to start, but there are lots of things we can do to get to know it better.