After having to clean up some unusual #include techniques, I've put together some advice on how not to use #include – and how to use it.
Last week I set out to clean up a header I found in one of the projects of the team I work with. The particular code base is relatively old and has its warts and peculiarities. In other words, it’s legacy code, historically – or hysterically – grown over the years.
The header in question had a lot of includes – over 20 – at its end, after a few enums and typedefs. After some analysis of the source code, the following picture emerged: The project contains about 300 source files and the corresponding headers, distributed over a handful of directories.
+ utilities
| + include
| | + some_util.h
| | + someother_util.h
| | + ...
| + some_util.cpp
| + someother_util.cpp
| + utilities.h
+ ...
+ ...
The utilities.h header was the one with the many includes: it #included everything in the utilities/include directory. The other directories had a similar structure, each with a “master header” that would #include everything in the corresponding include directory – plus the utilities.h header.
All the source files would basically #include stdafx.h and the master header of their own directory. Only occasionally, when something from a directory other than utilities was needed, would they also #include that directory’s master header. Since each single class header would be #included only once, into the master header, they would not even need include guards. Only the master headers had include guards.
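To make the pattern concrete, here is a hedged reconstruction of what such a master header looked like (the file names are my invention; this sketch is not compilable on its own, since it only shows the layout):

```cpp
// utilities.h – hypothetical "master header" for the utilities directory.
// A few enums and typedefs come first ...
enum SomeEnum { kFirst, kSecond };
typedef unsigned long SomeId;

// ... and then, at the end, every header in utilities/include,
// listed in dependency order, none of them with include guards:
#include "include/some_util.h"
#include "include/someother_util.h"
// ... 20+ more #includes
```

Every some_util.cpp in the directory would then #include stdafx.h and utilities.h, and nothing else.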
What’s wrong with that?
At first glance, this sounds very convenient. If we add a new class to one of the directories, just #include the header into the master header and we can use it everywhere in the directory. We can also use everything in that directory in our new class since we just included the master header in its source.
However, a bunch of problems come with this technique. I won’t go into the (inconsistent) use of separate “include” directories, because that’s mostly a matter of taste and convention. And of typing something like
#include "../../utilities/include/some_util.h" a lot while cleaning up.
Imagine we add a new class definition that depends on a header that is #included at the end of the master header. We cannot simply #include that other header in the header of our new class, because it has no include guard. It would also break the pattern described above. Instead, we have to #include the new header in the master header below the one it depends on.
Then we change another class that is #included at the top of our master header to depend on the new header. That’s a problem – we now have to shuffle around all the #includes until the dependencies are ordered correctly again. Maybe we introduce a few forward declarations in the process to break cyclic dependencies that have emerged. The whole process is needlessly cumbersome. And no, include guards alone will not fix it, we still have to order our #includes in the master header.
It seriously inflates compile times.
With the pattern above, every source file #includes its master header, and through that all the other headers in the directory. In addition, there’s a very good chance that one of those #includes the master header of the utilities and at least one other master header. The bottom line is that every source file transitively #includes every single header in the project. And it makes hardly any difference that the precompiled header #includes one of the master headers.
All those headers contain thousands of lines of code that have to be parsed and compiled, even if the functions defined in the source file don’t use those classes. By replacing only three or four of the master headers with the actually needed #includes, we could reduce the full build time of the project from 15 minutes to under 4 minutes. There’s still a lot of potential to reduce that further.
There are almost no incremental builds in this scenario
Imagine we change some code in this project. Unless the change is restricted to source files, it will affect every translation unit. It won’t change the behavior or the generated code of most of them, but since the headers we touched are transitively #included everywhere, the build system will recompile everything. 15 minutes of compile time for another attribute of a class that is used in one single place. That’s a lot of coffee.
Don’t get fancy when it comes to #includes. Use the common pattern that has proven to work well:
* Use an include guard in every single header
* #include only the headers that contain definitions you use
* #include all the headers that contain definitions you use – don’t rely on transitive #includes
In my next post, I will go further into reducing compile time dependencies to speed up compile times.
It is not all that obvious; in fact, having larger (but common – including a common order – and included into the precompiled header(!)) headers can allow for faster compile times than having ad-hoc includes for each .cpp (which are unique and so have to be re-compiled every time). Of course, keeping only stdafx.h as precompiled (which is IIRC the default behaviour) won’t allow it, but hey – it is not a requirement to stop after stdafx :-); from what you said, extending the precompiled header to cover the utilities master header could be a good thing for them to try.
In general, I’d say it should be a balance of both (BTW, on the opposite side of the spectrum is having each function in a separate header, which is about as silly as an include-everything policy).
P.S. I was always wondering – how do people manage to get those atrocious compile times? 😉
Nice article. One can use the static analyzer include-what-you-use (https://include-what-you-use.org) to help enforce these suggestions.