In many legacy code bases we encounter functions that get their parameters passed by plain pointers. Often those pointers are expected to be not null. In this post I am going to discuss why that is a bad practice and how such code can be refactored.
A plain pointer found in legacy code can have different semantics. It can carry ownership of the object it points to. However, in that case it should be changed to be a smart pointer, to explicitly state the ownership in the code and to prevent exception safety issues.
In this post I am going to assume that such issues have been taken care of and that any plain pointer found does not have ownership, so what remains is whether the pointer may be null or not.
Difference of Pointer Versus Reference
There are only two real differences between pointers and references in C++:
- Pointers can be null, references can not.
- Pointers can be redirected to point to another object, which is not possible with references.
These two points are the cause for some other differences: References must be initialized with the object they have to refer to, because of the second point. References are automatically dereferenced, which is not possible for pointers because of the first point and because the dereferencing of a null pointer would cause undefined behavior. The different access via the dot operator as compared to the arrow operator is just syntactic sugar that clarifies that fact.
Yet another difference is the slightly different behavior of `dynamic_cast`: because references can not be null, the cast throws an exception in case of failure when applied to a reference, while it returns a null pointer when applied to a pointer. For further information about `dynamic_cast` read my post about casts.
Pointers as Function Parameters
A plain pointer passed to a function usually means that the function should somehow use the object. Changing the pointer to refer to another object does not make much sense, so the only useful difference of a plain pointer and a reference as parameters is that the pointer may refer to an optional value that is not always present, i.e. it might be null.
In such a case, the plain pointer is the right thing to use, and the function should check for null unless the argument is just passed on to another function. The null case should be handled correctly and have a meaning. Just throwing an error or doing nothing is usually useless. If a function does not work with null pointers, the parameter should be a reference instead to safe a possibly needless check for null.
Pointers are often passed through several functions where each function tests for null, so the same pointer gets checked multiple times. Changing a parameter to reference and thus giving the responsibility to check for null out of the function can therefore reduce the count of such checks greatly in a code base.
If a function parameter can or may not be null, use a reference instead of a pointer.
Plain Pointers Stored Inside Classes
Sometimes plain pointers get stored inside a class, either as single pointers or in a container. Usually that is the case when it is clear that the objects whose pointers get stored outlive the objects that store the pointers, otherwise some kind of smart pointer should be considered to avoid dangling pointers.
There are several issues to be considered fur such a pointer storage. They affect the methods that accept new pointers to be stored, methods that return stored pointers and methods that work on those stored pointers internally.
The interface of such methods should be defined and documented in a way that clarifies if and when null pointers are accepted and/or returned. If null pointers are not possible or allowed, accept and return references and convert them to and from the stored pointers internally.
A special case are functions that search for an object and possibly fail. It is a design decision whether such a function returns a pointer that is possibly null, or if it returns a reference and throws an exception in case of failure. Both ways can be reasonable, and the decision often depends on whether it is normal for such a function to fail.
A consistent and well documented interface of such a class can greatly facilitate the reasoning about the possibility of null pointers in code that uses or gets used by the class.
Refactoring From Pointer to Reference
If I find a function that has a pointer as parameter that can or may not be null, I perform a series of steps to refactor it to accept a reference instead:
First change the function signature from pointer to reference. Of course, the constness of the pointer should be preserved, or if possible . After that, find any use of the parameter in the body and change the member access from `->` to `.`, or add an address of operator, where the pointer value was used, e.g. in function calls. The function should compile now.
Remove any null checks of the parameter, they are not needed any more. However, make a note what happened when the original pointer was null, if it affects the callers (throw an exception, return an error code etc.).
Compile all code. The compiler will tell exactly where the function gets called with a pointer. Fix any call site by dereferencing the pointer. If needed, add a check for null first. In that check, add the error handling you found in the original function.
Do not further refactor the call sites for now. If they need refactoring, take a note to return later. It is better to refactor one function at a time instead of starting multiple refactorings at different sites.
Return to the original function and find the spots where the parameter you changed is used. If it is passed to other functions, check if they could use the same refactoring. Put the ones that need refactoring on the top of the list, i.e. refactor the callees before the callers.
Compile, run your tests. Clean up any mess you find in the original function. Test again, check in.
Such a refactoring can be done in little time, depending on the size of the code base and the number of call sites. If you stick to refactor only one function at a time, it is relatively save, even if the test coverage of the function is not good. I do such refactorings in down times, e.g. when I have to wait for some automated task to be finished.