Contents
I’ll try something new today: I pick a piece of code from the web and see what improvements I would make to it, using small refactoring steps.
I got across the code on Twitter: Joshua Ogunyinka asked about the safety of the deletion in the CompoundStatement
destructor. He posted the code on ideone, but as far as I can see it is a simplification of part of his “MaryLang” compiler. You can find the project on GitHub.
Please note: this means the code is taken out of context. It may be simplified to an extend that makes some constructs seem unnecessary complex, so I might oversimplify in my refactorings. In addition, it is a single text, the original would be separated into at least a header with the definitions and a main.cpp
.
Follow the steps on GitHub
I put the code on GitHub and commited every single step, as I would have in a real refactoring session. The single commits may feel very small sometimes, but larger commits can mean that you have to repeat a lot of work if you go down a wrong path. With a better test coverage I would probably have been bolder, but it is better to be safe than sorry.
The original code
Here is the original code from IdeOne, except that I changed the indentation to two spaces instead of four and put the opening curly braces of class and function definitions on the same line as I usually do on this blog.
#include <iostream> #include <vector> #include <memory> template<typename T> struct List { List(): _list() {} virtual ~List() {} inline void Append( T const * t ) { _list.push_back( t ); } typedef typename std::vector<T const *>::const_iterator const_iterator; inline const_iterator cbegin() const { return _list.cbegin(); } inline const_iterator cend() const { return _list.cend(); } private: std::vector< T const * > _list; }; // struct List struct DoubleWord { DoubleWord( double c ): c_( c ){} double c_; }; struct Word { Word( int i ): i_( i ) {} int i_; }; std::ostream & operator<<( std::ostream &os, Word const & t ) { return os << t.i_ << " "; } std::ostream & operator<<( std::ostream &os, DoubleWord const & t ) { return os << t.c_ << " "; } struct Statement { virtual void Analyze() const = 0; Statement(){} virtual ~Statement(){} }; struct YetAnotherStatement: Statement { inline void Analyze() const final { std::cout << t << std::endl; } YetAnotherStatement( int i ): t{ ( double ) i * ( 10.6 / 0.7 ) } {} DoubleWord t; }; struct OtherStatement: Statement { inline void Analyze() const final { std::cout << t << std::endl; } OtherStatement( int i ): t{ i } {} Word t; }; struct CompoundStatement: Statement, List<Statement> { CompoundStatement(): Statement(), List(){} ~CompoundStatement(){ for( auto b = cbegin(), d = cend(); b != d; ++b ) delete const_cast<Statement *>( *b ); } void Analyze() const final { for( auto b = this->cbegin(); b != this->cend(); ++b ){ (*b)->Analyze(); } } }; struct Declaration { Declaration( Statement const * const s ): s_( s ){} inline void Analyze(){ s_->Analyze(); } Statement const * const s_; }; int main() { auto s = std::make_unique<CompoundStatement>(); for( int i = 1; i <= 10; ++i ){ if( i % 2 == 0 ) s->Append( new OtherStatement( i ) ); else s->Append( new YetAnotherStatement( i ) ); } Statement const * const p_s = s.get(); Declaration d( p_s ); d.Analyze(); return 0; }
A light start
For the start I like to skim the code to see if I see any obvious trivial things that can be simplified. That is nothing I would do to a large code base at once, because it just takes a lot of time and does only marginally affect the code, i.e. the big problems, if there are any, remain untouched. However, if I am to work on a specific small subset of source code it is a good start to get familiar with the code and make life a little easier later.
Wrappers
At first sight, the two structs Word
and DoubleWord
seem to make not much sense. The may be remainders of more complex structures or placeholders for something more complex in the original code. However, they serve no visible purpose here, so I just replace any occurrence by the wrapped types int
and double
, respectively. The wrapper classes including the stream operators can be removed.
Constructors and destructors
Right on the first class template List
, we see a default constructor that is explicitly implemented to do nothing, i.e. we should use the keyword default
. The same goes for the destructor. Since that one is virtual, we can not leave it away. That means we should also have a look at the move and copy operations.
List
contains only a vector
, which is fully copy/moveable, so we can default all special members there. Statement
is empty, so it is obvious what the default does and it is sensible to loosen the reigns of the rule a bit and only default the virtual destructor. For all other classes except CompoundStatement
the Rule of Zero applies, they need not be changed.
CompoundStatement
itself has a nontrivial destructor due to the fact that it manages the lifetimes of the List
elements. If we look closer it becomes apparent that were we to copy a CompoundStatement
with a nonempty List
, the pointers in that list would get copied as well and deleted twice eventually.
The move constructor will work, but not the move assignment since the old contents will not be deleted and therefore leak. So default and move constructor can be defaulted, the rest has to be deleted, except of course the nontrivial destructor.
Single line blocks
Blocks that consist of a single line, e.g. of function bodies and for loops, should be wrapped in their own curly braces and put on their own line. Putting things on their own line visibly separates the two separate parts of the loop – the header and the loop body. Adding the braces even on one-liners prevent errors that arise from adding more lines to the apparent block without adding the braces then.
This is somewhat a matter of taste and coding style, but many style guides stick at least to the own line for loop bodies. Most people seem to favor the separation over terseness.
inline
In the past, the keyword inline
has been a hint to the compiler that it might try to inline a function. Modern compilers usually ignore it completely and it is only used to obey the One Definition Rule. In other words, use it only if you feel the need to define non-template functions outside a class definition.
In this code, all the functions declared inline
are defined inside a class definition, which means that they are already implicitly declared inline. Therefore the explicit inline
is superfluous and we should simply remove it.
private vs. public:
The member variables of Declaration
and all subclasses of Statement
are public. This does not seem to be necessary, and since the classes are more than plain data containers their members should be private. In fact I like to distinguish classes versus data structures by using the keywords class
and struct
accordingly, but I will leave those as they are in this case.
Another case is the List
base of CompoundStatement
which is in fact more a data member than a base class, so I should make it private, too. However, the main()
function calls Append
, so it’s not that trivial. This misuse of inheritance will be the next thing to go.
Here is the code we have now:
#include <iostream> #include <vector> #include <memory> template<typename T> struct List { List() = default; List(List const&) = default; List(List&&) = default; virtual ~List() = default; List& operator=(List const&) = default; List& operator=(List&&) = default; void Append( T const * t ) { _list.push_back( t ); } typedef typename std::vector<T const *>::const_iterator const_iterator; const_iterator cbegin() const { return _list.cbegin(); } const_iterator cend() const { return _list.cend(); } private: std::vector< T const * > _list; }; // struct List struct Statement { virtual void Analyze() const = 0; virtual ~Statement() = default; }; struct YetAnotherStatement: Statement { void Analyze() const final { std::cout << t << std::endl; } YetAnotherStatement( int i ): t{ ( double ) i * ( 10.6 / 0.7 ) } {} private: double t; }; struct OtherStatement: Statement { void Analyze() const final { std::cout << t << std::endl; } OtherStatement( int i ): t{ i } {} private: int t; }; struct CompoundStatement: Statement, List<Statement> { CompoundStatement() = default; CompoundStatement(CompoundStatement&&) = default; CompoundStatement(CompoundStatement const&) = delete; CompoundStatement& operator=(CompoundStatement const&) = delete; CompoundStatement& operator=(CompoundStatement&&) = delete; ~CompoundStatement(){ for ( auto b = cbegin(), d = cend(); b != d; ++b ) { delete const_cast<Statement *>( *b ); } } void Analyze() const final { for ( auto b = this->cbegin(); b != this->cend(); ++b ) { (*b)->Analyze(); } } }; struct Declaration { Declaration( Statement const * const s ): s_( s ){} void Analyze() { s_->Analyze(); } private: Statement const * const s_; }; int main() { auto s = std::make_unique<CompoundStatement>(); for ( int i = 1; i <= 10; ++i ) { if( i % 2 == 0 ) { s->Append( new OtherStatement( i ) ); } else { s->Append( new YetAnotherStatement( i ) ); } } Statement const * const p_s = s.get(); Declaration d( p_s ); d.Analyze(); return 0; }
A first impression
After we have gone through the code for the first time, what have we learned about it? We have a generic container class called List
. It contains a std::vector
which makes it’s naming rather odd, so we’ll have a closer look at it later.
We have a little class hierarchy of Statement
s, with two trivial concrete classes and a little more complex CompoundStatement
. The trivial classes seem to be there for testing and example purposes only, at least that is the impression I get from the identical use of std::cout
and their naming.
We have the CompoundStatement
on our list for refactoring next, since it seems to have some issues with the ownership management of the container elements. The Declaration
, as it is shown here, seems to be only some sort of container or handle for a single Statement
. We will touch it briefly as we go through the code a second time in more detail.
The main()
function seems to be just an example of intended use of the classes, I won’t pick too much on it. In addition, it is the only thing that can be used as test – I used it to check that the refactored code still compiles and does not change its behavior.
Refactoring CompoundStatement
CompoundStatement
looks odd enough to be the next point on our list: Multiple inheritance including a container is dubious, and the manual management in the destructor should be fixed by some RAII class.
Fixing the inheritance
Fixing the inheritance is relatively easy. There is no need for it, we can as well use composition, which should be preferred over inheritance. Replacing the public inheritance with a private data member breaks the compilation:
- The compiler complains about the calls to `cbegin()` and `cend()` in the destructor and the `Analyze()` method. They are no longer inherited, so we have to call them on the new member.
- The `Append()` method which is called from outside is no longer inherited, so we have to write a method that simply routes the call through to the new member.
struct CompoundStatement: Statement { // constructors etc... ~CompoundStatement(){ for ( auto b = _statements.cbegin(), d = _statements.cend(); b != d; ++b ) { delete const_cast<Statement *>( *b ); } } void Analyze() const final { for ( auto b = _statements.cbegin(); b != _statements.cend(); ++b ) { (*b)->Analyze(); } } void Append(Statement const* statement) { _statements.Append(statement); } private: List<Statement> _statements; };
Fix the for loops
The for loops beg to be replaced by a range based for. However, the interface of List
is somewhat minimal, so that is not possible. However, before we jump in and augment it with the needed begin()
and end()
methods, let’s have a closer look at List
– we had that one on our list anyways.
As it turns out, List
is only a wrapper around std::vector
. It is not very intuitive, since for once, we kind of know what a list is from the standard library – and that is not vector
. In addition, a List<X>
is in fact a vector
of pointers to X
, so that fact is obfuscated via the template parameter, too.
When I first looked at the destructor of CompoundStatement
I thought “how can this even compile when he calls delete
on Statement
, that’s not a pointer?”. Don’t mislead your readers like that.
The only thing about List
that made it more than just a vector
was the virtual destructor. However, it is not needed any more, since we don’t derive from List
any more. We didn’t need it back then either, because we did not destroy CompoundStatement
via a List
pointer.
Now, we dismantled List
alltogether. There is no need for it any more after we have replaced the inheritance with composition. So, we can just replace the List
member of CompoundStatement
with the vector
that it is and then we are free to use range based for loops. The List
template itself can be removed completely.
struct CompoundStatement: Statement { // constructors etc. ~CompoundStatement(){ for ( auto&& b : _statements ) { delete const_cast<Statement *>( b ); } } void Analyze() const final { for ( auto&& b : _statements ) { b->Analyze(); } } void Append(Statement const* statement) { _statements.push_back(statement); } private: std::vector<Statement const*> _statements; };
Use RAII
We said we wanted to get rid of the manual memory management in the destructor of CompoundStatement
. We also have the copy constructor and assignment operators delete
d because the compiler generated versions would have lead to leaks and double deletes.
The solution to dilemmas like that usually are RAII classes. For memory management that means we should use smart pointers. It is clear from the implementation of the destructor that CompundStatement
takes full ownership of the Statement
s we append, so the right class to use would be unique_ptr
.
After we replace the vector<Statement const*>
with a vector<unique_ptr<Statement const>>
we can obey the Rule of Zero and remove all constructors, the destructor and the assignment operations from the class:
- The generated destructor will destroy the `vector`, which in turn will destroy every `unique_ptr`, deleting the `Statement`s in the process.
- The generated move assingment will now do the right thing, cleaning up the `Statement`s in the target before the move. No more leaks.
- The copy constructor and copy assignment will still be deleted because the compiler can not generate them due to the deleted `unique_ptr` copy operations.
The only thing that is left to do for this refactoring is converting the raw pointer we take as parameter for Append()
to a unique_ptr
. This has to be done explicitly, and it brings us right to a code smell.
Take ownership explicitly
The parameter of Append()
is a raw pointer. That interface does not make it clear, that CompundStatement
takes unique ownership. From all we can tell from the interface, we could do something like this:
OtherStatement statement{22}; CompoundStatement compound; compound.Append(&statement); compound.Append(&statement);
Have you ever tried to delete a stack based object, twice? Don’t.
To fix this just fix the interface of the Append()
method by explicitly demanding that any client pass it a unique_ptr
. It will also make the implementation of that method much more natural. Doing that will enable us to use make_unique
instead of new
in the main()
function – so in addition to the clearer interface, we also get some exception safety for free. Great!
struct CompoundStatement: Statement { void Analyze() const final { for ( auto&& b : _statements ) { b->Analyze(); } } void Append(std::unique_ptr<Statement const> statement) { _statements.push_back(std::move(statement)); } private: std::vector<std::unique_ptr<Statement const>> _statements; }; int main() { auto s = std::make_unique<CompoundStatement>(); for ( int i = 1; i <= 10; ++i ) { if( i % 2 == 0 ) { s->Append( std::make_unique<OtherStatement>( i ) ); } else { s->Append( std::make_unique<YetAnotherStatement>( i ) ); } } Statement const * const p_s = s.get(); Declaration d( p_s ); d.Analyze(); return 0; }
What is left
There still are a few issues left. One of it is naming: b
, t
and s_
are pretty poor names. The Declaration
taking a pointer as constructor parameter and using it before any check for null is another. The main()
function and most of its content looks rather unpleasant. However, much of this is owed to the example nature of the code and is not an issue in the original sources.
For this post, I wanted to concentrate on the CompoundStatement
and the issues with the List
template. Those were the core classes of this code snippet. We simplified one of them and got completely rid of the other, so we can be content for now.
There is one thing I really like about the original code: The use of final
is something that can give us some more certainty about the correctness of our code, yet I haven’t seen it used too often in real code.
I have to leave a word on test here: The modifications made were fairly simple, and they were done in small steps that we could reason about. For anything more complex we should have brought our code under test first. That main()
function dies not count; it was enough to see if the main use case compiled but not more.
Here is the complete refactored code:
#include <iostream> #include <vector> #include <memory> struct Statement { virtual void Analyze() const = 0; virtual ~Statement() = default; }; struct YetAnotherStatement: Statement { void Analyze() const final { std::cout << t << std::endl; } YetAnotherStatement( int i ): t{ ( double ) i * ( 10.6 / 0.7 ) } {} private: double t; }; struct OtherStatement: Statement { void Analyze() const final { std::cout << t << std::endl; } OtherStatement( int i ): t{ i } {} private: int t; }; struct CompoundStatement: Statement { void Analyze() const final { for ( auto&& b : _statements ) { b->Analyze(); } } void Append(std::unique_ptr<Statement const> statement) { _statements.push_back(std::move(statement)); } private: std::vector<std::unique_ptr<Statement const>> _statements; }; struct Declaration { Declaration( Statement const * const s ): s_( s ){} void Analyze() { s_->Analyze(); } private: Statement const * const s_; }; int main() { auto s = std::make_unique<CompoundStatement>(); for ( int i = 1; i <= 10; ++i ) { if( i % 2 == 0 ) { s->Append( std::make_unique<OtherStatement>( i ) ); } else { s->Append( std::make_unique<YetAnotherStatement>( i ) ); } } Statement const * const p_s = s.get(); Declaration d( p_s ); d.Analyze(); return 0; }
Conclusion
This was a first try to provide a new kind of posts for my blog. After over 70 posts about clean C++ and similar topics with made up examples, I thought it would be good to show some examples on (more or less) “real world” code.
I’d like to do more of this in the future, but I need some help: Please leave a comment what you think about this format. I would also be grateful if you point me to some open source code that you think would be a good candidate for the next refactoring session.
Permalink
Permalink
Hi! Great post, as always. I really like what you did to the original code. I would like to suggest one more thing that can be done:
Statement const * const s_;
This variable in Declaration is a little bit confusing for me, but maybe that’s just because I’m not used to see “const pointers to const”. Don’t you think it would be better to use “const &” for this purpose? It conveys “I don’t own it and I don’t want to change it” intent more clearly in my opinion. Then, Declaration constructor would be:
Declaration( Statement const & s ): s_( s ){}
and in the main():
Declaration d{ *s };
Permalink
Hi Krzysztof, thanks for your comment.
As I mentioned in the “What is left” section, the Declaration class and that statement pointer would definitely be on my todo list.
A reference would be my choice in that case, too.
Permalink
Hi,
This is a very nice post. Something we should be doing more often.
In the process of refactoring, you had a step where List is copiable (and movable) and has a virtual destructor. A virtual destructor is usually a sign a public inheritance and we know this doesn’t mix well with copying (well, assignment actually) as we are asking to get hit by slicing.
In other words, I would have tried to get rid of inheritance sooner and for other reasons.
BTW, I’d also make Statement non copyable.
Something else that was fishy (and that may deserve discussion with the “OP”), is that List was just a vector of pointers with no responsibility over the pointers. In a pre-C++11 world, I would have enforced the responsibility and used something like a boost::ptr_vector. But still, I wonder why writing this wrapper. In case it’s used elsewhere, it may be useful to provide something — I know some people can’t get used to SL interfaces, the capitalized Append may be a sign of that or something else entirely. It’s easy to simplify this part in a blog post context. On a concrete project where the wrapper may be used dozen of times, fixing List design may be less trivial.
Permalink
Hi Arne!
I think you chose is a very good small example.
What is your stand on using more STL algorithms, even it doesn’t reduce the numbers of characters?
In
CompoundStatement::Analyze
I’m tempted to writestd::for_each(_statements.cbegin(), _statements.cend(), std::mem_fn(&Statement::Analyze));
because it gets rid of all the type and reference concerns like: Do i use
auto&&
orconst auto&
.Permalink
Hi, thanks for your question!
Algorithms can make code clearer, because you give a known name to whatever you are doing to a range of objects. In that sense,
for_each
is not a real algorithm but a mere helper function.Since we have range based for I would usually not use
for_each
on complete containers. It remains useful when we have only a partial range. I’d also refrain to use things likestd::mem_fn(&Statement::Analyze)
– it’s just ugly to read.There should be no type and reference concerns – just use
auto&&
by default. See also my post on range based for loops.Permalink
Hi,
I am just a new reader of your blog and read about the last 10 articles. So this post does not come as suprising to me as it might feel for you, having written over 70 articles.
Also, I am rather new to C++. I started learning it over 10 years ago but worked on an unrelated but time-consuming job since then. Now I want to “refresh” and extend my memory with modern C++.
Your article is a nice read because it gives some more substance to the theory gained from books. Especially the use of unique_ptr to do RAII gave me some more solid undertanding. From the books it all feels rather abstract. Now to see it in your post, it is starting to make more sense.
I know that “doing” is worth more than “reading” in general but reading something practical is also worth more than reading something theoretical.
Anyway, I would look forward to an article like this once in a while.
Regards
Permalink
Good post!
although refactoring examples are probably the hardest to write (because you have to make big problems fit on a page) they are also very important for people learning to code better.
A few notes of taste:
Wrapping basic types can be quite helpful in statically enforcing correctness. At the end of the day nothing is an int, everything is a something 😉 Although this is probably not a very good example because there are no function that otherwise take 8 ints and there are no special rules (e.g. pointer – pointer is offset, offset + offset is offset, pointer + offset is pointer, pointer + pointer is error etc.)
Rather than making everything derive from Statement why not make a statement “interface” wrapper a la concept based polymorphism. This elides problems like slicing, multiple inheritance etc. and make writing a new statement (of which I assume there are many) super easy, basically anything that is regular and has an analyze() member function is a statement.
Permalink
Hi Odin, thanks for your thoughts!
I agree that refactoring examples are not too easy to write – I used a rather small piece of code, left a lot of things unresolved and still the post is longer than most of my other posts.
As for the wrappers, I agree that they can help when it comes to correctness. In this example I did not see much of a benefit, because they were used only internally in the statement classes but not in their interfaces. In addition, the statement classes themselves don’t do much more than stitching the data to the
Analyze()
method, and the wrapper’s names are pretty generic. Therefore I don’t see much of a benefit for the additional code complexity.I don’t see how the example, especially the
CompoundStatement
could be handled with static polymorphism. A container of arbitrary size with polymorphic elements is a standard use case for dynamic polymorphism. Slicing should not be an issue here since the statements are created on the free store and passed around as pointers or references.Permalink
God forbid someone make a local copy of one of those references 😉 In other words you have to know that its a reference to a polymorphic base class, otherwise it may bite you (ok the chance is small, its always small but small is not 0%). I hate having to know stuff, it makes me look stupid when I don’t.
Permalink
Great post.
Looking at a how code can be refactored is IMHO a great learning tool, so if this kind of post is an experiment, I hope you’ll do it more frequently in the future.
Just a minor nitpick: at the end of the “Constructors and destructors” paragraph, you say “So default and move constructor as well as the destructor can be defaulted, the rest has to be deleted.”, but you (correctly) noted a few lines above that the destructor is far from being trivial, so I think that “..as well as the destructor…” should be removed.
Permalink
Thanks for nitpicking 😉 I fixed that passage.
Also thanks for the encouragement. I will see to doing this more often. However, it is more time consuming than writing a normal blog post, and it is not easy to pick an interesting piece of code to refactor. On the other hand it’s also not always easy to come up with new topics for normal blog posts.