Refactoring Session #1: Statements, Lists and Inheritance

I’ll try something new today: I pick a piece of code from the web and see what improvements I would make to it, using small refactoring steps.

I got across the code on Twitter: Joshua Ogunyinka asked about the safety of the deletion in the `CompoundStatement` destructor. He posted the code on ideone, but as far as I can see it is a simplification of part of his “MaryLang” compiler. You can find the project on GitHub.

Please note: this means the code is taken out of context. It may be simplified to an extend that makes some constructs seem unnecessary complex, so I might oversimplify in my refactorings. In addition, it is a single text, the original would be separated into at least a header with the definitions and a `main.cpp`.

Follow the steps on GitHub

I put the code on GitHub and commited every single step, as I would have in a real refactoring session. The single commits may feel very small sometimes, but larger commits can mean that you have to repeat a lot of work if you go down a wrong path. With a better test coverage I would probably have been bolder, but it is better to be safe than sorry.

The original code

Here is the original code from IdeOne, except that I changed the indentation to two spaces instead of four and put the opening curly braces of class and function definitions on the same line as I usually do on this blog.

#include <iostream>
#include <vector>
#include <memory>
 
template<typename T>
struct List {
  List(): _list() {}
  virtual ~List() {}
 
  inline void Append( T const * t ) { _list.push_back( t ); }
  typedef typename std::vector<T const *>::const_iterator  const_iterator;
 
  inline const_iterator cbegin() const { return _list.cbegin(); }
  inline const_iterator cend() const { return _list.cend(); }
 
private:
  std::vector< T const * > _list;
}; // struct List

struct DoubleWord {
  DoubleWord( double c ): c_( c ){}
  double c_;
};
 
struct Word {
  Word( int i ): i_( i ) {}
  int i_;
};
 
std::ostream & operator<<( std::ostream &os, Word const & t ) {
  return os << t.i_ << " ";
}
 
std::ostream & operator<<( std::ostream &os, DoubleWord const & t ) {
  return os << t.c_ << " ";
}
 
struct Statement {
  virtual void Analyze() const = 0;
  Statement(){}
  virtual ~Statement(){}
};
 
struct YetAnotherStatement: Statement {
  inline void Analyze() const final { std::cout << t << std::endl; }
  YetAnotherStatement( int i ): t{ ( double ) i * ( 10.6 / 0.7 ) } {}
  DoubleWord t;
};
 
struct OtherStatement: Statement {
  inline void Analyze() const final { std::cout << t << std::endl; }
  OtherStatement( int i ): t{ i } {}
  Word t;
};
 
struct CompoundStatement: Statement, List<Statement> {
  CompoundStatement(): Statement(), List(){}
  ~CompoundStatement(){
    for( auto b = cbegin(), d = cend(); b != d; ++b ) delete const_cast<Statement *>( *b );
  }
  void Analyze() const final {
    for( auto b = this->cbegin(); b != this->cend(); ++b ){
      (*b)->Analyze();
    }
  }
};
 
struct Declaration {
  Declaration( Statement const * const s ): s_( s ){}
  inline void Analyze(){ s_->Analyze(); }
  Statement const * const s_;
};
 
int main() {
  auto s = std::make_unique<CompoundStatement>();
  for( int i = 1; i <= 10; ++i ){
    if( i % 2 == 0 ) s->Append( new OtherStatement( i ) );
    else s->Append( new YetAnotherStatement( i ) );
  }
  Statement const * const p_s = s.get();
  Declaration d( p_s );
  d.Analyze();

  return 0;
}

A light start

For the start I like to skim the code to see if I see any obvious trivial things that can be simplified. That is nothing I would do to a large code base at once, because it just takes a lot of time and does only marginally affect the code, i.e. the big problems, if there are any, remain untouched. However, if I am to work on a specific small subset of source code it is a good start to get familiar with the code and make life a little easier later.

Wrappers

At first sight, the two structs `Word` and `DoubleWord` seem to make not much sense. The may be remainders of more complex structures or placeholders for something more complex in the original code. However, they serve no visible purpose here, so I just replace any occurrence by the wrapped types `int` and `double`, respectively. The wrapper classes including the stream operators can be removed.

Constructors and destructors

Right on the first class template `List`, we see a default constructor that is explicitly implemented to do nothing, i.e. we should use the keyword `default`. The same goes for the destructor. Since that one is virtual, we can not leave it away. That means we should also have a look at the move and copy operations.

`List` contains only a `vector`, which is fully copy/moveable, so we can default all special members there. `Statement` is empty, so it is obvious what the default does and it is sensible to loosen the reigns of the rule a bit and only default the virtual destructor. For all other classes except `CompoundStatement` the Rule of Zero applies, they need not be changed.

`CompoundStatement` itself has a nontrivial destructor due to the fact that it manages the lifetimes of the `List` elements. If we look closer it becomes apparent that were we to copy a `CompoundStatement` with a nonempty `List`, the pointers in that list would get copied as well and deleted twice eventually.

The move constructor will work, but not the move assignment since the old contents will not be deleted and therefore leak. So default and move constructor can be defaulted, the rest has to be deleted, except of course the nontrivial destructor.

Single line blocks

Blocks that consist of a single line, e.g. of function bodies and for loops, should be wrapped in their own curly braces and put on their own line. Putting things on their own line visibly separates the two separate parts of the loop – the header and the loop body. Adding the braces even on one-liners prevent errors that arise from adding more lines to the apparent block without adding the braces then.

This is somewhat a matter of taste and coding style, but many style guides stick at least to the own line for loop bodies. Most people seem to favor the separation over terseness.

inline

In the past, the keyword `inline` has been a hint to the compiler that it might try to inline a function. Modern compilers usually ignore it completely and it is only used to obey the One Definition Rule. In other words, use it only if you feel the need to define non-template functions outside a class definition.

In this code, all the functions declared `inline` are defined inside a class definition, which means that they are already implicitly declared inline. Therefore the explicit `inline` is superfluous and we should simply remove it.

private vs. public:

The member variables of `Declaration` and all subclasses of `Statement` are public. This does not seem to be necessary, and since the classes are more than plain data containers their members should be private. In fact I like to distinguish classes versus data structures by using the keywords `class` and `struct` accordingly, but I will leave those as they are in this case.

Another case is the `List` base of `CompoundStatement` which is in fact more a data member than a base class, so I should make it private, too. However, the `main()` function calls `Append`, so it’s not that trivial. This misuse of inheritance will be the next thing to go.

Here is the code we have now:

#include <iostream>
#include <vector>
#include <memory>
 
template<typename T>
struct List {
  List() = default;
  List(List const&) = default;
  List(List&&) = default;
  virtual ~List() = default;

  List& operator=(List const&) = default;
  List& operator=(List&&) = default;
 
  void Append( T const * t ) { 
    _list.push_back( t ); 
  }
  typedef typename std::vector<T const *>::const_iterator  const_iterator;
 
  const_iterator cbegin() const { 
    return _list.cbegin(); 
  }
  const_iterator cend() const { 
    return _list.cend(); 
  }
 
private:
  std::vector< T const * > _list;
}; // struct List

struct Statement {
  virtual void Analyze() const = 0;
  
  virtual ~Statement() = default;
};
 
struct YetAnotherStatement: Statement {
  void Analyze() const final { 
    std::cout << t << std::endl; 
  }
  YetAnotherStatement( int i ): t{ ( double ) i * ( 10.6 / 0.7 ) } {}
private:
  double t;
};
 
struct OtherStatement: Statement {
  void Analyze() const final { 
    std::cout << t << std::endl; 
  }
  OtherStatement( int i ): t{ i } {}
private:
  int t;
};
 
struct CompoundStatement: Statement, List<Statement> {
  CompoundStatement() = default;
  CompoundStatement(CompoundStatement&&) = default; 

  CompoundStatement(CompoundStatement const&) = delete; 
  CompoundStatement& operator=(CompoundStatement const&) = delete;
  CompoundStatement& operator=(CompoundStatement&&) = delete;
  
  ~CompoundStatement(){
    for ( auto b = cbegin(), d = cend(); b != d; ++b ) {
      delete const_cast<Statement *>( *b );
    }
  }
  
  void Analyze() const final {
    for ( auto b = this->cbegin(); b != this->cend(); ++b ) {
      (*b)->Analyze();
    }
  }
};
 
struct Declaration {
  Declaration( Statement const * const s ): s_( s ){}
  void Analyze() { 
    s_->Analyze(); 
  }
private:
  Statement const * const s_;
};
 
int main() {
  auto s = std::make_unique<CompoundStatement>();
  for ( int i = 1; i <= 10; ++i ) {
    if( i % 2 == 0 ) {
      s->Append( new OtherStatement( i ) );
    } else {
      s->Append( new YetAnotherStatement( i ) );
    }
  }
  Statement const * const p_s = s.get();
  Declaration d( p_s );
  d.Analyze();

  return 0;
}

A first impression

After we have gone through the code for the first time, what have we learned about it? We have a generic container class called `List`. It contains a `std::vector` which makes it’s naming rather odd, so we’ll have a closer look at it later.

We have a little class hierarchy of `Statement`s, with two trivial concrete classes and a little more complex `CompoundStatement`. The trivial classes seem to be there for testing and example purposes only, at least that is the impression I get from the identical use of `std::cout` and their naming.

We have the `CompoundStatement` on our list for refactoring next, since it seems to have some issues with the ownership management of the container elements. The `Declaration`, as it is shown here, seems to be only some sort of container or handle for a single `Statement`. We will touch it briefly as we go through the code a second time in more detail.

The `main()` function seems to be just an example of intended use of the classes, I won’t pick too much on it. In addition, it is the only thing that can be used as test – I used it to check that the refactored code still compiles and does not change its behavior.

Refactoring CompoundStatement

`CompoundStatement` looks odd enough to be the next point on our list: Multiple inheritance including a container is dubious, and the manual management in the destructor should be fixed by some RAII class.

Fixing the inheritance

Fixing the inheritance is relatively easy. There is no need for it, we can as well use composition, which should be preferred over inheritance. Replacing the public inheritance with a private data member breaks the compilation:

  • The compiler complains about the calls to `cbegin()` and `cend()` in the destructor and the `Analyze()` method. They are no longer inherited, so we have to call them on the new member.
  • The `Append()` method which is called from outside is no longer inherited, so we have to write a method that simply routes the call through to the new member.
struct CompoundStatement: Statement {
  // constructors etc...

  ~CompoundStatement(){
    for ( auto b = _statements.cbegin(), d = _statements.cend(); b != d; ++b ) {
      delete const_cast<Statement *>( *b );
    }
  }
  
  void Analyze() const final {
    for ( auto b = _statements.cbegin(); b != _statements.cend(); ++b ) {
      (*b)->Analyze();
    }
  }
  
  void Append(Statement const* statement) {
    _statements.Append(statement);
  }

private:
  List<Statement> _statements;
};

Fix the for loops

The for loops beg to be replaced by a range based for. However, the interface of `List` is somewhat minimal, so that is not possible. However, before we jump in and augment it with the needed `begin()` and `end()` methods, let’s have a closer look at `List` – we had that one on our list anyways.

As it turns out, `List` is only a wrapper around `std::vector`. It is not very intuitive, since for once, we kind of know what a list is from the standard library – and that is not `vector`. In addition, a `List<X>` is in fact a `vector` of pointers to `X`, so that fact is obfuscated via the template parameter, too.

When I first looked at the destructor of `CompoundStatement` I thought “how can this even compile when he calls `delete` on `Statement`, that’s not a pointer?”. Don’t mislead your readers like that.

The only thing about `List` that made it more than just a `vector` was the virtual destructor. However, it is not needed any more, since we don’t derive from `List` any more. We didn’t need it back then either, because we did not destroy `CompoundStatement` via a `List` pointer.

Now, we dismantled `List` alltogether. There is no need for it any more after we have replaced the inheritance with composition. So, we can just replace the `List` member of `CompoundStatement` with the `vector` that it is and then we are free to use range based for loops. The `List` template itself can be removed completely.

struct CompoundStatement: Statement {
  // constructors etc.
  
  ~CompoundStatement(){
    for ( auto&& b : _statements ) {
      delete const_cast<Statement *>( b );
    }
  }
  
  void Analyze() const final {
    for ( auto&& b : _statements ) {
      b->Analyze();
    }
  }
  
  void Append(Statement const* statement) {
    _statements.push_back(statement);
  }

private:
  std::vector<Statement const*> _statements;
};

Use RAII

We said we wanted to get rid of the manual memory management in the destructor of `CompoundStatement`. We also have the  copy constructor and assignment operators `delete`d because the compiler generated versions would have lead to leaks and double deletes.

The solution to dilemmas like that usually are RAII classes. For memory management that means we should use smart pointers. It is clear from the implementation of the destructor that `CompundStatement` takes full ownership of the `Statement`s we append, so the right class to use would be `unique_ptr`.

After we replace the `vector<Statement const*>` with a `vector<unique_ptr<Statement const>>` we can obey the Rule of Zero and remove all constructors, the destructor and the assignment operations from the class:

  • The generated destructor will destroy the `vector`, which in turn will destroy every `unique_ptr`, deleting the `Statement`s in the process.
  • The generated move assingment will now do the right thing, cleaning up the `Statement`s in the target before the move. No more leaks.
  • The copy constructor and copy assignment will still be deleted because the compiler can not generate them due to the deleted `unique_ptr` copy operations.

The only thing that is left to do for this refactoring is converting the raw pointer we take as parameter for `Append()` to a `unique_ptr`. This has to be done explicitly, and it brings us right to a code smell.

Take ownership explicitly

The parameter of `Append()` is a raw pointer. That interface does not make it clear, that `CompundStatement` takes unique ownership. From all we can tell from the interface, we could do something like this:

OtherStatement statement{22};
CompoundStatement compound;
compound.Append(&statement);
compound.Append(&statement);

Have you ever tried to delete a stack based object, twice? Don’t.

To fix this just fix the interface of the `Append()` method by explicitly demanding that any client pass it a `unique_ptr`. It will also make the implementation of that method much more natural. Doing that will enable us to use `make_unique` instead of `new` in the `main()` function – so in addition to the clearer interface, we also get some exception safety for free. Great!

struct CompoundStatement: Statement {
  void Analyze() const final {
    for ( auto&& b : _statements ) {
      b->Analyze();
    }
  }
  
  void Append(std::unique_ptr<Statement const> statement) {
    _statements.push_back(std::move(statement));
  }

private:
  std::vector<std::unique_ptr<Statement const>> _statements;
};
 
int main() {
  auto s = std::make_unique<CompoundStatement>();
  for ( int i = 1; i <= 10; ++i ) {
    if( i % 2 == 0 ) {
      s->Append( std::make_unique<OtherStatement>( i ) );
    } else {
      s->Append( std::make_unique<YetAnotherStatement>( i ) );
    }
  }
  Statement const * const p_s = s.get();
  Declaration d( p_s );
  d.Analyze();
  
  return 0;
}

What is left

There still are a few issues left. One of it is naming: `b`, `t` and `s_` are pretty poor names. The `Declaration` taking a pointer as constructor parameter and using it before any check for null is another. The `main()` function and most of its content looks rather unpleasant. However, much of this is owed to the example nature of the code and is not an issue in the original sources.

For this post, I wanted to concentrate on the `CompoundStatement` and the issues with the `List` template. Those were the core classes of this code snippet. We simplified one of them and got completely rid of the other, so we can be content for now.

There is one thing I really like about the original code: The use of `final` is something that can give us some more certainty about the correctness of our code, yet I haven’t seen it used too often in real code.

I have to leave a word on test here: The modifications made were fairly simple, and they were done in small steps that we could reason about. For anything more complex we should have brought our code under test first. That `main()` function dies not count; it was enough to see if the main use case compiled but not more.

Here is the complete refactored code:

#include <iostream>
#include <vector>
#include <memory>
 
struct Statement {
  virtual void Analyze() const = 0;
  
  virtual ~Statement() = default;
};
 
struct YetAnotherStatement: Statement {
  void Analyze() const final {
    std::cout << t << std::endl;
  }
  YetAnotherStatement( int i ): t{ ( double ) i * ( 10.6 / 0.7 ) } {}
private:  
  double t;
};
 
struct OtherStatement: Statement {
  void Analyze() const final {
    std::cout << t << std::endl;
  }
  OtherStatement( int i ): t{ i } {}
private:  
  int t;
};
 
struct CompoundStatement: Statement {
  void Analyze() const final {
    for ( auto&& b : _statements ) {
      b->Analyze();
    }
  }
  
  void Append(std::unique_ptr<Statement const> statement) {
    _statements.push_back(std::move(statement));
  }

private:
  std::vector<std::unique_ptr<Statement const>> _statements;
};
 
struct Declaration {
  Declaration( Statement const * const s ): s_( s ){}
  void Analyze() {
    s_->Analyze();
  }
private:  
  Statement const * const s_;
};
 
int main() {
  auto s = std::make_unique<CompoundStatement>();
  for ( int i = 1; i <= 10; ++i ) {
    if( i % 2 == 0 ) {
      s->Append( std::make_unique<OtherStatement>( i ) );
    } else {
      s->Append( std::make_unique<YetAnotherStatement>( i ) );
    }
  }
  Statement const * const p_s = s.get();
  Declaration d( p_s );
  d.Analyze();

  return 0;
}

Conclusion

This was a first try to provide a new kind of posts for my blog. After over 70 posts about clean C++ and similar topics with made up examples, I thought it would be good to show some examples on (more or less) “real world” code.

I’d like to do more of this in the future, but I need some help: Please leave a comment what you think about this format. I would also be grateful if you point me to some open source code that you think would be a good candidate for the next refactoring session.

Facebooktwittergoogle_plusredditlinkedinFacebooktwittergoogle_plusredditlinkedinby feather

12 Comments



  1. Hi! Great post, as always. I really like what you did to the original code. I would like to suggest one more thing that can be done:

    Statement const * const s_;

    This variable in Declaration is a little bit confusing for me, but maybe that’s just because I’m not used to see “const pointers to const”. Don’t you think it would be better to use “const &” for this purpose? It conveys “I don’t own it and I don’t want to change it” intent more clearly in my opinion. Then, Declaration constructor would be:

    Declaration( Statement const & s ): s_( s ){}

    and in the main():

    Declaration d{ *s };

    Reply
    1. Arne Mertz

      Hi Krzysztof, thanks for your comment.
      As I mentioned in the “What is left” section, the Declaration class and that statement pointer would definitely be on my todo list.
      A reference would be my choice in that case, too.

      Reply
  2. Luc Hermitte

    Hi,

    This is a very nice post. Something we should be doing more often.

    In the process of refactoring, you had a step where List is copiable (and movable) and has a virtual destructor. A virtual destructor is usually a sign a public inheritance and we know this doesn’t mix well with copying (well, assignment actually) as we are asking to get hit by slicing.

    In other words, I would have tried to get rid of inheritance sooner and for other reasons.

    BTW, I’d also make Statement non copyable.

    Something else that was fishy (and that may deserve discussion with the “OP”), is that List was just a vector of pointers with no responsibility over the pointers. In a pre-C++11 world, I would have enforced the responsibility and used something like a boost::ptr_vector. But still, I wonder why writing this wrapper. In case it’s used elsewhere, it may be useful to provide something — I know some people can’t get used to SL interfaces, the capitalized Append may be a sign of that or something else entirely. It’s easy to simplify this part in a blog post context. On a concrete project where the wrapper may be used dozen of times, fixing List design may be less trivial.

    Reply
  3. Richard

    Hi Arne!
    I think you chose is a very good small example.
    What is your stand on using more STL algorithms, even it doesn’t reduce the numbers of characters?
    In `CompoundStatement::Analyze` I’m tempted to write
    `std::for_each(_statements.cbegin(), _statements.cend(), std::mem_fn(&Statement::Analyze));`
    because it gets rid of all the type and reference concerns like: Do i use `auto&&` or `const auto&`.

    Reply
    1. Arne Mertz

      Hi, thanks for your question!
      Algorithms can make code clearer, because you give a known name to whatever you are doing to a range of objects. In that sense, `for_each` is not a real algorithm but a mere helper function.
      Since we have range based for I would usually not use `for_each` on complete containers. It remains useful when we have only a partial range. I’d also refrain to use things like `std::mem_fn(&Statement::Analyze)` – it’s just ugly to read.
      There should be no type and reference concerns – just use `auto&&` by default. See also my post on range based for loops.

      Reply
  4. Fred

    Hi,

    I am just a new reader of your blog and read about the last 10 articles. So this post does not come as suprising to me as it might feel for you, having written over 70 articles.

    Also, I am rather new to C++. I started learning it over 10 years ago but worked on an unrelated but time-consuming job since then. Now I want to “refresh” and extend my memory with modern C++.

    Your article is a nice read because it gives some more substance to the theory gained from books. Especially the use of unique_ptr to do RAII gave me some more solid undertanding. From the books it all feels rather abstract. Now to see it in your post, it is starting to make more sense.

    I know that “doing” is worth more than “reading” in general but reading something practical is also worth more than reading something theoretical.

    Anyway, I would look forward to an article like this once in a while.

    Regards

    Reply
  5. Odin Holmes

    Good post!

    although refactoring examples are probably the hardest to write (because you have to make big problems fit on a page) they are also very important for people learning to code better.

    A few notes of taste:
    Wrapping basic types can be quite helpful in statically enforcing correctness. At the end of the day nothing is an int, everything is a something 😉 Although this is probably not a very good example because there are no function that otherwise take 8 ints and there are no special rules (e.g. pointer – pointer is offset, offset + offset is offset, pointer + offset is pointer, pointer + pointer is error etc.)

    Rather than making everything derive from Statement why not make a statement “interface” wrapper a la concept based polymorphism. This elides problems like slicing, multiple inheritance etc. and make writing a new statement (of which I assume there are many) super easy, basically anything that is regular and has an analyze() member function is a statement.

    Reply
    1. Arne Mertz

      Hi Odin, thanks for your thoughts!
      I agree that refactoring examples are not too easy to write – I used a rather small piece of code, left a lot of things unresolved and still the post is longer than most of my other posts.

      As for the wrappers, I agree that they can help when it comes to correctness. In this example I did not see much of a benefit, because they were used only internally in the statement classes but not in their interfaces. In addition, the statement classes themselves don’t do much more than stitching the data to the `Analyze()` method, and the wrapper’s names are pretty generic. Therefore I don’t see much of a benefit for the additional code complexity.

      I don’t see how the example, especially the `CompoundStatement` could be handled with static polymorphism. A container of arbitrary size with polymorphic elements is a standard use case for dynamic polymorphism. Slicing should not be an issue here since the statements are created on the free store and passed around as pointers or references.

      Reply
      1. Odin Holmes

        God forbid someone make a local copy of one of those references 😉 In other words you have to know that its a reference to a polymorphic base class, otherwise it may bite you (ok the chance is small, its always small but small is not 0%). I hate having to know stuff, it makes me look stupid when I don’t.

        Reply
  6. Andrea Bigagli

    Great post.
    Looking at a how code can be refactored is IMHO a great learning tool, so if this kind of post is an experiment, I hope you’ll do it more frequently in the future.
    Just a minor nitpick: at the end of the “Constructors and destructors” paragraph, you say “So default and move constructor as well as the destructor can be defaulted, the rest has to be deleted.”, but you (correctly) noted a few lines above that the destructor is far from being trivial, so I think that “..as well as the destructor…” should be removed.

    Reply
    1. Arne Mertz

      Thanks for nitpicking 😉 I fixed that passage.
      Also thanks for the encouragement. I will see to doing this more often. However, it is more time consuming than writing a normal blog post, and it is not easy to pick an interesting piece of code to refactor. On the other hand it’s also not always easy to come up with new topics for normal blog posts.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *