isValid()? Establish invariants and avoid zombie objects

When classes have an “isValid” method or similar, the code using them is often less clear and harder to maintain. If possible, validity should be an invariant that cannot be violated.

In many codebases, even in some widely used libraries and frameworks like Qt, some classes have a method named isValid or similar. As the name suggests, these methods usually return a bool indicating whether the object in question is valid. In most cases, having an invalid object means that using the object in certain ways or in any way other than calling said method is forbidden and results in unspecified or even undefined behavior.

Invalid objects may be the result of a default constructor that can not establish a useful state due to the lack of information. In other cases, constructors or setter methods that do not check their inputs thoroughly enough may also lead to semantically invalid values. In all cases, invariants that are needed to operate on objects of the class are not established and maintained.

The implications

Having objects that may or may not be valid at any given time in our code means that, in theory, we have to check the validity of these objects anywhere we use them. Doing that makes the code harder to read. It also forces us to write code that handles the invalidity at every check, because returning early and doing nothing is often not an option.
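To make that concrete, here is a minimal sketch of what the checks look like in practice. Label, render and width are hypothetical names invented for this illustration; they are not taken from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class with an isValid() method (illustration only).
class Label {
  std::string text_;
  bool valid_ = false;
public:
  Label() = default;  // produces an invalid "zombie" object
  explicit Label(std::string text)
    : text_(std::move(text)), valid_(!text_.empty()) {}

  bool isValid() const { return valid_; }
  std::string const& text() const { return text_; }  // only meaningful if isValid()
};

// Every function that receives a Label has to repeat the check and
// decide on its own what "invalid" should mean at this point.
std::string render(Label const& label) {
  if (!label.isValid()) {
    return "<invalid label>";
  }
  return "[" + label.text() + "]";
}

std::size_t width(Label const& label) {
  if (!label.isValid()) {
    return 0;
  }
  return label.text().size() + 2;
}
```

Both functions contain the same check, and each had to invent its own answer for the invalid case.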

In practice, we often skip these checks and the handling of invalidity because we “know” that the object cannot be invalid at that point. A similar pattern is often seen with pointers as function parameters: In theory, we have to check for null pointers. In practice, we skip the check because the function is always called from another function that does the check.

This usually holds until we refactor the code or call the function from a location where we have forgotten the check. The call hierarchies may be deep: functions call other functions, and possibly invalid objects (or pointers) are passed along until they are used (or dereferenced) without a check, and chaos ensues.

Bottom line, when we work with classes that may be invalid, we have the choice between verbose code that is hard to maintain and brittle code that is hard to refactor and use correctly.

But I simply know which classes may be invalid!

We know that pointers can be null because that’s part of the feature. So we should also be able to know that a Kafurble may be invalid as well as a QVariant and a LeakyAbstractionTitle, right? As well as the other hundreds, maybe thousands of classes in our code base?

You may be smart, but not that smart, trust me. And you probably have to remember more important things than the validity details of all the classes you have not touched for months. And the new colleague on the team who has not worked with those classes over the last few years absolutely cannot know. And it would not change anything about the maintainability issue.

What we can do

For the pointer example, there is a simple solution: use references instead of pointers. A reference cannot be null unless it was created through undefined behavior, such as dereferencing a null pointer. In a well-formed program, it always refers to an object.
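A small sketch of the difference, using a made-up Customer type and two hypothetical greeting functions:

```cpp
#include <cassert>
#include <string>

struct Customer { std::string name; };  // hypothetical type for illustration

// Pointer parameter: null is a legal argument, so the function has to
// deal with it somehow.
std::string greetByPointer(Customer const* customer) {
  if (customer == nullptr) {
    return "Hello, whoever you are";
  }
  return "Hello, " + customer->name;
}

// Reference parameter: the signature itself says "there is always a Customer".
std::string greetByReference(Customer const& customer) {
  return "Hello, " + customer.name;
}
```

With the reference version, the question of what to do with "no customer" moves to the caller, who is usually in a better position to answer it.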

The solution for the isValid conundrum is a similar one: Establish invariants in our classes that make them always valid. An invalid state should not be possible.

Option 1: Make invalid state impossible

I’ve mentioned that default constructors can lead to an invalid state. That is the case when there are no sensible defaults for some of the class member variables. In that case, why have a default constructor at all? If no default constructor exists, it can not produce invalid objects.
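As a sketch, assume a hypothetical UserName class for which no sensible default exists. Leaving out the default constructor is enough to make the compiler reject any attempt to create an "empty" one:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical class: there is no sensible default for a user name,
// so there is no default constructor. (Checking the input itself is a
// separate concern and not shown here.)
class UserName {
  std::string value_;
public:
  explicit UserName(std::string value) : value_(std::move(value)) {}
  std::string const& value() const { return value_; }
};

// "UserName name;" does not compile. The trait documents that fact.
static_assert(!std::is_default_constructible_v<UserName>,
              "UserName must not be default-constructible");
```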

Sometimes we also can reduce the possible range of values and get a range that is always valid. Consider the size of a container or something similar. It usually does not make sense for a container to have a negative size, so instead of int for a size member variable and constructor parameter, use size_t or another unsigned integral type.
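A minimal sketch with a hypothetical RingBuffer class. One caveat worth stating: an unsigned parameter does not reject a negative argument by itself, because a negative int is implicitly converted to a very large unsigned value. The type documents the intent, but callers can still get it wrong.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container-like class: the capacity can never be negative
// because its type has no negative values.
class RingBuffer {
  std::size_t capacity_;
public:
  explicit RingBuffer(std::size_t capacity) : capacity_(capacity) {}
  std::size_t capacity() const { return capacity_; }
};
```

Brace initialization helps a little here: RingBuffer{-1} is rejected as a narrowing conversion, while RingBuffer(-1) compiles and silently wraps around.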

Option 2: Make any state valid

We can’t have an invalid state if all states are valid. This option often is not easy to achieve but still worth considering. Here are a few examples:

In theory, if there is nothing, we can not count it or iterate over it. This problem has been solved for ages – counting nothing gives 0, iterating over nothing does nothing.
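The standard containers behave exactly like that. In this sketch, sum is a hypothetical helper; it needs no special case for an empty vector because the loop body simply never runs:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: adds up all elements.
int sum(std::vector<int> const& values) {
  int total = 0;
  for (int v : values) {  // zero iterations for an empty vector
    total += v;
  }
  return total;
}
```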

Consider this boiled-down piece of code:

class State {
  Flag* pFlag = nullptr;
public:
  State() = default;
  //...
  bool isValid() const {
    return pFlag != nullptr;
  }

  std::vector<Color> const& getFlagColors() const {
    return pFlag->getColors();
  }
};

Calling getFlagColors() on an invalid object will result in mayhem. With a slight tweak of the method’s semantics, we can still fix this:

class State {
  Flag* pFlag = nullptr;
public:
  State() = default;
  //...
  std::vector<Color> const& getFlagColors() const {
    if (pFlag == nullptr) {
      static std::vector<Color> const noColors{};
      return noColors;
    }
    return pFlag->getColors();
  }
};

“No flag, no colors” seems reasonable in this case – whether it is, depends much on the problem, of course. Making any state valid, just because you can, may not be the best option and can lead to weird behavior, so keep this option in mind but be careful about it.

Option 3: Fail operations that would produce an invalid state

Very often we can neither reduce the possible inputs of constructors and/or setters to valid types nor can we make all possible states valid. In that case, the only option to not have an invalid state is to fail the operation that would produce it. Trying to produce something invalid should be an error. Sure, we’d have to handle those errors.

But still, handling errors when we want to construct an object is better than constructing something that is not valid and having to handle its invalidity throughout its lifetime.

The standard C++ way to fail an operation is to throw an exception. If the constructor of an object throws an exception because the arguments are not suitable to create something valid, then the object under construction never exists. There simply never is anything that could be invalid.
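Here is a minimal sketch of a throwing constructor. Percentage and the helper canConstructPercentage are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical value type: a percentage must lie in the range [0, 100].
class Percentage {
  int value_;
public:
  explicit Percentage(int value) : value_(value) {
    if (value < 0 || value > 100) {
      // The object under construction never comes into existence.
      throw std::invalid_argument("Percentage must be between 0 and 100");
    }
  }
  int value() const { return value_; }
};

// Hypothetical helper for illustration: reports whether construction succeeds.
bool canConstructPercentage(int value) {
  try {
    Percentage p{value};
    (void)p;  // only the construction itself is of interest here
    return true;
  } catch (std::invalid_argument const&) {
    return false;
  }
}
```

Any code that holds a Percentage can rely on the range without checking it again.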

Failing the operations that would result in invalid objects, most notably constructors, seems to be by far the most commonly used option.

Alternatives to exceptions

In some contexts, e.g. embedded development, using exceptions is not an option. Often they are even disabled in those situations. For setters, instead of throwing an exception, we can leave the object unchanged and return some kind of error code in case of failure. For constructors, this is not possible because constructors do not return anything.
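For the setter case, a sketch could look like this. Thermostat, SetResult and the accepted range are assumptions made up for the example.

```cpp
#include <cassert>

// Hypothetical error code for the setter.
enum class SetResult { Ok, OutOfRange };

class Thermostat {
  int targetCelsius_ = 20;  // a sensible default exists in this case
public:
  // On failure the object is left untouched, so it stays valid.
  [[nodiscard]] SetResult setTarget(int celsius) {
    if (celsius < 5 || celsius > 30) {
      return SetResult::OutOfRange;
    }
    targetCelsius_ = celsius;
    return SetResult::Ok;
  }
  int target() const { return targetCelsius_; }
};
```

The [[nodiscard]] attribute (C++17) asks the compiler to warn when a caller ignores the error code.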

Having an out-parameter in the constructor signature won’t help, because if the constructor does not throw an exception it succeeds and an object is created. Since we are talking about the error case, that object would have to be invalid and we’re back to square one. We’d either need the isValid method again with all its implications or we’d need to check the out-parameter, and if we forget that we still have an invalid object and no way to notice.

The solution to this conundrum is to make the constructor private and have a static factory method in the class that is responsible for creating the object. If it succeeds, it returns the created object, and if not, it returns something else to indicate the failure.

Returning optional, variant & Co.

The simplest type to achieve this cleanly is std::optional: Either we get the constructed value, or we get a nullopt. Other related options include std::variant or similar but specialized types like std::expected (part of the standard library since C++23) or the result and outcome types of the Boost Outcome library. They all have in common that they contain either a valid object or something else indicating failure.
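A sketch of such a factory method returning std::optional, assuming a hypothetical PortNumber class whose valid range is 1 to 65535:

```cpp
#include <cassert>
#include <optional>

// Hypothetical class: only the factory can create objects.
class PortNumber {
  int value_;
  explicit PortNumber(int value) : value_(value) {}  // private, does not check
public:
  // Returns the object on success and std::nullopt on failure.
  static std::optional<PortNumber> create(int value) {
    if (value < 1 || value > 65535) {
      return std::nullopt;
    }
    return PortNumber{value};
  }
  int value() const { return value_; }
};
```

The caller has to unwrap the optional once, right where the object is created. After that, every PortNumber in the program is known to be in range.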

Note that something like std::pair<bool, T> usually can not be used for this kind of factory function: In case of errors, the pair would have to contain something besides the bool, and that would be that invalid object we want to avoid.

Returning pointers

Instead of returning by value, we can also return a pointer, smart or otherwise. In that case, a null pointer denotes a failure, otherwise, it points to the created object. The question that always comes up with pointers is that of memory management: Who owns the memory for the object, and how is it allocated?

In normal circumstances, the factory method can return a unique_ptr owning the object. But since we are avoiding exceptions, circumstances may not be normal. Heap allocation is costly compared to stack allocation. In embedded contexts, it is often disabled completely. Instead of allocating on the heap directly, all kinds of allocators are conceivable and often used to manage a chunk of memory suitable for the created object.
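For the plain heap case, a sketch might look as follows. Sensor and its channel range are invented for the example. The non-throwing form of new is used so that an allocation failure also ends up as a null pointer instead of std::bad_alloc; custom allocators are not shown.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Hypothetical class created only through its factory.
class Sensor {
  int channel_;
  explicit Sensor(int channel) : channel_(channel) {}  // private
public:
  // A null pointer denotes failure: bad argument or failed allocation.
  static std::unique_ptr<Sensor> create(int channel) {
    if (channel < 0 || channel > 7) {
      return nullptr;
    }
    // std::make_unique cannot reach the private constructor, and plain
    // new would throw on allocation failure, so use new (std::nothrow).
    return std::unique_ptr<Sensor>(new (std::nothrow) Sensor(channel));
  }
  int channel() const { return channel_; }
};
```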

What if I have to provide a default constructor?

Some frameworks require us to provide default constructors for classes that we use in the framework. If a default constructor does not make sense for our class, that is a contradiction that needs to be resolved. One solution would be to hand the framework a std::optional of our class instead of the class itself, because a std::optional can always be default-constructed.
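A sketch of the std::optional approach. FrameworkSlot is a made-up stand-in for whatever framework type insists on default construction, and Hostname is a hypothetical class without a default constructor.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical class without a default constructor.
class Hostname {
  std::string value_;
public:
  explicit Hostname(std::string value) : value_(std::move(value)) {}
  std::string const& value() const { return value_; }
};

// Stand-in for a framework type that default-constructs its payload.
template <typename T>
struct FrameworkSlot {
  static_assert(std::is_default_constructible_v<T>);
  T payload{};
};

// FrameworkSlot<Hostname> would not compile, but the optional version does.
using HostnameSlot = FrameworkSlot<std::optional<Hostname>>;
```

The "not there yet" state now lives in the optional, where it is visible in the type, and Hostname itself keeps its invariants.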

In other circumstances, we might have to write a dedicated wrapper that provides the interface required by the framework while the wrapped class still maintains the invariants that prohibit a default constructor.

Conclusion

The possibility of invalid objects is detrimental to the maintainability of our codebases. There usually are options to make objects of our classes always valid so we have a carefree experience using them.

5 Comments

Peter Koch Larsen
4 years ago

A nice article and I basically agree, but there are exceptions. As an example take iostreams. It is nice that you can open one and let it fail as this is more or less the only way you can assure that a file will exist for the duration of some operation.
I am right now writing a similar class where I sometimes need to know if a resource exists. For that I have a constructor with a std::nothrow_t parameter. Without that parameter, the failing constructor fails.
There could also be the case where you have an object that has possibly been moved from, but I could not really find a use-case that was not a codesmell.

Ignacio Martínez
4 years ago

I am working in a huge codebase which has this same problem and I had a feeling there was something smelly about it that I could not fully articulate. This article was spot on and will help me to expose the problem to my co-workers. Thanks.

Also, in the sentence: “They all have in common that they container either or valid object or something else indicating failure.”

should it be contain?

Thanks.

1. Arne Mertz
  4 years ago
  
  Thank you for pointing out the typo. I’m glad the post can be of help!
  
Matthias
4 years ago

I prefer using IsValid methods for aggregates to support using C++20’s designated initializers (as opposed to builder patterns and their boilerplate, which can postpone validation till before the object construction as well).

One benefit is avoiding too fine granularity of getters, setters and validations. Instead the aggregate is considered a whole; and set and validated as such. This is quite convenient for rendering use cases (e.g., buffers, textures, etc.).

E.g., https://godbolt.org/z/j1qTx77rr.

“What if I have to provide a default constructor?”
* std iterators are an unfortunate example of such a use case.

1. Arne Mertz
  4 years ago
  
  An aggregate does not have invariants. It can not enforce them since every member is public. It’s just a “bunch of data”. Whether that data can be used for given functionality depends on the functionality.
  
  In your example, I’d put the check into the widget class, not into the descriptor. The descriptor supports the widget’s functionality and it’s for the widget to decide whether the descriptor is usable or not. And then you’re back to what I describe in the post: when the widget constructor is passed unusable data, the constructor has to react in order to not construct a broken widget with an invalid state.
  
  It’s a pattern I have seen a few times since designated initializers and once in a while before that: A class with a constructor that takes a bunch of data factors the data out into a dedicated “construction argument” struct.
