
Recently, I had to make a relatively small design decision – which type to use for a small range of values. As usual in C++, there were a number of choices that have their benefits and drawbacks.

Setting the stage

The task was to extend a feature in our code base, where data is serialized into a vector of bytes to be sent over a network. What the different sections of data contain (there were different sources) and where they have to be written into the vector is defined in a human-readable file.

Previously, the payload offset, i.e. the position in the vector to write the data into, had to be a positive index. The data would be written at that location. The vector length would be increased if required to fit the data.

Some of the sources could contain data of unknown length, e.g. when the data is read from a file. To be able to add further data after serializing data of unknown length, the configuration should be extended to allow a keyword 'append' instead of the fixed positive index for the payload offset.

Parsing the configuration was straightforward. The interesting design decision was which type to use for the payload offset and how to interpret it. In other words, how should we model "A positive index or 'Append'" as a type?

The previously used type was uint32_t, more than enough for values that would typically not exceed two digits, even though there were no other technical limitations given.

The payload offset value is created in the parser and then passed on through a few functions to the location where the data is inserted into the payload vector:

const auto requiredSize = payloadOffset + size(newData);
if (payload.size() < requiredSize) {
    payload.resize(requiredSize);
}
std::copy(begin(newData), end(newData),
          begin(payload) + payloadOffset);

With the addition of the 'append' keyword, the code will mostly need an additional line:

const auto payloadOffset = (configuredOffset == APPEND)
         ? payload.size() : getValue(configuredOffset);

That is, after we renamed the output of the configuration parser from simply payloadOffset to configuredOffset.

Take a break now and think about which type and representation you would have used for the configuredOffset.

(Let's not fret about whether we should have used a size_t or any other unsigned type in the first place.)

Possible choices

Use a sentinel value for APPEND

This is probably the lowest effort solution and one that I see in many code bases for similar scenarios. Since uint32_t is way larger than needed, the highest possible values will never be used and can be reused as sentinel values.

A typical solution would define constexpr uint32_t APPEND = 0xFFFFFFFF; somewhere, or use std::numeric_limits<uint32_t>::max() or some other way to represent the same value. The getValue function used in the example above would not be needed:

const auto payloadOffset = (configuredOffset == APPEND)
         ? payload.size() : configuredOffset;

While this is quick and will work well in a simple situation like this, it lacks semantics and type safety. We're doing arithmetic with the payloadOffset, and while that is not the same variable as configuredOffset, nothing prevents us from using the wrong variable, as they are of the same type.
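To make the sentinel approach concrete, here is a minimal, self-contained sketch. The helper name writeAt and the use of std::vector<std::uint8_t> for the payload are assumptions made for illustration; they are not taken from the actual code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel: the largest uint32_t value is reserved to mean "append".
constexpr std::uint32_t APPEND = std::numeric_limits<std::uint32_t>::max();

// Hypothetical helper (name and signature are assumptions): writes newData
// into payload at the configured offset, or at the current end for APPEND.
inline void writeAt(std::vector<std::uint8_t>& payload,
                    std::uint32_t configuredOffset,
                    const std::vector<std::uint8_t>& newData) {
    // Resolve the sentinel before doing any arithmetic with the offset.
    const std::size_t payloadOffset =
        (configuredOffset == APPEND) ? payload.size() : configuredOffset;

    const std::size_t requiredSize = payloadOffset + newData.size();
    if (payload.size() < requiredSize) {
        payload.resize(requiredSize);  // newly added bytes are zero
    }
    std::copy(newData.begin(), newData.end(),
              payload.begin() + static_cast<std::ptrdiff_t>(payloadOffset));
}
```

Note that nothing stops a caller from accidentally using configuredOffset itself in the size calculation: both variables are plain integers, so the compiler cannot tell them apart.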

Like above, but as a strong type

To improve the situation a bit, we can use a struct to wrap the parsed value:

struct PayloadOffset_t {
    uint32_t value;

    auto operator<=>(PayloadOffset_t const&) const = default;
};

constexpr PayloadOffset_t APPEND{0xFFFFFFFF};

The call to getValue in the example code above would be replaced by explicitly extracting the wrapped value:

const auto payloadOffset = (configuredOffset == APPEND)
         ? payload.size() : configuredOffset.value;

The type safety of this solution also makes the refactoring much easier as the compiler will tell us all the spots where we have to change the old uint32_t to a PayloadOffset_t.
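The type-safety claim can be checked at compile time. The following sketch is a variation on the struct above: it assumes a hand-written operator== instead of the defaulted operator<=> so that it also builds with compilers that predate C++20. The static_asserts are illustrative additions.

```cpp
#include <cstdint>
#include <type_traits>

struct PayloadOffset_t {
    std::uint32_t value;

    // Hand-written comparison (an assumption of this sketch, used in place
    // of the defaulted operator<=> so the code also compiles as C++17).
    friend constexpr bool operator==(PayloadOffset_t lhs, PayloadOffset_t rhs) {
        return lhs.value == rhs.value;
    }
};

constexpr PayloadOffset_t APPEND{0xFFFFFFFF};

// The wrapper does not implicitly convert to a number...
static_assert(!std::is_convertible_v<PayloadOffset_t, std::uint32_t>);
// ...and a plain number does not implicitly convert to the wrapper.
static_assert(!std::is_convertible_v<std::uint32_t, PayloadOffset_t>);
```

With this in place, an expression such as configuredOffset + size(newData) no longer compiles, so the raw value has to be extracted deliberately via .value.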

However, we still use sentinel values here, and since we are introducing a new type anyway, we might as well explore other possible solutions.

Use std::optional

Let's have a look at value ranges. Leaving aside the actual range of values used in our use case, the previous range of values was the range of "numbers" that we can possibly parse, let's call it N.

With the internal use of uint32_t, the technical limitation to N is 2^32, but that can be changed by using a different underlying arithmetic type.

With the 'append' keyword, we conceptually extended the range of possible configurable values from N to N+1, since it is now "either a number or 'append'".

A standard library type that has the N+1 value range is std::optional, since it can be any of the underlying type's values, or std::nullopt. We could use it like this:

using PayloadOffset_t = std::optional<uint32_t>;
constexpr PayloadOffset_t APPEND = std::nullopt;

And later:

const auto payloadOffset = (configuredOffset == APPEND)
         ? payload.size() : configuredOffset.value();

Note that I could have skipped the definition and use of APPEND completely, e.g. by putting optional's methods to good use:

const auto payloadOffset = configuredOffset.value_or(payload.size());

At first, this code looks rather elegant. However, for the uninitiated reader, the reason why we are doing this is completely lost. The only hint that std::nullopt means 'append' is buried somewhere in the parser code.

In fact, while the number of possible values of optional happens to be the range we need, equating std::nullopt with 'append' is a misuse of the type, as it is contrary to std::optional's semantics. std::nullopt means "no value", while 'append' is a value, albeit not a numerical one.

With that in mind, can we construct a type that has the range of values we need and can model the semantics we want it to have?

Use std::variant

The one-to-one translation of what we want to model, specifically "either a number or 'append'", can be achieved using std::variant:

struct Append_t{};
using PayloadOffset_t = std::variant<uint32_t, Append_t>;
constexpr PayloadOffset_t APPEND{Append_t{}};

auto getValue(PayloadOffset_t const& payloadOffset) {
    return std::get<uint32_t>(payloadOffset);
}

Here, I actually wrote the getValue() function as it is a bit more boilerplate than just calling a value() method or similar. When we also add a comparison operator for Append_t, the line to calculate the actual offset now looks like described earlier:

struct Append_t{
    constexpr auto operator==(Append_t) const {
        return true;
    }
};

/* ... */

const auto payloadOffset = (configuredOffset == APPEND)
     ? payload.size() : getValue(configuredOffset);

Looking at this, there may be a slight discomfort in having the call to std::get inside the getValue function while leaving it to the caller to check that there actually is a uint32_t inside that variant. If they fail to do so, std::get throws a std::bad_variant_access exception, a failure mode that the sentinel-based choices do not have.
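The failure mode described above can be sketched as follows. getValue is the function from the article; tryGetValue is a hypothetical non-throwing counterpart based on std::get_if, added here only to illustrate one possible alternative.

```cpp
#include <cstdint>
#include <variant>

struct Append_t {
    constexpr bool operator==(Append_t) const { return true; }
};
using PayloadOffset_t = std::variant<std::uint32_t, Append_t>;

// Throws std::bad_variant_access if the variant currently holds Append_t.
inline std::uint32_t getValue(PayloadOffset_t const& payloadOffset) {
    return std::get<std::uint32_t>(payloadOffset);
}

// Hypothetical non-throwing alternative: returns false instead of throwing
// and only writes to `out` when a number is actually stored.
inline bool tryGetValue(PayloadOffset_t const& payloadOffset,
                        std::uint32_t& out) {
    if (auto const* number = std::get_if<std::uint32_t>(&payloadOffset)) {
        out = *number;
        return true;
    }
    return false;
}
```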

While I would argue that we do the check in the same line as the call to getValue, and that our tests should catch such errors, we may also rewrite the line a bit:

const auto payloadOffset = configuredOffset.visit(
    [&payload]<class T>(T value) -> uint32_t {
        if constexpr(std::is_same_v<T, uint32_t>) {
            return value;
        } else {
            return payload.size();
        }
    }
);

This is a bit more verbose but safer. There are variations on this solution, e.g. the fact that we use variant can be considered an implementation detail, so we can wrap it into its own class including a function that encapsulates the offset calculation, but those are finer points that I'll leave for you to decide on.
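The member function visit used above is a C++26 addition to std::variant. With older standards, the same idea can be sketched with the free function std::visit, available since C++17. The Overloaded helper and the function name resolveOffset are assumptions for illustration, and this version returns std::size_t rather than uint32_t to avoid narrowing payload.size().

```cpp
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

struct Append_t {};
using PayloadOffset_t = std::variant<std::uint32_t, Append_t>;

// Common C++17 idiom: combine several lambdas into one visitor.
template <class... Ts>
struct Overloaded : Ts... {
    using Ts::operator()...;
};
template <class... Ts>
Overloaded(Ts...) -> Overloaded<Ts...>;

// Hypothetical helper: encapsulates the offset calculation so callers
// never touch std::get and cannot forget the 'append' case.
inline std::size_t resolveOffset(PayloadOffset_t const& configuredOffset,
                                 std::vector<std::uint8_t> const& payload) {
    return std::visit(
        Overloaded{
            [](std::uint32_t fixed) -> std::size_t { return fixed; },
            [&payload](Append_t) -> std::size_t { return payload.size(); },
        },
        configuredOffset);
}
```

If a third alternative is ever added to the variant, this visitor stops compiling until the new case is handled, which is one argument for the visitor form over a manual check.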

Tradeoffs

I have talked about readability, possible bugs, and semantics already, but what other tradeoffs are there? One is the size of the type we use: with std::optional and std::variant, the type has to be a bit bigger, in my case 8 bytes instead of 4 bytes for uint32_t. Beyond that, I do not see much of a difference when it comes to performance. All mentioned types are trivially copyable, so there is no overhead in passing them from one function to the next.
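The size and copyability claims can be checked at compile time. This is a sketch: the alias names are made up for illustration, the exact sizes are implementation-specific (hence only a "larger than" check), and the trivially-copyable assertions are assumed to hold on current mainstream standard library implementations.

```cpp
#include <cstdint>
#include <optional>
#include <type_traits>
#include <variant>

struct Append_t {};
using OptionalOffset = std::optional<std::uint32_t>;           // illustrative alias
using VariantOffset = std::variant<std::uint32_t, Append_t>;   // illustrative alias

// Both wrappers need room for a discriminator in addition to the number.
static_assert(sizeof(std::uint32_t) == 4);
static_assert(sizeof(OptionalOffset) > sizeof(std::uint32_t));
static_assert(sizeof(VariantOffset) > sizeof(std::uint32_t));

// Passing these around by value is as cheap as copying a few bytes.
static_assert(std::is_trivially_copyable_v<OptionalOffset>);
static_assert(std::is_trivially_copyable_v<VariantOffset>);
```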

What would be your preferred solution? One of the above or a different one? I'd love to read about your thoughts!

2 Comments

Uli
2 months ago Permalink

You wrote “The previously used type was uint32_t, more than enough for values …”. If it has to be uint32_t, the solution with variants seems the best to me.

There is another solution, if you can use int32_t or int64_t (what I don’t know). APPEND then should be defined as -1 which can be easily distinguished from any of the allowed positive values for a position in the vector. Using int is in many cases preferable over using unsigned int and often more safe. I definitively do not want to get into the heated discussion on the Internet, just refer to Bjarne Stroustrup, see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf

Your post shows very well some possibilities of modern C++ with all the subtleties to be considered. To know these possibilities can also be helpful in completely different problems.

1. Arne Mertz
  2 months ago Permalink
  
  Thanks for the reply, Uli!
  I have to confess, I had not considered signed integral types.
  Looking at it, I would argue that the benefits and drawbacks of ‘APPEND’ being signed -1 are pretty much the same as it being unsigned 0xFFFFFFFF. We’d still have to be diligent about checking for the sentinel value and not do arithmetics with it. Except I think that there’s an additional drawback: In the end, we would have to properly convert the calculated offset and required size which then would be signed, to size_t.

PayloadOffset_t: A small type design challenge