[fix dev diary] Week 5: std::expected

Contents

In my dev diary blog post series, I document the minutiae of what I am doing for my toy project Fix. The diary can also be found in smaller bites on the GitHub pages of the project. Many of the subheadings link to the commits in question.

August 23rd

Since std::expected is not part of the standard library in C++20, I use Sy Brand’s implementation.

instroduce tl::expected as return value for possibly failing functions

Thanks to Conan, adding Sy Brand’s tl::expected to the project is a one-liner. On the CMake side, it is a dependency of the domain library, albeit a public one since I’ll be using it as a return value from the library’s API.

I started this commit by fixing an oversight I mentioned in the last diary entry: The return value of fix_error_category_t::name() was not implemented with TDD, so I added the necessary test first and then fixed the name.

Next, I added a template alias for tl::expected<T, std::error_code>: it’s a lot to read, and the second template parameter should never change, so writing expected<T> instead in the domain namespace is much clearer. Replacing the return type of title::create with expected<title> did not necessitate any changes to the tests, but the error case had to be changed from std::nullopt to a std::error_code. The error code should be of the fix domain category, so it required a new enumerator in domain_error – MISSING_IMPLEMENTATION clearly was not the correct one. The message for that error code is an empty string for now – no test yet requires otherwise.

Last but not least I added a convenience wrapper to tl::unexpected. At first sight, this only gets rid of the tl:: in the domain code, but it also restricts arguments to domain_error. For the expected alias I use, tl::unexpected can be called with any std::error_code and all enums that are convertible into it.

improve tests with a matcher for tl::expected

With expected introduced as return value, the tests could be extended to not only check for failure but also for a meaningful message associated with that failure. When writing those tests, the most natural and helpful error messages introduced the need to distinguish between titles that are too long or too short. The old “length out of bounds” enumerator had to be replaced by two new ones.

Writing the tests showed a pattern that would be required wherever the correct failure via expected has to be tested: store the result object, check that it converts to false, check the message of the error code:

  const auto result = title::create("");
  CHECK_FALSE(result);
  CHECK(result.error().message() == "title is too short");

Having variants of those three lines repeated all over the tests would be tedious and harder to read than I liked, so I looked for ways to simplify the code. The solution to that challenge is matchers in Catch2: the macro CHECK_THAT(arg, matcher) takes the argument that has to be tested and a matcher object that does the test. So I set up a test case for a new matcher and got to work implementing what was needed.

At first, I tried to use the “new style” matchers mentioned in the link above, until I realized that they do not yet exist in the version of Catch2 that is available via Conan. With those new matchers, the match function testing the argument can be templated, i.e. a single matcher class can check an expected<void>, expected<int>, expected<title>, and so on.

So it had to be either an “old style” matcher with a single match function parameter type suitable for all specializations of expected, or a templated matcher where I would have to explicitly state the argument type when creating the matcher, which would not be ideal.

Trying to find a way to solve things with a single matcher class, I went down the wrong road at first: taking expected<std::any> as the parameter to the match function does not work, because expected<void> can not be converted to expected<std::any> with the same error code, but instead it is silently converted to std::any which in turn leads to an expected<std::any> that has a value, i.e. is not failed.

The solution I settled for in the end is an intermediate inspection type that the matcher function takes as a parameter. The type has a templated conversion constructor that can take any specialization of expected and extracts the information needed for the matcher.

It may seem like a lot of work – it took me some time experimenting and failing until the matcher test case passed. However, from now on, all checks for failure do not have the three lines shown above but instead read like this:

CHECK_THAT(title::create(""), FailsWithMessage("title is too short"));

This is the level of readability I want in my tests. After all, they are not only a means of making sure that the code works but also a form of documentation.

August 25th-27th

Infrastructure matters, and it’s frustrating when it just stops working.

Who needs a network anyway?

On Wednesday morning, I started the day by writing up the dev diary for Monday. git commit. git push – nope, I got errors. Poking around a bit revealed that while Windows still had an internet connection, WSL did not. “Network is unreachable”. Not a DNS issue, I could not even ping localhost.

So, instead of writing the next tests and code, I spent hours googling for a solution. There are dozens of issues for that error message at the WSL GitHub repo, with small tidbits of information hidden between solutions for different problems. No writeup. None of the tidbits worked for me, short of resetting Windows.

So that’s what I did. I spent some hours resetting Windows and reinstalling the most important programs. About as much fun as counting the seeds on a strawberry, especially since I already had to install Windows from scratch on my work machine on Tuesday.

add git aliases for the dev diary

Annoyed as I was, I had, of course, no secured anything in WSL except for the dev diary entry I could not push. No big deal since everything I needed was in my GitHub repositories. Almost. Installing WSL2 with Ubuntu, Docker Desktop for Windows, and CLion was not a big issue.

I cloned the docker4c repository for the build toolchain and the fix repository itself and built the docker container. Something had changed in the script provided by LLVM to install clang & Co., but since clang-12 now can be installed directly in Ubuntu with apt, that was easily fixed.

I copied the saved dev diary entry back into the fix repository and used my git alias to commit it. dev-diary is not a git command. Dang. The alias was saved in the local git config and had disappeared with the reinstallation. Since it took some time to piece it together again, I added the init script to set up the aliases whenever I clone the repo somewhere.

So, now I have my workflow for the dev diary back to what it was: I typically write it first thing in the next morning. So what I did today was:

✔ ~/git/fix [main|⚑ 1]
06:58 $ ./scripts/dev-diary.sh yesterday
✔ ~/git/fix [main|…1⚑ 1]
06:58 $ code docs/

When I am done with this entry, there will be two more commands:

$ git ddy       # or git dev-diary yesterday
$ git push

And back to the code.

August 28th

On Saturday, I finished the title class, at least for now.

add character checks for title

With tl::expected and the domain error code in place, there wasn’t too much new work left to implement the test case sections about the allowed character set for titles. The tests do not cover every single character, but they express the intent well enough that a straightforward implementation will fit the requirements. That means I could still pass all tests while e.g. forbidding characters that should be allowed, but I’d have to specifically add code for that, which no test gives me a reason to do.

I have ordered the checks for the different forbidden characters from general to specific. First comes the check for non-ASCII characters. ASCII characters are in the value range 0-127. The code in question uses an int, i.e. the chars in the string are widened to int. The reason is that char may be signed or unsigned. In the former case, its value can not be larger than CHAR_MAX, while in the latter in can not be negative. Widening to int avoids warnings about either case.

The second check looks for non-printable characters. isprint returns a non-zero value for spaces, digits, letters, and punctuation, which is almost the allowed character set for titles. This check has to come after the ASCII check for two reasons: arguments to isprint have to be presentable as unsigned char. In case char is signed, the ASCII check eliminates negative values and thus avoids undefined behavior. In case char is unsigned, there are non-ASCII characters that may or may not be printable, depending on the locale, which would lead to inconsistent behavior. Having the ASCII check first eliminates that problem as well.

The last check added is for the two special characters that are explicitly not allowed in the requirements.

In addition to the test and implementation code, I also added a bit of code to improve the expressiveness of tests on failure when dealing with expected. Until now, a failure of the FailsWithMessage matcher produced output like this:

  CHECK_THAT(title::create("we \xE2\x99\xA5 unicode. Not."), FailsWithMessage("title may not contain non-ASCII characters"))
with expansion:
  {?} Fails with message 'title may not contain non-ASCII characters'

Here, the {?} stands for the expected<title> object and it does not tell me whether it contains a value or whether the error just produced a different message. Catch2 can not do any better unless I tell it how to serialize that type. Since the serialization is only needed for the tests, the typical ostream operator<< would not be the right solution here. Luckily, Catch2 provides a mechanism for cases like this one, namely the specialization of Catch::StringMaker<T>. With that in place, the expansion in the test output now looks like either the first or the second of these lines:

  unexpected(12: 'title can only contain letters') Fails with message 'title may not contain non-ASCII characters'
  expected({?}) Fails with message 'title may not contain non-ASCII characters'

The first line is shown when the expected contains a failure (but has not produced the expected error message). It shows the value of the error enumerator and the start of the message. In the second line, we still see a {?}, but this time it’s the title object that could not be serialized. For now, it’s sufficient for me to know that the expected object contains a value, i.e. the expected failure has not happened.

enforce use of title::create and store text in title

While I was implementing all the things that were required of the title::create function, I realized at some point that nothing prevented me from default constructing a title, which clearly should not be allowed. In addition, the tests still only called for a hard-coded string returned by title::str.

So I wrote tests to check that titles can only be copied and moved but never constructed except within the title class itself. I also extended the test for title::str(). To have str() return the correct title text, I had to store it, which in turn required an appropriate constructor. Having the constructor disabled default construction and made the corresponding check pass. Making the constructor private made the other check pass.

Since the std::string member is copyable and movable, so is title, and the two corresponding checks passed right from the beginning. The compiler-generated special members are doing the right thing, no need to even mention them.

Last but not least, while str() is a name we may know from the standard library, to_string is more expressive, so I refactored that name.

check that titles are trimmed

Another thing that occurred to me while I was implementing the checks was that the current implementation allowed for a title consisting of a single letter followed by a couple of spaces, enough to pass the length restriction. One might even write a title consisting only of spaces.

My first instinct was to trim the input of the create function before any of the checks. On the other hand, silently creating a title that differs from the input seems wrong. So I decided to allow only trimmed input. I also decided to check for any leading or trailing whitespace, not only blanks, because I consider e.g. a title ending with a line break as “not trimmed” and not as “contains forbidden characters”.

The implementation I chose for the check is done as “would trimming the text do anything to it?”, which, in hindsight, is way too complicated. I could just have checked the first and last character of the text. Lesson learned: reviewing your own code on the next day may yield chances for improvement.

refactor title::create

Over time, the title::create function has grown quite a bit. It now is too long and does several different things in detail, namely checking several different sets of constraints and then creating an actual title object.

The basic structure of refactoring it into several detailed functions was clear, including the names check_trimmed, check_length, and check_charset. The return types of those functions were not as clear, at least not from the beginning.

I started with functions that returned std::error_codes. The create function then contained several blocks that looked like this:

auto const error_code = check_trimmed(text);
if (error_code) {
  return unexpected(error_code);
}

It was a bit unwieldy for my taste. Returning an expected<void> would lead to similar code. Then I remembered reading that Sy Brand’s expected implementation has functionality that is not in the std::expected proposal that is meant for this exact use case. The resulting code looks quite readable to me:

expected<title> title::create(std::string_view text) {
  // clang-format off
  return check_trimmed(text)
      .and_then(check_length)
      .and_then(check_charset)
      .map([](auto text){ return title{text}; });
  // clang-format on
}

Except maybe the last line which looks a bit awkward, but I have an idea for a fix for that as well.

That’s it for this week. Did you spot mistakes I made? Would you have done things differently? I’d love to know what you think, please leave your comments and questions below.

Write clean and maintainable C++