Emerging Patterns – Refactoring Session #4

Contents

In my last refactoring session post, I wrote I had been lazy when it came to refactoring Fix. Actually, I’ve slacked on purpose, repeatedly skipping the refactoring steps of the red-green-refactor cycle of TDD and BDD. Doing this brought me into a little mess. When I refactored my way out of it, I found an emerging pattern.

Factor out methods

This session will focus on a central function of Fix. When a request comes in, the method process checks if the request is a valid one, triggers whatever action is needed and then returns the result in form of an HTTP status and some JSON content. I’ll go step by step through the commits I made back when I refactored this. As usual, I will provide links to the commits on GitHub.

RestApi::Response RestApi::process(std::string const& requestUri, 
        std::string const& requestMethod, 
        std::string const& requestContent) const {
  if (requestUri == "/issue/new") {
    if (requestMethod != "POST") {
      return status400();
    }

    try {
      auto issueJson = Json::parse(requestContent);
      auto parsedIssue = IssueData::parse(issueJson);
      if (!parsedIssue.success) {
        return status400();
      }

      auto requestedIssue = parsedIssue.issueData;
      return {
        storage.insertIssueIncreasedID(requestedIssue.toStorageJson()), 
        200
      };
    } catch(std::invalid_argument &) {
      return status400();
    }
  } else if (requestUri == "/issue/list") {
    if (requestMethod != "GET") {
      return {
          Json{}, 405
      };
    }

    Json data{
        {"issues", Json::array()}
    };
    auto all_issues = storage.allIssues();
    for (Json issue : all_issues) {
      issue["data"].erase("description");
      data["issues"].push_back( std::move(issue["data"]) );
    }
    return {
        Json{ {"data", data} },
        200
    };
  } else {
    std::regex issue_id_regex{"/issue/([0-9]*)"};
    std::smatch id_match;
    if (std::regex_match(requestUri, id_match, issue_id_regex)) {
      auto id_string = id_match[1].str();
      Json issue = storage.issue(std::stoul(id_string));
      if (!issue.empty()) {
        return {issue, 200};
      }
      return { Json{}, 404 };
    }
  }

  return {
      Json{}, 405
  };
}

These are 54 lines of code, which is way too much. There are up to three nested conditionals and lots of returns. I do encourage early returns, but this is a bit much and really messy.

If you look closely, the method consists of three parts. There is one part for every form of resource: /issue/new, /issue/list and /issue/<id>, where <id> is some unsigned integral number. After some checks, the code just jumps into doing whatever is needed for each resource. This is a violation of the Single Responsibility and Separation of Concerns Principle as well as of the Single Layer of Abstraction Principle.

So, the first step is quite simple and gives us some much-needed clarity: Factor out these actual implementations into their own methods. After that, the method looks like this (full commit on GitHub):

RestApi::Response RestApi::process(std::string const& requestUri,
        std::string const& requestMethod,
        std::string const& requestContent) const {
  if (requestUri == "/issue/new") {
    if (requestMethod != "POST") {
      return status400();
    }
    return issue_new(requestContent);
  } else if (requestUri == "/issue/list") {
    if (requestMethod != "GET") {
      return {
          Json{}, 405
      };
    }
    return issue_list();
  } else {
    std::regex issue_id_regex{"/issue/([0-9]*)"};
    std::smatch id_match;
    if (std::regex_match(requestUri, id_match, issue_id_regex)) {
      auto id_string = id_match[1].str();
      return issue_id(id_string);
    }
  }

  return {
      Json{}, 405
  };
}

Down to 27 lines in one commit, and the overall structure just became much more obvious.

Magic numbers

The next commit is not obviously related to the last one. Nevertheless, it is somewhat important to make the method more readable. In some places you see me return an empty JSON together with a 405. In the code we just factored out of the function, there were similar returns with a 404 and a 200, where the JSON was not empty.

These are HTTP status codes, and you also see calls to a method status400, which does the same thing with a different status. Those numbers are not very expressive. Surely anyone has come across a 404, but in the case of a given error, is that the right code to return, or should it be another error code?

I decided to remove the status400 method add some well-named static factory methods to the RestApi::Response class (full commit on GitHub):

struct Response {
  Json content;
  int httpCode;

  static Response ok(Json response) { return {response, 200}; }
  static Response badRequest() { return status(400); }
  static Response notFound() { return status(404); }
  static Response methodNotAllowed() { return status(405); }
private:
  static Response status(int st) { return {Json{}, st};}
};

Missing tests

The next commits were minor. A negative condition in the new issue_id function could be inverted. More importantly, the case for the /issue/<id> resource was inconsistent with the other two because it did not check for the correct HTTP method. So I added a test and fixed it (full commit on GitHub).

Commit number 4 fixes an issue that had become obvious with the clearer names for the HTTP status codes. When the URI did not fit any of the three cases, the process function previously returned a 405 error (method not allowed) instead of the correct 404 (resource not found). Another test, another fix (GitHub).

A similar problem got fixed in the next commit, where for /issue/new a 400 status was returned instead of a 405 when the wrong HTTP method was used. Test and fix… (GitHub)

TEST_CASE( "Bad URIs give 404" ) {
  StorageMock storage;
  RestApi api{storage};

  auto response = api.process("/some/bad/uri", "POST", "");

  REQUIRE(response.content == Json{});
  REQUIRE(response.httpCode == 404);
}

TEST_CASE( "Querying an issue by ID...", "[issue]" ) {
  StorageMock storage;
  RestApi api{storage};

  std::string uri = "/issue/32";

  ...

  SECTION("... returns status 405 if the method is wrong.") {
    auto response = api.process(uri, "POST", "");
    CHECK(response.httpCode == 405);
    CHECK(response.content == Json{});
  }
}

In case you wonder about the syntax in that snippet: I am using Catch for unit tests.

Finding a pattern

Let’s have a quick look at how the process function looks now:

  if (requestUri == "/issue/new") {
    if (requestMethod != "POST") {
      return Response::methodNotAllowed();
    }
    return issue_new(requestContent);
  } else if (requestUri == "/issue/list") {
    if (requestMethod != "GET") {
      return Response::methodNotAllowed();
    }
    return issue_list();
  } else {
    std::regex issue_id_regex{"/issue/([0-9]*)"};
    std::smatch id_match;
    if (std::regex_match(requestUri, id_match, issue_id_regex)) {
      if (requestMethod != "GET") {
        return Response::methodNotAllowed();
      }
      auto id_string = id_match[1].str();
      return issue_id(id_string);
    }
  }

This looks pretty repetitive: If the requestURI matches a specific resource, check if the requestMethod is allowed, then call the method that actually handles the request. New resources will surely be handled exactly the same way. We found an emerging pattern.

The only different is the matching part: while the first two resources use equality, the ID resource makes use of a std::regex. Luckily, string comparison is just a special case for regular expression matching, so we can unify the two. (GitHub)

  {
    std::regex issue_id_regex{"/issue/new"};
    std::smatch id_match;
    if (std::regex_match(requestUri, id_match, issue_id_regex)) {
      if (requestMethod != "POST") {
        return Response::methodNotAllowed();
      }
      return issue_new(requestContent);
    }
  }

  ...

This may feel more verbose than it was before, but bear with me, I had a goal in mind. To achieve it, things had to be still more uniform. Let’s analyze the methods that deal with the requests. Depending on the resource, they either take no arguments (issue_list), or the requestContent (issue_new) or they have to be specified by the content of the regex match.

The goal is to map the regular expressions to a function or function object that has to be executed when the regex matches the requestURI. Therefore, the next commits first wrap the different method calls into lambdas with a uniform signature (GitHub).

Then, the lambda is put into a std::function. To separate the last difference between the cases from the common code, I put the name of the allowed method into its own variable (GitHub):

  using ResourceFunction = std::function<Response(std::string const&, std::smatch const&)>;

  ...

  {
    std::string uriPattern = "/issue/list";
    std::string allowedMethod = "GET";
    ResourceFunction impl = [this](std::string const& requestContent, std::smatch const& id_match) {
      return issue_list();
    };

    std::regex uriRegex{uriPattern};
    std::smatch uriMatch;
    if (std::regex_match(requestUri, uriMatch, uriRegex)) {
      if (requestMethod != allowedMethod) {
        return Response::methodNotAllowed();
      }
      return impl(requestContent, uriMatch);
    }
  }

  ...

Now, the first three rows of this block are unique for every resource, while the regex matching, method checking and the call to the actual implementation is always the same. To get rid of that duplicated code, I extracted a structure for the regex pattern, allowed method and implementation (GitHub):

RestApi::Response RestApi::process(std::string const& requestUri,
        std::string const& requestMethod,
        std::string const& requestContent) const {

  using ResourceFunction = std::function<Response(std::string const&, std::smatch const&)>;
  struct Resource {
    std::string uriPattern;
    std::string allowedMethod;
    ResourceFunction impl;
  };

  std::vector<Resource> resources {
      {
          "/issue/new",
          "POST",
          [this](std::string const& requestContent, std::smatch const& id_match) {
            return issue_new(requestContent);
          }
      },
      {
          "/issue/list",
          "GET",
          [this](std::string const& requestContent, std::smatch const& id_match) {
            return issue_list();
          }
      },
      {
          "/issue/([0-9]*)",
          "GET",
          [this](std::string const& requestContent, std::smatch const& id_match) {
            auto id_string = id_match[1].str();
            return issue_id(id_string);
          }
      }
  };

  for (auto const& resource : resources)
  {
    std::regex uriRegex{resource.uriPattern};
    std::smatch uriMatch;
    if (std::regex_match(requestUri, uriMatch, uriRegex)) {
      if (requestMethod != resource.allowedMethod) {
        return Response::methodNotAllowed();
      }
      return resource.impl(requestContent, uriMatch);
    }
  }

  return Response::notFound();
}

Lessons learned

I did not see the pattern that emerged until after I had extracted the methods in that first commit. I knew that there was something hidden in there, but getting an overview by reducing the function size and fixing all those little bugs with the response codes were crucial to getting the necessary overview.

I did not always only refactor but also fixed some bugs or made slight behavior changes that did not break the tests (much). That some of the changes did not break tests shows that there was behavior not covered by tests. This, in turn, shows that I can’t have used TDD properly because in that case there would not be any behavior without tests.

Doing some bug fixes in a refactoring session can be OK, especially if those bugs prevent the refactoring. However, before fixing a bug it is important to either finish the current refactoring or roll it back.

Going over it again for this post revealed a few more things to do for me. That std::vector inside the function “works (meaning the test are green), but allocating memory and building it every time the function is called is unnecessary. In addition, the regex for /issue/<id> is wrong.

Previous Post
Next Post

7 Comments


  1. allocating memory and building it every time the function is called is unnecessary.

    Yep. I’d probably rather go for a constant array of (pattern,method,function-pointer-or-object-with-virtual-function) kind of stuff; there is no real need to abuse lambdas here (building three closure objects when at most one will be used, and putting them into an allocated vector – even if not really hurting because of being executed relatively rarely, it still kinda feels like a premature pessimization).

    Reply

    1. I definitely stopped too early with the refactoring there. A static array with std::function will be what I go for. Objects with a single virtual function would be overkill for now, and function pointers would mean the actual functions need to have signatures that don’t reflect what they actually use.

      Reply

  2. I’d argue for keeping 404 on readability grounds. It is IMO a case of those-magic-numbers-which-are-more-readable-than-their-named-counterparts; everybody knows what “404” is (in this context) – but interpretations of “notFound()” can vary (so when reading – I’ll need to check that notFound() is actually 404, and to keep this mapping in mind while reading). Same stands for “GET”.

    Reply

  3. When adding the static responses is there a reason you added them as methods instead of static (const) fields?

    Also another thing I’d classify as “magic numbers” are “magic strings” which in your case are the method names “GET”, “POST”, …
    I’d also refactor them out into “global” string constants.

    And last (but not least) a word of respect: I wouldn’t have been able to restrict myself of doing the YAGNI-change and made it one step more generic: A dictionary of HTTP-method to function, so a pattern can support more than one HTTP-method semantically.

    Reply

    1. The static responses are not all with empty JSONs, e.g. Response::ok has a parameter. Tho, for uniformity, I made the others static methods as well. It’s largely a matter of taste I think.

      The “GET”, “POST” etc are on my list, but one layer higher: The requestMethod parameter should not have been a string in the first place.

      You’re right about the YAGNI – I was tempted but it would have made the lookup so much more complicated, so I left it for when I actually have a case that needs it.

      Reply

    2. I think, in general, Markus is right, that “GET” and “POST” are somehow magic values, but what are the alternatives? And are they more readable? In this special case, I could live with the string literals.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *