Sometimes, a fixed set of string values is allowed as input. Often these string values are then stored, used for control flow etc. Enumerators are a better alternative.
The example
This week I paired with a colleague on a relatively simple task. A testing framework was able to deal with two kinds of messages coming from an API: errors and warnings. The API also emits info messages, and the framework should be improved to deal with those as well. No big surprise there, that trinity of info, warning, error is very common.
Getting the functionality to work was relatively trivial. There were two Cucumber step definitions for warnings and errors, and we had to add the third one. There were two functions called by those step definitions, and we added a third one there as well. This way we added the support for info messages all the way down to the layer that accesses the API.
Everything worked. We could have checked in the code and gone for a coffee, but that wouldn’t have been clean at all. We had encountered duplicated code and even added a third copy. Time to refactor.
Don’t repeat yourself
Starting at the top again, we unified the three Cucumber step definitions. Instead of having one for each type, we now had one that had an additional parameter: a string that could be one of `"info"`, `"warning"` or `"error"`. The function called from the step definition also got the message type string as parameter. Rinse and repeat, down to the API access level again.
Now we had one function or class on each level instead of three. But there still was work to do: The message type string was used on all those levels, which is not a good thing to have.
Enumerators instead of strings
We should convert those message type strings into message type enums. Having enumerators instead of strings has several advantages. Here are a few, in no specific order.
Comparisons and typos
Strings may contain any sequence of characters, even if they don’t make sense. If we have a typo somewhere in a comparison, it may be hard to spot. In contrast, enumerators are identifiers and the compiler will complain if we use one it does not recognize. Take for example this little function:
    void printMessage(string const& msg, string const& messageType) {
      if (messageType == "waring") {
        std::cout << "WARN: "; //!
      }
      //...
    }
In our example, the marked line would never be reached, because `messageType` can never be `"waring"`, obviously. I made this typo, and my pairing partner was vigilant enough to spot it. Otherwise I would have had to debug the code to find the problem later. Had I used an enum, the IDE and the compiler would have told me that there is no such enumerator.
Type safety
Consider again the function above. Let’s call it:
printMessage("error", "Something bad happened!");
Whoops. We just tried to print a message with the text `"error"` and the message type `"Something bad happened!"`. With the message type being an enum, the compiler would have rejected this call, because a string literal does not convert to an enum.
To be fair, we should wrap the messages in their own class or structure, since we will in most cases have to pass and use the message type and the text together. We’d then still have to construct the objects of that class, probably again passing a message type and a text, and the disambiguation of both by the compiler will help us.
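A small sketch of such a wrapper could look like the following. `Message`, `makeMessage` and `MessageType` are hypothetical names, assumed here only to show how the two arguments can no longer be swapped by accident.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical enum and message type; names are assumptions for this example.
enum class MessageType { Info, Warning, Error };

struct Message {
  MessageType type;
  std::string text;
};

// Construction still takes a type and a text, but their types differ now.
Message makeMessage(MessageType type, std::string text) {
  return Message{type, std::move(text)};
}

// A string literal cannot stand in for the message type.
static_assert(!std::is_convertible<char const*, MessageType>::value,
              "string literals must not convert to MessageType");

// makeMessage("Something bad happened!", MessageType::Error);  // does not compile
```

With `makeMessage(MessageType::Error, "Something bad happened!")` the arguments are in the only order the compiler accepts.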
Switch/case
In C++, we cannot use switch statements on strings. Instead we have to use tedious if/else cascades. The use of an enum allows us to use a switch/case statement instead. The added benefit is that we can get compiler warnings or warnings from the static analyzer if we forget an enumerator.
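For illustration, a switch over a hypothetical `MessageType` enum might look like this. The function name `prefixFor` and the prefix strings other than `"WARN: "` are assumptions. The switch deliberately has no `default` label: GCC and Clang then report a missing enumerator via `-Wswitch`, which is part of `-Wall`.

```cpp
#include <string>

// Hypothetical enum; names are chosen for this example only.
enum class MessageType { Info, Warning, Error };

std::string prefixFor(MessageType type) {
  switch (type) {  // no default label, so a forgotten enumerator can be diagnosed
    case MessageType::Info:    return "INFO: ";
    case MessageType::Warning: return "WARN: ";
    case MessageType::Error:   return "ERROR: ";
  }
  // Only reached if someone casts an out-of-range value to MessageType.
  return "UNKNOWN: ";
}
```

If a fourth enumerator were added later, the compiler could point at this switch as a place that still needs attention.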
Performance
I don’t recommend doing something only because it brings a performance benefit. But in this case we get a performance benefit in addition to the improved maintainability. It comes as an extra, so it is worth mentioning.
Comparing enums is usually faster than comparing strings. An enumerator has the size of its underlying integral type, while a string can be many characters long. In addition, switch/case statements can be compiled to jump tables, which can be more efficient than if/else cascades.
Enums are not guaranteed to perform better than strings. We can however be pretty sure that the performance won’t be worse, and that is all we should care about before our profiler tells us otherwise.
Conclusion
When you have a closed set of strings as input and/or output of a module, only use it in the interface and convert it to an enum inside the module.
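The conversion at the module boundary can be sketched as below. `parseMessageType` is a hypothetical helper, and throwing `std::invalid_argument` for unknown input is just one possible policy, assumed here for brevity.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum; names are chosen for this example only.
enum class MessageType { Info, Warning, Error };

// Converts the interface string once; everything behind it works with the enum.
MessageType parseMessageType(std::string const& text) {
  if (text == "info") return MessageType::Info;
  if (text == "warning") return MessageType::Warning;
  if (text == "error") return MessageType::Error;
  // Unknown or misspelled input is rejected right at the boundary.
  throw std::invalid_argument("unknown message type: " + text);
}
```

This keeps the string comparisons in a single place, so a typo in the input surfaces immediately instead of somewhere deep inside the module.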
Permalink
The presence of the printMessage function suggests you should probably cut out the half-way house approach of an enum and go straight to a little class hierarchy – OO rather than switches.
I’ve fixed a lot of stringly-typed code, and I originally did it this way, creating enums. At first I was happy to see switches rather than ifs, but I almost always ended up hating the code again later because the switches should really have been virtual function calls.
Permalink
Hi Pete, thanks for your thoughts!
The string-to-enum refactoring can be the first of a series of many. The enum-to-class-hierarchy could be the next, although it depends on the circumstances if it really is a good option. I plan to write about that one next week – stay tuned!
Permalink
Too bad you can’t query the declared enum values with some sort of compile-time reflection, or you could write an enum-based visitor.
Permalink
Well you can write an enum based visitor, but you have to write the dispatch manually. OTOH if you really need to write a visitor with different handling of each enumerator, that might actually be a case for a class hierarchy.
Permalink
Good explanation.