Sometimes we need unformatted data: simple byte sequences. At first glance, std::string might seem like a fitting data structure for that, but it is not.
Think about data we get from networks, a CAN bus, or another process: serialized binary data that has to be interpreted before it can be used in our business logic. The natural way to manage this kind of data is a sequence container such as std::vector holding std::byte or, lacking C++17 support, unsigned char. Sometimes we also see uint8_t, which on many platforms is a typedef for unsigned char.
However, there is another contiguous container for 8-bit values that can seem tempting as a means to transport byte sequences: std::string. I am not sure about the reasons for doing this, apart from std::string being slightly less to type than std::vector<unsigned char>, which means I cannot see any good reason at all. On the contrary, it is a bad idea for several reasons.
Many string operations rely on zero-terminated character sequences. That means that there is exactly one null character, and it is at the end. Plain byte sequences, on the other hand, can contain an arbitrary number of null bytes anywhere. While std::string can store sequences with null characters, we have to be very careful not to use functions that take const char*, because those would truncate at the first null character.
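To make the truncation concrete, here is a minimal sketch. The payload and the function names (make_payload, stored_size, size_seen_through_c_str, round_trip_through_c_str) are made up for illustration; only the std::string and std::strlen calls are standard library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical 4-byte payload with a null byte in the middle.
// The (pointer, length) constructor copies all four bytes.
std::string make_payload() {
    const char raw[] = {'\x01', '\x02', '\x00', '\x03'};
    return std::string(raw, sizeof raw);
}

// Size as tracked by std::string itself: embedded nulls are counted.
std::size_t stored_size(const std::string& s) {
    return s.size();
}

// Size as seen by any function that only receives a const char*:
// std::strlen stops at the first null byte.
std::size_t size_seen_through_c_str(const std::string& s) {
    return std::strlen(s.c_str());
}

// Round trip through const char*: the const char* constructor also stops
// at the first null byte, so the tail of the payload is silently dropped.
std::string round_trip_through_c_str(const std::string& s) {
    return std::string(s.c_str());
}
```

For this payload, stored_size reports 4 bytes, while both const char* paths only see the first 2. Nothing warns us about the two bytes that went missing.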
The major reason not to use std::string is semantics: when we see that type in our code, we naturally expect a series of readable characters. We expect some text. When it is misused as a series of raw bytes, it confuses the maintainers of our codebase. It gets even worse if we expose the use of std::string as a raw data container via an API that has to be used by someone else.
Especially in locations where we convert text to serialized raw data or vice versa, it will be very confusing to determine which std::string is text and which is raw data.
Apart from confusing the developer, having the same type for two nontrivial uses can be error-prone, as it neglects the safety mechanisms that the strong typing of C++ gives us. Imagine, for example, a function that takes some text and some serialized raw data: both parameters would be of type std::string, and the arguments could easily switch places by accident.
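The following sketch illustrates that accident. Both functions and their names (describe_ambiguous, describe) are hypothetical; they only exist to show what the compiler can and cannot catch.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: label is text, payload is raw data, but both are
// std::string, so the compiler cannot tell them apart.
std::string describe_ambiguous(const std::string& label,
                               const std::string& payload) {
    return label + ": " + std::to_string(payload.size()) + " bytes";
}

// Same helper with a distinct type for the raw data.
std::string describe(const std::string& label,
                     const std::vector<unsigned char>& payload) {
    return label + ": " + std::to_string(payload.size()) + " bytes";
}

// describe_ambiguous("id", "abcd") and describe_ambiguous("abcd", "id")
// both compile; the second one silently treats the label as the payload.
//
// With describe, passing the arguments in the wrong order is a compile
// error, because std::vector<unsigned char> does not convert to
// std::string or the other way around.
```

The swapped call to describe_ambiguous produces a plausible-looking but wrong result, which is exactly the kind of bug that tends to survive code review.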
A better choice is std::vector<unsigned char>. While this already nicely says “sequence of bytes”, consider using a typedef. For even stronger typing, use a wrapper structure with a meaningful name.
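As a rough sketch of the two options: the names ByteSequence, SerializedMessage, and payload_size are invented for this example, and the explicit constructor is one possible design choice, not the only one.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Option 1: a typedef. It documents intent, but it is only another name
// for the same type, so it adds no type safety.
using ByteSequence = std::vector<unsigned char>;

static_assert(std::is_same<ByteSequence, std::vector<unsigned char>>::value,
              "a typedef does not create a new type");

// Option 2: a wrapper structure with a meaningful name. This is a distinct
// type; the explicit constructor prevents accidental implicit conversion.
struct SerializedMessage {
    explicit SerializedMessage(ByteSequence b) : bytes(std::move(b)) {}
    ByteSequence bytes;
};

static_assert(!std::is_convertible<ByteSequence, SerializedMessage>::value,
              "plain byte vectors must be wrapped explicitly");

// Hypothetical consumer that only accepts the wrapped type.
std::size_t payload_size(const SerializedMessage& message) {
    return message.bytes.size();
}
```

A caller now has to write payload_size(SerializedMessage{bytes}), which makes the intent visible at the call site. The price is a little extra typing wherever raw bytes enter the system.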