Smelly std::pair and std::tuple

Depending on their use, std::pair and std::tuple can be code smells. That’s why we should be careful around these two.

Having a code smell is not a no-go, it’s more like a red flag. It’s one of those things that are not a problem themselves but rather a hint that there might be a less obvious problem hidden in the code.

The “Data Class” smell

In object orientation, there is a code smell named “Data Class”. It says that having a class that does not contain any logic is a hint at a violation of design principles.

In C++, std::pair and std::tuple may or may not constitute the “Data Class” smell, because C++ is not an object oriented language. However, if we find them used in an object oriented context, we should definitely have a closer look.

Cohesion and Coupling

In software, we usually want things that belong together to have high cohesion. It means that all the code that deals with the two things as a conceptually whole should be closely related to them. Usually, there is some logic associated with the data, that specifies how the values relate to each other. Things, that are not closely related should, on the other hand, be loosely coupled, i.e. they should not travel in packs.

These are the principles that might be violated when we see the “Data Class” smell. Usually, there is some logic that belongs to the data, but it’s implemented elsewhere where it does not belong. In the case of pair and tuple, we can not add logic to the class, so when there is more than just a source and a consumer for the data structure, we should definitely consider to refactor it to a proper class. If, on the other hand, the data just happens to be found together by accident, tying it up into a common data structure should be suspicious.

Poor naming

The names pair and tuple are very generic by design. Good names, however, transport a lot of information for readers of our code. Reading std::pair<bool, iterator> does not tell us anything except that there are some boolean value and an iterator crammed together in a single data structure. If on the other hand, we had the name InsertionResult, we would have an idea where those values came from.

The same goes for the access to the single members. first, second for pair and std::get<4>() for tuple tell us something about the position of the data we access, but nothing about their semantics. With named members, we do not even have to know the position, and that’s a good thing. The less we have to memorize such details, the more we can concentrate on things that really matter.

By the way, the insert methods of std::map and std::set don’t really return a std::pair<bool, iterator> – it’s a std::pair<iterator, bool>. My condolences if you spotted that without looking it up – that means you have memorized information the library could give you in a much handier way. I’d prefer to see members success and position in a std::map::insertion_result.

Since I am picking on std::map already: I’d sometimes also like to have map<K,V>::value_type be something else than a pair<const K, V>. Here, the position is much more intuitive than in the result of insert. Still, members named key and mapped would be more consistent to the key_type and mapped_type than the generic first and second.

Having said that, I consider this a gray zone in the case of the standard library. std::map and std::pair are equally generic, and the values are usually not passed around too much but quickly consumed.

Conclusion

Unless it’s a short-lived, purely technical solution with little or no associated logic, we should be wary of uses of std::pair and std::tuple. Way too often the two just are a sign of laziness because the developer who introduced them did not want to introduce a small class that bundles well-named data with the associated logic.

Previous Post
Next Post

15 Comments




  1. A few things just to think about, as I’m no expert.

    You have a machine. How does one configure a machine? Call a method per setting?
    machine.SetSetting1();
    machine.SetSetting2();
    machine.SetSetting3();
    machine.SetSetting4();

    or machine.Configure(config);

    Thought, one may say, the machine is to complicated. It’s a god class. Break it down into finer components… okay, now there’s no cohesion among the components. Should I configure all the components then pass in the components configured into the machine? Now you have a poor interface with lots of configuring to do, but perhaps better for testing and configuring.

    Anyways, the point is, practically speaking, once it becomes confusing, change it. If you can’t handle std::pair<> or tuple, and it makes your code less clear, change it. But if it does make your code clear and concise, it may make more sense. But I think that’s the point of the whole article anyways. 🙂

    Reply

    1. Yes, basically that’s the point. There are cases that are simple enough, and others that are clearly too complex, like in your machine example. In those cases you have to invest some effort to make them more readable and comprehensible. There is no clear border between simple enough and too complex, it’s a grey area. For your configuration example I’d consider something like the builder pattern and one be or maybe several standard/ default configurations. That way your code can focus on the special details that otherwise would drown in the sea of config parameters.

      Reply

  2. What I’m reading from this : For multiple return values, return a struct with members named to denote the semantics, rather than opaque classes like tuple.

    Reply

  3. @Arne: You don’t mention it (is it an introduction to another post?) but your idea about std::map and std::set, is already here in the C++17 standard: std::map::insert_return_type (and the same for ‘set’) is the return type of insert method when used with Node handle (also introduced in C++17).

    I agree on the fact that member name like .first .second are so generic that they hide the intention and decrease the code readability if used widely over the code. pair and tuple are useful, but dangerous 😉

    Reply

  4. Thank you for yet another interesting reflection, Arne.

    I agree completely on the importance of a good interface and packaging logic with data. However, writing generic code is often equally important. Lately, when I want generic code, I find myself putting the data in a generic base class and adding specific accessors and logic. With the C++11 using-directives, it is easy to pass constructors on and hiding generic parts that you do not want to expose.

    So far, it has worked very well.

    Reply

  5. I absolutely agree with Stepanov (and by extension some of what luca said). This whole piece is a solution in search of a problem. There is no such thing as a “data class smell” in C++ except perhaps that a programmer should probably have used the word “struct” rather than “class” to make things clear.

    Transforming data is what computers do. Data should not scare us away.

    The main place where I disagree with luca is that classes in C++ are what we use to implement instantiable modules. Modules are good, and separating interface from implementation is good. So “public” and “private” are useful from that perspective.

    The main place I disagree with Stepanov is that he is specifically referring to the Grady Booch definition of OOP rather than the earlier (and more authentic) Alan Kay definition. Booch defined OOP in terms of inheritance, where Kay defined it in terms of late binding and messaging. Kay’s OOP is good and more systems should be built that way (e.g. signals and slots). Booch’s OOP is bad and we should avoid it if we can.

    Reply

    1. There is no such thing as a “data class smell” in C++ except perhaps that a programmer should probably have used the word “struct” rather than “class” to make things clear.

      +1. It just so happens in the real world that there are (a) entities which DO things (let’s name these entities-which-DO-things “classes”), and there are (b) entities which are used as parameters to the function members of these classes (let’s name these-entities-which-are-used-as-parameters “structs”).

      One extreme example of a “struct” is a (parsed) message exchanged between different parts of the system (threads, computers, etc.); parsed message as such cannot possibly DO anything – it is just a bunch of data, and adding a full set of getters/setters to make it look “class-like” and avoid “data class smell” doesn’t serve any purpose except for making things less obvious (and producing source code bloat for zero reason).

      Reply

  6. You should read some Alexander Stepanov and some STL. They are years ahead of what you are writing.
    Indeed, writing classes instead of structs is the root cause of many problems.

    The whole Object Oriented dogma is often fundamentally wrong. Both from a design perspective and a performance perspective.

    ALL the code i’ve read, that was nicely “encapsulated” with classes, was a real mess that added only sintactical noise to the underlying things. The result was really harder to read, code and mantain than having simple data and functions accepting and returning this data.

    If it was for OO fanatics, vectors would not return items by reference and you could not access elements via their offset.. “This violates encapsulation!” You would need to call a vec.get(i) a la Java style. Who knows how the implementation of a vector could change! We need to use an interface.

    Moreover never Once i needed the shitty programming patterns from OO or GoF.
    I maybe used Visitor twice… Never in prouction environment code.
    Those are just boxes you hide the poor code you are writing in.

    Classes with private members are only useful to hide invariants.. and today it’s almost unneeded (locks, smart ptrs…).

    Last comment: how do you join two strings in python?
    “”.join((“lazy”, “dog”))
    The empty string object uses its method “join” to put together two strings that the method accepts as arguments.
    Like a priest.
    This is the perfect object oriented result.

    Reply

    1. Classes with private members are only useful to hide invariants.. and today it’s almost unneeded (locks, smart ptrs…).

      Hold your horses. Looking at the Stepanov’s implementation of very same std::vector<>, there is a pointer there which is private – and I contend that this encapsulation is a Good Thing(tm). Encapsulation is good (actually – very good) – it is just not as universal as [some of interpretations of] OO dogmas claim (see also my other comment on this page about two types of entities out there).

      BTW, from my communications with Stepanov about 20 years ago – I didn’t notice him having any problems with encapsulation (he’s a very practical guy, i.e. solving problems, and not arguing about dogmas).

      Reply

  7. I completely agree with you. I limit myself using pair to cases where the type is the same and the order not important. For example, if we say that a line segment is two 2D points, a pair<point, point> is fine to me.

    Reply

    1. Exactly the same for me. I also have triangle defined as a tuple of 3 points.
      All the other pairs and tuples I see in our code are developer laziness.

      Reply

  8. The name InsertionResult made me revisit the previous post. When I code, one thing that’s caused me discomfort is as much as I prefer coding with prefixes, I’ve been putting of outlining the specific considerations for when I might use them since some time!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *