Know your libraries

I often see people write handcrafted loops or odd workarounds for things the standard library already takes care of. This applies not only to the standard library but to any other library: Boost, other third-party libraries, and the libraries the code itself belongs to. That is bad for several reasons, and I am going to lay out what I think every developer should be required to do before writing production code.

Reinventing the wheel costs time

I will take a standard example: Searching for something in a container, e.g. a `std::vector`. This is the naive approach taken by developers who don’t know the standard library too well:

bool isNumberInVector(std::vector<int> const& numbers, int iLookFor) {
  bool found = false;
  for (auto it = numbers.begin(); it != numbers.end(); ++it) {
    if ((*it) == iLookFor) {
      found = true;
    }
  }
  return found;
}

Most people who have a bit of experience with C++ know that there is a function template `std::find` in `<algorithm>` that does such a search for us:

#include <algorithm>
bool isNumberInVector(std::vector<int> const& numbers, int iLookFor) {
  auto pos = std::find(std::begin(numbers), std::end(numbers), iLookFor);
  return pos != std::end(numbers);
}

Or with C++11 algorithms:

bool isNumberInVector(std::vector<int> const& numbers, int iLookFor) {
  return std::any_of(std::begin(numbers), std::end(numbers), 
    [=](int iElem){ return iElem == iLookFor; }
  );
}

“Yeah,” some people may say, “but I don’t look for numbers in a `vector`, I look in a `map<string, Foo>` for every key-value pair where the `bar` attribute of the `Foo` is `"meow"`. `std::find` can’t give me that!” They are right, but there are other algorithms that can:

#include <algorithm>
#include <iterator>
#include <map>
#include <string>

std::map<std::string, Foo> getWhereBarIsMeow(std::map<std::string, Foo> const& myMap) {
  std::map<std::string, Foo> results;

  // Copy every entry whose Foo reports "meow" into the result map.
  std::copy_if(
    std::begin(myMap),
    std::end(myMap),
    std::inserter(results, std::end(results)),
    [](std::pair<std::string const, Foo> const& mapEntry) {
      return mapEntry.second.getBar() == "meow";
    }
  );

  return results;
}

Development time

That call to `std::copy_if` is not exactly short, so the function will take time to write. So where is the advantage over a handcrafted loop? I think the handcrafted loop will take more time, because you usually have to check it twice to make sure the logic is right. In general, I prefer calling a function that does all the checks for me to writing a handcrafted loop where I have more opportunities to mess things up.

I may not gain much time when writing the code, and in some cases it might even take more time, e.g. if I have to look up the exact order in which I have to pass the parameters. But the time I spend writing a function is usually the least important.

Run time

Although we should not worry about squeezing every last percent of performance out of a piece of code when we write it, performance matters. Writing a function to be efficient without decreasing readability is not premature optimization; it is avoiding premature pessimization.

My handwritten implementation of the `isNumberInVector` function above is a bit sloppy. Can you spot the problem? I haven’t written anything wrong and the function will work like a charm, but I left something out: there should be a `break` in the loop once the value has been found. If you pass a long vector and the value you look for is among the first entries, the function will nevertheless loop over the whole vector, wasting time and needlessly adding a bit of entropy to our universe.
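For comparison, here is a minimal sketch of the handwritten loop with the early exit added. The name `isNumberInVectorEarlyExit` is made up here, only to distinguish it from the version above. `std::find` already stops at the first match, so the library version never needed this fix.

```cpp
#include <vector>

// Hypothetical name, chosen only to distinguish this from the loop above.
bool isNumberInVectorEarlyExit(std::vector<int> const& numbers, int iLookFor) {
  for (auto it = numbers.begin(); it != numbers.end(); ++it) {
    if (*it == iLookFor) {
      return true;  // stop at the first match instead of scanning the rest
    }
  }
  return false;  // reached the end without finding the value
}
```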

Maintenance time

Usually the most precious time resource is maintenance time, and using library features instead of writing code by hand that does the same thing is mostly an issue of maintenance time.

If someone reads a piece of code and sees a call to a library function, he will know what is happening just by recognizing the name of the function. If he encounters a handwritten loop or a bunch of intermediate calculations that do the same thing as the library function would have done, he has to analyze the code in order to recognize what is happening, which will cost more time. He will perhaps even wonder why the original author did not use the library function and waste precious time searching for something that makes the library function inapplicable and which simply is not there.
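To illustrate, here is a small sketch with two made-up functions that both count the negative entries of a vector. Neither comes from a real code base. The first has to be read line by line; the second names the operation through the algorithm it calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Handwritten: the reader has to work out that this loop is a count.
std::size_t countNegativesByHand(std::vector<int> const& numbers) {
  std::size_t count = 0;
  for (auto it = numbers.begin(); it != numbers.end(); ++it) {
    if (*it < 0) {
      ++count;
    }
  }
  return count;
}

// Library: std::count_if names the operation.
std::size_t countNegatives(std::vector<int> const& numbers) {
  auto n = std::count_if(std::begin(numbers), std::end(numbers),
                         [](int iElem) { return iElem < 0; });
  return static_cast<std::size_t>(n);  // count_if returns a signed type
}
```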

It’s all about communication

When talking with your customers, you will probably use domain-specific terms they are used to. Those terms will be used in the specs, and I bet there are many terms you had to learn when you started working in that domain. For example, I had to learn a lot of insurance-related terms for the job I am currently at. Clear communication simply requires a set of terms that everybody involved understands, so nobody has to digress into lengthy explanations.

It is the same with software development. Writing code is communicating with other developers, including your future self. It is mandatory to have a clear set of terms there, too, and libraries, including your own, provide those terms.

Knowing the libraries that are used is essential to write understandable code and to understand code written by others.

Which libraries should I learn?

One can’t possibly learn about all C++ libraries in the world, so a project should have a document that has a list of the libraries that are used. The document should also state where those libraries are used, so when someone works only in a certain part of the system, he has to learn only the libraries used in that part. A database expert might not need to know everything about the GUI library.

Sometimes it is not necessary to learn the whole library when only a few features are used. Nevertheless, you should know which of the features you are not currently using play well together with the ones you do use. If you are using standard library containers, you should know about all the algorithms, because you will surely need one of them some time in the future.

Not only know what you are using today, but also what you will be using tomorrow.

How deeply you should learn a library depends on how often you use it. For example, `std::vector` is an essential container, so you had best know everything there is to know about it by heart. It might even be useful to know not only its interface but also how it is implemented on your system. On the other hand, if you don’t work with custom memory management, you probably don’t need to know exactly how to create a `std::function` with a custom allocator; it will suffice to know that it is possible and where to look it up should you need it in the future.
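As an example of `std::vector` knowledge that pays off, the following sketch uses two well-known idioms: `reserve` to avoid repeated reallocation when the final size is known, and erase-remove to delete all matching elements in one pass. The function names are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Builds {0, 1, ..., count - 1}. reserve() allocates once up front,
// so the push_back calls below do not trigger reallocations.
std::vector<int> makeSequence(std::size_t count) {
  std::vector<int> result;
  result.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    result.push_back(static_cast<int>(i));
  }
  return result;
}

// Erase-remove idiom: std::remove shifts the kept elements to the front
// and returns the new logical end; erase() then drops the leftover tail.
void removeAll(std::vector<int>& numbers, int iValue) {
  numbers.erase(std::remove(numbers.begin(), numbers.end(), iValue),
                numbers.end());
}
```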

You should not blindly introduce some new library into your part of the project, just because you think it is fancy or because one single function of the library is just the thing you could use right now. Everyone involved with your code will have to learn about the library, at least about the part you used, to understand your code. Especially don’t use two libraries that do the same thing.

Introduce new libraries to your project only after careful consideration.

Your own libraries

Although I mostly talked about the standard library, everything I have written above applies to any library you use, including your own code. You will be using your own classes way more often than any library class. In a large code base with many contributors that is only possible with good modularization. You can’t learn thousands of classes, and you should not need to.

Most classes are implemented to hold features that are only used within a certain part of an application. If you don’t work in that part, you don’t need to know the class. Instead, if your project is modularized, you only need to know about the classes that make up the interface of each module, and only of those modules that you are using in the part of the application you work on.
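One common way to make a module's interface visible in C++ is to keep the implementation in a nested `detail` namespace. The sketch below is only an illustration: the `pricing` module, its functions, and the discount rule are all invented.

```cpp
#include <string>

namespace pricing {  // hypothetical module, not from any real project

namespace detail {
// Implementation detail: code outside the module should not rely on this.
inline int applyDiscount(int cents, int percent) {
  return cents * (100 - percent) / 100;
}
}  // namespace detail

// The module's interface: the only function other parts need to know.
inline int priceInCents(std::string const& customerType, int baseCents) {
  int const percent = (customerType == "premium") ? 10 : 0;
  return detail::applyDiscount(baseCents, percent);
}

}  // namespace pricing
```

Someone working in another part of the application only has to learn `pricing::priceInCents`; everything under `detail` can change without them noticing.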

Modularize your applications to provide a clean manageable set of libraries.

As with any other library, you should define which of your own libraries get used in which parts of the code. That way you not only provide reusability and encapsulation by having the libraries, you also prevent uncontrolled dependency growth, and you don’t have to worry about the inner workings of all your libraries at once, but only of the one you are currently working in. The others can be thought of as just another library you are using.

10 Comments

  1. I agree that algorithms are self-documenting and most of the time a better pick, but the range-based for loop does not look so bad to me in that case either, without compromising on code readability or increasing the chance of adding a bug.

    std::map<std::string, Foo> getWhereBarIsMeow(std::map<std::string, Foo> const& myMap) {
      std::map<std::string, Foo> results;
      for( auto & elem : myMap )
        if ( elem.second.getBar() == "meow" )
          results.insert(elem);
      return results;
    }

    1. I agree that range-based for loops are much more readable than traditional for loops. Nevertheless, one has to analyze their content, which I consider a slightly bigger mental burden than just reading “copy from… to… if…” This is especially true once we get range-based algorithms.

  2. Which libraries should I learn?

    None, you should be able to Google. In my opinion the most valuable ability a software designer should have.

    1. While I agree that googling is a key skill, you still have to know at least the capabilities of your libraries, or else you end up reimplementing everything from scratch. When it comes to the exact name and parameter list of a function you can of course look it up. Unless you have to use it often, then googling each time would take too much time.

  3. … “It is avoiding premature pessimization”… The quote of the day.
    An article I should have read before starting a professional career as C++ dev. Thanks for sharing.

  4. You could also use `std::any_of` for your first example. I think this describes the function’s intent best.

    1. Great point! Seems I still have to learn my (C++11 standard-) libraries as well 😉 I added an any_of version.
