One of C++'s strengths is that it makes it possible to write very performant code. But does that mean we always have to worry about performance and write our everyday code to be as performant as possible? Should we give up simplicity for performance? Do we have to?
I don’t think so.
There are many reasons why I don’t think we should sacrifice simple and clean code to write more performant code per se. On the contrary, I have been criticized for advocating the sacrifice of performance for simplicity.
I would prefer it if everyone wrote simple and clean code by default. OK, that’s pretty obvious, because that’s what this blog is all about. But what are my reasons for thinking so? Here are some.
Performance is not efficiency
There is an important point to get out of the way first: we have to distinguish between efficiency and performance. What is the difference? In very simple terms, it is how fast you do the work (performance) vs. how much work you have to do (efficiency).
At first glance those might sound like the same thing, but they are not. Imagine you have to get from point A to point B. Efficiency means you take the shortest way. Performance means you run instead of walking. So if you run as fast as you can around the whole block to get to your neighbour, you show high performance, but you are not very efficient.
In programming, loops often contribute much to program run time. Here, performance would mean that a single loop cycle executes faster. Efficiency means that you have to execute fewer cycles, usually because you have a smarter algorithm.
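To make the distinction concrete, here is a toy sketch (the function names are my own, not from any library): two ways to find a value in sorted data, where the gain comes from executing fewer loop cycles, not from making a single cycle faster.

```cpp
#include <vector>

// Linear search: up to n comparisons per lookup. Each cycle is cheap,
// but there are many of them.
int linear_steps(const std::vector<int>& v, int target) {
    int steps = 0;
    for (int x : v) {
        ++steps;
        if (x == target) break;
    }
    return steps;
}

// Binary search on sorted data: about log2(n) comparisons.
// Each cycle does a bit more work, but there are far fewer cycles.
int binary_steps(const std::vector<int>& v, int target) {
    int steps = 0;
    int lo = 0, hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        ++steps;
        int mid = lo + (hi - lo) / 2;
        if (v[mid] == target) break;
        if (v[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return steps;
}
```

For 1000 sorted elements, the linear version may need 1000 cycles while the binary version needs around 10; that is efficiency, regardless of how fast a single cycle runs.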
Sometimes you cannot have the best of both worlds: the steps of a more efficient algorithm may be less performant. However, before you try to squeeze the last bit of performance out of a piece of code, make sure it is efficient. Only when you have exhausted the possibilities in terms of efficiency can it pay off to worry about performance.
Sometimes it is enough to make your code more efficient to get the desired speed without having to worry about performance details.
We don’t need performance everywhere
This one is obvious, but many programmers, especially new programmers, tend to overlook it. There are tons of questions on forums and Stack Overflow asking how a certain piece of code can be optimized. When one asks the counter question whether the code really is a performance bottleneck, it most often turns out that it is not.
There is a saying that 80% of the run time of a program is spent in only 20% of the code. Some say it’s 90/10. The exact numbers are not very important in general. The crucial point is that the program spends a lot of time in a small amount of code.
On the other hand, that means that most code does not contribute much to the total run time, and if we optimize the hell out of it, we won’t see much of a result, if we see anything at all.
Don’t optimize code that you have not proven to be vital for the overall program performance.
We don’t really know how to write performant code
I know, how dare I say something like that? The fact is that one of the major contributors to program run time is the number of instructions the processor has to execute. And those are not written by us, but by the compiler and its optimizer.
Optimizers come in all shapes and colors, and unless you are an expert in the field, you can’t even guess what they will do to a nontrivial piece of code. Optimizers can eliminate temporary objects, they can inline functions, sometimes even at link time, and they can shuffle around and eliminate lots of those instructions.
So, with those superpowers in the machine and our ignorance of which code will give the best result, what can we do to make our code more performant? At first: nothing. And if we really have to care about performance, we cannot rely on our imagination or experience; we have to use a tool.
By default, write code for maintainability, because it might be as performant as anything else. If you really have to improve performance, use a profiler.
Of course this does not mean you should prematurely pessimize. If there are two or more ways to write a piece of code that are equally readable, use the way that will probably give the best performance. For example, use `++iter` instead of `iter++` if you don’t save the result of the expression, and so on.
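As a sketch of why this guideline exists (using a hypothetical iterator-like type of my own; for builtin integers the compiler will usually emit identical code either way): post-increment must return the old value, so a user-defined `operator++(int)` has to copy the object first, while pre-increment does not.

```cpp
// Toy iterator-like type (illustrative, not from any library) that
// counts by hand how many extra copies post-increment forces.
struct Counter {
    int value = 0;
    static inline int copies = 0;  // requires C++17 (inline variable)

    Counter& operator++() {        // pre-increment: modify in place, no copy
        ++value;
        return *this;
    }
    Counter operator++(int) {      // post-increment: must copy the old state
        Counter old = *this;
        ++copies;                  // record the copy we just made
        ++value;
        return old;
    }
};
```

In a loop that discards the expression result, `++it` therefore never does more work than `it++`, and may do less; that is a premature pessimization avoided at zero readability cost.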
Performance and simplicity don’t always contradict each other
The other major contributor to program run time, maybe even more important than the number of instructions, is the layout and structure of data in memory. There is a great talk by Chandler Carruth about getting better performance by using the right data structures, so I won’t go further into this.
All I want to say is, if the memory layout of your data is bad, much of the run time goes into the fetching of data from memory and saving a few instructions won’t have as much of an impact as using the right data structures.
Make sure you use the most performant data structures before you start tweaking your code for more performance.
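As a minimal illustration of the point (my own example, not from the talk): the same summation over contiguous storage vs. node-based storage. Both are correct and equally simple; on typical hardware the contiguous walk is the cache-friendly one, because the list scatters its nodes across the heap.

```cpp
#include <list>
#include <numeric>
#include <vector>

// Same algorithm, different memory layout: the vector stores its
// elements contiguously, so iteration streams through memory.
long long sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// The list follows a pointer to a separately allocated node on every
// step, which tends to defeat the cache.
long long sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0LL);
}
```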
There is another point to writing performant and simple code: using the libraries you have, and using them right. Library writers usually are smart people who know how to write performant code. In particular, they know how to use their profilers.
So if you use libraries instead of rolling your own solutions, your code is likely to be not only more robust and simpler to maintain, but also more performant.
Know and use your libraries, unless your profiler showed them to have bad performance.
Write readable and simple code by default. If you actually have a performance problem and have located it, there are still many options that are more promising than turning your code into a fast but unreadable mess. Sacrifice simplicity for performance only as a last resort, and always use a profiler when you are dealing with performance problems.
Great post. It will help and motivate my clean code mission in my current project.
Thank you! It’s true, much of this does apply to most other languages as well.
In C++ however you often have more control over details that can affect performance. So many C++ devs think they have to exert that control as often as possible, although they do not really know how. As a result I see wild guesses etched in code that is barely understandable and hardly more performant. Or, as you pointed out, even less performant.
Hey Arne what a great post.
I’ve been searching for content like this since I read the book Clean Code, in which Uncle Bob describes how to write good code.
Nowadays I’m reading a book called Code Complete, which talks, among other things, about clean code too; it’s a great book, btw.
Clean Code is life-changing.
Cheers from Brazil.
I liked your article and agree with the main message… good job.
I believe you’ve constructed this argument to hide from the problem. The reality is that most of today’s software is horribly inefficient and slow, for no real good reason, but due to this kind of thinking. Writing performant and readable code is very possible, and the compiler is most certainly not going to do it for you (it can’t, it’s been tried, it isn’t coming anytime soon).
While algorithmic complexity is a big part of it, the majority of the slowness is actually due to IO (easily fixable) and horrid use of memory (usually via terrible cache performance due to needless memory hopping). There is a big speed gain to be had, if everyone, across the board, just cared a bit more about the hardware the code was being run on. This doesn’t have to even be platform specific (the same rough ideas work on a PS3 as an iPhone as a desktop Core i7), it’s just being aware that the code will run on a computer, which is not an abstract machine, but a real piece of silicon.
Here’s a great discussion about why most “modern” code is slow and what to do about it:
Readability is certainly still very very important, but there is simply no need to throw performance away to achieve it. Code will be run by the machine. It should be “readable” with respect to what the machine will be doing, because that is the actual reality of the situation.
Personally I would say write the simplest code you can. Make it painfully readable. Never have a method you cannot see entirely on one screen; if it is bigger than that, break it up with another method call. Right inside a loop or an if statement are perfect places to do this.

The most important thing for performance is to measure the wall time. Get the start time when you enter a method, usually in ms or ns. Get the end time when you leave the method. Subtract the two and store the result. People will tell you to use profilers; they are wonderful, but they are usually not on in production. Guess where you have the real problems you can’t predict or recreate. (We run a billion dollar ecommerce site doing exactly this on Cyber Monday, which is our biggest day of the year.)

Aggregate the data and keep a history of it. It’s called instrumentation, and once you have supported a system with it, you will never want to be without it. Now when someone says there is a performance problem in production, you can see exactly what piece of code did it. Guess which piece you optimize. Guess what, we are computer scientists, no guessing required. Now when you make a change you can prove whether it is faster or slower. Slow code is found and fixed quickly. Iterate. You can do other things, but in my opinion the proof is in the pudding.
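A minimal C++ sketch of the instrumentation idea described above (all names are illustrative assumptions, not the commenter's actual system): a scope-based timer that records wall time with `std::chrono` and aggregates it per named section, so the data is available in production where a profiler usually is not.

```cpp
#include <chrono>
#include <map>
#include <string>

// Aggregated wall time per named code section.
struct Timings {
    std::map<std::string, std::chrono::nanoseconds> total;
};

// RAII timer: records the elapsed wall time of its enclosing scope
// into the shared Timings sink on destruction.
class ScopedTimer {
public:
    ScopedTimer(Timings& sink, std::string name)
        : sink_(sink), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        sink_.total[name_] += std::chrono::steady_clock::now() - start_;
    }
private:
    Timings& sink_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would be a single line at the top of a method, e.g. `ScopedTimer t(timings, "checkout");`; the history-keeping and reporting side is left out of this sketch.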
Just curious, I always use `array[iter++]` instead of `array[iter]; ++iter;`. Is the latter really more performant than the former? Or is the method I prefer, in this case, more performant?
A good example is a comparison of the bubble sort and quick sort methods. Bubble sort needs a lot less logic, and it is simpler logic too, so its per-step performance is better than quick sort’s. But I quit testing it at 150K items because it was consistently quadrupling the time whenever I doubled the number of items to sort, and that run took over 3 minutes. The quick sort routine was still taking milliseconds. I quit testing that at 150M integers because I was running out of memory, and it took 49 seconds. Array.Sort took 30 seconds and consistently ran in about 60% of the time my code took. (It was just as efficient, but more performant.)
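For reference, here is a sketch of the simpler-but-quadratic algorithm from this comparison (the commenter's code was presumably C#, given `Array.Sort`; this is my own C++ rendition): each step is trivial, but the number of steps grows with the square of the input size.

```cpp
#include <algorithm>
#include <vector>

// Bubble sort: very simple per-step logic (compare and swap neighbours),
// but O(n^2) comparisons overall, so it loses badly on large inputs.
void bubble_sort(std::vector<int>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        for (std::size_t j = 0; j + 1 < v.size() - i; ++j)
            if (v[j] > v[j + 1])
                std::swap(v[j], v[j + 1]);
}
```

It produces the same sorted output as `std::sort`, which typically runs in O(n log n); only the amount of work differs, which is exactly the efficiency-vs-performance point of the article.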
`array[iter++]` is misleading, because it is an index instead of an iterator 😉
I doubt that the compiler would give you much different output for the two, since the index usually is of integral type and not a user defined class. So I’d prefer to go with the second, because it’s clear: access the array and increment the index are two things, so why not make it two statements?
`++iter` in C++ is typically slightly faster.
In a previous job I specialized in performance improvement. For example, I took a process run that took 24 hours and reduced it to one hour. My experience shows that the two most wasteful things that decrease performance are:
1. Inefficient use of I/O.
2. Repeatedly calculating the same result instead of calculating it once and then saving it for later. This is usually exacerbated by having I/O as part of the calculation.
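Point 2 can be sketched like this (`expensive` is a stand-in for any costly calculation or I/O; all names are illustrative): compute the result once, then serve later calls from a cache.

```cpp
#include <map>

// Stand-in for a costly calculation or an I/O-backed lookup.
// call_count lets us observe how often the real work happens.
int call_count = 0;
int expensive(int x) {
    ++call_count;
    return x * x;
}

// Cached wrapper: the expensive work runs at most once per input,
// every later call is a cheap map lookup.
int cached(int x) {
    static std::map<int, int> cache;
    auto it = cache.find(x);
    if (it != cache.end()) return it->second;
    return cache[x] = expensive(x);
}
```

In a real system the cache would need an invalidation and sizing policy; this sketch only shows the save-instead-of-recalculate idea.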
Most frequently, the above problems are the result of different programmers writing modules of code that must work together without trying to coordinate their efforts for performance. So, you cannot always make the assumption that the other guy has written performant code if his code does not dovetail well with solving the overall problem.
The word I’m not seeing here is “elegant” — if a program is elegant it is, by definition, both as efficient and as clean/simple as it can be. See anything written by Dijkstra for further reference.
Point of grammar: “per se” has no accent on the E. It’s Latin, not French.
Thanks for the grammar correction 😉
I didn’t know there was an actual definition of elegant code, and I don’t think everyone agrees on what is elegant. I have seen the word together with some really tricky template magic using all kinds of SFINAE and other tricks which was fast but far from simple.
Well… One definition of “elegant” is that Dijkstra would’ve liked it… for example, Euclid’s algorithm is elegant. It’s also elegant, for example, to produce an array of randomly-ordered integers by filling the array in sequence and then swapping each element with a random one, as opposed to the method towards which most people seem to gravitate, which is to pick a different random number for each cell in sequence, comparing it with all the numbers they’ve already picked for previous cells (which could theoretically take forever).
The algorithm is everything: it’s up to another programmer to understand the algorithm; it’s not up to the existing programmer to hobble his algorithm by producing something other people will be more likely to understand.
An elegant algorithm is something which generally (a) requires lateral thinking to produce, (b) is the best possible expression of the solution to a problem, in terms of both simplicity and efficiency, (c) is obvious to everyone once they’ve seen it, but (d) is apparently impossible for most people to create.
Elegance is rarely taught these days: computers are now fast enough and have enough RAM that such considerations are no longer as pressing as they were back when machines had 256 bytes of RAM and a paper tape. Problems arise, however, when a program is just not running well, and nobody can seem to figure out why: this is usually because the program is inelegant in design, and is thus inefficient and wasteful. But, thanks to the prevalence of APIs and frameworks, a programmer can now patch together a program out of a bunch of calls to one pre-written function or another… without having the slightest idea of what’s going on inside the machine, or of how to write these things himself, or of whether there might be a better way to do it.
Real programming is not bundling together a whole load of other people’s crap because in some labyrinthine way, you can put something into the front of your Frankenstein’s Monster program and what you expect eventually comes out of the back. That’s just horrible.
Oh, and by the way: “performant” isn’t a word: it’s a string of Microsoft letters. The word in English is “efficient”…!
Learn something about linguistics … “performant” is a word. And it doesn’t mean the same thing as “efficient” … did you even read the article?
I would kindly refer a lot of people reading this article to http://en.wikipedia.org/wiki/Amdahl%27s_law. It is applicable almost everywhere, especially in software.
Overall though, I agree with most of what this article has to say. As one commenter said, there are many contributors to code bottlenecks, including I/O. While I agree that most are not directly “curable”, there is one particular contributor that is usually caused by bad programming: cache misses.

When you write loops which constantly iterate over a block of data, you should always opt to exploit the locality of reference. If you consistently make sudden jumps in your code to another block in memory, you’re bound to have your efficiency and performance degraded, because DDR3/4 is slow. Really slow. If you think about it, 80% of the code you write consists mostly of memory reads/writes (hence your figure). If you can reduce the distance between the processor and the memory you need to write to/read from (i.e. cache, or DDR due to a miss), your program will drastically improve.

Cache coherency is another mechanism that can reduce performance, simply because a lot of multi-processor applications constantly write to shared memory. If you can do the most with what you have in the caches, you will improve the throughput of your code drastically. People generally think that this kind of stuff is handled by the compiler/optimizer, but it’s not.
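A classic illustration of locality of reference (my own example): summing the same row-major matrix in two traversal orders. Both return the same result; the column-wise walk makes a strided jump of `cols` elements per step, which is exactly the kind of needless memory hopping described above.

```cpp
#include <vector>

// Row-major walk: consecutive iterations touch adjacent memory,
// so the cache line loaded for one element serves the next ones too.
long long sum_row_major(const std::vector<int>& m, int rows, int cols) {
    long long s = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            s += m[r * cols + c];
    return s;
}

// Column-major walk over the same row-major data: each step jumps
// `cols` elements ahead, wasting most of every cache line it loads.
long long sum_col_major(const std::vector<int>& m, int rows, int cols) {
    long long s = 0;
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            s += m[r * cols + c];
    return s;
}
```

On matrices larger than the cache, the first version is typically several times faster despite executing the same number of additions.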
Completely agree with this article.
The problem I see with optimizations is:
Profilers let you find the bottlenecks, but usually a profiler will only tell you the percentage of time spent in a piece of code. So how do you know whether the time spent is acceptable for that piece of code or needs to be reduced? And by how much could it be reduced?
When you are looking for a performance bottleneck you usually already have determined that there is a performance problem. “Action XY takes too long, I need it to be faster NN percent”.
So, the “is it acceptable” is answered with “no” before you even start the profiler. The “how much performance can I get out of it” cannot be answered beforehand. Profiling and optimization are often a game of trial and error, especially when it comes to squeezing the last few cycles out of a function.
Maybe you can achieve your performance goal by rewriting only a part of the slow functionality, maybe you have to rewrite most or all of it, and maybe you have to completely alter the structure and layout of your program to achieve a specific performance goal.
I must direct attention to one statement in your otherwise fine article. You wrote:
In programming, it is loops that contribute most to program run time.
I must point out that this is a limited truth. Other contributors are:
– I/O in all its forms. Moving a bit block to some hardware takes time, waiting for the user takes more time.
– Automatic code beyond your control. Garbage collectors come to mind, but anything supplied by third parties (library methods etc.) could be eating performance.
– To a certain degree, stack operations in recursive methods can degrade performance.
You simply cannot know in advance when what eats your performance. The sub-conclusion is, as you also wrote, to engage a profiler, find the proper bottlenecks, and direct optimization there. The main conclusion is to write all of your code as maintainably as possible, thus improving your chances to optimize it upon need.
Hi Normann, thanks for the feedback! You are right, I simplified that sentence a bit too much. Changed it.
If we simplify a bit, there are generally two kinds of programs: those that will be around for a while and those that won’t. For the first sort, legibility and maintainability are the greatest of virtues, to be sacrificed for no other consideration. For the second, optimal performance will matter greatly only in a minuscule percentage of cases. That seems to point out the “intelligent default path” rather clearly.
True, but alas too simple. Those programs that surely won’t be around for a while eventually end up as long-lived code anyway, either as programs or as parts of other programs. You simply cannot know. By that token, always write your programs as cleanly as possible, even if they are meant to be thrown away.
Hands up who has worked on a 10 year old “prototype” in production 😉
So true! Everywhere I’ve worked, the best prototypes generate so much interest they’re generally forced to production – it’s flattery to the programmer, but should also induce a little healthy fear during the design and development of “temporary” code!
My boss: “mock up this interface so the customer has something to wrap their head around.”
Me: “Do we have any detail about the system that we are interfacing with?”
My boss: (verbally) “Their field A maps to our field B (repeat for several fields)…” Followed by three sentences of business rules.
Me: mock up a GUI with no code behind.
My boss: shows customer and includes lots of words about how great it is.
Customer: “Great, the interface is done!”
My boss: “The customer thinks the interface is done. I told them we are in testing and it will be a few days. Get it done…”
Me: bang out code based on initial verbal spec.
Customer: “This doesn’t work right…”
repeat for almost every project I was assigned…
My code was new at the time, but it is 10+ years old now… The poor soul that is stuck with it.
For me the whole article reads like: don’t be a bad programmer, don’t use bad compilers, don’t use bad libraries! General advice based on implicit assumptions, justified by bad examples, e.g. map vs. vector. The real questions are: why do bad programmers write code, why do we have bad compilers, why do we write our own libraries? Answer: available/cheap, coding for a non-Intel platform, bloated/expensive/not free/not available.
While I completely agree with everything you’ve written, I wish this point were more widely understood: business process efficiency is generally more important than software performance.
An interesting (non-C++) story – we have a web app that allows customers to download their monthly traffic detail. The web app gets this data from the traffic repo (TR) via middleware (MW), in a 2-step process, where the first invocation returns the SQL for the 2nd invocation. Someone decided to optimize this process like this:
1. Web app invokes MW.
2. MW invokes step 1 on TR, sometimes performing “data translation” (e.g., web app invokes MW with customer ID, MW invokes TR with list of customer accounts).
3. TR returns SQL for step 2.
4. MW invokes SQL on TR.
5. MW returns traffic data to web app.
Seems right: we remove an unnecessary round trip to the web app. However…
Every time there’s an incident related to incoherent traffic data, here’s what happens:
1. Web app support team (WAST) gets assigned the ticket.
2. WAST assigns the ticket to the MW support team (MWST), asking for the SQL for the execution related to the incident.
3. MWST reassigns the ticket to WAST, along with the SQL.
4. WAST assigns the ticket to the TR support team, which will then investigate.
Because this web app is tied into the monthly billing process, these incidents tend to cluster in a monthly peak. If this peak coincides with any other workload peak for the MW team, steps 2 and 3 may delay the incident-solving process for hours.
So, in order to save a few milliseconds on every invocation, we’re adding hours to one of the most important customer-retentive interactions – solving the customer’s problem.
In all fairness, before working on an Ops team, I probably would’ve done the same “optimization”. There’s no eye-opener like having to wear different hats.
Even performance isn’t the same in all contexts: making a Photoshop command run faster is likely to entail different techniques than making a very heavy traffic website respond faster. Sometimes it’s even a matter of perception: in a GUI it might be more important to reduce button press feedback by a fraction of a second than to make the effect of pushing that button faster by tens of seconds.
Which libraries are you talking about?
AFAIK if you want to max out a modern CPU you basically have to use data-structures that give you the best spatial and temporal memory locality possible, i.e., structs of arrays for sequences, hash-tables, …
The STL doesn’t support structs of arrays; its concepts don’t allow these by design. C++ doesn’t allow you to play with the memory layout of your data in a maintainable way either; it is just not the way the C++ object model works.
As a consequence I have yet to see a real application in which changing any main STL-like container to a struct of arrays did not result in a (close to) full program rewrite. Why? Because improving the performance of 20% of the code requires a change to a “style” of data structures that is not supported by the language or the libraries.
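For readers unfamiliar with the terms, here is a minimal sketch of the array-of-structs vs. struct-of-arrays distinction being discussed (types and field names are illustrative):

```cpp
#include <vector>

// Array-of-structs (AoS): each particle's fields sit together in memory.
struct Particle { float x, y, mass; };
using ParticlesAoS = std::vector<Particle>;

// Struct-of-arrays (SoA): each field is its own contiguous array, so a
// loop touching only `x` streams through memory without also pulling
// `y` and `mass` into the cache.
struct ParticlesSoA {
    std::vector<float> x, y, mass;
};

float total_x_aos(const ParticlesAoS& p) {
    float s = 0;
    for (const auto& q : p) s += q.x;
    return s;
}

float total_x_soa(const ParticlesSoA& p) {
    float s = 0;
    for (float v : p.x) s += v;
    return s;
}
```

The results are identical; the pain point described in the comment is that every piece of code built against the AoS interface has to change when switching to SoA.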
In C++ the only thing one can do is “figure out which performance you need and write code as simple and maintainable as possible to get it”. So, to me at least, the whole article reads like wishful thinking plus generic but useless advice. It might sound like harsh bashing, but I think this is just the awful truth.
I am not writing about maxing out the CPU but about a much more basic level. I am writing about knowing the complexity and performance implications of certain pieces of common libraries.
I have seen programs using `std::map` as the default container where a `std::vector` would have been perfectly suitable in most cases, because the order of elements was not important and all that was done with the containers was range-based for loops.
If you come from an environment where you need to apply more drastic measures like structs of arrays, then I think you are in a somewhat special position for two reasons: On one hand, most projects don’t have to push performance boundaries that far, and on the other hand to consider such changes you probably already have done tons of optimizations others have not even thought of.
I see, yes, my field might be somewhat special. E.g. the cases you mention are taken for granted (map vs vector).
Most of the time people would even look at a `std::vector` suspiciously if it is not using a stack allocator. And if you are using a `std::map`/`std::set` but they see predictable insertion and reading patterns, they will wonder why you didn’t use a `flat_map` or a `flat_set`.
Normally, though, unless you make cold code hot (by using something terribly inefficient), nobody cares (and the most maintainable solution is preferred), and that is true for 80% of the code. However, to make the remaining 20% really fast, the optimizations one has to apply are not localized; they are structural to the application, and they touch the main data structures, which are also used by the remaining 80% of the code. This “tight” coupling is a high pain point for us.
I’d rather say use profiling to determine the bottlenecks in the program, then do all you can to improve the performance of those parts of the code (even if that means smelly code or redundancy). For everything else (and here I agree with the 80/20 ratio), just follow the usual best practices for coding.
If you really have to squeeze the last bit of performance out of that code, then yes, smelly code is OK. But as I have written, that should be the last resort. If you get sufficiently good performance by other means, why reduce maintainability?
By all means, yes. Most time high-level patterns and good design don’t impact performance. It’s only when it comes to running some bits of code zillions of times per second, I start thinking of saving here and there.
I first thought about it many years ago when I had to abandon all I learned about perfect database architecture in one particular project that required pulling thousands of records from the DB at the rate of every 0.2 sec.
Covering software requirements and functionality, stability and reliability, security, readability and maintainability, and choosing the right patterns and algorithms are all more important than wiring up cryptic, faster code. In most cases, not using the proper tools and profilers leads developers to only marginally faster code but awfully complex and buggy implementations.
Unless we are writing mission-critical / HPC code, the best bet is composing clean and beautiful code; the compiler’s optimizer will do the rest, which is enough for most scenarios.