Terseness: how little is too much?

Today, Matt Bentley writes for us about terse names in our code. Matt was born in 1978 and never recovered from the ordeal. He has lived a long and interesting life, and now codes for a hobby, rather than as a way to prove his worth to the world and obtain meals. His keyboard is a Model M, the kind of keyboard Nixon would’ve hated, because it is strong and virtuous.

There are topics of programming that bridge the line between objectivity and aesthetics, like whether or not camelcase is preferable to underscoring (pro tip: to win these particular kinds of debates, scream “f**k [camelcase/underscores/hanging braces]!” at the top of your lungs then run from the room, babbling incoherently. Your opponent will be so impressed by your raw display of power that they will never ever want to talk to you again. Works every time!). This is not one of those topics.

When I think about variable and function names, I think about the quote attributed to Albert Einstein: “Everything should be made as simple as possible, but no simpler”. This implies there is a point at which you start to lose meaning when you simplify, and that you should stop before that point is reached. Terseness, done well, is inherently an extension of simplification – using less to describe more. The simpler solution, to use Occam’s Razor, will, all other things being equal, be the inherently better one.

Why, then, is my code so verbose? Anyone reading through colony‘s code will note I use long variable names, long function names, etcetera, and there is a reason for this: I believe inherently in the value of code which needs little documentation or comments, i.e. metadata. I think that by using meaningful names, I increase my ability to both read and write my code, by making what it’s doing more obvious and thereby freeing up my brain for more important things, like figuring out algorithms and correct semantics.

Somebody who is used to more terse code may find this disgusting, and that’s understandable, but I see no problem even with using names like “element_pointer” depending on the context – yes, the ‘pointer’ part is implied in the definition (and in some cases by the use of ‘->’), but why should I or anyone else have to refer back to the definition to figure out what that thing is while browsing another location? I’m also not a fan of inference because it increases cognitive load. From that, you might also infer that I’m not a fan of ‘auto’, but that is, arguably, the topic of another discussion.
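As a small, made-up illustration of that point (both functions below are hypothetical and not taken from colony), here is the same routine written twice. The logic is identical; only the names differ. In the second version a reader landing in the middle of the loop does not need to scroll back to the signature to know what each thing is.

```cpp
#include <cstddef>

// Terse version: the reader has to find the definition or the documentation
// to learn what 'p', 'n' and 't' stand for.
std::size_t cnt(const int *p, std::size_t n, int t)
{
  std::size_t c = 0;

  for (std::size_t i = 0; i != n; ++i)
  {
    if (p[i] > t)
    {
      ++c;
    }
  }

  return c;
}

// Descriptive version: same logic, but each name says what it holds.
std::size_t count_elements_above_threshold(const int *element_pointer, std::size_t number_of_elements, int threshold)
{
  std::size_t number_above_threshold = 0;

  for (std::size_t element_index = 0; element_index != number_of_elements; ++element_index)
  {
    if (element_pointer[element_index] > threshold)
    {
      ++number_above_threshold;
    }
  }

  return number_above_threshold;
}
```

The second function costs more characters, but a call such as count_elements_above_threshold(values, 5, 4) can be understood without opening the definition, which is not true of cnt(values, 5, 4).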

The standard argument against verbose name styles is more key-presses, which I’m not in agreement with, as cut and paste exists, as does auto-fill in any modern IDE. Besides which, what time you lose while coding with meaningful names, you gain when re-reading the code, as the code becomes self-documenting to an extent. Of course, you still need to comment code here and there to explain complicated algorithms and/or semantics, where it’s not clear, but overall the need to artificially ‘meta’ your code decreases. Shouldn’t this be the default case? Code, ideally, should explain itself. Why should we create needless metadata for code, if code can describe itself?

But what if you’re writing for yourself, and only yourself – no-one else is ever going to need to understand your code? Should you still write this way? From my point of view, yeah: if you’re ever going to be re-reading the code, it frees up brain cycles for more important things. If you’re writing throwaway code that’s only used once, testing a specific feature, etc, then it doesn’t matter so much and you should do whatever makes the most sense in that scenario, which is typically the simplest, quickest thing to write. But most code in the world does not exist in that vacuum.

At what point do we consider code to be terse or verbose? That at least is subjective, a line in the sand that each person draws for themselves. But the matter of whether or not more descriptive variable/function names lead to more understandable code is a matter of objective fact: less information == more internal translation/memory-retrieval and cognitive load. It’s only the depth of information that is deemed to be useful that varies from person to person. Let’s look at one extreme example of C terseness:

//Dictionary and Dictionary Entry utility functions and accessors
// currently no guards for 0 inputs ... should this change?
K DI(K d, I i){R kK(d)[i];} //dictionary index, yields entry
S ES(K d){ R *kS(kK(d)[0]);} //dictionary entry's symbol
K DE(K d,S b){DO(d->n,K x=DI(d,i);if(b==ES(x))R x)R 0;} //dictionary entry lookup
Z K* EIA(K a,I i){R kK(a)+i;} //dictionary entry's address of i-th index
K* EVP(K e){R EIA(e,1);} //dictionary entry's value-pointer address (K*)
K* EAP(K e){R EIA(e,2);} //dictionary entry's attribute_dictionary-pointer address (K*)
K EV(K e){R *EVP(e);} //dictionary entry's stored value

This diligent piece of obfuscation appears to be on purpose, without a sense of irony, and a part of the kona codebase. If this doesn’t make you want to pour salt acid in your eyes, I would suggest there’s something probably wrong with your eyes, in which case, melon-ball them out, replace them with better ones, then look at the code and subsequently regret your decision. That entire code-base is coded like this. Does the author find this easier to cognize? Apparently! Does anyone else? Noooooo. Writing this way is at least a sure-fire way of ensuring no-one will ever interfere with your codebase again, as they’ll be loath to try to understand it.
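For contrast, here is a rough sketch of what the dictionary entry lookup (DE in the excerpt above) might look like with descriptive names. This is not Kona code: the dictionary_entry and dictionary types are hypothetical stand-ins invented for this illustration, and symbols are compared as strings here, which is an assumption rather than a description of how Kona stores or compares them.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; Kona's real K objects
// are laid out differently.
struct dictionary_entry
{
  std::string symbol;
  int value;
};

struct dictionary
{
  std::vector<dictionary_entry> entries;
};

// Descriptive counterpart to DE(): returns a pointer to the first entry whose
// symbol matches, or nullptr if no entry matches.
const dictionary_entry *find_entry_by_symbol(const dictionary &source_dictionary, const std::string &symbol_to_find)
{
  for (const dictionary_entry &current_entry : source_dictionary.entries)
  {
    if (current_entry.symbol == symbol_to_find)
    {
      return &current_entry;
    }
  }

  return nullptr;
}
```

It is several times longer than the one-line original, but the trailing comment that the original needs in order to be understood has effectively moved into the names.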

In my first programming job, I had a colleague who would name his variables and batch files things like K, and J. When asked about why he did this, he said it was because it took less time to type. It also probably ensured that no-one would ever fire him because nobody knew what any of his processes did – but don’t get any ideas! Any code reviewer worth their salt nowadays would spot this and grill you over the coals. Instead of deliberately obfuscating, imagine that someone reading your code has no familiarity with it or your particular coding style and conventions. How would your style change? Here’s a C++ example from plf::list, which is probably toward the ‘verbose’ end of the spectrum, although with a small amount of obfuscation due to use of ++ optimizations:

template <class comparison_function>
void unique(comparison_function compare)
{
  if (node_pointer_allocator_pair.total_number_of_elements < 2)
  {
    return;
  }

  element_type *previous = &(begin_iterator.node_pointer->element);

  for (iterator current = ++iterator(begin_iterator); current != end_iterator;)
  {
    if (compare(*current, *previous))
    {
      current = erase(current);
    }
    else
    {
      previous = &(current++.node_pointer->element);
    }
  }
}

Sure, there’s a lot more characters in this example, and in some instances, you’d still have to refer back to a definition to get a full understanding of what a given variable is. Compared to the previous example though, it’ll take longer to read but far less time to understand, when coming to it with a blank slate. Although each variable name isn’t a magic paragraph which tells you exactly what it is, it gives you enough information that you can begin to work out what the code is doing. Personally, I’m quite comfortable with definitions like iterator get_iterator_from_pointer(const element_pointer_type the_pointer). Works for me. So long as I don’t have to do additional memory retrieval to figure things out, I’m happy.
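The plf::list excerpt relies on class internals (begin_iterator, node_pointer and so on), so it cannot be compiled on its own. As a rough, runnable approximation of the same logic, here is a sketch written against std::list instead. The function name remove_consecutive_duplicates and its parameters are made up for this example, and it is not how plf::list is implemented.

```cpp
#include <iterator>
#include <list>

// Removes every element that compares equal (according to 'compare') to the
// nearest preceding element that was kept, mirroring the loop above.
template <class element_type, class comparison_function>
void remove_consecutive_duplicates(std::list<element_type> &source_list, comparison_function compare)
{
  // With fewer than two elements there is nothing to compare against.
  if (source_list.size() < 2)
  {
    return;
  }

  const element_type *previous = &source_list.front();

  for (typename std::list<element_type>::iterator current = std::next(source_list.begin()); current != source_list.end();)
  {
    if (compare(*current, *previous))
    {
      // erase() returns the iterator following the removed element;
      // 'previous' stays valid because std::list never relocates elements.
      current = source_list.erase(current);
    }
    else
    {
      previous = &*current;
      ++current;
    }
  }
}
```

Calling it on a list holding 1, 1, 2, 2, 2, 3, 1 with an equality comparison leaves 1, 2, 3, 1: only runs of adjacent duplicates are collapsed, so the final 1 survives.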

TLDR:

Write code which is fast to cognize (for others as well as yourself), not just fast to read and type. Do both where possible.

10 Comments

Tom Szczesny
7 years ago Permalink

If viewed from a perspective of terse-C, it seems an abomination.

But, Arthur is an implementer of computer languages (A+, k, KSQL, q, KDB+). If viewed from a perspective of a new language (that can be compiled with a C compiler) I find it fascinating.

1. Tom Szczesny
  7 years ago Permalink
  
  Let’s use “ATWC” as a handle for this new language. All of these (ATWC, A+, k, KSQL, q, KDB+) were designed as “proprietary” languages. A+ was completed in early 1993 for Morgan Stanley, and released by MS as open source in 2008. K (originally designed for UBS) and KSQL are no longer available from Kx Systems. The 64-bit versions of q and KDB+ are available only at a hefty price tag. ATWC was designed to be used only by Arthur.
  
  1. Tom Szczesny
    7 years ago Permalink
    
    Correction: The 32-bit version of A+ was released by MS as open source in 2001. The 64-bit version was released as open source in 2008.
    
Phil Nash
7 years ago Permalink

“The standard argument against verbose name styles is more key-presses” – I’ve never heard this from a real person. It always seems to me to be a bit of a straw man.

I don’t disagree with most of your points (Matt), but I do often use shorter names and I do it to increase readability!
I got around to writing up why a few years ago:
http://www.levelofindirection.com/journal/2015/5/1/naming-is-hard-or-is-it.html

1. Matt Bentley
  7 years ago Permalink
  
  Can’t say I agree – I covered your points in the article by way of saying how the code would appear to someone new to it – and the keypresses thing has been recited to me many, many times. If you have a company standard, where you all agree on certain prefixes and suffixes or acronyms, that’s very different to writing generically for multiple users though. Even then, I wouldn’t necessarily personally do it, as you’re (as noted) adding an interpretation layer to your cognitive processing, not subtracting.
  
  1. Matt Bentley
    7 years ago Permalink
    
    ps. Thanks for introducing me to Peter Hilton’s talk, it’s excellent.
    
Harald Hansen
7 years ago Permalink

Thank you for the Kona C samples. I’ve already passed them around the office to much merriment!

We might disagree on other things in the coding standard, but we’re all on the side of verbosity.

1. Matt Bentley
  7 years ago Permalink
  
  Glad to have caused some merriment! 🙂
  
2. tavmem
  7 years ago Permalink
  
  Kona is an exercise in using the coding style of Arthur Whitney. But if you want to see some “real” terseness check out https://github.com/tavmem/buddy/blob/master/a/b.c
  This file has 2 versions of the “buddy space allocation” system. The first version was written by Arthur and consists of 11 lines. The second version was Morgan Stanley’s baseline version, written in traditional, well-documented C, and consists of almost 750 lines of code.
  
  1. Matt Bentley
    7 years ago Permalink
    
    Wow
