Encoding Ideas

Essays often begin with a "thesis statement", which is a short summary of the paper's core idea. In the same vein, research papers have abstracts, and book dust jackets have short summaries of the books they adorn.

Why don't we have the same thing for code? A way to summarize the core premise behind a chunk of code in a way that others can understand.

Perhaps we do. If you use version control -- and why wouldn't you? -- your commit message is a great place for a very brief summary of your code changes. Git allows you to write arbitrarily long commit messages, too, so you can have a short title (up to 72 characters) as the first line of the commit, followed by a blank line and then a longer explanation (source).

That's all well and good, and we should all strive to write good, concise commit titles, and even splurge on lengthy but informative descriptions when appropriate. But what about writing something in the code itself?

Well, how about comments? Sadly, there are some problems with relying on code comments as a form of documentation:

  • Comments quickly become stale and wrong as busy engineers change the code around them without touching the comments themselves.
  • Comments are only as good as the writing skills of the person who writes them.
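
As a small illustration of the first point, here is a hypothetical sketch. The function names and the discount rule are invented for this example and do not come from any real codebase. The comment on the first function promises a 10% discount, but the code was later changed to 15% and nobody touched the comment. The second version carries the same rule in a named constant, so there is no separate sentence to fall out of date.

```javascript
// Hypothetical example: both functions and the discount rule are invented.

// Applies a 10% discount to the price.   <-- stale: the code now applies 15%
function discountedPrice(price) {
  return price * 0.85;
}

// The same rule, with the idea carried by a name instead of a comment.
const MEMBER_DISCOUNT_RATE = 0.15;

function applyMemberDiscount(price) {
  return price * (1 - MEMBER_DISCOUNT_RATE);
}
```

Both functions compute the same value; only the second one tells the reader what the number means.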

After many years of software engineering, I can say with some confidence that strong writing skills are a rare talent among engineers. As a result, many comments and commit messages are less than helpful.

How about we represent the core idea in the code itself? Let's take a look at two examples:

function bs(arr, value) {

  function bsrec(innerArr, val, lo, hi) {
    var m = (lo + hi) >> 1;
    if (lo === hi) {
      return -1;
    }
    if (innerArr[m] === val) {
      return m;
    }
    if (innerArr[m] > val) {
      return bsrec(innerArr, val, lo, m);
    }
    return bsrec(innerArr, val, m + 1, hi);
  }

  return bsrec(arr, value, 0, arr.length);
}

Well, that's a bit hard to read, isn't it?

I think it's a binary search, but it's confusingly written, hard to follow, and uses all kinds of weird shorthands. Let's see another:

function getMidpoint(lowIndex, highIndex) {
  return Math.floor((lowIndex + highIndex) / 2);
}

function binarySearch(sortedVals, targetVal, lowIndex = 0, highIndex = sortedVals.length - 1) {
  if (lowIndex > highIndex) {
    return -1;
  }

  const midpoint = getMidpoint(lowIndex, highIndex);
  if (targetVal < sortedVals[midpoint]) {
    return binarySearch(sortedVals, targetVal, lowIndex, midpoint - 1);
  } else if (targetVal > sortedVals[midpoint]) {
    return binarySearch(sortedVals, targetVal, midpoint + 1, highIndex);
  } else {
    return midpoint;
  }
}

This is better. The variables are named more clearly (though also very verbosely for the sake of the example). The flow of the program is easier to follow.

You can get the core idea behind the second implementation more easily than you can the first. And that's the point: try to write your code in a way that conveys an idea. Here, perhaps we can summarize the idea as:

binarySearch is a recursive function that works by repeatedly halving the search range until the target value is found or the range is empty.

This is "embedded" in the way we've written the code. A programmer can read the code and "get the picture". If you've been lucky enough to work with very skilled engineers, you might be familiar with this feeling: sometimes you read a piece of code and you just get it. You understand what the author was trying to do, how they accomplished it, and sometimes even why they did it the way they did it. It's more than just following style guides or naming conventions. It's telling a story.
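
The halving in that one-sentence summary can be made visible with a short sketch. `traceBinarySearch` is a hypothetical helper written only for this illustration. It follows the same comparisons as the readable version, but uses a loop instead of recursion so that it can collect every range it examines in a single array.

```javascript
// Hypothetical helper for illustration: runs a binary search and records
// each [lowIndex, highIndex] range it examines along the way.
function traceBinarySearch(sortedVals, targetVal) {
  const ranges = [];
  let lowIndex = 0;
  let highIndex = sortedVals.length - 1;

  while (lowIndex <= highIndex) {
    ranges.push([lowIndex, highIndex]);
    const midpoint = Math.floor((lowIndex + highIndex) / 2);

    if (targetVal < sortedVals[midpoint]) {
      highIndex = midpoint - 1; // keep only the lower half
    } else if (targetVal > sortedVals[midpoint]) {
      lowIndex = midpoint + 1; // keep only the upper half
    } else {
      return { index: midpoint, ranges };
    }
  }

  // The range became empty, so the target is not in the array.
  return { index: -1, ranges };
}

const result = traceBinarySearch([1, 3, 5, 7, 9, 11, 13], 13);
console.log(result.index);  // 6
console.log(result.ranges); // [[0, 6], [4, 6], [6, 6]]
```

Each recorded range is roughly half the size of the one before it (seven elements, then three, then one), which is the story the code is trying to tell.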

Now, admittedly, this isn't as nice as an abstract or a thesis statement that neatly summarizes an idea in a compact form, but in our case, we're not just writing pretty prose, we're also writing instructions for a computer to execute to solve an actual problem. So the fact that we're able to convey an idea at all is pretty darn neat, I'd say. (And of course, your mileage may vary depending on language choice. [Ada, for example, prizes readability as a core tenet.])

We have conveyed an idea with the actual instructions that get the job done, and that's crucially important because the idea being represented will stay in sync with the problem being solved. Your documentation stays up to date.

In summary, code is the best documentation you can write. It solves a problem but it's also an expression of the way you think. So show the world that you think clearly by writing clear code.