This page shows the source for this entry, with WebCore formatting language tags and attributes highlighted.


Elegant Code vs.(?) Clean Code


<n>This article was originally published on the <a href=""><b>Encodo Blogs</b></a>. Browse on over to see more!</n> <hr> A developer on the Microsoft C# compiler team recently made a post asking readers to post their solutions to a programming exercise in <a href="" author="Eric Lippert" source="Fabulous Adventures in Coding">Comma Quibbling</a>. The requirements are as follows: <ol> If the sequence is empty then the resulting string is "{}". If the sequence is a single item "ABC" then the resulting string is "{ABC}". If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}". If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)</ol> On top of that, he stipulated <iq>I am particularly interested in solutions which make the semantics of the code very clear to the code maintainer.</iq> Before doing anything else, let's nail down the specification above with some tests, using the NUnit testing framework: <code> [TestFixture] public class SentenceComposerTests { [Test] public void TestZero() { var parts = new string[0]; var result = parts.ConcatenateWithAnd(); Assert.AreEqual("{}", result); } [Test] public void TestOne() { var parts = new[] { "one" }; var result = parts.ConcatenateWithAnd(); Assert.AreEqual("{one}", result); } [Test] public void TestTwo() { var parts = new[] { "one", "two" }; var result = parts.ConcatenateWithAnd(); Assert.AreEqual("{one and two}", result); } [Test] public void TestThree() { var parts = new[] { "one", "two", "three" }; var result = parts.ConcatenateWithAnd(); Assert.AreEqual("{one, two and three}", result); } [Test] public void TestTen() { var parts = new[] { "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten" }; var result = parts.ConcatenateWithAnd(); Assert.AreEqual("{one, two, three, four, five, six, seven, eight, nine and ten}", result); } } </code> The tests assume that the method <c>ConcatenateWithAnd()</c> is declared as an extension method. With the tests written, I figured I'd take a crack at the solution, keeping the last condition foremost in my mind instead of compactness, elegance or cleverness (as often predominate). Instead, I wanted to make the special cases given in the specification as clear as possible in the code. On top of that, I added the following conditions to the implementation: <ol> Do not create a list or array out of the enumerator. That is, do not invoke any operation that would involve reading the entire contents of the enumerator at once (e.g. the extension methods Count() or Last() are <i>verboten</i>). Avoid comments; instead, make the code comment itself. Make the code as clearly efficient as possible without invoking any potentially costly library routines whose asymptotic order is unknown. </ol> That said, here's my version: <code> public static string ConcatenateWithAnd(this IEnumerable<string> words) { using (var enumerator = words.GetEnumerator()) { if (!enumerator.MoveNext()) { return "{}"; } var firstItem = enumerator.Current; if (!enumerator.MoveNext()) { return "{" + firstItem + "}"; } var secondItem = enumerator.Current; if (!enumerator.MoveNext()) { return "{" + firstItem + " and " + secondItem + "}"; } var builder = new StringBuilder("{"); builder.Append(firstItem); builder.Append(", "); builder.Append(secondItem); var item = enumerator.Current; while (enumerator.MoveNext()) { builder.Append(", "); builder.Append(item); item = enumerator.Current; } builder.Append(" and "); builder.Append(item); builder.Append("}"); return builder.ToString(); } } </code> Looking at this from a maintenance or understanding point-of-view, I have the following notes: <ul> More novice users will probably not immediately grasp the use of the enumerator. Though it's part of the .NET library, its use is usually hidden by the syntactic sugar of the <c>foreach</c>-statement. The formatting instructions for curly brackets and separators are included several times, which decreases maintainability should the output specification change. The multiple calls to the string-concatenation operator and to <c>StringBuilder.Append()</c> are intentional. I wanted to avoid having to use escaped <c>{}</c> in the format string (e.g. <c>String.Format("{{{0} and {1}}}", firstItem, secondItem)</c> is confusing if you're not aware how curly brackets are escaped in a format string). </ul> Other than those things, it seems relatively compact and efficient. With my own version written, I looked through the comments on the post to see if any other interesting solutions were available. I came up with two that caught my eye, one by <i>Jon Skeet</i> and another by <i>Hresto Deshev</i>, who submitted his in F#. Hristo's example in F# is as follows: <code> #light let format (words:list<string>) = let rec makeList (words: list<string>) = match words with | [] -> "" | first :: [] -> first | first :: second :: [] -> first + " and " + second | first :: second :: rest -> first + ", " + second + ", " + (makeList rest) "{" + (makeList words) + "}" </code> That's so cool: the formulation in F# is almost plain English! That's pretty damned maintainable, I'd say. I have no way of judging the performance of this just-in-time parsing, but it does make use of recursion: lists with thousands of items will incur thousands of nested calls. Next up is Jon Skeet's version in C#: <code> public static string JonSkeetVersion(this IEnumerable<string> words) { var builder = new StringBuilder("{"); string last = null; string penultimate = null; foreach (string word in words) { // Shuffle existing words down if (penultimate != null) { builder.Append(penultimate); builder.Append(", "); } penultimate = last; last = word; } if (penultimate != null) { builder.Append(penultimate); builder.Append(" and "); } if (last != null) { builder.Append(last); } builder.Append("}"); return builder.ToString(); } </code> This one is very clever and handles all cases in a single loop rather than addressing special cases outside of a loop (as mine did). Also, all of the formatting elements---the curly brackets and item separators---are mentioned only once, improving maintainability. I immediately liked it better than my own solution from a technical standpoint. While I'm drawn to the cleverness and elegance of the solution, I'm not the target audience. Skeet's version forces you to reason out the special cases; it's not immediately obvious how the special cases for zero, one and two elements are handled. Also, while I am tickled pink by the aptness of the variable name <c>penultimate</c>, I wonder how many non-native English speakers would understand its intent without a visit to an online dictionary. The name <c>secondToLast</c> would have been a better, though far less sexy, choice. It's very easy to underestimate how little people are willing to actually read code that they didn't write. If the code requires a certain amount of study to understand, then they may just leave it well enough alone and seek the original developer. If, however, it looks quite easy and the special cases are made clear---as in my version---they are far more likely to dig further and work with it. Since the problem is defined as a three special cases and a general case, it is probably best to offer a solution where these cases are immediately obvious to ease maintainability---and as long as you don't sacrifice performance unnecessarily. Cleverness is wonderful, but you may end up severely limiting the number of people willing---or able---to work on that code.