Refactoring a dead-simple progress-bar function
I just saw a neat code example from a Dutch government project (function starting at line 182). The commentator who posted it at Reddit wrote,
“Some people laughed at it and suggested all kind of clever one liners to replace it, but to me, that if statement is perfect. The intent is immediately clear and bugs are easy to spot. This is the kind of code you want in critical apps.”
Here’s the code.
private static string GetPercentageRounds(double percentage)
{
if (percentage == 0)
return "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪";
if (percentage > 0.0 && percentage <= 0.1)
return "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪";
if (percentage > 0.1 && percentage <= 0.2)
return "🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪";
if (percentage > 0.2 && percentage <= 0.3)
return "🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪";
if (percentage > 0.3 && percentage <= 0.4)
return "🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪";
if (percentage > 0.4 && percentage <= 0.5)
return "🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪";
if (percentage > 0.5 && percentage <= 0.6)
return "🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪";
if (percentage > 0.6 && percentage <= 0.7)
return "🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪";
if (percentage > 0.7 && percentage <= 0.8)
return "🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪";
if (percentage > 0.8 && percentage <= 0.9)
return "🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪";
return "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵";
}
This is a cool example because it demonstrates how easy it is to understand the return value when you use a separate constant for each “progress bar” increment instead of using something like new string('🔵', 5)
, which, as we’ll see below, doesn’t even compile.
Still, all but the first condition needlessly checks the lower-bound already guaranteed by the previous step. At the very least, we can reduce it to the following:
private static string GetPercentageRounds(double percentage)
{
if (percentage == 0)
return "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪";
if (percentage <= 0.1)
return "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪";
if (percentage <= 0.2)
return "🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪";
if (percentage <= 0.3)
return "🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪";
if (percentage <= 0.4)
return "🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪";
if (percentage <= 0.5)
return "🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪";
if (percentage <= 0.6)
return "🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪";
if (percentage <= 0.7)
return "🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪";
if (percentage <= 0.8)
return "🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪";
if (percentage <= 0.9)
return "🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪";
return "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵";
}
I would elect to go further, preserving the clarity in constants (or maybe a comment) to avoid repetition in the code.
First, let’s write a few tests with NUnit.
[TestCase(0.00, "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪")]
[TestCase(0.10, "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪")]
[TestCase(0.11, "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪")]
[TestCase(0.19, "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪")]
[TestCase(0.20, "🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪")]
[TestCase(0.30, "🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪")]
[TestCase(0.40, "🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪")]
[TestCase(0.50, "🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪")]
[TestCase(0.60, "🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪")]
[TestCase(0.70, "🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪")]
[TestCase(0.80, "🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪")]
[TestCase(0.90, "🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪")]
[TestCase(1.00, "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵")]
public void TestBubbles(double percentage, string expectedOutput)
{
var actualOutput = GetPercentageRounds(percentage);
Assert.That(actualOutput, Is.EqualTo(expectedOutput));
}
Next, let’s give in to our refactoring instincts and see if a shorter formulation of the algorithm is also understandable. The algorithm is now:
- Build constant buffers for
empty
andfilled
. - Calculate the portion of each of these buffers to include in the result (
filledCount
andemptyCount
). - Copy the correct number of characters from the buffers using the C# range-operator.
private static string GetPercentageRounds(double percentage)
{
const string empty = "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪";
const string filled = "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵";
var filledCount = (int)Math.Floor(percentage * 10);
var emptyCount = 10 - filledCount;
return filled[..filledCount] + empty[..emptyCount];
}
This doesn’t work, though!
The tests fail. For example, the test for 0.8
returns “🔵🔵🔵🔵⚪⚪” instead of “🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪”. What’s going on?
We would get another hint as to what is going on if we were to refactor the constant declarations to use each symbol only once. For example, I could create the string with a special constructor, as shown below.
var empty = new string ('⚪', 10);
var filled = new string ('🔵', 10);
This avoids repeating the symbol several times but it’s probably also not as clear what’s happening. It also no longer uses constants—initialized once and stored in the app—so we’re allocating new strings each time. We could declare them as static
instance variables so that they are allocated only once. However, that would also mean that we don’t declare them locally in the method, which again decreases readability.
On top of that, though, the second initialization doesn’t even compile!
🔵 is not a single character
Strings are encoded in UTF-16 (the standard for .NET).[1] In this encoding, the “⚪” is represented with one byte, while “🔵” is represented with two bytes. That knowledge, together with knowing that the range operator works with bytes, explains why we only got half as many filled-in symbols as expected.
Knowing this, we can revert to the original constants and fix the algorithm as follows (code-change is highlighted).
private static string GetPercentageRounds(double percentage)
{
const string empty = "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪";
const string filled = "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵";
var filledCount = (int)Math.Floor(percentage * 10);
var emptyCount = 10 - filledCount;
return filled[..(2 * filledCount)] + empty[..emptyCount];
}
OK. Now it’s working. We now have two questions:
- Can we avoid the “hack” for UTF-16 in our calculation?
- The code is now more maintainable; is the code still as understandable as before?
Let’s tackle the first one. It turns out that there is a standard way of indexing by grapheme but you have to opt in to it by using a StringInfo
object, which offers a method named SubstringByTextElements()
.
private static string GetPercentageRounds(double percentage)
{
const string empty = "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪";
const string filled = "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵";
var filledCount = (int)Math.Floor(percentage * 10);
var emptyCount = 10 - filledCount;
return new StringInfo(filled).SubstringByTextElements(0, filledCount) + new StringInfo(empty).SubstringByTextElements(0, emptyCount);
}
Now our code is no longer making assumptions about how many bytes represent our empty and filled symbols. But is it better? No. It is absolutely less legible than even the previous version. It also allocates two new StringInfo
objects ever time it executes.
Is it even necessary? Also no.
Why wouldn’t it be necessary? In the general case, we have to stay flexible and make sure that we’re extracting the correct number of graphemes (not characters), but we don’t have a general case here. We have two constant strings in a known encoding. We know that we can index by byte into the empty
string and we know that we can index by two bytes into the filled
string. These are constants. They will not change. We can make assumptions based on that.
That means, after this little excursion, that we’ll use our original version but we will also no longer consider it a hack.
This takes us to the final point: is the new version more legible than the original? I think that it is. At first blush, the original looks like it’s very self-explanatory—you can see how the progress bar is built—but you also have many more points of logic to check to verify that it’s actually working as expected. While you can use the test I’ve defined above to check all of the logic, there are many more conditions to check when something goes wrong. We measure the number of paths through a piece of logic as cyclomatic complexity (Wikipedia). The lower the better.
We have learned that, when you program in the original way, you may actually save time! The original formulation didn’t have to concern itself with encodings because it wasn’t slicing strings. The original programmer didn’t even need to be aware that some characters are encoded with multiple bytes whereas others are encoded with a single byte. They didn’t even have to know what a byte was at all!
Food for thought.
As almost always, there isn’t a “best solution” for all situations. There is a solution that minimizes drawbacks for the given requirements, but not for all possible requirements. If one of the requirements were: the reader need not know what a byte is, then the original solution would be more appropriate.
The final version below has lower cyclomatic complexity, uses constants to indicate what the result will actually look like, and explains its algorithm reasonably well, if you understand percentages. I’ve included a comment to explain why we double the number of bytes to select from filled
.
private static string GetPercentageRounds(double percentage)
{
const string empty = "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪";
const string filled = "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵";
var filledCount = (int)Math.Floor(percentage * 10);
var emptyCount = 10 - filledCount;
// Each 🔵 is two bytes in UTF-16
return filled[..(2 * filledCount)] + empty[..emptyCount];
}
u8
prefix to make UTF-8 (Microsoft Learn), but that doesn't help because the symbols we want to use are multi-byte graphemes in that encoding as well.↩