This page shows the source for this entry, with WebCore formatting language tags and attributes highlighted.

Title

Designing a small API: Bit manipulation in C#

Description

A usable API doesn't usually spring forth in its entirety on the first try. A good, usable API generally arises iteratively, improving over time. Naturally, when using words like <i>good</i> and <i>usable</i>, I'm obliged to define what exactly I mean by that. Here are the guidelines I use when designing an API, in decreasing order of importance: <dl dt_class="field"> Static typing & Compile-time Errors Wherever possible, make the compiler stop the user from doing something incorrectly instead of letting the runtime handle it. Integrates into standard practices That is, do not invent whole new ways of doing things; instead, reuse or build on the paradigms already present in the language. Elegance Ideally, using the API should be intuitive, read like natural language and not involve a bunch of syntactic tricks or hard-to-remember formulations or parameter lists. Clean Implementation The internals should be as generalized and understandable as possible and involve as little repetition as possible. CLS-Compliance Cross-language compliance is also interesting and easily achieved for all but the most low-level of APIs </dl> Using those guidelines, I designed an API to manage bits and sets of bits in C#. Having spent a lot of time using Delphi Pascal, I'd become accustomed to set and bit operations with static typing. In C#, the .Net framework provides the Set<t> generic type, but that seems like overkill when the whole idea behind using bits is to use <i>less</i> space. That means using enumerated types and the <c>FlagsAttribute</c>; however, there are some drawbacks to using the native bit-operations directly in code: <ol> Bit-manipulation is more low-level than most of the rest of the coding a C#-developer typically does. That, combined with doing it only rarely, makes direct testing of bits error-prone. The syntax for testing, setting and removing bits is heavy with special symbols and duplicated identifiers. </ol> To demonstrate, here is a sample: <code> [Flags] enum TestValues { None = 0, One = 1, Two = 2, Three = 4, Four = 8, All = 15, } // Set bits one and two: var bitsOneAndTwo = TestValues.One | TestValues.Two; // Remove bit two : var bitOneOnly = bitsOneAndTwo & ~TestValues.Two; // Testing for bit two: if ((bitsOneAndTwo & TestValues.Two) == TestValues.Two) { ... } </code> As you can see in the example above, setting a bit is reasonably intuitive (though it's understandable to get confused about using <c>|</c> instead of <c>&</c> to combine bits). Removing a bit is more esoteric, as the combination of <c>&</c> with the <c>~</c> (inverse) operator is easily forgotten if not often used. Testing for a bit is quite verbose and extending to testing for one of several flags even more so. <h>Version One</h> Therefore, to make things easier, I decided to make some extension methods for these various functions and ended up with something like the following: <code> public static void Include<t>(this T flags, T value) { ... } public static void Exclude<t>(this T flags, T value) { ... } public static bool In<t>(this T flags, T value) { ... } public static void ForEachFlag<t>(this T flags, Action<t> action) { ... } </code> These definitions compiled and worked as expected, but had the following major drawbacks: <ul> At the time, we were only using them with <c>enum</c> values, but code completion was offering the methods for all objects because there was no generic constraint on <c>T</c>. Not only that, but much of the bit-manipulation code needed to know the base type of the arguments in order to be able to cast it to and from the correct types. There were a lot of checks, but it all happened at runtime. The <c>ForEachFlag()</c> function was implemented as a lambda when it is clearly an iteration. Using a lambda instead makes it impossible to use <c>break</c> or <c>continue</c> with this method. </ul> This version, although it worked, broke several of the rules outline above; namely: while it did offer compile-time checking, the implementation had a lot of repetition in it and the iteration did not make use of the common library enumeration support (<c>IEnumerable</c> and <c>foreach</c>). That the operations were available for all objects and polluted code-completion only added insult to injury. <h>Version Two</h> A natural solution to the namespace-pollution problem is to add a generic constraint to the methods, restricting the operations to objects of type <c>Enum</c>, as follows: <code> public static void Include<t>(this T flags, T value) <hl>where T : Enum</hl> { ... } public static void Exclude<t>(this T flags, T value) <hl>where T : Enum</hl> { ... } public static bool In<t>(this T flags, T value) <hl>where T : Enum</hl> { ... } public static void ForEachFlag<t>(this T flags, Action<t> action) <hl>where T : Enum</hl> { ... } </code> .NET <c>enum</c>-declarations, however, do not inherit from <c>Enum</c>; instead, they inherit from <c>Int32</c>, by default, but can also inherit from a handful of other base types (e.g. <c>byte</c>, <c>Int16</c>). This makes sense so that <c>enum</c>-values can be freely converted to and from these base values. Not only will a generic constraint as defined above not have the intended effect, <i>it's explicitly disallowed by the compiler</i>. So, that's a dead-end. The other, more obvious way of restricting the target type of an extension method is to change the type of the first parameter from <c>T</c> to something else. However, since <c>enum</c> types don't inherit from <c>Enum</c>, what type do we use? Well, it turns out that <c>Enum</c> is a strange type, indeed. It can't be used in a generic constraint and does not serve as the base class for enumerated types but, when used as the target of an extension method, it <i>magically applies to all enumerated types!</i> I took advantage of this loophole to build the next version of the API, as follows: <code> public static void Include<t>(this <hl>Enum</hl> flags, T value) { ... } public static void Exclude<t>(this <hl>Enum</hl> flags, T value) { ... } public static bool In<t>(this T flags, <hl>Enum</hl> value) { ... } public static void ForEachFlag<t>(this <hl>Enum</hl> flags, Action<t> action) { ... } </code> This version had two advantages over the first version: <ol> The methods are only available for enumerated types instead of for all types, which cleans up the code-completion pollution. The implementation could take advantage of the <c>Enum.GetTypeCode()</c> method instead of the <c>is</c> and <c>as</c>-operators to figure out the type and cast the input accordingly. </ol> <h>Version Three</h> After using this version for a little while, it became obvious that there were still problems with the implementation: <ol> Though using <c>Enum</c> as the target type of the extension method was a clever solution, it turns out to be a huge violation of the first design-principle outlined above: The type <c>T</c> for the other parameters is not guaranteed to conform to <c>Enum</c>. That is, the compiler cannot statically verify that the bit being checked (<c>value</c>) is of the same type as the bit-set (<c>flags</c>). The solution only works with <c>Enum</c> objects, where it would also be appropriate for <c>Int32</c>, <c>Int64</c> objects and so on. The <c>ForEach</c> method still has the same problems it had in the first version; namely, that it doesn't allow the use of <c>break</c> and <c>continue</c> and therefore violates the second design-principle above. </ol> A little more investigation showed that the <c>Enum.GetTypeCode()</c> method is not unique to <c>Enum</c> but implements a method initially defined in the <c>IConvertible</c> interface. And, as luck would have it, this interface is implemented not only by the <c>Enum</c> class, but also by <c>Int32</c>, <c>Int64</c> and all of the other types to which we would like to apply bit- and set-operations. Knowing that, we can hope that the third time's a charm and redesign the API once again, as follows: <code> public static void Include<t>(this T flags, T value) <hl>where T : IConvertible</hl> { ... } public static void Exclude<t>(this T flags, T value) <hl>where T : IConvertible</hl> { ... } public static bool In<t>(this T flags, T value) <hl>where T : IConvertible</hl> { ... } public static void ForEachFlag<t>(this T flags, Action<t> action) <hl>where T : IConvertible</hl> { ... } </code> Now we have methods that apply only to those types that support set- and bit-operations (more or less<fn>). Not only that, but the <i>value</i> and <i>action</i> arguments are once again guaranteed to be statically compliant with the <c>flags</c> arguments. With two of the drawbacks eliminated with one change, we converted the <c>ForEachFlag</c> method to return an <c>IEnumerable<t></c> instead, as follows: <code> public static <hl>IEnumerable<t></hl> GetEnabledFlags<t>(this T flags) where T : IConvertible { ... } </code> The result of this method can now be used with <c>foreach</c> and works with <c>break</c> and <c>continue</c>, as expected. Since the method also now applies to non-enumerated types, we had to re-implement it to return the set of possible bits for the type instead of simply iterating the possible enumerated values returned by <c>Enum.GetValues()</c>.<fn> This version satisfies the first design principles (statically-typed, standard practice, elegant) relatively well, but is still forced to make concessions in implementation and CLS-compliance. It turns out that the <c>IConvertible</c> interface is somehow <i>not</i> CLS-compliant, so I was forced to mark the whole class as non-compliant. On the implementation side, I was avoiding the rather clumsy <c>is</c>-operator by using the <c>IConvertible.GetByteCode()</c> method, but still had a lot of repeated code, as shown below in a sample from the implementation of <c>Is</c>: <code> switch (flags.GetTypeCode()) { case TypeCode.Byte: return (byte)(object)flags == (byte)(object)value; case TypeCode.Int32: return (int)(object)flags == (int)(object)value; ... } </code> Unfortunately, bit-testing is so low-level that there is no (obvious) way to refine this implementation further. In order to compare the two convertible values, the compiler must be told the exact base type to use, which requires an explicit cast for each supported type, as shown above. Luckily, this limitation is in the implementation, which affects the maintainer and not the user of the API. Since implementing the third version of these "BitTools", I've added support for <c>Is</c> (shown partially above), <c>Has</c>, <c>HasOneOf</c> and it looks like the third time might indeed be the charm, as the saying goes. <hr> <ft>The <c>IConvertible</c> interface is actually implemented by other types, to which our bit-operations don't apply at all, like <c>double</c>, <c>bool</c> and so on. The .NET library doesn't provide a more specific interface---like "<c>INumeric</c>" or "<c>IIntegralType</c>"---so we're stuck constraining to <c>IConvertible</c> instead.</ft> <ft>Which, coincidentally, fixed a bug in the first and second versions that had returned all detected enumerated values---including combinations---instead of individual bits. For example, given the type shown below, we only ever expect values <c>One</c> and <c>Two</c>, and never <c>None</c>, <c>OneAndTwo</c> or <c>All</c>. <code> [Flags] enum TestValues { None = 0, One = 1, Two = 2, OneOrTwo = 3, All = 3, } </code> That is, <c>foreach (Two.GetEnabledFlags()) { ... }</c> should return only <c>Two</c> and <c>foreach (All.GetEnabledFlags()) { ... }</c> should return <c>One</c> and <c>Two</c>.</ft>