An analysis of C# language design
This article originally appeared in 2004. In the meantime (it’s 2009 as I write this), a lot has changed and the major complaint—a lack of explicit contracts in C#—will finally be addressed in the next version of C#, 4.0.
A Conversation with Anders Hejlsberg (Artima.com) is a four-part series on the ideas that drove the design of C#. (The link is to the first page of the first section; browse to Artima.com Interviews to see a list of all the sections.)
Virtual vs. Static
I found some points of interest in Part IV, “Versioning, Virtual, and Override”, in which Anders Hejlsberg (designer of both Delphi Pascal and C#) discusses the reasoning behind making methods non-virtual by default.
One reason is performance; he points to how methods are declared in Java: “We can observe that as people write code in Java, they forget to mark their methods final. … Because they’re virtual, they don’t perform as well.” Another reason he cites is ‘versioning’, which seems to be another term for formal contracts. Lack of versioning accounts for API instability in most software systems; C#’s approach, or lack thereof, is discussed in more detail below. First, let’s examine the performance argument for making methods static (that is, statically bound, or non-virtual) by default.
In Java’s case, methods are virtual by necessity: since classes can be loaded into the namespace at any time and their bytecode interpreted, a method must stay virtual in case a descendant that overrides it is loaded later. In C#’s case, assemblies are built with a known ‘universe’ of classes (to borrow a term from the Eiffel world), so there is no need to leave methods virtual in case other classes are loaded.
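For readers who know C# better than Java, the difference in defaults can be sketched as follows (the class names are illustrative, not from the interview). In Java, every instance method that is not `private`, `static` or `final` is dispatched on the run-time class of the object, and `final` is the opt-out. C# reverses the default: the base method needs `virtual` and the redefinition needs `override`.

```java
// Illustrative sketch: Java's default is dynamic dispatch.
class Shape {
    // Virtual by default: any subclass, even one loaded later, may override it.
    String describe() { return "shape"; }

    // 'final' opts out of redefinition: no subclass can replace this method.
    final String kind() { return "geometric " + describe(); }
}

class Circle extends Shape {
    @Override
    String describe() { return "circle"; }

    // Declaring kind() here would be a compile-time error: it is final in Shape.
}

class DispatchDemo {
    public static void main(String[] args) {
        Shape s = new Circle();           // static type Shape, run-time class Circle
        System.out.println(s.describe()); // prints "circle": chosen from the run-time class
        System.out.println(s.kind());     // prints "geometric circle"
    }
}
```

Note that the final `kind()` still calls `describe()` dynamically; `final` fixes only the method it is attached to.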
Leaving methods statically bound by default puts the burden on the developer: the developer must explicitly decide whether a method should be virtual or not. This prevents you from designing first and optimizing later; you are immediately faced with the question: can a descendant legitimately redefine this method?
There are those who claim one can always answer this question. They are the same ones who squirrel variables away in ‘private’ areas, just where a descendant would need them most. Private features (data or methods visible only to the current class) limit the number of uses to which a class can be put: if a class has the correct interface but an unacceptable implementation, a programmer is forced to define an entirely new, non-conforming class or, at the very least, to duplicate code in order to get the desired effect. Inheritance provides ‘is a’ semantics; if a class is another class, why should it be unable to see parts of itself?
Marking methods as ‘final’ (Java) or leaving them non-virtual (C#) and using private fields is akin to saying “I have created an infallible design and the private implementation is beyond reproach”.
This is an especially dangerous attitude to take in library code. Library code is incorporated into other products; clients of the library will often define classes based on library classes. What if some part of a class doesn’t function correctly, or works differently than expected or desired? Can a client class alter the behavior of the library class enough? Or does the client need to alter library source code or, worse yet, do without functionality, because the library class doesn’t allow that kind of modification? Is a client forced to rewrite the entire class as a non-conforming class to get functionality that the library almost provides?
To this you may say “there are certain things you should not be able to do with a class”. Fair enough: a good design imbues every class with a purpose and provides it with an API that fulfills it. However, what does it mean to say “you should not be able to do” something with a class? Does your class explicitly define an intent? The intent of a class is, at best, stored explicitly in external documentation. At worst, it is defined implicitly in the API; the secret of a class’s purpose is stored in the visibility of features (private/protected/public) and in the redefinability of methods. Even if the documentation lives in the class itself, it is stored in comments that are beyond the reach of the compiler. The purpose of a class can’t be verified or enforced at compile time or run time.
We come once again to the notion of contracts. Contracts to help the compiler, to help the developer, and to help the end-user or client. Contracts to make documentation simpler, clearer, and explicit rather than implicit. All software enforces contracts, yet almost no programming language provides mechanisms for making these contracts explicit, and C# is no exception.
Easy Way Out
Language designers today have no imagination, creating clone after clone after clone. There’s a reason C# looks so much like Java: given the choice between making it easy to write software in the language and making it easy to write a compiler for it, they go for the easy compiler every time. Neither of these languages lets a programmer express a design without immediately worrying about implementation. Anders Hejlsberg explains why C# took the easy way out:
“The academic school of thought says, ‘Everything should be virtual, because I might want to override it someday.’ The pragmatic school of thought, which comes from building real applications that run in the real world, says, ‘We’ve got to be real careful about what we make virtual.’”
Now it’s clear: whiners who are sick of working for their compilers are “academic, … [not] pragmatic” and don’t know about the “real world”. That’s a pretty specious argument, since he hasn’t backed up his assertion with any evidence (other than the performance argument, which, while perfectly valid, is still addressable on the compiler side, as explained below).
Consider the question of whether to make methods static or dynamic by default. A compiler-friendly language makes everything static, forcing the programmer to explicitly mark redefinable methods with a keyword. A nicer language would make all methods dynamic and leave the optimization to the compiler: if a descendant redefines a method, that method is compiled as dynamic, and all methods in the program (the universe of classes available at compile time) that are not redefined can safely be compiled statically.
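The whole-program check described above resembles what compiler writers call class hierarchy analysis. Here is a minimal sketch, assuming a toy model in which each class records only its parent and the names of the methods it declares (`ClassInfo` and `HierarchyAnalysis` are hypothetical names, and overloading is ignored): a method needs dynamic dispatch only if some descendant in the known universe redefines it.

```java
import java.util.*;

// Hypothetical model of a class: its name, its parent (null for a root), and the
// names of the methods it defines or redefines. Overloading is ignored for brevity.
final class ClassInfo {
    final String name;
    final String parent;
    final Set<String> methods;

    ClassInfo(String name, String parent, String... methods) {
        this.name = name;
        this.parent = parent;
        this.methods = new HashSet<>(Arrays.asList(methods));
    }
}

final class HierarchyAnalysis {
    // Given the closed universe of classes known at compile time, returns the
    // "Class.method" entries that some descendant redefines. Only these need
    // dynamic dispatch; every other method can be bound statically.
    static Set<String> needsDynamicDispatch(List<ClassInfo> universe) {
        Map<String, ClassInfo> byName = new HashMap<>();
        for (ClassInfo c : universe) byName.put(c.name, c);

        Set<String> dynamic = new TreeSet<>();
        for (ClassInfo c : universe) {
            // Walk up the ancestor chain: any ancestor that also declares one of
            // c's methods has that method redefined, so it must stay virtual.
            String ancestorName = c.parent;
            while (ancestorName != null) {
                ClassInfo ancestor = byName.get(ancestorName); // assumes a closed universe
                for (String m : c.methods) {
                    if (ancestor.methods.contains(m)) dynamic.add(ancestor.name + "." + m);
                }
                ancestorName = ancestor.parent;
            }
        }
        return dynamic;
    }

    public static void main(String[] args) {
        List<ClassInfo> universe = List.of(
            new ClassInfo("Shape", null, "area", "name"),
            new ClassInfo("Circle", "Shape", "area"),
            new ClassInfo("Square", "Shape", "area"));
        // Prints [Shape.area]: Shape.name is never redefined, so calls to it can be static.
        System.out.println(needsDynamicDispatch(universe));
    }
}
```

The result depends on the universe being closed: adding a class that redefines `name` would move `Shape.name` into the dynamic set, which is why this approach fits a compiler that sees every class at once.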
A corollary to this is how a language handles function inlining. Inlining replaces a function call with the body of the function itself, avoiding the call overhead; this pays off mainly for small functions. C and C++ still have an explicit ‘inline’ keyword. C# thankfully does not, and has rightly chosen to leave the decision of which functions to inline to the compiler, which decides using heuristics (in .NET, the just-in-time compiler makes this call). Since it’s a newer compiler, there are still a few kinks to work out, but C# is headed in the right direction.
Another issue affecting a language’s usability is its redefinition policy. When is a method considered a redefinition of another method? C++ has the absolute worst policy in this respect: a method is assumed to be a redefinition as soon as an ancestor declares a ‘virtual’ method with the same signature. If the signature of the ‘virtual’ method or of the redefinition changes, it is simply assumed to no longer be a redefinition. What fun!
C# has thankfully adopted the policy of explicit redefinition: a redefinition must be marked with ‘override’, and a method with the same signature that lacks it merely hides the inherited method (with a compiler warning) instead of redefining it. The method being redefined must, of course, be marked as ‘virtual’ when defined (as explained above).
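The signature-matching hazard described for C++ also exists in Java, which likewise decides what counts as a redefinition by comparing signatures. In this sketch (class names are illustrative), a change of parameter type turns an intended redefinition into an unrelated overload, and nothing complains.

```java
// Illustrative sketch: Java, like C++, treats a method as a redefinition only
// when its signature matches the inherited one.
class Widget {
    String render(int width) { return "widget " + width; }
}

class FancyWidget extends Widget {
    // Meant to redefine render(int), but the parameter type drifted to long, so
    // this silently declares a new overload instead of an override.
    // Annotating it with @Override would turn the mistake into a compile-time
    // error, an opt-in counterpart to C#'s required 'override' keyword.
    String render(long width) { return "fancy " + width; }
}

class RedefinitionDemo {
    public static void main(String[] args) {
        Widget w = new FancyWidget();
        System.out.println(w.render(10)); // prints "widget 10": the "redefinition" is never called
    }
}
```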
These are the language features that lighten the load for a programmer. Garbage collection is another such feature that C# got right. Given garbage collection, a developer can freely design structures without immediately considering which object is responsible for what. The accompanying complications of ‘has’ and ‘uses’ fall away in almost all cases, and a design can be mapped to the language much more easily, without accommodating memory management so early in the process. It is still possible to have memory problems with garbage collection (a stale reference, the counterpart of a dangling pointer, no longer causes a crash, but instead causes inconsistent logic and excessive memory usage). Nevertheless, languages that provide garbage collection allow elegant designs that would require a lot of scaffolding code in non-memory-managed languages.
Back to versioning
Anders goes on at length about the problem of ‘versioning’:
“When we make something virtual … we’re making an awful lot of promises about how it evolves in the future. … When we publish a virtual method in an API, we not only promise that when you call this method, x and y will happen. We also promise that when you override this method, we will call it in this particular sequence with regard to these other ones and the state will be in this and that invariant.”
What promises? C# has no contracting mechanism, so discussion of promises is limited to non-functional documentation and perhaps the method name, which implies what it does. Though he mentions an “invariant”, which is presumably the class invariant, there is no mechanism for specifying one: how can you prove that code broke an implicit contract?
He continues talking about contracts, noting that “[v]irtual has two sides to it: the incoming and the outgoing”. He talks all around the notion of contracts and documentation and the pitfalls associated with trusting developers to write documentation that shows “what you’re supposed to do when you override a virtual method”. Documentation should include information about “[w]hat are the invariants before you’re called? What should be true after?”. At this point, you’re screaming with frustration that a man so seemingly knowledgeable about Design-by-Contract decided to leave everything implicit in his language. He acknowledges the problem of contracting, then, in the same breath, leaves the entirety of the solution up to the developer. Not only does C# have no way of specifying these obviously important and troublesome contracts, its designer has invented whole new terms (incoming/outgoing instead of precondition/postcondition) in seemingly willful ignorance of existing Design-by-Contract theory.
As justification for this somewhat fuzzy ‘versioning’ concept he’s espousing, he mentions that “[w]henever [Java or C++ languages] introduce a new method in a base class, if someone in a derived class had a method of that same name, that method is now an override”. Honestly, that has nothing to do with contracts or with making sure redefinitions enforce the same contracts; it is simply about explicit redefinition rather than implicit signature-matching. It’s a trivial language feature that C# got exactly right, but, lacking contracts of any sort, how is C# any better at managing valid redefinitions than Java or C++? If a method is marked as ‘virtual’ in C#, a redefinition can do whatever it likes, including nothing.
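The scenario Hejlsberg describes is easy to reproduce in Java. In this sketch (all names hypothetical), version 2 of a library class adds `summary()` and calls it from `report()`; a client subclass that already had a helper with the same signature now overrides it without anyone having asked for that. In C#, the client method would not override the new member unless marked `override`; the compiler treats it as hiding the inherited member and issues a warning.

```java
// Hypothetical sketch: version 2 of a library class. summary() is new,
// and report() now calls it.
class LibraryReport {
    String report() { return "Report: " + summary(); }

    String summary() { return "library summary"; } // added in version 2
}

// Client code written against version 1, when LibraryReport had no summary().
// Its helper matches the new signature, so it silently becomes an override and
// changes what the library's own report() produces.
class ClientReport extends LibraryReport {
    String summary() { return "client helper output"; }
}

class VersioningDemo {
    public static void main(String[] args) {
        System.out.println(new ClientReport().report()); // prints "Report: client helper output"
    }
}
```

Explicit redefinition prevents this accident, but, as argued above, it says nothing about whether a deliberate `override` honors the contract of the method it replaces.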
The ‘versioning’ problem is not solved; it is simply no longer applicable to all methods because many more methods are static. That’s not an advancement; that’s removing functionality in order to prevent programmers from breaking things. Putting the burden on the developer limits the expressiveness of the language and constrains the power of solutions you can build in that language. Just because you might break a window with a hammer doesn’t mean it’s better to build a house without one.
Given a proper contracting mechanism, “incoming and outgoing” contracts could be specified explicitly in the language. A redefinition of such a method would inherit the ancestor method’s contracts and be forced to support them: it is free to weaken the precondition, but must keep or strengthen the inherited postcondition. In addition, contracts at the class scope should be defined in a class invariant, which is checked before and after each method call to ensure that the object is in a consistent state before code executes on it and after that code finishes.
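To make these inheritance rules concrete, here is a minimal sketch in Java, assuming hand-written run-time checks in place of a real contract mechanism (`Account`, `ContractViolation` and the hook names are hypothetical, not any library’s API). A `final` wrapper fixes the contract; redefinitions can only OR an alternative into the precondition and AND extra conditions into the postcondition and invariant.

```java
// Hypothetical sketch: hand-written checks emulate inherited contracts.
class ContractViolation extends RuntimeException {
    ContractViolation(String message) { super(message); }
}

class Account {
    protected int balance;

    Account(int initial) {
        check(initial >= 0, "negative opening balance");
        balance = initial;
    }

    int balance() { return balance; }

    // Descendants may only add to these; the wrapper below combines them with the base terms.
    protected boolean extraInvariant() { return true; }                      // ANDed: strengthen only
    protected boolean alternativePre(int amount) { return false; }           // ORed: weaken only
    protected boolean extraPost(int oldBalance, int amount) { return true; } // ANDed: strengthen only

    // The redefinable implementation.
    protected void doWithdraw(int amount) { balance -= amount; }

    // Final wrapper: every redefinition of doWithdraw inherits this contract.
    public final void withdraw(int amount) {
        check(invariant(), "invariant violated on entry");
        check((amount > 0 && amount <= balance) || alternativePre(amount), "precondition violated");
        int old = balance;
        doWithdraw(amount);
        check(balance == old - amount && extraPost(old, amount), "postcondition violated");
        check(invariant(), "invariant violated on exit");
    }

    private boolean invariant() { return balance >= 0 && extraInvariant(); }

    protected static void check(boolean condition, String message) {
        if (!condition) throw new ContractViolation(message);
    }
}

// Weakens the precondition: a zero withdrawal is accepted as a no-op.
class ZeroTolerantAccount extends Account {
    ZeroTolerantAccount(int initial) { super(initial); }
    @Override protected boolean alternativePre(int amount) { return amount == 0; }
}

// A faulty redefinition: the inherited postcondition catches it at run time.
class BuggyAccount extends Account {
    BuggyAccount(int initial) { super(initial); }
    @Override protected void doWithdraw(int amount) { balance -= 2 * amount; }
}

class ContractDemo {
    public static void main(String[] args) {
        new ZeroTolerantAccount(100).withdraw(0); // accepted: weakened precondition
        try {
            new BuggyAccount(100).withdraw(10);
        } catch (ContractViolation e) {
            System.out.println(e.getMessage()); // prints "postcondition violated"
        }
    }
}
```

For example, `new Account(100).withdraw(0)` fails its precondition, while the same call on a `ZeroTolerantAccount` succeeds. The scaffolding is the price of doing this by hand: the contract lives in a `final` wrapper and the redefinable logic moves into a separate hook, which is what a language-level mechanism such as Eiffel’s `require else` and `ensure then` makes unnecessary.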
There is a Design by Contract Framework (The Code Project) for C#, but it’s only a library extension and, like all non-language implementations of Design-by-Contract, is only a pale imitation of the power afforded by a language-level solution. It’s a real shame to see a language designer who knows so much about the pitfalls of programming do so little to help the users of his language avoid them.
It’s not the first time this has happened and it won’t be the last. So many programmers are sticking with C++ because it has at least some form of generics (C++ templates are not truly generic, but are nonetheless extremely useful). Java, a language whose programs are littered with typecasts because of a lack of generics, plans to finally introduce generics after ten years. C# also skipped generics in its first version and introduces them in the next revision, 2.0. One can only wonder whether and when either will ever support contracting, or whether we will have to sit back and wait another ten years for a new language.
If you can’t wait that long and want a language that has real generics, allows no private data, compiles non-redefined methods statically, and offers automatic inlining, explicit redefinition, garbage collection and a rich contract mechanism, try Eiffel.