
Title

Building pseudo-DSLs with C# 3.0

Description

DSL is a buzzword that's been around for a while; it stands for [D]omain-[S]pecific [L]anguage. The idea is that some tasks or "domains" are better described with their own language rather than with the same language used for everything else. The term gives a name to what is already standard practice: every time a program assumes a particular format for an input string (e.g. CSV or configuration files), it is using a DSL. On the surface, it's extremely logical to use the syntax and semantics most appropriate to the task at hand; it would be hard to argue with that. However, that assumes there are no hidden downsides.

<h>DSL Drawbacks</h>

And the downsides are not inconsequential. As an example, let's look at the DSL "Linq", which arrived with C# 3.0 and .NET 3.5. What's the problem with Linq? Well, nothing, actually, but only because a lot of work went into avoiding the drawbacks of DSLs. Microsoft wrote Linq and shipped it at the same time as a new IDE---Visual Studio 2008---which was basically an upgrade of Visual Studio 2005 to support Linq. All of the tools to which .NET developers had become accustomed worked seamlessly with Linq. However, it took a little while before JetBrains released a version of ReSharper that understood Linq...and that right there is the nub of the problem. Developer tools need to understand a DSL or you might as well just write it in Notepad.<fn> The bar for integration into an IDE is quite high; developers expect a lot these days, including:

<ul>
The DSL must include a useful parser that pinpoints problems exactly.
The DSL syntax must be clear and must support everything a developer may possibly want to do with it.<fn>
The DSL must support code-completion.
ReSharper should also work with the DSL, if possible.
And so on...
</ul>

What sounds like a slam-dunk of an idea on the surface suddenly sounds like a helluva lot more work than just defining a little language<fn>.
That's why Encodo decided early on to use C# wherever possible in its <a href="http://encodo.ch/en/quino.php">Quino</a> framework. The main part of a Quino application is its metadata, or model definition. However, instead of coming up with a language for defining the metadata, Encodo lets the developer define the metadata using a .NET API, which gives the developer the full power of code-completion, ReSharper and whatever other goodies they may have installed to help them get their jobs done.

<h>Designing a C#-based DSL</h>

Deciding to use C# for APIs doesn't mean, however, that your job is done quickly: you still have to design an API that not only works, but is intuitive enough to let developers use it with as little error and confusion as possible. I recently extended the API for building metadata to support grouping other metadata into hierarchies called "layouts". Though the API is implementation-agnostic, its primary use will initially be to determine how the properties of a meta-class are laid out in a form. That is, most applications will want more control over the appearance than simply displaying the properties of a meta-class in a form from first to last, one per line.

In the metadata itself, a layout is a group of other elements; an element can be a meta-property or another group. A group can have a caption. Essentially, it should look like this when displayed (groups are surrounded by []; elements with <>):

<code><macro convert="-punctuation">
[MainTab]
-----------------------------------
| <company>
| [MainFieldSet]
| --------------------------------
| | <contact>
| | [ <firstname> <lastname> ]
| | <picture>
| | <birthdate>
| --------------------------------
| [ <isemployee> <active> ]
-----------------------------------
</code><macro convert="+punctuation">

From the example above, we can extract the following requirements:

<ol>
Groups can be nested.
Groups can have captions, but a caption is not required.
An element can be an anonymous group, a named group or an individual metadata element.
</ol>

<h>Design Considerations</h>

One way of constructing this in a traditional programming language like C# is to create each group explicitly, calling a constructor with or without a caption as required. However, I also wanted a DSL with as little <a href="http://www.answers.com/main/ntquery?s=cruft">cruft</a> as possible; that is, I wanted to avoid redundant parameters and unnecessary constructors. I also wanted to avoid forcing the developer to provide direct references to meta-property elements where simply using the name of the property would be more comfortable.

To that end, I decided not to make the developer create, or even provide, the actual destination objects (i.e. the groups and elements); instead, I would define a parallel set of throwaway objects that the developer creates either implicitly or explicitly. The back-end could then use those objects to resolve references to elements and create the target object-graph with proper error-checking and so on. This approach also keeps the target metadata from getting "dirty" with properties or methods that are only needed during this particular style of construction.

<h>Defining the Goal</h>

I started by writing some C# code that I thought was both concise enough and offered visual hints about what was being built. That is, I used whitespace to indicate the grouping of elements, exactly as in the diagram from the requirements above. Here's a simple example, with very little grouping:

<code>
builder.AddLayout(
  personClass, "Basic",
  Person.Relations.Contact,
  new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
  Person.Fields.Picture,
  Person.Fields.Birthdate,
  new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
);
</code>

The code above creates a new "layout" for the class <c>personClass</c> named "Basic".
That takes care of the first two parameters; the final parameter, which makes up most of the call, is an open (<c>params</c>) array of elements. These are primarily the names of the properties of <c>personClass</c> to include (though they could also be the properties themselves). To indicate that two properties are on the same line, the developer groups them using a <c>LayoutGroup</c> object. Here's a more complex example with nested groups (this one corresponds to the diagram in the requirements above):

<code>
builder.AddLayout(
  personClass, "Details",
  new LayoutGroup("MainTab",
    Person.Relations.Company,
    new LayoutGroup("MainFieldSet",
      Person.Relations.Contact,
      new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
      Person.Fields.Picture,
      Person.Fields.Birthdate
    ),
    new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
  )
);
</code>

In this example, we see that the developer can also use a <c>LayoutGroup</c> to attach a caption to a group of other items, but that otherwise everything stays pretty much the same as in the simpler example.

Finally, a developer should also be able to refer to other layout definitions in order to avoid repeating code (adhering to the D.R.Y. principle<fn>). Here's the previous example redefined using a reference to another layout (highlighted):

<code>
builder.AddLayout(
  personClass, <hl>"Basic"</hl>,
  Person.Relations.Contact,
  new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
  Person.Fields.Picture,
  Person.Fields.Birthdate
);

builder.AddLayout(
  personClass, "Details",
  new LayoutGroup("MainTab",
    Person.Relations.Company,
    new LayoutGroup("MainFieldSet",
      <hl>new LayoutReference("Basic")</hl>
    ),
    new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
  )
);
</code>

<h>Implementation</h>

Now that I had an API I thought was good enough to use, I had to figure out how to get the C# compiler not only to accept it, but also to give me the opportunity to build the actual target metadata I wanted.
The trick ended up being to define a few objects for the different possibilities---groups, elements, references, etc.---and make them implicitly convert to a basic <c>LayoutItem</c>. Implicit operators even allowed me to convert strings to meta-property references, like this:

<code>
public static implicit operator LayoutItem(string identifier)
{
  return new LayoutItem(identifier);
}
</code>

Each of these items has a reference to each possible type of data and a flag to indicate which of these data are valid and can be extracted from the item. The builder receives a list of such items, each of which may have a sub-list of other items. Processing the list is now as simple as iterating over them with <c>foreach</c>, something like this:

<code>
private void ProcessItems(IMetaGroup group, IMetaClass metaClass, LayoutItem[] items)
{
  foreach (var item in items)
  {
    if (!String.IsNullOrEmpty(item.Identifier))
    {
      // Created from a string: look up the named property in the meta-class.
      var element = metaClass.Properties[item.Identifier];
      group.Elements.Add(element);
    }
    else if (item.Items != null)
    {
      // Anonymous group: create a sub-group and fill it recursively.
      var subGroup = CreateNextSubGroup(group);
      group.Elements.Add(subGroup);
      ProcessItems(subGroup, metaClass, item.Items.Items);
    }
    else if (item.Group != null)
    {
      // Captioned group: omitted for brevity.
    }
    else
    {
      // Remaining cases: omitted for brevity.
    }
  }
}
</code>

If the item was created from a string, the builder looks up the property to which it refers in the meta-class and adds it to the current group. If the item corresponds to an anonymous group, the builder creates a new sub-group and adds the items to it recursively. Here we can see how this solution spares the application developer the work of looking up each and every referenced property in application code; instead, the developer's code stays clean and short. Naturally, the full solution handles many more cases, but the sample above should suffice to show how it works.

<h>Cleaning it up</h>

The story didn't end there, though, as there are limits to how far C# can be made to do everything we'd like.
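
To make that limitation concrete, here is a minimal, self-contained sketch. All types are hypothetical stand-ins (Quino's real classes aren't shown in this article), and property names are plain strings, assuming that constants like <c>Person.Fields.FirstName</c> are strings too. <c>DraftGroup</c> is a made-up name for the design-stage <c>LayoutGroup</c>, which had constructors both with and without a caption. When both apply, C# prefers the one whose first parameter is <c>string</c>, because the identity conversion beats the user-defined conversion from <c>string</c> to <c>LayoutItem</c>; an anonymous group that starts with a property name therefore silently turns that name into its caption. <c>LayoutItems</c> and the captioned <c>LayoutGroup</c> model the split adopted in the final API, and <c>LayoutPrinter.Render</c> walks the tree recursively in the spirit of <c>ProcessItems</c>, producing text instead of metadata.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for LayoutItem: exactly one property is set,
// depending on which implicit conversion created the item.
public sealed class LayoutItem
{
    public string Identifier { get; private set; } // a property name
    public LayoutItems Items { get; private set; }  // an anonymous group
    public LayoutGroup Group { get; private set; }  // a captioned group

    public static implicit operator LayoutItem(string identifier)
    {
        return new LayoutItem { Identifier = identifier };
    }

    public static implicit operator LayoutItem(LayoutItems items)
    {
        return new LayoutItem { Items = items };
    }

    public static implicit operator LayoutItem(LayoutGroup group)
    {
        return new LayoutItem { Group = group };
    }
}

// Final-API style: an anonymous group is just a list of items...
public sealed class LayoutItems
{
    public LayoutItem[] Items { get; private set; }

    public LayoutItems(params LayoutItem[] items) { Items = items; }
}

// ...and a captioned group always wraps a LayoutItems.
public sealed class LayoutGroup
{
    public string Caption { get; private set; }
    public LayoutItems Items { get; private set; }

    public LayoutGroup(string caption, LayoutItems items)
    {
        Caption = caption;
        Items = items;
    }
}

// Design-stage style (hypothetical name): one class, caption optional.
// new DraftGroup("FirstName", "LastName") binds to the caption overload.
public sealed class DraftGroup
{
    public string Caption { get; private set; }
    public LayoutItem[] Items { get; private set; }

    public DraftGroup(params LayoutItem[] items) { Items = items; }

    public DraftGroup(string caption, params LayoutItem[] items)
    {
        Caption = caption;
        Items = items;
    }
}

// Recursive walk over the throwaway objects, rendering groups as text.
public static class LayoutPrinter
{
    public static string Render(LayoutItem item)
    {
        if (item.Identifier != null) return item.Identifier;
        if (item.Items != null) return "(" + RenderAll(item.Items) + ")";
        if (item.Group != null) return "[" + item.Group.Caption + ": " + RenderAll(item.Group.Items) + "]";
        throw new InvalidOperationException("Empty layout item.");
    }

    private static string RenderAll(LayoutItems items)
    {
        return string.Join(", ", items.Items.Select(i => Render(i)));
    }
}
```

The order in which the two <c>DraftGroup</c> constructors are declared makes no difference; the compiler chooses the caption overload purely because it offers the better conversion for the first argument. An anonymous group whose first element is not a string (e.g. a nested group) is unaffected, which is why the problem is easy to miss.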
The primary problem was distinguishing strings that are captions from strings that are references to meta-properties. To avoid this problem, I was forced to introduce a <c>LayoutItems</c> class for anonymous groups and reserve <c>LayoutGroup</c> for groups with captions. I wasn't able to implement the requirements exactly as I'd designed them, but the result is pretty close. Below is the example corresponding to the requirements, changed to accommodate the final API; all changes are highlighted:

<code>
builder.AddLayout(
  personClass, "Details",
  new LayoutGroup("MainTab", <hl>new LayoutItems</hl>(
    Person.Relations.Company,
    new LayoutGroup("MainFieldSet", <hl>new LayoutItems</hl>(
      Person.Relations.Contact,
      new <hl>LayoutItems</hl>(Person.Fields.FirstName, Person.Fields.LastName),
      Person.Fields.Picture,
      Person.Fields.Birthdate
    )),
    new <hl>LayoutItems</hl>(Person.Fields.IsEmployee, Person.Fields.Active)
  ))
);
</code>

All in all, I'm pretty happy with how things turned out: the API is clear enough that the developer should be able both to debug the layouts visually and to adjust them easily to accommodate changes. For example, it's quite obvious how to add a new property to a group, move a property to another line or put several properties on the same line. Defining this pseudo-DSL in C# lets the developer use code-completion, popup documentation and the full power of ReSharper, and frees me from having to write or maintain a parser or development tools for a DSL.

<hr>
<ft>On a side note, Encodo recently looked into the <a href="http://sparkviewengine.com/">Spark View Engine</a> for ASP.NET MVC. We decided not to use it because we don't really need it yet, but we were also concerned that it has only nascent support for code-completion and ReSharper in its view-definition language.</ft>
<ft>Even Linq has its limitations, of course, notably when used with Linq-to-Entities in the Entity Framework.
One obvious limitation in the first version is that neither "Contains" nor "In" is directly supported, forcing the developer to fall back on yet another DSL, ESQL (Entity SQL).</ft>
<ft>Before getting the moniker "DSL", the literature referred to such languages as "little languages".</ft>
<ft>D.R.Y. = [D]on't [R]epeat [Y]ourself.</ft>